No problem. It's a good discussion so we can examine and improve accordingly.

I am still not quite sure about the topology, or how tasks are grouped.
From the description, it seems to look like the link below:

http://i.imgur.com/92L2XY1.png

Each GroomServer is viewed as a group, and each group will launch 3
tasks by default (as defined in the default xml). So the corresponding
messages, emitted from a source such as a queue, are sent to each group
for consumption? And how do tasks communicate between groups/tasks?
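
For reference, here is a minimal sketch of the grouping as I read it from
the topology listing quoted below (Group1 = the :60001 task on every
server, and so on). It assumes peer names have the form "host:port" and
that tasks sharing a port form one group; the "send only to the next
group" rule is taken from step 3 of the quoted proposal. TopologySketch,
groupByPort and nextGroup are hypothetical names for illustration, not
part of Hama.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hama: derives groups from peer names.
final class TopologySketch {

  // Groups "host:port" peer names by port; groups are ordered by port number.
  static List<List<String>> groupByPort(String[] peerNames) {
    Map<Integer, List<String>> byPort = new TreeMap<>();
    for (String peer : peerNames) {
      int port = Integer.parseInt(peer.substring(peer.lastIndexOf(':') + 1));
      byPort.computeIfAbsent(port, p -> new ArrayList<>()).add(peer);
    }
    return new ArrayList<>(byPort.values());
  }

  // Returns the peers of the group after the one containing 'peer'.
  // Assumption: the last group (or an unknown peer) has no downstream group.
  static List<String> nextGroup(List<List<String>> groups, String peer) {
    for (int g = 0; g < groups.size(); g++) {
      if (groups.get(g).contains(peer)) {
        return g + 1 < groups.size()
            ? groups.get(g + 1)
            : Collections.<String>emptyList();
      }
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    String[] peers = {
      "server1:60001", "server1:60002", "server1:60003",
      "server2:60001", "server2:60002", "server2:60003",
      "server3:60001", "server3:60002", "server3:60003"
    };
    List<List<String>> groups = groupByPort(peers);
    for (int g = 0; g < groups.size(); g++) {
      System.out.println("Group" + (g + 1) + " = " + groups.get(g));
    }
    System.out.println("server1:60001 sends to "
        + nextGroup(groups, "server1:60001"));
  }
}
```

If this reading is right, a group spans all servers rather than being one
GroomServer, and a task only needs the topology to find its downstream
peers.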




On 11 April 2014 16:43, Edward J. Yoon <[email protected]> wrote:
> My rough idea assumes that a dedicated Hama is installed on the machines
> that generate logs, and that the same number of child tasks is launched on
> each GroomServer. So, if groups == 3, the framework launches 3 tasks per
> node. At the first superstep, one task broadcasts the topology after
> grouping the tasks into 3 groups.
>
> == Group1 ==
> server1:60001
> server2:60001
> server3:60001
>
> == Group2 ==
> server1:60002
> server2:60002
> server3:60002
>
> == Group3 ==
> server1:60003
> server2:60003
> server3:60003
>
> Based on this topology, each task instantiates the proper class by
> reflection and executes it. Then, it'll work like a Storm flow. I haven't
> thought about the FT issue yet. :-)
>
>
>
> On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <[email protected]> wrote:
>
>> Or we can have a POC first and then see how it relates to the issues we
>> might need to fix.
>>
>> On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
>> > In that case, are we going to organize multiple tasks into a group? A
>> > job has N bsp groups (bsp tasks in the current code), and in turn each
>> > group contains multiple tasks (and all tasks are on the same server)?
>> >
>> > If this is the case, how do they send messages or communicate between
>> > groups? Group to group? Can a task (within a group) send messages
>> > arbitrarily?
>> >
>> > I have this question because this would have implications for FT. IIRC
>> > Storm is a CEP framework, and messages can be sent arbitrarily to every
>> > bolt. The issue with such computation is that performing a checkpoint
>> > is not simple. Generally it's done through communication-induced
>> > checkpointing. Otherwise, like Storm, they ack and redo each message
>> > when necessary; another option is something like batch transactional
>> > processing (Trident batch processing in Storm, if I am correct).
>> >
>> > What I can think of right now is, with the current structure, grouping
>> > every N messages into a superstep, and then checkpointing
>> > asynchronously, which may be similar to Trident batch processing.
>> >
>> > I understand it's still far away based on the current status. I
>> > suppose it's good if we can take that into consideration beforehand as
>> > well.
>> >
>> >
>> >
>> >
>> >
>> > On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
>> >> Yesterday, I surveyed Storm. Storm's task grouping and chainable
>> >> bolts seem pretty nice (especially, chainable bolts can be really
>> >> useful in the case of a real-time join operation).
>> >>
>> >> I think we can also implement functions similar to Storm's task
>> >> grouping and chainable bolts on BSP. My rough idea is:
>> >>
>> >> 1. Launch multiple tasks per node (as many as the number of bolt
>> >> groups). For example:
>> >>
>> >> +---------------+
>> >> |    Server1    |
>> >> +---------------+
>> >> Task-1. tailing bolt
>> >> Task-2. split sentence bolt
>> >> Task-3. wordcount bolt
>> >>
>> >> 2. Assign the tasks to the proper group.
>> >> --
>> >> 3. Each task executes its user-defined function and sends messages
>> >> to the tasks of the next group.
>> >> 4. Synchronize all tasks.
>> >> --
>> >> 5. Finally, repeat steps 3 ~ 4.
>> >>
>> >> Here, the only difficult part is how to determine the task group at
>> >> the initial superstep. So, I'd like to add the method below to the
>> >> BSPPeer interface.
>> >>
>> >>   /**
>> >>    * @return the names of locally adjacent peers (including this peer).
>> >>    */
>> >>   public String[] getAdjacentPeerNames();
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
>> >>> great~
>> >>>
>> >>>
>> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
>> >>>
>> >>>>
>> >>>>     [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
>> >>>>
>> >>>> Edward J. Yoon commented on HAMA-883:
>> >>>> -------------------------------------
>> >>>>
>> >>>> NOTE: my fellow worker is currently working on this issue -
>> >>>> https://github.com/garudakang/meerkat
>> >>>>
>> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
>> >>>> > ----------------------------------------------------------------------------
>> >>>> >
>> >>>> >                 Key: HAMA-883
>> >>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>> >>>> >             Project: Hama
>> >>>> >          Issue Type: Task
>> >>>> >            Reporter: Edward J. Yoon
>> >>>> >
>> >>>> > BSP tasks can be used for aggregating log data streamed in real
>> >>>> > time. With this research task, we might be able to turn this kind
>> >>>> > of processing into a platform.
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> This message was sent by Atlassian JIRA
>> >>>> (v6.2#6252)
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> ------
>> >>> Yexi Jiang,
>> >>> ECS 251,  [email protected]
>> >>> School of Computer and Information Science,
>> >>> Florida International University
>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>
>> >>
>> >>
>> >> --
>> >> Edward J. Yoon (@eddieyoon)
>> >> Chief Executive Officer
>> >> DataSayer Co., Ltd.
>>
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer Co., Ltd.
