My rough idea assumes that a dedicated Hama is installed on the machines that generate the logs, and that the same number of child tasks is launched on each GroomServer. So, if the number of groups == 3, the framework launches 3 tasks per node. At the first superstep, one task groups the tasks into 3 groups and then broadcasts the topology.

== Group1 ==
server1:60001
server2:60001
server3:60001

== Group2 ==
server1:60002
server2:60002
server3:60002

== Group3 ==
server1:60003
server2:60003
server3:60003

Based on this topology, each task loads the proper class via reflection and executes it. Then it'll work like a Storm flow.

I haven't thought about the FT issue yet. :-)

On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <[email protected]> wrote:

> Or we can have a POC first and then see how it relates to the issue we
> might need to fix.
>
> On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
> > In that case are we going to organize multiple tasks into a group? A
> > job has N bsp groups (bsp task in current code), and in turn each group
> > contains multiple tasks (and all tasks are on the same server)?
> >
> > If this is the case, how do they send messages or communicate between
> > groups? Group to group? Can a task (within a group) arbitrarily send
> > messages?
> >
> > I have this question because this would have implications for FT. IIRC
> > Storm is a CEP framework, and messages can be sent arbitrarily to every
> > bolt. The issue with such computation is that performing a checkpoint
> > is not a simple task. Generally it's done through communication-induced
> > checkpointing. Otherwise, like Storm, they ack and redo each
> > message when necessary; an option is something like batch (in Storm,
> > like Trident batch processing if I am correct) transactional
> > processing.
> >
> > What I can think of right now is, with the current structure, grouping
> > every N messages into a superstep, and then asynchronously checkpointing,
> > which may be similar to Trident batch processing.
> >
> > I understand it's still far away based on the current status. I
> > suppose it's good if we can take that into consideration beforehand as
> > well.
> >
> > On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
> >> Yesterday, I surveyed Storm.
> >> Storm's task grouping and chainable
> >> bolts seem pretty nice (especially, chainable bolts can be really
> >> useful in the case of a real-time join operation).
> >>
> >> I think we can also implement functions similar to Storm's task
> >> grouping and chainable bolts on BSP. My rough idea is:
> >>
> >> 1. Launch multiple tasks per node (as many as the number of bolt
> >> groups). For example:
> >>
> >> +---------------+
> >> |    Server1    |
> >> +---------------+
> >> Task-1. tailing bolt
> >> Task-2. split sentence bolt
> >> Task-3. wordcount bolt
> >>
> >> 2. Assign the tasks to the proper group.
> >> --
> >> 3. Each task executes its user-defined function and sends messages
> >> to a task of the next group.
> >> 4. Synchronize all.
> >> --
> >> 5. Finally, repeat steps 3 ~ 4 above.
> >>
> >> Here, the only difficult part is how to determine the task group at
> >> the initial superstep. So, I'd like to add the one below to the
> >> BSPPeer interface.
> >>
> >> /**
> >>  * @return the names of locally adjacent peers (including this peer).
> >>  */
> >> public String[] getAdjacentPeerNames();
> >>
> >>
> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
> >>> great~
> >>>
> >>>
> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
> >>>
> >>>>
> >>>> [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
> >>>>
> >>>> Edward J. Yoon commented on HAMA-883:
> >>>> -------------------------------------
> >>>>
> >>>> NOTE: my fellow worker is currently working on this issue -
> >>>> https://github.com/garudakang/meerkat
> >>>>
> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
> >>>> > ----------------------------------------------------------------------------
> >>>> >
> >>>> >                 Key: HAMA-883
> >>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
> >>>> >             Project: Hama
> >>>> >          Issue Type: Task
> >>>> >            Reporter: Edward J. Yoon
> >>>> >
> >>>> > BSP tasks can be used for aggregating log data streamed in real time.
> >>>> > With this research task, we might be able to platformize this kind of
> >>>> > processing.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.2#6252)
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ------
> >>> Yexi Jiang,
> >>> ECS 251, [email protected]
> >>> School of Computer and Information Science,
> >>> Florida International University
> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>
> >>
> >>
> >> --
> >> Edward J. Yoon (@eddieyoon)
> >> Chief Executive Officer
> >> DataSayer Co., Ltd.
>

--
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer Co., Ltd.
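
P.S. A minimal sketch of the grouping step described at the top of this mail, in plain Java with no Hama dependency. It assumes peer names have the form "host:port" and that the i-th task on every host (ordered by port) belongs to group i. The class TopologySketch and the method groupPeers are hypothetical names used only for illustration; they are not part of Hama, and this is just one possible way to derive the topology.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Hama): groups "host:port" peer names. */
final class TopologySketch {

  /**
   * Assumption: the i-th task of every host, ordered by port, belongs to group i.
   * Returns a map from 1-based group number to the peer names in that group.
   */
  static Map<Integer, List<String>> groupPeers(String[] peerNames) {
    // Collect the ports of all tasks running on each host (hosts kept sorted).
    Map<String, List<Integer>> portsByHost = new TreeMap<>();
    for (String peer : peerNames) {
      int sep = peer.lastIndexOf(':');
      String host = peer.substring(0, sep);
      int port = Integer.parseInt(peer.substring(sep + 1));
      portsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(port);
    }

    // The task with the i-th smallest port on each host joins group i + 1.
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> entry : portsByHost.entrySet()) {
      List<Integer> ports = entry.getValue();
      Collections.sort(ports);
      for (int i = 0; i < ports.size(); i++) {
        groups.computeIfAbsent(i + 1, g -> new ArrayList<>())
            .add(entry.getKey() + ":" + ports.get(i));
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    String[] peers = {
      "server1:60001", "server1:60002", "server1:60003",
      "server2:60001", "server2:60002", "server2:60003",
      "server3:60001", "server3:60002", "server3:60003"
    };
    // Prints one line per group, e.g.
    // == Group1 == server1:60001 server2:60001 server3:60001
    groupPeers(peers).forEach((group, members) ->
        System.out.println("== Group" + group + " == " + String.join(" ", members)));
  }
}
```

In a real job, the one task that builds the topology at the first superstep would presumably feed in the full peer list and then broadcast the resulting map; how that broadcast and the per-group class loading would look is left open here.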
