No problem. It's a good discussion so we can examine and improve accordingly.
I am still not very sure about the topology, or how tasks are grouped.
From the description, it seems to look like the diagram linked below:

http://i.imgur.com/92L2XY1.png

Each GroomServer is viewed as a group, and each group launches 3 tasks
by default (as defined in the default XML configuration). So are the
corresponding messages, emitted from a source such as a queue, sent to
each group for consumption? And how do tasks communicate between groups
or between individual tasks?

On 11 April 2014 16:43, Edward J. Yoon <[email protected]> wrote:
> My rough idea assumes that dedicated Hama is installed on machines that
> generates logs, and the number of child tasks will be launched equally per
> GroomServer. So, if the groups == 3, framework launches 3 tasks per node.
> At first superstep, one task broadcasts the Topology after grouping the
> Tasks into 3 groups.
>
> == Group1 ==
> server1:60001
> server2:60001
> server3:60001
>
> == Group2 ==
> server1:60002
> server2:60002
> server3:60002
>
> == Group3 ==
> server1:60003
> server2:60003
> server3:60003
>
> Based on this Topology, tasks reflects proper class and executes it. Then,
> it'll work like Storm flow. I didn't think about FT issue yet. :-)
>
> On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <[email protected]> wrote:
>
>> Or we can have POC first and then see how it relates to the issue we
>> might need to fix.
>>
>> On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
>> > In that case are we going to organize multiple tasks into a group? A
>> > job has N bsp groups (bsp task in current code), in turn each group
>> > contain multiple tasks (and all tasks are on the same server)?
>> >
>> > If this is the case, how do they send messages or communicate between
>> > groups? group to group? A task (within a group) can arbitrary send the
>> > messages?
>> >
>> > I have this question because this would have implication on FT. IIRC
>> > Storm is a CEP framework, and messages can be sent arbitrary to every
>> > bolt.
>> > The issue with such computation is that it's not a simple task
>> > when performing checkpoint. Generally it's done through communication
>> > induced checkpointing. Otherwise like storm they ack and redo each
>> > message when necessary; an option is something like batch (in storm
>> > like trident batch processing if I am correct) transactional
>> > processing.
>> >
>> > What I can think of right now is, with current structure, grouping
>> > every N messages a superstep, and then asynchronously checkpointing,
>> > which may be similar to trident batch processing.
>> >
>> > I understand it's still far away based on the current status. I
>> > suppose it's good if we can take that into consideration beforehand as
>> > well.
>> >
>> > On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
>> >> Yesterday, I had survey the Storm. Storm's task grouping and chainable
>> >> bolts seems pretty nice (especially, chainable bolts can be really
>> >> useful in case of real-time join operation).
>> >>
>> >> I think, we can also implement similar functions of Storm's task
>> >> grouping and chainable bolts on BSP. My rough idea is:
>> >>
>> >> 1. Launches multi-tasks per node (as number of group of Bolts). For example:
>> >>
>> >> +---------------+
>> >> |    Server1    |
>> >> +---------------+
>> >> Task-1. tailing bolt
>> >> Task-2. split sentence bolt
>> >> Task-3. wordcount bolt
>> >>
>> >> 2. Assign the tasks to proper group.
>> >> --
>> >> 3. Each task executes their user-defined function and sends messages
>> >> to task of next group.
>> >> 4. Synchronizes all.
>> >> --
>> >> 5. Finally, repeat the above 3 ~ 4 process.
>> >>
>> >> In here, only the difficult one is how to determine the task group at
>> >> initial superstep. So, I'd like to add below one to BSPPeer interface.
>> >>
>> >> /**
>> >>  * @return the names of locally adjacent peers (including this peer).
>> >>  */
>> >> public String[] getAdjacentPeerNames();
>> >>
>> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
>> >>> great~
>> >>>
>> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
>> >>>
>> >>>> [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
>> >>>>
>> >>>> Edward J. Yoon commented on HAMA-883:
>> >>>> -------------------------------------
>> >>>>
>> >>>> NOTE: my fellow worker is currently working on this issue -
>> >>>> https://github.com/garudakang/meerkat
>> >>>>
>> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
>> >>>> > ----------------------------------------------------------------------------
>> >>>> >
>> >>>> >          Key: HAMA-883
>> >>>> >          URL: https://issues.apache.org/jira/browse/HAMA-883
>> >>>> >      Project: Hama
>> >>>> >   Issue Type: Task
>> >>>> >     Reporter: Edward J. Yoon
>> >>>> >
>> >>>> > BSP tasks can be used for aggregating log data streamed in real time.
>> >>>> > With this research task, we might able to platformization these kind of
>> >>>> > processing.
>> >>>>
>> >>>> --
>> >>>> This message was sent by Atlassian JIRA
>> >>>> (v6.2#6252)
>> >>>
>> >>> --
>> >>> ------
>> >>> Yexi Jiang,
>> >>> ECS 251, [email protected]
>> >>> School of Computer and Information Science,
>> >>> Florida International University
>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>
>> >> --
>> >> Edward J. Yoon (@eddieyoon)
>> >> Chief Executive Officer
>> >> DataSayer Co., Ltd.
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer Co., Ltd.

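For reference, the grouping listed in the quoted 16:43 message (Group1 is the first task on every server, Group2 the second, and so on) could be computed roughly as follows. This is only a sketch under assumptions, not Hama code: the class TopologySketch and the helpers hostOf, portOf and groupPeers are hypothetical names, and it assumes peer names have the "host:port" form used in the example and that the i-th port on each host maps to group i.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of Hama.
final class TopologySketch {

    // Assumes peer names look like "host:port", as in the quoted example.
    static String hostOf(String peerName) {
        return peerName.substring(0, peerName.lastIndexOf(':'));
    }

    static int portOf(String peerName) {
        return Integer.parseInt(peerName.substring(peerName.lastIndexOf(':') + 1));
    }

    // Puts the i-th task (ordered by port) of every host into group i, starting at 1.
    static Map<Integer, List<String>> groupPeers(List<String> peerNames) {
        // Collect the tasks running on each host.
        Map<String, List<String>> byHost = new TreeMap<>();
        for (String peer : peerNames) {
            byHost.computeIfAbsent(hostOf(peer), h -> new ArrayList<>()).add(peer);
        }
        // The position of a task within its host decides its group.
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (List<String> local : byHost.values()) {
            local.sort(Comparator.comparingInt(TopologySketch::portOf));
            for (int i = 0; i < local.size(); i++) {
                groups.computeIfAbsent(i + 1, g -> new ArrayList<>()).add(local.get(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> peers = Arrays.asList(
            "server1:60001", "server1:60002", "server1:60003",
            "server2:60001", "server2:60002", "server2:60003",
            "server3:60001", "server3:60002", "server3:60003");
        groupPeers(peers).forEach((group, members) ->
            System.out.println("== Group" + group + " == " + members));
    }
}
```

With something like the proposed getAdjacentPeerNames(), each task could presumably derive its own group from its position among the local peers, without one task having to compute and broadcast the whole map. How messages then flow from one group to the next is still the open question above.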