No. Please read my mail again. One task creates the topology map and broadcasts it to all peers at the first superstep.
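As a rough illustration of that grouping step, the sketch below uses plain Java only (no Hama classes). It takes peer names of the form host:port and puts the i-th task of every host into group i, which reproduces the Group1..Group3 layout quoted further down. The class name TopologySketch, the method buildTopology, and the "i-th port per host" rule are assumptions made for this example, not existing Hama API; the actual broadcast in the first superstep is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: not part of Hama.
class TopologySketch {

  /**
   * Groups peer names ("host:port") so that the i-th task (by port order)
   * of every host lands in "Group<i>".
   */
  public static Map<String, List<String>> buildTopology(List<String> peerNames) {
    // Collect the ports used on each host.
    Map<String, List<Integer>> portsByHost = new TreeMap<>();
    for (String peer : peerNames) {
      int sep = peer.lastIndexOf(':');
      String host = peer.substring(0, sep);
      int port = Integer.parseInt(peer.substring(sep + 1));
      portsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(port);
    }

    // Assign the i-th port of each host to group i (1-based).
    Map<String, List<String>> topology = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> entry : portsByHost.entrySet()) {
      List<Integer> ports = entry.getValue();
      Collections.sort(ports);
      for (int i = 0; i < ports.size(); i++) {
        String group = "Group" + (i + 1);
        topology.computeIfAbsent(group, g -> new ArrayList<>())
            .add(entry.getKey() + ":" + ports.get(i));
      }
    }
    return topology;
  }

  public static void main(String[] args) {
    List<String> peers = Arrays.asList(
        "server1:60001", "server1:60002", "server1:60003",
        "server2:60001", "server2:60002", "server2:60003",
        "server3:60001", "server3:60002", "server3:60003");
    // Print each group followed by its members.
    for (Map.Entry<String, List<String>> e : buildTopology(peers).entrySet()) {
      System.out.println("== " + e.getKey() + " == " + e.getValue());
    }
  }
}
```

Every peer would then look up its own name in the received map to find out which group it belongs to.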
MapWritable<GroupName, List<HostName>> topology;

..

On Sat, Apr 12, 2014 at 3:16 AM, Chia-Hung Lin <[email protected]> wrote:
> No problem. It's a good discussion so we can examine and improve accordingly.
>
> I am still not very sure about the topology, or how tasks are grouped.
> From the description, it seems to look like the link below:
>
> http://i.imgur.com/92L2XY1.png
>
> Each GroomServer is viewed as a group, and each group will launch 3
> tasks by default (as defined in the default xml). So the corresponding
> messages, emitted from a source like a queue, are sent to each group for
> consumption? And how do tasks communicate between groups/tasks?
>
> On 11 April 2014 16:43, Edward J. Yoon <[email protected]> wrote:
>> My rough idea assumes that dedicated Hama is installed on machines that
>> generate logs, and the number of child tasks will be launched equally per
>> GroomServer. So, if the groups == 3, framework launches 3 tasks per node.
>> At first superstep, one task broadcasts the Topology after grouping the
>> Tasks into 3 groups.
>>
>> == Group1 ==
>> server1:60001
>> server2:60001
>> server3:60001
>>
>> == Group2 ==
>> server1:60002
>> server2:60002
>> server3:60002
>>
>> == Group3 ==
>> server1:60003
>> server2:60003
>> server3:60003
>>
>> Based on this Topology, tasks reflect the proper class and execute it. Then,
>> it'll work like Storm flow. I didn't think about FT issue yet. :-)
>>
>> On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <[email protected]> wrote:
>>
>>> Or we can have POC first and then see how it relates to the issue we
>>> might need to fix.
>>>
>>> On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
>>> > In that case are we going to organize multiple tasks into a group? A
>>> > job has N bsp groups (bsp task in current code), in turn each group
>>> > contains multiple tasks (and all tasks are on the same server)?
>>> >
>>> > If this is the case, how do they send messages or communicate between
>>> > groups? group to group?
>>> > A task (within a group) can arbitrarily send the
>>> > messages?
>>> >
>>> > I have this question because this would have implication on FT. IIRC
>>> > Storm is a CEP framework, and messages can be sent arbitrarily to every
>>> > bolt. The issue with such computation is that it's not a simple task
>>> > when performing checkpoint. Generally it's done through communication
>>> > induced checkpointing. Otherwise like storm they ack and redo each
>>> > message when necessary; an option is something like batch (in storm
>>> > like trident batch processing if I am correct) transactional
>>> > processing.
>>> >
>>> > What I can think of right now is, with current structure, grouping
>>> > every N messages a superstep, and then asynchronously checkpointing,
>>> > which may be similar to trident batch processing.
>>> >
>>> > I understand it's still far away based on the current status. I
>>> > suppose it's good if we can take that into consideration beforehand as
>>> > well.
>>> >
>>> > On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
>>> >> Yesterday, I surveyed Storm. Storm's task grouping and chainable
>>> >> bolts seem pretty nice (especially, chainable bolts can be really
>>> >> useful in case of real-time join operation).
>>> >>
>>> >> I think we can also implement similar functions of Storm's task
>>> >> grouping and chainable bolts on BSP. My rough idea is:
>>> >>
>>> >> 1. Launch multi-tasks per node (as number of group of Bolts). For example:
>>> >>
>>> >> +---------------+
>>> >> |    Server1    |
>>> >> +---------------+
>>> >> Task-1. tailing bolt
>>> >> Task-2. split sentence bolt
>>> >> Task-3. wordcount bolt
>>> >>
>>> >> 2. Assign the tasks to proper group.
>>> >> --
>>> >> 3. Each task executes its user-defined function and sends messages
>>> >> to task of next group.
>>> >> 4. Synchronize all.
>>> >> --
>>> >> 5. Finally, repeat the above 3 ~ 4 process.
>>> >>
>>> >> The only difficult part here is how to determine the task group at
>>> >> initial superstep. So, I'd like to add the one below to BSPPeer interface.
>>> >>
>>> >> /**
>>> >>  * @return the names of locally adjacent peers (including this peer).
>>> >>  */
>>> >> public String[] getAdjacentPeerNames();
>>> >>
>>> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
>>> >>> great~
>>> >>>
>>> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
>>> >>>
>>> >>>> [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
>>> >>>>
>>> >>>> Edward J. Yoon commented on HAMA-883:
>>> >>>> -------------------------------------
>>> >>>>
>>> >>>> NOTE: my fellow worker is currently working on this issue -
>>> >>>> https://github.com/garudakang/meerkat
>>> >>>>
>>> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
>>> >>>> > ----------------------------------------------------------------------------
>>> >>>> >
>>> >>>> > Key: HAMA-883
>>> >>>> > URL: https://issues.apache.org/jira/browse/HAMA-883
>>> >>>> > Project: Hama
>>> >>>> > Issue Type: Task
>>> >>>> > Reporter: Edward J. Yoon
>>> >>>> >
>>> >>>> > BSP tasks can be used for aggregating log data streamed in real time.
>>> >>>> > With this research task, we might be able to turn this kind of
>>> >>>> > processing into a platform.
>>> >>>>
>>> >>>> --
>>> >>>> This message was sent by Atlassian JIRA
>>> >>>> (v6.2#6252)
>>> >>>
>>> >>> --
>>> >>> ------
>>> >>> Yexi Jiang,
>>> >>> ECS 251, [email protected]
>>> >>> School of Computer and Information Science,
>>> >>> Florida International University
>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >>
>>> >> --
>>> >> Edward J. Yoon (@eddieyoon)
>>> >> Chief Executive Officer
>>> >> DataSayer Co., Ltd.
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer Co., Ltd.

--
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer Co., Ltd.
