Or we can build a POC first and then see how it relates to the issues we might need to fix.
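As a starting point for such a POC, here is a rough sketch of the idea from the quoted message below: group every N messages into a superstep and checkpoint asynchronously. This is not Hama code. `MicroBatcher`, its methods, and the in-memory `checkpoints` list (a stand-in for durable storage) are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: cut a message stream into supersteps of N messages. */
final class MicroBatcher {
  private final int batchSize;
  // A single thread keeps checkpoints in superstep order.
  private final ExecutorService checkpointPool = Executors.newSingleThreadExecutor();
  private final List<String> current = new ArrayList<>();
  private final List<CompletableFuture<Long>> pending = new ArrayList<>();
  // Assumption: an in-memory list stands in for durable checkpoint storage.
  private final List<List<String>> checkpoints =
      Collections.synchronizedList(new ArrayList<>());
  private long superstep = 0;

  MicroBatcher(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  /** Buffers one message and closes the superstep once N messages have arrived. */
  void onMessage(String msg) {
    current.add(msg);
    if (current.size() == batchSize) {
      endSuperstep();
    }
  }

  /** Snapshots the batch and hands it to the checkpoint thread without blocking. */
  private void endSuperstep() {
    final long step = superstep++;
    final List<String> snapshot = new ArrayList<>(current);
    current.clear();
    pending.add(CompletableFuture.supplyAsync(() -> {
      checkpoints.add(snapshot);
      return step;
    }, checkpointPool));
  }

  /** Waits for all outstanding checkpoints and returns how many were written. */
  int awaitCheckpoints() {
    for (CompletableFuture<Long> f : pending) {
      f.join();
    }
    return checkpoints.size();
  }

  /** Number of messages received but not yet part of a closed superstep. */
  int bufferedCount() {
    return current.size();
  }

  /** Returns a copy of the batch checkpointed for the given superstep. */
  List<String> checkpoint(int step) {
    return new ArrayList<>(checkpoints.get(step));
  }

  void shutdown() {
    checkpointPool.shutdown();
  }
}
```

Messages that arrive after the last closed superstep stay only in the buffer, so on failure they would have to be replayed from the source. That part is deliberately left out here.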
On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
> In that case, are we going to organize multiple tasks into a group? A
> job has N bsp groups (bsp tasks in the current code), and in turn each
> group contains multiple tasks (and all tasks are on the same server)?
>
> If this is the case, how do they send messages or communicate between
> groups? Group to group? Can a task (within a group) send messages
> arbitrarily?
>
> I have this question because this would have implications for FT. IIRC
> Storm is a CEP framework, and messages can be sent arbitrarily to every
> bolt. The issue with such computation is that checkpointing is not a
> simple task. Generally it's done through communication-induced
> checkpointing. Otherwise, like Storm, they ack and redo each message
> when necessary; another option is something like batch (in Storm, like
> Trident batch processing, if I am correct) transactional processing.
>
> What I can think of right now is, with the current structure, grouping
> every N messages into a superstep and then checkpointing asynchronously,
> which may be similar to Trident batch processing.
>
> I understand it's still far away based on the current status. I
> suppose it's good if we can take that into consideration beforehand as
> well.
>
> On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
>> Yesterday, I surveyed Storm. Storm's task grouping and chainable
>> bolts seem pretty nice (especially, chainable bolts can be really
>> useful in the case of real-time join operations).
>>
>> I think we can also implement functions similar to Storm's task
>> grouping and chainable bolts on BSP. My rough idea is:
>>
>> 1. Launch multiple tasks per node (as many as the number of bolt
>>    groups). For example:
>>
>> +---------------+
>> |    Server1    |
>> +---------------+
>> Task-1. tailing bolt
>> Task-2. split sentence bolt
>> Task-3. wordcount bolt
>>
>> 2. Assign the tasks to the proper group.
>> --
>> 3. Each task executes its user-defined function and sends messages
>>    to a task of the next group.
>> 4. Synchronize all.
>> --
>> 5. Finally, repeat steps 3 ~ 4 above.
>>
>> Here, the only difficult part is how to determine the task group at
>> the initial superstep. So, I'd like to add the method below to the
>> BSPPeer interface.
>>
>> /**
>>  * @return the names of locally adjacent peers (including this peer).
>>  */
>> public String[] getAdjacentPeerNames();
>>
>>
>> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
>>> great~
>>>
>>>
>>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
>>>
>>>>
>>>> [
>>>> https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430]
>>>>
>>>> Edward J. Yoon commented on HAMA-883:
>>>> -------------------------------------
>>>>
>>>> NOTE: my fellow worker is currently working on this issue -
>>>> https://github.com/garudakang/meerkat
>>>>
>>>> > [Research Task] Massive log event aggregation in real time using
>>>> > Apache Hama
>>>> > ----------------------------------------------------------------------------
>>>> >
>>>> >                 Key: HAMA-883
>>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>>>> >             Project: Hama
>>>> >          Issue Type: Task
>>>> >            Reporter: Edward J. Yoon
>>>> >
>>>> > BSP tasks can be used for aggregating log data streamed in real time.
>>>> > With this research task, we might be able to turn this kind of
>>>> > processing into a platform.
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.2#6252)
>>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251, [email protected]
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer Co., Ltd.
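
To make the quoted five-step idea concrete, here is a small standalone sketch, not tied to the Hama codebase. It assumes each task derives its group from its position among the sorted names returned by the proposed `getAdjacentPeerNames()`, and it simulates the tail -> split -> wordcount chain with one message hop per superstep. `GroupPipelineSketch`, `groupOf`, and `run` are hypothetical names, and the `host:port` peer names are only an assumed format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of group assignment and a three-group task chain. */
final class GroupPipelineSketch {

  /**
   * Assumption: the group of a task is its index among the sorted names of
   * the peers on the same server (what getAdjacentPeerNames() would return).
   */
  static int groupOf(String peerName, String[] adjacentPeerNames) {
    String[] sorted = adjacentPeerNames.clone();
    Arrays.sort(sorted);
    int idx = Arrays.binarySearch(sorted, peerName);
    if (idx < 0) {
      throw new IllegalArgumentException(peerName + " is not a local peer");
    }
    return idx;
  }

  /**
   * Simulates the chain: group 0 tails one line per superstep, group 1 splits
   * lines into words, group 2 counts words. Messages sent in one superstep
   * are only visible after the barrier, that is, in the next superstep.
   */
  static Map<String, Integer> run(List<String> lines) {
    List<String> toSplit = new ArrayList<>(); // inbox of group 1
    List<String> toCount = new ArrayList<>(); // inbox of group 2
    Map<String, Integer> counts = new TreeMap<>();
    int supersteps = lines.size() + 2; // two extra supersteps drain the chain

    for (int step = 0; step < supersteps; step++) {
      List<String> nextToSplit = new ArrayList<>();
      List<String> nextToCount = new ArrayList<>();

      // Group 0: tailing task emits the next log line, if any.
      if (step < lines.size()) {
        nextToSplit.add(lines.get(step));
      }
      // Group 1: split-sentence task forwards each word.
      for (String line : toSplit) {
        for (String word : line.trim().split("\\s+")) {
          if (!word.isEmpty()) {
            nextToCount.add(word);
          }
        }
      }
      // Group 2: wordcount task aggregates what it received.
      for (String word : toCount) {
        counts.merge(word, 1, Integer::sum);
      }
      // Barrier synchronization: deliver the messages sent in this superstep.
      toSplit = nextToSplit;
      toCount = nextToCount;
    }
    return counts;
  }
}
```

With this layout a line needs two supersteps to travel from the tailing task to the wordcount task, so the barrier cost is paid once per hop. Whether tasks may also send messages outside the next group is left open here, as it is in the thread.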
