My rough idea assumes that a dedicated Hama is installed on the machines that
generate the logs, and that the same number of child tasks is launched on each
GroomServer. So, if groups == 3, the framework launches 3 tasks per node.
In the first superstep, one task groups the tasks into 3 groups and then
broadcasts the resulting topology.

== Group1 ==
server1:60001
server2:60001
server3:60001

== Group2 ==
server1:60002
server2:60002
server3:60002

== Group3 ==
server1:60003
server2:60003
server3:60003

Based on this topology, each task loads the proper class via reflection and
executes it. Then it'll work like a Storm flow. I haven't thought about the
fault-tolerance (FT) issue yet. :-)
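
A minimal sketch of the grouping step, assuming peer names have the
host:port form shown above and that a task's group is determined by its
port (that is how the listing is laid out, not something decided in this
thread). The class and method names are hypothetical and are not part of
the Hama API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class TopologyGrouping {

  /**
   * Groups "host:port" peer names by port, so that the tasks listening on
   * the same port across all servers end up in the same group.
   * Assumption: the port identifies the group, as in the listing above.
   */
  public static SortedMap<Integer, List<String>> groupByPort(String[] peerNames) {
    SortedMap<Integer, List<String>> groups = new TreeMap<>();
    for (String peer : peerNames) {
      int sep = peer.lastIndexOf(':');
      if (sep < 0) {
        throw new IllegalArgumentException("Expected host:port, got: " + peer);
      }
      int port = Integer.parseInt(peer.substring(sep + 1));
      groups.computeIfAbsent(port, p -> new ArrayList<>()).add(peer);
    }
    // Sort members so that every task derives the same topology.
    for (List<String> members : groups.values()) {
      Collections.sort(members);
    }
    return groups;
  }

  public static void main(String[] args) {
    String[] peers = {
      "server1:60001", "server1:60002", "server1:60003",
      "server2:60001", "server2:60002", "server2:60003",
      "server3:60001", "server3:60002", "server3:60003"
    };
    int n = 1;
    for (Map.Entry<Integer, List<String>> e : groupByPort(peers).entrySet()) {
      System.out.println("== Group" + n++ + " ==");
      for (String peer : e.getValue()) {
        System.out.println(peer);
      }
    }
  }
}
```

Running main prints the same three groups as the listing above. Because the
result is sorted, the broadcasting task and any task that recomputes the
grouping locally would agree on the group order.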



On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <[email protected]> wrote:

> Or we can have POC first and then see how it relates to the issue we
> might need to fix.
>
> On 11 April 2014 16:10, Chia-Hung Lin <[email protected]> wrote:
> > In that case, are we going to organize multiple tasks into a group? A
> > job has N bsp groups (bsp tasks in the current code), and in turn each
> > group contains multiple tasks (and all tasks are on the same server)?
> >
> > If this is the case, how do they send messages or communicate between
> > groups? Group to group? Can a task (within a group) send messages
> > arbitrarily?
> >
> > I have this question because this would have implications for FT. IIRC
> > Storm is a CEP framework, and messages can be sent arbitrarily to any
> > bolt. The issue with such computation is that performing a checkpoint
> > is not a simple task. Generally it's done through communication-induced
> > checkpointing. Otherwise, like Storm, they ack and redo each
> > message when necessary; another option is something like transactional
> > batch processing (in Storm, Trident batch processing, if I am correct).
> >
> > What I can think of right now is, with the current structure, to group
> > every N messages into a superstep and then checkpoint asynchronously,
> > which may be similar to Trident batch processing.
> >
> > I understand it's still far away based on the current status. I
> > suppose it's good if we can take that into consideration beforehand as
> > well.
> >
> >
> >
> >
> >
> > On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
> >> Yesterday I surveyed Storm. Storm's task grouping and chainable
> >> bolts seem pretty nice (chainable bolts in particular can be really
> >> useful for real-time join operations).
> >>
> >> I think, we can also implement similar functions of Storm's task
> >> grouping and chainable bolts on BSP. My rough idea is:
> >>
> >> 1. Launch multiple tasks per node (one per group of bolts). For example:
> >>
> >> +---------------+
> >> |    Server1    |
> >> +---------------+
> >> Task-1. tailing bolt
> >> Task-2. split sentence bolt
> >> Task-3. wordcount bolt
> >>
> >> 2. Assign each task to its proper group.
> >> --
> >> 3. Each task executes its user-defined function and sends messages
> >> to the tasks of the next group.
> >> 4. Synchronize all tasks.
> >> --
> >> 5. Finally, repeat steps 3 and 4.
> >>
> >> Here, the only difficult part is how to determine the task group in
> >> the initial superstep. So, I'd like to add the method below to the
> >> BSPPeer interface.
> >>
> >>   /**
> >>    * @return the names of locally adjacent peers (including this peer).
> >>    */
> >>   public String[] getAdjacentPeerNames();
> >>
> >>
> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
> >>> great~
> >>>
> >>>
> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
> >>>
> >>>>
> >>>>     [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
> >>>>
> >>>> Edward J. Yoon commented on HAMA-883:
> >>>> -------------------------------------
> >>>>
> >>>> NOTE: my fellow worker is currently working on this issue -
> >>>> https://github.com/garudakang/meerkat
> >>>>
> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
> >>>> > ----------------------------------------------------------------------------
> >>>> >
> >>>> >                 Key: HAMA-883
> >>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
> >>>> >             Project: Hama
> >>>> >          Issue Type: Task
> >>>> >            Reporter: Edward J. Yoon
> >>>> >
> >>>> > BSP tasks can be used for aggregating log data streamed in real time.
> >>>> > With this research task, we might be able to build a platform for
> >>>> > this kind of processing.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.2#6252)
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ------
> >>> Yexi Jiang,
> >>> ECS 251,  [email protected]
> >>> School of Computer and Information Science,
> >>> Florida International University
> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
> >>
> >>
> >>
> >> --
> >> Edward J. Yoon (@eddieyoon)
> >> Chief Executive Officer
> >> DataSayer Co., Ltd.
>



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer Co., Ltd.
