In that case, are we going to organize multiple tasks into a group?
That is, a job has N BSP groups (a BSP task in the current code), and
each group in turn contains multiple tasks, all on the same server?

If so, how do the groups send messages to or communicate with each
other? Group to group? Or can a task within a group send messages to
arbitrary destinations?

I ask because this has implications for fault tolerance (FT). IIRC
Storm is a CEP framework, and messages can be sent arbitrarily to any
bolt. The issue with such computation is that checkpointing is not
simple. Generally it's done through communication-induced
checkpointing. Otherwise, as in Storm, each message is acked and
redone when necessary; another option is batch transactional
processing (in Storm, something like Trident batch processing, if I
am correct).
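
To make the ack-and-redo option concrete, here is a minimal sketch of
the bookkeeping it needs. This is only an illustration under my own
assumptions: AckReplayTracker, send, ack and replayPending are
hypothetical names, not Storm or Hama API, and a real system would
also need timeouts and persistent storage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers sent messages until they are acked. */
public class AckReplayTracker {
    // Insertion-ordered so that replay preserves the original send order.
    private final Map<Long, String> unacked = new LinkedHashMap<>();
    private long nextId = 0;

    /** Records the payload as pending and returns its message id. */
    public long send(String payload) {
        long id = nextId++;
        unacked.put(id, payload);
        return id;
    }

    /** Marks a message as processed; it will no longer be replayed. */
    public void ack(long id) {
        unacked.remove(id);
    }

    /** Returns the payloads that still need to be redone after a failure. */
    public List<String> replayPending() {
        return new ArrayList<>(unacked.values());
    }
}
```

The cost is per-message tracking, which is what batching is meant to
avoid.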

What I can think of right now, with the current structure, is
grouping every N messages into a superstep and then checkpointing
asynchronously, which may be similar to Trident batch processing.
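
A rough sketch of that idea, assuming nothing about the real Hama
code: BatchedSuperstep and its methods are hypothetical names, the
"checkpoint" is just an in-memory list standing in for durable
storage, and messages are plain strings. Every N messages close a
superstep, and the batch is handed to a background thread so that
message intake is not blocked by the checkpoint write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: every N messages form a superstep that is checkpointed asynchronously. */
public class BatchedSuperstep {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();
    // Stand-in for durable storage (an assumption; a real system would write to disk or HDFS).
    private final List<List<String>> checkpoints =
        Collections.synchronizedList(new ArrayList<>());
    // A single thread keeps checkpoints in superstep order.
    private final ExecutorService checkpointer = Executors.newSingleThreadExecutor();
    private int superstep = 0;

    public BatchedSuperstep(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Buffers a message; the N-th buffered message closes the current superstep. */
    public void onMessage(String msg) {
        pending.add(msg);
        if (pending.size() == batchSize) {
            endSuperstep();
        }
    }

    private void endSuperstep() {
        // Copy the batch so the next superstep can start filling the buffer immediately.
        List<String> snapshot = new ArrayList<>(pending);
        pending.clear();
        superstep++;
        inFlight.add(CompletableFuture.runAsync(() -> checkpoints.add(snapshot), checkpointer));
    }

    public int superstepCount() {
        return superstep;
    }

    /** Waits for all outstanding checkpoint writes and returns a copy of them. */
    public List<List<String>> awaitCheckpoints() {
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(checkpoints);
    }

    public void shutdown() {
        checkpointer.shutdown();
    }
}
```

On recovery, only the messages after the last completed checkpoint
would have to be redone, rather than each message individually.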

I understand that's still far off given the current status, but I
suppose it would be good to take it into consideration beforehand as
well.

On 11 April 2014 13:40, Edward J. Yoon <[email protected]> wrote:
> Yesterday, I surveyed Storm. Storm's task grouping and chainable
> bolts seem pretty nice (in particular, chainable bolts can be really
> useful for real-time join operations).
>
> I think we can also implement functions similar to Storm's task
> grouping and chainable bolts on BSP. My rough idea is:
>
> 1. Launch multiple tasks per node (one per group of bolts). For example:
>
> +---------------+
> |    Server1    |
> +---------------+
> Task-1. tailing bolt
> Task-2. split sentence bolt
> Task-3. wordcount bolt
>
> 2. Assign each task to its proper group.
> --
> 3. Each task executes its user-defined function and sends messages
> to the tasks of the next group.
> 4. Synchronizes all.
> --
> 5. Finally, repeat steps 3 and 4 above.
>
> Here, the only difficult part is how to determine the task group at
> the initial superstep. So, I'd like to add the method below to the
> BSPPeer interface.
>
>   /**
>    * @return the names of locally adjacent peers (including this peer).
>    */
>   public String[] getAdjacentPeerNames();
>
>
> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <[email protected]> wrote:
>> great~
>>
>>
>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <[email protected]>:
>>
>>>
>>>     [
>>> https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430]
>>>
>>> Edward J. Yoon commented on HAMA-883:
>>> -------------------------------------
>>>
>>> NOTE: my fellow worker is currently working on this issue -
>>> https://github.com/garudakang/meerkat
>>>
>>> > [Research Task] Massive log event aggregation in real time using Apache
>>> Hama
>>> >
>>> ----------------------------------------------------------------------------
>>> >
>>> >                 Key: HAMA-883
>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>>> >             Project: Hama
>>> >          Issue Type: Task
>>> >            Reporter: Edward J. Yoon
>>> >
>>> > BSP tasks can be used for aggregating log data streamed in real time.
>>> > With this research task, we might be able to build a platform for this
>>> > kind of processing.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
>>>
>>
>>
>>
>> --
>> ------
>> Yexi Jiang,
>> ECS 251,  [email protected]
>> School of Computer and Information Science,
>> Florida International University
>> Homepage: http://users.cis.fiu.edu/~yjian004/
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer Co., Ltd.
