Re: Queries on Hama Usage

Thomas Jungblut Fri, 23 Mar 2012 04:02:22 -0700

Hey,

1: Each task has its own RPC server, so you send directly to a task rather
than to a groom.
2: BSPMessageBundle is a bundle of messages that are batched per
destination to improve transfer speed. Combiners serve the same purpose,
so you return a "message batch" when combining.
3: Hadoop is input-driven. That comes from functional programming, where
you have an input list and apply functions like map and reduce to it.
BSP is not strongly tied to functional programming, and we had no input
before. For several tasks, no input is a valid input, e.g. realtime
processing. However, you still want to control the parallelization factor
by controlling how many tasks are launched.
So it is a mixture of backward compatibility and the freedom to launch a
few tasks in a cluster without input.
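
To make point 2 a bit more concrete, here is a minimal plain-Java sketch of
the idea, not Hama's actual classes: messages are first grouped per
destination task, and a combiner then collapses each batch into a smaller
batch. The class name BundlingSketch, the method names, the "task-N" peer
names and the sum combiner are all made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hama.
class BundlingSketch {

    // Groups outgoing (destination, value) pairs so that each destination
    // task gets one batch instead of one transfer per message.
    static Map<String, List<Integer>> bundleByDestination(
            List<Map.Entry<String, Integer>> outgoing) {
        Map<String, List<Integer>> bundles = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> message : outgoing) {
            bundles.computeIfAbsent(message.getKey(), k -> new ArrayList<>())
                   .add(message.getValue());
        }
        return bundles;
    }

    // A sum combiner (an assumed example): takes a batch and returns a
    // batch again, here one that holds a single combined message.
    static List<Integer> combineSum(List<Integer> batch) {
        int sum = 0;
        for (int value : batch) {
            sum += value;
        }
        List<Integer> combined = new ArrayList<>();
        combined.add(sum);
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> outgoing = new ArrayList<>();
        outgoing.add(new AbstractMap.SimpleEntry<>("task-0", 1));
        outgoing.add(new AbstractMap.SimpleEntry<>("task-1", 2));
        outgoing.add(new AbstractMap.SimpleEntry<>("task-0", 3));

        // Print the combined batch that would be sent to each destination.
        Map<String, List<Integer>> bundles = bundleByDestination(outgoing);
        for (Map.Entry<String, List<Integer>> bundle : bundles.entrySet()) {
            System.out.println(bundle.getKey() + " <- " + combineSum(bundle.getValue()));
        }
    }
}
```

The combiner returning a list rather than a single value mirrors the
"message batch" wording above: the result of combining is still a batch,
just a smaller one.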


Regarding your other mail, if you want to contribute parts of a MapReduce
version, feel free to code one. I have not scheduled it for any release
since this is just a "side-effect" example.

Hope I clarified it :)
Thanks!

On 23 March 2012 11:47, Praveen Sripati <[email protected]> wrote:

> Hi,
>
> 1. 0.4.0 introduced multiple tasks on groom servers. How does the framework
> send a message to a particular task on a groom server? If I am not wrong,
> BSPPeer.send() sends messages to all the tasks on a groom server and it is
> an overhead.
>
> 2. What is the difference between message combiners (0.4.0) and
> BSPMessageBundle (0.3.0)?
>
> 3. What is the significance of BSPJob.setNumBspTask()? I thought that in
> Hama the input will be split and a task will be spawned for each split in
> the groom server similar to Hadoop?
>
> Regards,
> Praveen
>



-- 
Thomas Jungblut
Berlin <[email protected]>
