As the number of tasks increases, the number of messages grows
quadratically (n*n). So it might be useful to benchmark sending the
messages to the master, letting it do the computation, and having it send
the data back to all the tasks.
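
To make the comparison concrete, here is a rough sketch of the message and
barrier counts for both schemes. It is only a counting model, not Hama code:
the function names are made up for illustration, and it assumes the master
scheme needs one sync for the gather and a second one for sending the
results back.

```python
# Counting model only; the names below are hypothetical, not part of Hama.

def broadcast_messages(n: int) -> int:
    """Every task sends its averages to every task (itself included)."""
    return n * n


def master_messages(n: int) -> int:
    """n tasks send to the master, then the master sends to n tasks."""
    return 2 * n


# Assumed number of sync() barriers per k-means iteration.
BROADCAST_SYNCS = 1  # one barrier delivers all broadcast messages
MASTER_SYNCS = 2     # one barrier for the gather, one for the send-back


if __name__ == "__main__":
    for n in (2, 4, 16, 64, 256):
        print(
            f"{n:>4} tasks: "
            f"broadcast={broadcast_messages(n):>6} msgs / {BROADCAST_SYNCS} sync, "
            f"master={master_messages(n):>4} msgs / {MASTER_SYNCS} syncs"
        )
```

The master scheme trades fewer messages for an extra barrier, so which one
wins depends on how expensive sync() is relative to the messages, and that
is what a benchmark would have to show.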

Praveen

On Thu, Apr 5, 2012 at 10:51 AM, Thomas Jungblut <
[email protected]> wrote:

> The sync() is much more expensive than sending n^2 messages.
> But it would be very interesting to benchmark both against each other.
> Another interesting thing would be to know how HAMA-546 [1] could be used
> to distribute the centers to multiple tasks.
>
> [1] https://issues.apache.org/jira/browse/HAMA-546
>
> On April 5, 2012 at 02:51, Praveen Sripati <[email protected]> wrote:
>
> >
> >
> > http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
> >
> > > Now we are going to broadcast each of these computed averages to the
> > > other tasks. Then we are going to sync so all messages can be delivered.
> >
> > Instead of sending the computed averages to all the tasks, we could send
> > them to a master task, let the master task do all the computations, and
> > have it send the results back to all the tasks. This reduces the number of
> > messages from n*n to 2*n and the computation on the non-master tasks.
> >
> > Also, can we have a dedicated master task, or make one of the BSP tasks
> > the master task?
> >
> > Thanks,
> > Praveen
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>
