This is really not helpful, since the centers must be kept in RAM on each
task anyway.
So the delta updates of each task would have to be sent to a master,
combined there, and sent back.
And what are the other machines doing while the master task calculates?
This is simply a waste of resources, and I very much doubt that it will be
faster.
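
To make the trade-off concrete, here is a rough Python cost model of one
k-means iteration under both schemes. The function names and the cost
constants (sync_cost, msg_cost) are made up for illustration and are not
measured Hama numbers. The only structural assumption is that the master
variant needs two barrier syncs per iteration (gather, then send back),
while the broadcast variant needs one. It counts one message per
sender/receiver pair and ignores the time the other tasks sit idle.

```python
def broadcast_cost(n_tasks, sync_cost, msg_cost):
    """All-to-all: every task sends its update to every task, then one sync."""
    messages = n_tasks * n_tasks
    syncs = 1
    return messages * msg_cost + syncs * sync_cost


def master_cost(n_tasks, sync_cost, msg_cost):
    """Master: n updates in, sync, n results out, sync again."""
    messages = 2 * n_tasks
    syncs = 2
    return messages * msg_cost + syncs * sync_cost


# Hypothetical constants: a sync costs as much as 1000 messages.
for n in (10, 100):
    print(n, broadcast_cost(n, 1000, 1), master_cost(n, 1000, 1))
```

Which variant comes out ahead in this toy model depends entirely on the
assumed ratio of sync cost to message cost and on the number of tasks,
which is why only a real benchmark can settle it.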

And FYI, n^2 is quadratic growth, not exponential.
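
A quick way to see the difference, as a small Python sketch (growth_table
is just a helper name chosen here): it lists 2*n, n*n and 2**n side by
side for a few task counts.

```python
def growth_table(task_counts):
    """Return (n, 2*n, n*n, 2**n) for each task count n."""
    rows = []
    for n in task_counts:
        rows.append((n, 2 * n, n * n, 2 ** n))
    return rows


# Columns: tasks, master scheme (2n), broadcast (n^2), exponential (2^n)
for n, linear, quadratic, exponential in growth_table([2, 4, 8, 16, 32]):
    print(f"{n:>3} {linear:>4} {quadratic:>5} {exponential:>11}")
```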

On 5 April 2012 16:45, Praveen Sripati <[email protected]> wrote:

> As the # of tasks increases, the # of messages increases exponentially. So,
> it might be useful to consider/benchmark sending the messages to the
> master, let it compute and send the data back to all the tasks.
>
> Praveen
>
> On Thu, Apr 5, 2012 at 10:51 AM, Thomas Jungblut <
> [email protected]> wrote:
>
> > The sync() is much more expensive than sending n^2 messages.
> > But it would be very interesting to benchmark both against each other.
> > Another interesting thing would be to know how HAMA-546 [1] could be used
> > to distribute the centers to multiple tasks.
> >
> > [1] https://issues.apache.org/jira/browse/HAMA-546
> >
> > On 5 April 2012 02:51, Praveen Sripati <[email protected]> wrote:
> >
> > >
> > > http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
> > >
> > > > Now we are going to broadcast each of this computed averages to the
> > > > other tasks. Then we are going to sync so all messages can be delivered.
> > >
> > > Instead of sending the computed averages to all the tasks, we could send
> > > them to a master task and let the master task do all the computations and
> > > send it back to all the nodes. This way we are decreasing the number of
> > > messages from n*n to 2*n and less computation on the non-master tasks.
> > >
> > > Also, can we have a dedicated master task or make one of the bsp task as a
> > > master task?
> > >
> > > Thanks,
> > > Praveen
> > >
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <[email protected]>
> >
>



-- 
Thomas Jungblut
Berlin <[email protected]>
