This is really not helpful, since the centers must be held in RAM on each task. So the delta updates of each task would have to be sent to a master, computed there, and sent back. And what are the machines doing while the master task calculates? This is simply a waste of resources, and I very much doubt that it will be faster.
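To put rough numbers on it, here is a minimal sketch that only counts messages per iteration, using the n*n and 2*n figures from this thread. The function names are made up for illustration and nothing here touches the Hama API. The 2^n column is there only to show what exponential growth would look like next to n*n. As far as I can see, the master variant also needs a second sync per iteration (one to deliver to the master, one to deliver the result back), which the counts alone do not show.

```python
# Back-of-the-envelope message counts per k-means iteration with n BSP tasks.
# Figures n*n and 2*n are taken from the thread; all names are hypothetical.

def all_to_all_messages(n):
    """Every task sends its averages to every task: n*n messages, one sync."""
    return n * n

def via_master_messages(n):
    """n messages to the master and n back: 2*n messages, assumed two syncs."""
    return 2 * n

def exponential_messages(n):
    """Exponential growth (2^n), shown only for comparison with n*n."""
    return 2 ** n

if __name__ == "__main__":
    print("tasks  n*n  2*n  2^n")
    for n in (2, 4, 8, 16, 32):
        print(n, all_to_all_messages(n), via_master_messages(n),
              exponential_messages(n))
```

Whether the saved messages outweigh the extra sync and the idle tasks is exactly what a benchmark would have to show.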
And n^2 is not exponential growth, FYI.

On 5 April 2012 16:45, Praveen Sripati <[email protected]> wrote:

> As the # of tasks increases, the # of messages increases exponentially. So,
> it might be useful to consider/benchmark sending the messages to the
> master, let it compute and send the data back to all the tasks.
>
> Praveen
>
> On Thu, Apr 5, 2012 at 10:51 AM, Thomas Jungblut <[email protected]> wrote:
>
> > The sync() is much more expensive than sending n^2 messages.
> > But it would be very interesting to benchmark both against each other.
> > Another interesting thing would be to know how HAMA-546 [1] could be used
> > to distribute the centers to multiple tasks.
> >
> > [1] https://issues.apache.org/jira/browse/HAMA-546
> >
> > On 5 April 2012 02:51, Praveen Sripati <[email protected]> wrote:
> >
> > > http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
> > >
> > > > Now we are going to broadcast each of this computed averages to the
> > > > other tasks. Then we are going to sync so all messages can be delivered.
> > >
> > > Instead of sending the computed averages to all the tasks, we could send
> > > them to a master task and let the master task do all the computations and
> > > send it back to all the nodes. This way we are decreasing the number of
> > > messages from n*n to 2*n and less computation on the non-master tasks.
> > >
> > > Also, can we have a dedicated master task or make one of the bsp task as a
> > > master task?
> > >
> > > Thanks,
> > > Praveen
> >
> > --
> > Thomas Jungblut
> > Berlin <[email protected]>

--
Thomas Jungblut
Berlin <[email protected]>
