As the number of tasks increases, the number of messages grows quadratically (n^2 for n tasks). So it might be useful to benchmark the alternative: send the messages to the master, let it compute the new centers, and have it send the result back to all the tasks.
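To frame what such a benchmark would compare, here is a rough back-of-the-envelope model in Python. It is only a sketch, not Hama code: the function names are made up for illustration, it assumes one message per sender/receiver pair per iteration, and it assumes the master variant needs two barrier syncs (one after the gather, one after the send-back) where the broadcast needs one.

```python
# Hypothetical cost model for one k-means iteration in BSP.
# Assumption: one message per (sender, receiver) pair; real message
# counts also depend on how many centers each task reports.

def broadcast_cost(n_tasks):
    """Every task sends its averages to every task (itself included)."""
    return {"messages": n_tasks * n_tasks, "syncs": 1}

def master_cost(n_tasks):
    """Every task sends to a master; the master sends the result back."""
    return {"messages": 2 * n_tasks, "syncs": 2}

if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, broadcast_cost(n), master_cost(n))
```

The model only counts messages and syncs. Which variant is faster depends on the relative cost of a sync versus a message, and that is what the benchmark would have to measure.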
Praveen

On Thu, Apr 5, 2012 at 10:51 AM, Thomas Jungblut <[email protected]> wrote:

> The sync() is much more expensive than sending n^2 messages.
> But it would be very interesting to benchmark both against each other.
> Another interesting thing would be to know how HAMA-546 [1] could be used
> to distribute the centers to multiple tasks.
>
> [1] https://issues.apache.org/jira/browse/HAMA-546
>
> On 5 April 2012 02:51, Praveen Sripati <[email protected]> wrote:
>
> > http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
> >
> > > Now we are going to broadcast each of this computed averages to the other
> > > tasks. Then we are going to sync so all messages can be delivered.
> >
> > Instead of sending the computed averages to all the tasks, we could send
> > them to a master task and let the master task do all the computations and
> > send it back to all the nodes. This way we are decreasing the number of
> > messages from n*n to 2*n and less computation on the non-master tasks.
> >
> > Also, can we have a dedicated master task or make one of the bsp task as a
> > master task?
> >
> > Thanks,
> > Praveen
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
