A sync() is much more expensive than sending n^2 messages, and going through a master task needs a second sync() in every iteration to send the results back. Still, it would be very interesting to benchmark the two approaches against each other. It would also be interesting to know how HAMA-546 [1] could be used to distribute the centers to multiple tasks.
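
To make the trade-off concrete, here is a rough sketch of the per-iteration cost of both variants. It is not Hama code: the function names, the linear cost model, and the msg_cost and sync_cost values are assumptions for illustration only, and the message counts follow the n*n and 2*n figures from the quoted mail. Real numbers would have to come from a benchmark.

```python
def all_to_all(num_tasks):
    """Every task sends its averages to every task, then one sync()."""
    messages = num_tasks * num_tasks  # the n*n figure from the quoted mail
    syncs = 1                         # one barrier delivers everything
    return messages, syncs


def via_master(num_tasks):
    """Tasks send to a master, sync(), master sends results back, sync()."""
    messages = 2 * num_tasks  # the 2*n figure from the quoted mail
    syncs = 2                 # one barrier per direction
    return messages, syncs


def iteration_cost(messages, syncs, msg_cost, sync_cost):
    """Hypothetical linear cost model: not measured, just for comparison."""
    return messages * msg_cost + syncs * sync_cost


if __name__ == "__main__":
    n = 10
    # Assumed costs: one sync() is taken to cost as much as 1000 messages.
    for name, variant in (("all-to-all", all_to_all), ("master", via_master)):
        messages, syncs = variant(n)
        cost = iteration_cost(messages, syncs, msg_cost=1, sync_cost=1000)
        print(name, messages, syncs, cost)
```

Under this model the master variant only wins once the saved messages outweigh the extra barrier, that is when sync_cost < msg_cost * (n*n - 2*n). With the assumed costs above and n = 10, all-to-all comes out cheaper.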

[1] https://issues.apache.org/jira/browse/HAMA-546

On 5 April 2012 02:51, Praveen Sripati <[email protected]> wrote:

> http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
>
> > Now we are going to broadcast each of this computed averages to the other
> > tasks. Then we are going to sync so all messages can be delivered.
>
> Instead of sending the computed averages to all the tasks, we could send
> them to a master task and let the master task do all the computations and
> send it back to all the nodes. This way we are decreasing the number of
> messages from n*n to 2*n and less computation on the non-master tasks.
>
> Also, can we have a dedicated master task or make one of the bsp task as a
> master task?
>
> Thanks,
> Praveen

--
Thomas Jungblut
Berlin <[email protected]>
