Thanks for your tips. I'll forward this to our dev list for discussion.

2011/9/24 changguanghui <[email protected]>
> I think it is important to find some algorithm or problem that is better
> suited to HAMA. Then people can compare the results of HAMA and
> MapReduce, because more people want to know why and when they should
> choose HAMA.
>
> -----Original Message-----
> From: Thomas Jungblut [mailto:[email protected]]
> Sent: September 23, 2011 19:39
> To: [email protected]
> Subject: Re: compared with MapReduce ,what is the advantage of HAMA?
>
> Hi,
>
> to clearly state the advantage: you have less overhead.
> Let me illustrate this with an algorithm for mindist search, which I
> renamed to graph exploration. The same applies to shortest paths, too.
> I wrote about it here:
>
> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html
>
> Basically, the algorithm groups the components of the graph and assigns the
> lowest key of the group as an identifier for the component.
> Graph problems are usually solved in MapReduce with a technique
> called "message passing".
> You send messages to other vertices in every map step. Then
> you have to shuffle, sort and reduce the vertices to compute the result.
> This can't be done in a single iteration, so you have to chain several
> map/reduce jobs.
>
> Each iteration incurs the overhead of sorting and shuffling.
> Additionally, all of this happens on disk.
>
> Hama provides a message passing interface, so you don't have to take care
> of writing each message to HDFS.
> Each iteration, which in MapReduce is a full job execution, is called a
> superstep in BSP.
> Each superstep is faster than a full job execution in Hadoop, because you
> don't have the overhead of spilling to disk, job setup, sorting and
> shuffling.
> In addition, you can keep your whole graph in RAM, which speeds up the
> computation further. Hadoop does not offer this capability yet.
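The superstep idea above can be sketched in a few lines. This is a minimal, single-process illustration in plain Python, not Hama's API: the function name `label_components`, the edge-list input and the in-memory inbox/outbox dictionaries are all assumptions made for the example. Each pass of the loop plays the role of one superstep, and swapping the outbox for the inbox stands in for the barrier synchronization.

```python
from collections import defaultdict


def label_components(edges):
    """Label every vertex with the lowest vertex id in its component.

    Hypothetical helper for illustration only; it simulates BSP-style
    supersteps in one process and does not use Hama or Hadoop.
    """
    # The whole graph stays in memory across all supersteps.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Every vertex starts with its own id as the component label.
    label = {v: v for v in neighbours}

    # First superstep: each vertex sends its label to all neighbours.
    inbox = defaultdict(list)
    for v, adjacent in neighbours.items():
        for n in adjacent:
            inbox[n].append(label[v])

    supersteps = 1
    # Keep going until a superstep produces no messages at all.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                # A lower label arrived: adopt it and tell the neighbours.
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        # Stand-in for the barrier: messages become visible next superstep.
        inbox = outbox
        supersteps += 1

    return label, supersteps


if __name__ == "__main__":
    labels, steps = label_components([(3, 2), (2, 1), (5, 4)])
    print(labels, steps)
```

In a chained MapReduce version, each pass of that loop would be a separate job with its own setup, shuffle, sort and disk writes; here the messages simply stay in memory between passes.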
>
> But I want to point out some facts that are not so positive:
> Currently no benchmarks against Hadoop or other frameworks like Giraph or
> GoldenORB exist, so we can't say: we are the best/fastest/coolest.
> And graph algorithms are hard to code. As you can see, I have written
> lots of code to get this running. That is because I have to take care of
> the partitioning, vertex messaging and IO stuff myself.
> For that purpose we are going to release a Pregel API, which makes the
> development of graph algorithms a lot easier.
> You can get a sneak peek here:
> https://issues.apache.org/jira/browse/HAMA-409
>
> That was a lot of text, but I hope it clarifies a lot.
>
> Best Regards,
> Thomas
>
> 2011/9/23 changguanghui <[email protected]>
>
> > Hi Thomas,
> >
> > Could you provide a concrete example to illustrate the advantage of
> > HAMA over MapReduce?
> >
> > For example, SSSP on HAMA vs. SSSP on MapReduce, so I can grasp the
> > idea of HAMA quickly.
> >
> > Thank you very much!
> >
> > Changguanghui
> >
>

--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: [email protected]
private: [email protected]
