Re: Comparing BSP and MR

Thomas Jungblut Thu, 08 Dec 2011 23:10:33 -0800

Hi Praveen,

I have seen the question on Stack Overflow, but I thought you were going to
ask this on the mailing lists as well.
For Giraph you are quite correct, all the stuff is submitted as a MR job.
But a full map stage is not a superstep; the whole computation is done in
one map phase.
In Hama instead, we have our own infrastructure (so no MR Job Submission),
but we share Hadoop HDFS.
We (Hama) are a BSP framework which aims to provide a higher level of
abstraction than Giraph, which is focused solely on graph computing.


> Where are the incoming, outgoing messages and state stored?


In memory.

> Map Phase in MR is similar to Computation Phase in BSP. BSP allows for
> process to exchange data in the communication phase, but there is no
> communication between the mappers in the Map Phase. Though the data flows
> from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
> significant differences?


When I joined Hama, I started my blog (thx Edward) and wrote a bit about
the comparison to MapReduce:

> If you translate MapReduce into BSP, then your map-phase will be the local
> computation-phase. After that you are going to merge the map output and
> sort it. That would be the communication phase. Now comes the Barrier
> Synchronisation: You know that no reducer can run if not all map task
> completed. So this step is a bit merged with the communication phase, but
> after that it is entering a new local computation phase: the reduce-phase.
> So you see, BSP is not the same like MapReduce, but you can actually
> describe MapReduce with BSP.


http://codingwiththomas.blogspot.com/2011/04/back-to-blogsphere.html

As you can see, you could write a MapReduce Engine with BSP on top of
Apache Hama.
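
To make the mapping concrete, here is a rough Python sketch of word count
written as two BSP supersteps. It is a single-process simulation and not
Hama's API: run_wordcount_bsp, partition and the per-peer inboxes are names
made up for illustration, and the barrier is only implied by the sequential
control flow.

```python
import zlib
from collections import defaultdict


def partition(key, num_peers):
    """Hypothetical partitioner: pick the peer that owns a key."""
    return zlib.crc32(key.encode("utf-8")) % num_peers


def run_wordcount_bsp(splits):
    """Simulate word count as two BSP supersteps, one peer per input split."""
    num_peers = len(splits)
    # One in-memory inbox per peer (messages are kept in memory only).
    inboxes = [[] for _ in range(num_peers)]

    # Superstep 1, local computation: every peer "maps" its own split.
    outgoing = []
    for split in splits:
        for line in split:
            for word in line.split():
                outgoing.append((word, 1))

    # Superstep 1, communication: send each pair to the peer owning its key.
    for word, count in outgoing:
        inboxes[partition(word, num_peers)].append((word, count))

    # Barrier synchronisation: in this sequential sketch, reaching this line
    # means every peer has finished sending, so the next superstep may start.

    # Superstep 2, local computation: every peer sorts and "reduces" its inbox.
    result = {}
    for inbox in inboxes:
        totals = defaultdict(int)
        for word, count in sorted(inbox):
            totals[word] += count
        result.update(totals)
    return result


if __name__ == "__main__":
    print(run_wordcount_bsp([["a b a"], ["b c"]]))
```

Each key is routed to exactly one peer, so the per-peer totals can be merged
without conflicts, just as each reducer owns its own partition of keys.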

Hope this helps, cu@StackOverflow.

2011/12/9 Praveen Sripati <[email protected]>

> Hi,
>
> I know about MapReduce/Hadoop and trying to get myself around
> BSP/Hama-Giraph by comparing MR and BSP.
>
> - Map Phase in MR is similar to Computation Phase in BSP. BSP allows for
> process to exchange data in the communication phase, but there is no
> communication between the mappers in the Map Phase. Though the data flows
> from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
> significant differences?
>
> - After going through the documentation for Hama and Giraph, noticed that
> they both use Hadoop as the underlying framework. In both Hama and Giraph
> an MR Job is submitted. Does each superstep in BSP correspond to a Job in
> MR? Where are the incoming, outgoing messages and state stored - HDFS or
> HBase or Local or pluggable?
>
> - If a Vertex is deactivated and again activated after receiving a message,
> does is run on the same node or a different node in the cluster?
>
> Regards,
> Praveen
>



-- 
Thomas Jungblut
Berlin <[email protected]>
