Re: Comparing BSP and MR

Thomas Jungblut Fri, 09 Dec 2011 02:02:27 -0800

>
> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message sent (may be) to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks are invoked for the vertices which
> have received a message.
>


Please consult the Giraph folks about this; we (Hama) don't provide this
functionality.

> What happens if a particular node is lost in case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later.
>

A checkpointer runs after each superstep and materializes the messages to
HDFS.
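
To illustrate the idea, here is a toy sketch of a superstep loop that writes a
checkpoint at every barrier. It is not Hama's or Giraph's actual code: a local
directory stands in for HDFS, the example algorithm (propagating the maximum
value) is arbitrary, and every name in it (run_supersteps, restore, the
superstep-N.json file layout) is made up for this example.

```python
import json
from collections import defaultdict
from pathlib import Path


def run_supersteps(values, edges, max_supersteps, checkpoint_dir):
    """Propagate the maximum vertex value, checkpointing after each superstep.

    values: dict of vertex id (str) -> int, updated in place
    edges:  dict of vertex id (str) -> list of neighbour ids
    Returns the index of the last superstep that was executed.
    """
    inbox = {vertex: [] for vertex in values}
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)

        # Computation phase: each vertex only sees its own value and inbox.
        for vertex in values:
            new_value = max([values[vertex]] + inbox[vertex])
            changed = new_value != values[vertex]
            values[vertex] = new_value
            # Communication phase: queue messages for the next superstep.
            if superstep == 0 or changed:
                for neighbour in edges[vertex]:
                    outbox[neighbour].append(new_value)

        # Barrier reached: materialize state and pending messages
        # (assumption: a local JSON file stands in for a file on HDFS).
        checkpoint = {"superstep": superstep, "values": values, "messages": outbox}
        path = Path(checkpoint_dir) / f"superstep-{superstep}.json"
        path.write_text(json.dumps(checkpoint))

        if not outbox:  # no messages in flight, so every vertex has halted
            return superstep
        inbox = {vertex: outbox.get(vertex, []) for vertex in values}
    return max_supersteps - 1


def restore(checkpoint_dir):
    """Load the most recent checkpoint, e.g. after a worker was lost."""
    files = Path(checkpoint_dir).glob("superstep-*.json")
    latest = max(files, key=lambda p: int(p.stem.split("-")[1]))
    return json.loads(latest.read_text())
```

After a failure, a recovering worker would call restore() and resume from the
superstep recorded there, replaying the saved messages as its inbox, instead
of restarting the whole job.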

> It's being done the other way: BSP is implemented in Giraph using Hadoop.
>

Yeah, because Google published the MapReduce paper years before the Pregel
paper.
I wonder how things would have turned out had it been the other way around.

2011/12/9 Praveen Sripati <[email protected]>

> Thanks to Thomas and Avery for the response.
>
> > For Giraph you are quite correct, all the stuff is submitted as a MR job.
> But a full map stage is not a superstep; the whole computation is done in
> one mapping phase.
>
> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message sent (may be) to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks are invoked for the vertices which
> have received a message.
>
> In a regular MR Job (not Giraph) the number of Map tasks equals the
> number of InputSplits. But, in case of Giraph the total number of maps to
> be launched is usually more than the number of input vertices.
>
> Please let me know if I am correct.
>
> > Where are the incoming, outgoing messages and state stored
> > Memory
>
> What happens if a particular node is lost in case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later.
>
> > In Giraph, vertices can move around workers between supersteps.  A vertex
> will run on the worker that it is assigned to.
>
> Is data locality considered while moving vertices around workers in Giraph?
>
> > As you can see, you could write a MapReduce Engine with BSP on top of
> Apache Hama.
>
> It's being done the other way: BSP is implemented in Giraph using Hadoop.
>
> Praveen
>
> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <[email protected]> wrote:
>
> >  Hi Praveen,
> >
> > Answers inline.  Hope that helps!
> >
> > Avery
> >
> > On 12/8/11 10:16 PM, Praveen Sripati wrote:
> >
> > Hi,
> >
> > I know MapReduce/Hadoop and am trying to get my head around
> > BSP/Hama-Giraph by comparing MR and BSP.
> >
> > - Map Phase in MR is similar to Computation Phase in BSP. BSP allows
> > processes to exchange data in the communication phase, but there is no
> > communication between the mappers in the Map Phase, though data does flow
> > from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
> > significant differences?
> >
> > I suppose you can think of it that way.  I like to compare a BSP superstep
> > to a MapReduce job since it's computation and communication.
> >
> > - After going through the documentation for Hama and Giraph, noticed that
> > they both use Hadoop as the underlying framework. In both Hama and Giraph
> > an MR Job is submitted. Does each superstep in BSP correspond to a Job in
> > MR? Where are the incoming, outgoing messages and state stored - HDFS or
> > HBase or Local or pluggable?
> >
> >  My understanding of Hama is that they have their own BSP framework.
> > Giraph can be run on a Hadoop installation, it does not have its own
> > computational framework.  A Giraph job is submitted to a Hadoop
> > installation as a Map-only job.  Hama will have its own BSP launching
> > framework.
> >
> > In Giraph, the state is stored all in memory.  Graphs are loaded/stored
> > through VertexInputFormat/VertexOutputFormat (very similar to Hadoop).  You
> > could implement your own VertexInputFormat/VertexOutputFormat to use HDFS,
> > HBase, etc. as your graph stable storage.
> >
> > - If a Vertex is deactivated and again activated after receiving a
> > message, does it run on the same node or a different node in the cluster?
> >
> >  In Giraph, vertices can move around workers between supersteps.  A vertex
> > will run on the worker that it is assigned to.
> >
> > Regards,
> > Praveen
> >
> >
> >
>



-- 
Thomas Jungblut
Berlin <[email protected]>
