> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message (if any) is sent to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks is invoked for the vertices which
> have received a message.

Consult Giraph for this purpose; we don't provide this functionality.

> What happens if a particular node is lost in the case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later?

There is a checkpointer that materializes the messages to HDFS after each superstep.

> It's being done the other way round: BSP is implemented in Giraph using Hadoop.

Yeah, because Google released the MapReduce paper years before the Pregel paper.
I would have been surprised if it had turned out the other way round.

2011/12/9 Praveen Sripati <[email protected]>

> Thanks to Thomas and Avery for the response.
>
> > For Giraph you are quite correct, all the stuff is submitted as a MR job.
> > But a full map stage is not a superstep; the whole computation is done in
> > one mapping phase.
>
> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message (if any) is sent to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks is invoked for the vertices which
> have received a message.
>
> In a regular MR job (not Giraph) the number of map tasks equals the
> number of InputSplits. But in the case of Giraph, the total number of maps to
> be launched is usually more than the number of input vertices.
>
> Please let me know if I am correct.
>
> > Where are the incoming, outgoing messages and state stored
>
> Memory
>
> What happens if a particular node is lost in the case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later?
>
> > In Giraph, vertices can move around workers between supersteps. A vertex
> > will run on the worker that it is assigned to.
>
> Is data locality considered while moving vertices between workers in Giraph?
>
> > As you can see, you could write a MapReduce Engine with BSP on top of
> > Apache Hama.
>
> It's being done the other way round: BSP is implemented in Giraph using Hadoop.
>
> Praveen
>
> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <[email protected]> wrote:
>
> > Hi Praveen,
> >
> > Answers inline. Hope that helps!
> >
> > Avery
> >
> > On 12/8/11 10:16 PM, Praveen Sripati wrote:
> >
> > Hi,
> >
> > I know MapReduce/Hadoop and am trying to get my head around
> > BSP/Hama-Giraph by comparing MR and BSP.
> >
> > - The Map Phase in MR is similar to the Computation Phase in BSP. BSP allows
> > processes to exchange data in the communication phase, but there is no
> > communication between the mappers in the Map Phase, though data does flow
> > from Map tasks to Reduce tasks. Please correct me if I am wrong. Any other
> > significant differences?
> >
> > I suppose you can think of it that way. I like to compare a BSP superstep
> > to a MapReduce job, since both involve computation and communication.
> >
> > - After going through the documentation for Hama and Giraph, I noticed that
> > they both use Hadoop as the underlying framework. In both Hama and Giraph
> > an MR job is submitted. Does each superstep in BSP correspond to a job in
> > MR? Where are the incoming, outgoing messages and state stored: HDFS,
> > HBase, local, or pluggable?
> >
> > My understanding of Hama is that they have their own BSP framework.
> > Giraph can be run on a Hadoop installation; it does not have its own
> > computational framework. A Giraph job is submitted to a Hadoop
> > installation as a map-only job. Hama will have its own BSP launching
> > framework.
> >
> > In Giraph, all the state is stored in memory. Graphs are loaded/stored
> > through VertexInputFormat/VertexOutputFormat (very similar to Hadoop).
> > You could implement your own VertexInputFormat/VertexOutputFormat to use
> > HDFS, HBase, etc. as your graph stable storage.
> >
> > - If a vertex is deactivated and activated again after receiving a
> > message, does it run on the same node or a different node in the cluster?
> >
> > In Giraph, vertices can move around workers between supersteps. A vertex
> > will run on the worker that it is assigned to.
> >
> > Regards,
> > Praveen

--
Thomas Jungblut
Berlin <[email protected]>
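The superstep model discussed in this thread (a computation phase, message exchange, then a barrier, with a halted vertex reactivated when a message arrives) can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not Hama's or Giraph's API: the names `SuperstepSketch` and `run` are hypothetical, all state lives in memory (as Avery describes for Giraph), the "barrier" is simply the end of each loop iteration, and every vertex votes to halt after each computation step as a simplification. The example propagates the maximum vertex value along directed edges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-process sketch of BSP supersteps (not the Hama or Giraph API). */
public class SuperstepSketch {

    /**
     * Propagates the maximum vertex value along directed edges.
     * edges[v] lists the targets of v's outgoing edges.
     * Returns the final values once all vertices have halted and no messages are in flight.
     */
    public static int[] run(int[] initialValues, int[][] edges) {
        int n = initialValues.length;
        int[] values = initialValues.clone();     // vertex state, kept in memory
        boolean[] halted = new boolean[n];
        List<List<Integer>> inbox = newBoxes(n);  // messages readable in this superstep

        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> outbox = newBoxes(n);

            // Computation phase: active vertices, and halted vertices with mail, do their step.
            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (halted[v] && messages.isEmpty()) {
                    continue;                     // stays inactive this superstep
                }
                halted[v] = false;                // an incoming message reactivates the vertex
                int max = values[v];
                for (int m : messages) {
                    max = Math.max(max, m);
                }
                // Send on the first superstep, or whenever the value improved.
                if (superstep == 0 || max > values[v]) {
                    values[v] = max;
                    for (int target : edges[v]) {
                        outbox.get(target).add(max);  // communication phase
                    }
                }
                halted[v] = true;                 // vote to halt
            }

            // Barrier: messages sent in this superstep become visible only in the next one.
            inbox = outbox;
            boolean anyMessages = false;
            for (List<Integer> box : inbox) {
                anyMessages |= !box.isEmpty();
            }
            boolean allHalted = true;
            for (boolean h : halted) {
                allHalted &= h;
            }
            if (allHalted && !anyMessages) {
                return values;                    // Pregel-style termination
            }
        }
    }

    private static List<List<Integer>> newBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Directed ring 0 -> 1 -> 2 -> 3 -> 0 with initial values 3, 6, 2, 1.
        int[][] edges = {{1}, {2}, {3}, {0}};
        System.out.println(Arrays.toString(run(new int[]{3, 6, 2, 1}, edges)));
    }
}
```

Running `main` prints `[6, 6, 6, 6]`: the largest value travels around the ring one hop per superstep, and the run ends once no messages are in flight. Keeping the inbox and outbox separate is what makes the barrier explicit: a message sent in superstep k is only read in superstep k + 1.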

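Thomas mentions a checkpointer that materializes messages to HDFS after each superstep. The idea can be sketched with a local file standing in for HDFS: at the barrier, the next superstep number and the undelivered messages are written out, and a restarted run resumes from the last complete checkpoint instead of superstep 0. This is a hypothetical illustration, not Hama's checkpoint format or API: the `CheckpointSketch` class, its one-line-per-vertex text format and the file names are invented here, and storing the vertex values alongside the messages is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checkpointing of BSP superstep state; a local file stands in for HDFS. */
public class CheckpointSketch {

    /** State captured at a barrier: next superstep to run, vertex values, undelivered messages. */
    public static final class State {
        public final int superstep;
        public final int[] values;
        public final List<List<Integer>> inbox;

        public State(int superstep, int[] values, List<List<Integer>> inbox) {
            this.superstep = superstep;
            this.values = values;
            this.inbox = inbox;
        }
    }

    /** Writes the superstep, then one line per vertex: "value msg msg ...". */
    public static void save(State s, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(Integer.toString(s.superstep));
        for (int v = 0; v < s.values.length; v++) {
            StringBuilder line = new StringBuilder().append(s.values[v]);
            for (int m : s.inbox.get(v)) {
                line.append(' ').append(m);      // pending messages for vertex v
            }
            lines.add(line.toString());
        }
        // Write to a temp file first, then rename atomically, so a reader
        // never sees a half-written checkpoint.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reads a checkpoint back so a restarted job can resume from the recorded superstep. */
    public static State load(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        int superstep = Integer.parseInt(lines.get(0));
        int n = lines.size() - 1;
        int[] values = new int[n];
        List<List<Integer>> inbox = new ArrayList<>(n);
        for (int v = 0; v < n; v++) {
            String[] parts = lines.get(v + 1).trim().split(" ");
            values[v] = Integer.parseInt(parts[0]);
            List<Integer> messages = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) {
                messages.add(Integer.parseInt(parts[i]));
            }
            inbox.add(messages);
        }
        return new State(superstep, values, inbox);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("bsp").resolve("checkpoint.txt");
        // State at the barrier before superstep 3: vertex 0 has one message waiting.
        State before = new State(3, new int[]{3, 6, 6, 6},
                List.of(List.of(6), List.of(), List.of(), List.of()));
        save(before, file);
        State after = load(file);   // e.g. after a worker is lost and the job restarts
        System.out.println("resume at superstep " + after.superstep
                + ", inbox of vertex 0 = " + after.inbox.get(0));
    }
}
```

Running `main` prints `resume at superstep 3, inbox of vertex 0 = [6]`. Persisting the undelivered messages is the important part: without them, a restarted superstep would lose the messages that reactivate halted vertices.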