Re: When do Giraph vertices receive their messages?
Hi Vincentius Martin,

Since Giraph is based on Pregel, I would refer you to the paper *Pregel: A System for Large-Scale Graph Processing* for more details.

Briefly, in each superstep:

1. A worker (which is responsible for a partition of vertices) receives messages from the other workers. It then groups these messages by destination vertex ID and activates the vertices that have incoming messages.
2. The worker runs the *compute* function of each active vertex. While it runs, the *compute* function may generate messages to other vertices. These messages are buffered, combined, and sent in batches asynchronously.
3. After a worker has finished the *compute* function of all its active vertices, it waits for all other workers to finish their *compute* functions. In addition, it waits for all sending tasks to finish, to ensure that all messages can be received in the next superstep.

Then every worker proceeds to the next superstep.

As for your second question, messages are stored in a buffer.

On Mon, Nov 10, 2014 at 6:14 PM, Puneet Agarwal puagar...@yahoo.com wrote:

These are some very interesting questions. I would also like to know the answers to these.

- Puneet
IIT Delhi, India

On Monday, November 10, 2014 9:30 AM, Vincentius Martin vincentiusmar...@gmail.com wrote:

I am curious about how Giraph vertices receive messages before processing them. I know that they use the messages they have received in the compute() method of the next superstep, but when do they receive them? If it is before the checkpoint process, is there any part of the documentation or code that I can look at to understand it?

Also, what mechanism does Giraph use to store messages before superstep S+1? Are they stored in a buffer or on disk first? I still cannot find anything about this.

Regards,
Vincentius Martin

--
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group
School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney
Phone: (+61) 413 857 288
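The three steps in the reply above can be illustrated with a small, single-process sketch. This is not Giraph code: the class `SuperstepSketch`, its methods, and the 3-vertex ring are hypothetical names invented for illustration, and the model assumes only what the reply states, namely that messages sent during superstep S are buffered and become visible to compute in superstep S+1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy, single-process model of superstep message delivery; not Giraph code. */
final class SuperstepSketch {
    // Messages readable during the current superstep, keyed by destination vertex ID.
    private Map<Integer, List<Integer>> current = new HashMap<>();
    // Messages sent during the current superstep; readable only after the barrier.
    private Map<Integer, List<Integer>> next = new HashMap<>();

    /** Step 2: a message sent in superstep S is only buffered for superstep S+1. */
    void send(int destId, int value) {
        next.computeIfAbsent(destId, k -> new ArrayList<>()).add(value);
    }

    /** Step 1: the messages a vertex sees when its compute step starts. */
    List<Integer> incoming(int vertexId) {
        return current.getOrDefault(vertexId, new ArrayList<>());
    }

    /** Step 3: the barrier. Everything buffered becomes the next superstep's input. */
    void barrier() {
        current = next;
        next = new HashMap<>();
    }

    /**
     * Runs a ring of vertices 0 -> 1 -> 2 -> 0. Each vertex sends its own ID plus
     * the sum of what it received. Returns what vertex 0 saw in each superstep.
     */
    static List<List<Integer>> runRing(int supersteps) {
        SuperstepSketch store = new SuperstepSketch();
        List<List<Integer>> seenByVertex0 = new ArrayList<>();
        for (int s = 0; s < supersteps; s++) {
            for (int v = 0; v < 3; v++) {
                List<Integer> msgs = store.incoming(v);
                if (v == 0) {
                    seenByVertex0.add(new ArrayList<>(msgs));
                }
                int sum = v;
                for (int m : msgs) {
                    sum += m;
                }
                store.send((v + 1) % 3, sum);
            }
            store.barrier();
        }
        return seenByVertex0;
    }

    public static void main(String[] args) {
        System.out.println(runRing(3)); // prints [[], [2], [3]]
    }
}
```

Note that within a superstep vertex 1 never sees the message vertex 0 has just sent, even though vertex 0 ran first; it only sees what was buffered before the previous barrier.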
Re: When do Giraph vertices receive their messages?
Hi Vincentius,

I'd recommend checking out the code in the call() method of this class
https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/graph/ComputeCallable.java
to follow the logic that runs during computation in a superstep, as well as the code in
https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
for message sending, and the execute method in GraphTaskManager
https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java
which handles the overall control flow.

I've found that with Giraph, at some point you more or less need to dig through the code to figure out what's going on behind the scenes. Looking at the call() and computePartition() methods in ComputeCallable is pretty enlightening.

As far as messaging goes, it appears that everything is flushed from the sender before the end of the superstep.

Someone else please correct me if I'm wrong about any of these things; I don't want to mislead anyone.

Best,
Matthew Saltz

On Mon, Nov 10, 2014 at 12:18 PM, Vincentius Martin vincentiusmar...@gmail.com wrote:

Hi XingFeng,

Thanks for your answer! Yes, I have already read the Pregel paper; unfortunately, there are some specific steps that I still couldn't grasp.

So, when does the checkpoint happen? Is it before or after step 1 (the message-receiving phase) in your explanation?

Also, from your explanation I deduce that at the beginning of each superstep the messages are still in the sender workers' buffers, and that each sender worker sends them during this phase. Am I right?
Regards,
Vincentius Martin
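The behaviour discussed in this thread, where each sender flushes all of its outgoing messages and then waits for every other worker before the superstep ends, can be sketched with plain Java concurrency utilities. This is only a toy model under stated assumptions: `FlushBarrierSketch` and `runOneSuperstep` are hypothetical names, a thread pool stands in for the asynchronous network layer, and nothing here reflects how Giraph's Netty-based classes are actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model of "flush all sends, then wait at the barrier"; not Giraph code. */
final class FlushBarrierSketch {

    /** Returns how many messages each worker finds in its inbox after the barrier. */
    static int[] runOneSuperstep(int workers, int messagesPerPeer) {
        List<ConcurrentLinkedQueue<Integer>> inboxes = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        // Hypothetical stand-in for the asynchronous sending layer.
        ExecutorService network = Executors.newFixedThreadPool(4);
        CyclicBarrier barrier = new CyclicBarrier(workers);
        int[] received = new int[workers];
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread t = new Thread(() -> {
                try {
                    // "compute" phase: queue asynchronous sends to every other worker.
                    List<CompletableFuture<Void>> pending = new ArrayList<>();
                    for (int dest = 0; dest < workers; dest++) {
                        if (dest == self) {
                            continue;
                        }
                        final int d = dest;
                        for (int i = 0; i < messagesPerPeer; i++) {
                            pending.add(CompletableFuture.runAsync(
                                    () -> inboxes.get(d).add(self), network));
                        }
                    }
                    // Flush: block until every send from this worker has completed.
                    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
                    // Barrier: wait until all other workers have flushed as well.
                    barrier.await();
                    // Only now is it safe to read the complete inbox.
                    received[self] = inboxes.get(self).size();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        network.shutdown();
        return received;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runOneSuperstep(3, 5))); // prints [10, 10, 10]
    }
}
```

Because every worker joins its own pending sends before reaching the barrier, passing the barrier implies that all messages from all workers have already been delivered, so each inbox count is exactly (workers - 1) * messagesPerPeer.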
Re: [VOTE] Apache Giraph 1.1.0 RC1
Yes, I did re-run the build this weekend, and it built successfully for both the default profile and the hadoop_2 one. I ran a couple of examples on the cluster, and they ran successfully. I'm +1.

On Tue, Nov 4, 2014 at 8:10 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:

On Tue, Nov 4, 2014 at 5:47 AM, Claudio Martella claudio.marte...@gmail.com wrote:

I am indeed having some problems. mvn install will fail because the test is opening too many files:

[snip]

I have to investigate why this happens. I'm not using a different ulimit than what I have on my Mac OS X by default. Where are you building yours?

This is really weird. I have no issues whatsoever on Mac OS X with the following setup:

    $ uname -a
    Darwin usxxshaporm1.corp.emc.com 12.4.1 Darwin Kernel Version 12.4.1: Tue May 21 17:04:50 PDT 2013; root:xnu-2050.40.51~1/RELEASE_X86_64 x86_64

    $ ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    file size (blocks, -f) unlimited
    max locked memory (kbytes, -l) unlimited
    max memory size (kbytes, -m) unlimited
    open files (-n) 2560
    pipe size (512 bytes, -p) 1
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 709
    virtual memory (kbytes, -v) unlimited

    $ mvn --version
    Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 2014-08-11T13:58:10-07:00)
    Maven home: /Users/shapor/dist/apache-maven-3.2.3
    Java version: 1.7.0_51, vendor: Oracle Corporation
    Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: mac os x, version: 10.8.4, arch: x86_64, family: mac

Thanks,
Roman.

--
Claudio Martella
Enabling Giraph Level Logging - Hadoop-2.2.0
Hi,

I am running Apache Giraph 1.1.0 on Hadoop 2.2.0 as a MapReduce application, but I could not find the Giraph logs. It would be great if someone could tell me how to enable Apache Giraph logging.

Also, I see that Giraph collects very detailed runtime statistics. How can I collect those stats?

Thanks,
Charith

--
Charith Dhanushka Wickramaarachchi
Tel +1 213 447 4253
Web http://apache.org/~charith http://www-scf.usc.edu/~cwickram/ http://charith.wickramaarachchi.org/
Blog http://charith.wickramaarachchi.org/ http://charithwiki.blogspot.com/
Twitter @charithwiki https://twitter.com/charithwiki