Re: Error when using ArrayListWritable<Text> as message

2013-10-17 Thread Simon McGloin
Thanks for the advice, Manuel. I created a TextArrayListMessage class to
use as the message type between supersteps. I had to add a line that clears
the ArrayList in readFields; otherwise, during the next superstep the list
contained unwanted Text objects. It seems the TextArrayListMessage instance
gets reused, and readFields just keeps appending to textArrayList, e.g.:

@Override
public void readFields(DataInput in) throws IOException {
  int numFields = in.readInt();
  // The message object is reused between reads, so drop stale entries first;
  // without this the list keeps growing and gives unexpected results
  textArrayList.clear();
  for (int i = 0; i < numFields; i++) {
    Text t = new Text(WritableUtils.readCompressedByteArray(in));
    textArrayList.add(t);
  }
}
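The reuse effect described above can be reproduced without Giraph or Hadoop. The sketch below is a stdlib-only stand-in, not Simon's actual class: the hypothetical `StringListMessage` uses `java.io.DataOutput`/`DataInput` and plain `String`s instead of `Text` and `WritableUtils`, but keeps the same `write`/`readFields` shape as a Hadoop `Writable`. Reading two payloads into one reused instance shows why `readFields` has to clear the list first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TextArrayListMessage: the same write/readFields
// shape as org.apache.hadoop.io.Writable, with String in place of Text.
class StringListMessage {
  final List<String> values = new ArrayList<>();
  private final boolean clearOnRead;

  StringListMessage(boolean clearOnRead) {
    this.clearOnRead = clearOnRead;
  }

  void write(DataOutput out) throws IOException {
    out.writeInt(values.size());          // length prefix, read back by readInt()
    for (String v : values) {
      out.writeUTF(v);
    }
  }

  void readFields(DataInput in) throws IOException {
    int numFields = in.readInt();
    if (clearOnRead) {
      values.clear();                     // discard entries from the previous read
    }
    for (int i = 0; i < numFields; i++) {
      values.add(in.readUTF());
    }
  }

  // Serializes the given strings into the wire format above.
  static byte[] serialize(String... items) {
    StringListMessage m = new StringListMessage(true);
    for (String s : items) {
      m.values.add(s);
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      m.write(new DataOutputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads every payload into the SAME instance, mimicking object reuse,
  // and returns what the instance holds afterwards.
  static List<String> readAllInto(StringListMessage reused, byte[]... payloads) {
    try {
      for (byte[] p : payloads) {
        reused.readFields(new DataInputStream(new ByteArrayInputStream(p)));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return reused.values;
  }
}
```

Reading `serialize("v1", "v2")` and then `serialize("v3")` into one instance leaves only `[v3]` when the list is cleared, but `[v1, v2, v3]` when it is not, which matches the stale entries Simon saw in the next superstep.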


On Wed, Oct 16, 2013 at 7:21 PM, Manuel Lagang manuellag...@gmail.com wrote:

 I think your message value class needs to be TextArrayListMessage
 instead of ArrayListWritable<Text>. That might require you to move
 TextArrayListMessage outside of ArrayListTextBug.
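The createMessageValue frame in the stack trace shows that Giraph failed while instantiating the configured message class, presumably via reflection. Under that assumption, the stdlib-only sketch below illustrates, with hypothetical class names, why both halves of Manuel's advice can matter: ArrayListWritable is, as far as I know, abstract in Giraph, and an abstract class cannot be instantiated reflectively; and a non-static inner class (such as a TextArrayListMessage nested inside the vertex class) has no no-argument constructor, because its constructor implicitly takes the enclosing instance.

```java
import java.util.ArrayList;

// Hypothetical stand-ins for the message classes discussed in the thread.
abstract class AbstractListMessage extends ArrayList<String> { }

class TopLevelListMessage extends AbstractListMessage { }

class EnclosingComputation {
  // Non-static inner class: its only constructor takes an EnclosingComputation.
  class InnerListMessage extends AbstractListMessage { }

  // Static nested class: has a genuine no-argument constructor.
  static class StaticNestedListMessage extends AbstractListMessage { }
}

class ReflectiveFactory {
  // Tries to create an instance through a no-argument constructor and
  // reports "ok" or the simple name of the reflective exception thrown.
  static String tryInstantiate(Class<?> messageClass) {
    try {
      messageClass.getDeclaredConstructor().newInstance();
      return "ok";
    } catch (ReflectiveOperationException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

The abstract class fails with `InstantiationException` and the inner class with `NoSuchMethodException`, while the top-level and static nested classes succeed. Making the nested class `static`, or moving it to its own file as suggested above, gives it the no-argument constructor that reflective creation needs.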


 On Wed, Oct 16, 2013 at 10:01 AM, Simon McGloin simonmcgl...@gmail.com wrote:

 Hey Guys,

 I've only been using Giraph for a few days, so I'm very new to it. I'm
 currently using Giraph 1.0.0. I get the error below when I try to send an
 ArrayListWritable<Text> message. The error happens between supersteps: if
 you run the sample code I've included, Superstep 1 never gets printed
 because the job fails after Superstep 0. Is this a bug, or am I doing
 something wrong? In my full code I need to send a list of Text-based vertex
 ids between supersteps. Should I avoid org.apache.hadoop.io.Text and
 implement my own Writable object instead?

 Any help is appreciated.

 Regards,

 Simon


 Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: createMessageValue: Failed to instantiate
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
  at java.util.concurrent.FutureTask.get(FutureTask.java:91)
  at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
  at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
  ... 13 more
 Caused by: java.lang.IllegalArgumentException: createMessageValue: Failed to instantiate
  at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createMessageValue(ImmutableClassesGiraphConfiguration.java:581)
  at org.apache.giraph.utils.ByteArrayVertexIdMessages.createData(ByteArrayVertexIdMessages.java:66)
  at org.apache.giraph.utils.ByteArrayVertexIdMessages.createData(ByteArrayVertexIdMessages.java:34)
  at org.apache.giraph.utils.ByteArrayVertexIdData$VertexIdDataIterator.next(ByteArrayVertexIdData.java:205)
  at org.apache.giraph.comm.messages.ByteArrayMessagesPerVertexStore.addPartitionMessages(ByteArrayMessagesPerVertexStore.java:116)
  at org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:72)
  at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
  at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:419)
  at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:193)
  at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
  at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)




 package com.adaptivemobile.tarantula.batchlayer.giraph.run;

 import java.io.IOException;

 import org.apache.giraph.graph.Vertex;
 import org.apache.giraph.utils.ArrayListWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;

 public class ArrayListTextBug extends Vertex<Text, NullWritable,
     NullWritable, ArrayListWritable<Text>> {

   @Override
   public void compute(Iterable<ArrayListWritable<Text>> messages)
       throws IOException {
     if (getSuperstep() == 0) {
       System.out.println("\nSUPERSTEP 0 - " + getId() + "\n--");
       TextArrayListMessage initialMessage = new TextArrayListMessage();
       initialMessage.add(getId());
       this.sendMessageToAllEdges(initialMessage);
       System.out.println("Vertex " + getId()
           + " sends TextArrayListMessage to " + getNumEdges() + " edges");
     }
     if (getSuperstep() == 1) {
       System.out.println("\nSUPERSTEP 1 - " + getId() + "\n--");
     }
   }
   // (the rest of the original message was not preserved in the archive)
 }

Re: how to use out of core options

2013-10-17 Thread Jianqiang Ou
Thanks very much. So are you saying that if I use -Dgiraph.maxPartitionsInMemory
and -Dgiraph.maxMessagesInMemory to set both to smaller values, it
might work?

Thanks again,
Jian


 On Thu, Oct 17, 2013 at 12:56 AM, Jyotirmoy Sundi sundi...@gmail.com wrote:

 You need to tune it per your cluster. This is what the docs say:
 *It is difficult to decide a general policy to use out-of-core
 capabilities*, as it depends on the behavior of the algorithm and the
 input graph. The exact number of partitions and messages to keep in memory
 depends on the cluster capabilities, the number of messages produced per
 superstep, and the number of active vertices per superstep. Moreover, it
 depends on the type and size of vertex values and messages. For example,
 algorithms such as Belief Propagation tend to keep large vertex values,
 while algorithms such as clique computations tend to send large messages.
 Hence, which feature to rely on more depends on your algorithm.

 Thanks
 Sundi


 On Wed, Oct 16, 2013 at 9:41 PM, Jianqiang Ou oujianqiang...@gmail.com wrote:

 Hi Sundi,

 I just tried your method, but the job failed; the job history is
 attached. It ran fine without the out-of-core options. Do you have any
 idea why?

 The command I used to run the program is below:

 $HADOOP_HOME/bin/hadoop jar
 $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
 org.apache.giraph.GiraphRunner -Dgiraph.useOutOfCoreMessages=true
 -Dgiraph.useOutOfCoreGraph=true
 org.apache.giraph.examples.SimplePageRankComputation -vif
 org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
 -vip /user/andy/input/tiny_graph.txt -vof
 org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
 /user/andy/output/page3 -w 3 -mc
 org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute

 Many thanks,

 Jianqiang

 On Wed, Oct 16, 2013 at 12:11 PM, Jianqiang Ou oujianqiang...@gmail.com wrote:

 got it, thank you very much!


 On Wed, Oct 16, 2013 at 10:43 AM, Jyotirmoy Sundi sundi...@gmail.com wrote:

 Pass them as -Dgiraph.useOutOfCoreMessages=true
 -Dgiraph.useOutOfCoreGraph=true right after GiraphRunner,
 like:
 hadoop jar giraph.jar org.apache.giraph.GiraphRunner
 -Dgiraph.useOutOfCoreMessages=true
 -Dgiraph.useOutOfCoreGraph=true ...
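The placement rule above can be made explicit in code. The sketch below only assembles the command line as a list of strings (it launches nothing); `GiraphCommandBuilder` is a hypothetical helper, `giraph.jar` is the placeholder jar name from the example above, and the values given for giraph.maxPartitionsInMemory and giraph.maxMessagesInMemory are illustrative placeholders, not recommendations. The ordering presumably matters because GiraphRunner runs as a Hadoop Tool, whose generic -D options are parsed before the tool's own arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GiraphCommandBuilder {
  // Assembles "hadoop jar <jar> org.apache.giraph.GiraphRunner ..." as a list.
  // The -D options are emitted before the computation class, as advised above.
  static List<String> build(String jar, Map<String, String> giraphOptions,
      String computationClass, List<String> jobArgs) {
    List<String> cmd = new ArrayList<>(Arrays.asList(
        "hadoop", "jar", jar, "org.apache.giraph.GiraphRunner"));
    for (Map.Entry<String, String> e : giraphOptions.entrySet()) {
      cmd.add("-D" + e.getKey() + "=" + e.getValue());
    }
    cmd.add(computationClass);
    cmd.addAll(jobArgs);  // -vif, -vip, -w, ... follow the computation class
    return cmd;
  }

  public static void main(String[] args) {
    Map<String, String> opts = new LinkedHashMap<>();  // preserves insertion order
    opts.put("giraph.useOutOfCoreGraph", "true");
    opts.put("giraph.useOutOfCoreMessages", "true");
    opts.put("giraph.maxPartitionsInMemory", "4");      // placeholder, tune per cluster
    opts.put("giraph.maxMessagesInMemory", "100000");   // placeholder, tune per cluster
    List<String> cmd = build("giraph.jar", opts,
        "org.apache.giraph.examples.SimplePageRankComputation",
        Arrays.asList("-w", "3"));
    System.out.println(String.join(" ", cmd));
  }
}
```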




 On Wed, Oct 16, 2013 at 7:29 AM, Jianqiang Ou oujianqiang...@gmail.com wrote:

 Hi, I have a question about out-of-core Giraph. I've read that, in
 order to store partitions on disk, we need to set
 giraph.useOutOfCoreGraph=true, but where should I put this setting?

 BTW, I am just trying to use the PageRank or shortest-paths example to
 test the out-of-core performance of my cluster.

 Thanks very much,
 Jian




 --
 Best Regards,
 Jyotirmoy Sundi
 Data Engineer,
 Admobius

 San Francisco, CA 94158



Re: How to specify parameters in order to run giraph job in parallel

2013-10-17 Thread Claudio Martella
It actually depends on the setup of your cluster.

Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node
(ideally used only by Giraph), so that you have 14 workers, one per
compute node, plus one mapper for the master and ZooKeeper. Once that is in
place, you would set the number of compute threads equal to the number of
threads each node can run (24 in your case).

Does this make sense to you?
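The sizing rule above reduces to simple arithmetic: one map slot per node (in Hadoop 1 that is the tasktracker's mapred.tasktracker.map.tasks.maximum setting), one mapper reserved for the master and ZooKeeper, and one compute thread per hardware thread. The sketch below encodes that rule for Yi's cluster; `ClusterSizing` is a hypothetical helper, and the rule is a starting point from this thread rather than a guaranteed optimum.

```java
class ClusterSizing {
  // One map slot per node; one mapper is reserved for the master and ZooKeeper.
  static int workers(int nodesWithOneMapSlot) {
    if (nodesWithOneMapSlot < 2) {
      throw new IllegalArgumentException("need at least one worker besides the master");
    }
    return nodesWithOneMapSlot - 1;
  }

  // One compute thread per hardware thread available on a node.
  static int computeThreads(int socketsPerNode, int coresPerSocket, int threadsPerCore) {
    return socketsPerNode * coresPerSocket * threadsPerCore;
  }

  public static void main(String[] args) {
    int w = workers(15);                // 15 tasktracker nodes
    int t = computeThreads(2, 6, 2);    // 2 x Xeon E5-2620: 6 cores each, Hyper-Threading
    System.out.println("-w " + w);
    System.out.println("-Dgiraph.numComputeThreads=" + t);
  }
}
```

For Yi's cluster this prints `-w 14` and `-Dgiraph.numComputeThreads=24`; on the GiraphRunner command line the -D option goes before the computation class and -w after it.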


On Thu, Oct 17, 2013 at 5:04 PM, Yi Lu luyi0...@gmail.com wrote:

 Hi,

 I have a computer cluster consisting of 15 slave machines and 1 master
 machine.

 Each slave machine has two Xeon E5-2620 CPUs. With Hyper-Threading,
 that gives 24 hardware threads per machine.

 I am wondering how to specify parameters in order to run a Giraph job in
 parallel on my cluster.

 I am using the following parameters to run a pagerank algorithm.

 hadoop jar ~/giraph-examples.jar org.apache.giraph.GiraphRunner
 SimplePageRank -vif PageRankInputFormat -vip /input -vof
 PageRankOutputFormat -op /pagerank -w 1 -mc
 SimplePageRank\$SimplePageRankMasterCompute -wc
 SimplePageRank\$SimplePageRankWorkerContext

 In particular,

 1) I know I can use “-w” to specify the number of workers. As I
 understand it, the number of workers equals the number of Hadoop mappers,
 excluding the one for ZooKeeper. So in my case (15 slave machines), which
 number should I choose? Is 15 a good choice? I ask because if I use a
 large number, e.g. 100, the mappers hang.

 2) I know I can use “-Dgiraph.numComputeThreads=1” to specify the number of
 vertex computation threads. However, if I set it to 10, the total runtime
 is much longer than with the default, which I believe is 1 (from the source
 code). If I want to use this parameter, which value should I choose?

 3) While the Giraph job is running, I use the “top” command to monitor CPU
 usage on the slave machines. The java process uses 200%-300% CPU. However,
 if I change the number of vertex computation threads to 10, it can use 800%
 CPU. The relation does not seem linear, and I want to know why.


 Thanks for your help.

 Best,

 -Yi




-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: knowing about the vertex id of the sender of the message.

2013-10-17 Thread Claudio Martella
No, you'll have to add it to the message data.
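Adding the sender to the message data can look like the sketch below: a hypothetical `SenderTaggedMessage` carrying a `long` sender id and a `double` payload. It uses the same `write(DataOutput)`/`readFields(DataInput)` pair as Hadoop's `Writable` but omits `implements Writable` so it compiles with the JDK alone; in a real job it would implement `org.apache.hadoop.io.Writable`, be configured as the message class, and the sending vertex would fill in its own id (e.g. from `getId()`) before sending.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SenderTaggedMessage {
  private long senderId;
  private double value;

  // No-argument constructor, so a framework can create instances reflectively.
  SenderTaggedMessage() { }

  SenderTaggedMessage(long senderId, double value) {
    this.senderId = senderId;
    this.value = value;
  }

  long getSenderId() { return senderId; }

  double getValue() { return value; }

  void write(DataOutput out) throws IOException {
    out.writeLong(senderId);   // the sender id travels with the payload
    out.writeDouble(value);
  }

  void readFields(DataInput in) throws IOException {
    senderId = in.readLong();  // overwrite every field: the object may be reused
    value = in.readDouble();
  }

  // Round-trips a message through bytes, as happens between workers.
  static SenderTaggedMessage roundTrip(SenderTaggedMessage m) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      m.write(new DataOutputStream(bytes));
      SenderTaggedMessage copy = new SenderTaggedMessage();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The receiving vertex then reads `getSenderId()` on each message it iterates over.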


On Thu, Oct 17, 2013 at 6:10 PM, Jyoti Yadav rao.jyoti26ya...@gmail.com wrote:

 Hi,
 In the vertex computation code, at the start of each superstep every vertex
 processes its received messages. Is there any way for the vertex to know
 who sent the message it is currently processing?

 Thanks
 Jyoti






Re: how to use out of core options

2013-10-17 Thread Jyotirmoy Sundi
Apart from these, you might also want to check the permissions of the
directory paths where vertices and messages are offloaded.
Ideally, Giraph is not meant for out-of-core use: if your graph is much
bigger than the cluster can handle in memory, using Giraph defeats the
purpose.
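The permissions check suggested above can be scripted. The stdlib-only sketch below verifies that a local directory exists and is readable and writable by the current user; `OffloadDirCheck` and the `/tmp/giraph-offload` default are hypothetical, and which directories Giraph actually offloads to depends on your configuration, so pass the real paths as arguments.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OffloadDirCheck {
  // Returns null if the directory looks usable, otherwise a short reason.
  static String problem(Path dir) {
    if (!Files.exists(dir)) {
      return "does not exist: " + dir;
    }
    if (!Files.isDirectory(dir)) {
      return "not a directory: " + dir;
    }
    if (!Files.isReadable(dir) || !Files.isWritable(dir)) {
      return "not readable and writable by this user: " + dir;
    }
    return null;
  }

  public static void main(String[] args) {
    // Hypothetical default path; pass the real offload directories as arguments.
    String[] dirs = args.length > 0 ? args : new String[] {"/tmp/giraph-offload"};
    for (String d : dirs) {
      String p = problem(Paths.get(d));
      System.out.println(p == null ? "OK: " + d : "PROBLEM: " + p);
    }
  }
}
```

Run it as the same user that runs the Hadoop tasks on each worker node, since that is the user whose permissions matter.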



On Thu, Oct 17, 2013 at 8:13 AM, Jianqiang Ou oujianqiang...@gmail.com wrote:

 Thanks very much. So are you saying that if I use -Dgiraph.maxPartitionsInMemory
 and -Dgiraph.maxMessagesInMemory to set both to smaller values, it
 might work?

 Thanks again,
 Jian

