Exception with Large Graphs

2013-08-29 Thread Yasser Altowim
Hi,

I am implementing an algorithm using Giraph, and I was able to run it on 
relatively small datasets (64,000,000 vertices and 128,000,000 edges). 
However, when I increase the size of the dataset to 128,000,000 vertices and 
256,000,000 edges, the job takes a very long time to load the vertices and 
then fails with the exception shown in the log below.

I have tried increasing the heap size and the task timeout value in the 
mapred-site.xml configuration file, and even varying the number of workers 
from 1 to 10, but I still get the same exception. I have a cluster of 10 
nodes, and each node has 4 GB of RAM. Thanks in advance.

2013-08-29 10:22:53,150 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Future result not ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:22:53,151 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:23:07,938 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 7769685 vertices at 14250.953615591572 vertices/sec 15539370 edges at 
28500.77593053654 edges/sec Memory (free/total/max) = 680.21M / 3207.44M / 
3555.56M
2013-08-29 10:23:14,538 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8019685 vertices at 14533.557468366102 vertices/sec 16039370 edges at 
29065.97491865343 edges/sec Memory (free/total/max) = 906.80M / 3242.75M / 
3555.56M
2013-08-29 10:23:21,888 INFO org.apache.giraph.worker.InputSplitsCallable: 
loadFromInputSplit: Finished loading 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/9 (v=1212852, e=2425704)
2013-08-29 10:23:37,911 INFO org.apache.giraph.worker.InputSplitsHandler: 
reserveInputSplit: Reserved input split path 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19, overall roughly 
7.518797% input splits reserved
2013-08-29 10:23:37,923 INFO org.apache.giraph.worker.InputSplitsCallable: 
getInputSplit: Reserved 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 from ZooKeeper and 
got input split 
'org.apache.giraph.io.formats.multi.InputSplitWithInputFormatIndex@24004559'
2013-08-29 10:23:44,313 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8482537 vertices at 14585.340134636266 vertices/sec 16965074 edges at 
29169.59449002283 edges/sec Memory (free/total/max) = 538.93M / 3186.13M / 
3555.56M
2013-08-29 10:23:49,963 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8732537 vertices at 14870.726503632277 vertices/sec 17465074 edges at 
29740.356341344923 edges/sec Memory (free/total/max) = 489.84M / 3222.56M / 
3555.56M
2013-08-29 10:34:28,371 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Future result not ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:34:34,847 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:34:34,850 INFO 
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window 
metrics MBytes/sec sent = 0, MBytes/sec received = 0.0161, MBytesSent = 0.0002, 
MBytesReceived = 12.3175, ave sent req MBytes = 0, ave received req MBytes = 
0.0587, secs waited = 765.881
2013-08-29 10:34:35,698 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 649805ms for sessionid 
0x140cb1140540006, closing socket connection and attempting reconnect
2013-08-29 10:34:42,471 WARN org.apache.giraph.bsp.BspService: process: 
Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
state:Disconnected type:None path:null
2013-08-29 10:34:42,472 WARN org.apache.giraph.worker.InputSplitsHandler: 
process: Problem with zookeeper, got event with path null, state Disconnected, 
event type None
2013-08-29 10:34:43,819 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server slave5.ericsson-magic.net/10.126.72.165:22181
2013-08-29 10:34:44,077 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to slave5.ericsson-magic.net/10.126.72.165:22181, initiating session
2013-08-29 10:34:44,220 WARN org.apache.giraph.bsp.BspService: process: Got 
unknown null path event WatchedEvent state:Expired type:None path:null
2013-08-29 10:34:44,220 WARN org.apache.giraph.worker.InputSplitsHandler: 
process: Problem with zookeeper, got event with path null, state Expired, event 
type None
2013-08-29 10:34:44,221 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
2013-08-29 10:34:44,240 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x140cb1140540006 has expired, closing 
socket connection
2013-08-29 10:35:35,442 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Future result not ready yet java.util.concurrent.FutureTask@1a129460
2013-08-29 10:35:35,443 INFO 
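
Editorial note on reading the log above: the entries jump from 10:23:49 to 10:34:28, and the ZooKeeper client then reports that it has not heard from the server for 649805 ms before the session expires. The thread does not establish what caused the stall. As a rough sketch for spotting such stalls in a task log, the hypothetical helper `find_gaps` below (not part of Giraph or Hadoop) prints every jump between consecutive timestamped lines that exceeds a given number of seconds. It assumes the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen above and that all entries fall on the same day.

```shell
#!/bin/sh
# find_gaps LIMIT_SECONDS < logfile
# Hypothetical helper: reports pauses between consecutive timestamped log lines.
# Assumes every entry is from the same day (no midnight rollover handling).
find_gaps() {
  awk -v limit="$1" '
    # Wrapped continuation lines do not start with a timestamp and are skipped.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],/ {
      # Field 2 looks like 10:34:28,371; keep HH:MM:SS and convert to seconds.
      split(substr($2, 1, 8), t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && now - prev > limit)
        printf "%d s gap before %s %s\n", now - prev, $1, $2
      prev = now
      seen = 1
    }'
}

# Usage with three lines taken from the log in this message.
find_gaps 60 <<'EOF'
2013-08-29 10:23:49,963 INFO
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit:
2013-08-29 10:34:28,371 INFO org.apache.giraph.utils.ProgressableUtils:
EOF
```

On this sample the helper reports a single 639-second gap before the 10:34:28 entry, which is in line with the ZooKeeper client's own figure of roughly 650 seconds without contact.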

Passing Custom Arguments for giraph.zkList

2013-08-29 Thread Ramani, Arun
Hi,

I am trying to pass a ZooKeeper quorum to my Giraph job, and it throws the 
following exception:

13/08/29 13:14:38 INFO utils.ConfigurationUtils: No edge input format 
specified. Ensure your InputFormat does not require one.
13/08/29 13:14:38 INFO utils.ConfigurationUtils: No output format specified. 
Ensure your OutputFormat does not require one.
13/08/29 13:14:38 INFO utils.ConfigurationUtils: Setting custom argument 
[giraph.zkList] to zk1 in GiraphConfiguration
Exception in thread main java.lang.IllegalArgumentException: Unable to parse 
custom  argument: zk2:port
at 
org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:288)
at 
org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at 
com.paypal.risk.rd.giraph.AccountPropagation.run(AccountPropagation.java:46)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at 
com.paypal.risk.rd.giraph.AccountPropagation.main(AccountPropagation.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

I pass the zklist like this:

hadoop jar GRAPH.jar CLASSNAME -vip CLASS NAME -vif CLASS NAME -wc CLASS 
NAME -w worker number -ca 
giraph.zkList=zk1:port,zk2:port,zk3:port,zk4:port,zk5:port

Please suggest what is wrong with this invocation.

Thanks
Arun Ramani


Re: Passing Custom Arguments for giraph.zkList

2013-08-29 Thread Claudio Martella
zk1 is supposed to be a hostname.


-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: Passing Custom Arguments for giraph.zkList

2013-08-29 Thread Ramani, Arun
Hi Claudio,

Yes, zk1, zk2, zk3, zk4 and zk5 are all ZooKeeper hostnames. These 5 hosts 
make up a ZooKeeper quorum. Please let me know how to pass this.

Thanks
Arun Ramani



Re: Passing Custom Arguments for giraph.zkList

2013-08-29 Thread Claudio Martella
The problem is not the format of the string, but the way you're passing it.
Try passing it as -D giraph.zkList=... before the GiraphRunner options.
That should work.
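
Editorial note: one reading of the trace earlier in this thread is that the value given to -ca is split on commas into separate key=value pairs. The log shows giraph.zkList being set from the first piece, and the exception then names the next piece, zk2:port, which has no key. The sketch below is not Giraph's code. `parse_ca` and `d_form` are hypothetical helpers: the first imitates that assumed splitting, the second only prints the -D form suggested above rather than running it, since running it needs a Hadoop and Giraph installation.

```shell
#!/bin/sh
# Hypothetical imitation of the assumed -ca handling: split on commas,
# then require every piece to look like key=value.
parse_ca() {
  old_ifs=$IFS
  IFS=','
  set -f                      # no filename globbing while splitting
  for pair in $1; do
    case $pair in
      *=*) echo "set [${pair%%=*}] to ${pair#*=}" ;;
      *)   echo "unable to parse: $pair"
           IFS=$old_ifs; set +f
           return 1 ;;
    esac
  done
  IFS=$old_ifs
  set +f
}

# Print the suggested form: -D key=value placed right after the class name,
# ahead of the job's own options.
d_form() {
  printf 'hadoop jar GRAPH.jar CLASSNAME -D giraph.zkList=%s [remaining options]\n' "$1"
}

ZK_LIST="zk1:port,zk2:port,zk3:port,zk4:port,zk5:port"

# The -ca style value breaks at the first comma.
parse_ca "giraph.zkList=$ZK_LIST" || echo "(-ca form rejected)"

# The -D style keeps the whole comma-separated list as one value.
d_form "$ZK_LIST"
```

The design point is that -D is a generic Hadoop option handled before the job's own arguments are parsed, so the comma-separated host list reaches the configuration as a single value. That is why it has to come before the other options.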


-- 
   Claudio Martella
   claudio.marte...@gmail.com