RE: Exception with Large Graphs

2013-09-03 Thread Yasser Altowim
Hi Avery,

Thanks for your response. The data I am loading is almost 9 GB, and I 
have 10 nodes, each with 4 GB of RAM.

Best,
Yasser

From: Avery Ching [mailto:ach...@apache.org]
Sent: Friday, August 30, 2013 4:43 PM
To: user@giraph.apache.org
Subject: Re: Exception with Large Graphs

That error is from the master dying (likely due to the results of another 
worker dying).  Can you do a rough calculation of the size of data that you 
expect to be loaded and check if the memory is enough?
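Avery's suggested rough calculation can be sketched in a few lines; the in-memory expansion factor below is an assumed illustrative value (Java object headers, boxing, and caches inflate on-disk data), not a measured number:

```python
# Rough estimate: does the loaded graph fit in per-worker heap?
# Numbers mirror this thread: 9 GB input, 10 workers, 4 GB nodes.
input_gb = 9.0             # on-disk size of the input graph
workers = 10               # number of worker nodes
heap_per_worker_gb = 4.0   # RAM per node (an upper bound on usable heap)
expansion = 3.0            # ASSUMED on-disk -> in-heap blowup factor

needed_per_worker_gb = input_gb * expansion / workers
print(f"~{needed_per_worker_gb:.1f} GB needed per worker "
      f"vs {heap_per_worker_gb:.1f} GB available")
# Even when the static estimate fits, message buffers and partition
# caches add to it; once a worker exceeds its heap, the master dies too.
```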

On 8/30/13 11:19 AM, Yasser Altowim wrote:
Guys,

   Can someone please help me with this issue? Thanks.

Best,
Yasser

From: Yasser Altowim
Sent: Thursday, August 29, 2013 11:16 AM
To: user@giraph.apache.org
Subject: Exception with Large Graphs

Hi,

 I am implementing an algorithm using Giraph, and I was able to run my 
algorithm on relatively small datasets (64,000,000 vertices and 128,000,000 
edges). However, when I increase the size of the dataset to 128,000,000 
vertices and 256,000,000 edges, the job takes so much time to load the 
vertices, and then it gives me the following exception.

I have tried to increase the heap size and the task timeout value in 
the mapred-site.xml configuration file, and even vary the number of workers 
from 1 to 10, but still getting the same exceptions. I have a cluster of 10 
nodes, and each node has  a 4G of ram.  Thanks in advance.

2013-08-29 10:22:53,150 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Future result not ready yet 
java.util.concurrent.FutureTask@1a129460
2013-08-29 10:22:53,151 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:23:07,938 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 7769685 vertices at 14250.953615591572 vertices/sec 15539370 edges at 
28500.77593053654 edges/sec Memory (free/total/max) = 680.21M / 3207.44M / 
3555.56M
2013-08-29 10:23:14,538 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8019685 vertices at 14533.557468366102 vertices/sec 16039370 edges at 
29065.97491865343 edges/sec Memory (free/total/max) = 906.80M / 3242.75M / 
3555.56M
2013-08-29 10:23:21,888 INFO org.apache.giraph.worker.InputSplitsCallable: 
loadFromInputSplit: Finished loading 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/9 (v=1212852, e=2425704)
2013-08-29 10:23:37,911 INFO org.apache.giraph.worker.InputSplitsHandler: 
reserveInputSplit: Reserved input split path 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19, overall roughly 
7.518797% input splits reserved
2013-08-29 10:23:37,923 INFO org.apache.giraph.worker.InputSplitsCallable: 
getInputSplit: Reserved 
/_hadoopBsp/job_201308290837_0003/_vertexInputSplitDir/19 from ZooKeeper and 
got input split 
'org.apache.giraph.io.formats.multi.InputSplitWithInputFormatIndex@24004559'
2013-08-29 10:23:44,313 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8482537 vertices at 14585.340134636266 vertices/sec 16965074 edges at 
29169.59449002283 edges/sec Memory (free/total/max) = 538.93M / 3186.13M / 
3555.56M
2013-08-29 10:23:49,963 INFO 
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: 
Loaded 8732537 vertices at 14870.726503632277 vertices/sec 17465074 edges at 
29740.356341344923 edges/sec Memory (free/total/max) = 489.84M / 3222.56M / 
3555.56M
2013-08-29 10:34:28,371 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Future result not ready yet 
java.util.concurrent.FutureTask@1a129460
2013-08-29 10:34:34,847 INFO org.apache.giraph.utils.ProgressableUtils: 
waitFor: Waiting for 
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@30d320e4
2013-08-29 10:34:34,850 INFO 
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window 
metrics MBytes/sec sent = 0, MBytes/sec received = 0.0161, MBytesSent = 0.0002, 
MBytesReceived = 12.3175, ave sent req MBytes = 0, ave received req MBytes = 
0.0587, secs waited = 765.881
2013-08-29 10:34:35,698 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 649805ms for sessionid 
0x140cb1140540006, closing socket connection and attempting reconnect
2013-08-29 10:34:42,471 WARN org.apache.giraph.bsp.BspService: process: 
Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent 
state:Disconnected type:None path:null
2013-08-29 10:34:42,472 WARN org.apache.giraph.worker.InputSplitsHandler: 
process: Problem with zookeeper, got event with path null, state Disconnected, 
event type None
2013-08-29 10:34:43,819 INFO org.apache.zookeeper

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-03 Thread Jeff Peters
Thank you Lukas! That's exactly the kind of model I was building in my
head over the weekend about why this might be happening, and why increasing
the number of AWS instances (and workers) does not solve the problem
without also increasing each worker's VM size. Surely Facebook can't be using
it like this if, as described, they have billions of vertices and a trillion
edges. So do you, or Avery, have any idea how to initialize this in a more
reasonable way?


On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec <
lukas.naleze...@firma.seznam.cz> wrote:

>  Hi
>
> I wasted a few days on a similar problem.
>
> I guess the problem was that during loading - if you have W workers
> and W^2 partitions, there are W^2 partition caches in each worker.
> Each cache can hold 10,000 vertices by default.
> I had 26,000,000 vertices and 60 workers -> 3600 partitions. That means
> there can be up to 36,000,000 vertices in caches in each worker if the input
> files are random.
> Workers were assigned 450,000 vertices but failed when they had 900,000
> vertices in memory.
>
> Btw: why is the default number of partitions W^2?
>
> (I can be wrong)
> Lukas
>
>
>
> On 08/31/13 01:54, Avery Ching wrote:
>
> Ah, the new caches. =)  These make things a lot faster (bulk data
> sending), but do take up some additional memory.  If you look at
> GiraphConstants, you can find ways to change the cache sizes (this will
> reduce that memory usage).
> For example, MAX_EDGE_REQUEST_SIZE will affect the size of the edge
> cache.  MAX_MSG_REQUEST_SIZE will affect the size of the message cache.
> The caches are per worker, so 100 workers would require 50 MB per worker
> by default.  Feel free to trim it if you like.
>
> The byte arrays for the edges are the most efficient storage possible
> (although not as performant as the native edge stores).
>
> Hope that helps,
>
> Avery
>
> On 8/29/13 4:53 PM, Jeff Peters wrote:
>
> Avery, it would seem that optimizations to Giraph have, unfortunately,
> turned the majority of the heap into "dark matter". The two snapshots are
> at unknown points in a superstep but I waited for several supersteps so
> that the activity had more or less stabilized. About the only thing
> comparable between the two snapshots are the vertices, 192561 X
> "RecsVertex" in the new version and 191995 X "Coloring" in the old system.
> But with the new Giraph 672710176 out of 824886184 bytes are stored as
> primitive byte arrays. That's probably indicative of some very fine
> performance optimization work, but it makes it extremely difficult to know
> what's really out there, and why. I did notice that a number of caches have
> appeared that did not exist before,
> namely SendEdgeCache, SendPartitionCache, SendMessageCache
> and SendMutationsCache.
>
>  Could any of those account for a larger per-worker footprint in a modern
> Giraph? Should I simply assume that I need to force AWS to configure its
> EMR Hadoop so that each instance has fewer map tasks but with a somewhat
> larger VM max, say 3GB instead of 2GB?
>
>
> On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching  wrote:
>
>> Try dumping a histogram of memory usage from a running JVM and see where
>> the memory is going.  I can't think of anything in particular that
>> changed...
>>
>>
>> On 8/28/13 4:39 PM, Jeff Peters wrote:
>>
>>>
>>> I am tasked with updating our ancient (circa 7/10/2012) Giraph to
>>> giraph-release-1.0.0-RC3. Most jobs run fine but our largest job now runs
>>> out of memory using the same AWS elastic-mapreduce configuration we have
>>> always used. I have never tried to configure either Giraph or the AWS
>>> Hadoop. We build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS
>>> provides us. The 8 X m2.4xlarge cluster we use seems to provide 8*14=112
>>> map tasks fitted out with 2GB heap each. Our code is completely unchanged
>>> except as required to adapt to the new Giraph APIs. Our vertex, edge, and
>>> message data are completely unchanged. On smaller jobs, that work, the
>>> aggregate heap usage high-water mark seems about the same as before, but
>>> the "committed heap" seems to run higher. I can't even make it work on a
>>> cluster of 12. In that case I get one map task that seems to end up with
>>> nearly twice as many messages as most of the others so it runs out of
>>> memory anyway. It only takes one to fail the job. Am I missing something
>>> here? Should I be configuring my new Giraph in some way I didn't need to
>>> with the old one?
>>>
>>>
>>
>
>
>

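Lukas's back-of-the-envelope bound above (W^2 partition caches per worker during input loading) works out as a couple of lines; the 10,000-vertex default cache size is taken from his message, not verified against the Giraph source:

```python
# Worst-case vertices buffered in partition caches on ONE worker while
# loading, per Lukas's model: one cache per partition, W^2 partitions total.
workers = 60
partitions = workers ** 2   # default partition count, per the thread
cache_vertices = 10_000     # default per-partition cache capacity (as stated above)

worst_case_cached = partitions * cache_vertices
print(worst_case_cached)    # far above the ~450,000 vertices each worker owns
```

This is why randomly ordered input files hurt: each worker can end up with a partially filled cache for nearly every partition at once.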

Benchmarking Giraph

2013-09-03 Thread Alok Kumbhare
Hi,
We are looking into some Giraph benchmarks to compare against a similar
programming model and framework we are working on.

As a start we are planning to benchmark the following algorithms on data
sets with more than a billion edges.

1. Single Source Shortest Path from a given source
2. PageRank
3. Connected Components

We have a small cluster of 16 nodes (8 cores/16 GB each) to run the
benchmarks. Given that, we have a few questions to help us get the best out
of Giraph.

1. Which version of Giraph should we use to take advantage of the
optimizations in terms of memory optimization/caching, multi-threading, etc.
mentioned here
https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920?
1.0 or trunk?

2. Are the samples present in the giraph distribution for the above
algorithms a good place to start? How can we take advantage of different
optimizations, including aggregators/combiners for these algorithms?

3. Is there a document I can look at to understand the best practices for
implementing optimized vertex-centric code using the latest features, and
deployment guidelines to maximize utilization?

Looking forward to your help.

Thanks,
Alok Kumbhare
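For sanity-checking benchmark output, the vertex-centric SSSP that Giraph's example implements can be sketched as a tiny synchronous BSP loop in plain Python. This is a toy simulator, not Giraph code; all names here are invented for illustration:

```python
from collections import defaultdict

def sssp(edges, source):
    """Toy Pregel-style SSSP. edges maps vertex id -> [(target, weight), ...]."""
    INF = float("inf")
    value = defaultdict(lambda: INF)   # per-vertex state (current best distance)
    messages = {source: [0.0]}         # superstep 0: source receives distance 0
    while messages:                    # halt when no messages are in flight
        next_messages = defaultdict(list)
        for v, incoming in messages.items():
            best = min(incoming)
            if best < value[v]:        # only propagate on improvement
                value[v] = best
                for target, weight in edges.get(v, []):
                    next_messages[target].append(best + weight)
        messages = next_messages
    return dict(value)

# tiny_graph.txt from later in this digest, as an adjacency list:
edges = {0: [(1, 1.0), (3, 3.0)],
         1: [(0, 1.0), (2, 2.0), (3, 1.0)],
         2: [(1, 2.0), (4, 4.0)],
         3: [(0, 3.0), (1, 1.0), (4, 4.0)],
         4: [(3, 4.0), (2, 4.0)]}
print(sssp(edges, 0))  # vertex 4 ends up at distance 6.0 via 0->1->3->4
```

PageRank and connected components follow the same compute/message/halt pattern, which is why the three benchmarks exercise Giraph's messaging layer in similar ways.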


Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-03 Thread Claudio Martella
Try with the attached patch applied to trunk, without the mentioned -D
giraph.zkManagerDirectory.


On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:

> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail,
> still no output, and still an
> exception in the log files - FileNotFoundException: File
> /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
>   org.apache.giraph.GiraphRunner
>  -Dgiraph.zkManagerDirectory='/tmp/giraph/'
> org.apache.giraph.examples.SimpleShortestPathsVertex  -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/root/input/tiny_graph.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/root/output/shortestpaths -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete:
> job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all
> reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0
> [root@localhost giraph]#
>
>
> When I try to run Zookeeper it still gives me an 'Address already in use'
> exception.
>
> [root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh
> start-foreground
> JMX enabled by default
> Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] -
> Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] -
> Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] -
> autopurge.snapRetainCount set to 3
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] -
> autopurge.purgeInterval set to 0
> 2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101]
> - Purge task is not scheduled.
> 2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] -
> Either no config or no quorum defined in config, running  in standalone mode
> 2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] -
> Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] -
> Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] -
> Starting server
> 2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server
> environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34
> GMT
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server
> environment:host.name=localhost.localdomain
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server
> environment:java.version=1.6.0_31
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server
> environment:java.vendor=Sun Microsystems Inc.
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server
> environment:java.home=/usr/java/jdk1.6.0_31/jre
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server
> environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/

RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-03 Thread Ken Williams
Hi Claudio,
I tried this but it made no difference. The map tasks still fail, still no
output, and still an exception in the log files - FileNotFoundException: File
/tmp/giraph/_zkServer does not exist.

[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkManagerDirectory='/tmp/giraph/' org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output/shortestpaths -w 1
13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
13/09/03 14:20:12 INFO mapred.JobClient:     Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient:     Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
[root@localhost giraph]#

When I try to run Zookeeper it still gives me an 'Address already in use' 
exception.
[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] - Starting server
2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=localhost.localdomain
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.6.0_31
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/jd

Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-03 Thread Claudio Martella
Can you try defining the zookeeper manager directory from the command line,
like this: -D giraph.zkManagerDirectory=/path/in/hdfs/foobar

You'll have to delete this directory by hand before each job. Just to see
if it solves the problem; then I'll know how to fix it.


On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams  wrote:

> Hi Pradeep,
>
> Yes, the zookeeper server is definitely running, I can connect to it with
> the
> command-line client
>
> [root@localhost giraph]# zkCli.sh  -server 127.0.0.1:2181
> Connecting to 127.0.0.1:2181
> 2013-09-03 11:15:45,987 [myid:] - INFO  [main:Environment@100] - Client
> environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34
> GMT
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client
> environment:host.name=localhost.localdomain
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client
> environment:java.version=1.6.0_31
> ..
> WatchedEvent state:SyncConnected type:None path:null
> [zk: 127.0.0.1:2181(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 127.0.0.1:2181(CONNECTED) 1]
>
>
> However, I am a bit confused.
> If I look in the zookeeper log-file I see this port 2181 'Address already
> in use' error,
>
> 2013-09-03 10:52:24,412 [myid:] - INFO  [main:ZooKeeperServer@735] -
> minSessionTimeout set to -1
> 2013-09-03 10:52:24,413 [myid:] - INFO  [main:ZooKeeperServer@744] -
> maxSessionTimeout set to -1
> 2013-09-03 10:52:24,436 [myid:] - INFO  [main:NIOServerCnxnFactory@99] -
> binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] -
> Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind(Native Method)
>  at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>  at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>
> The process listening on port 2181 is 2892, which turns out to be HBase.
>
> [root@localhost giraph]# fuser 2181/tcp
> 2181/tcp: 2892
> [root@localhost giraph]# ps aux | grep 2892
> hbase 2892  0.1  3.2 719592 119624 ?   Sl   Aug29   7:35
> /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m
> -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase
> -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log
> -Dhbase.home.dir=/usr/lib/hbase/bin/..
> ..
>
> So I am not sure what my zookeeper client is connecting to.
> It seems to be connecting to a zookeeper server but when I do 'ps' I
> cannot see
> a zookeeper server running.
> Here is my zoo.cfg file,
>
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
>
> Thanks for any help,
>
> Ken
>
>
> > Date: Mon, 2 Sep 2013 22:48:29 +0530
> > Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> > From: pradeep0...@gmail.com
> > To: user@giraph.apache.org
>
> >
> > Can you check if zookeeper is running properly?
> >
> > On 9/2/13, Ken Williams  wrote:
> > > Hi,
> > > I am trying to run one of the example programs included with Giraph
> > > 1.0.0 but whatever I do I always get this same error:
> > > FileNotFoundException: File _bsp/_defaultZkManagerDir//_zkServer
> > > does not exist.
> > > I am running Giraph 1.0.0 on hadoop-2.0.0-cdh4.1.1. I successfully ran
> > > 'mvn clean install -Phadoop_2.0.0' with no problems.
> > > This is my input file,
> > > [root@localhost giraph]# hadoop fs -cat /user/root/input/tiny_graph.txt
> > > [0,0,[[1,1],[3,3]]]
> > > [1,0,[[0,1],[2,2],[3,1]]]
> > > [2,0,[[1,2],[4,4]]]
> > > [3,0,[[0,3],[1,1],[4,4]]]
> > > [4,0,[[3,4],[2,4]]]
> > > When I try to run an example program this is the output,
> > > [root@localhost giraph]# hadoop jar
> > >
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
> > > org.apache.giraph.GiraphRunner
> > > org.apache.giraph.examples.SimpleShortestPathsVertex -vif
> > >
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
> > > /user/root/input/tiny_graph.txt -of
> > > org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> > > /user/root/output/shortestpaths -w 1 13/09/02 17:06:36 INFO
> > > utils.ConfigurationUt

RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-03 Thread Ken Williams
Hi Pradeep,
Yes, the zookeeper server is definitely running, I can connect to it with the
command-line client

[root@localhost giraph]# zkCli.sh -server 127.0.0.1:2181
Connecting to 127.0.0.1:2181
2013-09-03 11:15:45,987 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=localhost.localdomain
2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_31
..
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1]

However, I am a bit confused. If I look in the zookeeper log-file I see this
port 2181 'Address already in use' error,

2013-09-03 10:52:24,412 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 10:52:24,413 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 10:52:24,436 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)

The process listening on port 2181 is 2892, which turns out to be HBase.

[root@localhost giraph]# fuser 2181/tcp
2181/tcp: 2892
[root@localhost giraph]# ps aux | grep 2892
hbase 2892  0.1  3.2 719592 119624 ?   Sl   Aug29   7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/..
..

So I am not sure what my zookeeper client is connecting to. It seems to be
connecting to a zookeeper server but when I do 'ps' I cannot see a zookeeper
server running. Here is my zoo.cfg file,

maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
Thanks for any help,
Ken

> Date: Mon, 2 Sep 2013 22:48:29 +0530
> Subject: Re: FileNotFoundException: File 
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> From: pradeep0...@gmail.com
> To: user@giraph.apache.org
> 
> Can you check if zookeeper is running properly?
> 
> On 9/2/13, Ken Williams  wrote:
> > Hi,
> > I am trying to run one of the example programs included with Giraph
> > 1.0.0 but whatever I do I always get this same error:
> > FileNotFoundException: File _bsp/_defaultZkManagerDir//_zkServer
> > does not exist.
> > I am running Giraph 1.0.0 on hadoop-2.0.0-cdh4.1.1. I successfully ran
> > 'mvn clean install -Phadoop_2.0.0' with no problems.
> > This is my input file,
> > [root@localhost giraph]# hadoop fs -cat /user/root/input/tiny_graph.txt
> > [0,0,[[1,1],[3,3]]]
> > [1,0,[[0,1],[2,2],[3,1]]]
> > [2,0,[[1,2],[4,4]]]
> > [3,0,[[0,3],[1,1],[4,4]]]
> > [4,0,[[3,4],[2,4]]]
> > When I try to run an example program this is the output,
> > [root@localhost giraph]# hadoop jar
> > /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
> > org.apache.giraph.GiraphRunner
> > org.apache.giraph.examples.SimpleShortestPathsVertex -vif
> > org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
> > /user/root/input/tiny_graph.txt -of
> > org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> > /user/root/output/shortestpaths -w 1
> > 13/09/02 17:06:36 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
> > 13/09/02 17:06:36 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
> > 13/09/02 17:06:36 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
> > 13/09/02 17:06:36 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
> > 13/09/02 17:06:36 INFO job.GiraphJob: run: Since checkpointing is disabled (default),