Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-04 Thread Avery Ching
We have message caches for every compute thread, and each compute thread 
keeps w caches, one per destination worker.  So the total amount of memory 
consumed by message caches per worker =
compute threads * workers * size of cache.  The best thing is to tune 
down the size of the cache via MAX_MSG_REQUEST_SIZE to a size that 
works for your configuration.
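
To make the formula concrete, here is a back-of-the-envelope estimate in plain 
Java. The numbers are only illustrative; the ~512 KB cache size is inferred from 
the "100 workers would require 50 MB per worker by default" figure quoted later 
in this thread.

public class MsgCacheEstimate {
  public static void main(String[] args) {
    long computeThreads = 1;        // compute threads per worker
    long workers = 100;             // workers in the job (-w)
    long cacheBytes = 512L * 1024;  // per-destination message cache size (MAX_MSG_REQUEST_SIZE)
    long perWorkerBytes = computeThreads * workers * cacheBytes;
    // With these numbers: 1 * 100 * 512 KB = 50 MB of message caches per worker.
    System.out.printf("Message caches per worker: ~%d MB%n",
        perWorkerBytes / (1024 * 1024));
  }
}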


Hope that helps,

Avery

On 9/4/13 3:33 AM, Lukas Nalezenec wrote:


Thanks,
I was not sure if it really works as I described.

> Facebook can't be using it like this if, as described, they have 
billions of vertices and a trillion edges.


Yes, it's strange. I guess configuration does not help much on a large 
cluster. What might help are the properties of the input data.


> So do you, or Avery, have any idea how you might initialize this in 
a more reasonable way, and how???


A quick workaround is to change the number of partitions from W^2 to W 
or 2*W.  It will help if you don't have a very large number of workers.

I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
You can do some preprocessing before loading data into Giraph.



How to change Giraph:
The caches could be flushed once the total number of vertices/edges in 
all caches exceeds some threshold. Ideally, it should prevent not only 
OutOfMemory errors but also raising the high-water mark. Not sure if it 
(preventing the HWM from rising) is easy to do.
I am going to use almost-prebuilt partitions. For my use case it would 
be ideal to detect that a cache is abandoned and will not be used 
anymore. It would cut memory usage in caches from ~O(n^3) to ~O(n).  
It could be done by counting cache flushes or cache insertions and 
flushing any cache that has not been touched for a long time.


There could be a separate MAX_*_REQUEST_SIZE configuration for the 
per-partition caches used while loading data.


I guess there should be a simple but efficient way to trace the memory 
high-water mark. It could look like:


Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
Iteration 1 XYZ 
Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
.
.
.

Lukas





On 09/04/13 01:12, Jeff Peters wrote:
Thank you Lukas!!! That's EXACTLY the kind of model I was building in 
my head over the weekend about why this might be happening, and why 
increasing the number of AWS instances (and workers) does not solve 
the problem without increasing each worker's VM. Surely Facebook 
can't be using it like this if, as described, they have billions of 
vertices and a trillion edges. So do you, or Avery, have any idea how 
you might initialize this in a more reasonable way, and how???



On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec 
> wrote:


Hi

I wasted a few days on a similar problem.

I guess the problem was that during loading, if you have W
workers and W^2 partitions, there are W^2 partition caches in each
worker.
Each cache can hold 10 000 vertices by default.
I had 26 000 000 vertices and 60 workers -> 3600 partitions. That
means there can be up to 36 000 000 vertices in the caches of
each worker if the input files are in random order.
Workers were assigned 450 000 vertices each but failed when they had
900 000 vertices in memory.

Btw: why is the default number of partitions W^2?

(I could be wrong)
Lukas



On 08/31/13 01:54, Avery Ching wrote:

Ah, the new caches. =)  These make things a lot faster (bulk
data sending), but do take up some additional memory.  If you
look at GiraphConstants, you can find ways to change the cache
sizes (this will reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE will affect the size of the
edge cache and MAX_MSG_REQUEST_SIZE will affect the size of the
message cache.  The caches are kept per destination worker, so with
100 workers each worker would require 50 MB by default.  Feel free
to trim them if you like.

The byte arrays for the edges are the most efficient storage
possible (although not as performant as the native edge stores).

Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:

Avery, it would seem that optimizations to Giraph have,
unfortunately, turned the majority of the heap into "dark
matter". The two snapshots are at unknown points in a superstep
but I waited for several supersteps so that the activity had
more or less stabilized. About the only thing comparable
between the two snapshots are the vertexes, 192561 X
"RecsVertex" in the new version and 191995 X "Coloring" in the
old system. But with the new Giraph 672710176 out of 824886184
bytes are stored as primitive byte arrays. That's probably
indicative of some very fine performance optimization work, but
it makes it extremely difficult to know what's really out
there, and why. I did notice that a number of caches have
appeared that did not exist before,

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-04 Thread Jeff Peters
Ok thanks Avery. But I still have two questions, the first a really dumb
newbie question. Why are there ever any number of partitions other than
exactly one per worker thread (one per worker in our case)? And a deeper
question. Even if I shrink the cache I would suppose that if Facebook has
billions of vertices they must have thousands of workers. It would seem the
cache scheme simply blows up on huge graphs no matter what you do. What am
I missing here?


On Wed, Sep 4, 2013 at 11:18 AM, Avery Ching  wrote:

>  The amount of memory for the send message cache is per worker =
> number of compute threads * number of workers * size of the cache.
>
> The number of partitions doesn't affect the memory usage very much.  My
> advice would be to dial down the cache size a bit with MAX_MSG_REQUEST_SIZE.
>
> Avery
>
>
> On 9/4/13 3:33 AM, Lukas Nalezenec wrote:
>
>
> Thanks,
> I was not sure if it really works as I described.
>
> > Facebook can't be using it like this if, as described, they have
> billions of vertices and a trillion edges.
>
> Yes, it's strange. I guess configuration does not help much on a large
> cluster. What might help are the properties of the input data.
>
> > So do you, or Avery, have any idea how you might initialize this in a
> more reasonable way, and how???
>
> A quick workaround is to change the number of partitions from W^2 to W or 2*W.
> It will help if you don't have a very large number of workers.
> I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
> You can do some preprocessing before loading data into Giraph.
>
>
>
> How to change Giraph:
> The caches could be flushed once the total number of vertices/edges in all caches
> exceeds some threshold. Ideally, it should prevent not only OutOfMemory errors
> but also raising the high-water mark. Not sure if it (preventing the HWM from
> rising) is easy to do.
> I am going to use almost-prebuilt partitions. For my use case it would be
> ideal to detect that a cache is abandoned and will not be used anymore.
> It would cut memory usage in caches from ~O(n^3) to ~O(n).  It could be
> done by counting cache flushes or cache insertions and flushing any
> cache that has not been touched for a long time.
>
> There could be a separate MAX_*_REQUEST_SIZE configuration for the
> per-partition caches used while loading data.
>
> I guess there should be a simple but efficient way to trace the memory
> high-water mark. It could look like:
>
> Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
> Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
> Iteration 1 XYZ 
> Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
> .
> .
> .
>
> Lukas
>
>
>
>
>
> On 09/04/13 01:12, Jeff Peters wrote:
>
> Thank you Lukas!!! That's EXACTLY the kind of model I was building in my
> head over the weekend about why this might be happening, and why increasing
> the number of AWS instances (and workers) does not solve the problem
> without increasing each worker's VM. Surely Facebook can't be using it like
> this if, as described, they have billions of vertices and a trillion edges.
> So do you, or Avery, have any idea how you might initialize this in a more
> reasonable way, and how???
>
>
> On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec <
> lukas.naleze...@firma.seznam.cz> wrote:
>
>>  Hi
>>
>> I wasted a few days on a similar problem.
>>
>> I guess the problem was that during loading, if you have W workers
>> and W^2 partitions, there are W^2 partition caches in each worker.
>> Each cache can hold 10 000 vertices by default.
>> I had 26 000 000 vertices and 60 workers -> 3600 partitions. That means
>> there can be up to 36 000 000 vertices in the caches of each worker if the
>> input files are in random order.
>> Workers were assigned 450 000 vertices each but failed when they had 900 000
>> vertices in memory.
>>
>> Btw: why is the default number of partitions W^2?
>>
>> (I could be wrong)
>> Lukas
>>
>>
>>
>> On 08/31/13 01:54, Avery Ching wrote:
>>
>> Ah, the new caches. =)  These make things a lot faster (bulk data
>> sending), but do take up some additional memory.  If you look at
>> GiraphConstants, you can find ways to change the cache sizes (this will
>> reduce that memory usage).
>> For example, MAX_EDGE_REQUEST_SIZE will affect the size of the edge
>> cache and MAX_MSG_REQUEST_SIZE will affect the size of the message cache.
>> The caches are kept per destination worker, so with 100 workers each worker
>> would require 50 MB by default.  Feel free to trim them if you like.
>>
>> The byte arrays for the edges are the most efficient storage possible
>> (although not as performant as the native edge stores).
>>
>> Hope that helps,
>>
>> Avery
>>
>> On 8/29/13 4:53 PM, Jeff Peters wrote:
>>
>> Avery, it would seem that optimizations to Giraph have, unfortunately,
>> turned the majority of the heap into "dark matter". The two snapshots are
>> at unknown points in a superstep but I waited for several supersteps so
>> that the activity had more or less stabilized. A

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-04 Thread Avery Ching

The amount of memory for the send message cache is per worker =
number of compute threads * number of workers * size of the cache.

The number of partitions doesn't affect the memory usage very much.  My 
advice would be to dial down the cache size a bit with MAX_MSG_REQUEST_SIZE.
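
A minimal sketch of dialing that down in code, assuming MAX_MSG_REQUEST_SIZE maps 
to the property key "giraph.msgRequestSize" (check GiraphConstants in your build; 
the same property can also be passed to GiraphRunner as a -D option):

import org.apache.giraph.conf.GiraphConfiguration;

public class SmallerMsgCache {
  public static void main(String[] args) {
    GiraphConfiguration conf = new GiraphConfiguration();
    // Shrink the per-destination message cache from its default (~512 KB,
    // per the 50 MB / 100 workers figure in this thread) down to 128 KB.
    conf.setInt("giraph.msgRequestSize", 128 * 1024);
  }
}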


Avery

On 9/4/13 3:33 AM, Lukas Nalezenec wrote:


Thanks,
I was not sure if it really works as I described.

> Facebook can't be using it like this if, as described, they have 
billions of vertices and a trillion edges.


Yes, it's strange. I guess configuration does not help much on a large 
cluster. What might help are the properties of the input data.


> So do you, or Avery, have any idea how you might initialize this in 
a more reasonable way, and how???


A quick workaround is to change the number of partitions from W^2 to W 
or 2*W.  It will help if you don't have a very large number of workers.

I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
You can do some preprocessing before loading data into Giraph.



How to change Giraph:
The caches could be flushed once the total number of vertices/edges in 
all caches exceeds some threshold. Ideally, it should prevent not only 
OutOfMemory errors but also raising the high-water mark. Not sure if it 
(preventing the HWM from rising) is easy to do.
I am going to use almost-prebuilt partitions. For my use case it would 
be ideal to detect that a cache is abandoned and will not be used 
anymore. It would cut memory usage in caches from ~O(n^3) to ~O(n).  
It could be done by counting cache flushes or cache insertions and 
flushing any cache that has not been touched for a long time.


There could be a separate MAX_*_REQUEST_SIZE configuration for the 
per-partition caches used while loading data.


I guess there should be a simple but efficient way to trace the memory 
high-water mark. It could look like:


Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
Iteration 1 XYZ 
Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
.
.
.

Lukas





On 09/04/13 01:12, Jeff Peters wrote:
Thank you Lukas!!! That's EXACTLY the kind of model I was building in 
my head over the weekend about why this might be happening, and why 
increasing the number of AWS instances (and workers) does not solve 
the problem without increasing each worker's VM. Surely Facebook 
can't be using it like this if, as described, they have billions of 
vertices and a trillion edges. So do you, or Avery, have any idea how 
you might initialize this in a more reasonable way, and how???



On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec 
> wrote:


Hi

I wasted a few days on a similar problem.

I guess the problem was that during loading, if you have W
workers and W^2 partitions, there are W^2 partition caches in each
worker.
Each cache can hold 10 000 vertices by default.
I had 26 000 000 vertices and 60 workers -> 3600 partitions. That
means there can be up to 36 000 000 vertices in the caches of
each worker if the input files are in random order.
Workers were assigned 450 000 vertices each but failed when they had
900 000 vertices in memory.

Btw: why is the default number of partitions W^2?

(I could be wrong)
Lukas



On 08/31/13 01:54, Avery Ching wrote:

Ah, the new caches. =)  These make things a lot faster (bulk
data sending), but do take up some additional memory.  If you
look at GiraphConstants, you can find ways to change the cache
sizes (this will reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE will affect the size of the
edge cache and MAX_MSG_REQUEST_SIZE will affect the size of the
message cache.  The caches are kept per destination worker, so with
100 workers each worker would require 50 MB by default.  Feel free
to trim them if you like.

The byte arrays for the edges are the most efficient storage
possible (although not as performant as the native edge stores).

Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:

Avery, it would seem that optimizations to Giraph have,
unfortunately, turned the majority of the heap into "dark
matter". The two snapshots are at unknown points in a superstep
but I waited for several supersteps so that the activity had
more or less stabilized. About the only thing comparable
between the two snapshots are the vertexes, 192561 X
"RecsVertex" in the new version and 191995 X "Coloring" in the
old system. But with the new Giraph 672710176 out of 824886184
bytes are stored as primitive byte arrays. That's probably
indicative of some very fine performance optimization work, but
it makes it extremely difficult to know what's really out
there, and why. I did notice that a number of caches have
appeared that did not exist before,
namely SendEdgeCache, SendPartitionCache, SendMessageCache
and

Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-04 Thread Claudio Martella
Giraph ships with ZooKeeper 3.3.3. If an existing ZooKeeper is not supplied
through the giraph.zkServerList parameter, Giraph runs its own instance with
its own configuration, listening on port 22181.
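
If a ZooKeeper is already running (for example HBase's on port 2181), one option 
is to point Giraph at it rather than letting Giraph manage its own. A minimal 
sketch, assuming the list parameter appears in GiraphConstants as "giraph.zkList" 
(Claudio refers to it as giraph.zkServerList above, so verify the exact key for 
your version):

import org.apache.giraph.conf.GiraphConfiguration;

public class UseExternalZooKeeper {
  public static void main(String[] args) {
    GiraphConfiguration conf = new GiraphConfiguration();
    // Reuse the existing ZooKeeper instead of starting a Giraph-managed one.
    conf.set("giraph.zkList", "localhost:2181");
  }
}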


On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams  wrote:

> H. Interesting.
>
> Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
>
> The only version of ZooKeeper I have installed is the one that came with
> HBase,
> and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies
> clientPort=2181
> This is the only zoo.cfg file on my machine.
>
>
> [root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
> 
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
> [root@localhost Downloads]#
>
>
>
> --
> From: claudio.marte...@gmail.com
> Date: Wed, 4 Sep 2013 12:13:50 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> That should in principle not be the case, as the zookeeper started by
> Giraph listens on a different port than the default. See
> parameter giraph.zkServerPort, which defaults to 22181.
>
>
> On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams  wrote:
>
> Hi Claudio,
>
> I think I have fixed the problem.
>
>HBase runs with its own copy of ZooKeeper which listens on port 2181.
>So, when I tried to start ZooKeeper for Giraph it also tried to listen
> on port 2181
>and found it was already in use, and then it terminated - which is why
> Giraph failed.
>If I stop the HBase daemons (including its copy of ZooKeeper) then
> Giraph runs fine.
>
>Essentially there is a conflict between running ZooKeeper for Giraph,
> if there is
>already ZooKeeper running for HBase.
>
>I will try the patch and get back to you.
>
>Thanks for all your help,
>
> Ken
>
> --
> From: claudio.marte...@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> try with the attached patch applied to trunk, without the mentioned -D
> giraph.zkManagerDirectory.
>
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:
>
> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail,
> still no output, and still an
> exception in the log files - FileNotFoundException: File
> /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
>   org.apache.giraph.GiraphRunner
>  -Dgiraph.zkManagerDirectory='/tmp/giraph/'
> org.apache.giraph.examples.SimpleShortestPathsVertex  -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/root/input/tiny_graph.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/root/output/shortestpaths -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete:
> job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all
> reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/09/03 14:20:

RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-04 Thread Ken Williams
H. Interesting.
Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
The only version of ZooKeeper I have installed is the one that came with HBase,
and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies clientPort=2181.
This is the only zoo.cfg file on my machine.

[root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
[root@localhost Downloads]#


From: claudio.marte...@gmail.com
Date: Wed, 4 Sep 2013 12:13:50 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

That should in principle not be the case, as the zookeeper started by Giraph 
listens on a different port than the default. See parameter 
giraph.zkServerPort, which defaults to 22181.



On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams  wrote:





Hi Claudio,
I think I have fixed the problem.
   HBase runs with its own copy of ZooKeeper which listens on port 2181.
   So, when I tried to start ZooKeeper for Giraph it also tried to listen on port 2181
   and found it was already in use, and then it terminated - which is why Giraph failed.
   If I stop the HBase daemons (including its copy of ZooKeeper) then Giraph runs fine.

   Essentially there is a conflict between running ZooKeeper for Giraph, if there is
   already ZooKeeper running for HBase.

   I will try the patch and get back to you.

   Thanks for all your help,

Ken


From: claudio.marte...@gmail.com
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.


To: user@giraph.apache.org

try with the attached patch applied to trunk, without the mentioned -D 
giraph.zkManagerDirectory.



On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:





Hi Claudio,
I tried this but it made no difference. The map tasks still fail, still no 
output, and still an exception in the log files - FileNotFoundException: File 
/tmp/giraph/_zkServer does not exist.




[root@localhost giraph]# hadoop jar 
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
   org.apache.giraph.GiraphRunner  -Dgiraph.zkManagerDirectory='/tmp/giraph/'   
  org.apache.giraph.examples.SimpleShortestPathsVertex  -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
/user/root/input/tiny_graph.txt -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
/user/root/output/shortestpaths -w 1 



13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

[root@localhost giraph]#

When I try to run Zookeeper it still gives me an 'Address already in use' 
exception.
[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground



JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zo

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

2013-09-04 Thread Lukas Nalezenec


Thanks,
I was not sure if it really works as I described.

> Facebook can't be using it like this if, as described, they have 
billions of vertices and a trillion edges.


Yes, it's strange. I guess configuration does not help much on a large 
cluster. What might help are the properties of the input data.


> So do you, or Avery, have any idea how you might initialize this in a 
more reasonable way, and how???


A quick workaround is to change the number of partitions from W^2 to W 
or 2*W.  It will help if you don't have a very large number of workers.

I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
You can do some preprocessing before loading data into Giraph.
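
For example, a minimal sketch of that workaround, assuming the option key is 
"giraph.userPartitionCount" (USER_PARTITION_COUNT in GiraphConstants; verify 
against your version):

import org.apache.giraph.conf.GiraphConfiguration;

public class FewerPartitions {
  public static void main(String[] args) {
    int workers = 60;
    GiraphConfiguration conf = new GiraphConfiguration();
    // Ask for W (or 2*W) partitions instead of the default W^2, so each
    // worker holds far fewer partition caches while the input is loading.
    conf.setInt("giraph.userPartitionCount", workers);        // W
    // conf.setInt("giraph.userPartitionCount", 2 * workers); // or 2*W
  }
}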



How to change Giraph:
The caches could be flushed once the total number of vertices/edges in all 
caches exceeds some threshold. Ideally, it should prevent not only OutOfMemory 
errors but also raising the high-water mark. Not sure if it (preventing 
the HWM from rising) is easy to do.
I am going to use almost-prebuilt partitions. For my use case it would 
be ideal to detect that a cache is abandoned and will not be used 
anymore. It would cut memory usage in caches from ~O(n^3) to ~O(n).  It 
could be done by counting cache flushes or cache insertions and flushing 
any cache that has not been touched for a long time.
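
A rough sketch of that idle-cache heuristic, written outside Giraph's real 
classes (Cache below is a hypothetical stand-in for a per-partition cache):

import java.util.HashMap;
import java.util.Map;

public class IdleCacheFlusher {
  /** Hypothetical per-partition cache. */
  interface Cache {
    void flush();
    boolean isEmpty();
  }

  private final Map<Integer, Long> lastTouch = new HashMap<>();
  private long insertions = 0;

  /** Call on every insertion into the cache of the given partition. */
  void onInsert(int partitionId) {
    insertions++;
    lastTouch.put(partitionId, insertions);
  }

  /** Flush caches that have not been touched within the last maxIdle insertions. */
  void flushIdle(Map<Integer, Cache> caches, long maxIdle) {
    for (Map.Entry<Integer, Cache> entry : caches.entrySet()) {
      long last = lastTouch.getOrDefault(entry.getKey(), 0L);
      if (insertions - last > maxIdle && !entry.getValue().isEmpty()) {
        entry.getValue().flush(); // likely abandoned: send its data and free the memory
      }
    }
  }
}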


There could be a separate MAX_*_REQUEST_SIZE configuration for the 
per-partition caches used while loading data.


I guess there should be a simple but efficient way to trace the memory 
high-water mark. It could look like:


Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
Iteration 1 XYZ 
Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
.
.
.
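
For the start/end numbers, a worker could read the JVM's own peak heap usage 
through standard JMX (nothing Giraph-specific); a rough sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class HeapHighWaterMark {
  /** Sum of peak used bytes over all heap pools. Pool peaks may occur at
      different times, so this is only an approximation of the true HWM. */
  static long peakHeapBytes() {
    long peak = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() == MemoryType.HEAP) {
        peak += pool.getPeakUsage().getUsed();
      }
    }
    return peak;
  }

  /** e.g. report("Loading data: start") or report("Iteration 1 Computation: end") */
  static void report(String phase) {
    System.out.printf("%s: memory high-water mark: %d MB%n",
        phase, peakHeapBytes() / (1024L * 1024));
  }

  public static void main(String[] args) {
    report("Loading data: start");
  }
}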

Lukas





On 09/04/13 01:12, Jeff Peters wrote:
Thank you Lukas!!! That's EXACTLY the kind of model I was building in 
my head over the weekend about why this might be happening, and why 
increasing the number of AWS instances (and workers) does not solve 
the problem without increasing each worker's VM. Surely Facebook can't 
be using it like this if, as described, they have billions of vertices 
and a trillion edges. So do you, or Avery, have any idea how you might 
initialize this in a more reasonable way, and how???



On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec 
> wrote:


Hi

I wasted a few days on a similar problem.

I guess the problem was that during loading, if you have W
workers and W^2 partitions, there are W^2 partition caches in each
worker.
Each cache can hold 10 000 vertices by default.
I had 26 000 000 vertices and 60 workers -> 3600 partitions. That
means there can be up to 36 000 000 vertices in the caches of each
worker if the input files are in random order.
Workers were assigned 450 000 vertices each but failed when they had
900 000 vertices in memory.

Btw: why is the default number of partitions W^2?

(I could be wrong)
Lukas



On 08/31/13 01:54, Avery Ching wrote:

Ah, the new caches. =)  These make things a lot faster (bulk data
sending), but do take up some additional memory.  If you look at
GiraphConstants, you can find ways to change the cache sizes
(this will reduce that memory usage).
For example, MAX_EDGE_REQUEST_SIZE will affect the size of the
edge cache and MAX_MSG_REQUEST_SIZE will affect the size of the
message cache.  The caches are kept per destination worker, so with
100 workers each worker would require 50 MB by default.  Feel free
to trim them if you like.

The byte arrays for the edges are the most efficient storage
possible (although not as performant as the native edge stores).

Hope that helps,

Avery

On 8/29/13 4:53 PM, Jeff Peters wrote:

Avery, it would seem that optimizations to Giraph have,
unfortunately, turned the majority of the heap into "dark
matter". The two snapshots are at unknown points in a superstep
but I waited for several supersteps so that the activity had
more or less stabilized. About the only thing comparable between
the two snapshots are the vertexes, 192561 X "RecsVertex" in the
new version and 191995 X "Coloring" in the old system. But with
the new Giraph 672710176 out of 824886184 bytes are stored as
primitive byte arrays. That's probably indicative of some very
fine performance optimization work, but it makes it extremely
difficult to know what's really out there, and why. I did notice
that a number of caches have appeared that did not exist before,
namely SendEdgeCache, SendPartitionCache, SendMessageCache
and SendMutationsCache.

Could any of those account for a larger per-worker footprint in
a modern Giraph? Should I simply assume that I need to force AWS
to configure its EMR Hadoop so that each instance has fewer map
tasks but with a somewhat larger VM max, say 3GB instead of 2GB?


On Wed, Aug 28, 2013 at 4:57 PM, Ave

Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-04 Thread Claudio Martella
That should in principle not be the case, as the zookeeper started by
Giraph listens on a different port than the default. See
parameter giraph.zkServerPort, which defaults to 22181.
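
If that port is already taken on a machine, the Giraph-managed ZooKeeper can be 
moved; a minimal sketch using the parameter named above (22182 is just an 
example value):

import org.apache.giraph.conf.GiraphConfiguration;

public class MoveGiraphZkPort {
  public static void main(String[] args) {
    GiraphConfiguration conf = new GiraphConfiguration();
    // Move Giraph's own ZooKeeper off 22181 if something else is bound to it.
    conf.setInt("giraph.zkServerPort", 22182);
  }
}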


On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams  wrote:

> Hi Claudio,
>
> I think I have fixed the problem.
>
>HBase runs with its own copy of ZooKeeper which listens on port 2181.
>So, when I tried to start ZooKeeper for Giraph it also tried to listen
> on port 2181
>and found it was already in use, and then it terminated - which is why
> Giraph failed.
>If I stop the HBase daemons (including its copy of ZooKeeper) then
> Giraph runs fine.
>
>Essentially there is a conflict between running ZooKeeper for Giraph,
> if there is
>already ZooKeeper running for HBase.
>
>I will try the patch and get back to you.
>
>Thanks for all your help,
>
> Ken
>
> --
> From: claudio.marte...@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> try with the attached patch applied to trunk, without the mentioned -D
> giraph.zkManagerDirectory.
>
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:
>
> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail,
> still no output, and still an
> exception in the log files - FileNotFoundException: File
> /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
>   org.apache.giraph.GiraphRunner
>  -Dgiraph.zkManagerDirectory='/tmp/giraph/'
> org.apache.giraph.examples.SimpleShortestPathsVertex  -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/root/input/tiny_graph.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/root/output/shortestpaths -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete:
> job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all
> reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0
> [root@localhost giraph]#
>
>
> When I try to run Zookeeper it still gives me an 'Address already in use'
> exception.
>
> [root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh
> start-foreground
> JMX enabled by default
> Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] -
> Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] -
> Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] -
> autopurge.snapRetainCount set to 3
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] -
> autopurge.purgeInterval set to 0
> 2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101]
> - Purge task is not scheduled.
> 2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] -
> Either no config or no quorum defined in config, running  in standalone mode
> 2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] -
> Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] -
> Invalid configuration, only one se

RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.

2013-09-04 Thread Ken Williams
Hi Claudio,
I think I have fixed the problem.
   HBase runs with its own copy of ZooKeeper which listens on port 2181.
   So, when I tried to start ZooKeeper for Giraph it also tried to listen on port 2181
   and found it was already in use, and then it terminated - which is why Giraph failed.
   If I stop the HBase daemons (including its copy of ZooKeeper) then Giraph runs fine.

   Essentially there is a conflict between running ZooKeeper for Giraph, if there is
   already ZooKeeper running for HBase.

   I will try the patch and get back to you.

   Thanks for all your help,

Ken
From: claudio.marte...@gmail.com
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File 
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

try with the attached patch applied to trunk, without the mentioned -D 
giraph.zkManagerDirectory.

On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams  wrote:





Hi Claudio,
I tried this but it made no difference. The map tasks still fail, still no 
output, and still an exception in the log files - FileNotFoundException: File 
/tmp/giraph/_zkServer does not exist.


[root@localhost giraph]# hadoop jar 
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
   org.apache.giraph.GiraphRunner  -Dgiraph.zkManagerDirectory='/tmp/giraph/'   
  org.apache.giraph.examples.SimpleShortestPathsVertex  -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
/user/root/input/tiny_graph.txt -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
/user/root/output/shortestpaths -w 1 

13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

[root@localhost giraph]# 

When I try to run Zookeeper it still gives me an 'Address already in use' 
exception.
[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground

JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running  in standalone mode
2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] - Starting server
2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=localhost.localdomain
2013-09-03 14:23:37,921 [myid:] - INFO  [