Re: Giraph offloadPartition fails creation directory

2013-09-12 Thread Alexander Asplund
Unfortunately there are some restrictions that mean I don't really have
them handy, but pointing me towards the local disk helped me partially
resolve it. There are permission issues with that directory, but I was
able to get around them by manually creating a separate giraph directory
in the MapReduce local storage (mapred/local/giraph) and setting the
Giraph options to point the partition and message stores to
giraph/partitions and giraph/messages under that local storage.
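
For reference, here is a minimal sketch of how those directories can be
pointed at local disk programmatically. The option names
giraph.partitionsDirectory and giraph.messagesDirectory and the exact paths
are my assumptions for the Giraph build I'm on, so verify them against your
version:

    import org.apache.giraph.conf.GiraphConfiguration;

    public class OutOfCoreDirs {
      public static void main(String[] args) {
        GiraphConfiguration conf = new GiraphConfiguration();
        // Spill out-of-core partitions and messages to the tasktracker's
        // local disk instead of the default location (paths are illustrative).
        conf.set("giraph.partitionsDirectory", "/mapred/local/giraph/partitions");
        conf.set("giraph.messagesDirectory", "/mapred/local/giraph/messages");
      }
    }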

Then something strange happens. The job successfully creates exactly 30
directories, and then starts failing again. This happened both times I
ran the job: 30 directories are created in the partitions directory, and
then it subsequently prints something like the following to the task log:

DiskBackedPartitionStorage: offloadPartition: Failed to create directory ...

...and then no further directories are created after those 30. It will
attempt to create more partition directories, but it keeps failing after
the initial 30. It is quite strange.

On 9/12/13, Claudio Martella claudio.marte...@gmail.com wrote:
 Giraph does not offload partitions or messages to HDFS in the out-of-core
 module. It uses local disk on the computing nodes. By default, it uses the
 tasktracker local directory where, for example, the distributed cache is
 stored.

 Could you provide the stacktrace Giraph is spitting when failing?


 On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

 Hi,

 I'm still trying to get Giraph to work on a graph that requires more
 memory than is available. The problem is that when the Workers try to
 offload partitions, the offloading fails. The DiskBackedPartitionStore
 fails to create the directory
 _bsp/_partitions/job-/part-vertices-xxx (roughly from recall).

 The input or computation will then continue for a while, which I
 believe is because it is still managing to hold everything in memory -
 but at some point it reaches the limit where there simply is no more
 heap space, and it crashes with OOM.

 Has anybody had this problem with giraph failing to make HDFS
 directories?




 --
Claudio Martella
claudio.marte...@gmail.com



-- 
Alexander Asplund


Re: Giraph offloadPartition fails creation directory

2013-09-12 Thread Alexander Asplund
Actually, I take that back. It seems it does succeed in creating
partitions - it just struggles with it sometimes. Should I be worried about
these errors if the partition directories seem to be filling up?
On Sep 11, 2013 6:38 PM, Claudio Martella claudio.marte...@gmail.com
wrote:

 Giraph does not offload partitions or messages to HDFS in the out-of-core
 module. It uses local disk on the computing nodes. By default, it uses the
 tasktracker local directory where, for example, the distributed cache is
 stored.

 Could you provide the stacktrace Giraph is spitting when failing?


 On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com
  wrote:

 Hi,

 I'm still trying to get Giraph to work on a graph that requires more
 memory than is available. The problem is that when the Workers try to
 offload partitions, the offloading fails. The DiskBackedPartitionStore
 fails to create the directory
 _bsp/_partitions/job-/part-vertices-xxx (roughly from recall).

 The input or computation will then continue for a while, which I
 believe is because it is still managing to hold everything in memory -
 but at some point it reaches the limit where there simply is no more
 heap space, and it crashes with OOM.

 Has anybody had this problem with giraph failing to make HDFS directories?




 --
Claudio Martella
claudio.marte...@gmail.com



Re: Giraph offloadPartition fails creation directory

2013-09-12 Thread Alexander Asplund
Actually, why is it saying that it fails to create a directory in the first
place, when it is trying to write files?
On Sep 12, 2013 3:04 PM, Alexander Asplund alexaspl...@gmail.com wrote:

 I can also add that there is no such issue with DiskBackedMessageStore. It
 successfully creates a large number of store files, and never starts
 failing.
 On Sep 12, 2013 2:11 PM, Alexander Asplund alexaspl...@gmail.com
 wrote:

 It's very strange... it is definitely failing on some partitions.
 Currently the disk usage of an offloading worker corresponds roughly to
 the size of its part of the graph, but the worker attempts to create
 additional partitions, and this fails.
 On Sep 12, 2013 2:07 PM, Alexander Asplund alexaspl...@gmail.com
 wrote:

 Actually, I take that back. It seems it does succeed in creating
 partitions - it just struggles with it sometimes. Should I be worried about
 these errors if the partition directories seem to be filling up?
 On Sep 11, 2013 6:38 PM, Claudio Martella claudio.marte...@gmail.com
 wrote:

 Giraph does not offload partitions or messages to HDFS in the
 out-of-core module. It uses local disk on the computing nodes. By default,
 it uses the tasktracker local directory where, for example, the distributed
 cache is stored.

 Could you provide the stacktrace Giraph is spitting when failing?


 On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund 
 alexaspl...@gmail.com wrote:

 Hi,

 I'm still trying to get Giraph to work on a graph that requires more
 memory than is available. The problem is that when the Workers try to
 offload partitions, the offloading fails. The DiskBackedPartitionStore
 fails to create the directory
 _bsp/_partitions/job-/part-vertices-xxx (roughly from recall).

 The input or computation will then continue for a while, which I
 believe is because it is still managing to hold everything in memory -
 but at some point it reaches the limit where there simply is no more
 heap space, and it crashes with OOM.

 Has anybody had this problem with giraph failing to make HDFS
 directories?




 --
Claudio Martella
claudio.marte...@gmail.com




Giraph offloadPartition fails creation directory

2013-09-11 Thread Alexander Asplund
Hi,

I'm still trying to get Giraph to work on a graph that requires more
memory than is available. The problem is that when the Workers try to
offload partitions, the offloading fails. The DiskBackedPartitionStore
fails to create the directory
_bsp/_partitions/job-/part-vertices-xxx (roughly from recall).

The input or computation will then continue for a while, which I
believe is because it is still managing to hold everything in memory -
but at some point it reaches the limit where there simply is no more
heap space, and it crashes with OOM.

Has anybody had this problem with giraph failing to make HDFS directories?


Re: Out of core execution has no effect on GC crash

2013-09-10 Thread Alexander Asplund
Thanks, disabling GC overhead limit did the trick!
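
For anyone hitting the same limit, a rough sketch of how the flags David
suggested can be passed to the mapper JVMs via mapred.child.java.opts (the
heap size and GC log path here are just illustrative, not what I actually
used):

    import org.apache.hadoop.conf.Configuration;

    public class MapperGcOpts {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Disable the GC overhead limit and log GC activity for the mapper
        // JVMs; Hadoop replaces @taskid@ with the task attempt id.
        conf.set("mapred.child.java.opts",
            "-Xmx2g -XX:-UseGCOverheadLimit -verbose:gc -Xloggc:/tmp/@taskid@.gc");
      }
    }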

I did, however, run into another issue - the computation ends up
stalling when it tries to write partitions to disk. All the workers
keep logging messages saying that DiskBackedPartitionStore failed to
create the directory _bsp/_partitions/_jobx/part-vertices-xxx

On 9/10/13, Claudio Martella claudio.marte...@gmail.com wrote:
 As David mentions, even with OOC the objects are still created (and yes,
 often destroyed soon after being spilled to disk), putting pressure on the
 GC. Moreover, as the graph grows, the number of in-memory vertices is not
 the only increasing chunk of memory; other memory stores around the
 codebase get filled as well, such as caches.

 Try increasing the heap to something reasonable for your machines.


 On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
 db...@data-tactics-corp.com wrote:

 Alexander:
 You might try turning off the GC overhead limit (-XX:-UseGCOverheadLimit).
 Also, you could turn on verbose GC logging (-verbose:gc -Xloggc:/tmp/@taskid@.gc)
 to see what is happening.
 Because the OOC still has to create and destroy objects, I suspect that
 the heap is just getting really fragmented.

 There are also options that you can set with Java to change the type of
 garbage collection and how it is scheduled.

 You might up the heap size slightly - what is the default heap size on
 your cluster?


 On 9/9/2013 8:33 PM, Alexander Asplund wrote:

 A small note: I'm not seeing any partitions directory being formed
 under _bsp, which is where I have understood that they should be
 appearing.

 On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:

 Really appreciate the swift responses! Thanks again.

 I have not tried both increasing the mapper task heap and decreasing the
 max number of partitions at the same time. I first did tests with
 increased mapper heap, but reset the setting after it apparently caused
 other large-volume, non-Giraph jobs to crash nodes when reducers were
 also running.

 I'm curious why increasing mapper heap is a requirement. Shouldn't the
 OOC mode be able to work with the amount of heap that is available? Is
 there some agreement on the minimum amount of heap necessary for OOC
 to succeed, to guide the choice of Mapper heap amount?

 Either way, I will try increasing the mapper heap again as much as
 possible, and hopefully the job will run.

 On 9/9/13, Claudio Martella claudio.marte...@gmail.com wrote:

 did you extend the heap available to the mapper tasks? e.g. through
 mapred.child.java.opts.


 On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

  Thanks for the reply.

 I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
 getting OOM: GC limit exceeded.

 Are there any particular cases the OOC will not be able to handle, or
 is it supposed to work in all cases? If the latter, it might be that I
 have made some configuration error.

 I do have one concern that might indicate I have done something wrong:
 to allow OOC to activate without crashing, I had to modify the trunk
 code. This was because Giraph relied on guava-12 and
 DiskBackedPartitionStore used hashInt() - a method which does not exist
 in guava-11, which hadoop 2 depends on. At runtime, guava 11 was being
 used.

 I suppose this problem might indicate I'm submitting the job
 using the wrong binary. Currently I am including the Giraph
 dependencies in the jar and running it using hadoop jar.

 On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:

 OOC is also used during the input superstep. Try decreasing the number of
 partitions kept in memory.


 On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

  Hi,

 I'm trying to process a graph that is about 3 times the size of
 available memory. On the other hand, there is plenty of disk space. I
 have enabled the giraph.useOutOfCoreGraph property, but it still
 crashes with OutOfMemoryError: GC limit exceeded when I try running my
 job.

 I'm wondering if the spilling is supposed to work during the input
 step. If so, are there any additional steps that must be taken to
 ensure it functions?

 Regards,
 Alexander Asplund



 --
 Claudio Martella
 claudio.marte...@gmail.com


 --
 Alexander Asplund



 --
 Claudio Martella
 claudio.marte...@gmail.com


 --
 Alexander Asplund




 --
 = mailto:db...@data-tactics.com 
 David W. Boyd
 Director, Engineering
 7901 Jones Branch, Suite 700
 Mclean, VA 22102
 office:   +1-571-279-2122
 fax: +1-703-506-6703
 cell: +1-703-402-7908
 ==
 http://www.data-tactics.com/
 First Robotic Mentor - FRC, FTC - www.iliterobotics.org
 President - USSTEM Foundation - www.usstem.org


Re: Out of core execution has no effect on GC crash

2013-09-10 Thread Alexander Asplund
Correction: the computation does not actually stall - it does
complain a bit that the directories cannot be created and then
eventually moves on to the next superstep. I guess this means I'm
actually fitting all the data in memory?

On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:
 Thanks, disabling GC overhead limit did the trick!

 I did, however, run into another issue - the computation ends up
 stalling when it tries to write partitions to disk. All the workers
 keep logging messages saying that DiskBackedPartitionStore failed to
 create the directory _bsp/_partitions/_jobx/part-vertices-xxx

 On 9/10/13, Claudio Martella claudio.marte...@gmail.com wrote:
 As David mentions, even with OOC the objects are still created (and yes,
 often destroyed soon after being spilled to disk), putting pressure on the
 GC. Moreover, as the graph grows, the number of in-memory vertices is not
 the only increasing chunk of memory; other memory stores around the
 codebase get filled as well, such as caches.

 Try increasing the heap to something reasonable for your machines.


 On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
 db...@data-tactics-corp.com wrote:

 Alexander:
 You might try turning off the GC overhead limit (-XX:-UseGCOverheadLimit).
 Also, you could turn on verbose GC logging (-verbose:gc -Xloggc:/tmp/@taskid@.gc)
 to see what is happening.
 Because the OOC still has to create and destroy objects, I suspect that
 the heap is just getting really fragmented.

 There are also options that you can set with Java to change the type of
 garbage collection and how it is scheduled.

 You might up the heap size slightly - what is the default heap size on
 your cluster?


 On 9/9/2013 8:33 PM, Alexander Asplund wrote:

 A small note: I'm not seeing any partitions directory being formed
 under _bsp, which is where I have understood that they should be
 appearing.

 On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:

 Really appreciate the swift responses! Thanks again.

 I have not tried both increasing the mapper task heap and decreasing the
 max number of partitions at the same time. I first did tests with
 increased mapper heap, but reset the setting after it apparently caused
 other large-volume, non-Giraph jobs to crash nodes when reducers were
 also running.

 I'm curious why increasing mapper heap is a requirement. Shouldn't the
 OOC mode be able to work with the amount of heap that is available? Is
 there some agreement on the minimum amount of heap necessary for OOC
 to succeed, to guide the choice of Mapper heap amount?

 Either way, I will try increasing the mapper heap again as much as
 possible, and hopefully the job will run.

 On 9/9/13, Claudio Martella claudio.marte...@gmail.com wrote:

 did you extend the heap available to the mapper tasks? e.g. through
 mapred.child.java.opts.


 On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

  Thanks for the reply.

 I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
 getting OOM: GC limit exceeded.

 Are there any particular cases the OOC will not be able to handle, or
 is it supposed to work in all cases? If the latter, it might be that I
 have made some configuration error.

 I do have one concern that might indicate I have done something wrong:
 to allow OOC to activate without crashing, I had to modify the trunk
 code. This was because Giraph relied on guava-12 and
 DiskBackedPartitionStore used hashInt() - a method which does not exist
 in guava-11, which hadoop 2 depends on. At runtime, guava 11 was being
 used.

 I suppose this problem might indicate I'm submitting the job
 using the wrong binary. Currently I am including the Giraph
 dependencies in the jar and running it using hadoop jar.

 On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:

 OOC is also used during the input superstep. Try decreasing the number of
 partitions kept in memory.


 On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

  Hi,

 I'm trying to process a graph that is about 3 times the size of
 available memory. On the other hand, there is plenty of disk space. I
 have enabled the giraph.useOutOfCoreGraph property, but it still
 crashes with OutOfMemoryError: GC limit exceeded when I try running my
 job.

 I'm wondering if the spilling is supposed to work during the input
 step. If so, are there any additional steps that must be taken to
 ensure it functions?

 Regards,
 Alexander Asplund



 --
 Claudio Martella
 claudio.marte...@gmail.com


 --
 Alexander Asplund



 --
 Claudio Martella
 claudio.marte...@gmail.com


 --
 Alexander Asplund




 --
 = mailto:db...@data-tactics.com 
 David W. Boyd
 Director, Engineering
 7901 Jones Branch, Suite 700
 Mclean, VA 22102
 office:   +1-571-279-2122
 fax: +1-703-506-6703
 cell: +1-703-402-7908
 ==
 http://www.data-tactics.com/

Re: Out of core execution has no effect on GC crash

2013-09-09 Thread Alexander Asplund
Thanks for the reply.

I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
getting OOM: GC limit exceeded.
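
For context, a minimal sketch of how these options can be set on the
GiraphConfiguration (option names as I understand them from the docs; adapt
this to however you submit your job):

    import org.apache.giraph.conf.GiraphConfiguration;

    public class OocSettings {
      public static void main(String[] args) {
        GiraphConfiguration conf = new GiraphConfiguration();
        // Enable the out-of-core graph and keep only one partition in memory.
        conf.setBoolean("giraph.useOutOfCoreGraph", true);
        conf.setInt("giraph.maxPartitionsInMemory", 1);
      }
    }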

Are there any particular cases the OOC will not be able to handle, or
is it supposed to work in all cases? If the latter, it might be that I
have made some configuration error.

I do have one concern that might indicate I have done something wrong:
to allow OOC to activate without crashing, I had to modify the trunk
code. This was because Giraph relied on guava-12 and
DiskBackedPartitionStore used hashInt() - a method which does not exist
in guava-11, which hadoop 2 depends on. At runtime, guava 11 was being
used.
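
(For anyone else running into this: the incompatibility is the kind of
Guava-12-only call sketched below. The specific hash function here is just
an illustration of the API difference, not necessarily the exact call
Giraph makes.)

    import com.google.common.hash.Hashing;

    public class GuavaHashIntCheck {
      public static void main(String[] args) {
        // HashFunction.hashInt(int) was added in Guava 12; if Guava 11 wins
        // on the runtime classpath, this line fails with NoSuchMethodError.
        System.out.println(Hashing.murmur3_32().hashInt(42));
      }
    }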

I suppose this problem might indicate I'm submitting the job
using the wrong binary. Currently I am including the Giraph
dependencies in the jar and running it using hadoop jar.

On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:
 OOC is also used during the input superstep. Try decreasing the number of
 partitions kept in memory.


 On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

 Hi,

 I'm trying to process a graph that is about 3 times the size of
 available memory. On the other hand, there is plenty of disk space. I
 have enabled the giraph.useOutOfCoreGraph property, but it still
 crashes with OutOfMemoryError: GC limit exceeded when I try running my
 job.

 I'm wondering if the spilling is supposed to work during the input
 step. If so, are there any additional steps that must be taken to
 ensure it functions?

 Regards,
 Alexander Asplund




 --
Claudio Martella
claudio.marte...@gmail.com



-- 
Alexander Asplund


Re: Out of core execution has no effect on GC crash

2013-09-09 Thread Alexander Asplund
Really appreciate the swift responses! Thanks again.

I have not tried both increasing the mapper task heap and decreasing the
max number of partitions at the same time. I first did tests with
increased mapper heap, but reset the setting after it apparently caused
other large-volume, non-Giraph jobs to crash nodes when reducers were
also running.

I'm curious why increasing mapper heap is a requirement. Shouldn't the
OOC mode be able to work with the amount of heap that is available? Is
there some agreement on the minimum amount of heap necessary for OOC
to succeed, to guide the choice of Mapper heap amount?

Either way, I will try increasing the mapper heap again as much as
possible, and hopefully the job will run.

On 9/9/13, Claudio Martella claudio.marte...@gmail.com wrote:
 did you extend the heap available to the mapper tasks? e.g. through
 mapred.child.java.opts.


 On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

 Thanks for the reply.

 I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
 getting OOM: GC limit exceeded.

 Are there any particular cases the OOC will not be able to handle, or
 is it supposed to work in all cases? If the latter, it might be that I
 have made some configuration error.

 I do have one concern that might indicate I have done something wrong:
 to allow OOC to activate without crashing, I had to modify the trunk
 code. This was because Giraph relied on guava-12 and
 DiskBackedPartitionStore used hashInt() - a method which does not exist
 in guava-11, which hadoop 2 depends on. At runtime, guava 11 was being
 used.

 I suppose this problem might indicate I'm submitting the job
 using the wrong binary. Currently I am including the Giraph
 dependencies in the jar and running it using hadoop jar.

 On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:
  OOC is also used during the input superstep. Try decreasing the number of
  partitions kept in memory.
 
 
  On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
  alexaspl...@gmail.com wrote:
 
  Hi,
 
  I'm trying to process a graph that is about 3 times the size of
  available memory. On the other hand, there is plenty of disk space. I
  have enabled the giraph.useOutOfCoreGraph property, but it still
  crashes with OutOfMemoryError: GC limit exceeded when I try running my
  job.
 
  I'm wondering if the spilling is supposed to work during the input
  step. If so, are there any additional steps that must be taken to
  ensure it functions?
 
  Regards,
  Alexander Asplund
 
 
 
 
  --
 Claudio Martella
 claudio.marte...@gmail.com
 


 --
 Alexander Asplund




 --
Claudio Martella
claudio.marte...@gmail.com



-- 
Alexander Asplund


Re: Out of core execution has no effect on GC crash

2013-09-09 Thread Alexander Asplund
A small note: I'm not seeing any partitions directory being formed
under _bsp, which is where I have understood that they should be
appearing.

On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:
 Really appreciate the swift responses! Thanks again.

 I have not tried both increasing the mapper task heap and decreasing the
 max number of partitions at the same time. I first did tests with
 increased mapper heap, but reset the setting after it apparently caused
 other large-volume, non-Giraph jobs to crash nodes when reducers were
 also running.

 I'm curious why increasing mapper heap is a requirement. Shouldn't the
 OOC mode be able to work with the amount of heap that is available? Is
 there some agreement on the minimum amount of heap necessary for OOC
 to succeed, to guide the choice of Mapper heap amount?

 Either way, I will try increasing the mapper heap again as much as
 possible, and hopefully the job will run.

 On 9/9/13, Claudio Martella claudio.marte...@gmail.com wrote:
 did you extend the heap available to the mapper tasks? e.g. through
 mapred.child.java.opts.


 On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
 alexaspl...@gmail.com wrote:

 Thanks for the reply.

 I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
 getting OOM: GC limit exceeded.

 Are there any particular cases the OOC will not be able to handle, or
 is it supposed to work in all cases? If the latter, it might be that I
 have made some configuration error.

 I do have one concern that might indicate I have done something wrong:
 to allow OOC to activate without crashing, I had to modify the trunk
 code. This was because Giraph relied on guava-12 and
 DiskBackedPartitionStore used hashInt() - a method which does not exist
 in guava-11, which hadoop 2 depends on. At runtime, guava 11 was being
 used.

 I suppose this problem might indicate I'm submitting the job
 using the wrong binary. Currently I am including the Giraph
 dependencies in the jar and running it using hadoop jar.

 On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:
  OOC is also used during the input superstep. Try decreasing the number of
  partitions kept in memory.
 
 
  On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
  alexaspl...@gmail.com wrote:
 
  Hi,
 
  I'm trying to process a graph that is about 3 times the size of
  available memory. On the other hand, there is plenty of disk space. I
  have enabled the giraph.useOutOfCoreGraph property, but it still
  crashes with OutOfMemoryError: GC limit exceeded when I try running my
  job.
 
  I'm wondering if the spilling is supposed to work during the input
  step. If so, are there any additional steps that must be taken to
  ensure it functions?
 
  Regards,
  Alexander Asplund
 
 
 
 
  --
 Claudio Martella
 claudio.marte...@gmail.com
 


 --
 Alexander Asplund




 --
Claudio Martella
claudio.marte...@gmail.com



 --
 Alexander Asplund



-- 
Alexander Asplund