Re: Giraph offloadPartition fails creation directory
Unfortunately there are some restrictions that mean I don't have the stack traces handy, but pointing me towards the local disk helped me partially resolve it. There were permission issues with that directory, but I was able to get past them by manually creating a separate giraph directory in the MapReduce local storage (mapred/local/giraph) and setting the Giraph options to point to local storage/giraph/partitions and /messages.

Then something strange happens. The job successfully creates exactly 30 directories, and then starts failing again. This happened both times I ran the job: 30 directories are created in the partitions directory, and then it prints something like the following to the task log:

DiskBackedPartitionStore: offloadPartition: Failed to create directory ...

No further directories are created after the initial 30. The job keeps attempting to create more partition directories, but every attempt after the initial 30 fails. It is quite strange.

On 9/12/13, Claudio Martella claudio.marte...@gmail.com wrote:
> Giraph does not offload partitions or messages to HDFS in the out-of-core
> module. It uses local disk on the computing nodes. By default, it uses the
> tasktracker local directory where, for example, the distributed cache is
> stored. Could you provide the stack trace Giraph is spitting out when it
> fails?
>
> On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com wrote:
>> [...]
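For reference, this is roughly how those options can be set programmatically. This is a minimal sketch only: giraph.partitionsDirectory and giraph.messagesDirectory are my assumption of the option names from this Giraph era (verify against GiraphConstants in your build), and the paths are placeholders for the manually created local directories.

    import org.apache.giraph.conf.GiraphConfiguration;

    public class OocLocalDirs {
        public static void main(String[] args) {
            GiraphConfiguration conf = new GiraphConfiguration();
            // Enable out-of-core storage for partitions and messages.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            conf.setBoolean("giraph.useOutOfCoreMessages", true);
            // Point both stores at the manually created, writable local
            // directories (assumed option names; placeholder paths).
            conf.set("giraph.partitionsDirectory", "/mapred/local/giraph/partitions");
            conf.set("giraph.messagesDirectory", "/mapred/local/giraph/messages");
        }
    }

The same key=value pairs can also be passed on the GiraphRunner command line through its -ca flags.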
Re: Giraph offloadPartition fails creation directory
Actually, I take that back. It seems it does succeed in creating partitions - it just struggles with it sometimes. Should I be worried about these errors if the partition directories seem to be filling up?

On Sep 11, 2013 6:38 PM, Claudio Martella claudio.marte...@gmail.com wrote:
> [...]
Re: Giraph offloadPartition fails creation directory
Actually, why is it saying it fails to create a directory in the first place, when it is trying to write files?

On Sep 12, 2013 3:04 PM, Alexander Asplund alexaspl...@gmail.com wrote:
> I can also add that there is no such issue with DiskBackedMessageStore. It
> successfully creates a large number of store files, and never starts
> failing.
>
> On Sep 12, 2013 2:11 PM, Alexander Asplund alexaspl...@gmail.com wrote:
>> It's very strange.. it is definitely failing on some partitions.
>> Currently the disk usage of an offloading worker corresponds roughly to
>> the size of its part of the graph, but the worker attempts to create
>> additional partitions, and this fails.
>>
>> On Sep 12, 2013 2:07 PM, Alexander Asplund alexaspl...@gmail.com wrote:
>>> [...]
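One possible (unconfirmed) explanation for a "failed to create directory" message while writes keep succeeding is the return value of java.io.File.mkdirs(), which is false both on genuine failure and when the directory already exists. A minimal sketch of the pitfall, with a hypothetical partition path:

    import java.io.File;

    public class MkdirsPitfall {
        static void createPartitionDir(String path) {
            File dir = new File(path);
            if (!dir.mkdirs()) {
                // Naive check: this logs a failure even when the directory
                // already exists and is perfectly usable.
                System.err.println("Failed to create directory " + path);
            }
            if (dir.isDirectory()) {
                // Robust check: the directory is there, so writes succeed.
                System.out.println(path + " is usable");
            }
        }

        public static void main(String[] args) {
            String dir = "/tmp/giraph-demo/part-vertices-0"; // hypothetical
            createPartitionDir(dir); // creates it
            createPartitionDir(dir); // logs "Failed" although it exists
        }
    }

If the store calls mkdirs() on every offload and only checks the boolean, you would see exactly this pattern: error messages in the log, yet the partition files still land on disk.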
Giraph offloadPartition fails creation directory
Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails: the DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall).

The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with an OOM. Has anybody had this problem with Giraph failing to make HDFS directories?
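For context, a minimal sketch of the out-of-core settings in play here (both option names appear elsewhere in these threads; the value 1 is the most aggressive spilling, not a recommendation):

    import org.apache.giraph.conf.GiraphConfiguration;

    public class EnableOutOfCore {
        public static void main(String[] args) {
            GiraphConfiguration conf = new GiraphConfiguration();
            // Spill partitions to disk instead of holding the whole graph on-heap.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            // Upper bound on how many partitions are held in memory at once.
            conf.setInt("giraph.maxPartitionsInMemory", 1);
        }
    }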
Re: Out of core execution has no effect on GC crash
Thanks, disabling the GC overhead limit did the trick! I did however run into another issue - the computation ends up stalling when it tries to write partitions to disk. All the workers keep sending out messages that DiskBackedPartitionStore failed to create the directory _bsp/_partitions/_jobx/part-vertices-xxx

On 9/10/13, Claudio Martella claudio.marte...@gmail.com wrote:
> As David mentions, even with OOC the objects are still created (and yes,
> often destroyed soon after being spilled to disk), putting pressure on the
> GC. Moreover, as the graph grows in size, the in-memory vertices are not
> the only growing chunk of memory, as there are other memory stores around
> the codebase that get filled, such as caches. Try increasing the heap to
> something reasonable for your machines.
>
> On Tue, Sep 10, 2013 at 3:21 AM, David Boyd db...@data-tactics-corp.com wrote:
>> Alexander: You might try turning off the GC overhead limit
>> (-XX:-UseGCOverheadLimit). Also, you could turn on verbose GC logging
>> (-verbose:gc -Xloggc:/tmp/@taskid@.gc) to see what is happening. Because
>> the OOC still has to create and destroy objects, I suspect that the heap
>> is just getting really fragmented. There are options you can set with
>> Java to change the type of garbage collection and how it is scheduled as
>> well. You might up the heap size slightly - what is the default heap size
>> on your cluster?
>>
>> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>>> [...]
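To apply David's flags to the Giraph mapper JVMs, they can go into mapred.child.java.opts (the MRv1 property; the @taskid@ placeholder is expanded by Hadoop per task). A minimal sketch:

    import org.apache.hadoop.conf.Configuration;

    public class MapperGcFlags {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Disable the GC overhead limit and log GC activity per task.
            conf.set("mapred.child.java.opts",
                "-XX:-UseGCOverheadLimit -verbose:gc -Xloggc:/tmp/@taskid@.gc");
        }
    }

Note that setting this property replaces the cluster's configured value (including any -Xmx), so the existing heap flag should be carried over alongside the GC flags.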
Re: Out of core execution has no effect on GC crash
Correction: the computation does not actually stall - it just complains a bit that the directories cannot be created and then eventually moves on to the next superstep. I guess this means I'm actually fitting all the data in memory?

On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:
> [...]
Re: Out of core execution has no effect on GC crash
Thanks for the reply. I tried setting giraph.maxPartitionsInMemory to 1, but I'm still getting OOM: GC limit exceeded. Are there any particular cases the OOC will not be able to handle, or is it supposed to work in all cases? If the latter, it might be that I have made some configuration error.

I do have one concern that might indicate I have done something wrong: to allow OOC to activate without crashing, I had to modify the trunk code. This was because Giraph relied on guava-12 and DiskBackedPartitionStore used hashInt() - a method which does not exist in guava-11, which Hadoop 2 depends on - and at runtime guava-11 was being used. I suppose this problem might indicate I'm submitting the job using the wrong binary. Currently I am including the Giraph dependencies with the jar, and running using hadoop jar.

On 9/7/13, Claudio Martella claudio.marte...@gmail.com wrote:
> OOC is used also at the input superstep. Try to decrease the number of
> partitions kept in memory.
>
> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund alexaspl...@gmail.com wrote:
>> Hi, I'm trying to process a graph that is about 3 times the size of
>> available memory. On the other hand, there is plenty of disk space. I
>> have enabled the giraph.useOutOfCoreGraph property, but it still crashes
>> with OutOfMemoryError: GC limit exceeded when I try running my job. I'm
>> wondering if the spilling is supposed to work during the input step. If
>> so, are there any additional steps that must be taken to ensure it
>> functions? Regards, Alexander Asplund
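One quick way to check which Guava actually wins on the task classpath is a reflection probe for the Guava 12 method in question; a diagnostic sketch, not part of Giraph:

    import com.google.common.hash.HashFunction;

    public class GuavaProbe {
        public static void main(String[] args) {
            // Where the runtime Guava classes were loaded from.
            System.out.println(HashFunction.class.getProtectionDomain()
                .getCodeSource().getLocation());
            try {
                // HashFunction.hashInt(int) exists from Guava 12 onwards.
                HashFunction.class.getMethod("hashInt", int.class);
                System.out.println("Guava >= 12 on the classpath");
            } catch (NoSuchMethodException e) {
                System.out.println("Guava 11 or older: hashInt(int) is missing");
            }
        }
    }

Running this with the same classpath as hadoop jar would show whether the Hadoop-provided guava-11 is shadowing the guava-12 bundled with the job.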
Re: Out of core execution has no effect on GC crash
Really appreciate the swift responses! Thanks again. I have not tried increasing the mapper heap and decreasing the max number of partitions at the same time. I first did tests with increased mapper heap, but reset the setting after it apparently caused other large-volume, non-Giraph jobs to crash nodes when reducers were also running.

I'm curious why increasing the mapper heap is a requirement. Shouldn't the OOC mode be able to work with the amount of heap that is available? Is there some agreement on the minimum amount of heap necessary for OOC to succeed, to guide the choice of mapper heap size? Either way, I will try increasing the mapper heap again as much as possible, which hopefully will run.

On 9/9/13, Claudio Martella claudio.marte...@gmail.com wrote:
> Did you extend the heap available to the mapper tasks? E.g. through
> mapred.child.java.opts.
>
> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund alexaspl...@gmail.com wrote:
>> [...]
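For the heap itself, the same property carries the -Xmx flag; a sketch with a placeholder size, to be budgeted against what a node can spare while reducers are also running:

    import org.apache.hadoop.conf.Configuration;

    public class MapperHeap {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Placeholder heap size; combine with the GC flags suggested
            // earlier in this thread if desired.
            conf.set("mapred.child.java.opts", "-Xmx4g");
        }
    }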
Re: Out of core execution has no effect on GC crash
A small note: I'm not seeing any partitions directory being formed under _bsp, which is where I have understood they should be appearing.

On 9/10/13, Alexander Asplund alexaspl...@gmail.com wrote:
> [...]