Re: Giraph offloadPartition fails creation directory
Hi all, I'm seeing the same problem. I'm pasting here the part of the logs that looks most relevant, in case it helps. This appears in the log of every hadoop slave node:

2013-09-23 12:34:29,908 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 1
2013-09-23 12:34:29,908 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 8
2013-09-23 12:34:29,917 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20, overall roughly 0.0% input splits reserved
2013-09-23 12:34:29,919 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/some path/part-r-2:402653184+14270392'
2013-09-23 12:34:29,935 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2013-09-23 12:34:29,935 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2013-09-23 12:34:31,491 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20 (v=9209, e=782750)
2013-09-23 12:34:31,496 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131, overall roughly 0.71428573% input splits reserved
2013-09-23 12:34:31,497 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/home/some path/part-r-00018:335544320+67108864'
2013-09-23 12:34:35,374 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131 (v=44211, e=3680393)
2013-09-23 12:34:35,378 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113, overall roughly 1.4285715% input splits reserved
2013-09-23 12:34:35,378 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/home/some path/part-r-00016:67108864+67108864'
2013-09-23 12:34:38,161 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-365_vertices
2013-09-23 12:34:38,171 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-245_vertices
2013-09-23 12:34:38,181 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-25_vertices
2013-09-23 12:34:38,190 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-85_vertices
2013-09-23 12:34:38,205 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-345_vertices
2013-09-23 12:34:38,216 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-285_vertices
2013-09-23 12:34:38,228 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-205_vertices
2013-09-23 12:34:38,240 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-265_vertices
2013-09-23 12:34:38,255 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-5_vertices
2013-09-23 12:34:38,834 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113 (v=43776, e=3684432)
2013-09-23 12:34:38,838 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/1, overall roughly 2.857143% input splits reserved

On Fri, Sep 13, 2013 at 11:27 AM, Claudio Martella claudio.marte...@gmail.com wrote:
I have no idea without the logs, especially when it happens rarely.
On Fri, Sep 13, 2013 at 12:33 AM, Alexander Asplund alexaspl...@gmail.com wrote:
Actually, why is it saying it fails to create directory in the first place, when it is trying to write files? On Sep 12, 2013 3:04
Re: Giraph offloadPartition fails creation directory
Weird. This is the code:

    if (!parent.exists()) {
      if (!parent.mkdirs()) {
        LOG.error("offloadPartition: Failed to create directory " +
            parent.getAbsolutePath());
      }
    }

The question is why parent.mkdirs() is returning false. It could be a permissions problem. Could you try to pass a different directory for writing, e.g. /tmp/foobar? On Mon, Sep 23, 2013 at 1:28 PM, Dionysis Logothetis dlogothe...@gmail.com wrote: offloadPartition: Failed to create directory -- Claudio Martella claudio.marte...@gmail.com
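[Editor's note] The snippet above only reports that mkdirs() returned false, not why. A minimal, Giraph-free sketch (plain java.io, hypothetical class and path names) of how one could distinguish the usual causes: a plain file already occupying the path, a non-writable ancestor directory, or a benign race where another thread created the directory first:

```java
import java.io.File;

public class MkdirsDiag {
    // Attempt to create dir, and explain a false return from mkdirs().
    static String diagnose(File dir) {
        if (dir.mkdirs() || dir.isDirectory()) {
            return "ok"; // created, or another thread created it concurrently
        }
        if (dir.exists()) {
            return "exists as a file"; // a plain file occupies the path
        }
        // Walk up to the deepest existing ancestor and check writability.
        File p = dir.getParentFile();
        while (p != null && !p.exists()) {
            p = p.getParentFile();
        }
        if (p != null && !p.canWrite()) {
            return "no write permission on " + p.getPath();
        }
        return "unknown (disk full? invalid path?)";
    }

    public static void main(String[] args) {
        // Illustrative target under the JVM temp dir, not a real Giraph path.
        File target = new File(System.getProperty("java.io.tmpdir"),
                "giraph-diag/partition-5_vertices");
        System.out.println(diagnose(target));
    }
}
```

Running something like this as the tasktracker user, against the same _bsp/_partitions path the job uses, should show whether the failure is a permissions issue or a path collision.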
Re: Giraph offloadPartition fails creation directory
Unfortunately there are some restrictions that mean I don't have them handy, BUT pointing me towards the local disk helped me partially resolve it. There are rights issues with this directory, but I was able to get around it by manually creating a separate giraph directory in the mapreduce local storage (mapred/local/giraph) and setting the Giraph options to point to local storage/giraph/partitions and /messages. Then something strange happens. The job successfully creates exactly 30 directories, and then starts failing again. This happened both times I ran the job. 30 directories are created in the partitions directory, and then subsequently it prints to the task log something like DiskBackedPartitionStorage: offloadPartition: Failed to create directory ... and then no further directories are created after the 30. It will attempt to create more partition directories, but it keeps failing after the initial 30. It is quite strange. On 9/12/13, Claudio Martella claudio.marte...@gmail.com wrote: Giraph does not offload partitions or messages to HDFS in the out-of-core module. It uses the local disk on the computing nodes. By default, it uses the tasktracker local directory where, for example, the distributed cache is stored. Could you provide the stacktrace Giraph is spitting out when failing? On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com wrote: Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails. The DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall). The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with OOM. Has anybody had this problem with Giraph failing to make HDFS directories?
-- Claudio Martella claudio.marte...@gmail.com -- Alexander Asplund
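[Editor's note] The manual workaround described above (pre-creating a giraph directory under the tasktracker's local storage) can be scripted so it fails fast before the job starts. A hedged sketch using only java.nio; the "partitions" and "messages" subdirectory names mirror the ones mentioned in the thread, and the base path is whatever your tasktracker's local storage actually is:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrepareOffloadDir {
    // Pre-create the local offload directories and verify they are writable.
    static Path prepare(String base) throws IOException {
        Path root = Paths.get(base, "giraph");
        Files.createDirectories(root.resolve("partitions"));
        Files.createDirectories(root.resolve("messages"));
        // Fail fast if the current user cannot actually write here:
        // createTempFile throws if the directory is not writable.
        Path probe = Files.createTempFile(root, "probe", ".tmp");
        Files.delete(probe);
        return root;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative base path; substitute e.g. mapred/local from the thread.
        Path root = prepare(System.getProperty("java.io.tmpdir"));
        System.out.println("prepared " + root);
    }
}
```

Run once per slave node as the same user the tasktracker runs under; if the probe write throws, the rights issue described above is still present.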
Re: Giraph offloadPartition fails creation directory
Actually, I take that back. It seems it does succeed in creating partitions - it just struggles with it sometimes. Should I be worried about these errors if the partition directories seem to be filling up? On Sep 11, 2013 6:38 PM, Claudio Martella claudio.marte...@gmail.com wrote: Giraph does not offload partitions or messages to HDFS in the out-of-core module. It uses the local disk on the computing nodes. By default, it uses the tasktracker local directory where, for example, the distributed cache is stored. Could you provide the stacktrace Giraph is spitting out when failing? On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com wrote: Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails. The DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall). The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with OOM. Has anybody had this problem with Giraph failing to make HDFS directories? -- Claudio Martella claudio.marte...@gmail.com
Re: Giraph offloadPartition fails creation directory
Actually, why is it saying it fails to create directory in the first place, when it is trying to write files? On Sep 12, 2013 3:04 PM, Alexander Asplund alexaspl...@gmail.com wrote: I can also add that there is no such issue with DiskBackedMessageStore. It successfully creates a large number of store files, and never starts failing. On Sep 12, 2013 2:11 PM, Alexander Asplund alexaspl...@gmail.com wrote: It's very strange.. it is definitely failing on some partitions.. currently the disk usage of an offloading worker corresponds roughly to the size of its part of the graph... but the worker attempts to create additional partitions, and this fails. On Sep 12, 2013 2:07 PM, Alexander Asplund alexaspl...@gmail.com wrote: Actually, I take that back. It seems it does succeed in creating partitions - it just struggles with it sometimes. Should I be worried about these errors if the partition directories seem to be filling up? On Sep 11, 2013 6:38 PM, Claudio Martella claudio.marte...@gmail.com wrote: Giraph does not offload partitions or messages to HDFS in the out-of-core module. It uses the local disk on the computing nodes. By default, it uses the tasktracker local directory where, for example, the distributed cache is stored. Could you provide the stacktrace Giraph is spitting out when failing? On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com wrote: Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails. The DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall). The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with OOM.
Has anybody had this problem with giraph failing to make HDFS directories? -- Claudio Martella claudio.marte...@gmail.com
Giraph offloadPartition fails creation directory
Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails. The DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall). The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with OOM. Has anybody had this problem with Giraph failing to make HDFS directories?
Re: Giraph offloadPartition fails creation directory
Giraph does not offload partitions or messages to HDFS in the out-of-core module. It uses the local disk on the computing nodes. By default, it uses the tasktracker local directory where, for example, the distributed cache is stored. Could you provide the stacktrace Giraph is spitting out when failing? On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund alexaspl...@gmail.com wrote: Hi, I'm still trying to get Giraph to work on a graph that requires more memory than is available. The problem is that when the workers try to offload partitions, the offloading fails. The DiskBackedPartitionStore fails to create the directory _bsp/_partitions/job-/part-vertices-xxx (roughly from recall). The input or computation will then continue for a while, which I believe is because it is still managing to hold everything in memory - but at some point it reaches the limit where there simply is no more heap space, and it crashes with OOM. Has anybody had this problem with Giraph failing to make HDFS directories? -- Claudio Martella claudio.marte...@gmail.com