[ https://issues.apache.org/jira/browse/GIRAPH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360264#comment-15360264 ]
Jose Luis Larroque commented on GIRAPH-811:
-------------------------------------------

I'm using Hadoop 2.4.0 and Giraph 1.1.0 on AWS EMR. This problem occurs at random, but it is still happening. You guys should include this patch in future releases IMHO, such as version 1.2.0.

> Infinite ZooKeeper CleanUp
> --------------------------
>
>                 Key: GIRAPH-811
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-811
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp, zookeeper
>    Affects Versions: 1.1.0
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-811.patch
>
>
> While executing the SimpleShortestPaths example with Giraph 1.1.0-SNAPSHOT
> compiled for Hadoop YARN 2.2.0, I noticed that the application would never
> stop, even after recognizing that all supersteps had completed and the output
> had been written to the output directory.
> Looking at the logs, I found that the BspServiceMaster is stuck in the while
> loop at the end of cleanUpZooKeeper() (BspServiceMaster.java:1729):
> {code}2013-12-08 03:51:21,698 INFO [org.apache.giraph.master.MasterThread]
> master.MasterThread (MasterThread.java:run(121)) - masterThread: Coordination
> of superstep 3 took 0.433 seconds ended with state ALL_SUPERSTEPS_DONE and is
> now on superstep 4
> 2013-12-08 03:51:21,699 INFO [org.apache.giraph.master.MasterThread]
> master.BspServiceMaster (BspServiceMaster.java:setJobState(261)) -
> setJobState:
> {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on
> superstep 4
> 2013-12-08 03:51:21,753 INFO [org.apache.giraph.master.MasterThread]
> master.BspServiceMaster (BspServiceMaster.java:cleanup(1836)) - cleanup:
> Notifying master its okay to cleanup with
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir/0_master
> 2013-12-08 03:51:21,790 INFO [org.apache.giraph.master.MasterThread]
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1711)) -
> cleanUpZooKeeper: Node
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir already
> exists, no need to create.
> 2013-12-08 03:51:21,792 INFO [org.apache.giraph.master.MasterThread]
> bsp.BspInputFormat (BspInputFormat.java:getMaxTasks(64)) - getMaxTasks: Max
> workers = 1, split master/worker = true, is YARN-only job = true, total max
> tasks = 1
> 2013-12-08 03:51:21,792 INFO [org.apache.giraph.master.MasterThread]
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1735)) -
> cleanUpZooKeeper: Got 2 of 1 desired children from
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir
> 2013-12-08 03:51:21,793 INFO [org.apache.giraph.master.MasterThread]
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1744)) -
> cleanedUpZooKeeper: Waiting for the children of
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir to
> change since only got 2 nodes.{code}
> As the last 2 entries show, instead of registering just 1 task ending, it
> registers 2, so the condition on line 1740 is never satisfied.
> One solution would be to change the == on line 1740 to a >=. However, the
> actual issue seems to reside in BspInputFormat.getMaxTasks()
> (BspInputFormat.java:51). This function assumes that in a pure YARN execution
> the total number of tasks will be equal to the maximum number of workers.
> However, based on GiraphApplicationMaster:167, this is not the case: an extra
> Master task is launched in addition to all the Worker tasks.
> BspInputFormat.getMaxTasks() should therefore return maxWorkers + 1 in the
> case of a pure YARN execution.
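
To make the mismatch described above concrete, here is a minimal, self-contained sketch. It is not Giraph's code: the class and method names (CleanupCountSketch, expectedTasksPureYarn, doneStrict, doneLenient) are hypothetical, and the only facts it uses are those in the report (1 worker requested, 2 children seen under _cleanedUpDir, and one extra master task in a pure YARN run).

```java
/** Hypothetical model of the task-count check discussed in GIRAPH-811; not Giraph's actual code. */
public class CleanupCountSketch {

    /**
     * Expected number of tasks in a pure YARN run.
     * Assumption taken from the issue text: one master task is launched
     * in addition to the worker tasks.
     */
    public static int expectedTasksPureYarn(int maxWorkers) {
        return maxWorkers + 1;
    }

    /** Strict check, as the report describes it: only an exact match ends the wait. */
    public static boolean doneStrict(int children, int expected) {
        return children == expected;
    }

    /** Lenient check, the first fix suggested in the report: extra children also end the wait. */
    public static boolean doneLenient(int children, int expected) {
        return children >= expected;
    }

    public static void main(String[] args) {
        int maxWorkers = 1; // "-w 1" in the execution command
        int children = 2;   // "Got 2 of 1 desired children" in the log

        // Uncorrected expected count (1) with the strict check: the wait never ends.
        System.out.println(doneStrict(children, maxWorkers)); // false

        // Either proposed change lets the wait finish.
        System.out.println(doneLenient(children, maxWorkers)); // true
        System.out.println(doneStrict(children, expectedTasksPureYarn(maxWorkers))); // true
    }
}
```

Correcting the expected count keeps the strict comparison meaningful, whereas relaxing it to >= only hides the miscount; which of the two the attached patch implements is not stated in this message.
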
> Compilation:
> {code}mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests compile{code}
> Execution command:
> {code}$HADOOP_PREFIX/bin/hadoop jar
> ~/Projects/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
> giraph/input/tiny_graph.txt -vof
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> giraph/output/shortestpahts -w 1 -ca giraph.zkList=localhost:2181 -yj
> giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)