[ 
https://issues.apache.org/jira/browse/GIRAPH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360264#comment-15360264
 ] 

Jose Luis Larroque commented on GIRAPH-811:
-------------------------------------------

I'm using Hadoop 2.4.0 and Giraph 1.1.0 on AWS EMR. This problem occurs 
intermittently and is still happening. You guys should include this patch in a 
future release IMHO, such as 1.2.0. 

> Infinite ZooKeeper CleanUp
> --------------------------
>
>                 Key: GIRAPH-811
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-811
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp, zookeeper
>    Affects Versions: 1.1.0
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-811.patch
>
>
> While executing the SimpleShortestPaths example with Giraph 1.1.0-SNAPSHOT 
> compiled for Hadoop Yarn 2.2.0, I've noticed that the application would never 
> stop even after recognizing that all supersteps had completed and the output 
> had been written to the output directory.
> Looking at the logs, I found that the BspServiceMaster is stuck in the while 
> loop at the end of cleanUpZooKeeper() (BspServiceMaster.java:1729):
> {code}2013-12-08 03:51:21,698 INFO  [org.apache.giraph.master.MasterThread] 
> master.MasterThread (MasterThread.java:run(121)) - masterThread: Coordination 
> of superstep 3 took 0.433 seconds ended with state ALL_SUPERSTEPS_DONE and is 
> now on superstep 4
> 2013-12-08 03:51:21,699 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:setJobState(261)) - 
> setJobState: 
> {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on 
> superstep 4
> 2013-12-08 03:51:21,753 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanup(1836)) - cleanup: 
> Notifying master its okay to cleanup with 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir/0_master
> 2013-12-08 03:51:21,790 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1711)) - 
> cleanUpZooKeeper: Node 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir already 
> exists, no need to create.
> 2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
> bsp.BspInputFormat (BspInputFormat.java:getMaxTasks(64)) - getMaxTasks: Max 
> workers = 1, split master/worker = true, is YARN-only job = true, total max 
> tasks = 1
> 2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1735)) - 
> cleanUpZooKeeper: Got 2 of 1 desired children from 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir
> 2013-12-08 03:51:21,793 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1744)) - 
> cleanedUpZooKeeper: Waiting for the children of 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir to 
> change since only got 2 nodes.{code}
> As the last 2 entries show, the master finds 2 finished tasks registered 
> instead of the 1 it expects, so the equality check on line 1740 never 
> succeeds and the loop never exits.
> One solution would be to change the == in line 1740 to a >=. However, the 
> actual issue seems to reside in BspInputFormat.getMaxTasks() 
> (BspInputFormat.java:51). This function assumes that in a pure YARN execution 
> the total number of tasks will be equal to the maximum number of workers. 
> However, based on GiraphApplicationMaster:167, this is not the case: an extra 
> Master task is launched in addition to all the Worker tasks. 
> BspInputFormat.getMaxTasks() should therefore return maxWorkers + 1 in the 
> case of a pure YARN execution.
> Compilation:
> {code}mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests compile{code}
> Execution command:
> {code}$HADOOP_PREFIX/bin/hadoop jar 
> ~/Projects/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
>  org.apache.giraph.GiraphRunner 
> org.apache.giraph.examples.SimpleShortestPathsComputation -vif 
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
> giraph/input/tiny_graph.txt -vof 
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
> giraph/output/shortestpahts -w 1 -ca giraph.zkList=localhost:2181 -yj 
> giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar{code}
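
For illustration, the counting mismatch described in the quoted report can be sketched as below. This is only a minimal model under stated assumptions: the class CleanupCountSketch and all of its methods are hypothetical names invented for this example and are not Giraph's actual code or API. It uses the values from the log ("Got 2 of 1 desired children") to compare the == check, the >= alternative, and the proposed maxWorkers + 1 count.

```java
// Hypothetical sketch: none of these names exist in Giraph.
// It only models the arithmetic described in the report.
public class CleanupCountSketch {

    // Expected task count as the report describes the current behaviour
    // for a pure YARN job: equal to the maximum number of workers.
    static int expectedTasksAsReported(int maxWorkers) {
        return maxWorkers;
    }

    // Expected task count with the proposed fix: workers plus one master.
    static int expectedTasksProposed(int maxWorkers) {
        return maxWorkers + 1;
    }

    // Exit condition using ==, as the report describes for line 1740.
    static boolean cleanupDoneStrict(int children, int expected) {
        return children == expected;
    }

    // Alternative exit condition using >=, the other suggestion in the report.
    static boolean cleanupDoneLenient(int children, int expected) {
        return children >= expected;
    }

    public static void main(String[] args) {
        int maxWorkers = 1; // "-w 1" in the execution command
        int children = 2;   // "Got 2 of 1 desired children" in the log

        // == with the reported count: 2 == 1 is false, so the loop keeps waiting.
        System.out.println(cleanupDoneStrict(children, expectedTasksAsReported(maxWorkers)));
        // >= with the reported count: 2 >= 1 is true, so the loop would exit.
        System.out.println(cleanupDoneLenient(children, expectedTasksAsReported(maxWorkers)));
        // == with the proposed count: 2 == 2 is true, so the loop would exit.
        System.out.println(cleanupDoneStrict(children, expectedTasksProposed(maxWorkers)));
    }
}
```

Either change lets the loop terminate in this model. Correcting the expected count keeps the strict check meaningful, while >= only tolerates the mismatch.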



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
