[ https://issues.apache.org/jira/browse/GIRAPH-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899570#comment-13899570 ]
Roman Shaposhnik commented on GIRAPH-850:
-----------------------------------------

Btw, the reason I think this patch is extremely safe is simply that, with the latest Hadoop 1 and Hadoop 2 alike, ZK ends up on the default classpath anyway, so there is no need to futz with locating it.

> Improve internal zookeeper launching
> ------------------------------------
>
>                 Key: GIRAPH-850
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-850
>             Project: Giraph
>          Issue Type: Bug
>          Components: zookeeper
>            Reporter: Alexandre Fonseca
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-850.patch
>
>
> With the most up-to-date trunk, internal ZooKeeper launching only appears to
> work with Hadoop 1.x.x MR1.
> With Hadoop 2.x.x MR2, trying to run a job without specifying an external
> ZooKeeper location results in a failed job with the following in the logs:
> {code}
> 2014-02-12 17:30:30,281 INFO [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
> [/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/jre/bin/java, -Xmx512m,
> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
> /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar,
> org.apache.zookeeper.server.quorum.QuorumPeerMain,
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper/zoo.cfg]
> in directory
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper
> (...)
> 2014-02-12 17:30:30,285 INFO [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
> igraph-02.hi.inet:22181 with poll msecs = 3000
> 2014-02-12 17:30:30,289 WARN [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> (...)
> 2014-02-12 17:30:30,413 INFO
> [org.apache.giraph.zk.ZooKeeperManager$StreamCollector]
> org.apache.giraph.zk.ZooKeeperManager$StreamCollector: readLines: Error:
> Could not find or load main class
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> (...)
> {code}
> It is clearly unable to launch ZooKeeper because it can't find the necessary
> class on the classpath. Looking at the command with which it tries to launch
> ZooKeeper, we can see that it has specified a classpath of:
> {code}
> -cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar
> {code}
> which is an HDFS location.
> It seems that with Hadoop 2.x.x, the function Job.getJar() returns an HDFS
> path to the jar instead of the path to the local copy of the jar in the
> DistributedCache. Hadoop 1.x.x appears to return a correct path, as I didn't
> detect any problem there.
> The whole logic of finding the ZooKeeper classpath seems extremely convoluted
> to me (not to mention broken, as just shown for both MR2 and YARN). Since the
> currently running Java process has to have the ZooKeeper classes on its
> classpath anyway (because some of the classes in Giraph refer to ZooKeeper
> classes), wouldn't it make more sense to just have the child Java process
> that starts ZooKeeper simply inherit the classpath?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
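
For illustration only, the "inherit the classpath" idea from the issue description could look roughly like the sketch below. This is not the attached GIRAPH-850.patch: the class name InheritedClasspathLauncher, the method buildCommand, and the bare "zoo.cfg" argument are hypothetical. It assumes the parent JVM's java.class.path system property already lists the ZooKeeper jars, which may not hold if they were loaded through a custom class loader.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Giraph: builds a child JVM command line
// that reuses the parent's java binary and classpath.
final class InheritedClasspathLauncher {

    /** Returns [java, -cp, <parent classpath>, mainClass, args...]. */
    static List<String> buildCommand(String mainClass, String... args) {
        // Locate the java binary of the currently running JVM.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";

        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        // Inherit the classpath instead of deriving it from the job jar path.
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        command.addAll(Arrays.asList(args));
        return command;
    }

    public static void main(String[] args) {
        // "zoo.cfg" is a placeholder for the generated ZooKeeper config path.
        List<String> command = buildCommand(
                "org.apache.zookeeper.server.quorum.QuorumPeerMain", "zoo.cfg");
        // Print only; an actual launch would hand the list to ProcessBuilder:
        //   new ProcessBuilder(command).redirectErrorStream(true).start();
        System.out.println(String.join(" ", command));
    }
}
```

JVM options such as -Xmx512m from the logged command would be inserted between the java binary and -cp; they are left out here to keep the sketch short.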