Alexandre Fonseca created GIRAPH-850:
----------------------------------------

             Summary: Improve internal zookeeper launching
                 Key: GIRAPH-850
                 URL: https://issues.apache.org/jira/browse/GIRAPH-850
             Project: Giraph
          Issue Type: Bug
          Components: zookeeper
            Reporter: Alexandre Fonseca
             Fix For: 1.1.0


With the most up-to-date trunk, internal ZooKeeper launching only appears to 
work with Hadoop 1.x.x MR1.

With Hadoop 2.x.x MR2, trying to run a job without specifying an external 
zookeeper location results in a failed job with the following in the logs:

{code}
2014-02-12 17:30:30,281 INFO [main] org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Attempting to start ZooKeeper server with command 
[/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/jre/bin/java, -Xmx512m, 
-XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, 
-XX:MaxGCPauseMillis=100, -cp, 
/tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar, 
org.apache.zookeeper.server.quorum.QuorumPeerMain, 
/tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper/zoo.cfg]
 in directory 
/tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper
(...)
2014-02-12 17:30:30,285 INFO [main] org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to 
igraph-02.hi.inet:22181 with poll msecs = 3000
2014-02-12 17:30:30,289 WARN [main] org.apache.giraph.zk.ZooKeeperManager: 
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
(...)
2014-02-12 17:30:30,413 INFO 
[org.apache.giraph.zk.ZooKeeperManager$StreamCollector] 
org.apache.giraph.zk.ZooKeeperManager$StreamCollector: readLines: Error: Could 
not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain
(...)
{code}

It clearly is unable to launch Zookeeper as it can't find the necessary class 
in the classpath. Looking at the command with which it tries to launch 
Zookeeper, we can see that it has specified a classpath of:

{code}
-cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar
{code}

which is an HDFS location.

It seems that with Hadoop 2.x.x, the function Job.getJar() returns an HDFS path 
to the jar instead of the path to the local copy of the jar in the 
DistributedCache. Hadoop 1.x.x appears to return a correct local path, as I 
didn't detect any problem there.
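
One way to confirm this diagnosis, sketched under assumptions: before spawning the child JVM, check whether each entry handed to -cp exists on the local filesystem. If the staging path lives only in HDFS, the check below would report it as missing. The class ClasspathCheck and the method missingEntries are hypothetical names for illustration and are not part of Giraph or Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper, not part of Giraph or Hadoop. */
public class ClasspathCheck {

    /**
     * Returns the classpath entries that do not exist on the local filesystem.
     * Wildcard entries such as "lib/*" are not expanded here and would be
     * reported as missing.
     */
    public static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the classpath given as the first argument,
        // or this JVM's own classpath if none is given.
        String classpath = args.length > 0
            ? args[0]
            : System.getProperty("java.class.path");
        for (String entry : missingEntries(classpath)) {
            System.err.println("Not on local disk: " + entry);
        }
    }
}
```

Running it on the affected node with the job.jar path from the log as the argument would show whether that path is visible locally.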

The whole logic of finding the Zookeeper classpath seems extremely convoluted 
to me (not to mention broken as just shown for both MR2 and YARN). Since the 
currently running Java process has to have the zookeeper classes in its 
classpath anyway (because some of the classes in Giraph refer to Zookeeper 
classes), wouldn't it make more sense to just have the child java process 
starting Zookeeper simply inherit the classpath?
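
A minimal sketch of that idea, not an actual patch: the parent JVM exposes its own classpath through the standard java.class.path system property and its install location through java.home, so both can be passed to the child unchanged. The class InheritedClasspathLauncher and the method buildCommand are hypothetical names, and the JVM options are reduced to the -Xmx512m seen in the log above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Giraph. */
public class InheritedClasspathLauncher {

    /**
     * Builds a command line that runs mainClass in a child JVM
     * with the same classpath as the current process.
     */
    public static List<String> buildCommand(String mainClass, String... args) {
        // Reuse the JVM that is running the current process.
        String javaBin = System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
        // The current classpath must already contain the ZooKeeper classes,
        // because Giraph classes refer to them.
        String classpath = System.getProperty("java.class.path");

        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Xmx512m");
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        for (String arg : args) {
            command.add(arg);
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
            "org.apache.zookeeper.server.quorum.QuorumPeerMain", "zoo.cfg");
        // Print only; starting the child would be:
        // new ProcessBuilder(command).inheritIO().start();
        System.out.println(command);
    }
}
```

This avoids any dependency on what Job.getJar() returns, at the cost of handing the child the full parent classpath rather than only the ZooKeeper jars.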



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
