[
https://issues.apache.org/jira/browse/GIRAPH-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandre Fonseca updated GIRAPH-850:
-------------------------------------
Attachment: GIRAPH-850.patch
This patch implements my idea of letting the ZooKeeper Java process simply
inherit the classpath of the current Giraph process.
As a nice side effect, the internal ZooKeeper now works with YARN too, so I
also removed the check done in the YARN ApplicationMaster.
Tested and working on Hadoop 1.2.1 MR1, Hadoop 2.2.0 MR2 and Hadoop 2.2.0 YARN
on a 5-node cluster.
Passes mvn verify.
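
For readers who want to see the idea in isolation, here is a minimal sketch of
classpath inheritance. It is not the code in GIRAPH-850.patch. The class name
InheritClasspathSketch and the helper buildCommand are hypothetical, and the
sketch assumes that the java.class.path system property of the parent JVM
already lists the ZooKeeper jars.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Giraph source or the patch.
public class InheritClasspathSketch {

    /**
     * Builds a "java -cp <classpath> <mainClass> <args...>" command in which
     * the classpath is copied from the currently running JVM.
     */
    static List<String> buildCommand(String mainClass, String... args) {
        // Use the same Java installation as the parent process.
        String javaBin = System.getProperty("java.home")
            + File.separator + "bin" + File.separator + "java";
        // Classpath of the current JVM (assumed to contain the ZooKeeper classes).
        String classpath = System.getProperty("java.class.path");

        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        for (String arg : args) {
            command.add(arg);
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
            "org.apache.zookeeper.server.quorum.QuorumPeerMain", "zoo.cfg");
        // Only print the command here; a real launcher would start it with
        // new ProcessBuilder(command).start() and collect its output.
        System.out.println(command);
    }
}
```

Because the child process gets exactly the classpath of its parent, no lookup
of the job jar location is needed in this sketch.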
> Improve internal zookeeper launching
> ------------------------------------
>
> Key: GIRAPH-850
> URL: https://issues.apache.org/jira/browse/GIRAPH-850
> Project: Giraph
> Issue Type: Bug
> Components: zookeeper
> Reporter: Alexandre Fonseca
> Fix For: 1.1.0
>
> Attachments: GIRAPH-850.patch
>
>
> With the most up-to-date trunk, internal ZooKeeper launching only appears to
> work with Hadoop 1.x.x MR1.
> With Hadoop 2.x.x MR2, trying to run a job without specifying an external
> ZooKeeper location results in a failed job with the following in the logs:
> {code}
> 2014-02-12 17:30:30,281 INFO [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
> [/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/jre/bin/java, -Xmx512m,
> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
> /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar,
> org.apache.zookeeper.server.quorum.QuorumPeerMain,
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper/zoo.cfg]
> in directory
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper
> (...)
> 2014-02-12 17:30:30,285 INFO [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
> igraph-02.hi.inet:22181 with poll msecs = 3000
> 2014-02-12 17:30:30,289 WARN [main] org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> (...)
> 2014-02-12 17:30:30,413 INFO
> [org.apache.giraph.zk.ZooKeeperManager$StreamCollector]
> org.apache.giraph.zk.ZooKeeperManager$StreamCollector: readLines: Error:
> Could not find or load main class
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> (...)
> {code}
> It is clearly unable to launch ZooKeeper because it can't find the necessary
> class in the classpath. Looking at the command with which it tries to launch
> ZooKeeper, we can see that it has specified a classpath of:
> {code}
> -cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar
> {code}
> which is an HDFS location.
> It seems that with Hadoop 2.x.x, the function Job.getJar() returns an HDFS
> path to the jar instead of the path to the local copy of the jar in the
> DirectoryCache. Hadoop 1.x.x appears to return a correct path, as I didn't
> detect any problem there.
> The whole logic of finding the ZooKeeper classpath seems extremely convoluted
> to me (not to mention broken, as just shown, for both MR2 and YARN). Since the
> currently running Java process has to have the ZooKeeper classes in its
> classpath anyway (because some of the classes in Giraph refer to ZooKeeper
> classes), wouldn't it make more sense to have the child Java process that
> starts ZooKeeper simply inherit the classpath?