[ https://issues.apache.org/jira/browse/GIRAPH-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899570#comment-13899570 ]

Roman Shaposhnik commented on GIRAPH-850:
-----------------------------------------

By the way, I think this patch is extremely safe simply because with the
latest releases of both Hadoop 1 and Hadoop 2, ZooKeeper ends up on the
default classpath anyway, so there is no need to futz with locating it.
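One quick way to check that claim on a given cluster is to ask the task JVM whether the ZooKeeper server entry point resolves from its own classpath. The following is a minimal, hypothetical helper (`ClasspathCheck` is not part of Giraph or Hadoop); it assumes the Hadoop default classpath is what the task JVM's application class loader sees.

```java
// Hypothetical helper, not part of Giraph: reports whether a class can be
// resolved from the current JVM's classpath.
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its
            // static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a dependency of it that cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        String zkMain = "org.apache.zookeeper.server.quorum.QuorumPeerMain";
        System.out.println(zkMain + " on classpath: " + isOnClasspath(zkMain));
    }
}
```

If this prints `true` inside a task, a child JVM started with the same classpath should be able to find `QuorumPeerMain` as well.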

> Improve internal zookeeper launching
> ------------------------------------
>
>                 Key: GIRAPH-850
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-850
>             Project: Giraph
>          Issue Type: Bug
>          Components: zookeeper
>            Reporter: Alexandre Fonseca
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-850.patch
>
>
> With the most up-to-date trunk, internal ZooKeeper launching only appears to
> work with Hadoop 1.x.x MR1.
> With Hadoop 2.x.x MR2, trying to run a job without specifying an external
> ZooKeeper location results in a failed job with the following in the logs:
> {code}
> 2014-02-12 17:30:30,281 INFO [main] org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Attempting to start ZooKeeper server with command 
> [/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/jre/bin/java, -Xmx512m,
> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
> /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar,
> org.apache.zookeeper.server.quorum.QuorumPeerMain,
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper/zoo.cfg]
> in directory
> /tmp/hadoop-b.ajf/nm-local-dir/usercache/b.ajf/appcache/application_1392221733726_0002/work/_bspZooKeeper
> (...)
> 2014-02-12 17:30:30,285 INFO [main] org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to 
> igraph-02.hi.inet:22181 with poll msecs = 3000
> 2014-02-12 17:30:30,289 WARN [main] org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> (...)
> 2014-02-12 17:30:30,413 INFO 
> [org.apache.giraph.zk.ZooKeeperManager$StreamCollector] 
> org.apache.giraph.zk.ZooKeeperManager$StreamCollector: readLines: Error: 
> Could not find or load main class 
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> (...)
> {code}
> It is clearly unable to launch ZooKeeper because it cannot find the necessary
> class on the classpath. Looking at the command it uses to launch ZooKeeper,
> we can see that it has specified a classpath of:
> {code}
> -cp, /tmp/hadoop-yarn/staging/b.ajf/.staging/job_1392221733726_0002/job.jar
> {code}
> which is an HDFS location.
> It seems that with Hadoop 2.x.x, Job.getJar() returns the HDFS path of the
> jar instead of the path to its local copy in the DistributedCache. Hadoop
> 1.x.x appears to return a correct (local) path, as I didn't detect any
> problem there.
> The whole logic for locating the ZooKeeper classpath seems extremely
> convoluted to me (not to mention broken for both MR2 and YARN, as just
> shown). Since the currently running Java process must already have the
> ZooKeeper classes on its classpath (because some Giraph classes reference
> ZooKeeper classes), wouldn't it make more sense for the child Java process
> that starts ZooKeeper to simply inherit that classpath?
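The reporter's suggestion could look roughly like the following minimal sketch, which builds the child command from the parent JVM's `java.home` and `java.class.path` system properties instead of from `Job.getJar()`. `InheritedClasspathLauncher` and `buildCommand` are hypothetical names, not Giraph's actual `ZooKeeperManager` code, and the approach assumes the ZooKeeper classes were loaded from `java.class.path` by the system class loader rather than added by a custom class loader.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: start the ZooKeeper child JVM with the parent's own
// classpath rather than the (possibly non-local) path from Job.getJar().
public class InheritedClasspathLauncher {

    public static final String ZK_MAIN =
        "org.apache.zookeeper.server.quorum.QuorumPeerMain";

    /** Builds the child command line; JVM options are passed through unchanged. */
    public static List<String> buildCommand(List<String> jvmOpts, String zooCfgPath) {
        List<String> cmd = new ArrayList<>();
        // Use the same java binary as the running JVM.
        cmd.add(System.getProperty("java.home") + File.separator + "bin"
            + File.separator + "java");
        cmd.addAll(jvmOpts);
        cmd.add("-cp");
        // Inherit the parent's classpath. Assumption: ZooKeeper is on
        // java.class.path, not loaded through a custom class loader.
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(ZK_MAIN);
        cmd.add(zooCfgPath);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand(
            Arrays.asList("-Xmx512m", "-XX:+UseConcMarkSweepGC"),
            args.length > 0 ? args[0] : "zoo.cfg");
        System.out.println(String.join(" ", cmd));
        // To actually start the server (requires ZooKeeper on the classpath):
        // new ProcessBuilder(cmd).directory(new File(".")).inheritIO().start();
    }
}
```

Compared with the command in the log, only the `-cp` argument changes; the JVM flags, main class and `zoo.cfg` path stay the same.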



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
