Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-20 Thread Marcelo Vanzin
Hello,

Sorry for the delay. The issue you're running into is because most HBase
classes are in the system class path, while jars added with --jars are
only visible to the application class loader created by Spark. So classes
in the system class path cannot see them.
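The delegation direction can be sketched in a few lines of plain Java (a minimal illustration, not Spark's actual loader setup; the jar path is hypothetical): a child URLClassLoader, like the one Spark creates for --jars, can see its parent's classes, but the system loader never consults the child.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {
    public static void main(String[] args) throws Exception {
        // Child loader standing in for Spark's application class loader;
        // its URL list is where --jars entries end up (path is hypothetical).
        URLClassLoader sparkAppLoader = new URLClassLoader(
                new URL[] { new URL("file:/tmp/htrace-core.jar") },
                ClassLoader.getSystemClassLoader());

        // Delegation goes child -> parent: the child sees system classes...
        Class<?> c = sparkAppLoader.loadClass("java.lang.String");
        System.out.println(c == String.class);

        // ...but the parent never consults the child, so code loaded by the
        // system loader (like the HBase classes here) cannot resolve classes
        // that only the child's jars provide:
        try {
            ClassLoader.getSystemClassLoader()
                    .loadClass("org.apache.htrace.Trace");
        } catch (ClassNotFoundException e) {
            System.out.println("CNFE from system loader");
        }
    }
}
```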

You can work around this by passing --driver-class-path
/opt/.../htrace-core-3.1.0-incubating.jar and --conf
spark.executor.extraClassPath=
/opt/.../htrace-core-3.1.0-incubating.jar on your spark-submit command
line. (You can also add those configs to your spark-defaults.conf to avoid
having to type them every time; and don't forget to include any other
jars that might be needed.)
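Putting this together with the original command from later in the thread, the yarn-cluster submission might look like the sketch below (the full parcel path is the one quoted elsewhere in the thread; class name, paths, and resource sizes are from the original post and should be adjusted for your deployment):

```shell
HTRACE_JAR=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar

spark-submit \
  --class xxx.xxx.MyApp \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-memory 10g \
  --driver-class-path "$HTRACE_JAR" \
  --conf spark.executor.extraClassPath="$HTRACE_JAR" \
  --jars "$HTRACE_JAR" \
  my-app.jar /input /output
```

Equivalently, set spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf so they apply to every submission.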


On Mon, May 18, 2015 at 11:14 PM, Fengyun RAO raofeng...@gmail.com wrote:

 Thanks, Marcelo!



Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-20 Thread Fengyun RAO
Thank you so much, Marcelo!

It WORKS!


Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-19 Thread Fengyun RAO
Thanks, Marcelo!


Below is the full log,


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/avro-tools-1.7.6-cdh5.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/19 14:08:58 INFO yarn.ApplicationMaster: Registered signal
handlers for [TERM, HUP, INT]
15/05/19 14:08:59 INFO yarn.ApplicationMaster: ApplicationAttemptId:
appattempt_1432015548391_0003_01
15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to:
nobody,raofengyun
15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to:
nobody,raofengyun
15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(nobody, raofengyun); users with modify permissions:
Set(nobody, raofengyun)
15/05/19 14:09:00 INFO yarn.ApplicationMaster: Starting the user
application in a separate Thread
15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark
context initialization
15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark
context initialization ...
15/05/19 14:09:00 INFO spark.SparkContext: Running Spark version 1.3.0
15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to:
nobody,raofengyun
15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to:
nobody,raofengyun
15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(nobody, raofengyun); users with modify permissions:
Set(nobody, raofengyun)
15/05/19 14:09:01 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/19 14:09:01 INFO Remoting: Starting remoting
15/05/19 14:09:01 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkDriver@gs-server-v-127:7191]
15/05/19 14:09:01 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkDriver@gs-server-v-127:7191]
15/05/19 14:09:01 INFO util.Utils: Successfully started service
'sparkDriver' on port 7191.
15/05/19 14:09:01 INFO spark.SparkEnv: Registering MapOutputTracker
15/05/19 14:09:01 INFO spark.SparkEnv: Registering BlockManagerMaster
15/05/19 14:09:01 INFO storage.DiskBlockManager: Created local
directory at 
/data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/blockmgr-3250910b-693e-46ff-b057-26d552fd8abd
15/05/19 14:09:01 INFO storage.MemoryStore: MemoryStore started with
capacity 259.7 MB
15/05/19 14:09:01 INFO spark.HttpFileServer: HTTP File server
directory is 
/data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/httpd-5bc614bc-d8b1-473d-a807-4d9252eb679d
15/05/19 14:09:01 INFO spark.HttpServer: Starting HTTP Server
15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/19 14:09:01 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:9349
15/05/19 14:09:01 INFO util.Utils: Successfully started service 'HTTP
file server' on port 9349.
15/05/19 14:09:01 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/05/19 14:09:01 INFO ui.JettyUtils: Adding filter:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/19 14:09:01 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:63023
15/05/19 14:09:01 INFO util.Utils: Successfully started service
'SparkUI' on port 63023.
15/05/19 14:09:01 INFO ui.SparkUI: Started SparkUI at
http://gs-server-v-127:63023
15/05/19 14:09:02 INFO cluster.YarnClusterScheduler: Created
YarnClusterScheduler
15/05/19 14:09:02 INFO netty.NettyBlockTransferService: Server created on 33526
15/05/19 14:09:02 INFO storage.BlockManagerMaster: Trying to register
BlockManager
15/05/19 14:09:02 INFO storage.BlockManagerMasterActor: Registering
block manager gs-server-v-127:33526 with 259.7 MB RAM,
BlockManagerId(driver, gs-server-v-127, 33526)
15/05/19 14:09:02 INFO storage.BlockManagerMaster: Registered BlockManager
15/05/19 14:09:02 INFO scheduler.EventLoggingListener: Logging events
to 
hdfs://gs-server-v-127:8020/user/spark/applicationHistory/application_1432015548391_0003
15/05/19 14:09:02 INFO yarn.ApplicationMaster: Listen to driver:
akka.tcp://sparkDriver@gs-server-v-127:7191/user/YarnScheduler
15/05/19 14:09:02 INFO cluster.YarnClusterSchedulerBackend:
ApplicationMaster registered as
Actor[akka://sparkDriver/user/YarnAM#1902752386]
15/05/19 14:09:02 INFO client.RMProxy: Connecting to ResourceManager
at gs-server-v-127/10.200.200.56:8030
15/05/19 14:09:02 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/05/19 14:09:03 INFO yarn.YarnAllocator: Will request 2 executor
containers, each with 1 cores and 4480 MB memory 

Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-14 Thread Fengyun RAO
Thanks, Wilfred.

In our program, the htrace-core-3.1.0-incubating.jar dependency is only
required in the executors, not in the driver, and the executors run in the
cluster in both yarn-client and yarn-cluster mode.

Also, in yarn-cluster mode the jar clearly IS listed in
spark.yarn.secondary.jars, yet it still throws a ClassNotFoundException.

2015-05-14 18:52 GMT+08:00 Wilfred Spiegelenburg 
wspiegelenb...@cloudera.com:

 In the cluster the driver runs in the cluster and not locally in the
 spark-submit JVM. This changes what is available on your classpath. It
 looks like you are running into a similar situation as described in
 SPARK-5377.

 Wilfred


 --
 Wilfred Spiegelenburg
 Backline Customer Operations Engineer
 YARN/MapReduce/Spark

 http://www.cloudera.com
 --
 http://five.sentenc.es




Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-13 Thread Fengyun RAO
I looked into the Environment tab in both modes.

yarn-client:
spark.jars
local:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar,file:/home/xxx/my-app.jar

yarn-cluster:
spark.yarn.secondary.jars
local:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar

I wonder why htrace appears in spark.yarn.secondary.jars but is still not
found by the URLClassLoader.

I tried both the local: and file: schemes for the jar; still the same error.




--jars works in yarn-client but not yarn-cluster mode, why?

2015-05-13 Thread Fengyun RAO
Hadoop version: CDH 5.4.

We need to connect to HBase, and thus need the extra
/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar
dependency.

It works in yarn-client mode:
spark-submit --class xxx.xxx.MyApp --master yarn-client --num-executors 10
--executor-memory 10g --jars
/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar
my-app.jar /input /output

However, if we change yarn-client to yarn-cluster, it throws a
ClassNotFoundException (even though the class exists in
htrace-core-3.1.0-incubating.jar):

Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:218)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
at 
org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at 
org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:850)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.init(ConnectionManager.java:635)
... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)


Why doesn't --jars work in yarn-cluster mode? How do we add an extra
dependency in yarn-cluster mode?