Re: yarn-cluster mode throwing NullPointerException
Hi Rachana,

Are you by any chance setting the master in your code, something like this?

    sparkConf.setMaster("yarn-cluster");

Setting the master to "yarn-cluster" programmatically on the SparkContext is not supported. I think you are hitting this bug: https://issues.apache.org/jira/browse/SPARK-7504. It was fixed in Spark 1.4.0, so you could try upgrading to 1.4.0.

Regards,
Venkata krishnan

On Sun, Oct 11, 2015 at 8:49 PM, Rachana Srivastava <rachana.srivast...@markmonitor.com> wrote:
> I am trying to submit a job in yarn-cluster mode using the spark-submit
> command. My code works fine when I use yarn-client mode.
> [...]
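[Editor's note: a minimal sketch of the fix suggested above, under the assumption that the application currently hard-codes the master; the class name is hypothetical and not from the original thread.]

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class YarnClusterExample {
    public static void main(String[] args) {
        // Do NOT hard-code the master for a YARN cluster deployment:
        //   sparkConf.setMaster("yarn-cluster");  // triggers the NPE described in SPARK-7504 on Spark 1.3
        // Leave the master unset here and let spark-submit supply it
        // (e.g. spark-submit --master yarn-cluster ...).
        SparkConf sparkConf = new SparkConf().setAppName("YarnClusterExample");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // ... job logic ...
        jsc.stop();
    }
}
```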
yarn-cluster mode throwing NullPointerException
I am trying to submit a job in yarn-cluster mode using the spark-submit command. My code works fine when I use yarn-client mode.

Cloudera Version:
CDH-5.4.7-1.cdh5.4.7.p0.3

Command Submitted:

spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
  --driver-java-options "-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --num-executors 2 \
  --executor-cores 2 \
  ../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  yarn-cluster 10 "XXX:2181" "XXX:9092" groups kafkaurl 5 \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeature.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeatureContent.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/hdfsOutputNEWScript/OUTPUTYarn2" false

Log Details:

INFO : org.apache.spark.SparkContext - Running Spark version 1.3.0
INFO : org.apache.spark.SecurityManager - Changing view acls to: ec2-user
INFO : org.apache.spark.SecurityManager - Changing modify acls to: ec2-user
INFO : org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ec2-user); users with modify permissions: Set(ec2-user)
INFO : akka.event.slf4j.Slf4jLogger - Slf4jLogger started
INFO : Remoting - Starting remoting
INFO : Remoting - Remoting started; listening on addresses :[akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : Remoting - Remoting now listens on addresses: [akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 49579.
INFO : org.apache.spark.SparkEnv - Registering MapOutputTracker
INFO : org.apache.spark.SparkEnv - Registering BlockManagerMaster
INFO : org.apache.spark.storage.DiskBlockManager - Created local directory at /tmp/spark-1c805495-c7c4-471d-973f-b1ae0e2c8ff9/blockmgr-fff1946f-a716-40fc-a62d-bacba5b17638
INFO : org.apache.spark.storage.MemoryStore - MemoryStore started with capacity 265.4 MB
INFO : org.apache.spark.HttpFileServer - HTTP File server directory is /tmp/spark-8ed6f513-854f-4ee4-95ea-87185364eeaf/httpd-75cee1e7-af7a-4c82-a9ff-a124ce7ca7ae
INFO : org.apache.spark.HttpServer - Starting HTTP Server
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:46671
INFO : org.apache.spark.util.Utils - Successfully started service 'HTTP file server' on port 46671.
INFO : org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:4040
INFO : org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 4040.
INFO : org.apache.spark.ui.SparkUI - Started SparkUI at http://ip-10-0-0-XXX.us-west-2.compute.internal:4040
INFO : org.apache.spark.SparkContext - Added JAR file:/home/ec2-user/CE/correlationengine/scripts/../target/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar at http://10.0.0.XXX:46671/jars/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1444620509463
INFO : org.apache.spark.scheduler.cluster.YarnClusterScheduler - Created YarnClusterScheduler
ERROR: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend - Application ID is not set.
INFO : org.apache.spark.network.netty.NettyBlockTransferService - Server created on 33880
INFO : org.apache.spark.storage.BlockManagerMaster - Trying to register BlockManager
INFO : org.apache.spark.storage.BlockManagerMasterActor - Registering block manager ip-10-0-0-XXX.us-west-2.compute.internal:33880 with 265.4 MB RAM, BlockManagerId(<driver>, ip-10-0-0-XXX.us-west-2.compute.internal, 33880)
INFO : org.apache.spark.storage.BlockManagerMaster - Registered BlockManager
INFO : org.apache.spark.scheduler.EventLoggingListener - Logging events to hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/spark/applicationHistory/spark-application-1444620509497

Exception in thread "main" java.lang.NullPointerException
	at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
	at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at com.markmonitor.antifraud.ce.KafkaURLStreaming.main(KafkaURLStreaming.java:91)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at
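[Editor's note: the command above passes "yarn-cluster" as an application argument rather than to spark-submit itself. A sketch of the invocation with the deploy mode supplied via --master instead, assuming the application no longer sets the master in code; application arguments are elided and this is not a tested command from the thread.]

```shell
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-cores 2 \
  ../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  <application arguments>
```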