Re: yarn-cluster mode throwing NullPointerException

2015-10-12 Thread Venkatakrishnan Sowrirajan
Hi Rachana,


Are you by any chance doing something like this in your code?

    sparkConf.setMaster("yarn-cluster");

Setting the master to "yarn-cluster" programmatically on the SparkConf/SparkContext is not supported.


I think you are hitting this bug:
https://issues.apache.org/jira/browse/SPARK-7504. It was fixed in
Spark 1.4.0, so you can try with 1.4.0.
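
For illustration, here is a minimal sketch of the recommended pattern
(assuming a plain Java driver; the class and app name below are just
placeholders, not taken from your code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class YarnClusterExample {
        public static void main(String[] args) {
            // Don't hard-code the master in the driver, e.g.
            //   new SparkConf().setMaster("yarn-cluster")
            // Instead, leave the master unset here and supply it on the
            // command line: spark-submit --master yarn-cluster ...
            SparkConf conf = new SparkConf().setAppName("KafkaURLStreaming");
            JavaSparkContext jsc = new JavaSparkContext(conf);

            // ... job logic goes here ...

            jsc.stop();
        }
    }

(In the spark-submit command in your message, "yarn-cluster" is passed as
an application argument rather than via --master, which would fit this
pattern.)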

Regards
Venkata krishnan

On Sun, Oct 11, 2015 at 8:49 PM, Rachana Srivastava <
rachana.srivast...@markmonitor.com> wrote:

> I am trying to submit a job in yarn-cluster mode using the spark-submit
> command. My code works fine when I use yarn-client mode.

yarn-cluster mode throwing NullPointerException

2015-10-11 Thread Rachana Srivastava
I am trying to submit a job in yarn-cluster mode using the spark-submit command.
My code works fine when I use yarn-client mode.

Cloudera Version:
CDH-5.4.7-1.cdh5.4.7.p0.3

Command Submitted:
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
--driver-java-options "-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--num-executors 2 \
--executor-cores 2 \
../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
yarn-cluster 10 "XXX:2181" "XXX:9092" groups kafkaurl 5 \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeature.properties" \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeatureContent.properties" \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/hdfsOutputNEWScript/OUTPUTYarn2" \
false


Log Details:
INFO : org.apache.spark.SparkContext - Running Spark version 1.3.0
INFO : org.apache.spark.SecurityManager - Changing view acls to: ec2-user
INFO : org.apache.spark.SecurityManager - Changing modify acls to: ec2-user
INFO : org.apache.spark.SecurityManager - SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(ec2-user); users 
with modify permissions: Set(ec2-user)
INFO : akka.event.slf4j.Slf4jLogger - Slf4jLogger started
INFO : Remoting - Starting remoting
INFO : Remoting - Remoting started; listening on addresses 
:[akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : Remoting - Remoting now listens on addresses: 
[akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : org.apache.spark.util.Utils - Successfully started service 'sparkDriver' 
on port 49579.
INFO : org.apache.spark.SparkEnv - Registering MapOutputTracker
INFO : org.apache.spark.SparkEnv - Registering BlockManagerMaster
INFO : org.apache.spark.storage.DiskBlockManager - Created local directory at 
/tmp/spark-1c805495-c7c4-471d-973f-b1ae0e2c8ff9/blockmgr-fff1946f-a716-40fc-a62d-bacba5b17638
INFO : org.apache.spark.storage.MemoryStore - MemoryStore started with capacity 
265.4 MB
INFO : org.apache.spark.HttpFileServer - HTTP File server directory is 
/tmp/spark-8ed6f513-854f-4ee4-95ea-87185364eeaf/httpd-75cee1e7-af7a-4c82-a9ff-a124ce7ca7ae
INFO : org.apache.spark.HttpServer - Starting HTTP Server
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started 
SocketConnector@0.0.0.0:46671
INFO : org.apache.spark.util.Utils - Successfully started service 'HTTP file 
server' on port 46671.
INFO : org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started 
SelectChannelConnector@0.0.0.0:4040
INFO : org.apache.spark.util.Utils - Successfully started service 'SparkUI' on 
port 4040.
INFO : org.apache.spark.ui.SparkUI - Started SparkUI at 
http://ip-10-0-0-XXX.us-west-2.compute.internal:4040
INFO : org.apache.spark.SparkContext - Added JAR 
file:/home/ec2-user/CE/correlationengine/scripts/../target/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar
 at 
http://10.0.0.XXX:46671/jars/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar
 with timestamp 1444620509463
INFO : org.apache.spark.scheduler.cluster.YarnClusterScheduler - Created 
YarnClusterScheduler
ERROR: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend - 
Application ID is not set.
INFO : org.apache.spark.network.netty.NettyBlockTransferService - Server 
created on 33880
INFO : org.apache.spark.storage.BlockManagerMaster - Trying to register 
BlockManager
INFO : org.apache.spark.storage.BlockManagerMasterActor - Registering block 
manager ip-10-0-0-XXX.us-west-2.compute.internal:33880 with 265.4 MB RAM, 
BlockManagerId(, ip-10-0-0-XXX.us-west-2.compute.internal, 33880)
INFO : org.apache.spark.storage.BlockManagerMaster - Registered BlockManager
INFO : org.apache.spark.scheduler.EventLoggingListener - Logging events to 
hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/spark/applicationHistory/spark-application-1444620509497
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.markmonitor.antifraud.ce.KafkaURLStreaming.main(KafkaURLStreaming.java:91)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at