Hi,

Our Spark job on YARN has suddenly started failing silently, without showing
any obvious error. The following is the trace from spark-submit in verbose mode:

Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/log4j.properties
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://ds-hnn002.dev.abc.com:18088
Adding default property: spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.ui.showConsoleProgress=true
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://my-hadoop-dev/user/spark/applicationHistory
Adding default property: spark.yarn.jar=local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
Parsed arguments:
  master                  yarn
  deployMode              null
  executorMemory          3G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            4G
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            30
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/home/xyz/code/updb/spark/updb2vw_testing.py
  name                    updb2vw_testing.py
  childArgs               [--date 2015-05-20]
  jars                    null
  packages                null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.jar -> local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
  spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.historyServer.address -> http://ds-hnn002.dev.abc.com:18088
  spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.eventLog.enabled -> true
  spark.ui.showConsoleProgress -> true
  spark.serializer -> org.apache.spark.serializer.KryoSerializer
  spark.executor.extraJavaOptions -> -Dlog4j.configuration=file:///etc/spark/log4j.properties
  spark.shuffle.service.enabled -> true
  spark.shuffle.service.port -> 7337
  spark.eventLog.dir -> hdfs://my-hadoop-dev/user/spark/applicationHistory
  spark.master -> yarn-client


Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/home/xyz/code/updb/spark/updb2vw_testing.py
null
--date
2015-05-20
System properties:
spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.driver.memory -> 4G
spark.executor.memory -> 3G
spark.yarn.jar -> local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.executor.instances -> 30
spark.yarn.historyServer.address -> http://ds-hnn002.dev.abc.com:18088
spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.ui.showConsoleProgress -> true
spark.eventLog.enabled -> true
spark.yarn.dist.files -> file:/home/xyz/code/updb/spark/updb2vw_testing.py
SPARK_SUBMIT -> true
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.executor.extraJavaOptions -> -Dlog4j.configuration=file:///etc/spark/log4j.properties
spark.shuffle.service.enabled -> true
spark.app.name -> updb2vw_testing.py
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://my-hadoop-dev/user/spark/applicationHistory
spark.master -> yarn-client
Classpath elements:



spark.akka.frameSize=60
spark.app.name=updb2vw_2015-05-20
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.driver.maxResultSize=2G
spark.driver.memory=4G
spark.eventLog.dir=hdfs://my-hadoop-dev/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/log4j.properties
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.instances=30
spark.executor.memory=3G
spark.master=yarn-client
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.manager=hash
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.task.maxFailures=6
spark.ui.showConsoleProgress=true
spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.yarn.dist.files=file:/home/xyz/code/updb/spark/updb2vw_testing.py
spark.yarn.executor.memoryOverhead=2000
spark.yarn.historyServer.address=http://ds-hnn002.dev.abc.com:18088
spark.yarn.jar=local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/06/22 17:04:45 WARN Utils: Your hostname, datasci01.dev.abc.com resolves to a loopback address: 127.0.0.1; using 10.0.3.197 instead (on interface eth0)
15/06/22 17:04:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Traceback (most recent call last):
  File "/home/xyz/code/updb/spark/updb2vw_testing.py", line 125, in <module>
    spark_context = pyspark.SparkContext(conf=conf)
  File "/usr/lib/spark/python/pyspark/context.py", line 111, in __init__
    conf, jsc, profiler_cls)
  File "/usr/lib/spark/python/pyspark/context.py", line 159, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/lib/spark/python/pyspark/context.py", line 212, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)

15/06/22 17:08:27 ERROR Utils: Uncaught exception in thread delete Spark local dirs
java.lang.NullPointerException
        at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
        at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
Exception in thread "delete Spark local dirs" java.lang.NullPointerException
        at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
        at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
        at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Any idea what's going on here?

Thanks


