Hi Hussam,

Have you looked at the stdout and stderr files from the worker process? You can 
find them in the “work” directory under SPARK_HOME on the slave node. They 
might have some information about why it crashed. Otherwise, I’d recommend 
profiling the workers with tools like jmap or jstack to see which objects take 
up memory. A common cause is setting too low a level of parallelism.
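
If parallelism turns out to be the issue, one thing to try is passing an 
explicit slice count to parallelize so each partition stays small. A rough 
sketch, assuming a hypothetical master URL, output path, and data set (adjust 
all three for your cluster):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {
    public static void main(String[] args) {
        // Hypothetical master URL and app name for a standalone cluster.
        JavaSparkContext sc = new JavaSparkContext(
            "spark://poc1:7077", "OMDBQueryService");

        // Stand-in for the large List<String> being saved.
        List<String> data = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            data.add("record-" + i);
        }

        // Request an explicit number of slices instead of the default;
        // with 2 nodes x 16 cores, something like 64 keeps each
        // partition small enough to serialize and process comfortably.
        JavaRDD<String> rdd = sc.parallelize(data, 64);

        // Hypothetical HDFS output path.
        rdd.saveAsTextFile("hdfs://poc1:9000/user/output");

        sc.stop();
    }
}
```

The second argument to parallelize controls how many tasks the save runs as; 
the default can be far too coarse for a very large in-memory list.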

Matei

On Nov 12, 2013, at 8:53 AM, [email protected] wrote:

> Hi,
> 
> Using Spark 0.8 and Hadoop 1.2.1 on a cluster of 2 nodes, each with 16 CPUs 
> and 8G of RAM allocated.
> 
> I am running into a use case where, if I try to save a very large 
> JavaRDD<String> that was created using parallelize from a Java List<String>, 
> my job's workers fail as follows:
> 
> 13/11/11 19:23:48 INFO Worker: Executor app-20131111191414-0001/2 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 
> Looks like the Spark driver tries 5 times, then decides to kill the 
> process.
> 
> Any help on how to get more info on the reason for the failure, or on what 
> "code 1 exitStatus 1" would mean here?
> 
> Is there any setting or configuration I can use in Spark that would dump 
> more info on errors?
> 
> Here are my logs:
> 
> 13/11/11 19:14:50 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/0 for OMDBQueryService
> 13/11/11 19:14:50 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "0" "poc3" "16"
> 13/11/11 19:16:47 INFO Worker: Executor app-20131111190659-0000/0 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:16:47 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/2 for OMDBQueryService
> 13/11/11 19:16:47 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "2" "poc3" "16"
> 13/11/11 19:16:53 INFO Worker: Executor app-20131111190659-0000/2 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:16:53 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/4 for OMDBQueryService
> 13/11/11 19:16:53 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "4" "poc3" "16"
> 13/11/11 19:17:02 INFO Worker: Executor app-20131111190659-0000/4 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:02 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/6 for OMDBQueryService
> 13/11/11 19:17:02 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "6" "poc3" "16"
> 13/11/11 19:17:09 INFO Worker: Executor app-20131111190659-0000/6 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:09 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/8 for OMDBQueryService
> 13/11/11 19:17:09 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "8" "poc3" "16"
> 13/11/11 19:17:17 INFO Worker: Executor app-20131111190659-0000/8 finished 
> with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:17 INFO Worker: Asked to launch executor 
> app-20131111190659-0000/10 for OMDBQueryService
> 13/11/11 19:17:17 INFO ExecutorRunner: Launch command: "java" "-cp" 
> ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar"
>  "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" 
> "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" 
> "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" 
> "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" 
> "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" 
> "org.apache.spark.executor.StandaloneExecutorBackend" 
> "akka://spark@poc1:54482/user/StandaloneScheduler" "10" "poc3" "16"
> 13/11/11 19:17:20 INFO Worker: Asked to kill executor 
> app-20131111190659-0000/10
> 13/11/11 19:17:20 INFO ExecutorRunner: Killing process!
> 13/11/11 19:17:20 INFO ExecutorRunner: Runner thread for executor 
> app-20131111190659-0000/10 interrupted
> 13/11/11 19:17:21 INFO Worker: Executor app-20131111190659-0000/10 finished 
> with state KILLED
> 
> Thanks,
> Hussam