Can you share your exact spark-submit command line?

Also, cluster mode hasn't been released yet (it's coming in 1.4) and doesn't
support spark-shell, so I think you're actually running in client mode unless
you're on the latest master.
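
For reference, a client-mode submission against Mesos usually looks roughly
like the following (the master host, executor URI and example jar below are
placeholders, not your actual values):

  ./bin/spark-submit \
    --master mesos://<mesos-master>:5050 \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.uri=hdfs://<namenode>:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz \
    lib/spark-examples-*.jar 100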

Tim

On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis <panga...@gmail.com>
wrote:

> Hello all,
>
> I have been facing a weird issue for the last couple of days running Spark on
> top of Mesos, and I need your help. I am running Mesos in a private cluster and
> have successfully deployed HDFS, Cassandra, Marathon and Play, but Spark is not
> working for some reason. So far I have tried:
> different Java versions (1.6 and 1.7, Oracle and OpenJDK), different spark-env
> configurations, different Spark versions (from 0.8.8 to 1.3.1), different HDFS
> versions (Hadoop 5.1 and 4.6), and updating pom dependencies.
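>
> (For reference, the spark-env.sh I have been varying follows roughly this
> pattern; the libmesos path below is a placeholder rather than my exact value:
>
>   export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
>   export SPARK_EXECUTOR_URI=hdfs://XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz
> )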
>
> More specifically, while local tasks complete fine, in cluster mode all the
> tasks get lost (both with spark-shell and spark-submit).
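>
> (For what it's worth, the shell is launched against Mesos roughly like this,
> with the master host masked:
>
>   ./bin/spark-shell --master mesos://XXXXXXXX:5050
>
> while the local runs use plain ./bin/spark-shell with no master set.)
>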
> From the worker log I see something like this:
>
> -------------------------------------------------------------------
> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
> 'hdfs:/XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
> 'hdfs://XXXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
> Client
> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
> 'hdfs://XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> into
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
> *Error: Could not find or load main class two*
>
> -------------------------------------------------------------------
>
> And from the Spark Terminal:
>
> -------------------------------------------------------------------
> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
> SparkPi.scala:35
> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
> SparkPi.scala:35
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
> failure: Lost task 7.3 in stage 0.0 (TID 26, XXXXXXXX): ExecutorLostFailure
> (executor lost)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> ......
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> -------------------------------------------------------------------
>
> Any help will be greatly appreciated!
>
> Regards,
> Panagiotis
>
