Tim, thanks for your reply. I am following this quite clear Mesos-Spark tutorial: https://docs.mesosphere.com/tutorials/run-spark-on-mesos/ Mainly, I tried running spark-shell, which works fine locally, but when the jobs are submitted through Mesos something goes wrong.
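For reference, this is roughly how I launch it following the tutorial (the hostnames, port, and jar name below are placeholders rather than my exact values):

-------------------------------------------------------------------
# conf/spark-env.sh on the driver machine
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://namenode:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz

# interactive shell against the Mesos master
./bin/spark-shell --master mesos://mesos-master:5050

# batch submission, e.g. the SparkPi example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master mesos://mesos-master:5050 \
    lib/spark-examples-*.jar 10
-------------------------------------------------------------------

The same commands with --master local[*] complete without any problem.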
My question is: is there some extra configuration needed for the workers that is not mentioned in the tutorial? The "Executor Lost" message I get is really generic, so I don't know what's going on. Please check the attached Mesos execution event log.

Thanks again,
Panagiotis

On Wed, May 20, 2015 at 8:21 AM, Tim Chen <t...@mesosphere.io> wrote:

> Can you share your exact spark-submit command line?
>
> Also, cluster mode is not released yet (it is coming in 1.4) and doesn't
> support spark-shell, so I think you're just using client mode unless
> you're running the latest master.
>
> Tim
>
> On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis <
> panga...@gmail.com> wrote:
>
>> Hello all,
>>
>> I have been facing a weird issue for the last couple of days running
>> Spark on top of Mesos, and I need your help. I am running Mesos in a
>> private cluster and managed to deploy hdfs, cassandra, marathon and play
>> successfully, but Spark is not working for some reason. So far I have
>> tried: different Java versions (1.6 and 1.7, Oracle and OpenJDK),
>> different spark-env configurations, different Spark versions (from 0.8.8
>> to 1.3.1), different HDFS versions (hadoop 5.1 and 4.6), and updating
>> pom dependencies.
>>
>> More specifically, while local tasks complete fine, in cluster mode all
>> the tasks get lost (both using spark-shell and spark-submit).
>> From the worker log I see something like this:
>>
>> -------------------------------------------------------------------
>> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
>> 'hdfs:/XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
>> 'hdfs://XXXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
>> Client
>> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
>> 'hdfs://XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> into
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
>> Error: Could not find or load main class two
>> -------------------------------------------------------------------
>>
>> And from the Spark terminal:
>>
>> -------------------------------------------------------------------
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 7.3 in stage 0.0 (TID 26, XXXXXXXX):
>> ExecutorLostFailure (executor lost)
>> Driver stacktrace:
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> ......
>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> -------------------------------------------------------------------
>>
>> Any help will be greatly appreciated!
>>
>> Regards,
>> Panagiotis
Attachment: -sparklogs-spark-shell-1431993674182-EVENT_LOG_1
Description: Binary data