Tim, thanks for your reply. I am following this quite clear Mesos-Spark tutorial: https://docs.mesosphere.com/tutorials/run-spark-on-mesos/ Mainly, I tried running spark-shell, which works fine locally, but when the jobs are submitted through Mesos something goes wrong.
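For reference, this is roughly how I launch it following the tutorial (the hostnames, port, and jar name below are placeholders rather than my exact values):

-------------------------------------------------------------------
# conf/spark-env.sh on the driver machine
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://namenode:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz

# interactive shell against the Mesos master
./bin/spark-shell --master mesos://mesos-master:5050

# batch submission, e.g. the SparkPi example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master mesos://mesos-master:5050 \
    lib/spark-examples-*.jar 10
-------------------------------------------------------------------

The same commands with --master local[*] complete without any problem.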
My question is: is there some extra configuration needed for the workers that is not mentioned in the tutorial? The "Executor Lost" message I get is really generic, so I don't know what's going on. Please check the attached Mesos execution event log.

Thanks again,
Panagiotis

On Wed, May 20, 2015 at 8:21 AM, Tim Chen <t...@mesosphere.io> wrote:

> Can you share your exact spark-submit command line?
>
> Also, cluster mode is not released yet (it is coming in 1.4) and doesn't
> support spark-shell, so I think you're just using client mode unless
> you're running the latest master.
>
> Tim
>
> On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis <
> panga...@gmail.com> wrote:
>
>> Hello all,
>>
>> I have been facing a weird issue for the last couple of days running
>> Spark on top of Mesos, and I need your help. I am running Mesos in a
>> private cluster and managed to deploy hdfs, cassandra, marathon and play
>> successfully, but Spark is not working for some reason. So far I have
>> tried: different Java versions (1.6 and 1.7, Oracle and OpenJDK),
>> different spark-env configurations, different Spark versions (from 0.8.8
>> to 1.3.1), different HDFS versions (hadoop 5.1 and 4.6), and updating
>> pom dependencies.
>>
>> More specifically, while local tasks complete fine, in cluster mode all
>> the tasks get lost (both using spark-shell and spark-submit).
>> From the worker log I see something like this:
>>
>> -------------------------------------------------------------------
>> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
>> 'hdfs:/XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
>> 'hdfs://XXXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
>> Client
>> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
>> 'hdfs://XXXXXXXX:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> into
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
>> Error: Could not find or load main class two
>> -------------------------------------------------------------------
>>
>> And from the Spark terminal:
>>
>> -------------------------------------------------------------------
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 7.3 in stage 0.0 (TID 26, XXXXXXXX):
>> ExecutorLostFailure (executor lost)
>> Driver stacktrace:
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> ......
>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> -------------------------------------------------------------------
>>
>> Any help will be greatly appreciated!
>>
>> Regards,
>> Panagiotis
Attachment: -sparklogs-spark-shell-1431993674182-EVENT_LOG_1
Description: Binary data