Hi Stephen,

Sometimes it's just something simple, like a user-name problem or a missing
file dependency. (For example, by default the Mesos slave runs your executor
as the user that launched the driver, so that user has to exist on every
slave.)

Can you share what's in the stdout/stderr files in your task's sandbox
directory (available via the Mesos UI by clicking on the task and then on
Sandbox)?
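
If the UI is hard to reach, the sandbox also lives on the slave's disk. A
rough sketch, assuming the default work_dir of /tmp/mesos (the IDs below are
placeholders you can read off the UI or the slave log):

    # default layout under --work_dir; adjust /tmp/mesos if you changed it
    cd /tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest
    cat stdout stderr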

It would also be super helpful if you could find the slave.log on the slave
that ran one of your failed tasks, and locate the lines where it reported
TASK_FAILED or TASK_LOST for that task; they should say why the Mesos slave
couldn't run it.
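
For example, something along these lines (the log path is an assumption;
point it at wherever your slave actually writes its logs, e.g. your
--log_dir):

    # show a few lines of context around each failed status update
    grep -B2 -A5 -e TASK_FAILED -e TASK_LOST /var/log/mesos/mesos-slave.INFO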

Thanks,

Tim



On Fri, Apr 24, 2015 at 2:15 PM, Stephen Carman <scar...@coldlight.com>
wrote:

> So I can’t for the life of me get something even simple working for Spark
> on Mesos.
>
> I installed a 3-master, 3-slave Mesos cluster, which is all configured, but
> I can’t even get the Spark shell to work properly.
>
> I get errors like this:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 23, 10.253.1.117): ExecutorLostFailure (executor 20150424-104711-1375862026-5050-20113-S1 lost)
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> I tried both Mesos 0.21 and 0.22 and they both produce the same error…
>
> My version of Spark is 1.3.1 with Hadoop 2.6; I just downloaded the
> pre-built package from the site. Or is that wrong, and I have to build it
> myself?
>
> I have MESOS_NATIVE_JAVA_LIBRARY, the Spark executor URI, and the Mesos
> master set in my spark-env.sh; to the best of my ability they seem correct.
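> For reference, it looks roughly like this (the paths and hosts are
> placeholders, not my real values):
>
>     # spark-env.sh (sketch; substitute your own paths and master host)
>     export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
>     export SPARK_EXECUTOR_URI=hdfs://<namenode>/dist/spark-1.3.1-bin-hadoop2.6.tgz
>     export MASTER=mesos://<master-host>:5050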
>
> Does anyone have any insight into this at all? I’m running this on Red Hat
> 7 with 8 CPU cores and 14 GB of RAM per slave, so 24 cores and 42 GB of RAM
> total.
>
> Anyone have any idea at all what is going on here?
>
> Thanks,
> Steve
>
