GitHub user ivanwick commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39755988
  
    This patch fixes a bug with the PySpark shell running on Mesos.
    
    Without the spark.executor.uri property, PySpark reports lost tasks because the slave looks for spark-executor at the wrong path and can never start it. The driver logs several "Lost TID" and "Executor lost" messages while the scheduler re-queues the lost tasks; they fail again for the same reason, finally ending with:
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 539, in sum
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 505, in reduce
        vals = self.mapPartitions(func).collect()
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 469, in collect
        bytesInJava = self._jrdd.collect().iterator()
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError14/04/05 14:10:48 INFO TaskSetManager: Re-queueing tasks for 201404020012-1174907072-5050-22936-8 from TaskSet 0.0
    14/04/05 14:10:48 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    : An error occurred while calling o21.collect.
    : org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 4 times (most recent failure: unknown)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    ```
    
    The stderr of each slave in the Mesos framework reports:
    ```
    sh: 1: /opt/spark/spark-0.9.0-incubating-bin-cdh4/sbin/spark-executor: not found
    ```
    because this path doesn't exist on the slave nodes (it happens to be the path where Spark is installed on the head node).
    
    When spark.executor.uri is set, as it is with the Scala REPL, Mesos can download the Spark distribution package and run it from the framework temp directory on the slave.
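    
    For reference, a minimal sketch of the same workaround in a standalone PySpark program (not part of this patch; the Mesos master URL and the tarball URI below are placeholders — the property just needs to point to a Spark distribution the slaves can fetch):
    ```python
    from pyspark import SparkConf, SparkContext
    
    # Point executors at a Spark distribution the Mesos slaves can download,
    # instead of a filesystem path that only exists on the head node.
    conf = (SparkConf()
            .setMaster("mesos://master.example.com:5050")  # placeholder master URL
            .setAppName("executor-uri-example")
            .set("spark.executor.uri",
                 "hdfs://namenode.example.com/dist/spark-0.9.0-incubating-bin-cdh4.tgz"))  # placeholder URI
    
    sc = SparkContext(conf=conf)
    # With the URI set, this no longer loses tasks on Mesos.
    print(sc.parallelize(range(10)).sum())
    ```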

