Brad Willard created SPARK-4779:
-----------------------------------

             Summary: PySpark Shuffle Fails Looking for Files that Don't Exist When Low on Memory
                 Key: SPARK-4779
                 URL: https://issues.apache.org/jira/browse/SPARK-4779
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Shuffle
    Affects Versions: 1.1.0
         Environment: EC2 cluster launched with the spark-ec2 scripts
6 Nodes
c3.2xlarge

            Reporter: Brad Willard


When Spark is tight on memory, the shuffle starts reporting that files don't exist, causing tasks to fail and be recomputed, which destroys performance.

The same code works flawlessly on smaller datasets, presumably because there is less memory pressure.
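
For reference, a rough sketch of the kind of job that hits this (the input path and key function are hypothetical stand-ins, not the exact production code):

from pyspark import SparkContext

sc = SparkContext(appName="shuffle-repro")

# Hypothetical input, large enough that per-key aggregation spills
# to spark.local.dir (/mnt/spark/spark-local-* on this cluster).
lines = sc.textFile("s3n://some-bucket/large-dataset/*")

# Wide shuffle: under memory pressure PySpark's external merger
# (pyspark/shuffle.py) spills partial combiners to disk and later
# fails to re-open them.
pairs = lines.map(lambda line: (line.split(",")[0], 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

counts.count()  # triggers the shuffle and dies with the IOError below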

14/12/06 18:39:37 WARN scheduler.TaskSetManager: Lost task 292.0 in stage 3.0 (TID 1099, ip-10-13-192-209.ec2.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/root/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/root/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/root/spark/python/pyspark/serializers.py", line 127, in dump_stream
    for obj in iterator:
  File "/root/spark/python/pyspark/serializers.py", line 185, in _batched
    for item in iterator:
  File "/root/spark/python/pyspark/shuffle.py", line 370, in _external_items
    self.mergeCombiners(self.serializer.load_stream(open(p)),
IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-local-20141206182702-8748/python/16070/66618000/1/18'

        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:91)
        org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:87)
        org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
        scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
        org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
        scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
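
Possibly useful for triage: raising spark.python.worker.memory raises the threshold at which PySpark's external merger spills, which should reduce how often this code path is exercised. A sketch, assuming the stock 1.1.0 config keys (values are illustrative, not a verified workaround):

from pyspark import SparkConf, SparkContext

# Illustrative values only; not verified to avoid the bug.
conf = (SparkConf()
        .set("spark.python.worker.memory", "1g")   # spill threshold per Python worker (default 512m)
        .set("spark.executor.memory", "12g"))
sc = SparkContext(conf=conf)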


