[ https://issues.apache.org/jira/browse/SPARK-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263925#comment-14263925 ]

Brad Willard commented on SPARK-4779:
-------------------------------------

[~davies] I've already killed the environment and have moved on to Spark 1.2.0.

> PySpark Shuffle Fails Looking for Files that Don't Exist when low on Memory
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4779
>                 URL: https://issues.apache.org/jira/browse/SPARK-4779
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Shuffle
>    Affects Versions: 1.1.0
>         Environment: EC2 cluster launched with the provided EC2 scripts
> 6 nodes
> c3.2xlarge
>            Reporter: Brad Willard
>
> When Spark is tight on memory, it starts reporting that shuffle files don't exist, causing tasks to fail and be recomputed, which destroys performance.
> The same code works flawlessly on smaller datasets, presumably because there is less memory pressure. A minimal sketch of the kind of job that hits this path is included after the stack trace below.
> {code}
> 14/12/06 18:39:37 WARN scheduler.TaskSetManager: Lost task 292.0 in stage 3.0 (TID 1099, ip-10-13-192-209.ec2.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/root/spark/python/pyspark/worker.py", line 79, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/root/spark/python/pyspark/serializers.py", line 196, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/root/spark/python/pyspark/serializers.py", line 127, in dump_stream
>     for obj in iterator:
>   File "/root/spark/python/pyspark/serializers.py", line 185, in _batched
>     for item in iterator:
>   File "/root/spark/python/pyspark/shuffle.py", line 370, in _external_items
>     self.mergeCombiners(self.serializer.load_stream(open(p)),
> IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-local-20141206182702-8748/python/16070/66618000/1/18'
>
>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
>         org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:91)
>         org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:87)
>         org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>         scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
>         org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>         scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>         org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> {code}
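> The following is a minimal sketch (not from the original report) of the kind of PySpark job that exercises this external-merge path: a wide groupByKey over more data than the Python worker can hold in memory forces pyspark/shuffle.py to spill to local disk, and those spill files are what the IOError above says have gone missing. The dataset size, partition count, app name, and the lowered spark.python.worker.memory setting are illustrative assumptions, not values from this cluster.
> {code}
> # Hedged reproduction sketch (Python 2, matching the Spark 1.1 deployment).
> # Sizes and settings below are illustrative, not taken from the report.
> from pyspark import SparkConf, SparkContext
>
> conf = (SparkConf()
>         .setAppName("pyspark-shuffle-spill-sketch")
>         # Keep the Python worker's aggregation memory small so it spills early.
>         .set("spark.python.worker.memory", "256m"))
> sc = SparkContext(conf=conf)
>
> # Many values per key: groupByKey merges them through the external merger in
> # pyspark/shuffle.py, which writes its spill files under spark.local.dir
> # (/mnt/spark/... on an EC2-launched cluster).
> pairs = sc.parallelize(xrange(10 ** 7), 200).map(lambda i: (i % 1000, i))
> result = pairs.groupByKey().mapValues(len).collect()
> {code}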


