[ https://issues.apache.org/jira/browse/SPARK-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263925#comment-14263925 ]
Brad Willard commented on SPARK-4779:
-------------------------------------

[~davies] I've already killed the environment and have moved on to Spark 1.2.0.

> PySpark Shuffle Fails Looking for Files that Don't Exist when low on Memory
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4779
>                 URL: https://issues.apache.org/jira/browse/SPARK-4779
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Shuffle
>    Affects Versions: 1.1.0
>         Environment: ec2 launched cluster with scripts
>                      6 Nodes
>                      c3.2xlarge
>            Reporter: Brad Willard
>
> When Spark is tight on memory, it starts reporting that shuffle files don't
> exist, causing tasks to fail and be recomputed, which destroys performance.
> The same code works flawlessly with smaller datasets, presumably because
> there is less memory pressure.
> {code}
> 14/12/06 18:39:37 WARN scheduler.TaskSetManager: Lost task 292.0 in stage 3.0 (TID 1099, ip-10-13-192-209.ec2.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/root/spark/python/pyspark/worker.py", line 79, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/root/spark/python/pyspark/serializers.py", line 196, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/root/spark/python/pyspark/serializers.py", line 127, in dump_stream
>     for obj in iterator:
>   File "/root/spark/python/pyspark/serializers.py", line 185, in _batched
>     for item in iterator:
>   File "/root/spark/python/pyspark/shuffle.py", line 370, in _external_items
>     self.mergeCombiners(self.serializer.load_stream(open(p)),
> IOError: [Errno 2] No such file or directory: '/mnt/spark/spark-local-20141206182702-8748/python/16070/66618000/1/18'
>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
>         org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:91)
>         org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:87)
>         org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>         scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
>         org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>         scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>         org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> {code}
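For reference, a minimal sketch of the kind of PySpark 1.1.x job that exercises the external-merge path ({{_external_items}} in pyspark/shuffle.py) shown in the traceback above. The dataset size, key count, and memory limit are illustrative assumptions, not taken from the original report:

{code}
# Hypothetical reproduction sketch (Python 2, Spark 1.1.x). Sizes and the
# memory limit below are illustrative, not from the original report.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("external-merge-sketch")
        # A low per-worker memory limit forces PySpark to spill partial
        # merges to the local dirs (e.g. /mnt/spark on ec2), the files the
        # traceback reports as missing.
        .set("spark.python.worker.memory", "64m"))
sc = SparkContext(conf=conf)

# Many keys with many values each, so groupByKey cannot hold the combined
# map in memory and must merge spilled files back in _external_items().
rdd = sc.parallelize(xrange(10 * 1000 * 1000), 200) \
        .map(lambda i: (i % 100000, i))
print rdd.groupByKey().mapValues(len).take(5)
{code}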