[ https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093219#comment-14093219 ]
Jim Blomo commented on SPARK-1284:
----------------------------------

I will try to reproduce on the 1.1 branch later this week, thanks for the update!

> pyspark hangs after IOError on Executor
> ---------------------------------------
>
>                 Key: SPARK-1284
>                 URL: https://issues.apache.org/jira/browse/SPARK-1284
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Jim Blomo
>            Assignee: Davies Liu
>
> When running a reduceByKey over a cached RDD, Python fails with an exception,
> but the failure is not detected by the task runner. Spark and the pyspark
> shell hang waiting for the task to finish.
> The error is:
> {code}
> PySpark worker failed with exception:
> Traceback (most recent call last):
>   File "/home/hadoop/spark/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 118, in dump_stream
>     self._write_with_length(obj, stream)
>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 130, in _write_with_length
>     stream.write(serialized)
> IOError: [Errno 104] Connection reset by peer
> 14/03/19 22:48:15 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 4257 bytes in 47 ms
> Traceback (most recent call last):
>   File "/home/hadoop/spark/python/pyspark/daemon.py", line 117, in launch_worker
>     worker(listen_sock)
>   File "/home/hadoop/spark/python/pyspark/daemon.py", line 107, in worker
>     outfile.flush()
> IOError: [Errno 32] Broken pipe
> {code}
> I can reproduce the error by running take(10) on the cached RDD before
> running reduceByKey (which looks at the whole input file).
> Affects Version 1.0.0-SNAPSHOT (4d88030486)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
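
For readers unfamiliar with the two errno values in the traceback, here is a minimal, standard-library-only sketch of how a writer ends up with "Broken pipe" or "Connection reset by peer". It is not PySpark code, and it assumes (the report does not confirm this) that the worker was still writing to a socket whose reading side had already gone away. The function name `write_after_peer_closes` and its parameters are hypothetical.

```python
import socket


def write_after_peer_closes(chunk=b"x" * 65536, max_writes=1000):
    """Return the exception raised when writing to a peer that stopped reading.

    Hypothetical illustration only: the socket pair stands in for the
    connection between a worker process and whatever consumes its output.
    """
    reader, writer = socket.socketpair()
    try:
        writer.sendall(b"first records")
        reader.recv(5)   # the consumer reads only a few bytes ...
        reader.close()   # ... and then closes its end early
        for _ in range(max_writes):
            try:
                writer.sendall(chunk)  # the producer keeps writing
            except OSError as exc:     # IOError is an alias of OSError in Python 3
                return exc
        return None
    finally:
        writer.close()


if __name__ == "__main__":
    err = write_after_peer_closes()
    print(type(err).__name__, err)
```

The exact exception (EPIPE versus ECONNRESET) depends on the platform and on the socket type, so the sketch only relies on an `OSError` being raised. The sketch shows only the writer's side of the failure; it says nothing about why the task runner does not notice it.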