[jira] [Commented] (SPARK-1284) pyspark hangs after IOError on Executor
[ https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097808#comment-14097808 ]

Jim Blomo commented on SPARK-1284:
----------------------------------

Hi, I am having trouble compiling either master or branch-1.1; I sent a request to the mailing list for help. Are there any compiled snapshots?

pyspark hangs after IOError on Executor
---------------------------------------

Key: SPARK-1284
URL: https://issues.apache.org/jira/browse/SPARK-1284
Project: Spark
Issue Type: Bug
Components: PySpark
Reporter: Jim Blomo
Assignee: Davies Liu

When running a reduceByKey over a cached RDD, Python fails with an exception, but the failure is not detected by the task runner. Spark and the pyspark shell hang waiting for the task to finish. The error is:

{code}
PySpark worker failed with exception:
Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/hadoop/spark/python/pyspark/serializers.py", line 182, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/hadoop/spark/python/pyspark/serializers.py", line 118, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/hadoop/spark/python/pyspark/serializers.py", line 130, in _write_with_length
    stream.write(serialized)
IOError: [Errno 104] Connection reset by peer
14/03/19 22:48:15 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 4257 bytes in 47 ms
Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/daemon.py", line 117, in launch_worker
    worker(listen_sock)
  File "/home/hadoop/spark/python/pyspark/daemon.py", line 107, in worker
    outfile.flush()
IOError: [Errno 32] Broken pipe
{code}

I can reproduce the error by running take(10) on the cached RDD before running reduceByKey (which looks at the whole input file).
Affects Version 1.0.0-SNAPSHOT (4d88030486)

--
This message was sent by Atlassian JIRA (v6.2#6252)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
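The two IOErrors above are two views of the same event: the JVM side tears down the worker socket, so the Python worker's write fails with [Errno 104] Connection reset by peer and the daemon's flush with [Errno 32] Broken pipe. A minimal stdlib sketch of that failure mode (not Spark code; the function name is made up, POSIX errno behavior assumed):

```python
import errno
import socket

def errno_after_peer_close():
    """Write into a socket whose peer has already closed, mimicking the
    PySpark worker whose JVM-side executor has torn down the connection."""
    worker_end, jvm_end = socket.socketpair()
    jvm_end.close()                      # the "executor" side goes away
    try:
        worker_end.sendall(b"serialized partition bytes")
    except OSError as exc:               # IOError on the Python 2 of this era
        return exc.errno
    finally:
        worker_end.close()
    return None
```

Depending on timing and platform, the errno observed is EPIPE (32) or ECONNRESET (104); the bug report shows one of each because the worker and the daemon hit the dead socket at different points.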
[jira] [Commented] (SPARK-1284) pyspark hangs after IOError on Executor
[ https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093219#comment-14093219 ]

Jim Blomo commented on SPARK-1284:
----------------------------------

I will try to reproduce on the 1.1 branch later this week, thanks for the update!
[jira] [Commented] (SPARK-1097) ConcurrentModificationException
[ https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020278#comment-14020278 ]

Jim Blomo commented on SPARK-1097:
----------------------------------

FYI, still seeing this on Spark 1.0, Hadoop 2.4:

{code:java}
java.util.ConcurrentModificationException
	java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
	java.util.HashMap$KeyIterator.next(HashMap.java:956)
	java.util.AbstractCollection.addAll(AbstractCollection.java:341)
	java.util.HashSet.<init>(HashSet.java:117)
	org.apache.hadoop.conf.Configuration.<init>(Configuration.java:671)
	com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:98)
	org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2402)
	org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2436)
	org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2418)
	org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:107)
	org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
	org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:190)
	org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
	org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:200)
	org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
	org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
	org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
	org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:174)
{code}

ConcurrentModificationException
-------------------------------

Key: SPARK-1097
URL: https://issues.apache.org/jira/browse/SPARK-1097
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
Attachments: nravi_Conf_Spark-1388.patch

{noformat}
14/02/16 08:18:45 WARN TaskSetManager: Loss was due to java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
	at java.util.HashMap$KeyIterator.next(HashMap.java:960)
	at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
	at java.util.HashSet.<init>(HashSet.java:117)
	at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
	at ...
{noformat}
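Both traces fail in the same place: Hadoop's Configuration constructor copies a HashMap-backed key set (HashSet.<init> calling addAll) while another thread mutates the map. Python's dict enforces the same invariant, which gives a compact illustration of why iteration blows up; this is only an analogy to the Java behavior, and the config keys are made up:

```python
def mutate_while_iterating():
    """Mutate a dict during iteration: the Python analogue of iterating a
    HashMap's key set while another thread adds entries, which is what
    Hadoop's Configuration copy constructor trips over in the trace above."""
    conf = {"fs.defaultFS": "hdfs://namenode", "io.sort.mb": "100"}
    try:
        for key in conf:
            conf[key + ".shadow"] = conf[key]   # mutation mid-iteration
    except RuntimeError as exc:
        return str(exc)                          # "dictionary changed size..."
    return None
```

In Java the modification comes from a genuinely concurrent thread rather than the loop body, which is why the fix is to synchronize or clone the Configuration before sharing it across tasks.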
[jira] [Created] (SPARK-1353) IllegalArgumentException when writing to disk
Jim Blomo created SPARK-1353:
--------------------------------

Summary: IllegalArgumentException when writing to disk
Key: SPARK-1353
URL: https://issues.apache.org/jira/browse/SPARK-1353
Project: Apache Spark
Issue Type: Bug
Components: Block Manager
Environment: AWS EMR 3.2.30-49.59.amzn1.x86_64 #1 SMP x86_64 GNU/Linux
Reporter: Jim Blomo
Priority: Minor

The Executor may fail when trying to mmap a file bigger than Integer.MAX_VALUE due to the constraints of FileChannel.map (http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,%20long,%20long)). The signature takes longs, but the size value must be less than MAX_VALUE. This manifests with the following backtrace:

{code}
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
	at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:98)
	at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:337)
	at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:281)
	at org.apache.spark.storage.BlockManager.get(BlockManager.scala:430)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:38)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
	at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:85)
{code}
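Because FileChannel.map rejects any region of Integer.MAX_VALUE bytes or more, one standard workaround is to map a large block in bounded windows rather than in one call. A stdlib Python sketch of that chunking idea (the function name is made up, and the cap is shrunk to 1 MiB so the demo runs quickly):

```python
import mmap
import os
import tempfile

# Stand-in for the JVM's 2 GiB FileChannel.map cap (Integer.MAX_VALUE),
# shrunk to 1 MiB for the demo. Offsets passed to mmap must be multiples
# of mmap.ALLOCATIONGRANULARITY; 1 MiB satisfies that on common platforms.
MAP_LIMIT = 1 << 20

def read_via_bounded_maps(path, length):
    """Read `length` bytes by mapping windows of at most MAP_LIMIT bytes,
    instead of one mapping of the whole region."""
    chunks = []
    with open(path, "rb") as f:
        offset = 0
        while offset < length:
            size = min(MAP_LIMIT, length - offset)
            with mmap.mmap(f.fileno(), size, offset=offset,
                           access=mmap.ACCESS_READ) as window:
                chunks.append(window[:])
            offset += size
    return b"".join(chunks)

# Round-trip check against a 2.5 MiB temp file (three mapped windows).
fd, path = tempfile.mkstemp()
data = bytes(range(256)) * 10240
os.write(fd, data)
os.close(fd)
assert read_via_bounded_maps(path, len(data)) == data
os.remove(path)
```

The same shape applies on the JVM: DiskStore would issue several FileChannel.map calls with sizes below Integer.MAX_VALUE and stitch the resulting buffers together.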