[jira] [Commented] (SPARK-1284) pyspark hangs after IOError on Executor

2014-08-14 Thread Jim Blomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097808#comment-14097808
 ] 

Jim Blomo commented on SPARK-1284:
--

Hi, I'm having trouble compiling either master or branch-1.1, so I sent a request to 
the mailing list for help.  Are there any compiled snapshots available?

 pyspark hangs after IOError on Executor
 ---

 Key: SPARK-1284
 URL: https://issues.apache.org/jira/browse/SPARK-1284
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Reporter: Jim Blomo
Assignee: Davies Liu

 When running a reduceByKey over a cached RDD, Python fails with an exception, 
 but the failure is not detected by the task runner.  Spark and the pyspark 
 shell hang waiting for the task to finish.
 The error is:
 {code}
 PySpark worker failed with exception:
 Traceback (most recent call last):
   File "/home/hadoop/spark/python/pyspark/worker.py", line 77, in main
     serializer.dump_stream(func(split_index, iterator), outfile)
   File "/home/hadoop/spark/python/pyspark/serializers.py", line 182, in dump_stream
     self.serializer.dump_stream(self._batched(iterator), stream)
   File "/home/hadoop/spark/python/pyspark/serializers.py", line 118, in dump_stream
     self._write_with_length(obj, stream)
   File "/home/hadoop/spark/python/pyspark/serializers.py", line 130, in _write_with_length
     stream.write(serialized)
 IOError: [Errno 104] Connection reset by peer
 14/03/19 22:48:15 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 4257 bytes in 47 ms
 Traceback (most recent call last):
   File "/home/hadoop/spark/python/pyspark/daemon.py", line 117, in launch_worker
     worker(listen_sock)
   File "/home/hadoop/spark/python/pyspark/daemon.py", line 107, in worker
     outfile.flush()
 IOError: [Errno 32] Broken pipe
 {code}
 I can reproduce the error by running take(10) on the cached RDD before 
 running reduceByKey (which looks at the whole input file).
 Affects Version 1.0.0-SNAPSHOT (4d88030486)
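
A minimal PySpark sketch of the reproduction described above (the input path and key function are hypothetical; the bug is in the worker/daemon plumbing, not in this particular job):

{code:python}
# Hedged reproduction sketch (hypothetical input path and key function),
# following the steps above: cache an RDD, call take(10), then run a
# reduceByKey that scans the whole input.
from pyspark import SparkContext

sc = SparkContext(appName="spark-1284-repro")

lines = sc.textFile("/path/to/large_input.txt").cache()  # hypothetical path

lines.take(10)  # materializes only part of the cached RDD

# With the bug, an IOError in a PySpark worker during this full scan is not
# reported back to the task runner and the shell hangs waiting on the task.
counts = lines.map(lambda line: (line[:1], 1)) \
              .reduceByKey(lambda a, b: a + b)
print(counts.take(5))
{code}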



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1284) pyspark hangs after IOError on Executor

2014-08-11 Thread Jim Blomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093219#comment-14093219
 ] 

Jim Blomo commented on SPARK-1284:
--

I will try to reproduce this on the 1.1 branch later this week. Thanks for the 
update!






[jira] [Commented] (SPARK-1097) ConcurrentModificationException

2014-06-06 Thread Jim Blomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020278#comment-14020278
 ] 

Jim Blomo commented on SPARK-1097:
--

FYI, I'm still seeing this on Spark 1.0 with Hadoop 2.4:

{code:java}
java.util.ConcurrentModificationException 
(java.util.ConcurrentModificationException)
java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
java.util.HashMap$KeyIterator.next(HashMap.java:956)
java.util.AbstractCollection.addAll(AbstractCollection.java:341)
java.util.HashSet.&lt;init&gt;(HashSet.java:117)
org.apache.hadoop.conf.Configuration.&lt;init&gt;(Configuration.java:671)
com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:98)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2402)
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2436)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2418)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
org.apache.hadoop.mapred.LineRecordReader.&lt;init&gt;(LineRecordReader.java:107)
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
org.apache.spark.rdd.HadoopRDD$$anon$1.&lt;init&gt;(HadoopRDD.scala:190)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:200)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:174)
{code}
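
For context, a hedged sketch of the kind of PySpark job that exercises the code path in the trace above (the S3 paths are hypothetical): several Hadoop inputs read at once, so multiple tasks on one executor construct Hadoop Configuration objects concurrently when opening their splits.

{code:python}
# Hedged sketch of a job shape that hits the HadoopRDD/Configuration path in
# the trace above (hypothetical S3 paths). Many partitions are read
# concurrently per executor, each building a Hadoop Configuration on open.
from pyspark import SparkContext

sc = SparkContext(appName="spark-1097-shape")

inputs = ["s3://bucket/logs/part-%05d" % i for i in range(8)]  # hypothetical
rdd = sc.union([sc.textFile(path) for path in inputs])

print(rdd.map(lambda line: len(line)).sum())
{code}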

 ConcurrentModificationException
 ---

 Key: SPARK-1097
 URL: https://issues.apache.org/jira/browse/SPARK-1097
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
 Attachments: nravi_Conf_Spark-1388.patch


 {noformat}
 14/02/16 08:18:45 WARN TaskSetManager: Loss was due to 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
   at java.util.HashMap$KeyIterator.next(HashMap.java:960)
   at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
   at java.util.HashSet.&lt;init&gt;(HashSet.java:117)
   at org.apache.hadoop.conf.Configuration.&lt;init&gt;(Configuration.java:554)
   at org.apache.hadoop.mapred.JobConf.&lt;init&gt;(JobConf.java:439)
   at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
   at org.apache.spark.rdd.HadoopRDD$$anon$1.&lt;init&gt;(HadoopRDD.scala:154)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
   at org.apache.spark.scheduler.Task.run(Task.scala:53)
   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
   at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
   at 
 

[jira] [Created] (SPARK-1353) IllegalArgumentException when writing to disk

2014-03-30 Thread Jim Blomo (JIRA)
Jim Blomo created SPARK-1353:


 Summary: IllegalArgumentException when writing to disk
 Key: SPARK-1353
 URL: https://issues.apache.org/jira/browse/SPARK-1353
 Project: Apache Spark
  Issue Type: Bug
  Components: Block Manager
 Environment: AWS EMR 3.2.30-49.59.amzn1.x86_64 #1 SMP  x86_64 GNU/Linux
Reporter: Jim Blomo
Priority: Minor


The Executor may fail when trying to mmap a file bigger than Integer.MAX_VALUE 
due to the constraints of FileChannel.map 
(http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode, long, long)).  
The signature takes long arguments, but the size value must be less than 
Integer.MAX_VALUE.  This manifests with the following backtrace:

{code}
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:98)
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:337)
at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:281)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:430)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:38)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:85)
{code}
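
A hedged mitigation sketch in PySpark, on the assumption that the oversized block comes from a large cached partition: spread the data across more partitions so no single block approaches Integer.MAX_VALUE bytes (the path and partition count below are illustrative).

{code:python}
# Hedged mitigation sketch (hypothetical path, illustrative partition count):
# keep each cached block well under Integer.MAX_VALUE bytes by repartitioning.
from pyspark import SparkContext

sc = SparkContext(appName="spark-1353-mitigation")

big = sc.textFile("/path/to/huge_input.txt")      # hypothetical path
smaller_blocks = big.repartition(2048).cache()    # more, smaller partitions
print(smaller_blocks.count())
{code}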


