Github user bonitao commented on the issue:

    https://github.com/apache/spark/pull/11748

Hi @JoshRosen, I am trying Spark 2.0 and I believe I am hitting a bug that was introduced in this commit. In summary: when Kryo serialization is enabled and you persist an RDD with fewer elements than the default parallelism, at least one partition is empty; when the zero-byte block for that empty partition is read back from disk, Spark attempts to create an empty `ChunkedByteBuffer`, and this code throws "chunks must be non-empty". If you believe there is a better forum for me to discuss this, let me know. Happy to contribute pull requests if appropriate.

The problem is easy to reproduce. First, open a Spark shell:

```
spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism=2
```

Then persist an RDD with a single element to disk and count it (two or more elements work fine, and non-Kryo serialization works fine):

```
sc.makeRDD("element" :: Nil).persist(org.apache.spark.storage.StorageLevel.DISK_ONLY).count
```

And you get back:

```
[Stage 0:> (0 + 0) / 2]ERROR [12:35:15.701] [Executor task launch worker-0] org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: requirement failed: chunks must be non-empty
    at scala.Predef$.require(Predef.scala:224) ~[scala-library-2.11.8.jar:na]
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:41) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:52) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:101) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1286) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:439) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:672) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.scheduler.Task.run(Task.scala:85) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) ~[spark-core_2.11-2.0.0.jar:2.0.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
ERROR [12:35:15.743] [task-result-getter-1] org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 0.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: requirement failed: chunks must be non-empty
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:41)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:52)
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:101)
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1286)
    at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
    at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:439)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:672)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1872)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1885)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1898)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1912)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1111)
    ... 48 elided
Caused by: java.lang.IllegalArgumentException: requirement failed: chunks must be non-empty
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:41)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:52)
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:101)
    at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1286)
    at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
    at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:439)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:672)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

I am on a Mac, and I used the stock preview binary from `http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-preview-bin-hadoop2.7.tgz`. A custom-built v2.0.0-rc1 behaved the same on a Linux box. The 1.6.x series has no problems.
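To poke at the failing invariant without a full Spark build, here is a minimal, self-contained Scala sketch. `ChunkedBuffer` and `EmptyBlockRepro` are simplified stand-ins I made up for this comment, not Spark's actual `ChunkedByteBuffer`; the sketch only models the `require` that fails in the trace above, plus one hypothetical way a zero-byte block could be represented so the invariant holds.

```scala
import java.nio.ByteBuffer

// Simplified stand-in for org.apache.spark.util.io.ChunkedByteBuffer:
// it keeps only the invariant that fails in the stack trace above.
class ChunkedBuffer(val chunks: Array[ByteBuffer]) {
  require(chunks != null, "chunks must not be null")
  require(chunks.nonEmpty, "chunks must be non-empty")
  def size: Long = chunks.map(_.limit().toLong).sum
}

object EmptyBlockRepro {
  def main(args: Array[String]): Unit = {
    // An empty partition serializes to a zero-byte file, so the disk read
    // path ends up with no chunks at all:
    try {
      new ChunkedBuffer(Array.empty[ByteBuffer])
    } catch {
      case e: IllegalArgumentException =>
        println(e.getMessage) // requirement failed: chunks must be non-empty
    }

    // Hypothetical guard: represent a zero-byte block as a single empty
    // chunk so the constructor's invariant holds. The real fix may differ.
    val guarded = new ChunkedBuffer(Array(ByteBuffer.allocate(0)))
    println(guarded.size) // 0
  }
}
```

If zero-length blocks are never supposed to reach this constructor, then a guard along these lines would presumably belong in the `DiskStore.getBytes` path shown in the trace instead, but I will leave that call to the maintainers.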