Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
Thanks, Sean.

Kind Regards,
Sachit Murarka

On Mon, Mar 8, 2021 at 6:23 PM Sean Owen wrote:
> It's there in the error: No space left on device
> You ran out of disk space (local disk) on one of your machines.
>
> [quoted original message and stack trace trimmed; the full trace appears
> in the original message below]
Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
It's there in the error: No space left on device. You ran out of disk space (local disk) on one of your machines.

On Mon, Mar 8, 2021 at 2:02 AM Sachit Murarka wrote:
> Hi All,
>
> I am getting the following error in my Spark job.
>
> Can someone please have a look?
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
> java.io.IOException: No space left on device
>
> [stack trace trimmed; the full trace appears in the original message below]
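Sean's diagnosis can be verified directly on the worker machines. A minimal sketch for checking the disk Spark actually spills to; the paths below are assumptions (spark.local.dir defaults to /tmp, and on YARN the NodeManager local dirs apply instead), so adjust them to your cluster:

```shell
# Check free space on the directory Spark uses for shuffle spill
# (spark.local.dir, default /tmp).
df -h /tmp

# Scratch directories from running applications are usually the biggest
# consumers; list the largest ones (prints nothing if none exist):
du -sh /tmp/spark-* 2>/dev/null | sort -rh | head -5
```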
Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
Hi Gourav,

I am using PySpark, Spark version 2.4.4. I have checked, and it is not a space issue. Also, I am using a mount directory for storing temp files.

Thanks,
Sachit

On Mon, 8 Mar 2021, 13:53 Gourav Sengupta wrote:
> Hi,
>
> It would help a lot if you could at least format the message before
> asking people to go through it. Also, I am fairly sure that the error is
> mentioned in the first line itself.
>
> Any idea regarding the Spark version and environment that you are using?
>
> Thanks and Regards,
> Gourav Sengupta
>
> [quoted original message and stack trace trimmed; the full trace appears
> in the original message below]
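For readers hitting the same error, one hedged way to point Spark's scratch space at a larger mount from the submit command. The mount point /data/spark-scratch and the script name are assumptions; note also that on YARN, yarn.nodemanager.local-dirs takes precedence over spark.local.dir for executors:

```shell
# Hypothetical submission redirecting Spark's temp/shuffle directory
# away from a small /tmp; substitute a mount with enough free space.
spark-submit \
  --conf spark.local.dir=/data/spark-scratch \
  my_job.py
```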
Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
Hi,

It would help a lot if you could at least format the message before asking people to go through it. Also, I am fairly sure that the error is mentioned in the first line itself.

Any idea regarding the Spark version and environment that you are using?

Thanks and Regards,
Gourav Sengupta

On Mon, Mar 8, 2021 at 8:02 AM Sachit Murarka wrote:
> Hi All,
>
> I am getting the following error in my Spark job.
>
> Can someone please have a look?
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
> java.io.IOException: No space left on device
>
> [stack trace trimmed; the full trace appears in the original message below]
com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
Hi All,

I am getting the following error in my Spark job. Can someone please have a look?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
	at com.esotericsoftware.kryo.io.Output.flush(Output.java:188)
	at com.esotericsoftware.kryo.io.Output.require(Output.java:164)
	at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
	at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:237)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:49)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:38)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:245)
	at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:134)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:241)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:240)
	at com.esotericsoftware.kryo.io.Output.flush(Output.java:186)
	... 19 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke
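Besides live shuffle spill, stale scratch directories left behind by earlier (crashed) applications often account for the missing space. A hedged sketch for spotting them on a worker node; the path and the one-day age threshold are assumptions:

```shell
# List Spark scratch directories under /tmp older than one day.
# Review the output before deleting anything; to actually remove them,
# append `-exec rm -rf {} +` (only when no application is using them).
find /tmp -maxdepth 1 -name 'spark-*' -mtime +1 -print
```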
Re: SparkContext initialization error- java.io.IOException: No space left on device
Thank you both - yes, the /tmp disk space was filled up. :)

On Sun, Sep 6, 2015 at 11:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Use the following command if needed:
> df -i /tmp
>
> See
> https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available
>
> On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>> The folder is in "/tmp" by default. Could you use "df -h" to check the
>> free space of /tmp?
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> [quoted original message and stack trace trimmed; see the original
>> message below]
Re: SparkContext initialization error- java.io.IOException: No space left on device
The folder is in "/tmp" by default. Could you use "df -h" to check the free space of /tmp?

Best Regards,
Shixiong Zhu

2015-09-05 9:50 GMT+08:00 shenyan zhen <shenya...@gmail.com>:
> Has anyone seen this error? I am not sure which directory the program was
> trying to write to.
>
> I am running Spark 1.4.1, submitting the Spark job to YARN in yarn-client
> mode.
>
> [quoted stack trace trimmed; the full trace appears in the original
> message below]
Re: SparkContext initialization error- java.io.IOException: No space left on device
Use the following command if needed:

df -i /tmp

See
https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available

On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
> The folder is in "/tmp" by default. Could you use "df -h" to check the
> free space of /tmp?
>
> [quoted original message and stack trace trimmed; see the original
> message below]
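As the Gentoo page linked above explains, "No space left on device" can mean inode exhaustion as well as byte exhaustion, so it is worth running both checks side by side:

```shell
# Byte-level usage: look at the Use% column.
df -h /tmp

# Inode usage: look at the IUse% column. 100% here raises ENOSPC
# ("No space left on device") even when df -h shows free bytes.
df -i /tmp
```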
SparkContext initialization error- java.io.IOException: No space left on device
Has anyone seen this error? I am not sure which directory the program was trying to write to.

I am running Spark 1.4.1, submitting the Spark job to YARN in yarn-client mode.

15/09/04 21:36:06 ERROR SparkContext: Error adding jar (java.io.IOException: No space left on device), was the --addJars option used?

15/09/04 21:36:08 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:300)
	at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:178)
	at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:213)
	at java.util.zip.ZipOutputStream.finish(ZipOutputStream.java:318)
	at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:163)
	at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:338)
	at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:432)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:338)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)

Thanks,
Shenyan
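The trace fails inside Client.createConfArchive, i.e. while zipping the configuration on the submitting machine before upload to YARN, so here it is the driver host's temp directory that filled up. A hedged workaround, assuming a larger mount exists at /data/tmp (the paths, jar name, and Spark 1.4-era `--master yarn-client` syntax are illustrative):

```shell
# Redirect the driver's temp directory (used for the conf archive and
# local scratch) to a larger mount before submitting.
SPARK_LOCAL_DIRS=/data/tmp \
spark-submit \
  --master yarn-client \
  --driver-java-options "-Djava.io.tmpdir=/data/tmp" \
  my_job.jar
```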
Re: java.io.IOException: No space left on device--regd.
While the job is running, just look in the directory and see what's the root cause of it (is it the logs? is it the shuffle? etc). Here are a few configuration options you can try:

- Disable shuffle spill: spark.shuffle.spill=false (it might end up in OOM)
- Enable log rotation:
  sparkConf.set("spark.executor.logs.rolling.strategy", "size")
    .set("spark.executor.logs.rolling.size.maxBytes", "1024")
    .set("spark.executor.logs.rolling.maxRetainedFiles", "3")

Thanks
Best Regards

On Mon, Jul 6, 2015 at 10:44 AM, Devarajan Srinivasan devathecool1...@gmail.com wrote:
> Hi,
> I am trying to run an ETL on Spark which involves an expensive shuffle operation. Basically I require a self-join to be performed on a Spark DataFrame RDD. The job runs fine for around 15 hours, and when the stage (which performs the self-join) is about to complete, I get a *java.io.IOException: No space left on device*.
> I initially thought this could be due to *spark.local.dir* pointing to the */tmp* directory, which was configured with *2GB* of space; since this job requires expensive shuffles, Spark needs more space to write the shuffle files. Hence I configured *spark.local.dir* to point to a different directory which has *1TB* of space. But I still get the same *no space left* exception. What could be the root cause of this issue? Thanks in advance.
*Exception stacktrace:*

java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream$$anonfun$write$3.apply$mcV$sp(BlockObjectWriter.scala:87)
    at org.apache.spark.storage.DiskBlockObjectWriter.org$apache$spark$storage$DiskBlockObjectWriter$$callWithTiming(BlockObjectWriter.scala:229)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream.write(BlockObjectWriter.scala:87)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297)
    at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244)
    at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:99)
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
    at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1285)
    at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1426)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:204)
    at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:370)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
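The settings suggested in the reply above can also be passed at submit time rather than hard-coded in the application. A sketch, assuming the property names of the Spark 1.x releases discussed in this thread; the paths, byte sizes, class name and jar name are placeholders:

```shell
# Build a spark-submit invocation with log rotation and a roomier scratch dir.
# /data/spark-tmp is an assumed mount point with ample free space.
SUBMIT_CONF="\
--conf spark.local.dir=/data/spark-tmp \
--conf spark.executor.logs.rolling.strategy=size \
--conf spark.executor.logs.rolling.size.maxBytes=134217728 \
--conf spark.executor.logs.rolling.maxRetainedFiles=3"
echo "spark-submit $SUBMIT_CONF --class com.example.MyJob my-job.jar"
```

Passing these as `--conf` flags keeps the cluster-specific paths out of the compiled job, so the same jar can run on clusters with different disk layouts.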
Re: java.io.IOException: No space left on device--regd.
You can also set these in the spark-env.sh file:

export SPARK_WORKER_DIR=/mnt/spark/
export SPARK_LOCAL_DIRS=/mnt/spark/

Thanks
Best Regards

On Mon, Jul 6, 2015 at 12:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
> While the job is running, just look in the directory and see what's the root cause of it (is it the logs? is it the shuffle? etc). Here are a few configuration options you can try:
> (remainder of the message and the quoted exception stack trace as in the previous reply)
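The exports above would normally go in $SPARK_HOME/conf/spark-env.sh on every worker so each daemon picks them up at startup. A sketch that writes an example file locally; the /mnt/spark mount is taken from the message above and may differ on your cluster:

```shell
# Write an example spark-env.sh fragment; in a real cluster this would be
# $SPARK_HOME/conf/spark-env.sh on every worker node.
SPARK_ENV_EXAMPLE=./spark-env.sh.example
cat > "$SPARK_ENV_EXAMPLE" <<'EOF'
export SPARK_WORKER_DIR=/mnt/spark
export SPARK_LOCAL_DIRS=/mnt/spark
EOF
export_count=$(grep -c '^export' "$SPARK_ENV_EXAMPLE")
echo "$export_count exports written to $SPARK_ENV_EXAMPLE"
```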
java.io.IOException: No space left on device--regd.
Hi,

I am trying to run an ETL on Spark which involves an expensive shuffle operation. Basically I require a self-join to be performed on a Spark DataFrame RDD. The job runs fine for around 15 hours, and when the stage (which performs the self-join) is about to complete, I get a *java.io.IOException: No space left on device*.

I initially thought this could be due to *spark.local.dir* pointing to the */tmp* directory, which was configured with *2GB* of space; since this job requires expensive shuffles, Spark needs more space to write the shuffle files. Hence I configured *spark.local.dir* to point to a different directory which has *1TB* of space. But I still get the same *no space left* exception. What could be the root cause of this issue? Thanks in advance.

*Exception stacktrace:*

java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream$$anonfun$write$3.apply$mcV$sp(BlockObjectWriter.scala:87)
    at org.apache.spark.storage.DiskBlockObjectWriter.org$apache$spark$storage$DiskBlockObjectWriter$$callWithTiming(BlockObjectWriter.scala:229)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream.write(BlockObjectWriter.scala:87)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297)
    at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244)
    at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:99)
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
    at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1285)
    at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1426)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:204)
    at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:370)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Re: java.io.IOException: No space left on device while doing repartitioning in Spark
It could be filling up your /tmp directory. You need to set spark.local.dir, or you can also specify SPARK_WORKER_DIR, to point to another location which has sufficient space.

Thanks
Best Regards

On Mon, May 4, 2015 at 7:27 PM, shahab shahab.mok...@gmail.com wrote:
> Hi,
> I am getting a "No space left on device" exception when repartitioning approx. 285 MB of data while there is still 2 GB of space left. Does it mean that repartitioning needs more space (more than 2 GB) to repartition 285 MB of data?
>
> best,
> /Shahab
>
> java.io.IOException: No space left on device
> (full stack trace quoted in the original message below)
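When relocating spark.local.dir does not help, it is worth confirming which directory actually fills up while the stage runs. A sketch; the glob assumes the `spark-*` temp-directory naming these Spark versions use under the scratch dir:

```shell
# Snapshot the largest Spark scratch directories and the free space left.
SCRATCH="${SPARK_LOCAL_DIRS:-/tmp}"
du -sk "$SCRATCH"/spark-* 2>/dev/null | sort -rn | head -5
FREE_KB=$(df -Pk "$SCRATCH" | awk 'NR==2 {print $4}')
echo "free on $SCRATCH: ${FREE_KB} KB"
```

Running this every few minutes (for example under `watch`) shows whether shuffle files are outgrowing the volume, or whether something else on the disk is to blame.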
java.io.IOException: No space left on device while doing repartitioning in Spark
Hi,

I am getting a "No space left on device" exception when repartitioning approx. 285 MB of data while there is still 2 GB of space left. Does it mean that repartitioning needs more space (more than 2 GB) to repartition 285 MB of data?

best,
/Shahab

java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
    at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:473)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:569)
    at org.apache.spark.util.Utils$.copyStream(Utils.scala:331)
    at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$1.apply$mcVI$sp(ExternalSorter.scala:730)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:728)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Re: java.io.IOException: No space left on device while doing repartitioning in Spark
See https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available

What's the value of the spark.local.dir property?

Cheers

On Mon, May 4, 2015 at 6:57 AM, shahab shahab.mok...@gmail.com wrote:
> Hi,
> I am getting a "No space left on device" exception when repartitioning approx. 285 MB of data while there is still 2 GB of space left. Does it mean that repartitioning needs more space (more than 2 GB) to repartition 285 MB of data?
>
> best,
> /Shahab
>
> java.io.IOException: No space left on device
> (full stack trace quoted in the original message above)
Re: java.io.IOException: No space left on device
Or multiple volumes. The LOCAL_DIRS (YARN) and SPARK_LOCAL_DIRS (Mesos, Standalone) environment variables and the spark.local.dir property control where temporary data is written. The default is /tmp. See http://spark.apache.org/docs/latest/configuration.html#runtime-environment for more details.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 29, 2015 at 6:19 AM, Anshul Singhle ans...@betaglide.com wrote:
> Do you have multiple disks? Maybe your work directory is not on the right disk?
>
> On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com wrote:
>> Hi,
>> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF output; the training data is a file containing 156060 records (8.1 MB). The problem is that when Spark tries to persist a partition into memory and there is not enough memory, the partition is persisted on disk, and despite having 229G of free disk space, I get "No space left on device".
>>
>> This is how I'm running the program:
>> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv testData.tsv
>>
>> And this is a part of the log: If you need more information, please let me know.
>> Thanks
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-tp22702.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
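On the "multiple volumes" point above: spark.local.dir (and the corresponding environment variables) accept a comma-separated list, and Spark spreads shuffle and spill files across all of the listed directories. A sketch with hypothetical mount points:

```shell
# Two scratch volumes, comma-separated; Spark rotates output files
# across every directory in the list.
LOCAL_DIRS="/disk1/spark-tmp,/disk2/spark-tmp"
volume_count=$(printf '%s' "$LOCAL_DIRS" | awk -F',' '{print NF}')
echo "spark-submit --conf spark.local.dir=$LOCAL_DIRS ... ($volume_count volumes)"
```

Listing several physical disks both adds capacity and spreads I/O, which helps shuffle-heavy stages.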
Re: java.io.IOException: No space left on device
Makes sense. / is where /tmp would be. However, 230G should be plenty of space. If you have INFO logging turned on (set in $SPARK_HOME/conf/log4j.properties), you'll see messages about saving data to disk that will list sizes. The web console also has some summary information about this.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 29, 2015 at 6:25 AM, selim namsi selim.na...@gmail.com wrote:
> This is the output of df -h, so as you can see I'm using only one disk, mounted on /:
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda8       276G   34G  229G  13% /
> none            4.0K     0  4.0K   0% /sys/fs/cgroup
> udev            7.8G  4.0K  7.8G   1% /dev
> tmpfs           1.6G  1.4M  1.6G   1% /run
> none            5.0M     0  5.0M   0% /run/lock
> none            7.8G   37M  7.8G   1% /run/shm
> none            100M   40K  100M   1% /run/user
> /dev/sda1       496M   55M  442M  11% /boot/efi
>
> Also, while running the program I noticed that the Use% of the partition mounted on / was growing very fast.
>
> (earlier quoted messages trimmed; the original question appears below)
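To get the INFO-level spill messages mentioned above, the logging config can be set in conf/log4j.properties. A sketch that writes an example file locally, using log4j 1.x properties syntax (which the Spark versions in this thread ship with); the pattern layout is illustrative:

```shell
# Write an example log4j.properties; in a real deployment this would be
# $SPARK_HOME/conf/log4j.properties.
LOG4J_EXAMPLE=./log4j.properties.example
cat > "$LOG4J_EXAMPLE" <<'EOF'
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
line_count=$(grep -c '^log4j\.' "$LOG4J_EXAMPLE")
echo "$line_count log4j settings written"
```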
java.io.IOException: No space left on device
Hi,

I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF output; the training data is a file containing 156060 records (8.1 MB). The problem is that when Spark tries to persist a partition into memory and there is not enough memory, the partition is persisted on disk, and despite having 229G of free disk space, I get "No space left on device".

This is how I'm running the program:

./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv testData.tsv

And this is a part of the log: If you need more information, please let me know.

Thanks
Re: java.io.IOException: No space left on device
Do you have multiple disks? Maybe your work directory is not on the right disk?

On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com wrote:
> Hi,
> I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF output; the training data is a file containing 156060 records (8.1 MB). The problem is that when Spark tries to persist a partition into memory and there is not enough memory, the partition is persisted on disk, and despite having 229G of free disk space, I get "No space left on device".
>
> This is how I'm running the program:
> ./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv testData.tsv
>
> If you need more information, please let me know.
> Thanks
Re: java.io.IOException: No space left on device
This is the output of df -h, so as you can see I'm using only one disk, mounted on /:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8       276G   34G  229G  13% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.8G  4.0K  7.8G   1% /dev
tmpfs           1.6G  1.4M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.8G   37M  7.8G   1% /run/shm
none            100M   40K  100M   1% /run/user
/dev/sda1       496M   55M  442M  11% /boot/efi

Also, while running the program I noticed that the Use% of the partition mounted on / was growing very fast.

On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle ans...@betaglide.com wrote:
> Do you have multiple disks? Maybe your work directory is not on the right disk?
> (quoted original question trimmed; it appears in full above)
Re: java.io.IOException: No space left on device
Sorry, I put the log messages when creating the thread at http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-td22702.html but I forgot that raw messages are not sent in emails. So this is the log related to the error:

15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_0 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_0 locally
15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_1 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_1 locally
15/04/29 02:49:13 WARN MemoryStore: Not enough space to cache rdd_19_1 in memory! (computed 1106.0 MB so far)
15/04/29 02:49:13 INFO MemoryStore: Memory use = 234.0 MB (blocks) + 2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage limit = 3.1 GB.
15/04/29 02:49:13 WARN CacheManager: Persisting partition rdd_19_1 to disk instead.
15/04/29 02:49:28 WARN MemoryStore: Not enough space to cache rdd_19_0 in memory! (computed 1745.7 MB so far)
15/04/29 02:49:28 INFO MemoryStore: Memory use = 234.0 MB (blocks) + 2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage limit = 3.1 GB.
15/04/29 02:49:28 WARN CacheManager: Persisting partition rdd_19_0 to disk instead.
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_0 failed
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_1 failed
15/04/29 03:56:12 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 7)
java.io.IOException: No space left on device

It seems that partitions rdd_19_0 and rdd_19_1 together need 2.9 GB.

Thanks

On Wed, Apr 29, 2015 at 12:34 PM Dean Wampler deanwamp...@gmail.com wrote:
> Makes sense. / is where /tmp would be. However, 230G should be plenty of space. If you have INFO logging turned on (set in $SPARK_HOME/conf/log4j.properties), you'll see messages about saving data to disk that will list sizes. The web console also has some summary information about this.
>
> dean
>
> (signature and earlier quoted messages trimmed; they appear in full above)