Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device

2021-03-08 Thread Sachit Murarka
Thanks Sean.

Kind Regards,
Sachit Murarka


On Mon, Mar 8, 2021 at 6:23 PM Sean Owen  wrote:

> It's there in the error: No space left on device
> You ran out of disk space (local disk) on one of your machines.
>
> On Mon, Mar 8, 2021 at 2:02 AM Sachit Murarka 
> wrote:
>
>> Hi All,
>>
>> I am getting the following error in my spark job.
>>
>> Can someone please have a look ?
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
>> java.io.IOException: No space left on device
>> [full stack trace snipped; see the original message below]

Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device

2021-03-08 Thread Sean Owen
It's there in the error: No space left on device
You ran out of disk space (local disk) on one of your machines.
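A quick way to confirm on the affected node, as a minimal sketch assuming
the default /tmp scratch location (the real path comes from spark.local.dir
or SPARK_LOCAL_DIRS, or from the NodeManager local dirs on YARN):

df -h /tmp                                        # space on the scratch mount
du -sh /tmp/spark-* /tmp/blockmgr-* 2>/dev/null | sort -rh | head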

On Mon, Mar 8, 2021 at 2:02 AM Sachit Murarka 
wrote:

> Hi All,
>
> I am getting the following error in my spark job.
>
> Can someone please have a look ?
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
> java.io.IOException: No space left on device
> [full stack trace snipped; see the original message below]

Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device

2021-03-08 Thread Sachit Murarka
Hi Gourav,

I am using PySpark, Spark version 2.4.4.
I have checked, and it's not a space issue. Also, I am using a mount
directory for storing temp files.
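
Note for anyone hitting this: on YARN, spark.local.dir is ignored and
executors write to the NodeManager's yarn.nodemanager.local-dirs instead, so
the configured mount may not be the one filling up. "No space left on
device" can also mean the filesystem ran out of inodes rather than bytes. A
minimal check on whatever mount is actually used (/data/tmp is a placeholder
path):

df -h /data/tmp    # free blocks
df -i /data/tmp    # free inodes; 100% IUse% raises the same error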

Thanks
Sachit

On Mon, 8 Mar 2021, 13:53 Gourav Sengupta, 
wrote:

> Hi,
>
> it will be much help if you could at least format the message before
> asking people to go through it. Also I am pretty sure that the error is
> mentioned in the first line itself.
>
> Any ideas regarding the SPARK version, and environment that you are using?
>
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Mar 8, 2021 at 8:02 AM Sachit Murarka 
> wrote:
>
>> Hi All,
>>
>> I am getting the following error in my spark job.
>>
>> Can someone please have a look ?
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
>> java.io.IOException: No space left on device
>> [full stack trace snipped; see the original message below]

Re: com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device

2021-03-08 Thread Gourav Sengupta
Hi,

it would be a big help if you could at least format the message before
asking people to go through it. Also, I am pretty sure that the error is
mentioned in the first line itself.

Any idea regarding the Spark version and the environment that you are using?


Thanks and Regards,
Gourav Sengupta

On Mon, Mar 8, 2021 at 8:02 AM Sachit Murarka 
wrote:

> Hi All,
>
> I am getting the following error in my spark job.
>
> Can someone please have a look ?
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
> java.io.IOException: No space left on device
> [full stack trace snipped; see the original message below]

com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device

2021-03-08 Thread Sachit Murarka
Hi All,

I am getting the following error in my Spark job.

Can someone please have a look?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage
41.0 (TID 80817, executor 193): com.esotericsoftware.kryo.KryoException:
java.io.IOException: No space left on device
    at com.esotericsoftware.kryo.io.Output.flush(Output.java:188)
    at com.esotericsoftware.kryo.io.Output.require(Output.java:164)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:237)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:49)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:38)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
    at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:245)
    at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:134)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:241)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:240)
    at com.esotericsoftware.kryo.io.Output.flush(Output.java:186)
    ... 19 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:166)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread shenyan zhen
Thank you both - yup: the /tmp disk space was filled up:)

On Sun, Sep 6, 2015 at 11:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Use the following command if needed:
> df -i /tmp
>
> See
> https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available
>
> On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> The folder is in "/tmp" by default. Could you use "df -h" to check the
>> free space of /tmp?
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-09-05 9:50 GMT+08:00 shenyan zhen <shenya...@gmail.com>:
>>
>>> Has anyone seen this error? Not sure which dir the program was trying to
>>> write to.
>>>
>>> I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client
>>> mode.
>>>
>>> [log and stack trace snipped; see the original message below]
>>>
>>> Thanks,
>>> Shenyan
>>>
>>
>>
>


Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread Shixiong Zhu
The folder is in "/tmp" by default. Could you use "df -h" to check the free
space of /tmp?

Best Regards,
Shixiong Zhu

2015-09-05 9:50 GMT+08:00 shenyan zhen <shenya...@gmail.com>:

> Has anyone seen this error? Not sure which dir the program was trying to
> write to.
>
> I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client
> mode.
>
> [log and stack trace snipped; see the original message below]
>
> Thanks,
> Shenyan
>


Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread Ted Yu
Use the following command if needed:
df -i /tmp

See
https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available
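
If df -h shows free space but df -i shows IUse% at 100%, the device ran out
of inodes, usually from very many small files. A minimal sketch for spotting
which /tmp subtree holds them (illustrative only):

for d in /tmp/*/; do
  printf '%8d %s\n' "$(find "$d" 2>/dev/null | wc -l)" "$d"
done | sort -rn | head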

On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> The folder is in "/tmp" by default. Could you use "df -h" to check the
> free space of /tmp?
>
> Best Regards,
> Shixiong Zhu
>
> 2015-09-05 9:50 GMT+08:00 shenyan zhen <shenya...@gmail.com>:
>
>> Has anyone seen this error? Not sure which dir the program was trying to
>> write to.
>>
>> I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client
>> mode.
>>
>> [log and stack trace snipped; see the original message below]
>>
>> Thanks,
>> Shenyan
>>
>
>


SparkContext initialization error- java.io.IOException: No space left on device

2015-09-04 Thread shenyan zhen
Has anyone seen this error? Not sure which dir the program was trying to
write to.

I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client mode.

15/09/04 21:36:06 ERROR SparkContext: Error adding jar
(java.io.IOException: No space left on device), was the --addJars option
used?

15/09/04 21:36:08 ERROR SparkContext: Error initializing SparkContext.

java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:300)
    at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:178)
    at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:213)
    at java.util.zip.ZipOutputStream.finish(ZipOutputStream.java:318)
    at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:163)
    at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:338)
    at org.apache.spark.deploy.yarn.Client.createConfArchive(Client.scala:432)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:338)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)

Thanks,
Shenyan


Re: java.io.IOException: No space left on device--regd.

2015-07-06 Thread Akhil Das
While the job is running, just look in the directory and see what's the
root cause (is it the logs? is it the shuffle? etc.). Here are a few
configuration options which you can try:

- Disable shuffle spill: spark.shuffle.spill=false (it might end up in an OOM)
- Enable log rotation:

sparkConf.set("spark.executor.logs.rolling.strategy", "size")
    .set("spark.executor.logs.rolling.size.maxBytes", "1024")
    .set("spark.executor.logs.rolling.maxRetainedFiles", "3")
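
To see what is actually growing while the job runs, a minimal sketch
assuming the default /tmp scratch location:

watch -n 30 'du -sh /tmp/spark-* /tmp/blockmgr-* 2>/dev/null | sort -rh | head'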


Thanks
Best Regards

On Mon, Jul 6, 2015 at 10:44 AM, Devarajan Srinivasan 
devathecool1...@gmail.com wrote:

 Hi ,

  I am trying to run an ETL on spark which involves expensive shuffle
 operation. Basically I require a self-join to be performed on a
 sparkDataFrame RDD . The job runs fine for around 15 hours and when the
 stage(which performs the sef-join) is about to complete, I get a 
 *java.io.IOException:
 No space left on device*. I initially thought this could be due  to
 *spark.local.dir* pointing to */tmp* directory which was configured with
 *2GB* of space, since this job requires expensive shuffles,spark
 requires  more space to write the  shuffle files. Hence I configured
 *spark.local.dir* to point to a different directory which has *1TB* of
 space. But still I get the same *no space left exception*. What could be
 the root cause of this issue?


 Thanks in advance.

 *Exception stacktrace:*

 [stack trace snipped; see the original message below]





Re: java.io.IOException: No space left on device--regd.

2015-07-06 Thread Akhil Das
You can also set these in the spark-env.sh file :

export SPARK_WORKER_DIR=/mnt/spark/
export SPARK_LOCAL_DIRS=/mnt/spark/
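
Note that SPARK_LOCAL_DIRS (and the spark.local.dir property) also accepts a
comma-separated list, which spreads shuffle files over several disks; the
mount points below are hypothetical:

export SPARK_LOCAL_DIRS=/mnt1/spark,/mnt2/spark,/mnt3/spark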



Thanks
Best Regards

On Mon, Jul 6, 2015 at 12:29 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 [quoted messages snipped; see the previous reply and the original message
 below]






java.io.IOException: No space left on device--regd.

2015-07-05 Thread Devarajan Srinivasan
Hi,

 I am trying to run an ETL on Spark which involves an expensive shuffle
operation. Basically I require a self-join to be performed on a Spark
DataFrame RDD. The job runs fine for around 15 hours, and when the stage
(which performs the self-join) is about to complete, I get a
*java.io.IOException: No space left on device*. I initially thought this
could be due to *spark.local.dir* pointing to the */tmp* directory, which
was configured with *2GB* of space; since this job requires expensive
shuffles, Spark needs more space to write the shuffle files. Hence I
configured *spark.local.dir* to point to a different directory which has
*1TB* of space. But I still get the same *no space left* exception. What
could be the root cause of this issue?
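
One thing worth verifying is that the setting actually took effect: on YARN,
*spark.local.dir* is ignored and the NodeManager's yarn.nodemanager.local-dirs
is used instead, and the value Spark sees is shown on the Environment tab of
the web UI. A minimal check on a worker while the job runs, with
/bigdisk/spark as a placeholder for the 1TB directory:

ls -d /bigdisk/spark/blockmgr-* 2>/dev/null   # present if the setting is honoured
ls -d /tmp/blockmgr-* 2>/dev/null             # present if Spark still uses /tmp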


Thanks in advance.

*Exception stacktrace:*

java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream$$anonfun$write$3.apply$mcV$sp(BlockObjectWriter.scala:87)
    at org.apache.spark.storage.DiskBlockObjectWriter.org$apache$spark$storage$DiskBlockObjectWriter$$callWithTiming(BlockObjectWriter.scala:229)
    at org.apache.spark.storage.DiskBlockObjectWriter$TimeTrackingOutputStream.write(BlockObjectWriter.scala:87)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:297)
    at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:244)
    at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:99)
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
    at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1285)
    at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1426)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:204)
    at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:370)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Re: java.io.IOException: No space left on device while doing repartitioning in Spark

2015-05-05 Thread Akhil Das
It could be filling up your /tmp directory. You need to set spark.local.dir
(or SPARK_WORKER_DIR) to another location which has sufficient space.
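
A minimal way to make that the default for every job, with /data/spark-tmp
as a placeholder path:

mkdir -p /data/spark-tmp
echo "spark.local.dir /data/spark-tmp" >> "$SPARK_HOME/conf/spark-defaults.conf"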

Thanks
Best Regards

On Mon, May 4, 2015 at 7:27 PM, shahab shahab.mok...@gmail.com wrote:

 Hi,

 I am getting No space left on device exception when doing repartitioning
  of approx. 285 MB of data  while these is still 2 GB space left ??

 does it mean that repartitioning needs more space (more than 2 GB) for
 repartitioning of 285 MB of data ??

 best,
 /Shahab

 [stack trace snipped; see the original message below]




java.io.IOException: No space left on device while doing repartitioning in Spark

2015-05-04 Thread shahab
Hi,

I am getting a "No space left on device" exception when doing repartitioning
of approx. 285 MB of data, while there is still 2 GB of space left.

Does it mean that repartitioning needs more space (more than 2 GB) to
repartition 285 MB of data?

best,
/Shahab

java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at 
sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:473)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:569)
at org.apache.spark.util.Utils$.copyStream(Utils.scala:331)
at 
org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$1.apply$mcVI$sp(ExternalSorter.scala:730)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at 
org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:728)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Re: java.io.IOException: No space left on device while doing repartitioning in Spark

2015-05-04 Thread Ted Yu
See
https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available

What's the value for spark.local.dir property ?

Cheers

On Mon, May 4, 2015 at 6:57 AM, shahab shahab.mok...@gmail.com wrote:

 Hi,

 I am getting No space left on device exception when doing repartitioning
  of approx. 285 MB of data  while these is still 2 GB space left ??

 does it mean that repartitioning needs more space (more than 2 GB) for
 repartitioning of 285 MB of data ??

 best,
 /Shahab

  [stack trace snipped; see the original message above]




Re: java.io.IOException: No space left on device

2015-04-29 Thread Dean Wampler
Or multiple volumes. The LOCAL_DIRS (YARN) and SPARK_LOCAL_DIRS (Mesos,
Standalone) environment variables and the spark.local.dir property control
where temporary data is written. The default is /tmp.

See
http://spark.apache.org/docs/latest/configuration.html#runtime-environment
for more details.
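
For example, to point a single application at two larger volumes (the mount
points and jar name are hypothetical):

./spark-submit --conf spark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark my-app.jar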

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 29, 2015 at 6:19 AM, Anshul Singhle ans...@betaglide.com
wrote:

 Do you have multiple disks? Maybe your work directory is not in the right
 disk?

 On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com
 wrote:

 Hi,

 I'm using spark (1.3.1) MLlib to run random forest algorithm on tfidf
 output,the training data is a file containing 156060 (size 8.1M).

  [rest of the quoted message snipped; see the original message below]





Re: java.io.IOException: No space left on device

2015-04-29 Thread Dean Wampler
Makes sense. / is where /tmp would be. However, 230G should be plenty of
space. If you have INFO logging turned on (set in
$SPARK_HOME/conf/log4j.properties), you'll see messages about saving data
to disk that will list sizes. The web console also has some summary
information about this.
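
To turn INFO logging on, a minimal sketch assuming the stock template that
ships with Spark (the exact root-category line can differ between versions):

cd "$SPARK_HOME/conf"
cp log4j.properties.template log4j.properties
# then edit log4j.properties so the root logger is INFO, e.g.:
#   log4j.rootCategory=INFO, console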

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 29, 2015 at 6:25 AM, selim namsi selim.na...@gmail.com wrote:

 This is the output of df -h so as you can see I'm using only one disk
 mounted on /

  df -h
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda8       276G   34G  229G  13% /
  none            4.0K     0  4.0K   0% /sys/fs/cgroup
  udev            7.8G  4.0K  7.8G   1% /dev
  tmpfs           1.6G  1.4M  1.6G   1% /run
  none            5.0M     0  5.0M   0% /run/lock
  none            7.8G   37M  7.8G   1% /run/shm
  none            100M   40K  100M   1% /run/user
  /dev/sda1       496M   55M  442M  11% /boot/efi

 Also when running the program, I noticed that the Used% disk space related
 to the partition mounted on / was growing very fast

 On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle ans...@betaglide.com
 wrote:

 Do you have multiple disks? Maybe your work directory is not in the right
 disk?

  On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com
  wrote:

  [quoted message snipped; see the original message below]




java.io.IOException: No space left on device

2015-04-29 Thread Selim Namsi
Hi,

I'm using Spark (1.3.1) MLlib to run the random forest algorithm on tfidf
output; the training data is a file containing 156060 (size 8.1M).

The problem is that when trying to persist a partition into memory and there
is not enough memory, the partition is persisted on disk, and despite having
229G of free disk space, I got "No space left on device".

This is how I'm running the program : 

./spark-submit --class com.custom.sentimentAnalysis.MainPipeline --master
local[2] --driver-memory 5g ml_pipeline.jar labeledTrainData.tsv
testData.tsv

And this is a part of the log:



If you need more information, please let me know.
Thanks






Re: java.io.IOException: No space left on device

2015-04-29 Thread Anshul Singhle
Do you have multiple disks? Maybe your work directory is not on the right
disk?

On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com wrote:

 Hi,

 I'm using spark (1.3.1) MLlib to run random forest algorithm on tfidf
 output,the training data is a file containing 156060 (size 8.1M).

  [rest of the quoted message snipped; see the original message above]




Re: java.io.IOException: No space left on device

2015-04-29 Thread selim namsi
This is the output of df -h so as you can see I'm using only one disk
mounted on /

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8       276G   34G  229G  13% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.8G  4.0K  7.8G   1% /dev
tmpfs           1.6G  1.4M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.8G   37M  7.8G   1% /run/shm
none            100M   40K  100M   1% /run/user
/dev/sda1       496M   55M  442M  11% /boot/efi

Also when running the program, I noticed that the Used% disk space related
to the partition mounted on / was growing very fast
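
For anyone reproducing this, a minimal way to watch that fill rate while the
job runs:

watch -n 10 df -h /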

On Wed, Apr 29, 2015 at 12:19 PM Anshul Singhle ans...@betaglide.com
wrote:

 Do you have multiple disks? Maybe your work directory is not in the right
 disk?

 On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com
 wrote:

  [quoted message snipped; see the original message above]




Re: java.io.IOException: No space left on device

2015-04-29 Thread selim namsi
Sorry, I put the log messages when creating the thread at
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-No-space-left-on-device-td22702.html
but I forgot that raw messages are not sent in emails.

So this is the log related to the error :

15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_0 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_0 locally
15/04/29 02:48:50 INFO CacheManager: Partition rdd_19_1 not found, computing it
15/04/29 02:48:50 INFO BlockManager: Found block rdd_15_1 locally
15/04/29 02:49:13 WARN MemoryStore: Not enough space to cache rdd_19_1
in memory! (computed 1106.0 MB so far)
15/04/29 02:49:13 INFO MemoryStore: Memory use = 234.0 MB (blocks) +
2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage
limit = 3.1 GB.
15/04/29 02:49:13 WARN CacheManager: Persisting partition rdd_19_1 to
disk instead.
15/04/29 02:49:28 WARN MemoryStore: Not enough space to cache rdd_19_0
in memory! (computed 1745.7 MB so far)
15/04/29 02:49:28 INFO MemoryStore: Memory use = 234.0 MB (blocks) +
2.6 GB (scratch space shared across 2 thread(s)) = 2.9 GB. Storage
limit = 3.1 GB.
15/04/29 02:49:28 WARN CacheManager: Persisting partition rdd_19_0 to
disk instead.
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_0 failed
15/04/29 03:56:12 WARN BlockManager: Putting block rdd_19_1 failed
15/04/29 03:56:12 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 7)
java.io.IOException: No space left on device

It seems that the partitions rdd_19_0 and rdd_19_1 together need 2.9 GB.

Thanks


On Wed, Apr 29, 2015 at 12:34 PM Dean Wampler deanwamp...@gmail.com wrote:

 Makes sense. / is where /tmp would be. However, 230G should be plenty of
 space. [rest of the quoted messages snipped; see the previous replies
 above]