[ 
https://issues.apache.org/jira/browse/SPARK-34788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-34788:
-------------------------
    Description: 
When the disk is full, Spark throws a FileNotFoundException instead of an IOException with a disk-full hint, which is quite confusing to users:
{code:java}
9/03/26 09:03:45 ERROR ShuffleBlockFetcherIterator: Failed to create input stream from local block
java.io.IOException: Error in reading FileSegmentManagedBuffer{file=/local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data, offset=110254956, length=1875458}
        at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:111)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:442)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:622)
        at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:98)
        at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:95)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
        at org.apache.spark.scheduler.Task.run(Task.scala:112)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:100)
        ... 35 more{code}
And there's probably a way to detect a full disk: when we catch the `FileNotFoundException`, we can try the trick from [http://weblog.janek.org/Archive/2004/12/20/ExceptionWhenWritingToAFu.html] and check whether a SyncFailedException is thrown. If it is, we rethrow an IOException with a disk-full hint.
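A rough sketch of that check, for illustration only (the helper, the probe-file name, and where it gets called from are assumptions here, not the actual Spark change):
{code:scala}
import java.io.{File, FileNotFoundException, FileOutputStream, IOException, SyncFailedException}

object DiskFullCheck {
  // Hypothetical helper: after a shuffle file read fails with
  // FileNotFoundException, write and sync a tiny probe file in the same
  // directory. On a full disk, FileDescriptor.sync() is expected to throw
  // SyncFailedException, and we wrap the original error with a disk-full hint.
  def addDiskFullHintIfNeeded(dir: File, cause: FileNotFoundException): IOException = {
    val probe = new File(dir, ".disk-full-probe")  // hypothetical probe file name
    try {
      val out = new FileOutputStream(probe)
      try {
        out.write(1)          // write a single byte so there is something to sync
        out.getFD.sync()      // may throw SyncFailedException when the disk is full
      } finally {
        out.close()
        probe.delete()
      }
      cause                   // disk looks writable: keep the original exception
    } catch {
      case _: SyncFailedException =>
        new IOException(s"Failed to read $dir; the disk hosting it may be full", cause)
      case _: IOException =>
        cause                 // probe failed for some other reason: keep the original exception
    }
  }
}
{code}
Where exactly such a helper would be wired in (e.g. the FileNotFoundException handler around FileSegmentManagedBuffer.createInputStream) is left to the eventual fix; the point is only that a failed sync on the shuffle directory is a cheap signal that the disk is likely full.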

  was:
When the disk is full, Spark throws a FileNotFoundException instead of an IOException with a disk-full hint, which is quite confusing to users.

 

And there's probably a way to detect a full disk: when we catch the `FileNotFoundException`, we can try the trick from [http://weblog.janek.org/Archive/2004/12/20/ExceptionWhenWritingToAFu.html] and check whether a SyncFailedException is thrown. If it is, we rethrow an IOException with a disk-full hint.


> Spark throws FileNotFoundException instead of IOException when disk is full
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-34788
>                 URL: https://issues.apache.org/jira/browse/SPARK-34788
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: wuyi
>            Priority: Major
>
> When the disk is full, Spark throws a FileNotFoundException instead of an IOException with a disk-full hint, which is quite confusing to users:
> {code:java}
> 9/03/26 09:03:45 ERROR ShuffleBlockFetcherIterator: Failed to create input stream from local block
> java.io.IOException: Error in reading FileSegmentManagedBuffer{file=/local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data, offset=110254956, length=1875458}
>       at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:111)
>       at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:442)
>       at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
>       at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>       at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>       at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0$(Unknown Source)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:622)
>       at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:98)
>       at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:95)
>       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
>       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
>       at org.apache.spark.scheduler.Task.run(Task.scala:112)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: /local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data (No such file or directory)
>       at java.io.FileInputStream.open0(Native Method)
>       at java.io.FileInputStream.open(FileInputStream.java:195)
>       at java.io.FileInputStream.<init>(FileInputStream.java:138)
>       at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:100)
>       ... 35 more{code}
> And there's probably a way to detect a full disk: when we catch the `FileNotFoundException`, we can try the trick from [http://weblog.janek.org/Archive/2004/12/20/ExceptionWhenWritingToAFu.html] and check whether a SyncFailedException is thrown. If it is, we rethrow an IOException with a disk-full hint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
