[ https://issues.apache.org/jira/browse/SPARK-34788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572506#comment-17572506 ]

wuyi commented on SPARK-34788:
------------------------------

> Why don't you ensure enough disks from the beginning?

On the write side, since Spark usually writes data to disk through iterators, 
it's not possible to know the size in advance.

On the read side (the error in this ticket is actually raised when reading a 
file), it seems like overkill to me to check whether the disk is full every 
time before reading a file.

 

> The goal of this Jira issue seems to aim at a `cosmetic` change, which does 
> not have much benefit.

Yeah, this won't fix the root cause, but it gives users a better error message.

> Spark throws FileNotFoundException instead of IOException when disk is full
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-34788
>                 URL: https://issues.apache.org/jira/browse/SPARK-34788
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: wuyi
>            Priority: Major
>
> When the disk is full, Spark throws FileNotFoundException instead of 
> IOException with a disk-full hint. It's quite a confusing error for users:
> {code:java}
> 9/03/26 09:03:45 ERROR ShuffleBlockFetcherIterator: Failed to create input 
> stream from local block
> java.io.IOException: Error in reading 
> FileSegmentManagedBuffer{file=/local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data,
>  offset=110254956, length=1875458}
>       at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:111)
>       at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:442)
>       at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
>       at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>       at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>       at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>       at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0$(Unknown
>  Source)
>       at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
>       at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:622)
>       at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:98)
>       at 
> org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:95)
>       at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
>       at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
>       at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
>       at org.apache.spark.scheduler.Task.run(Task.scala:112)
>       at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data
>  (No such file or directory)
>       at java.io.FileInputStream.open0(Native Method)
>       at java.io.FileInputStream.open(FileInputStream.java:195)
>       at java.io.FileInputStream.<init>(FileInputStream.java:138)
>       at 
> org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:100)
>       ... 35 more{code}
> (The cause only says that the file is not found, but after investigation we 
> believe it's highly likely due to the disk being full.)
> And there's probably a way to detect a full disk: when we get 
> `FileNotFoundException`, we try the approach described in 
> [http://weblog.janek.org/Archive/2004/12/20/ExceptionWhenWritingToAFu.html] 
> to see whether SyncFailedException is thrown. If it is, we throw IOException 
> with a disk-full hint.
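>
> For illustration only, a minimal sketch of that probe (hypothetical helper 
> names, not the actual Spark change): after catching FileNotFoundException, 
> write a tiny temp file next to the missing one and force it to the device 
> with FileDescriptor.sync(); if the open, write, or sync fails, re-throw as 
> an IOException carrying a disk-full hint.
> {code:java}
> import java.io.File;
> import java.io.FileNotFoundException;
> import java.io.FileOutputStream;
> import java.io.IOException;
>
> // Hypothetical helper, not part of Spark: probes whether the directory's
> // disk looks full by forcing a small write all the way to the device.
> public class DiskFullProbe {
>
>   static boolean looksLikeDiskFull(File dir) {
>     File probe = new File(dir, ".disk-full-probe-" + System.nanoTime());
>     try (FileOutputStream out = new FileOutputStream(probe)) {
>       out.write(new byte[4096]);   // may be buffered even when the disk is full
>       out.getFD().sync();          // force the write; a full disk can surface SyncFailedException here
>       return false;
>     } catch (IOException e) {
>       // SyncFailedException from sync(), FileNotFoundException on open, or
>       // "No space left on device" -- all hint that the disk may be full.
>       return true;
>     } finally {
>       probe.delete();
>     }
>   }
>
>   // Example of wrapping the original error with the proposed hint.
>   static void rethrowWithHint(File file, FileNotFoundException cause) throws IOException {
>     if (looksLikeDiskFull(file.getParentFile())) {
>       throw new IOException("Failed to open " + file + "; the disk appears to be full", cause);
>     }
>     throw cause;
>   }
> }
> {code}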


