[jira] [Commented] (SPARK-52508) Read from fallback storage should consider replication delay

Aparna Garg (Jira) Sat, 20 Sep 2025 11:36:05 -0700


    [ https://issues.apache.org/jira/browse/SPARK-52508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021454#comment-18021454 ]


Aparna Garg commented on SPARK-52508:
-------------------------------------

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/51200

> Read from fallback storage should consider replication delay
> ------------------------------------------------------------
>
>                 Key: SPARK-52508
>                 URL: https://issues.apache.org/jira/browse/SPARK-52508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 4.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> When the storage decommissioning feature is used on Kubernetes with a
> distributed filesystem as the fallback storage, an executor may be unable
> to see shuffle data that the decommissioned executor has just written to
> that filesystem. This is caused by replication delay. Because the
> dependent executor knows that the shuffle data is located in the fallback
> storage, it can defer reading when it encounters a
> {{FileNotFoundException}}.
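
The deferral idea in the description can be illustrated with a small retry
loop. This is only a sketch and not the code from the pull request: the name
read_with_deferral, its parameters, and the fixed wait between attempts are
all hypothetical, and Python's FileNotFoundError stands in for the JVM's
FileNotFoundException.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_deferral(
    read: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `read`, waiting and retrying while the file is not yet visible.

    Hypothetical helper: the attempt limit and the fixed wait are
    illustrative assumptions, not settings taken from Spark.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except FileNotFoundError:
            # The data may simply not be replicated yet. Give up only
            # once the last allowed attempt has also failed.
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

A caller would pass a zero-argument function that opens the shuffle file on
the fallback storage. Only the "file not found" case is retried, so any other
I/O error still fails immediately, and a file that never appears still raises
once the attempts are exhausted.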



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
