EnricoMi opened a new pull request, #51202: URL: https://github.com/apache/spark/pull/51202
### What changes were proposed in this pull request?

In the presence of a fallback storage, `ShuffleBlockFetcherIterator` can, on a fetch failure, optimistically try to read the block from the fallback storage, as the block might have been migrated there from a decommissioned executor. If shuffle migration happens **only** to the fallback storage (#51201), this optimistic assumption is even more likely to hold.

Note: this optimistic attempt to find the missing shuffle data on the fallback storage would collide with the replication delay handled in #51200.

### Why are the changes needed?

In a Kubernetes environment, executors may be decommissioned. With a fallback storage configured, shuffle data is migrated to other executors or to the fallback storage. A task that starts during the decommissioning of another executor might try to read blocks from that executor after it has been decommissioned, and the task does not know the new location of the migrated block. Given a fallback storage is configured, it can optimistically try to read the block from there. This avoids a stage retry, which is otherwise an expensive way to fetch the current block address after a block migration.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test and manual testing in a [Kubernetes setup](https://gist.github.com/EnricoMi/e9daa1176bce4c1211af3f3c5848112a).

### Was this patch authored or co-authored using generative AI tooling?

No
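The fetch-failure handling described above can be sketched as follows. This is a minimal, hypothetical Python illustration of the retry logic, not Spark's actual Scala implementation; the function and parameter names (`fetch_block`, `primary_fetch`, `fallback_fetch`) are invented for this sketch.

```python
# Hypothetical sketch of optimistic fallback-storage reads on fetch failure.
# Not Spark's real API: names and signatures here are illustrative only.

def fetch_block(block_id, primary_fetch, fallback_fetch):
    """Try the block's known executor location first; on a fetch failure,
    optimistically try the fallback storage, where the block may have been
    migrated after its executor was decommissioned."""
    try:
        return primary_fetch(block_id)
    except IOError:
        # The executor may have been decommissioned and the block migrated
        # to the fallback storage; try there before failing the stage.
        return fallback_fetch(block_id)


# Usage: the primary location fails (executor gone), the fallback succeeds,
# so the task completes without triggering an expensive stage retry.
def primary(block_id):
    raise IOError("connection refused: executor decommissioned")

def fallback(block_id):
    return b"migrated-shuffle-data"

data = fetch_block("shuffle_0_1_2", primary, fallback)
```

If the block is also missing from the fallback storage, the `fallback_fetch` call raises and the failure propagates as before, so the existing stage-retry path remains the last resort.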
