[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168287#comment-17168287 ]
Matt Cheah commented on SPARK-28210:
------------------------------------

[~devaraj] [~tianczha] Thanks for expressing interest in this! This patch is blocked on the shuffle metadata APIs patch, which can be found here: https://github.com/apache/spark/pull/28618

Once the shuffle metadata API change is merged, we can provide the appropriate reader APIs and then integrate the usage of shuffle metadata accordingly. I originally had a diff here: https://github.com/mccheah/spark/pull/12, but it has fallen far out of sync with the patches proposed against upstream Spark. The proposed reader API can be found in the shuffle API design document: https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#heading=h.cli6x4fsunz6

However, we cannot make any progress here until the shuffle metadata storage APIs and their integration are complete. Could you review the Apache patch listed above and give your +1 or feedback on how it can be improved? We can go from there.

> Shuffle Storage API: Reads
> --------------------------
>
>                 Key: SPARK-28210
>                 URL: https://issues.apache.org/jira/browse/SPARK-28210
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Matt Cheah
>            Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue
> tracks implementing an API for reading the shuffle data stored by the write
> API. Also ensure that the existing shuffle implementation is refactored to
> use the API.