[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013 ]

Attila Zsolt Piros commented on SPARK-28210:

[~tianczha] [~devaraj] I would like to work on this issue, if that's fine with you. I would like to build on the ideas of the linked PR: passing the metadata when the reducer task is constructed.

> Shuffle Storage API: Reads
> --------------------------
>
>                 Key: SPARK-28210
>                 URL: https://issues.apache.org/jira/browse/SPARK-28210
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Matt Cheah
>            Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue
> tracks implementing an API for reading the shuffle data stored by the write
> API. Also ensure that the existing shuffle implementation is refactored to
> use the API.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171696#comment-17171696 ]

Tianchen Zhang commented on SPARK-28210:

Thank you [~mcheah] for the input. Sure, I will start reviewing the design and the PR for the metadata tracking and help push this forward.

[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168287#comment-17168287 ]

Matt Cheah commented on SPARK-28210:

[~devaraj] [~tianczha] Thanks for expressing interest in this!

This patch is blocked on the shuffle metadata APIs patch, which can be found here: https://github.com/apache/spark/pull/28618

I think that after merging the shuffle metadata API change, we can provide the appropriate reader APIs and then integrate the usage of shuffle metadata accordingly. I originally had a diff here: [https://github.com/mccheah/spark/pull/12], but it has fallen far out of sync with the patches proposed against upstream Spark.

The proposed reader API can be found in the shuffle API design document: https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#heading=h.cli6x4fsunz6

But we really cannot make any progress here until the shuffle metadata storage APIs and their integration are complete. Can you review the Apache patch listed above and give your +1 or feedback on how it can be improved? We can then go from there.

[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166027#comment-17166027 ]

Tianchen Zhang commented on SPARK-28210:

Hi [~devaraj], would you mind sharing some ideas about your change? Is it based on the linked PR in this JIRA? We are also looking for a way to have our own storage implementation, but we are blocked by the lack of a reader API. Thanks.

[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads
[ https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165455#comment-17165455 ]

Devaraj Kavali commented on SPARK-28210:

[~mcheah] We are also interested in this feature. I see you have done some work on this task. Can I take it up and move it to closure? Thanks.