[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-09-15 Thread Attila Zsolt Piros (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196013#comment-17196013
 ] 

Attila Zsolt Piros commented on SPARK-28210:


 [~tianczha] [~devaraj] I would like to work on this issue, if that's fine with 
you. I would like to build on the idea of the linked PR: passing the 
metadata when the reducer task is constructed. 

> Shuffle Storage API: Reads
> --
>
> Key: SPARK-28210
> URL: https://issues.apache.org/jira/browse/SPARK-28210
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Matt Cheah
> Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue 
> tracks implementing an API for reading the shuffle data stored by the write 
> API. Also ensure that the existing shuffle implementation is refactored to 
> use the API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-08-05 Thread Tianchen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171696#comment-17171696
 ] 

Tianchen Zhang commented on SPARK-28210:


Thank you [~mcheah] for the input. Sure, I will start reviewing the design and 
PR for the metadata tracking and help push this forward.




[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-07-30 Thread Matt Cheah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168287#comment-17168287
 ] 

Matt Cheah commented on SPARK-28210:


[~devaraj] [~tianczha] Thanks for expressing interest in this! This patch is 
blocked on the shuffle metadata APIs patch, which can be found here: 
https://github.com/apache/spark/pull/28618

I think after merging the shuffle metadata API change, we can provide the 
appropriate reader APIs and then integrate the usage of shuffle metadata 
accordingly. I originally had a diff here: 
https://github.com/mccheah/spark/pull/12, but it's fallen far out of sync 
with the patches proposed against upstream Spark.

The proposed reader API can be found in the shuffle API design document: 
https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#heading=h.cli6x4fsunz6

But we really cannot make any progress here until the shuffle metadata 
storage APIs and integration are complete. Can you review the Apache patch 
listed above and give your +1 or feedback on how it can be improved? Then we 
can go from there.




[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-07-27 Thread Tianchen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166027#comment-17166027
 ] 

Tianchen Zhang commented on SPARK-28210:


Hi [~devaraj], do you mind sharing some ideas about your change? Is it based on 
the linked PR in this JIRA? We are also looking for a way to have our own storage 
implementation but are blocked by the lack of a reader API. Thanks.




[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-07-26 Thread Devaraj Kavali (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165455#comment-17165455
 ] 

Devaraj Kavali commented on SPARK-28210:


[~mcheah] We are also interested in this feature. I see you have done some work 
on this task. Can I take up this task to move it to closure? Thanks.
