[jira] [Commented] (SPARK-20715) MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker

2018-09-10 Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-20715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609291#comment-16609291 ]

Apache Spark commented on SPARK-20715:
--

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/22382

> MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and 
> MapOutputTracker
> 
>
> Key: SPARK-20715
> URL: https://issues.apache.org/jira/browse/SPARK-20715
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler, Shuffle
> Affects Versions: 2.3.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Major
> Fix For: 2.3.0
>
>
> Today the MapOutputTracker and ShuffleMapStage both maintain their own copies 
> of MapStatuses. This creates the potential for bugs in case these two pieces 
> of state become out of sync.
> I believe that we can improve our ability to reason about the code by storing 
> this information only in the MapOutputTracker. This can also help to reduce 
> driver memory consumption.
> I will provide more details in my PR, where I'll walk through the detailed 
> arguments as to why we can take these two different metadata tracking formats 
> and consolidate without loss of performance or correctness.
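To make the proposed consolidation concrete, here is a minimal Python model (not Spark's Scala code) in which the tracker is the only place MapStatus-like records live, and the stage derives its availability and missing partitions by asking the tracker. All class and method names below (MapStatus, ShuffleStatus, register_map_output, remove_outputs_on_host, find_missing_partitions, and so on) are hypothetical and chosen for illustration; they are not taken from Spark's API, and representing an output location as a plain host string is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical model of the single-source-of-truth design; names are
# illustrative and are NOT Spark's actual classes or methods.


class MapStatus:
    """Hypothetical stand-in for one map task's output metadata."""

    def __init__(self, host: str, block_sizes: List[int]) -> None:
        self.host = host                # where the output lives (assumed: a host name)
        self.block_sizes = block_sizes  # bytes per reduce partition


class ShuffleStatus:
    """All statuses for one shuffle, indexed by map partition id."""

    def __init__(self, num_maps: int) -> None:
        self.statuses: List[Optional[MapStatus]] = [None] * num_maps

    def add(self, map_id: int, status: MapStatus) -> None:
        self.statuses[map_id] = status

    def remove_outputs_on_host(self, host: str) -> None:
        # Forget every output that lived on the lost host.
        for i, s in enumerate(self.statuses):
            if s is not None and s.host == host:
                self.statuses[i] = None


class MapOutputTracker:
    """The only owner of map-output metadata in this sketch."""

    def __init__(self) -> None:
        self.shuffles: Dict[int, ShuffleStatus] = {}

    def register_shuffle(self, shuffle_id: int, num_maps: int) -> None:
        self.shuffles[shuffle_id] = ShuffleStatus(num_maps)

    def register_map_output(self, shuffle_id: int, map_id: int, status: MapStatus) -> None:
        self.shuffles[shuffle_id].add(map_id, status)

    def remove_outputs_on_host(self, host: str) -> None:
        for shuffle in self.shuffles.values():
            shuffle.remove_outputs_on_host(host)

    def num_available_outputs(self, shuffle_id: int) -> int:
        return sum(s is not None for s in self.shuffles[shuffle_id].statuses)

    def find_missing_partitions(self, shuffle_id: int) -> List[int]:
        return [i for i, s in enumerate(self.shuffles[shuffle_id].statuses) if s is None]


class ShuffleMapStage:
    """Keeps no copy of the statuses; every query is delegated to the tracker."""

    def __init__(self, shuffle_id: int, num_partitions: int, tracker: MapOutputTracker) -> None:
        self.shuffle_id = shuffle_id
        self.num_partitions = num_partitions
        self.tracker = tracker

    @property
    def num_available_outputs(self) -> int:
        return self.tracker.num_available_outputs(self.shuffle_id)

    @property
    def is_available(self) -> bool:
        return self.num_available_outputs == self.num_partitions

    def find_missing_partitions(self) -> List[int]:
        return self.tracker.find_missing_partitions(self.shuffle_id)


# Usage: register three map outputs, then lose one host.
tracker = MapOutputTracker()
tracker.register_shuffle(shuffle_id=0, num_maps=3)
stage = ShuffleMapStage(shuffle_id=0, num_partitions=3, tracker=tracker)
tracker.register_map_output(0, 0, MapStatus("host-a", [10, 20]))
tracker.register_map_output(0, 1, MapStatus("host-b", [5, 5]))
tracker.register_map_output(0, 2, MapStatus("host-a", [7, 1]))
print(stage.is_available)               # True
tracker.remove_outputs_on_host("host-a")
print(stage.find_missing_partitions())  # [0, 2]
```

Because the stage holds no copy, an update applied to the tracker (here, dropping outputs from a lost host) is visible to the stage with no second structure to keep in sync, which is the class of bug the description is concerned with. Storing each status once is also where a driver-memory saving would come from in this model.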



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20715) MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker

2017-05-11 Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007351#comment-16007351 ]

Apache Spark commented on SPARK-20715:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/17955

> MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and 
> MapOutputTracker
> 
>
> Key: SPARK-20715
> URL: https://issues.apache.org/jira/browse/SPARK-20715
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler, Shuffle
> Affects Versions: 2.3.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
>
> Today the MapOutputTracker and ShuffleMapStage both maintain their own copies 
> of MapStatuses. This creates the potential for bugs in case these two pieces 
> of state become out of sync.
> I believe that we can improve our ability to reason about the code by storing 
> this information only in the MapOutputTracker. This can also help to reduce 
> driver memory consumption.
> I will provide more details in my PR, where I'll walk through the detailed 
> arguments as to why we can take these two different metadata tracking formats 
> and consolidate without loss of performance or correctness.


