[ 
https://issues.apache.org/jira/browse/SPARK-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7375:
-----------------------------------

    Assignee: Apache Spark

> Avoid defensive copying in SQL exchange operator when sort-based shuffle 
> buffers data in serialized form
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7375
>                 URL: https://issues.apache.org/jira/browse/SPARK-7375
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Apache Spark
>
> The original sort-based shuffle buffers shuffle input records in memory while 
> sorting them. This causes problems when mutable records are presented to the 
> shuffle, which happens in Spark SQL's Exchange operator. To work around this 
> issue, SPARK-2967 and SPARK-4479 added defensive copying of shuffle inputs in 
> the Exchange operator when sort-based shuffle is enabled.
> I think that [~sandyr]'s recent patch for enabling serialization of records 
> in sort-based shuffle (SPARK-4550) and my proposed {{unsafe}}-based shuffle 
> path (SPARK-7081) may allow us to avoid this defensive copying in certain 
> cases (since our patches cause records to be serialized one-at-a-time and 
> remove the buffering of deserialized records).
> As mentioned in SPARK-4479, a long-term fix for this issue might be to add 
> hooks for informing the shuffle about object (im)mutability in order to allow 
> the shuffle layer to decide whether to copy. In the meantime, though, I think 
> that we should just extend the checks added in SPARK-4479 to avoid copies 
> when these new serialized sort paths are used.
> /cc [~rxin] [~marmbrus] [~yhuai]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to