Ohad Raviv created SPARK-41277:
----------------------------------

             Summary: Save and leverage shuffle key in tblproperties
                 Key: SPARK-41277
                 URL: https://issues.apache.org/jira/browse/SPARK-41277
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Ohad Raviv


I may be missing something trivial here.

In a typical pipeline, many datasets are materialized, often right after a 
shuffle (e.g. a join). They are then used in further actions, often on the 
same key.

Wouldn't it make sense to save the shuffle key along with the table to avoid 
unnecessary shuffles?

Also, the implementation seems quite straightforward: just leverage the 
existing bucketing mechanism.
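
To make the idea concrete, here is a minimal toy sketch in plain Python (not Spark code) of what "saving the shuffle key in tblproperties" could look like. The property names `shuffle.keys` and `shuffle.numPartitions`, the `Table` class, and both helper functions are hypothetical and exist only for illustration; Spark does not define them. The sketch also assumes that matching keys and partition count are enough to skip a shuffle, whereas a real implementation would additionally need the same hash function and a file layout the planner can trust, which is what bucketing provides today.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence

# Hypothetical property names, chosen for this sketch only.
SHUFFLE_KEYS_PROP = "shuffle.keys"
SHUFFLE_PARTITIONS_PROP = "shuffle.numPartitions"


@dataclass
class Table:
    """Toy stand-in for a catalog table with its tblproperties."""
    name: str
    tblproperties: Dict[str, str] = field(default_factory=dict)


def save_after_shuffle(name: str, keys: Sequence[str], num_partitions: int) -> Table:
    """Materialize a table and record how its data was partitioned."""
    return Table(name, {
        SHUFFLE_KEYS_PROP: ",".join(keys),
        SHUFFLE_PARTITIONS_PROP: str(num_partitions),
    })


def needs_shuffle(table: Table, required_keys: Sequence[str], required_partitions: int) -> bool:
    """Return True unless the saved partitioning already matches the requirement."""
    saved_keys = table.tblproperties.get(SHUFFLE_KEYS_PROP)
    saved_parts = table.tblproperties.get(SHUFFLE_PARTITIONS_PROP)
    if saved_keys is None or saved_parts is None:
        # No partitioning information was saved, so a shuffle is required.
        return True
    return (saved_keys.split(",") != list(required_keys)
            or int(saved_parts) != required_partitions)
```

With this model, a table saved after a join on `customer_id` with 200 partitions would not need a shuffle for a later join on `customer_id` with 200 partitions, but would need one for a join on a different key or with a different partition count.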

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

