[jira] [Updated] (SPARK-37377) Initial implementation of Storage-Partitioned Join

Chao Sun (Jira) Mon, 04 Apr 2022 19:37:05 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chao Sun updated SPARK-37377:
-----------------------------
    Description: This Jira tracks the initial implementation of 
storage-partitioned join.  (was: Currently {{Partitioning}} is defined as 
follows:
{code:java}
@Evolving
public interface Partitioning {
  int numPartitions();
  boolean satisfy(Distribution distribution);
}
{code}

There are two issues with the interface: 1) it uses a deprecated 
{{Distribution}} interface and should switch to 
{{org.apache.spark.sql.connector.distributions.Distribution}}; 2) there is 
currently no way to use it in a join, where we want to compare the 
partitionings reported by both sides and decide whether they are 
"compatible" (and thus allow Spark to eliminate the shuffle). )

> Initial implementation of Storage-Partitioned Join
> --------------------------------------------------
>
>                 Key: SPARK-37377
>                 URL: https://issues.apache.org/jira/browse/SPARK-37377
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: 3.4.0
>
>
> This Jira tracks the initial implementation of storage-partitioned join.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
