[ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun updated SPARK-37377:
-----------------------------
    Description: This Jira tracks the initial implementation of storage-partitioned join.  (was: Currently {{Partitioning}} is defined as follows:
{code:scala}
@Evolving
public interface Partitioning {
  int numPartitions();
  boolean satisfy(Distribution distribution);
}
{code}

There are two issues with the interface:
1) It uses a deprecated {{Distribution}} interface and should switch to {{org.apache.spark.sql.connector.distributions.Distribution}}.
2) Currently there is no way to use it in a join, where we want to compare the partitionings reported by both sides and decide whether they are "compatible" (and thus allow Spark to eliminate the shuffle).
)

> Initial implementation of Storage-Partitioned Join
> --------------------------------------------------
>
>                 Key: SPARK-37377
>                 URL: https://issues.apache.org/jira/browse/SPARK-37377
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: 3.4.0
>
>
> This Jira tracks the initial implementation of storage-partitioned join.