Hi all, I’d like to start a vote for SPIP: Storage Partitioned Join for Data Source V2.
The proposal is to support a new type of join, storage partitioned join, which covers bucket join support for DataSourceV2 but is more general. The goal is to let Spark leverage distribution properties reported by data sources and eliminate shuffle whenever possible.

Please also refer to:

- Previous discussion on the dev mailing list: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2 <https://lists.apache.org/thread.html/r7dc67c3db280a8b2e65855cb0b1c86b524d4e6ae1ed9db9ca12cb2e6%40%3Cdev.spark.apache.org%3E>
- JIRA: SPARK-37166 <https://issues.apache.org/jira/browse/SPARK-37166>
- Design doc <https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE>

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …