[DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

Chao Sun Fri, 22 Oct 2021 12:18:26 -0700

Hi,

Ryan and I drafted a design doc to support a new type of join: storage
partitioned join, which covers bucket join support for DataSourceV2 but is
more general. The goal is to let Spark leverage distribution properties
reported by data sources and eliminate shuffles whenever possible.


Design doc:
https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
(includes a POC link at the end)

We'd like to start a discussion on the doc, and any feedback is welcome!

Thanks,
Chao
