Hi,

Ryan and I drafted a design doc to support a new type of join: storage
partitioned join, which covers bucket join support for DataSourceV2 but
is more general. The goal is to let Spark leverage distribution
properties reported by data sources and eliminate shuffles whenever
possible.

Design doc:
https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
(includes a POC link at the end)

We'd like to start a discussion on the doc, and any feedback is welcome!

Thanks,
Chao
