+1. Thanks for lifting the current restrictions on bucket join and making this more generalized.
On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue <b...@apache.org> wrote:

> +1 from me as well. Thanks Chao for doing so much to get it to this point!
>
> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai <dbt...@dbtsai.com> wrote:
>
>> +1 on this SPIP.
>>
>> This is a more generalized version of bucketed tables and bucketed
>> joins which can eliminate very expensive data shuffles during joins, and
>> many users in the Apache Spark community have wanted this feature for
>> a long time!
>>
>> Thank you, Ryan and Chao, for working on this, and I look forward to
>> it as a new feature in Spark 3.3.
>>
>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>
>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun <sunc...@apache.org> wrote:
>> >
>> > Hi,
>> >
>> > Ryan and I drafted a design doc to support a new type of join: storage
>> > partitioned join, which covers bucket join support for DataSourceV2 but is
>> > more general. The goal is to let Spark leverage distribution properties
>> > reported by data sources and eliminate shuffle whenever possible.
>> >
>> > Design doc:
>> > https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>> > (includes a POC link at the end)
>> >
>> > We'd like to start a discussion on the doc and any feedback is welcome!
>> >
>> > Thanks,
>> > Chao
>> >
>
> --
> Ryan Blue