+1 for this SPIP.

On Sun, Oct 24, 2021 at 9:59 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
> +1. Thanks for lifting the current restrictions on bucket join and making
> this more generalized.
>
> On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue <b...@apache.org> wrote:
>
>> +1 from me as well. Thanks Chao for doing so much to get it to this point!
>>
>> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai <dbt...@dbtsai.com> wrote:
>>
>>> +1 on this SPIP.
>>>
>>> This is a more generalized version of bucketed tables and bucketed
>>> joins which can eliminate very expensive data shuffles during joins, and
>>> many users in the Apache Spark community have wanted this feature for
>>> a long time!
>>>
>>> Thank you, Ryan and Chao, for working on this, and I look forward to
>>> it as a new feature in Spark 3.3.
>>>
>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>
>>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun <sunc...@apache.org> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Ryan and I drafted a design doc to support a new type of join: storage
>>> > partitioned join, which covers bucket join support for DataSourceV2 but is
>>> > more general. The goal is to let Spark leverage distribution properties
>>> > reported by data sources and eliminate shuffle whenever possible.
>>> >
>>> > Design doc:
>>> > https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>>> > (includes a POC link at the end)
>>> >
>>> > We'd like to start a discussion on the doc and any feedback is welcome!
>>> >
>>> > Thanks,
>>> > Chao
>>> >
>>
>>
>> --
>> Ryan Blue
>>
>