Hi Yong,

Do you have an example of a file that is 80% identical between Spark 3 and 4? (Sorry, I'm not very familiar with that codebase myself :)

Thanks,
Dmitri.

On Thu, May 28, 2026 at 11:30 PM Yong Zheng <[email protected]> wrote:

> Hello Dmitri,
>
> Thanks for bringing this one up. Personally, I like the second option as
> well, as our code base around Spark is really minimal. One thing I am not
> sure about is how we decide which files move to the common module. Let's
> use a couple of examples:
> 1. For the ones that are 100% identical and unlikely to change: for sure
> we can move those.
> 2. For the ones that are 100% identical for now but may change in the next
> version: how do we decide this? Do we move them back from common to the
> Spark version-specific module? Or convert to an adapter?
> 3. For the ones that already differ, do we keep two files that are 80%
> identical, one per Spark version, or convert to an adapter?
>
> Thanks,
> Yong
>
> On 2026/05/28 21:20:57 Dmitri Bourlatchkov wrote:
> > Hi All,
> >
> > This is another discussion stemming from today's Community Sync call and
> > PR [4535].
> >
> > Adding support for Spark 4 apparently produced a substantial amount of
> > "copied" code in [4535].
> >
> > Points in favour of copy:
> >
> > * Adjusting to differences between Spark versions is easier
> >
> > * Dropping support for old Spark versions is easy (when they expire).
> >
> > Points in favour of extracting common modules:
> >
> > * Nice code organization. Common code is unit-tested once.
> >
> > * Bug fixes in shared logic only need to be done in one place.
> >
> > * Polaris does not appear to depend on deep Spark API (no query planning,
> > etc.) so differences between Spark versions can probably be handled by
> > allowing a small number of customization points in the common code.
> >
> > I tend to prefer the second approach, that is factoring out common code
> > and sharing it between Spark 3.x and 4.x modules with the expectation
> > that the size of the common code is much larger than the size of the
> > version-specific code.
> >
> > Thoughts?
> >
> > [4535] https://github.com/apache/polaris/pull/4535
> >
> > Thanks,
> > Dmitri.
> >
>
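
For reference, the "customization points" / "adapter" idea discussed in the thread could look roughly like the sketch below. This is only an illustration under stated assumptions: every name in it (`SparkVersionHooks`, `CommonCatalogLogic`, `Spark3Hooks`, `Spark4Hooks`) is hypothetical and not taken from the Polaris codebase or from [4535], and the lower-casing difference between the two versions is invented purely to have something to vary. It does not describe a real Spark 3 vs. 4 behaviour.

```java
import java.util.Locale;

// Hypothetical customization point: the only piece each Spark-version
// module would need to implement.
interface SparkVersionHooks {
    String sparkLine();                    // e.g. "3.x" or "4.x"
    String normalizeName(String rawName);  // stand-in for any version-specific behaviour
}

// Hypothetical shared logic that would live in the common module,
// written and unit-tested once.
final class CommonCatalogLogic {
    private final SparkVersionHooks hooks;

    CommonCatalogLogic(SparkVersionHooks hooks) {
        this.hooks = hooks;
    }

    // The common code owns the overall flow and defers to the hooks
    // only where versions are assumed to differ.
    String describe(String rawName) {
        return "polaris[" + hooks.sparkLine() + "]:" + hooks.normalizeName(rawName.trim());
    }
}

// Would live in the Spark 3.x module (the difference shown is invented).
final class Spark3Hooks implements SparkVersionHooks {
    public String sparkLine() { return "3.x"; }
    public String normalizeName(String rawName) { return rawName.toLowerCase(Locale.ROOT); }
}

// Would live in the Spark 4.x module (the difference shown is invented).
final class Spark4Hooks implements SparkVersionHooks {
    public String sparkLine() { return "4.x"; }
    public String normalizeName(String rawName) { return rawName; }
}
```

With a layout like this, a file that is 100% identical today stays in the common module, and if a later Spark version needs different behaviour, only a new method on the hooks interface is added rather than moving the file back. Dropping an old Spark version would amount to deleting its hooks implementation and its module.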
