Hi Yong,

Do you have an example of a file that is 80% identical between Spark 3 and
4? (sorry, I'm not very familiar with that codebase myself :)

Thanks,
Dmitri.

On Thu, May 28, 2026 at 11:30 PM Yong Zheng <[email protected]> wrote:

> Hello Dmitri,
>
> Thanks for bringing this up. Personally, I like the second option as well,
> as our code base around Spark is really minimal. One thing I am not sure
> about is how we decide which files move to the common module. Let's use a
> couple of examples:
> 1. For the ones that are 100% identical and unlikely to change: we can
> move those for sure.
> 2. For the ones that are 100% identical for now but may change in the next
> version: how do we decide? Do we move them back from common to the Spark
> version-specific module? Or convert them to an adapter?
> 3. For the ones that already differ: do we keep two files that are 80%
> identical, one per Spark version, or convert them to an adapter?
>
> Thanks,
> Yong
>
> On 2026/05/28 21:20:57 Dmitri Bourlatchkov wrote:
> > Hi All,
> >
> > This is another discussion stemming from today's Community Sync call and
> > PR [4535].
> >
> > Adding support for Spark 4 apparently produced a substantial amount of
> > "copied" code in [4535].
> >
> > Points in favour of copy:
> >
> > * Adjusting to differences between Spark versions is easier
> >
> > * Dropping support for old Spark versions is easy (when they expire).
> >
> > Points in favour of extracting common modules:
> >
> > * Nice code organization. Common code is unit-tested once.
> >
> > * Bug fixes in shared logic only need to be done in one place.
> >
> > * Polaris does not appear to depend on deep Spark API (no query planning,
> > etc.) so differences between Spark versions can probably be handled by
> > allowing a small number of customization points in the common code.
> >
> > I tend to prefer the second approach, that is factoring out common code
> > and sharing it between Spark 3.x and 4.x modules with the expectation
> > that the size of the common code is much larger than the size of the
> > version-specific code.
> >
> > Thoughts?
> >
> > [4535] https://github.com/apache/polaris/pull/4535
> >
> > Thanks,
> > Dmitri.
> >
>
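
For illustration only, the "customization points" and "adapter" ideas discussed in the quoted thread could look roughly like the sketch below. Every name here (SparkVersionAdapter, CommonCatalogLogic, Spark3Adapter, Spark4Adapter) is hypothetical and not taken from the Polaris code base or PR [4535], and the per-version difference in identifier formatting is invented purely to have something to vary. The sketch deliberately avoids real Spark APIs.

```java
import java.util.List;

// Hypothetical customization point: the only piece each Spark-version
// module would have to implement.
interface SparkVersionAdapter {
    String sparkMajorVersion();

    // Invented example of version-specific behaviour.
    String formatIdentifier(List<String> namespace, String table);
}

// Hypothetical shared logic that would live in the common module and be
// unit-tested once.
final class CommonCatalogLogic {
    private final SparkVersionAdapter adapter;

    CommonCatalogLogic(SparkVersionAdapter adapter) {
        this.adapter = adapter;
    }

    String describeTable(List<String> namespace, String table) {
        return "spark-" + adapter.sparkMajorVersion() + ":"
                + adapter.formatIdentifier(namespace, table);
    }
}

// Would live in the Spark 3.x module (assumed behaviour: dot-joined names).
final class Spark3Adapter implements SparkVersionAdapter {
    public String sparkMajorVersion() {
        return "3";
    }

    public String formatIdentifier(List<String> namespace, String table) {
        return String.join(".", namespace) + "." + table;
    }
}

// Would live in the Spark 4.x module (assumed behaviour: backtick-quoted parts).
final class Spark4Adapter implements SparkVersionAdapter {
    public String sparkMajorVersion() {
        return "4";
    }

    public String formatIdentifier(List<String> namespace, String table) {
        StringBuilder sb = new StringBuilder();
        for (String part : namespace) {
            sb.append('`').append(part).append("`.");
        }
        return sb.append('`').append(table).append('`').toString();
    }
}

class AdapterSketch {
    public static void main(String[] args) {
        List<String> ns = List.of("db");
        System.out.println(new CommonCatalogLogic(new Spark3Adapter()).describeTable(ns, "t"));
        System.out.println(new CommonCatalogLogic(new Spark4Adapter()).describeTable(ns, "t"));
    }
}
```

Under this layout, dropping an old Spark version would amount to deleting its adapter module, and a file that is identical today but diverges later would gain one new method on the adapter interface rather than being copied back into each version-specific module.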
