Thanks for raising the Iceberg convention, Ajantha! I think it's a good
idea to investigate it before extracting a common module for multiple Spark
versions.

Yufei


On Mon, Jun 1, 2026 at 6:07 AM Robert Stupp <[email protected]> wrote:

> Hi all,
>
> I also prefer the approach that avoids duplicated code.
>
> Looking at the `spark/src` and `integration/src` directories, I see 23
> byte-identical files, and 4 more files with slight Spark-version-specific
> differences that can be "deduplicated" with base classes plus
> version-specific adapters.
>
> This appears to match Polaris's use of Spark, which is different from
> projects that deeply integrate with Spark or Flink planning and execution
> internals.
>
> Robert
>
>
> On Mon, Jun 1, 2026 at 7:48 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Hi Dmitri,
> >
> > While I don't have a major concern with duplicating code in principle,
> > the main issue is the quantity of duplication. If the amount of
> > redundant code is large, it becomes significantly harder to maintain.
> >
> > For this reason, I prefer the second option of factoring out common code.
> >
> > Regards,
> > JB
> >
> > On Thu, May 28, 2026 at 11:21 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> > >
> > > Hi All,
> > >
> > > This is another discussion stemming from today's Community Sync call
> > > and PR [4535].
> > >
> > > Adding support for Spark 4 apparently produced a substantial amount of
> > > "copied" code in [4535].
> > >
> > > Points in favour of copy:
> > >
> > > * Adjusting to differences between Spark versions is easier
> > >
> > > * Dropping support for old Spark versions is easy (when they expire).
> > >
> > > Points in favour of extracting common modules:
> > >
> > > * Nice code organization. Common code is unit-tested once.
> > >
> > > * Bug fixes in shared logic only need to be done in one place.
> > >
> > > * Polaris does not appear to depend on deep Spark API (no query
> > > planning, etc.) so differences between Spark versions can probably be
> > > handled by allowing a small number of customization points in the
> > > common code.
> > >
> > > I tend to prefer the second approach, that is factoring out common code
> > > and sharing it between Spark 3.x and 4.x modules with the expectation
> > > that the size of the common code is much larger than the size of the
> > > version-specific code.
> > >
> > > Thoughts?
> > >
> > > [4535] https://github.com/apache/polaris/pull/4535
> > >
> > > Thanks,
> > > Dmitri.
> >
>
