[DISCUSS] Code organization for Spark 3.x and 4.x

Dmitri Bourlatchkov Thu, 28 May 2026 14:21:43 -0700

Hi All,

This is another discussion stemming from today's Community Sync call and PR
[4535].


Adding support for Spark 4 apparently produced a substantial amount of
"copied" code in [4535].

Points in favour of copy:

* Adjusting to differences between Spark versions is easier.

* Dropping support for old Spark versions is easy (when they expire).

Points in favour of extracting common modules:

* Nice code organization. Common code is unit-tested once.

* Bug fixes in shared logic only need to be done in one place.

* Polaris does not appear to depend on deep Spark APIs (no query planning,
etc.), so differences between Spark versions can probably be handled by
allowing a small number of customization points in the common code.

I tend to prefer the second approach, that is, factoring out common code and
sharing it between the Spark 3.x and 4.x modules, with the expectation that
the common code will be much larger than the version-specific code.
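
To make the "customization points" idea a bit more concrete, here is a rough
sketch of the shape I have in mind. It is not taken from [4535]: all type
names, method names and property keys below are hypothetical and exist only
for illustration. The shared class holds the logic that would be written and
unit-tested once, and each Spark module supplies a small hooks implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical customization point: the only surface that varies per Spark version.
interface SparkVersionHooks {
    // Label of the Spark line this module targets (illustrative only).
    String versionLabel();

    // Version-specific properties merged into the shared result (placeholder data).
    Map<String, String> extraProperties();
}

// Hypothetical shared module: logic written and unit-tested once.
final class CommonCatalogLogic {
    private final SparkVersionHooks hooks;

    CommonCatalogLogic(SparkVersionHooks hooks) {
        this.hooks = hooks;
    }

    // Builds a table description; everything except the hook calls is common code.
    Map<String, String> describeTable(String namespace, String table) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("identifier", namespace + "." + table);
        out.put("spark", hooks.versionLabel());
        out.putAll(hooks.extraProperties());
        return out;
    }
}

// Hypothetical thin module for Spark 3.x: no extra properties.
final class Spark3Hooks implements SparkVersionHooks {
    public String versionLabel() { return "3.x"; }
    public Map<String, String> extraProperties() { return Map.of(); }
}

// Hypothetical thin module for Spark 4.x: one made-up extra property.
final class Spark4Hooks implements SparkVersionHooks {
    public String versionLabel() { return "4.x"; }
    public Map<String, String> extraProperties() {
        return Map.of("example.flag", "true"); // placeholder, not a real Spark setting
    }
}
```

With a layout like this, dropping an old Spark version would mean deleting
its hooks module, and a bug fix in describeTable would apply to both versions.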

Thoughts?

[4535] https://github.com/apache/polaris/pull/4535

Thanks,
Dmitri.
