Re: [DISCUSS] Code organization for Spark 3.x and 4.x

Yong Zheng Thu, 28 May 2026 20:30:14 -0700

Hello Dmitri,

Thanks for bringing this up. Personally, I also like the second option, since
our code base around Spark is really minimal. One thing I am not sure about is
how we decide which classes move to the common module. Let's use a couple of
examples:
1. Files that are 100% identical and unlikely to change: we can certainly move
those.
2. Files that are 100% identical for now but may change in the next Spark
version: how do we decide? Do we move them back from the common module to the
version-specific module, or convert them to an adapter?
3. Files that already differ: do we keep two files that are 80% identical, one
per Spark version, or convert them to an adapter?
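To make the "adapter" option in points 2 and 3 a bit more concrete, here is a
minimal sketch of how it could look. All names (SparkVersionAdapter,
CommonCatalogLogic, Spark3Adapter, Spark4Adapter) are hypothetical and not
from the Polaris code base, and the quoting difference is purely illustrative,
not a real Spark 3.x vs 4.x behaviour. The shared logic lives in the common
module and is tested once; each Spark-specific module only supplies a small
adapter implementation.

```java
import java.util.List;

// Entry point placed first so the file also runs via `java AdapterSketch.java`.
class AdapterSketch {
    public static void main(String[] args) {
        List<String> parts = List.of("ns", "tbl");
        for (SparkVersionAdapter a : List.<SparkVersionAdapter>of(new Spark3Adapter(), new Spark4Adapter())) {
            // Same common logic, different version-specific behaviour.
            System.out.println("Spark " + a.sparkMajorVersion() + ": "
                    + new CommonCatalogLogic(a).qualifiedName(parts));
        }
    }
}

// Hypothetical customization point: the only logic that differs per Spark version.
interface SparkVersionAdapter {
    String sparkMajorVersion();

    // Illustrative version-specific behaviour (not a real Spark difference).
    String quoteIdentifier(String part);
}

// Would live in the common module and be unit-tested once.
final class CommonCatalogLogic {
    private final SparkVersionAdapter adapter;

    CommonCatalogLogic(SparkVersionAdapter adapter) {
        this.adapter = adapter;
    }

    // Shared logic: joins identifier parts with '.', delegating quoting to the adapter.
    String qualifiedName(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(adapter.quoteIdentifier(parts.get(i)));
        }
        return sb.toString();
    }
}

// Would live in the Spark 3.x module.
final class Spark3Adapter implements SparkVersionAdapter {
    public String sparkMajorVersion() { return "3"; }
    public String quoteIdentifier(String part) { return part; }
}

// Would live in the Spark 4.x module.
final class Spark4Adapter implements SparkVersionAdapter {
    public String sparkMajorVersion() { return "4"; }
    public String quoteIdentifier(String part) { return "`" + part + "`"; }
}
```

With this shape, a file that starts to diverge (point 2) would not need to move
back to the version-specific modules; only the differing method would become a
new adapter method.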


Thanks,
Yong  

On 2026/05/28 21:20:57 Dmitri Bourlatchkov wrote:
> Hi All,
> 
> This is another discussion stemming from today's Community Sync call and PR
> [4535].
> 
> Adding support for Spark 4 apparently produced a substantial amount of
> "copied" code in [4535].
> 
> Points in favour of copying:
> 
> * Adjusting to differences between Spark versions is easier
> 
> * Dropping support for old Spark versions is easy (when they expire).
> 
> Points in favour of extracting common modules:
> 
> * Nice code organization. Common code is unit-tested once.
> 
> * Bug fixes in shared logic only need to be done in one place.
> 
> * Polaris does not appear to depend on deep Spark APIs (no query planning,
> etc.), so differences between Spark versions can probably be handled by
> allowing a small number of customization points in the common code.
> 
> I tend to prefer the second approach, that is, factoring out common code and
> sharing it between Spark 3.x and 4.x modules with the expectation that the
> size of the common code is much larger than the size of the
> version-specific code.
> 
> Thoughts?
> 
> [4535] https://github.com/apache/polaris/pull/4535
> 
> Thanks,
> Dmitri.
>
