Hi Yong,

Good questions :) As for me, I'd refactor all shell-based regtests that use
Docker into Gradle-based tests (preferably JUnit).

If people find value in the section of the regtests that talk to real
cloud services (but do not run in CI), we can keep them. Still, we have
examples of JUnit-based tests that can talk to real cloud services too.

However, this is just my opinion. Let's wait a bit for other people to
express their opinions. We had a good discussion during the sync call, but
I'm not sure we have consensus yet... hence this thread.

Cheers,
Dmitri.

On Thu, May 28, 2026 at 10:00 PM Yong Zheng <[email protected]> wrote:

> One clarification: are the regtests we are talking about here only
> https://github.com/apache/polaris/tree/main/plugins/spark/v3.5/regtests
> or https://github.com/apache/polaris/tree/main/regtests as well? I ask because
> v3.5/regtests doesn't have cloud dependencies but the latter one does.
>
> Thanks,
> Yong Zheng
>
> On 2026/05/29 01:29:24 Yong Zheng wrote:
> > Hello Dmitri,
> >
> > Thanks for the recap. If I understand correctly, we want to merge test
> > coverage from `/regtests` (Docker-based) into `/integration`? Looking at
> > them, integration has most of the tests we have in regtests except the
> > ones that are specific to real clouds, such as spark_sql_gcp* and
> > spark_sql_azure*, as well as the PySpark-specific ones (t_pyspark). The
> > cloud-specific ones are not part of CI; in that case, do we still want to
> > convert them? Also, I am not sure if we can convert the PySpark code into
> > JUnit directly.
> >
> > Also, regarding those duplicate classes, should we move them to common?
> > For the ones that are very similar with only minor differences, should we
> > proceed with version-specific adapters (meaning we won't be following what
> > Iceberg is doing with different versions of Spark)?
> >
> > Lastly, should we close the current PR and handle the two items above in
> > separate PRs first, before revisiting Spark 4 support?
> >
> > Thanks,
> > Yong Zheng
> >
> > On 2026/05/28 21:07:44 Dmitri Bourlatchkov wrote:
> > > Hi All,
> > >
> > > This is to recap and follow up on today's Community Sync discussion. This
> > > matter came up during the review of the Spark 4 PR, but I believe it
> > > deserves a dedicated discussion.
> > >
> > > Observations:
> > >
> > > * Tests under the /regtests directory take a considerable amount of CI
> > > resources. This is mostly due to building custom Docker images.
> > >
> > > * When adding support for newer Spark versions, developers naturally tend
> > > to copy existing test infrastructure, which results in building more Docker
> > > images.
> > >
> > > * The coverage these tests provide probably does not require the heavy
> > > Docker machinery. These tests validate that the Spark Session can interact
> > > with Polaris over the Iceberg REST Catalog Java client. The same coverage
> > > can be provided more efficiently in an isolated JVM running under Gradle
> > > (without a Docker environment).
> > >
> > > Proposal:
> > >
> > > * Gradually convert Spark tests under /regtests to Gradle tasks and JUnit
> > > tests without Docker.
> > >
> > > The exact implementation is yet to be figured out.
> > >
> > > Most of the tests can run under the JUnit framework using a local Polaris
> > > server (same JVM or different JVM depending on use case).
> > >
> > > True "integration" tests can still run under Gradle as a task that starts a
> > > Spark shell in a fresh JVM and executes a small set of SQL commands in it.
> > > However, these "big" tests can probably be limited in number and complexity
> > > and as such should not generate excessive CI load. More specifically, these
> > > tests probably do not need to assert the verbatim output of the SQL
> > > commands. A basic sanity check should be sufficient. Functional tests can
> > > be performed under JUnit, I think.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Dmitri.
> > >
> >
>
