There have been a number of times the e2e pytests have caught issues the Java tests don't cover. Personally, I'd love to see the tests expanded to cover frameworks that we simply can't cover in the Java tests (PySpark, DuckDB, Trino, etc.). The variance in support across engines and catalogs was kind of a theme at the Iceberg Summit in April. Broader test coverage across the ecosystem would really help Polaris in that regard.
Mike

On Thu, May 28, 2026 at 7:59 PM Dmitri Bourlatchkov wrote:
> Hi Yong,
>
> Good questions :) As for me, I'd refactor all shell-based regtests that use
> Docker into Gradle-based tests (preferably JUnit).
>
> If people find value in the subset of the regtests that talk to real
> cloud services (but do not run in CI), we can keep them. Still, we have
> examples of JUnit-based tests that can talk to real cloud services too.
>
> However, this is just my opinion. Let's wait a bit for other people to
> express their opinions. We had a good discussion during the sync call, but
> I'm not sure we have consensus yet... hence this thread.
>
> Cheers,
> Dmitri.
>
> On Thu, May 28, 2026 at 10:00 PM Yong Zheng wrote:
>
> > One clarification: do the regtests we are talking about here include only
> > https://github.com/apache/polaris/tree/main/plugins/spark/v3.5/regtests,
> > or https://github.com/apache/polaris/tree/main/regtests as well? The
> > v3.5/regtests don't have cloud dependencies, but the latter do.
> >
> > Thanks,
> > Yong Zheng
> >
> > On 2026/05/29 01:29:24 Yong Zheng wrote:
> > > Hello Dmitri,
> > >
> > > Thanks for the recap. If I understand correctly, we want to merge test
> > > coverage from `/regtests` (Docker-based) into `/integration`? Looking at
> > > them, integration has most of the tests we have in regtests except the
> > > ones that are specific to real clouds, such as spark_sql_gcp* and
> > > spark_sql_azure*, as well as the PySpark-specific ones (t_pyspark). The
> > > cloud-specific ones are not part of CI; in that case, do we still want to
> > > convert them? Also, I am not sure we can convert the PySpark code into
> > > JUnit directly.
> > >
> > > Also, regarding the duplicate classes, should we move them to common?
> > > For the ones that are very similar but differ slightly, should we
> > > proceed with version-specific adapters (meaning we won't follow what
> > > Iceberg is doing with different versions of Spark)?
> > >
> > > Lastly, should we close the current PR and handle the two items above in
> > > separate PRs first before revisiting Spark 4 support?
> > >
> > > Thanks,
> > > Yong Zheng
> > >
> > > On 2026/05/28 21:07:44 Dmitri Bourlatchkov wrote:
> > > > Hi All,
> > > >
> > > > This is to recap and follow up on today's Community Sync discussion.
> > > > This matter came up during the review of the Spark 4 PR, but I believe
> > > > it deserves a dedicated discussion.
> > > >
> > > > Observations:
> > > >
> > > > * Tests under the /regtests directory take a considerable amount of CI
> > > > resources. This is mostly due to building custom Docker images.
> > > >
> > > > * When adding support for newer Spark versions, developers naturally
> > > > tend to copy existing test infrastructure, which results in building
> > > > more Docker images.
> > > >
> > > > * The coverage these tests provide probably does not require the heavy
> > > > Docker machinery. These tests validate that the Spark Session can
> > > > interact with Polaris over the Iceberg REST Catalog Java client. The
> > > > same coverage can be provided more efficiently in an isolated JVM
> > > > running under Gradle (without a Docker env.)
> > > >
> > > > Proposal:
> > > >
> > > > * Gradually convert Spark tests under /regtests to Gradle tasks and
> > > > JUnit tests without Docker.
> > > >
> > > > The exact impl. is to be figured out.
> > > >
> > > > Most of the tests can run under the JUnit framework using a local
> > > > Polaris server (same JVM or different JVM depending on use case).
> > > >
> > > > True "integration" tests can still run under Gradle as a task that
> > > > starts a Spark shell in a fresh JVM and executes a small set of SQL
> > > > commands in it. However, these "big" tests can probably be limited in
> > > > number and complexity and as such should not generate excessive CI
> > > > load. More specifically, these tests probably do not need to assert the
> > > > verbatim output of the SQL commands. A basic sanity check should be
> > > > sufficient. Functional tests can be performed under JUnit, I think.
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Dmitri.
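Dmitri suggests that the remaining "big" tests only need a basic sanity check, not an assertion on the verbatim SQL output. A minimal Java sketch can illustrate the difference. The class SqlOutputChecks and its methods are hypothetical, not existing Polaris code. Treating any mention of "error" or "exception" as a failure is an assumed, deliberately crude heuristic.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Polaris): contrasts a brittle verbatim
 * comparison of captured SQL output with a looser sanity check.
 */
public final class SqlOutputChecks {

    private SqlOutputChecks() {}

    /** Verbatim check: any change in row order, formatting, or whitespace fails. */
    public static boolean matchesVerbatim(String actual, String expected) {
        return actual.equals(expected);
    }

    /**
     * Sanity check: the output must not mention an error or exception
     * (an assumed, crude heuristic), and every expected value must appear
     * somewhere in the output, in any order.
     */
    public static boolean passesSanityCheck(String actual, List<String> expectedValues) {
        String lower = actual.toLowerCase(Locale.ROOT);
        if (lower.contains("exception") || lower.contains("error")) {
            return false;
        }
        for (String value : expectedValues) {
            if (!actual.contains(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String expected = "1\ta\n2\tb\n";
        // Same rows, but in a different order and with an extra trailing newline.
        String actual = "2\tb\n1\ta\n\n";
        System.out.println(matchesVerbatim(actual, expected));                  // false
        System.out.println(passesSanityCheck(actual, List.of("1\ta", "2\tb"))); // true
    }
}
```

This is a trade-off. The verbatim comparison fails on harmless changes in row order or whitespace. The sanity check tolerates those changes but catches fewer regressions, so detailed functional assertions would still belong in JUnit tests.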
