Hi all,

This email recaps and follows up on today's Community Sync discussion. The topic came up during the review of the Spark 4 PR, but I believe it deserves a dedicated discussion.
Observations:

* Tests under the /regtests directory consume a considerable amount of CI resources, mostly because they build custom Docker images.
* When adding support for newer Spark versions, developers naturally tend to copy the existing test infrastructure, which results in building even more Docker images.
* The coverage these tests provide probably does not require the heavy Docker machinery. These tests validate that a Spark session can interact with Polaris over the Iceberg REST Catalog Java client. The same coverage could be provided more efficiently in an isolated JVM running under Gradle, without a Docker environment.

Proposal:

* Gradually convert the Spark tests under /regtests to Gradle tasks and JUnit tests that do not use Docker. The exact implementation is still to be figured out.

Most of the tests can run under the JUnit framework against a local Polaris server (in the same JVM or in a separate JVM, depending on the use case).

True "integration" tests can still run under Gradle as a task that starts a Spark shell in a fresh JVM and executes a small set of SQL commands in it. However, these "big" tests can probably be kept limited in number and complexity, so they should not generate excessive CI load. More specifically, they probably do not need to assert the verbatim output of the SQL commands; a basic sanity check should be sufficient. Functional tests can be performed under JUnit, I think.

Thoughts?

Thanks,
Dmitri.
