Hi All,

This is to recap and follow up on today's Community Sync discussion. This
matter came up during the review of the Spark 4 PR, but I believe it
deserves a dedicated discussion.

Observations:

* Tests under the /regtests directory consume a considerable amount of CI
resources. This is mostly due to building custom Docker images.

* When adding support for newer Spark versions, developers naturally tend
to copy existing test infrastructure, which results in building more Docker
images.

* The coverage these tests provide probably does not require the heavy
Docker machinery. These tests validate that the Spark Session can interact
with Polaris over the Iceberg REST Catalog Java client. The same coverage
can be provided more efficiently in an isolated JVM running under Gradle
(without a Docker environment).

Proposal:

* Gradually convert Spark tests under /regtests to Gradle tasks and JUnit
tests without Docker.

The exact implementation is still to be figured out.

Most of the tests can run under the JUnit framework using a local Polaris
server (same JVM or different JVM depending on the use case).

True "integration" tests can still run under Gradle as a task that starts a
Spark shell in a fresh JVM and executes a small set of SQL commands in it.
However, these "big" tests can probably be limited in number and complexity
and as such should not generate excessive CI load. More specifically, these
tests probably do not need to assert the verbatim output of the SQL
commands. A basic sanity check should be sufficient. Functional tests can
be performed under JUnit, I think.
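
To illustrate the kind of basic sanity check meant above, here is a rough
Java sketch. Everything in it is an assumption for discussion purposes: the
class SqlSmokeCheck and its methods run and looksSane are hypothetical names
and not existing Polaris code, and the command passed to run stands in for
whatever ends up launching the Spark shell in a fresh JVM. The point is only
that the check looks at the exit code and a few expected markers, not at the
verbatim output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Polaris: runs a command and sanity-checks it. */
final class SqlSmokeCheck {

    /** Exit code and combined stdout/stderr of a finished child process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Loose check: the process exited with code 0 and every expected marker
     * (for example a table name) appears somewhere in the output.
     * The output is deliberately not compared verbatim.
     */
    static boolean looksSane(int exitCode, String output, List<String> expectedMarkers) {
        if (exitCode != 0) {
            return false;
        }
        return expectedMarkers.stream().allMatch(output::contains);
    }

    /** Starts the command in a separate process and captures its output in a temp file. */
    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path log = Files.createTempFile("sql-smoke", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)      // merge stderr into stdout
                    .redirectOutput(log.toFile())   // avoid blocking on a full pipe
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("Timed out after " + timeoutSeconds + "s: " + command);
            }
            String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            return new Result(process.exitValue(), output);
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

A Gradle task or a JUnit test could call run with the Spark launch command
and then fail the build only if looksSane returns false for the captured
exit code and output.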

Thoughts?

Thanks,
Dmitri.
