Yicong-Huang opened a new issue, #4870: URL: https://github.com/apache/texera/issues/4870
### Task Summary

The amber CI job currently runs every Scala test in `WorkflowExecutionService` (66 spec files) inside a single matrix entry that always installs both Scala and Python dependencies. Only a handful of tests actually need Python at runtime (they spawn Python UDF workers via the e2e harness); the rest are pure-Scala unit tests that pay for the Python install on every run and conflate "needs Python" failures with engine-internal regressions.

Split into two jobs, incrementally:

1. Add a class-level ScalaTest tag annotation `@IntegrationTest` (FQN `org.apache.texera.amber.tags.IntegrationTest`) under `amber/src/test/scala/...`. ScalaTest will pick this up via its tag annotation machinery, so no per-test `taggedAs(...)` is required.
2. Introduce a new `amber-integration` job in `build.yml` that mirrors the existing `amber` job's setup (JDK + sbt + Postgres) plus Python and runs only tests tagged `IntegrationTest`: `sbt 'WorkflowExecutionService/testOnly * -- -n org.apache.texera.amber.tags.IntegrationTest'`.
3. Modify the existing `amber` job to skip the same tag (`-l ...`) and drop its Python setup.
4. Wire `run_amber_integration` through `precheck` / `required-checks.yml` so the new job is gated identically to `amber`.
5. As the first migration, annotate `engine/e2e/ReconfigurationSpec.scala` (5 tests, the only e2e spec that actually spawns Python UDFs). Other e2e specs (`DataProcessingSpec`, `PauseSpec`, `BatchSizePropagationSpec`, `PythonWorkflowWorkerSpec`) can move in follow-up PRs as they are reviewed.

### Task Type

- [x] Refactor / Cleanup
- [x] DevOps / Deployment / CI
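
For reference, a minimal sketch of the tag annotation from step 1 of the task summary. ScalaTest's documentation describes tag annotations as Java annotations marked with `org.scalatest.TagAnnotation` and retained at runtime, so the sketch is written in Java; it needs ScalaTest on the test classpath to compile. The package name follows the FQN given in the issue. The `@Target` list (and whether method-level use is wanted at all) is an assumption, not something the issue specifies.

```java
package org.apache.texera.amber.tags;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.scalatest.TagAnnotation;

// Marks a spec class as an integration test. ScalaTest reads the annotation
// reflectively, so it must be retained at runtime. The tag name used with
// -n / -l on the command line is the annotation's fully qualified name.
@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
// Assumption: TYPE is what class-level tagging needs; METHOD is included
// only because ScalaTest's own example declares both.
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface IntegrationTest {}
```

With an annotation like this in place, placing `@IntegrationTest` above a spec's `class` declaration should tag every test in that class, which is what lets the `-n org.apache.texera.amber.tags.IntegrationTest` filter in step 2 select whole specs without per-test `taggedAs(...)` calls.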
