sandynz opened a new issue, #38270: URL: https://github.com/apache/shardingsphere/issues/38270
# Proposal: Smart Trigger Mechanism for E2E-SQL Workflow

## Summary

This proposal introduces a **Dynamic Matrix Generation** mechanism for `.github/workflows/e2e-sql.yml` to replace the current static full-matrix strategy. By analyzing the files changed in a PR against a "code-to-test" mapping table, the workflow runs only the subset of E2E jobs that are actually affected. This reduces CI resource consumption by **50-90%** for most PRs, while any change that cannot be confidently scoped still falls back to the full matrix.

## Motivation

The current `e2e-sql.yml` workflow defines a 4-dimensional matrix:

| Dimension | Values | Count |
|-----------|--------|-------|
| `adapter` | `proxy`, `jdbc` | 2 |
| `mode` | `Standalone`, `Cluster` | 2 |
| `database` | `MySQL`, `PostgreSQL` | 2 |
| `scenario` | 21 scenarios (e.g., `db`, `tbl`, `encrypt`, `shadow`, `mask`, ...) | 21 |

The raw Cartesian product is 2 × 2 × 2 × 21 = 168 combinations. After applying the existing `exclude` rules, a single trigger still produces **~130 parallel jobs**. However, most PRs touch only a narrow slice of the codebase. For example, a change to `features/encrypt/` should only need to run the ~8 encrypt-related scenarios, not all 21.

**Current path-level filtering** (the `on.pull_request.paths` block) can only gate the *entire* workflow on or off. It cannot selectively reduce the matrix dimensions. The `e2e-operation.yml` workflow already uses `dorny/paths-filter` to dynamically select *which operations* to test (see [operation-filters.yml](https://github.com/apache/shardingsphere/blob/master/.github/workflows/resources/filter/operation-filters.yml)), but `e2e-sql.yml` requires a more sophisticated approach because it has 4 interacting dimensions rather than a single `operation` dimension.

## Design

### Architecture Overview

```
PR Trigger
           │
           ▼
┌──────────────────────────┐
│  detect-and-generate     │  (Job 1: ~30 seconds)
│  ├─ dorny/paths-filter   │  → 18 boolean change labels
│  └─ generate-matrix.sh   │  → JSON matrix (only affected combinations)
└──────────┬───────────────┘
           │
     ┌─────┴────────────────────┐
     ▼                          ▼
┌──────────────┐  ┌────────────────────────────────┐
│ build-e2e-   │  │ e2e-sql                        │
│ image        │  │ strategy:                      │
│ (skip if no  │  │   matrix: ${{ fromJSON(...) }} │
│ proxy jobs)  │  │ (only affected combinations)   │
└──────────────┘  └────────────────────────────────┘
```

### Step 1: Code-to-Test Mapping Table

The following table defines how source code directories map to each E2E matrix dimension. This is the core intellectual model behind the dynamic matrix.

#### Dimension 1: Adapter

| Changed Path | Adapters to Test | Rationale |
|---|---|---|
| `proxy/**/src/main/**` | `proxy` | Proxy-specific code |
| `distribution/proxy/**` | `proxy` | Proxy packaging |
| `jdbc/src/main/**` | `jdbc` | JDBC-specific code |
| `jdbc-dialect/**/src/main/**` | `jdbc` | JDBC dialect adapters |
| `infra/**`, `kernel/**`, `parser/**`, `database/**`, `features/**`, `mode/**` | `proxy`, `jdbc` | Shared infrastructure used by both adapters |

#### Dimension 2: Mode

| Changed Path | Modes to Test | Rationale |
|---|---|---|
| `mode/type/standalone/**/src/main/**` | `Standalone` | Standalone-specific code |
| `mode/type/cluster/**/src/main/**` | `Cluster` | Cluster-specific code |
| `mode/core/**`, `mode/api/**` | `Standalone`, `Cluster` | Shared mode core |
| Any non-mode path | `Standalone`, `Cluster` | Default: test both modes |

#### Dimension 3: Database

| Changed Path | Databases to Test | Rationale |
|---|---|---|
| `parser/dialect/mysql/**/src/main/**` | `MySQL` | MySQL dialect parsing |
| `database/protocol/mysql/**/src/main/**` | `MySQL` | MySQL wire protocol |
| `proxy/frontend/type/mysql/**/src/main/**` | `MySQL` | Proxy MySQL frontend |
| `jdbc-dialect/mysql/**/src/main/**` | `MySQL` | JDBC MySQL dialect |
| `parser/dialect/postgresql/**/src/main/**`, `parser/dialect/opengauss/**/src/main/**` | `PostgreSQL` | PostgreSQL/openGauss dialect parsing |
| `database/protocol/postgresql/**/src/main/**`, `database/protocol/opengauss/**/src/main/**` | `PostgreSQL` | PostgreSQL/openGauss wire protocol |
| `proxy/frontend/type/postgresql/**/src/main/**`, `proxy/frontend/type/opengauss/**/src/main/**` | `PostgreSQL` | Proxy PostgreSQL/openGauss frontend |
| `jdbc-dialect/postgresql/**/src/main/**`, `jdbc-dialect/opengauss/**/src/main/**` | `PostgreSQL` | JDBC PostgreSQL/openGauss dialect |
| Any non-database-dialect path | `MySQL`, `PostgreSQL` | Default: test both databases |

#### Dimension 4: Scenario

| Changed Path | Scenarios to Test |
|---|---|
| `features/sharding/**` | `db`, `tbl`, `dbtbl_with_readwrite_splitting`, `dbtbl_with_readwrite_splitting_and_encrypt`, `sharding_and_encrypt`, `sharding_and_shadow`, `sharding_encrypt_shadow`, `mask_sharding`, `mask_encrypt_sharding`, `db_tbl_sql_federation` |
| `features/encrypt/**` | `encrypt`, `dbtbl_with_readwrite_splitting_and_encrypt`, `sharding_and_encrypt`, `encrypt_and_readwrite_splitting`, `encrypt_shadow`, `sharding_encrypt_shadow`, `mask_encrypt`, `mask_encrypt_sharding` |
| `features/readwrite-splitting/**` | `readwrite_splitting`, `dbtbl_with_readwrite_splitting`, `dbtbl_with_readwrite_splitting_and_encrypt`, `encrypt_and_readwrite_splitting`, `readwrite_splitting_and_shadow` |
| `features/shadow/**` | `shadow`, `encrypt_shadow`, `readwrite_splitting_and_shadow`, `sharding_and_shadow`, `sharding_encrypt_shadow` |
| `features/mask/**` | `mask`, `mask_encrypt`, `mask_sharding`, `mask_encrypt_sharding` |
| `**/*-distsql*/**` | `distsql_rdl` |
| `kernel/sql-federation/**` | `db_tbl_sql_federation` |
| Adapter/mode/database-only changes (no feature changes) | Core smoke set: `empty_rules`, `db`, `tbl`, `encrypt`, `readwrite_splitting`, `passthrough` |

#### Full Fallback (triggers complete ~130-job matrix)

| Changed Path | Rationale |
|---|---|
| `infra/**/src/main/**` | Core SPI framework, affects all features |
| `parser/core/**/src/main/**` | SQL parsing core, affects all SQL types |
| `database/connector/core/**/src/main/**` | Database connector core, affects all DB types |
| `kernel/{authority,logging,metadata,single,sql-parser}/**/src/main/**` | Shared kernel modules |
| `test/e2e/{sql,env,fixture}/**`, `test/pom.xml` | Test framework itself |
| `.github/workflows/e2e-sql.yml` | Workflow definition |
| `**/pom.xml` | Dependency changes may have transitive effects |

### Step 2: Implementation Plan

#### 2.1 New filter configuration file

Create `.github/workflows/resources/filter/e2e-sql-filters.yml` with fine-grained path patterns for each change label. This follows the existing pattern established by `operation-filters.yml`.

```yaml
# Adapter dimension
adapter_proxy:
  - 'proxy/**/src/main/**'
  - 'distribution/proxy/**'
  - '!distribution/proxy/src/main/release-docs/**'
adapter_jdbc:
  - 'jdbc/src/main/**'
  - 'jdbc-dialect/**/src/main/**'

# Mode dimension
mode_standalone:
  - 'mode/type/standalone/**/src/main/**'
mode_cluster:
  - 'mode/type/cluster/**/src/main/**'
mode_core:
  - 'mode/core/**/src/main/**'
  - 'mode/api/**/src/main/**'

# Database dimension
database_mysql:
  - 'parser/dialect/mysql/**/src/main/**'
  - 'database/protocol/mysql/**/src/main/**'
  - 'proxy/frontend/type/mysql/**/src/main/**'
  - 'jdbc-dialect/mysql/**/src/main/**'
database_postgresql:
  - 'parser/dialect/postgresql/**/src/main/**'
  - 'parser/dialect/opengauss/**/src/main/**'
  - 'database/protocol/postgresql/**/src/main/**'
  - 'database/protocol/opengauss/**/src/main/**'
  - 'proxy/frontend/type/postgresql/**/src/main/**'
  - 'proxy/frontend/type/opengauss/**/src/main/**'
  - 'jdbc-dialect/postgresql/**/src/main/**'
  - 'jdbc-dialect/opengauss/**/src/main/**'

# Feature/Scenario dimension
feature_sharding:
  - 'features/sharding/**/src/main/**'
feature_encrypt:
  - 'features/encrypt/**/src/main/**'
feature_readwrite_splitting:
  - 'features/readwrite-splitting/**/src/main/**'
feature_shadow:
  - 'features/shadow/**/src/main/**'
feature_mask:
  - 'features/mask/**/src/main/**'
feature_broadcast:
  - 'features/broadcast/**/src/main/**'
feature_distsql:
  - '**/*-distsql*/**/src/main/**'
feature_sql_federation:
  - 'kernel/sql-federation/**/src/main/**'

# Full fallback triggers
core_infra:
  - 'infra/**/src/main/**'
  - 'parser/core/**/src/main/**'
  - 'database/connector/core/**/src/main/**'
  - 'database/exception/**/src/main/**'
  - 'kernel/authority/**/src/main/**'
  - 'kernel/logging/**/src/main/**'
  - 'kernel/metadata/**/src/main/**'
  - 'kernel/single/**/src/main/**'
  - 'kernel/sql-parser/**/src/main/**'
test_framework:
  - '.github/workflows/e2e-sql.yml'
  - 'test/pom.xml'
  - 'test/e2e/fixture/**'
  - 'test/e2e/env/**'
  - 'test/e2e/sql/**'
pom_changes:
  - '**/pom.xml'
```

#### 2.2 Matrix generation script

Create `.github/workflows/resources/scripts/generate-e2e-sql-matrix.sh` that:

1. Reads the 18 boolean change labels from `dorny/paths-filter` output.
2. Determines if full fallback is needed (`core_infra || test_framework || pom_changes`).
3. Computes the minimal set of `adapter`, `mode`, `database`, and `scenario` values.
4. Applies the existing exclude rules (e.g., `jdbc+passthrough`, `jdbc+Cluster`, `proxy+Standalone+empty_rules`, etc.).
5. Applies the existing include rules (e.g., the extra `passthrough` job with `-Dmysql-connector-java.version=8.3.0`).
6. Outputs a JSON object in `{"include": [...]}` format consumable by `strategy.matrix: ${{ fromJSON(...) }}`.

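The six steps above can be sketched as follows. This is a minimal illustration in Python rather than the proposed shell script, under several assumptions: only two feature mappings are copied from the Dimension 4 table, `ALL_SCENARIOS` is a stand-in for the real 21-scenario list, the exclude rules cover only the three combinations named in step 4, and the shape of the extra `passthrough` entry (its `mode`, `database`, and the `additional-options` key) is hypothetical. It also uses a simpler narrowing rule than the proposal may end up with: each dimension narrows only when one of its own labels changed.

```python
import itertools
import json

ADAPTERS = ["proxy", "jdbc"]
MODES = ["Standalone", "Cluster"]
DATABASES = ["MySQL", "PostgreSQL"]

# Feature label -> scenarios, copied from the Dimension 4 table (two features shown).
FEATURE_SCENARIOS = {
    "feature_shadow": ["shadow", "encrypt_shadow", "readwrite_splitting_and_shadow",
                       "sharding_and_shadow", "sharding_encrypt_shadow"],
    "feature_mask": ["mask", "mask_encrypt", "mask_sharding", "mask_encrypt_sharding"],
}
CORE_SMOKE_SET = ["empty_rules", "db", "tbl", "encrypt", "readwrite_splitting", "passthrough"]
# Stand-in for the real 21-scenario list, which this sketch does not reproduce.
ALL_SCENARIOS = CORE_SMOKE_SET + [s for v in FEATURE_SCENARIOS.values() for s in v]
FULL_FALLBACK_LABELS = {"core_infra", "test_framework", "pom_changes"}

# Assumed shape of the extra include entry; only the connector version is from the proposal.
EXTRA_PASSTHROUGH = {
    "adapter": "proxy", "mode": "Cluster", "database": "MySQL", "scenario": "passthrough",
    "additional-options": "-Dmysql-connector-java.version=8.3.0",
}


def is_excluded(adapter, mode, scenario):
    """The three exclude combinations named in step 4; the real workflow has more."""
    return ((adapter == "jdbc" and scenario == "passthrough")
            or (adapter == "jdbc" and mode == "Cluster")
            or (adapter == "proxy" and mode == "Standalone" and scenario == "empty_rules"))


def select(changed, specific, everything):
    """Values whose own label changed; all values when the dimension is untouched."""
    chosen = [value for label, value in specific.items() if label in changed]
    return chosen or list(everything)


def generate_matrix(changed):
    """Build an {"include": [...]} matrix from the set of labels reported as changed."""
    if changed & FULL_FALLBACK_LABELS:  # step 2: full fallback
        adapters, modes, databases, scenarios = ADAPTERS, MODES, DATABASES, ALL_SCENARIOS
    else:  # step 3: narrow each dimension independently
        adapters = select(changed, {"adapter_proxy": "proxy", "adapter_jdbc": "jdbc"}, ADAPTERS)
        # A change to the shared mode core keeps both modes.
        mode_labels = {} if "mode_core" in changed else {
            "mode_standalone": "Standalone", "mode_cluster": "Cluster"}
        modes = select(changed, mode_labels, MODES)
        databases = select(
            changed, {"database_mysql": "MySQL", "database_postgresql": "PostgreSQL"}, DATABASES)
        scenarios = []
        for label, mapped in FEATURE_SCENARIOS.items():
            if label in changed:
                scenarios += [s for s in mapped if s not in scenarios]
        scenarios = scenarios or CORE_SMOKE_SET
    # Step 4: Cartesian product minus excluded combinations.
    include = [
        {"adapter": a, "mode": m, "database": d, "scenario": s}
        for a, m, d, s in itertools.product(adapters, modes, databases, scenarios)
        if not is_excluded(a, m, s)
    ]
    # Step 5: add the extra passthrough job only when a matching base job exists.
    if any(i["adapter"] == "proxy" and i["database"] == "MySQL"
           and i["scenario"] == "passthrough" for i in include):
        include.append(EXTRA_PASSTHROUGH)
    return {"include": include}  # step 6


if __name__ == "__main__":
    print(json.dumps(generate_matrix({"adapter_jdbc", "feature_shadow"})))
```

With these simplified rules, the label set `{"adapter_jdbc", "feature_shadow"}` yields 10 entries, all `jdbc` in `Standalone` mode. In a workflow step the JSON would typically be written to `$GITHUB_OUTPUT` so that a later job can read it with `fromJSON`.
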
The script logic in pseudocode:

```
IF core_infra OR test_framework OR pom_changes changed:
    adapters  = [proxy, jdbc]
    modes     = [Standalone, Cluster]
    databases = [MySQL, PostgreSQL]
    scenarios = ALL_21_SCENARIOS
ELSE:
    adapters  = union_of(adapter-specific labels, OR both if feature/mode/db changed)
    modes     = union_of(mode-specific labels, OR both if non-mode changed)
    databases = union_of(database-specific labels, OR both if non-db changed)
    scenarios = union_of(feature-to-scenario mappings)
    IF scenarios is empty:
        scenarios = CORE_SMOKE_SET  # fallback smoke scenarios

FOR each (adapter, mode, database, scenario) in cartesian product:
    APPLY exclude rules (skip disallowed combinations)
APPEND include rules (extra passthrough with connector version)
OUTPUT as JSON
```

#### 2.3 Updated workflow structure

The updated `e2e-sql.yml` will have 4 jobs:

| Job | Purpose | Condition |
|-----|---------|-----------|
| `global-environment` | Import reusable environment | Always |
| `detect-and-generate-matrix` | Run paths-filter + matrix script | Always |
| `build-e2e-image` | Build proxy Docker image | Only if matrix contains `adapter: proxy` |
| `e2e-sql` | Run E2E tests | `strategy.matrix: ${{ fromJSON(needs.detect-and-generate-matrix.outputs.matrix) }}` |

Key changes from the current workflow:

- The static `strategy.matrix` block with hardcoded values is replaced by `${{ fromJSON(...) }}`.
- `build-e2e-image` gains a conditional: it is skipped entirely when no proxy adapter jobs exist in the matrix (saves ~15-20 min).
- The `e2e-sql` job uses `if: always() && ... && (needs.build-e2e-image.result == 'success' || needs.build-e2e-image.result == 'skipped')` to handle the case where `build-e2e-image` is skipped (JDBC-only runs).

### Step 3: Safety & Complexity Strategies

#### 3.1 Full fallback guarantee

The design is deliberately conservative.

**Any change that cannot be confidently scoped triggers the full matrix:**

- `infra/`: the SPI and utility foundation for everything
- `parser/core/`: shared SQL parsing engine
- `database/connector/core/`: shared database abstraction
- `kernel/{authority,logging,metadata,single,sql-parser}/`: cross-cutting kernel modules
- `**/pom.xml`: dependency changes may have transitive effects
- `test/e2e/sql/**`, `test/e2e/env/**`, `test/e2e/fixture/**`: test framework changes
- `.github/workflows/e2e-sql.yml`: workflow definition changes
- `workflow_dispatch`: manual trigger always runs the full matrix

#### 3.2 POM changes (future optimization)

Currently `**/pom.xml` triggers the full matrix. A future enhancement could distinguish:

- Root `pom.xml` or `test/pom.xml` → full matrix (dependency management changes)
- `features/encrypt/pom.xml` → only encrypt-related scenarios

This requires careful analysis and is deferred to a follow-up iteration.

#### 3.3 Optional: Two-phase execution (future enhancement)

For full-matrix runs (~130 jobs), a two-phase gate could provide faster feedback:

- **Phase 1 (Smoke):** Run 6-8 representative jobs covering key combinations. Takes ~15 min.
- **Phase 2 (Full):** Only proceeds if Phase 1 passes. Runs remaining ~120 jobs.

This avoids wasting resources on the full matrix when a basic regression is caught quickly. It can be implemented by adding a `phase` field to matrix items and splitting into two downstream jobs.

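The two-phase idea in 3.3 could be prototyped by tagging matrix items before they are emitted. The sketch below is only an illustration: the `phase` field name comes from the text above, but the smoke scenario set and the rule for picking smoke jobs (the first job seen for each adapter/database/scenario triple within that set) are hypothetical choices made for this example.

```python
# Hypothetical smoke scenarios; the proposal does not name the representative jobs.
SMOKE_SCENARIOS = {"db", "encrypt"}


def assign_phases(items, smoke_scenarios=SMOKE_SCENARIOS):
    """Return copies of the matrix items with a "phase" field of "smoke" or "full"."""
    seen = set()
    tagged = []
    for item in items:
        key = (item["adapter"], item["database"], item["scenario"])
        # One smoke job per (adapter, database, scenario) within the smoke set.
        is_smoke = item["scenario"] in smoke_scenarios and key not in seen
        if is_smoke:
            seen.add(key)
        tagged.append({**item, "phase": "smoke" if is_smoke else "full"})
    return tagged


def split_by_phase(tagged):
    """Split tagged items into two {"include": [...]} matrices, one per downstream job."""
    smoke = [item for item in tagged if item["phase"] == "smoke"]
    full = [item for item in tagged if item["phase"] == "full"]
    return {"include": smoke}, {"include": full}
```

With two adapters, two databases, and two smoke scenarios, this rule selects at most 8 smoke jobs. The two resulting objects could be exposed as separate job outputs, with the second job declaring `needs:` on the first so that it starts only after the smoke jobs pass.
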
## Expected Impact

| PR Change Scope | Current Jobs | Projected Jobs | Reduction |
|---|---|---|---|
| Only `features/encrypt/` | ~130 | ~38 | **~71%** |
| Only `features/mask/` | ~130 | ~18 | **~86%** |
| Only `proxy/` | ~130 | ~65 | **~50%** |
| Only `jdbc/` + `features/shadow/` | ~130 | ~10 | **~92%** |
| Only `mode/type/standalone/` | ~130 | ~65 | **~50%** |
| Only `parser/dialect/mysql/` | ~130 | ~65 | **~50%** |
| `infra/` (full fallback) | ~130 | ~130 | 0% |

For a project with high PR volume, even a conservative estimate of **60% average reduction** across all PRs translates to significant savings in CI minutes and faster feedback loops.

## References

- Current workflow: [`.github/workflows/e2e-sql.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/e2e-sql.yml)
- Existing dynamic pattern: [`.github/workflows/e2e-operation.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/e2e-operation.yml) + [`operation-filters.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/resources/filter/operation-filters.yml)
- E2E test parameter generator: [`E2ETestParameterGenerator.java`](https://github.com/apache/shardingsphere/blob/master/test/e2e/sql/src/test/java/org/apache/shardingsphere/test/e2e/sql/framework/param/array/E2ETestParameterGenerator.java)
- Module architecture: [`CLAUDE.md`](https://github.com/apache/shardingsphere/blob/master/CLAUDE.md)
