sandynz opened a new issue, #38270:
URL: https://github.com/apache/shardingsphere/issues/38270

   # Proposal: Smart Trigger Mechanism for E2E-SQL Workflow
   
   ## Summary
   
   This proposal introduces a **Dynamic Matrix Generation** mechanism for `.github/workflows/e2e-sql.yml` to replace the current static full-matrix strategy. By matching each PR's changed files against a "code-to-test" mapping table, the workflow runs only the subset of E2E jobs that the change can actually affect. This is projected to cut CI resource consumption by **50-90%** for most PRs, while any change that cannot be confidently scoped still falls back to the full matrix.
   
   ## Motivation
   
   The current `e2e-sql.yml` workflow defines a 4-dimensional matrix:
   
   | Dimension | Values | Count |
   |-----------|--------|-------|
   | `adapter` | `proxy`, `jdbc` | 2 |
   | `mode` | `Standalone`, `Cluster` | 2 |
   | `database` | `MySQL`, `PostgreSQL` | 2 |
   | `scenario` | 21 scenarios (e.g., `db`, `tbl`, `encrypt`, `shadow`, `mask`, ...) | 21 |
   
   The full Cartesian product is 2 × 2 × 2 × 21 = 168 combinations; after the existing `exclude` rules are applied, a single trigger still produces **~130 parallel jobs**. However, most PRs touch only a narrow slice of the codebase. For example, a change to `features/encrypt/` should only need to run the ~8 encrypt-related scenarios, not all 21.
   
   **Current path-level filtering** (the `on.pull_request.paths` block) can 
only gate the *entire* workflow on/off. It cannot selectively reduce the matrix 
dimensions. The `e2e-operation.yml` workflow already uses `dorny/paths-filter` 
to dynamically select *which operations* to test (see 
[operation-filters.yml](https://github.com/apache/shardingsphere/blob/master/.github/workflows/resources/filter/operation-filters.yml)),
 but `e2e-sql.yml` requires a more sophisticated approach because it has 4 
interacting dimensions rather than a single `operation` dimension.
   
   ## Design
   
   ### Architecture Overview
   
   ```
   PR Trigger
       │
       ▼
   ┌──────────────────────────┐
   │  detect-and-generate     │  (Job 1: ~30 seconds)
   │  ├─ dorny/paths-filter   │  → 18 boolean change labels
   │  └─ generate-matrix.sh   │  → JSON matrix (only affected combinations)
   └──────────┬───────────────┘
              │
        ┌─────┴────────────────────┐
        ▼                          ▼
   ┌──────────────┐    ┌────────────────────────────────┐
   │ build-e2e-   │    │  e2e-sql                       │
   │ image        │    │  strategy:                     │
   │ (skip if no  │    │    matrix: ${{ fromJSON(...) }}│
   │  proxy jobs) │    │  (only affected combinations)  │
   └──────────────┘    └────────────────────────────────┘
   ```
   
   ### Step 1: Code-to-Test Mapping Table
   
   The following table defines how source code directories map to each E2E 
matrix dimension. This is the core intellectual model behind the dynamic matrix.
   
   #### Dimension 1: Adapter
   
   | Changed Path | Adapters to Test | Rationale |
   |---|---|---|
   | `proxy/**/src/main/**` | `proxy` | Proxy-specific code |
   | `distribution/proxy/**` | `proxy` | Proxy packaging |
   | `jdbc/src/main/**` | `jdbc` | JDBC-specific code |
   | `jdbc-dialect/**/src/main/**` | `jdbc` | JDBC dialect adapters |
   | `infra/**`, `kernel/**`, `parser/**`, `database/**`, `features/**`, `mode/**` | `proxy`, `jdbc` | Shared infrastructure used by both adapters |
   
   #### Dimension 2: Mode
   
   | Changed Path | Modes to Test | Rationale |
   |---|---|---|
   | `mode/type/standalone/**/src/main/**` | `Standalone` | Standalone-specific code |
   | `mode/type/cluster/**/src/main/**` | `Cluster` | Cluster-specific code |
   | `mode/core/**`, `mode/api/**` | `Standalone`, `Cluster` | Shared mode core |
   | Any non-mode path | `Standalone`, `Cluster` | Default: test both modes |
   
   #### Dimension 3: Database
   
   | Changed Path | Databases to Test | Rationale |
   |---|---|---|
   | `parser/dialect/mysql/**/src/main/**` | `MySQL` | MySQL dialect parsing |
   | `database/protocol/mysql/**/src/main/**` | `MySQL` | MySQL wire protocol |
   | `proxy/frontend/type/mysql/**/src/main/**` | `MySQL` | Proxy MySQL frontend |
   | `jdbc-dialect/mysql/**/src/main/**` | `MySQL` | JDBC MySQL dialect |
   | `parser/dialect/postgresql/**/src/main/**`, `parser/dialect/opengauss/**/src/main/**` | `PostgreSQL` | PostgreSQL/openGauss dialect parsing |
   | `database/protocol/postgresql/**/src/main/**`, `database/protocol/opengauss/**/src/main/**` | `PostgreSQL` | PostgreSQL/openGauss wire protocol |
   | `proxy/frontend/type/postgresql/**/src/main/**`, `proxy/frontend/type/opengauss/**/src/main/**` | `PostgreSQL` | Proxy PostgreSQL/openGauss frontend |
   | `jdbc-dialect/postgresql/**/src/main/**`, `jdbc-dialect/opengauss/**/src/main/**` | `PostgreSQL` | JDBC PostgreSQL/openGauss dialect |
   | Any non-database-dialect path | `MySQL`, `PostgreSQL` | Default: test both databases |
   
   #### Dimension 4: Scenario
   
   | Changed Path | Scenarios to Test |
   |---|---|
   | `features/sharding/**` | `db`, `tbl`, `dbtbl_with_readwrite_splitting`, `dbtbl_with_readwrite_splitting_and_encrypt`, `sharding_and_encrypt`, `sharding_and_shadow`, `sharding_encrypt_shadow`, `mask_sharding`, `mask_encrypt_sharding`, `db_tbl_sql_federation` |
   | `features/encrypt/**` | `encrypt`, `dbtbl_with_readwrite_splitting_and_encrypt`, `sharding_and_encrypt`, `encrypt_and_readwrite_splitting`, `encrypt_shadow`, `sharding_encrypt_shadow`, `mask_encrypt`, `mask_encrypt_sharding` |
   | `features/readwrite-splitting/**` | `readwrite_splitting`, `dbtbl_with_readwrite_splitting`, `dbtbl_with_readwrite_splitting_and_encrypt`, `encrypt_and_readwrite_splitting`, `readwrite_splitting_and_shadow` |
   | `features/shadow/**` | `shadow`, `encrypt_shadow`, `readwrite_splitting_and_shadow`, `sharding_and_shadow`, `sharding_encrypt_shadow` |
   | `features/mask/**` | `mask`, `mask_encrypt`, `mask_sharding`, `mask_encrypt_sharding` |
   | `**/*-distsql*/**` | `distsql_rdl` |
   | `kernel/sql-federation/**` | `db_tbl_sql_federation` |
   | Adapter/mode/database-only changes (no feature changes) | Core smoke set: `empty_rules`, `db`, `tbl`, `encrypt`, `readwrite_splitting`, `passthrough` |
   
   #### Full Fallback (triggers complete ~130-job matrix)
   
   | Changed Path | Rationale |
   |---|---|
   | `infra/**/src/main/**` | Core SPI framework, affects all features |
   | `parser/core/**/src/main/**` | SQL parsing core, affects all SQL types |
   | `database/connector/core/**/src/main/**` | Database connector core, affects all DB types |
   | `kernel/{authority,logging,metadata,single,sql-parser}/**/src/main/**` | Shared kernel modules |
   | `test/e2e/{sql,env,fixture}/**`, `test/pom.xml` | Test framework itself |
   | `.github/workflows/e2e-sql.yml` | Workflow definition |
   | `**/pom.xml` | Dependency changes may have transitive effects |
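
The four dimension tables and the fallback table together define a function from change labels to the values of each matrix dimension. The Python sketch below transcribes them under a few stated assumptions: label names follow the `e2e-sql-filters.yml` draft in Step 2, `resolve_dimensions` is a hypothetical helper, `feature_broadcast` contributes no scenarios because the scenario table gives it no row, and an empty dimension widens to all of its values. Python is used here only for readability; it is not the proposed implementation.

```python
# Sketch: resolve each matrix dimension from the paths-filter change labels.
# Label names come from the proposed e2e-sql-filters.yml and the scenario lists
# from the Step 1 tables; resolve_dimensions and the handling of
# feature_broadcast (no scenario row, so it adds nothing) are assumptions.

ALL_ADAPTERS = ["proxy", "jdbc"]
ALL_MODES = ["Standalone", "Cluster"]
ALL_DATABASES = ["MySQL", "PostgreSQL"]

# Dimension 4: feature label -> scenarios to run.
FEATURE_SCENARIOS = {
    "feature_sharding": [
        "db", "tbl", "dbtbl_with_readwrite_splitting",
        "dbtbl_with_readwrite_splitting_and_encrypt", "sharding_and_encrypt",
        "sharding_and_shadow", "sharding_encrypt_shadow", "mask_sharding",
        "mask_encrypt_sharding", "db_tbl_sql_federation"],
    "feature_encrypt": [
        "encrypt", "dbtbl_with_readwrite_splitting_and_encrypt", "sharding_and_encrypt",
        "encrypt_and_readwrite_splitting", "encrypt_shadow", "sharding_encrypt_shadow",
        "mask_encrypt", "mask_encrypt_sharding"],
    "feature_readwrite_splitting": [
        "readwrite_splitting", "dbtbl_with_readwrite_splitting",
        "dbtbl_with_readwrite_splitting_and_encrypt", "encrypt_and_readwrite_splitting",
        "readwrite_splitting_and_shadow"],
    "feature_shadow": [
        "shadow", "encrypt_shadow", "readwrite_splitting_and_shadow",
        "sharding_and_shadow", "sharding_encrypt_shadow"],
    "feature_mask": ["mask", "mask_encrypt", "mask_sharding", "mask_encrypt_sharding"],
    "feature_distsql": ["distsql_rdl"],
    "feature_sql_federation": ["db_tbl_sql_federation"],
}
CORE_SMOKE_SET = ["empty_rules", "db", "tbl", "encrypt", "readwrite_splitting", "passthrough"]
# Union of every scenario named above: the 21 scenarios of the current matrix.
ALL_SCENARIOS = sorted({s for group in FEATURE_SCENARIOS.values() for s in group}
                       | set(CORE_SMOKE_SET))
FALLBACK_LABELS = {"core_infra", "test_framework", "pom_changes"}


def resolve_dimensions(labels, event_name="pull_request"):
    """Return (adapters, modes, databases, scenarios) for a set of change labels."""
    labels = set(labels)
    # Full fallback: manual runs and any label that cannot be scoped safely.
    if event_name == "workflow_dispatch" or labels & FALLBACK_LABELS:
        return ALL_ADAPTERS, ALL_MODES, ALL_DATABASES, ALL_SCENARIOS

    # Adapters: feature/mode/database code is shared, so it selects both adapters.
    shared = any(label.startswith(("feature_", "mode_", "database_")) for label in labels)
    adapters = [a for a in ALL_ADAPTERS if shared or f"adapter_{a}" in labels]

    # Modes: only mode_standalone / mode_cluster narrow this dimension.
    both_modes = "mode_core" in labels or any(not label.startswith("mode_") for label in labels)
    modes = [m for m in ALL_MODES if both_modes or f"mode_{m.lower()}" in labels]

    # Databases: only database_mysql / database_postgresql narrow this dimension.
    both_dbs = any(not label.startswith("database_") for label in labels)
    databases = [d for d in ALL_DATABASES if both_dbs or f"database_{d.lower()}" in labels]

    scenarios = sorted({s for label in labels for s in FEATURE_SCENARIOS.get(label, [])})
    # Assumption: an empty dimension falls back to all of its values
    # (scenarios fall back to the core smoke set, as in the table).
    return (adapters or ALL_ADAPTERS, modes or ALL_MODES,
            databases or ALL_DATABASES, scenarios or CORE_SMOKE_SET)
```

Matching on label prefixes keeps each rule in one place: a new `feature_*` label only needs an entry in `FEATURE_SCENARIOS`.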
   
   ### Step 2: Implementation Plan
   
   #### 2.1 New filter configuration file
   
   Create `.github/workflows/resources/filter/e2e-sql-filters.yml` with 
fine-grained path patterns for each change label. This follows the existing 
pattern established by `operation-filters.yml`.
   
   ```yaml
   # Adapter dimension
   adapter_proxy:
     - 'proxy/**/src/main/**'
     - 'distribution/proxy/**'
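     # No '!' exclusion for distribution/proxy/src/main/release-docs/**: with
     # paths-filter's default predicate-quantifier 'some', a negated entry matches
     # every file outside that directory and would set this label on almost every PR.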
   
   adapter_jdbc:
     - 'jdbc/src/main/**'
     - 'jdbc-dialect/**/src/main/**'
   
   # Mode dimension
   mode_standalone:
     - 'mode/type/standalone/**/src/main/**'
   
   mode_cluster:
     - 'mode/type/cluster/**/src/main/**'
   
   mode_core:
     - 'mode/core/**/src/main/**'
     - 'mode/api/**/src/main/**'
   
   # Database dimension
   database_mysql:
     - 'parser/dialect/mysql/**/src/main/**'
     - 'database/protocol/mysql/**/src/main/**'
     - 'proxy/frontend/type/mysql/**/src/main/**'
     - 'jdbc-dialect/mysql/**/src/main/**'
   
   database_postgresql:
     - 'parser/dialect/postgresql/**/src/main/**'
     - 'parser/dialect/opengauss/**/src/main/**'
     - 'database/protocol/postgresql/**/src/main/**'
     - 'database/protocol/opengauss/**/src/main/**'
     - 'proxy/frontend/type/postgresql/**/src/main/**'
     - 'proxy/frontend/type/opengauss/**/src/main/**'
     - 'jdbc-dialect/postgresql/**/src/main/**'
     - 'jdbc-dialect/opengauss/**/src/main/**'
   
   # Feature/Scenario dimension
   feature_sharding:
     - 'features/sharding/**/src/main/**'
   
   feature_encrypt:
     - 'features/encrypt/**/src/main/**'
   
   feature_readwrite_splitting:
     - 'features/readwrite-splitting/**/src/main/**'
   
   feature_shadow:
     - 'features/shadow/**/src/main/**'
   
   feature_mask:
     - 'features/mask/**/src/main/**'
   
   feature_broadcast:
     - 'features/broadcast/**/src/main/**'
   
   feature_distsql:
     - '**/*-distsql*/**/src/main/**'
   
   feature_sql_federation:
     - 'kernel/sql-federation/**/src/main/**'
   
   # Full fallback triggers
   core_infra:
     - 'infra/**/src/main/**'
     - 'parser/core/**/src/main/**'
     - 'database/connector/core/**/src/main/**'
     - 'database/exception/**/src/main/**'
     - 'kernel/authority/**/src/main/**'
     - 'kernel/logging/**/src/main/**'
     - 'kernel/metadata/**/src/main/**'
     - 'kernel/single/**/src/main/**'
     - 'kernel/sql-parser/**/src/main/**'
   
   test_framework:
     - '.github/workflows/e2e-sql.yml'
     - 'test/pom.xml'
     - 'test/e2e/fixture/**'
     - 'test/e2e/env/**'
     - 'test/e2e/sql/**'
   
   pom_changes:
     - '**/pom.xml'
   ```
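
To check what a filter file like this actually selects, it helps to model how changed paths become labels. The sketch below approximates `dorny/paths-filter`, which evaluates each pattern with picomatch and, under its default `predicate-quantifier: 'some'`, sets a label when any one pattern matches a changed file. `glob_to_regex`, `matches_filter` and `compute_labels` are hypothetical helpers; the translator covers only `**`, `*`, `?` and `{a,b}` and is not picomatch.

```python
# Sketch: derive change labels from changed file paths.
# dorny/paths-filter uses picomatch; glob_to_regex only approximates the glob
# subset used in e2e-sql-filters.yml (**, *, ?, {a,b}) and is not picomatch.
import re


def glob_to_regex(pattern):
    """Translate a glob into an anchored regular expression."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything inside a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches_filter(path, patterns):
    """paths-filter's default 'some' quantifier: one matching pattern is enough.

    A '!'-prefixed pattern is a negated matcher, so on its own it matches every
    path outside the excluded glob rather than removing paths from the filter.
    """
    for pattern in patterns:
        if pattern.startswith("!"):
            if not glob_to_regex(pattern[1:]).match(path):
                return True
        elif glob_to_regex(pattern).match(path):
            return True
    return False


def compute_labels(changed_files, filters):
    """Return the names of all filters matched by at least one changed file."""
    return {name for name, patterns in filters.items()
            if any(matches_filter(path, patterns) for path in changed_files)}
```

For example, `matches_filter("README.md", ["proxy/**/src/main/**", "!distribution/proxy/src/main/release-docs/**"])` returns `True`: under the `'some'` quantifier, a negated entry mixed into a positive list widens the filter instead of narrowing it.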
   
   #### 2.2 Matrix generation script
   
   Create `.github/workflows/resources/scripts/generate-e2e-sql-matrix.sh` that:
   
   1. Reads the 18 boolean change labels from `dorny/paths-filter` output.
   2. Determines if full fallback is needed (`core_infra || test_framework || pom_changes`, or a manual `workflow_dispatch` run).
   3. Computes the minimal set of `adapter`, `mode`, `database`, and `scenario` 
values.
   4. Applies the existing exclude rules (e.g., `jdbc+passthrough`, 
`jdbc+Cluster`, `proxy+Standalone+empty_rules`, etc.).
   5. Applies the existing include rules (e.g., the extra `passthrough` job 
with `-Dmysql-connector-java.version=8.3.0`).
   6. Outputs a JSON object in `{"include": [...]}` format consumable by 
`strategy.matrix: ${{ fromJSON(...) }}`.
   
   The script logic in pseudocode:
   
   ```
   IF event is workflow_dispatch OR core_infra OR test_framework OR pom_changes changed:
       adapters  = [proxy, jdbc]
       modes     = [Standalone, Cluster]
       databases = [MySQL, PostgreSQL]
       scenarios = ALL_21_SCENARIOS
   ELSE:
       adapters  = union_of(adapter-specific labels, OR both if feature/mode/db changed)
       modes     = union_of(mode-specific labels, OR both if non-mode changed)
       databases = union_of(database-specific labels, OR both if non-db changed)
       scenarios = union_of(feature-to-scenario mappings)
       IF scenarios is empty:
           scenarios = CORE_SMOKE_SET  # fallback smoke scenarios
   
   FOR each (adapter, mode, database, scenario) in cartesian product:
       APPLY exclude rules (skip disallowed combinations)
   
   APPEND include rules (extra passthrough with connector version)
   OUTPUT as JSON
   ```
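
A runnable rendering of the Cartesian product plus steps 4-6 of the list above (exclude rules, include rules, JSON output) might look like the Python sketch below; the proposal itself calls for a shell script. It models only the three exclude rules named in step 4, and the shape of the extra passthrough job (its adapter, mode, database and the `additional-options` key) is a hypothetical stand-in for the real include entry.

```python
# Sketch of the product/exclude/include steps of the proposed
# generate-e2e-sql-matrix.sh, written in Python for readability.
# Only the three exclude rules named in the proposal are modelled (the real
# workflow has more); the shape of the extra passthrough job, including the
# "additional-options" key and its adapter/mode, is a hypothetical stand-in.
import itertools
import json

# A combination is dropped if it matches every field of any one rule.
EXCLUDES = [
    {"adapter": "jdbc", "scenario": "passthrough"},
    {"adapter": "jdbc", "mode": "Cluster"},
    {"adapter": "proxy", "mode": "Standalone", "scenario": "empty_rules"},
]
EXTRA_PASSTHROUGH = {
    "adapter": "proxy", "mode": "Cluster", "database": "MySQL", "scenario": "passthrough",
    "additional-options": "-Dmysql-connector-java.version=8.3.0",
}
DIMENSIONS = ("adapter", "mode", "database", "scenario")


def build_matrix(adapters, modes, databases, scenarios):
    """Return {"include": [...]} ready for strategy.matrix: ${{ fromJSON(...) }}."""
    include = []
    for values in itertools.product(adapters, modes, databases, scenarios):
        job = dict(zip(DIMENSIONS, values))
        if any(all(job[key] == value for key, value in rule.items()) for rule in EXCLUDES):
            continue
        include.append(job)
    # Assumption: add the extra job only when its base combination survived.
    base = {key: EXTRA_PASSTHROUGH[key] for key in DIMENSIONS}
    if base in include:
        include.append(dict(EXTRA_PASSTHROUGH))
    return {"include": include}


if __name__ == "__main__":
    matrix = build_matrix(["proxy", "jdbc"], ["Standalone", "Cluster"],
                          ["MySQL"], ["encrypt", "passthrough"])
    print(json.dumps(matrix))
```

For the `__main__` example (MySQL only, `encrypt` and `passthrough`), five of the eight combinations survive the exclude rules, and the extra passthrough job brings the total to six.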
   
   #### 2.3 Updated workflow structure
   
   The updated `e2e-sql.yml` will have 4 jobs:
   
   | Job | Purpose | Condition |
   |-----|---------|-----------|
   | `global-environment` | Import reusable environment | Always |
   | `detect-and-generate-matrix` | Run paths-filter + matrix script | Always |
   | `build-e2e-image` | Build proxy Docker image | Only if matrix contains `adapter: proxy` |
   | `e2e-sql` | Run E2E tests | `strategy.matrix: ${{ fromJSON(needs.detect-and-generate-matrix.outputs.matrix) }}` |
   
   Key changes from the current workflow:
   - The static `strategy.matrix` block with hardcoded values is replaced by 
`${{ fromJSON(...) }}`.
   - `build-e2e-image` gains a condition that skips it entirely when the matrix contains no proxy adapter jobs (saving ~15-20 min).
   - By default, a job is skipped when a job it `needs` is skipped, so the `e2e-sql` job uses `if: always() && ... && (needs.build-e2e-image.result == 'success' || needs.build-e2e-image.result == 'skipped')` to keep running when `build-e2e-image` is skipped (JDBC-only runs).
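
The detecting job still has to hand both the matrix and the proxy decision to the downstream jobs. A minimal sketch, assuming a Python step and the hypothetical output names `matrix` and `has_proxy`, relies on the GitHub Actions mechanism of appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable.

```python
# Sketch: publish the matrix and a "does it contain proxy jobs?" flag as step
# outputs. GitHub Actions collects step outputs from the file named by the
# GITHUB_OUTPUT environment variable, one name=value line per output. The
# output names "matrix" and "has_proxy" are hypothetical.
import json
import os


def write_outputs(matrix, output_path=None):
    """Append matrix=<json> and has_proxy=<true|false>; return the flag."""
    output_path = output_path or os.environ["GITHUB_OUTPUT"]
    has_proxy = any(job["adapter"] == "proxy" for job in matrix["include"])
    with open(output_path, "a", encoding="utf-8") as fh:
        # Compact JSON keeps the whole value on one line, as name=value requires.
        fh.write("matrix=" + json.dumps(matrix, separators=(",", ":")) + "\n")
        fh.write("has_proxy=" + ("true" if has_proxy else "false") + "\n")
    return has_proxy
```

Once the job re-exports these under `outputs:`, `build-e2e-image` could use `if: needs.detect-and-generate-matrix.outputs.has_proxy == 'true'`, and `e2e-sql` would read its matrix with `fromJSON(needs.detect-and-generate-matrix.outputs.matrix)`.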
   
   ### Step 3: Safety & Complexity Strategies
   
   #### 3.1 Full fallback guarantee
   
   The design is deliberately conservative. **Any change that cannot be 
confidently scoped triggers the full matrix:**
   
   - `infra/` — the SPI and utility foundation for everything
   - `parser/core/` — shared SQL parsing engine
   - `database/connector/core/` — shared database abstraction
   - `kernel/{authority,logging,metadata,single,sql-parser}/` — cross-cutting 
kernel modules
   - `**/pom.xml` — dependency changes may have transitive effects
   - `test/e2e/sql/**`, `test/e2e/env/**`, `test/e2e/fixture/**` — test 
framework changes
   - `.github/workflows/e2e-sql.yml` — workflow definition changes
   - `workflow_dispatch` — manual trigger always runs the full matrix
   
   #### 3.2 POM changes (future optimization)
   
   Currently `**/pom.xml` triggers the full matrix. A future enhancement could 
distinguish:
   - Root `pom.xml` or `test/pom.xml` → full matrix (dependency management 
changes)
   - `features/encrypt/pom.xml` → only encrypt-related scenarios
   
   This requires careful analysis and is deferred to a follow-up iteration.
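
As a possible starting point for that follow-up, the sketch below classifies a changed `pom.xml` path. It is an illustration built on assumptions: `classify_pom` and `FEATURE_POM_LABELS` are hypothetical, only POMs under the five feature directories named in the filter file are narrowed, and every other POM, including the root and `test/pom.xml`, keeps the full-matrix behavior.

```python
# Sketch of the deferred POM refinement: map a changed pom.xml either to the
# full matrix or to a single feature label. The directory-to-label table and
# the "full" default for every other POM are assumptions.
from pathlib import PurePosixPath

# Hypothetical: top-level feature directory -> change label from e2e-sql-filters.yml.
FEATURE_POM_LABELS = {
    "sharding": "feature_sharding",
    "encrypt": "feature_encrypt",
    "readwrite-splitting": "feature_readwrite_splitting",
    "shadow": "feature_shadow",
    "mask": "feature_mask",
}


def classify_pom(path):
    """Return 'full' or a feature label for a changed pom.xml path."""
    parts = PurePosixPath(path).parts
    if parts[-1] != "pom.xml":
        raise ValueError(f"not a POM: {path}")
    # features/<name>/.../pom.xml narrows to that feature's scenarios.
    if len(parts) >= 3 and parts[0] == "features" and parts[1] in FEATURE_POM_LABELS:
        return FEATURE_POM_LABELS[parts[1]]
    # Root pom.xml, test/pom.xml and anything unrecognized stay conservative.
    return "full"
```

Defaulting every unrecognized POM to `full` keeps the conservative stance of 3.1 for anything the table does not cover.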
   
   #### 3.3 Optional: Two-phase execution (future enhancement)
   
   For full-matrix runs (~130 jobs), a two-phase gate could provide faster 
feedback:
   
   - **Phase 1 (Smoke):** Run 6-8 representative jobs covering key 
combinations. Takes ~15 min.
   - **Phase 2 (Full):** Only proceeds if Phase 1 passes. Runs remaining ~120 
jobs.
   
   This avoids wasting resources on the full matrix when a basic regression is 
caught quickly. It can be implemented by adding a `phase` field to matrix items 
and splitting into two downstream jobs.
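
One way to add the `phase` field is sketched below in Python. The selection rule is hypothetical: the first job for each (adapter, mode, database) combination goes to Phase 1 and everything else to Phase 2. With two values per dimension there are at most 2 × 2 × 2 = 8 such combinations, and 6 once `jdbc+Cluster` is excluded, which lines up with the 6-8 smoke jobs suggested above.

```python
# Sketch of the optional two-phase gate: tag each matrix item with a "phase"
# field, then split the include list in two. The rule used here (first job per
# adapter/mode/database combination is a Phase 1 smoke job) is hypothetical.


def assign_phases(include):
    """Return copies of the jobs with phase 1 or 2 added."""
    seen, tagged = set(), []
    for job in include:
        key = (job["adapter"], job["mode"], job["database"])
        tagged.append({**job, "phase": 1 if key not in seen else 2})
        seen.add(key)
    return tagged


def split_phases(include):
    """Return (phase1_matrix, phase2_matrix), each as {"include": [...]}."""
    tagged = assign_phases(include)
    return ({"include": [job for job in tagged if job["phase"] == 1]},
            {"include": [job for job in tagged if job["phase"] == 2]})
```

The two halves would feed two hypothetical jobs, for example `e2e-sql-smoke` and `e2e-sql-full`, where the second declares `needs: e2e-sql-smoke`.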
   
   ## Expected Impact
   
   | PR Change Scope | Current Jobs | Projected Jobs | Reduction |
   |---|---|---|---|
   | Only `features/encrypt/` | ~130 | ~38 | **~71%** |
   | Only `features/mask/` | ~130 | ~18 | **~86%** |
   | Only `proxy/` | ~130 | ~65 | **~50%** |
   | Only `jdbc/` + `features/shadow/` | ~130 | ~10 | **~92%** |
   | Only `mode/type/standalone/` | ~130 | ~65 | **~50%** |
   | Only `parser/dialect/mysql/` | ~130 | ~65 | **~50%** |
   | `infra/` (full fallback) | ~130 | ~130 | 0% |
   
   For a project with high PR volume, even a conservative **60% average reduction** across all PRs would mean roughly 78 fewer jobs per PR (60% of ~130), which adds up to significant savings in CI minutes and faster feedback loops.
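
The Reduction column is 1 - projected / current. The snippet below recomputes it from the table's own figures and shows how a per-PR average would be derived; any PR mix passed to the hypothetical `average_reduction_percent` is illustrative only, since the proposal gives no distribution of PR scopes.

```python
# Worked arithmetic: the Reduction column is 1 - projected / current.
# average_reduction_percent weights scopes by a PR mix; any mix passed to it
# is hypothetical, since the proposal gives no PR distribution.


def reduction_percent(current_jobs, projected_jobs):
    """Share of jobs avoided, as a whole percentage."""
    return round(100 * (1 - projected_jobs / current_jobs))


def average_reduction_percent(mix, current_jobs=130):
    """mix: (share_of_prs, projected_jobs) pairs whose shares sum to 1."""
    expected_jobs = sum(share * jobs for share, jobs in mix)
    return 100 * (1 - expected_jobs / current_jobs)


for scope, projected in [("features/encrypt", 38), ("features/mask", 18),
                         ("proxy", 65), ("jdbc + features/shadow", 10), ("infra", 130)]:
    print(f"{scope}: ~{reduction_percent(130, projected)}%")
```

Run as-is, it prints ~71%, ~86%, ~50%, ~92% and ~0%, matching the table. The average depends entirely on the PR mix: if half of all PRs needed ~65 jobs and the other half the full matrix, it would be 25%.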
   
   ## References
   
   - Current workflow: [`.github/workflows/e2e-sql.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/e2e-sql.yml)
   - Existing dynamic pattern: [`.github/workflows/e2e-operation.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/e2e-operation.yml) + [`operation-filters.yml`](https://github.com/apache/shardingsphere/blob/master/.github/workflows/resources/filter/operation-filters.yml)
   - E2E test parameter generator: [`E2ETestParameterGenerator.java`](https://github.com/apache/shardingsphere/blob/master/test/e2e/sql/src/test/java/org/apache/shardingsphere/test/e2e/sql/framework/param/array/E2ETestParameterGenerator.java)
   - Module architecture: [`CLAUDE.md`](https://github.com/apache/shardingsphere/blob/master/CLAUDE.md)
   

