morningman opened a new pull request, #64143: URL: https://github.com/apache/doris/pull/64143
## Proposed changes P3 of the catalog-SPI migration (base: `branch-catalog-spi`). Migrates the **hudi** connector following the **hybrid** strategy (D-019): harden the dormant HMS-over-SPI hudi connector to correctness parity, build a test baseline, and write the per-table dispatch design — **all behind the closed gate** (`SPI_READY_TYPES` unchanged). > ⚠️ **No user-visible behavior change.** The SPI hudi path stays dormant (gate closed); hudi queries continue to use the legacy `HMSExternalTable.dlaType=HUDI` path. This PR removes correctness blockers ahead of the live cutover (deferred to P7 / batch E). ### What's included **Correctness fixes (hardening dormant code, behind gate):** - **T02** — fix hudi JNI `column_types` double bug: emit full Hive type strings (was Doris bare type names, losing precision/scale/subtypes) and send `column_names`/`column_types`/`delta_logs` as typed lists end-to-end (was comma join/split, which shattered `decimal(10,2)` / `struct<...>`). Matches the BE `hudi_jni_reader.cpp` contract (names `,` / types `#` / delta `,`). - **T04** — fail loud on time-travel / incremental read in the SPI `visitPhysicalHudiScan` branch (was silently returning the latest snapshot / silently full-scanning). - **T05** — real EQ/IN partition pruning in `HudiConnectorMetadata.applyFilter` (was a placeholder that ignored predicates and unconditionally switched the partition source from Hudi-metadata to HMS); faithfully mirrors `HiveConnectorMetadata.applyFilter`. - **T07** — column-name casing fix in `avroSchemaToColumns` (top-level lowercase, mirroring legacy `HMSExternalTable`). **Test baseline (all three connector modules started P3 with 0 tests):** - `fe-connector-hudi` (33): type-mapping / schema-parity (COW/MOR golden) / table-type / partition-pruning / scan-range. - `fe-connector-hms` (12): shared Hive-type-string parser tests. - `fe-connector-hive` (14): file-format / partition-pruning (mirrors T05). - COW/MOR schema is **type-agnostic** (golden parity vs legacy `initHudiSchema`); table type only affects scan planning. **Decisions / design (code-grounded, design-only):** - **T03** — defer `schema_id`/`history_schema_info` field-id evolution to batch E (DV-006; not a model-agnostic SPI fix). - **T06** — keep MVCC/snapshot SPI defaults (opt-out) + document (DV-007). - **T08** — `tableFormatType` dispatch design memo + **D-020**: single `hms` catalog per-table routing via a new backward-compatible `ConnectorMetadata.getScanPlanProvider(handle)` (per-table provider seam); refines D-005. The keystone gap is split into M1 (identity consumption, fe-core reads `tableFormatType` as an opaque string) and M2 (scan routing). ### Deferred to batch E / P7 (not in this PR) Gate flip (`SPI_READY_TYPES += hms/hudi`), fe-core `tableFormatType` consumption (M1+M2 implementation), live cutover, delete legacy `datasource/hudi/`, full incremental/time-travel/MVCC, Iceberg-on-hms via SPI (needs P6 `IcebergScanPlanProvider`), cluster/runtime validation. ### Verification Per task tracking, each code batch landed with: per-module compile + checkstyle 0 (incl. test sources) + connector import-gate pass + new unit tests green. The two most recent commits are docs-only (`plan-doc/`); the code is unchanged since the last green batch. Gate stays closed → the dormant SPI path is unreachable at runtime → zero live-path risk. CI re-verifies. 🤖 Generated with [Claude Code](https://claude.com/claude-code) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
