wolfboys opened a new pull request, #4450:
URL: https://github.com/apache/flink-cdc/pull/4450

   ## Summary
   
   Completes the MySQL → Doris schema change pipeline for the column-reference 
/ comment propagation work started in earlier PRs. Closes the comment-deletion 
regression (`AlterColumnTypeEvent` silently dropped `comments=null` entries) 
and unblocks end-to-end comment flow by enabling Debezium's 
`include.schema.comments` in the MySQL CDC source.
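
As a rough illustration of the source-side switch, the sketch below sets the Debezium property on a plain `java.util.Properties` object. The class and method names are hypothetical and are not the code in `MySqlSourceConfigFactory`; the override order (user-supplied properties applied last) is an assumption made for the example only.

```java
import java.util.Properties;

public class DebeziumCommentConfigSketch {

    // Hypothetical helper, for illustration only.
    static Properties withSchemaComments(Properties userProps) {
        Properties props = new Properties();
        // Ask Debezium to keep table/column comments in the parsed schema.
        props.setProperty("include.schema.comments", "true");
        // Assumption: explicit user settings are copied last and therefore win.
        props.putAll(userProps);
        return props;
    }

    public static void main(String[] args) {
        Properties effective = withSchemaComments(new Properties());
        System.out.println(effective.getProperty("include.schema.comments"));
    }
}
```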
   
   ## What's in this PR
   
   ### Bug fixes
   - **Bug #1 — comment removal silently dropped.** 
`SchemaMergingUtils.getSchemaDifference` and 
`AlterColumnTypeEvent.addColumnComment` now treat a `null` comment as "remove 
the comment", so an upstream `MODIFY COLUMN <col> <type>` (no `COMMENT` 
clause) is propagated as a comment-removal event instead of being filtered out. 
Added 3 unit tests covering null propagation through `applySchemaChangeEvent`, 
`isSchemaChangeEventRedundant`, and the trim/copy helpers.
   - **Issue #20 — MySQL CDC source dropping column comments.** Debezium 1.9.8 
defaults `include.schema.comments` to `false`, so the parsed `Table` never had 
`Column.comment` set. `MySqlSourceConfigFactory` now sets 
`include.schema.comments=true`, which makes `CreateTableEvent` and downstream 
`AlterColumnTypeEvent` carry column comments all the way to the sink metadata 
applier. Verified in the e2e log: `CreateTableEvent` now reads 
`` `id` BIGINT NOT NULL 'student id' ``, and the generated Doris DDL includes 
`COMMENT 'student id'`.
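
The null-as-removal semantics of Bug #1 can be sketched as a small comment diff. This is a minimal illustration under stated assumptions, not the code in `SchemaMergingUtils`: the class name, the method name, and the map-based representation of column comments are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class CommentDiffSketch {

    /**
     * Returns column -> new comment for every column whose comment changed.
     * A null value means "remove the comment"; unchanged columns are absent.
     */
    static Map<String, String> diffComments(
            Map<String, String> before, Map<String, String> after) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String column = entry.getKey();
            if (!before.containsKey(column)) {
                continue; // newly added columns are out of scope for this sketch
            }
            String oldComment = before.get(column);
            String newComment = entry.getValue();
            // Objects.equals treats null as a regular value, so "had a comment,
            // now has none" is reported instead of being filtered out.
            if (!Objects.equals(oldComment, newComment)) {
                changed.put(column, newComment);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> before = new LinkedHashMap<>();
        before.put("id", "student id");
        before.put("name", null);
        Map<String, String> after = new LinkedHashMap<>();
        after.put("id", null);   // MODIFY COLUMN without a COMMENT clause
        after.put("name", null); // unchanged
        System.out.println(diffComments(before, after)); // prints {id=null}
    }
}
```

The point of the design is that the key's presence carries the "changed" signal, so a `null` value is free to mean removal.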
   
   ### Cleanups
   - Removed two dead `containsKey` checks in 
`PostTransformOperator.rewriteRenameColumnEvent`; the lineage-map invariant 
makes them unreachable.
   - Downgraded `SchemaDerivator.normalizeSchemaChangeEvents` info log to debug 
(the dump of full schema-change events is too verbose for INFO level).
   - Inlined the trivial `describeSchemaColumns` wrapper.
   
   ### Tests
   - New unit tests in `SchemaMergingUtilsTest` and `SchemaUtilsTest` covering 
comment removal / null propagation / `addColumnComment`.
   - Extended `SchemaDerivatorTest` with a case-insensitive column-reference 
normalization test for `ADD` / `ALTER` / `RENAME` / `DROP` schema changes.
   - New `PostTransformOperatorProjectionKeysRegressionTest` cases for 
`column-name-case: LOWER` covering `RENAME` / `ALTER` / `ADD` / `DROP` / 
`comment-only alter` / `lower-case rewrite`.
   - New `DorisMetadataApplierTest` case for comment-only alter carrying an 
upstream `JOB` resolved to the lower-case physical `job`.
   - **New e2e tests** in `MySqlToDorisE2eITCase`:
     - `testSchemaChangeWithColumnReferenceAndCommentAcrossCase` — 8-step DDL 
matrix (snapshot, type-only, type+comment, comment-only, comment removal, ADD 
with AFTER, RENAME, DROP) with comment verification via 
`INFORMATION_SCHEMA.COLUMNS`.
     - `testSchemaChangeWithColumnReferenceAndCommentAcrossCaseAndLowerTransform`: 
same matrix under `column-name-case: LOWER`.
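
The case-insensitive column-reference behaviour these tests exercise (for example an upstream `JOB` resolving to the physical `job`) can be sketched roughly as follows. The class and method are hypothetical, and preferring an exact match before falling back to a case-insensitive one is an assumption of this sketch rather than a statement about `SchemaDerivator` or `DorisMetadataApplier`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ColumnReferenceSketch {

    /** Resolves a column reference against the physical column names. */
    static Optional<String> resolve(List<String> physicalColumns, String reference) {
        // Prefer an exact match so mixed-case schemas stay unambiguous.
        for (String column : physicalColumns) {
            if (column.equals(reference)) {
                return Optional.of(column);
            }
        }
        // Fall back to the first case-insensitive match.
        for (String column : physicalColumns) {
            if (column.equalsIgnoreCase(reference)) {
                return Optional.of(column);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> physical = Arrays.asList("id", "job");
        System.out.println(resolve(physical, "JOB").orElse("<unresolved>"));
    }
}
```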
   
   ### Unit test status
   
   **363/363 passing** (up from 149 before this PR; the gain of 214 comes from 
the newly enabled MySQL CDC tests).
   
   ## Test plan
   
   - All unit tests pass on `flink-cdc-common`, `flink-cdc-runtime`, 
`flink-cdc-pipeline-connector-doris`, `flink-cdc-pipeline-connector-paimon`, 
and `flink-connector-mysql-cdc`.
   - Manual regression procedure is documented in 
`issue/MYSQL_TO_DORIS_REGRESSION_ACCEPTANCE_MANUAL_2026-06-28.md` for the 
tester to run against a real MySQL+Doris pair.
   
   ## Known issue (out of scope for this PR)
   
   The testcontainer-based e2e tests in `MySqlToDorisE2eITCase` cannot be run 
successfully on the `schema_cache` branch today: 
`apache/doris:doris-all-in-one-2.1.0` returns `code=0` from `CREATE TABLE` over 
HTTP, but the table does not become visible to JDBC `DESCRIBE` for ~3 minutes 
(or, in some runs, at all). This affects every existing MySQL→Doris e2e test on the 
branch, not just the new ones. It is a pre-existing bug introduced by `bc2df451 
[hotfix] Fix Doris schema cache miss and column position handling` and is 
independent of this PR. Fixing it requires a separate, focused investigation of 
the FE→BE metadata sync path in that hotfix; it is **out of scope for this PR** 
but should be addressed in a follow-up. The end-to-end behaviour is verifiable 
against a real Doris cluster using the acceptance manual referenced above.
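
For anyone reproducing the visibility delay, a bounded polling helper along these lines may make the symptom easier to measure. It is only a sketch: the helper is hypothetical, the probe stands in for whatever check the test uses (for instance a JDBC `DESCRIBE`), and it does not address the underlying bug.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class VisibilityWaitSketch {

    /**
     * Polls the probe until it returns true or the timeout elapses.
     * Returns true if the probe succeeded, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in probe: reports "visible" on the third attempt.
        boolean visible = waitUntil(
                () -> ++calls[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println(visible + " after " + calls[0] + " attempts");
    }
}
```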
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
