dangerousfeng opened a new pull request, #4370:
URL: https://github.com/apache/flink-cdc/pull/4370

   ## Summary
   
   When recovering from a checkpoint/savepoint, binlog events may be replayed, 
causing `AddColumnEvent` to be applied for columns that already exist in the 
cached schema. This leads to duplicate field names in `RowType`, which throws:
   
   ```
   java.lang.IllegalArgumentException: Field names must be unique. Found duplicates: [valid_date]
       at org.apache.flink.cdc.common.types.RowType.validateFields(RowType.java:158)
       at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processElement(...)
   ```
   
   ### Root Cause
   
   `SchemaUtils.applyAddColumnEvent()` adds columns without checking whether 
a column with the same name already exists. Although 
`isSchemaChangeEventRedundant()` exists as a utility, 
`PreTransformOperator.cacheChangeSchema()` does not call it before applying 
schema changes, so a replayed event reaches the schema unfiltered.
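
For illustration, the failure mode can be reproduced with a simplified model. This is a sketch only: `DuplicateFieldSketch`, `naiveAddColumn`, and this `validateFields` are hypothetical stand-ins that work on plain column-name lists, not the real `SchemaUtils` or `RowType` code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in; it models columns as plain names, not Flink CDC types. */
public class DuplicateFieldSketch {

    /** Assumed to resemble the uniqueness check that raised the exception above. */
    public static void validateFields(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : fieldNames) {
            // Set.add returns false when the name was already seen.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "Field names must be unique. Found duplicates: " + duplicates);
        }
    }

    /** Appends a column with no existence check, as in the pre-fix behavior described. */
    public static List<String> naiveAddColumn(List<String> columns, String name) {
        List<String> result = new ArrayList<>(columns);
        result.add(name);
        return result;
    }

    public static void main(String[] args) {
        List<String> cached = List.of("id", "valid_date");
        // A replayed add-column event targets a column that is already cached.
        List<String> afterReplay = naiveAddColumn(cached, "valid_date");
        try {
            validateFields(afterReplay);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```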
   
   ### Fix
   
   - Added an idempotency check in `SchemaUtils.applyAddColumnEvent()` to skip 
columns that already exist in the schema
   - This is the most defensive fix location since it protects **all** callers 
of `applySchemaChangeEvent()`, not just `PreTransformOperator`
   - Also maintains an `existingColumnNames` set across iterations, so that 
duplicates are caught even when a single event adds multiple columns
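
The idempotency check described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `IdempotentAddColumnSketch`, `ColumnAddition`, and `Position` are hypothetical names, and columns are modeled as plain strings rather than Flink CDC `Column` objects. Only the `existingColumnNames` set and the `FIRST`/`LAST` positions come from the description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the add-column logic; not the real SchemaUtils. */
public class IdempotentAddColumnSketch {

    /** Simplified positions, mirroring the FIRST and LAST cases named in the PR. */
    public enum Position { FIRST, LAST }

    /** Hypothetical model of one column addition carried by an add-column event. */
    public static final class ColumnAddition {
        final String name;
        final Position position;

        public ColumnAddition(String name, Position position) {
            this.name = name;
            this.position = position;
        }
    }

    /** Applies additions, skipping any column whose name is already present. */
    public static List<String> applyAddColumns(
            List<String> existing, List<ColumnAddition> additions) {
        List<String> columns = new ArrayList<>(existing);
        // Kept across iterations so duplicates inside one event are also skipped.
        Set<String> existingColumnNames = new HashSet<>(existing);
        for (ColumnAddition addition : additions) {
            // Set.add returns false if the name is already known: skip it.
            if (!existingColumnNames.add(addition.name)) {
                continue;
            }
            if (addition.position == Position.FIRST) {
                columns.add(0, addition.name);
            } else {
                columns.add(addition.name);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> cached = List.of("id", "valid_date");
        List<ColumnAddition> replayed = List.of(
                new ColumnAddition("valid_date", Position.LAST),  // already cached
                new ColumnAddition("extra", Position.LAST),
                new ColumnAddition("extra", Position.LAST),       // repeated in same event
                new ColumnAddition("head", Position.FIRST));
        System.out.println(applyAddColumns(cached, replayed));
    }
}
```

Updating the set as each column is added, rather than computing it once up front, is what makes the check hold for a single event that lists the same column more than once.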
   
   ### Changes
   
   - `SchemaUtils.java`: Added duplicate column name check before adding columns
   - `SchemaUtilsTest.java`: Added test cases for duplicate `AddColumnEvent` in 
both `LAST` and `FIRST` positions
   
   ## Test plan
   
   - [x] Added unit tests for duplicate `AddColumnEvent` with `LAST` position
   - [x] Added unit tests for duplicate `AddColumnEvent` with `FIRST` position
   - [x] All existing `SchemaUtilsTest` tests pass (5/5)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
