tchivs opened a new pull request, #4085:
URL: https://github.com/apache/flink-cdc/pull/4085

   
   ## Description
   
   ### What is the purpose of the change
   This PR fixes inconsistent PostgreSQL bit type mapping between the snapshot and incremental phases in Flink CDC connectors. The issue caused schema evolution failures and type mismatches when processing tables with `BIT`, `BIT(n)`, and `VARBIT` fields.
   
   ### Brief change log
   - **Enhanced PostgresTypeUtils**: Added bit type support in both the source and pipeline connectors
   - **Consistent Type Mapping**: Implemented BIT(1) → BOOLEAN, BIT(n) → BYTES (for n > 1), and VARBIT → BYTES mappings
   - **Extended Test Coverage**: Added bit type testing to existing test suites and schemas
   - **Schema Evolution Fix**: Ensured consistent type mapping across the snapshot and streaming phases
   - **Array Support**: Added proper handling for bit arrays (BIT[], VARBIT[])
   
   ### Verifying this change
   This change is verified by:
   - ✅ **Unit tests** for PostgresTypeUtils with all bit type variants
   - ✅ **Integration tests** covering both snapshot and streaming phases  
   - ✅ **Schema evolution tests** ensuring cross-phase consistency
   - ✅ **Edge case tests** for NULL values, arrays, and variable-length bit strings
   - ✅ **Regression tests** confirming existing functionality remains intact
   
   ### Does this pull request potentially affect one of the following parts:
   - Dependencies (does it add or upgrade a dependency): **No**
   - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **No**
   - The serializers: **No**  
   - The runtime per-record code paths: **No**
   - Anything that affects deployment or recovery: **No**
   - The S3 file layout: **No**
   - Experimental features: **No**
   
   ### Documentation
   - Added comprehensive inline documentation for bit type mapping logic
   - Updated test documentation with bit type conversion behavior
   - Added debugging test method documenting VARBIT conversion specifics
   
   ## Problem Statement
   
   ### Issue Description
   PostgreSQL bit types (`BIT`, `BIT(n)`, `VARBIT`) were handled inconsistently between the snapshot and streaming phases:
   
   1. **Type Mapping Inconsistency**: 
      - `BIT(1)` sometimes mapped to `BOOLEAN`, sometimes to `BYTES`
      - `BIT(n)` lacked proper `BYTES` mapping support
      - `VARBIT` caused `UnsupportedOperationException`
   
   2. **Schema Evolution Failures**:
      ```text
      SchemaEvolveException: Incompatible schema change detected: 
      field type changed from BOOLEAN to BYTES
      ```
   
   3. **Missing Type Support**:
      ```text
      UnsupportedOperationException: Doesn't support Postgres type 'bit' yet
      ```
   
   ### Root Cause Analysis
   - Missing bit type constants in PostgresTypeUtils
   - Incomplete type mapping logic for bit variants
   - Different code paths for snapshot vs streaming type resolution
   - Lack of comprehensive test coverage for bit types
   
   ## Solution Overview
   
   ### 1. Enhanced Type Mapping
   ```java
   // Added consistent mapping rules
   case PG_BIT:
       if (precision == 1) {
           return DataTypes.BOOLEAN();  // BIT(1) → BOOLEAN
       } else {
           return DataTypes.BYTES();    // BIT(n) → BYTES
       }
   case PG_VARBIT:
       return DataTypes.BYTES();        // VARBIT → BYTES
   ```
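
The fragment above depends on the surrounding `PostgresTypeUtils` switch. As a rough, dependency-free illustration of the same rules, the sketch below uses a hypothetical `TargetType` enum in place of the real `DataTypes` factory and a hypothetical `mapBitType` helper; neither name comes from this PR, and the actual method signature may differ.

```java
import java.util.Locale;

/** Illustrative stand-in for the bit-type branch of a PostgreSQL type mapper. */
final class BitTypeMappingSketch {

    /** Hypothetical replacement for the connector's DataTypes, so the sketch compiles alone. */
    enum TargetType { BOOLEAN, BYTES }

    static final String PG_BIT = "bit";
    static final String PG_VARBIT = "varbit";

    /** Maps a PostgreSQL bit type name and its declared length to a target type. */
    static TargetType mapBitType(String pgTypeName, int length) {
        switch (pgTypeName.toLowerCase(Locale.ROOT)) {
            case PG_BIT:
                // BIT and BIT(1) become BOOLEAN; any longer BIT(n) becomes BYTES.
                return length == 1 ? TargetType.BOOLEAN : TargetType.BYTES;
            case PG_VARBIT:
                // VARBIT is always BYTES, whatever its maximum length.
                return TargetType.BYTES;
            default:
                throw new UnsupportedOperationException(
                        "Doesn't support Postgres type '" + pgTypeName + "' yet");
        }
    }

    public static void main(String[] args) {
        System.out.println(mapBitType("bit", 1));     // BOOLEAN
        System.out.println(mapBitType("bit", 8));     // BYTES
        System.out.println(mapBitType("varbit", 32)); // BYTES
    }
}
```

The point of routing every phase through one function of this shape is that the same column can never resolve to `BOOLEAN` in one phase and `BYTES` in another.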
   
   ### 2. Comprehensive Type Constants
   ```java
   private static final String PG_BIT = "bit";
   private static final String PG_BIT_ARRAY = "_bit";
   private static final String PG_VARBIT = "varbit";
   private static final String PG_VARBIT_ARRAY = "_varbit";
   ```
   
   ### 3. Cross-Connector Consistency
   - Applied identical changes to both source and pipeline connectors
   - Ensured snapshot and streaming phases use same type resolution logic
   - Maintained backward compatibility for existing deployments
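
The consistency requirement above can be pictured as a small check that runs two type resolvers over the same columns and reports any disagreement. This is only a sketch: `unified`, `legacyStreaming`, and `mismatches` are hypothetical names, and `legacyStreaming` is an assumed simplification of the pre-fix behavior, not code taken from the connector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Compares two bit-type resolvers over a fixed set of column definitions. */
final class PhaseConsistencySketch {

    /** Rule described in this PR: BIT(1) maps to BOOLEAN, everything else to BYTES. */
    static String unified(String type, int length) {
        return "bit".equals(type) && length == 1 ? "BOOLEAN" : "BYTES";
    }

    /** Assumed pre-fix streaming path that always produced BYTES (hypothetical). */
    static String legacyStreaming(String type, int length) {
        return "BYTES";
    }

    /** Returns a description of every column on which the two resolvers disagree. */
    static List<String> mismatches(
            BiFunction<String, Integer, String> snapshot,
            BiFunction<String, Integer, String> streaming) {
        String[] types = {"bit", "bit", "bit", "varbit"};
        int[] lengths = {1, 8, 16, 32};
        List<String> result = new ArrayList<>();
        for (int i = 0; i < types.length; i++) {
            String a = snapshot.apply(types[i], lengths[i]);
            String b = streaming.apply(types[i], lengths[i]);
            if (!a.equals(b)) {
                result.add(types[i] + "(" + lengths[i] + "): " + a + " vs " + b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same resolver in both phases: no mismatch.
        System.out.println(mismatches(
                PhaseConsistencySketch::unified, PhaseConsistencySketch::unified));
        // Different resolvers: the BIT(1) column is reported.
        System.out.println(mismatches(
                PhaseConsistencySketch::unified, PhaseConsistencySketch::legacyStreaming));
    }
}
```

A disagreement on `bit(1)` is the situation that surfaces downstream as a field type changing from BOOLEAN to BYTES.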
   
   ## Testing Strategy
   
   ### 1. Unit Tests
   - **PostgresTypeUtils**: Verified type mapping for all bit variants
   - **Edge Cases**: NULL values, precision boundaries, array types
   - **Consistency**: Cross-connector type mapping validation
   
   ### 2. Integration Tests  
   - **PostgresFullTypesITCase**: Added comprehensive bit type testing
   - **PostgresNumericZeroSourceITCase**: Extended with bit type validation
   - **Schema Evolution**: Verified consistency between phases
   
   ### 3. Test Data
   ```sql
   -- Added to column_type_test.sql
   bit_c               BIT,            -- BIT(1) → BOOLEAN
   bit8_c              BIT(8),         -- BIT(8) → BYTES  
   bit16_c             BIT(16),        -- BIT(16) → BYTES
   varbit_c            VARBIT(32),     -- VARBIT → BYTES
   
   -- Test values
   B'1', B'10101010', B'1010101010101010', B'11110000111100001111'
   ```
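
For readers wondering what a `BYTES` value for these literals might look like, the sketch below packs a bit string into bytes most significant bit first, zero-padding the final byte on the right. This layout is an assumption chosen for illustration; the description does not state the byte layout the connector produces, and it may differ (for example in byte order). `BitStringPackingSketch`, `pack`, and `toHex` are hypothetical names.

```java
/** Packs a PostgreSQL-style bit string such as "10101010" into bytes (illustrative only). */
final class BitStringPackingSketch {

    /** Packs bits MSB-first; a trailing partial byte is zero-padded on the right. */
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("Not a bit string: " + bits);
            }
            if (c == '1') {
                // Bit i lands in byte i / 8, counting from the high-order bit.
                out[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        return out;
    }

    /** Renders bytes as upper-case hex for easy comparison in tests. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(pack("10101010")));             // AA
        System.out.println(toHex(pack("1010101010101010")));     // AAAA
        System.out.println(toHex(pack("11110000111100001111"))); // F0F0F0
    }
}
```

Note that the 20-bit `VARBIT` value needs three bytes, so any comparison in a test has to account for padding in the last byte.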
   
   ## Impact Assessment
   
   ### Positive Impact
   - ✅ **Reliability**: Eliminates schema evolution failures for bit types
   - ✅ **Compatibility**: Expands PostgreSQL type support coverage
   - ✅ **Consistency**: Unified behavior across all processing phases
   - ✅ **Maintainability**: Comprehensive test coverage prevents regressions
   
   ### Risk Mitigation
   - **Backward Compatibility**: All changes are additive and non-breaking
   - **Performance**: Minimal overhead (only type checking logic)
   - **Rollback Safety**: Changes can be safely reverted without data loss
   - **Isolation**: Only affects tables with bit types
   
   ## Verification Results
   
   ### Before Fix
   ```text
   # Schema evolution error
   FAILED: SchemaEvolveException: field type changed from BOOLEAN to BYTES
   
   # Type support error  
   FAILED: UnsupportedOperationException: Doesn't support Postgres type 'bit'
   ```
   
   ### After Fix
   ```text
   # All tests pass
   ✅ testFullTypes() - bit types processed correctly
   ✅ testBitTypesHandling() - consistent type mapping verified
   ✅ testVarbitConversionDebugging() - conversion behavior documented
   ✅ All existing tests continue to pass
   ```
   
   ## Reviewer Checklist
   - [ ] **Type Mapping Logic**: Verify consistency between source and pipeline connectors
   - [ ] **Test Coverage**: Confirm adequate testing for all bit type variants
   - [ ] **Backward Compatibility**: Ensure existing functionality remains unaffected
   - [ ] **Schema Evolution**: Validate cross-phase type consistency
   - [ ] **Performance**: Confirm minimal overhead introduction
   - [ ] **Documentation**: Review inline comments and test documentation
   
   ## Related Issues
   - **Fixes**: [FLINK-35907](https://issues.apache.org/jira/browse/FLINK-35907)
   - **Related**: [FLINK-38196](https://issues.apache.org/jira/browse/FLINK-38196) (numeric type improvements)
   
   ## Future Considerations
   - Monitor for additional PostgreSQL type compatibility issues
   - Consider extending bit type support to other PostgreSQL-compatible databases
   - Evaluate need for custom bit type serialization optimizations
   


