dangzitou opened a new pull request, #4428: URL: https://github.com/apache/flink-cdc/pull/4428
## What this PR does

Replaces platform-default `String.getBytes()` with `String.getBytes(StandardCharsets.UTF_8)` across multiple modules to ensure consistent encoding behavior regardless of JVM locale or OS configuration.

### Affected files

- `SchemaMergingUtils.java` + test: core schema coercion
- `DebeziumJsonSerializationSchema.java`: Kafka Debezium JSON default value handling
- `RowDataTiKVEventDeserializationSchemaBase.java`: TiDB source connector
- `BinaryTypeReturningClass.java` / `VarBinaryTypeReturningClass.java`: UDF examples

### Why

`String.getBytes()` without an explicit charset uses the JVM's default charset, which varies across environments (e.g., `US-ASCII` on some minimal Docker images, `GBK` on Chinese Windows). This can silently corrupt data when non-ASCII characters are involved, because the bytes produced no longer match the encoding that downstream readers expect. Specifying `UTF-8` explicitly makes the behavior deterministic.

## Testing

These are straightforward defensive improvements: the fix ensures consistent UTF-8 encoding regardless of JVM locale.

---

Split from #4427 as requested by @yuxiqian.
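For reviewers who want to see the failure mode from the "Why" section in isolation, here is a minimal sketch. It is not code from this PR: `CharsetDemo` and `roundTrip` are hypothetical names, and passing `StandardCharsets.US_ASCII` explicitly only stands in for a JVM whose default charset happens to be `US-ASCII`. The sample string is written with Unicode escapes so the source compiles the same way under any compiler encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Encode with the given charset, then decode as UTF-8,
    // the way a downstream consumer expecting UTF-8 would.
    static String roundTrip(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two CJK characters (U+6570, U+636E); any non-ASCII text behaves similarly.
        String value = "\u6570\u636e";

        // Explicit UTF-8: three bytes per character here, and the round trip is lossless.
        System.out.println(value.getBytes(StandardCharsets.UTF_8).length);         // 6
        System.out.println(roundTrip(value, StandardCharsets.UTF_8).equals(value)); // true

        // Simulated US-ASCII default: each unmappable character becomes '?',
        // so the original text is lost without any exception being thrown.
        System.out.println(value.getBytes(StandardCharsets.US_ASCII).length);         // 2
        System.out.println(roundTrip(value, StandardCharsets.US_ASCII).equals(value)); // false
    }
}
```

Because the ASCII case substitutes `?` rather than throwing, the corruption only shows up when the data is read back, which is why the change is worth making even in modules where no failure has been reported.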
