[
https://issues.apache.org/jira/browse/FLINK-38110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021801#comment-18021801
]
haiqingchen commented on FLINK-38110:
-------------------------------------
[~ouyangwuli] could you help review the pull request?
https://github.com/apache/flink-cdc/pull/4128
> PostgreSQL connector reads Chinese columns with garbled characters
> ------------------------------------------------------------------
>
> Key: FLINK-38110
> URL: https://issues.apache.org/jira/browse/FLINK-38110
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.4.0
> Reporter: haiqingchen
> Priority: Minor
> Labels: pull-request-available
> Attachments: image-2025-07-17-14-53-02-657.png
>
>
> When a PostgreSQL table has column names in Chinese, the PostgreSQL connector
> with the pgoutput plugin decodes them as garbled characters, especially during
> incremental capture.
> The reason is that, when handling column names and table names,
> io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder
> casts each byte to a char instead of decoding the bytes as UTF-8:
> {code:java}
> private static String readString(ByteBuffer buffer) {
>     StringBuilder sb = new StringBuilder();
>     boolean var2 = false;
>     byte b;
>     while ((b = buffer.get()) != 0) {
>         sb.append((char) b);
>     }
>     return sb.toString();
> }
> {code}
> In contrast, when it handles column values, it decodes the bytes as UTF-8:
> {code:java}
> private static String readColumnValueAsString(ByteBuffer buffer) {
>     int length = buffer.getInt();
>     byte[] value = new byte[length];
>     buffer.get(value, 0, length);
>     return new String(value, Charset.forName("UTF-8"));
> }
> {code}
> My solution is to copy PgOutputMessageDecoder from Debezium and fix
> readString so that it decodes the string as UTF-8.
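
The fix proposed in the description can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code from the pull
request or from Debezium: the class Utf8ReadStringSketch and the helpers
readStringPerByte, readStringUtf8 and nulTerminated are hypothetical names. It
assumes the name is a NUL-terminated byte sequence encoded in UTF-8, as the
readString loop in the description suggests.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not taken from Debezium or the linked pull request.
final class Utf8ReadStringSketch {

    // Approach quoted in the issue: every byte is cast to a char on its own,
    // so each byte of a multi-byte UTF-8 sequence becomes a separate, wrong character.
    static String readStringPerByte(ByteBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        byte b;
        while ((b = buffer.get()) != 0) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Assumed fix: collect all bytes up to the NUL terminator first,
    // then decode them together as UTF-8.
    static String readStringUtf8(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper (hypothetical): encodes s as UTF-8 and appends a NUL terminator.
    static ByteBuffer nulTerminated(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(utf8.length + 1);
        buffer.put(utf8).put((byte) 0);
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        String columnName = "\u59d3\u540d"; // a two-character Chinese column name
        String perByte = readStringPerByte(nulTerminated(columnName));
        String decoded = readStringUtf8(nulTerminated(columnName));
        System.out.println("per-byte cast matches original: " + perByte.equals(columnName));
        System.out.println("UTF-8 decode matches original:  " + decoded.equals(columnName));
    }
}
```

ASCII-only names take the same path through both methods and come out
identical, which would explain why the problem only shows up with non-ASCII
column or table names.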
--
This message was sent by Atlassian Jira
(v8.20.10#820010)