[ 
https://issues.apache.org/jira/browse/FLINK-38110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiqingchen updated FLINK-38110:
--------------------------------
    Description: 
When a PostgreSQL table has column names in Chinese, the PostgreSQL connector 
with the pgoutput plugin decodes them as garbled characters, especially during 
incremental capture.

The cause is that, when handling column names and table names,

io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder

does not decode the bytes as UTF-8:
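{code:java}
// Reads a NUL-terminated string (used for table and column names).
private static String readString(ByteBuffer buffer) {
    StringBuilder sb = new StringBuilder();
    boolean var2 = false; // unused

    byte b;
    while((b = buffer.get()) != 0) {
        // Problem: each byte is cast to a char on its own, so a multi-byte
        // UTF-8 sequence is never decoded as one character.
        sb.append((char)b);
    }

    return sb.toString();
}
{code}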
whereas when it handles a column value, it decodes the bytes as UTF-8:
{code:java}
private static String readColumnValueAsString(ByteBuffer buffer) {
    int length = buffer.getInt();
    byte[] value = new byte[length];
    buffer.get(value, 0, length);
    return new String(value, Charset.forName("UTF-8"));
} {code}
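
To make the difference between the two methods concrete, here is a small 
standalone sketch. The class name MojibakeDemo and the method castEachByte are 
hypothetical and only mimic the per-byte cast shown in readString above; they 
are not part of Debezium.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Mimics the per-byte cast from readString: one char per byte, no decoding.
    public static String castEachByte(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One Chinese character, three bytes in UTF-8 (0xE5 0x90 0x8D).
        byte[] raw = "\u540d".getBytes(StandardCharsets.UTF_8);

        String garbled = castEachByte(raw);
        String decoded = new String(raw, StandardCharsets.UTF_8);

        // The cast yields three unrelated chars; the decode yields the original one.
        System.out.println("cast length:    " + garbled.length());
        System.out.println("decoded length: " + decoded.length());
        System.out.println("round trip ok:  " + decoded.equals("\u540d"));
    }
}
```

Because a Java byte is signed, every byte of a multi-byte sequence is negative 
and is sign-extended by the cast, so the result is not even a Latin-1 reading 
of the bytes.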

How to fix this: 
copy PgOutputMessageDecoder from Debezium and change readString so that it 
decodes the string as UTF-8.
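
A minimal sketch of what a UTF-8-aware readString could look like, assuming the 
names on the wire are NUL-terminated UTF-8 as in the snippet above. The wrapper 
class Utf8NameReader is hypothetical and exists only to make the example 
self-contained; the actual patch may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8NameReader {
    // Collects the raw bytes up to the NUL terminator, then decodes them
    // as UTF-8 in one step instead of casting each byte to a char.
    public static String readString(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two NUL-terminated names back to back: a Chinese one and an ASCII one.
        byte[] wire = "\u540d\u79f0\0id\0".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(wire);

        System.out.println(readString(buffer));
        System.out.println(readString(buffer));
    }
}
```

Buffering the bytes first matters because a UTF-8 character can span several 
bytes, so decoding has to happen on the whole sequence rather than byte by byte.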



> PostgreSQL connector reads Chinese columns with garbled characters
> ------------------------------------------------------------------
>
>                 Key: FLINK-38110
>                 URL: https://issues.apache.org/jira/browse/FLINK-38110
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: cdc-3.4.0
>            Reporter: haiqingchen
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)