Xiqian YU created FLINK-35102:
---------------------------------

             Summary: Incorrect type mapping for Flink CDC Doris connector
                 Key: FLINK-35102
                 URL: https://issues.apache.org/jira/browse/FLINK-35102
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
            Reporter: Xiqian YU

According to the Flink CDC Doris connector docs, the declared lengths of CHAR and VARCHAR columns are multiplied by 3, because Doris stores strings in UTF-8, a variable-length encoding:

|CHAR(n)|CHAR(n*3)|In Doris, strings are stored in UTF-8 encoding, so English characters occupy 1 byte and Chinese characters occupy 3 bytes. The length here is multiplied by 3. The maximum length of CHAR is 255. Once exceeded, it will automatically be converted to VARCHAR type.|
|VARCHAR(n)|VARCHAR(n*3)|Same as above. The length here is multiplied by 3. The maximum length of VARCHAR is 65533. Once exceeded, it will automatically be converted to STRING type.|

However, the Doris connector currently maps `CHAR(n)` to `CHAR(n)` and `VARCHAR(n)` to `VARCHAR(n * 4)`, which is inconsistent with the mapping specified in the docs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
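
For reference, the mapping described in the docs quoted in this report could be sketched roughly as follows. This is an illustration only, not the connector's actual code: the class name `DorisTypeMappingSketch` and its methods are hypothetical. It also assumes that the 255 and 65533 limits apply to the already multiplied length, and that an oversized CHAR which also exceeds the VARCHAR limit falls through to STRING. The quoted docs do not spell out either point.

```java
// Hypothetical sketch of the documented CHAR/VARCHAR mapping; not taken from the connector source.
public final class DorisTypeMappingSketch {
    // Factor stated in the docs: a Chinese character occupies 3 bytes in UTF-8.
    static final int BYTES_PER_CHAR = 3;
    // Limits stated in the docs.
    static final int MAX_CHAR_LENGTH = 255;
    static final int MAX_VARCHAR_LENGTH = 65533;

    // CHAR(n) -> CHAR(n*3), or a wider type once the CHAR limit is exceeded.
    static String mapChar(int n) {
        long length = (long) n * BYTES_PER_CHAR; // long avoids int overflow for large n
        if (length <= MAX_CHAR_LENGTH) {
            return "CHAR(" + length + ")";
        }
        // Assumption: an oversized CHAR follows the same VARCHAR/STRING rule.
        return varcharOrString(length);
    }

    // VARCHAR(n) -> VARCHAR(n*3), or STRING once the VARCHAR limit is exceeded.
    static String mapVarchar(int n) {
        return varcharOrString((long) n * BYTES_PER_CHAR);
    }

    // Assumption: the limit is checked against the multiplied length.
    private static String varcharOrString(long length) {
        return length <= MAX_VARCHAR_LENGTH ? "VARCHAR(" + length + ")" : "STRING";
    }

    public static void main(String[] args) {
        System.out.println("CHAR(10)       -> " + mapChar(10));
        System.out.println("CHAR(100)      -> " + mapChar(100));
        System.out.println("VARCHAR(100)   -> " + mapVarchar(100));
        System.out.println("VARCHAR(30000) -> " + mapVarchar(30000));
    }
}
```

Under these assumptions, the behavior reported above would differ from the sketch in two places: `CHAR(n)` keeps its length instead of being tripled, and `VARCHAR(n)` is multiplied by 4 instead of 3.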