[ https://issues.apache.org/jira/browse/FLINK-35102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonard Xu updated FLINK-35102:
-------------------------------
    Fix Version/s: cdc-3.1.0

> Incorrect Type mapping for Flink CDC Doris connector
> ----------------------------------------------------
>
>                 Key: FLINK-35102
>                 URL: https://issues.apache.org/jira/browse/FLINK-35102
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>            Reporter: Xiqian YU
>            Assignee: Xiqian YU
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: cdc-3.1.0
>
>
> According to the Flink CDC Doris connector docs, the lengths of CHAR and
> VARCHAR columns are multiplied by 3, since Doris uses UTF-8 variable-length
> encoding internally.
> |CHAR(n)|CHAR(n*3)|In Doris, strings are stored in UTF-8 encoding, so English
> characters occupy 1 byte and Chinese characters occupy 3 bytes. The length
> here is multiplied by 3. The maximum length of CHAR is 255. Once exceeded, it
> will automatically be converted to VARCHAR type.|
> |VARCHAR(n)|VARCHAR(n*3)|Same as above. The length here is multiplied by 3.
> The maximum length of VARCHAR is 65533. Once exceeded, it will automatically
> be converted to STRING type.|
> However, the Doris connector currently maps `CHAR(n)` to `CHAR(n)` and
> `VARCHAR(n)` to `VARCHAR(n * 4)`, which is inconsistent with the
> specification in the docs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
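
For illustration only, the mapping that the docs describe could be sketched roughly as follows. This is not the connector's actual code: the class name DorisTypeMappingSketch and the methods mapChar and mapVarchar are hypothetical, and the types are rendered as plain strings rather than Flink CDC or Doris type objects. Falling through from an oversized CHAR to STRING (when even VARCHAR is too small) is an assumption; the docs only state the CHAR to VARCHAR and VARCHAR to STRING steps.

```java
// Hypothetical sketch of the documented mapping; not taken from the connector.
final class DorisTypeMappingSketch {
    // Limits and multiplier as quoted from the connector docs in the issue.
    static final int MAX_CHAR_LENGTH = 255;
    static final int MAX_VARCHAR_LENGTH = 65533;
    static final int UTF8_BYTES_PER_CHAR = 3;

    // CHAR(n) -> CHAR(n*3), or VARCHAR(n*3) once the CHAR limit is exceeded.
    static String mapChar(int n) {
        long bytes = (long) n * UTF8_BYTES_PER_CHAR; // long avoids int overflow
        if (bytes <= MAX_CHAR_LENGTH) {
            return "CHAR(" + bytes + ")";
        }
        // Assumption: an oversized CHAR follows the VARCHAR rule from here on.
        return varcharOrString(bytes);
    }

    // VARCHAR(n) -> VARCHAR(n*3), or STRING once the VARCHAR limit is exceeded.
    static String mapVarchar(int n) {
        return varcharOrString((long) n * UTF8_BYTES_PER_CHAR);
    }

    private static String varcharOrString(long bytes) {
        return bytes <= MAX_VARCHAR_LENGTH ? "VARCHAR(" + bytes + ")" : "STRING";
    }

    public static void main(String[] args) {
        System.out.println(mapChar(10));       // CHAR(30)
        System.out.println(mapChar(100));      // VARCHAR(300)
        System.out.println(mapVarchar(100));   // VARCHAR(300)
        System.out.println(mapVarchar(30000)); // STRING
    }
}
```

Under this sketch, the behaviour reported in the issue would correspond to mapChar returning CHAR(n) unchanged and mapVarchar using a multiplier of 4 instead of 3.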