Xiqian YU created FLINK-35102:
---------------------------------

             Summary: Incorrect type mapping for Flink CDC Doris connector
                 Key: FLINK-35102
                 URL: https://issues.apache.org/jira/browse/FLINK-35102
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
            Reporter: Xiqian YU


According to the Flink CDC Doris connector docs, the lengths of CHAR and VARCHAR 
columns are multiplied by 3 when mapped to Doris types, since Doris uses UTF-8 
variable-length encoding internally:
|CHAR(n)|CHAR(n*3)|In Doris, strings are stored in UTF-8 encoding, so English 
characters occupy 1 byte and Chinese characters occupy 3 bytes. The length here 
is multiplied by 3. The maximum length of CHAR is 255. Once exceeded, it will 
automatically be converted to VARCHAR type.|
|VARCHAR(n)|VARCHAR(n*3)|Same as above. The length here is multiplied by 3. The 
maximum length of VARCHAR is 65533. Once exceeded, it will automatically be 
converted to STRING type.|

However, the Doris connector currently maps `CHAR(n)` to `CHAR(n)` and 
`VARCHAR(n)` to `VARCHAR(n * 4)`, which is inconsistent with the specification 
in the docs.
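
To make the discrepancy concrete, here is a minimal sketch of the mapping as the docs describe it. It is not the connector's code: the function name `documented_doris_type` and the constants are hypothetical. It also assumes that the 255 and 65533 limits are checked against the multiplied length, which the docs do not state explicitly.

```python
# Hypothetical helper, NOT the connector's actual implementation.
# It only restates the CHAR/VARCHAR rules quoted from the docs above.
MAX_CHAR_LENGTH = 255        # documented maximum length of a Doris CHAR
MAX_VARCHAR_LENGTH = 65533   # documented maximum length of a Doris VARCHAR
LENGTH_MULTIPLIER = 3        # multiplier stated in the docs


def documented_doris_type(cdc_type: str, n: int) -> str:
    """Return the Doris type the docs describe for CHAR(n) or VARCHAR(n)."""
    if cdc_type not in ("CHAR", "VARCHAR"):
        raise ValueError(f"unsupported type in this sketch: {cdc_type}")

    # Assumption: the limits apply to the length after multiplication.
    length = n * LENGTH_MULTIPLIER

    if cdc_type == "CHAR" and length <= MAX_CHAR_LENGTH:
        return f"CHAR({length})"
    # An oversized CHAR is converted to VARCHAR; an oversized VARCHAR to STRING.
    if length <= MAX_VARCHAR_LENGTH:
        return f"VARCHAR({length})"
    return "STRING"


# Documented results versus the behaviour reported in this issue:
print(documented_doris_type("CHAR", 10))     # CHAR(30); the connector emits CHAR(10)
print(documented_doris_type("VARCHAR", 10))  # VARCHAR(30); the connector emits VARCHAR(40)
```

Under these assumptions, every CHAR or VARCHAR column is created with a length that differs from the documented one.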



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
