[ 
https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611913#comment-16611913
 ] 

Fabian Hueske commented on FLINK-10257:
---------------------------------------

Thanks for looking into this and driving it forward [~pnowojski]!

> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>
>                 Key: FLINK-10257
>                 URL: https://issues.apache.org/jira/browse/FLINK-10257
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: Piotr Nowojski
>            Assignee: Hequn Cheng
>            Priority: Critical
>
> Despite that we officially do not support CHAR type, this type is visible and 
> accessible for the users. First of all, string literals have default type of 
> CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.
> Problem is that we do not support CHAR correctly. We mishandle it in:
>  # comparisons and functions
>  # writing values to sinks
> According to SQL standard (and as almost all of the other databases do), CHAR 
> comparisons should ignore white spaces. On the other hand, functions like 
> {{CONCAT}} or {{LENGTH}} shouldn't: 
> [http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in In Flink we completely ignore those rules. Sometimes we store 
> internally CHAR with padded spaces sometimes without. This results with semi 
> random behaviour with respect to comparisons/functions/writing to sinks. For 
> example following query:
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM 
> sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi", 
> "Hello", "Hello world", writes to sink not padded strings (correctly), but 
> following query:
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) 
> WHERE c = 'Hi'")
>   .insertInto("targetTable")
> env.execute(){code}
> Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}} 
> is a NOOP in Flink, while 'Hi' constant handed by Calcite to us will be 
> padded with 8 spaces.
> On the other hand following query produces strings padded with spaces:
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 
> 'this should not happen' END FROM sourceTable")
>   .insertInto("targetTable")
> env.execute()
> val expected = Seq(
>   "GERMANY",
>   "POLAND",
>   "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3 
> elements and received 3
> expected: [GERMANY, POLAND, POLAND]
> received: [GERMANY , POLAND , POLAND ]
> {code}
> To make matter even worse, Calcite's constant folding correctly performs 
> comparisons, while if same comparisons are performed by Flink, they yield 
> different results. In other words in SQL:
> {code:java}
> SELECT 'POLAND' = 'POLAND    '
> {code}
> return true, but same expression performed on columns
> {code:java}
> SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM 
> countries{code}
> returns false.
> To further complicated things, in SQL our string literals have {{CHAR}} type, 
> while in Table API our literals have String type (effectively {{VARCHAR}}) 
> making results inconsistent between those two APIs.
>  
> CC [~twalthr] [~fhueske] [~hequn8128]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to