Piotr Nowojski created FLINK-10257:
--------------------------------------

             Summary: Incorrect CHAR type support in Flink SQL and Table API
                 Key: FLINK-10257
                 URL: https://issues.apache.org/jira/browse/FLINK-10257
             Project: Flink
          Issue Type: Bug
          Components: Table API & SQL
            Reporter: Piotr Nowojski


Despite that we officially do not support CHAR type, this type is visible and 
accessible for the users. First of all, string literals have default type of 
CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.

Problem is that we do not support CHAR correctly. We mishandle it in:
 # comparisons and functions
 # writing values to sinks

According to SQL standard (and as almost all of the other databases do), CHAR 
comparisons should ignore white spaces. On the other hand, functions like 
{{CONCAT}} or {{LENGTH}} shouldn't: 
[http://troels.arvin.dk/db/rdbms/#data_types-char] .

Currently in In Flink we completely ignore those rules. Sometimes we store 
internally CHAR with padded spaces sometimes without. This results with semi 
random behaviour with respect to comparisons/functions/writing to sinks. For 
example following query:

 
{code:java}
tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM 
sourceTable").insertInto("targetTable")
env.execute()
{code}
Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi", 
"Hello", "Hello world", writes to sink not padded strings (correctly), but 
following query incorrectly

 
{code:java}
tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) 
WHERE c = 'Hi'")
  .insertInto("targetTable")
env.execute(){code}
Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}} is 
a NOOP in Flink, while 'Hi' constant handled by Calcite to us will be padded 
with 8 spaces.

On the other hand following query produces strings padded with spaces:

 
{code:java}
tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 
'this should not happen' END FROM sourceTable")
  .insertInto("targetTable")
env.execute()

val expected = Seq(
  "GERMANY",
  "POLAND",
  "POLAND").mkString("\n")

org.junit.ComparisonFailure: Different elements in arrays: expected 3 elements 
and received 3
expected: [GERMANY, POLAND, POLAND]
received: [GERMANY , POLAND , POLAND ]
{code}
 

To make matter even worse, Calcite's constant folding correctly performs 
comparisons, while if same comparisons are performed by Flink, they yield 
different results. In other words in SQL:
{code:java}
SELECT 'POLAND' = 'POLAND  '
{code}
return true, but same expression performed on columns
{code:java}
SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM 
countries{code}
returns false.

To further complicated things, in SQL our string literals have {{CHAR}} type, 
while in Table API our literals have String type (effectively {{VARCHAR}}) 
making results inconsistent between those two APIs.

 

CC [~twalthr] [~fhueske] [~hequn8128]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to