[ https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611913#comment-16611913 ]
Fabian Hueske commented on FLINK-10257: --------------------------------------- Thanks for looking into this and driving it forward [~pnowojski]! > Incorrect CHAR type support in Flink SQL and Table API > ------------------------------------------------------ > > Key: FLINK-10257 > URL: https://issues.apache.org/jira/browse/FLINK-10257 > Project: Flink > Issue Type: Bug > Components: Table API & SQL > Reporter: Piotr Nowojski > Assignee: Hequn Cheng > Priority: Critical > > Despite that we officially do not support CHAR type, this type is visible and > accessible for the users. First of all, string literals have default type of > CHAR in SQL. Secondly users can always cast expressions/columns to CHAR. > Problem is that we do not support CHAR correctly. We mishandle it in: > # comparisons and functions > # writing values to sinks > According to SQL standard (and as almost all of the other databases do), CHAR > comparisons should ignore white spaces. On the other hand, functions like > {{CONCAT}} or {{LENGTH}} shouldn't: > [http://troels.arvin.dk/db/rdbms/#data_types-char] . > Currently in In Flink we completely ignore those rules. Sometimes we store > internally CHAR with padded spaces sometimes without. This results with semi > random behaviour with respect to comparisons/functions/writing to sinks. For > example following query: > {code:java} > tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM > sourceTable").insertInto("targetTable") > env.execute() > {code} > Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi", > "Hello", "Hello world", writes to sink not padded strings (correctly), but > following query: > {code:java} > tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) > WHERE c = 'Hi'") > .insertInto("targetTable") > env.execute(){code} > Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}} > is a NOOP in Flink, while 'Hi' constant handed by Calcite to us will be > padded with 8 spaces. > On the other hand following query produces strings padded with spaces: > {code:java} > tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE > 'this should not happen' END FROM sourceTable") > .insertInto("targetTable") > env.execute() > val expected = Seq( > "GERMANY", > "POLAND", > "POLAND").mkString("\n") > org.junit.ComparisonFailure: Different elements in arrays: expected 3 > elements and received 3 > expected: [GERMANY, POLAND, POLAND] > received: [GERMANY , POLAND , POLAND ] > {code} > To make matter even worse, Calcite's constant folding correctly performs > comparisons, while if same comparisons are performed by Flink, they yield > different results. In other words in SQL: > {code:java} > SELECT 'POLAND' = 'POLAND ' > {code} > return true, but same expression performed on columns > {code:java} > SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM > countries{code} > returns false. > To further complicated things, in SQL our string literals have {{CHAR}} type, > while in Table API our literals have String type (effectively {{VARCHAR}}) > making results inconsistent between those two APIs. > > CC [~twalthr] [~fhueske] [~hequn8128] -- This message was sent by Atlassian JIRA (v7.6.3#76005)