Hi,

https://issues.apache.org/jira/browse/IMPALA-3942
Do we need to distinguish between single-quoted and double-quoted string literals when the frontend builds an AST? This question came up while I was thinking about a fix for IMPALA-3942.

1. Symptom

    create table t1 (original string);
    insert into t1 values('That\\\'s it!');
    create view v1 as select regexp_replace(original, "\\\\'","'") as replaced, * from t1;
    select * from v1; -- parse error internally

2. Cause

I think the root cause is that double-quoted string literals (i.e. "\\\\'" and "'") are converted to single-quoted string literals when a query string is generated from the AST. The regenerated select query then has a syntax error. Please see the result of "show create table v1". The "create view v2" query works on Hive because Hive keeps the double-quoted string literals, as shown below.

    hive> show create table v1;
    CREATE VIEW `v1` AS SELECT regexp_replace(original, '\\\\'', ''') replaced, * FROM jc.t1
    hive> show create table v2;
    CREATE VIEW `v2` AS select regexp_replace(`t1`.`original`, "\\\\'","'") as `replaced`, `t1`.`original` from `jc`.`t1`

3. (Possible) Solution

My initial idea is to keep information that distinguishes single-quoted from double-quoted string literals. The StringLiteral class could have a boolean flag recording whether the literal was single- or double-quoted. When toSql* is invoked, the quote style is determined by the flag. I am not sure whether this approach has any side effects. Do you think it is valid?

Currently our lexical analyzer keeps only the text of the string literal, not its quote style. In sql-scanner.flex:

    SingleQuoteStringLiteral = \'(\\.|[^\\\'])*\'
    DoubleQuoteStringLiteral = \"(\\.|[^\\\"])*\"

    {SingleQuoteStringLiteral} {
      return newToken(SqlParserSymbols.STRING_LITERAL, yytext().substring(1, yytext().length()-1));
    }

    {DoubleQuoteStringLiteral} {
      return newToken(SqlParserSymbols.STRING_LITERAL, yytext().substring(1, yytext().length()-1));
    }

4. Further question

Most RDBMSes support only single-quoted string literals (not double-quoted).
By the way, Hive supports both, which causes problems such as migration issues and differences in behavior. Why does Impala support this feature as well? Is it just for compatibility with Hive, or is there another reason? I found an article, "Hive: Allows Single and Double Quotes Interchangeably", whose author says "do not use double quote". What do you think about that?

http://www.thedatastudio.net/hive_flexible_quotes.htm

Best regards,
Jinchul
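
P.S. To make the idea in section 3 concrete, here is a minimal Java sketch. It is not the actual Impala code: QuoteAwareStringLiteral, fromToken and the doubleQuoted field are hypothetical names, and the real StringLiteral class and parser wiring would differ. It assumes the quote style can be captured from the full token text before the surrounding quotes are stripped.

```java
// Hypothetical sketch only: none of these names are Impala's real API.
final class QuoteAwareStringLiteral {
    // Text between the quotes, with escape sequences left untouched.
    private final String rawValue;
    // True if the literal was written with double quotes in the original SQL.
    private final boolean doubleQuoted;

    QuoteAwareStringLiteral(String rawValue, boolean doubleQuoted) {
        this.rawValue = rawValue;
        this.doubleQuoted = doubleQuoted;
    }

    // Builds a literal from the full token text, surrounding quotes included.
    static QuoteAwareStringLiteral fromToken(String tokenText) {
        int len = tokenText.length();
        if (len < 2) {
            throw new IllegalArgumentException("not a quoted literal: " + tokenText);
        }
        char quote = tokenText.charAt(0);
        if ((quote != '\'' && quote != '"') || tokenText.charAt(len - 1) != quote) {
            throw new IllegalArgumentException("not a quoted literal: " + tokenText);
        }
        return new QuoteAwareStringLiteral(tokenText.substring(1, len - 1), quote == '"');
    }

    // Regenerates SQL using the quote style the user originally wrote.
    String toSql() {
        char quote = doubleQuoted ? '"' : '\'';
        return quote + rawValue + quote;
    }

    public static void main(String[] args) {
        // A double-quoted apostrophe stays double-quoted instead of becoming '''.
        System.out.println(fromToken("\"'\"").toSql());
        // A single-quoted literal is regenerated unchanged.
        System.out.println(fromToken("'abc'").toSql());
    }
}
```

With the flag, the "'" argument from the view in section 1 would be regenerated as "'" rather than ''', so the stored view text should parse again. Whether other toSql* callers depend on always getting single quotes is the side effect I am unsure about.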