[ https://issues.apache.org/jira/browse/FLINK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-15573:
-----------------------------------
    Labels: auto-closed  (was: stale-minor)

> Let Flink SQL PlannerExpressionParserImpl#FieldReference use Unicode as its
> default charset
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15573
>                 URL: https://issues.apache.org/jira/browse/FLINK-15573
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>            Reporter: Lsw_aka_laplace
>            Priority: Minor
>              Labels: auto-closed
>         Attachments: image-2020-01-15-21-49-19-373.png
>
> UPDATE:
> Flink now uses Calcite as its SQL planner. Calcite currently supports only the
> ISO-8859-1 charset, and the charset is not configurable either. Even so, from my
> perspective we still need to change the charset of
> PlannerExpressionParserImpl#fieldReference, because JavaIdentifier cannot cover
> our field names.
> Regarding the implementation: PlannerExpressionParserImpl uses the native Scala
> parser combinators, which read and consume `scala.Char` (effectively the Java
> char type). For our purposes, working at the level of single chars is enough,
> so in this case the implementation does not need to care about the charset
> problem at all, which leads to *a simple and backwards-compatible solution*.
> The implementation is almost the same as the picture below indicates. I have
> already made this change in my company's branch and deployed it. It works well~
>
> **************************************************************************************
> Now I am talking about `PlannerExpressionParserImpl`.
> For now the fieldReference charset is JavaIdentifier. Why not change it to
> UnicodeIdentifier?
> Currently my team actually hits this problem. For instance, data from
> Elasticsearch always contains an `@timestamp` field, which JavaIdentifier
> cannot accept.
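The gap between the two identifier character classes can be checked directly against the JDK's `Character` predicates, which is what the parser change above is really about. A minimal standalone Java sketch (not Flink code; the helper `isJavaIdentifier` is hypothetical, added only for illustration):

```java
// Compare the Java-identifier and Unicode-identifier character classes
// that an identifier parser could be built on.
public class IdentifierCharsets {

    // Hypothetical helper: true if s matches the Java identifier rules
    // (JavaIdentifierStart followed by JavaIdentifierPart chars).
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // '@' is not a valid Java identifier start, so an Elasticsearch-style
        // field name like @timestamp is rejected by JavaIdentifier rules.
        System.out.println(isJavaIdentifier("@timestamp")); // false
        System.out.println(isJavaIdentifier("timestamp"));  // true

        // The two classes differ: letters such as CJK are identifier starts
        // in both, while '$' is a Java identifier start but not a Unicode one.
        System.out.println(Character.isUnicodeIdentifierStart('中')); // true
        System.out.println(Character.isJavaIdentifierStart('$'));     // true
        System.out.println(Character.isUnicodeIdentifierStart('$'));  // false
    }
}
```

Note this only illustrates the character classes; the actual acceptance of a given field name depends on how the parser in question combines the start/part predicates.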
> So what we did is simply let the fieldReference charset use Unicode:
>
> {code:scala}
> lazy val extensionIdent: Parser[String] =
>   "" ~> // handle whitespace
>     rep1(
>       acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
>       elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
>     ) ^^ (_.mkString)
>
> lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
>   (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
> {code}
>
> It is simple but really makes sense~
>
> MySQL supports Unicode identifiers; see the picture below, with a field called `@@`:
> !image-2020-01-15-21-49-19-373.png!
>
> Looking forward to any opinions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)