Max Gekk created SPARK-42873:
--------------------------------

             Summary: Define Spark SQL types as keywords
                 Key: SPARK-42873
                 URL: https://issues.apache.org/jira/browse/SPARK-42873
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Max Gekk
            Assignee: Max Gekk


Currently, Spark SQL defines primitive types as:

{code}
| identifier (LEFT_PAREN INTEGER_VALUE
  (COMMA INTEGER_VALUE)* RIGHT_PAREN)?                      #primitiveDataType
{code}
where the identifier (for example, "decimal" in DECIMAL(10, 2)) is resolved to a concrete DataType later, in visitPrimitiveDataType():

{code:scala}
  override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
    val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT)
    (dataType, ctx.INTEGER_VALUE().asScala.toList) match {
      case ("boolean", Nil) => BooleanType
      case ("tinyint" | "byte", Nil) => ByteType
      case ("smallint" | "short", Nil) => ShortType
      case ("int" | "integer", Nil) => IntegerType
      case ("bigint" | "long", Nil) => LongType
      case ("float" | "real", Nil) => FloatType
...
{code}

So, the type names are not Spark SQL keywords, which causes inconveniences while analysing/transforming the parse tree: a type occurrence cannot be recognized by its token type, but has to be re-matched as a case-insensitive string, as above. One example is forming the stable column aliases.

The Spark SQL types need to be defined as keywords in SqlBaseLexer.g4.
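
A minimal sketch of the direction, assuming the keyword-token conventions already used in SqlBaseLexer.g4 (some of these tokens may already exist for other grammar rules, and the parser rule name below is illustrative):

{code}
// In SqlBaseLexer.g4: one keyword token per type name.
BOOLEAN: 'BOOLEAN';
TINYINT: 'TINYINT';
BYTE: 'BYTE';
SMALLINT: 'SMALLINT';
SHORT: 'SHORT';
INT: 'INT';
INTEGER: 'INTEGER';
BIGINT: 'BIGINT';
LONG: 'LONG';
FLOAT: 'FLOAT';
REAL: 'REAL';
// remaining primitive types follow the same pattern

// In SqlBaseParser.g4: #primitiveDataType would reference this rule
// instead of the generic identifier.
type
    : BOOLEAN
    | TINYINT | BYTE
    | SMALLINT | SHORT
    | INT | INTEGER
    | BIGINT | LONG
    | FLOAT | REAL
    ;
{code}

With dedicated tokens, visitPrimitiveDataType() can dispatch on token types instead of lowercasing and matching identifier text. Any newly introduced keywords would presumably also have to be added to the non-reserved keyword lists (ansiNonReserved/nonReserved) so that existing identifiers keep working.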

Typed literals have the same issue: the types "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should likewise be defined as base lexer tokens.
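
A sketch of the typed-literal side, under the same assumptions (DATE, TIMESTAMP, and INTERVAL may already exist as tokens for other grammar rules; the rule name here is hypothetical):

{code}
// In SqlBaseLexer.g4 (illustrative):
TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';
X: 'X';

// In SqlBaseParser.g4: the typed-literal alternative could then match on
// these tokens instead of a generic identifier, so that e.g.
// DATE '2023-01-01' is recognizable from token types alone.
literalType
    : DATE | TIMESTAMP | TIMESTAMP_NTZ | TIMESTAMP_LTZ | INTERVAL | X
    ;
{code}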