Tejas Patil created SPARK-17741:
-----------------------------------

             Summary: Grammar to parse top level and nested data fields separately
                 Key: SPARK-17741
                 URL: https://issues.apache.org/jira/browse/SPARK-17741
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Tejas Patil
            Priority: Trivial


Based on discussion over the dev list:

{noformat}
Is there any reason why Spark SQL supports "<column name>" ":" "<data type>" 
while specifying columns?
E.g., sql("CREATE TABLE t1 (column1:INT)") works fine.
Here is the relevant snippet in the grammar [0]:

```
colType
    : identifier ':'? dataType (COMMENT STRING)?
    ;
```

I do not see MySQL [1], Hive [2], Presto [3], or PostgreSQL [4] supporting ":" 
while specifying columns.
They all use a space as the delimiter.

[0] : https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L596
[1] : http://dev.mysql.com/doc/refman/5.7/en/create-table.html
[2] : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
[3] : https://prestodb.io/docs/current/sql/create-table.html
[4] : https://www.postgresql.org/docs/9.1/static/sql-createtable.html
{noformat}

Herman's response:

{noformat}
This is because we use the same rule to parse top-level and nested data fields. 
For example:

create table tbl_x(
  id bigint,
  nested struct<col1:string,col2:string>
)

This shows both syntaxes: a space for the top-level columns and ":" inside the 
struct. We should split this rule into a top-level rule and a nested rule.
{noformat}
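
A rough sketch of what such a split could look like in the grammar is below. This is only an illustration, not a proposed patch: the rule name nestedColType is hypothetical, making the ":" mandatory for nested fields is an assumption, and the rule that parses struct<...> (not quoted in this issue) would also need to be changed to reference the nested rule.

```
// Top-level columns, e.g. CREATE TABLE t1 (column1 INT).
// The optional ':' is dropped, so only a space separates name and type.
colType
    : identifier dataType (COMMENT STRING)?
    ;

// Hypothetical rule for fields inside a struct, e.g. struct<col1:string>.
// Assumption: the ':' becomes mandatory here.
nestedColType
    : identifier ':' dataType (COMMENT STRING)?
    ;
```

With a split along these lines, "CREATE TABLE t1 (column1:INT)" would presumably stop parsing, while the tbl_x example above would keep working.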



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
