[ 
https://issues.apache.org/jira/browse/SPARK-25722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25722:
----------------------------------
    Description: 
Among built-in data sources, `avro` and `orc` don't allow a backtick in 
column names. We should be consistent across data sources if possible.
 * Option 1: Support a backtick character
 * Option 2: Disallow a backtick character (this may be considered a 
regression for TEXT/CSV/JSON/Parquet)

 So, Option 1 is better.

*TEXT*, *CSV*, *JSON*, *PARQUET*
{code:java}
Seq("text", "csv", "json", "parquet").foreach { format =>
  Seq("1").toDF("`").write.mode("overwrite").format(format).save("/tmp/t")
}{code}
*AVRO*
{code:java}
scala> Seq("1").toDF("`").write.mode("overwrite").format("avro").save("/tmp/t")
org.apache.avro.SchemaParseException: Illegal initial character: `{code}
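The Avro failure above is consistent with the naming rule in the Avro specification: a name must start with [A-Za-z_] and continue with [A-Za-z0-9_] only, so a backtick is rejected as an initial character. The following is a minimal sketch of that rule, not the check that Avro or Spark actually runs; `AvroNameCheck` and `isValidAvroName` are hypothetical names introduced here for illustration.

```scala
// Sketch only: AvroNameCheck and isValidAvroName are hypothetical helpers,
// not part of the Spark or Avro APIs.
object AvroNameCheck {
  // Naming rule from the Avro specification:
  // first character in [A-Za-z_], remaining characters in [A-Za-z0-9_].
  private val AvroName = "[A-Za-z_][A-Za-z0-9_]*".r

  // True only when the whole string satisfies the naming rule.
  def isValidAvroName(name: String): Boolean =
    AvroName.pattern.matcher(name).matches()

  def main(args: Array[String]): Unit = {
    // Try a backtick, an ordinary column name, and a name starting with a digit.
    Seq("`", "col_1", "1col").foreach { name =>
      println(s"'$name' valid Avro name: ${isValidAvroName(name)}")
    }
  }
}
```

Under this rule, supporting Option 1 for Avro would presumably need some mapping of such column names to valid Avro names, which is an assumption here rather than a proposed design.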
*ORC (native)*
{code:java}
scala> Seq("1").toDF("`").write.mode("overwrite").format("orc").save("/tmp/t")
java.lang.IllegalArgumentException: Unmatched quote at 
'struct<^```:string>'{code}
*ORC (hive)*
{code:java}
scala> Seq("1").toDF("`").write.mode("overwrite").format("orc").save("/tmp/t")
java.lang.IllegalArgumentException: Error: name expected at the position 7 of 
'struct<`:string>' but '`' is found.{code}
 


> Support a backtick character in column names
> --------------------------------------------
>
>                 Key: SPARK-25722
>                 URL: https://issues.apache.org/jira/browse/SPARK-25722
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>


