[jira] [Commented] (SPARK-30249) Invalid Column Names in parquet tables should not be allowed

2019-12-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997774#comment-16997774
 ] 

Hyukjin Kwon commented on SPARK-30249:
--

It seems to be valid in Parquet:

{code}
scala> Seq(1).toDF("a:b").write.parquet("/tmp/foo")

scala> spark.read.parquet("/tmp/foo").show()
+---+
|a:b|
+---+
|  1|
+---+
{code}
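
To compare both formats across all the column names quoted in the description, a sketch along the following lines could be used. It is only an illustration under some assumptions: the object name `ColumnNameCheck`, the helper `tryWrite`, the local master, and the temporary directory are hypothetical choices, and the outcome for each name is deliberately not stated here because it may differ between Spark versions and between the DataFrame writer and table DDL paths.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

object ColumnNameCheck {
  // Column names quoted in the issue description.
  val names: Seq[String] = Seq("a:b", "??", ",,", "^^", "++")
  // File formats compared in the issue.
  val formats: Seq[String] = Seq("parquet", "orc")

  // Hypothetical helper: try to write a one-row DataFrame whose single
  // column has the given name, and capture success or failure.
  def tryWrite(spark: SparkSession, format: String, name: String, dir: String): Try[Unit] = {
    import spark.implicits._
    Try(Seq(1).toDF(name).write.mode("overwrite").format(format).save(dir))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-name-check")
      .getOrCreate()
    val base = java.nio.file.Files.createTempDirectory("colnames").toString

    for (format <- formats; (name, i) <- names.zipWithIndex) {
      tryWrite(spark, format, name, s"$base/$format-$i") match {
        case Success(_) => println(s"$format accepted '$name'")
        case Failure(e) => println(s"$format rejected '$name': ${e.getClass.getSimpleName}")
      }
    }
    spark.stop()
  }
}
```

Each name is written to its own directory so that a failure for one name does not affect the others.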

> Invalid Column Names in parquet tables should not be allowed
> 
>
> Key: SPARK-30249
> URL: https://issues.apache.org/jira/browse/SPARK-30249
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Rakesh Raushan
> Priority: Minor
>
> Column names such as `a:b`, `??`, `,,`, `^^`, `++`, etc. are allowed when
> creating Parquet tables.
> When creating tables with `orc`, however, all such column names are marked
> as invalid and an analysis exception is thrown.
> These column names should not be allowed for Parquet tables either,
> since allowing them makes column naming inconsistent between Parquet and ORC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30249) Invalid Column Names in parquet tables should not be allowed

2019-12-13 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996044#comment-16996044
 ] 

Dongjoon Hyun commented on SPARK-30249:
---

I believe it's prevented because the ORC format doesn't support such names.

When you use those column names in a Parquet file, does the Parquet table work incorrectly?

I didn't test it, but they might be valid in the Parquet file format.
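
One way to check whether such a table misbehaves is a round trip through table DDL: create the table, insert a row, and read it back. The following is a minimal sketch, not a tested reproduction. The object name `ParquetTableRoundTrip`, the helper `createTableSql`, and the table name `colname_check` are hypothetical, and the result is printed rather than asserted because it has not been verified here.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object ParquetTableRoundTrip {
  // Hypothetical helper: build the DDL for a one-column table.
  // Backticks quote the unusual column name.
  def createTableSql(table: String, column: String, format: String): String =
    s"CREATE TABLE $table (`$column` INT) USING $format"

  def main(args: Array[String]): Unit = {
    // Assumption: a local session with the default in-memory catalog.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("parquet-table-round-trip")
      .getOrCreate()

    // Create, populate, and read back the table; any exception is captured.
    val outcome = Try {
      spark.sql("DROP TABLE IF EXISTS colname_check")
      spark.sql(createTableSql("colname_check", "a:b", "parquet"))
      spark.sql("INSERT INTO colname_check VALUES (1)")
      spark.sql("SELECT `a:b` FROM colname_check").collect().toSeq
    }
    println(outcome)

    spark.sql("DROP TABLE IF EXISTS colname_check")
    spark.stop()
  }
}
```

Passing `"orc"` instead of `"parquet"` to `createTableSql` would exercise the ORC path described in the issue.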



