[jira] [Commented] (SPARK-30249) Invalid Column Names in parquet tables should not be allowed
[ https://issues.apache.org/jira/browse/SPARK-30249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997774#comment-16997774 ]

Hyukjin Kwon commented on SPARK-30249:
--------------------------------------

It seems to be valid in Parquet:

{code}
scala> Seq(1).toDF("a:b").write.parquet("/tmp/foo")

scala> spark.read.parquet("/tmp/foo").show()
+---+
|a:b|
+---+
|  1|
+---+
{code}

> Invalid Column Names in parquet tables should not be allowed
> ------------------------------------------------------------
>
>                 Key: SPARK-30249
>                 URL: https://issues.apache.org/jira/browse/SPARK-30249
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Rakesh Raushan
>            Priority: Minor
>
> Column names such as `a:b`, `??`, `,,`, `^^`, `++`, etc. are allowed when
> creating Parquet tables, whereas creating tables with `orc` marks all such
> column names as invalid and throws an analysis exception.
> These column names should not be allowed for Parquet tables either.
> The current behavior also introduces an inconsistency between column names
> for Parquet and ORC.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30249) Invalid Column Names in parquet tables should not be allowed
[ https://issues.apache.org/jira/browse/SPARK-30249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996044#comment-16996044 ]

Dongjoon Hyun commented on SPARK-30249:
---------------------------------------

I believe it's prevented because the ORC format doesn't support those names. When you use those column names in a Parquet file, does the Parquet table work incorrectly? I haven't tested it, but they might be valid in the Parquet file format.
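
One way to check the behavior discussed in this thread is a small standalone program that tries to create a table with the column name `a:b` in each format and prints what happens. This is only a sketch under stated assumptions: it assumes spark-sql is on the classpath and a local session is acceptable, the object name and table names are hypothetical, and it makes no claim about which outcome a given Spark version produces. The issue reports that the Parquet table is created and the ORC table is rejected with an analysis exception.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

// Hypothetical helper object, not part of Spark.
object ColumnNameCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is sufficient for this check.
    val spark = SparkSession.builder()
      .appName("SPARK-30249-check")
      .master("local[*]")
      .getOrCreate()

    // `a:b` is the column name taken from the issue description.
    for (format <- Seq("parquet", "orc")) {
      // Hypothetical table name, one per format.
      val table = s"spark_30249_$format"
      spark.sql(s"DROP TABLE IF EXISTS $table")

      // Capture the outcome instead of letting an exception end the run.
      Try(spark.sql(s"CREATE TABLE $table (`a:b` INT) USING $format")) match {
        case Success(_) =>
          println(s"$format: table created")
        case Failure(e) =>
          println(s"$format: rejected (${e.getClass.getSimpleName}: ${e.getMessage})")
      }

      // Clean up so the check can be re-run.
      spark.sql(s"DROP TABLE IF EXISTS $table")
    }

    spark.stop()
  }
}
```

Wrapping each statement in `Try` keeps a rejection in one format from hiding the result for the other, which makes the Parquet and ORC outcomes easy to compare side by side.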