[ https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815706#comment-16815706 ]
Hyukjin Kwon commented on SPARK-27442:
--------------------------------------

Why does Spark have to be able to read it? IIRC, this restriction was inherited from Parquet.

> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
>                 Key: SPARK-27442
>                 URL: https://issues.apache.org/jira/browse/SPARK-27442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.0, 2.4.1
>            Reporter: Jan Vršovský
>            Priority: Major
>
> When reading a parquet file whose column names contain characters considered
> invalid, the reader fails with an exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among
> " ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read
> them (and allow the user to correct them). However, the possible workarounds
> (such as using an alias to rename the column, or forcing another schema) do
> not work, since the check is done on the input.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} from
> {{buildReaderWithPartitionValues}}?)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
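[Editor's note: for readers reproducing the issue, the character check named in the exception message can be sketched in Python. This is only an illustration of the character set quoted in the error text (" ,;{}()\n\t="), not Spark's actual Scala implementation; the function name `check_field_name` is hypothetical.]

```python
import re

# Characters the error message reports as invalid in an attribute name:
# space, comma, semicolon, braces, parentheses, newline, tab, equals.
_INVALID_CHARS = re.compile(r'[ ,;{}()\n\t=]')

def check_field_name(name: str) -> None:
    """Raise if `name` would be rejected by the check described in the
    AnalysisException (a hypothetical mirror of Spark's behavior)."""
    if _INVALID_CHARS.search(name):
        raise ValueError(
            f'Attribute name "{name}" contains invalid character(s) '
            'among " ,;{}()\\n\\t=". Please use alias to rename it.')

check_field_name("valid_column")   # accepted
# check_field_name("bad name")     # would raise ValueError
```

Because the check runs on the schema requested from the reader, renaming via alias happens too late to avoid it, which matches the report above.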