[ https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477089#comment-17477089 ]
angerszhu commented on SPARK-27442:
-----------------------------------

Yea, will try to fix this.

> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
>                 Key: SPARK-27442
>                 URL: https://issues.apache.org/jira/browse/SPARK-27442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.0, 2.4.1
>            Reporter: Jan Vršovský
>            Priority: Minor
>
> When reading a Parquet file whose column names contain characters
> considered invalid, the reader fails with the exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read
> them (and allow the user to correct them). However, possible workarounds
> (such as using alias to rename the column, or forcing another schema) do not
> work, since the check is done on the input.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from
> {{buildReaderWithPartitionValues}}?)

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
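
For readers unfamiliar with the check the report refers to, here is a minimal Python sketch of that kind of name validation. It is not Spark's code: the character set is copied from the exception message quoted above, and the names `INVALID_CHARS`, `invalid_chars_in`, and `sanitize` are hypothetical, introduced only for illustration.

```python
# Characters listed in the AnalysisException message quoted in the report.
# Assumption: this set mirrors the message text, not Spark's source.
INVALID_CHARS = set(" ,;{}()\n\t=")


def invalid_chars_in(name):
    """Return the invalid characters in a column name, in order of first appearance."""
    seen = []
    for ch in name:
        if ch in INVALID_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def sanitize(name, replacement="_"):
    """Hypothetical helper: replace every invalid character with `replacement`."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


if __name__ == "__main__":
    for col in ["price", "unit price", "total(usd)"]:
        print(repr(col), invalid_chars_in(col), repr(sanitize(col)))
```

The report's point is that a rename like `sanitize` only helps if it can run before the validation does; when the check is applied to the file's own schema at read time, the user never gets the chance.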