Jan Vršovský created SPARK-27442:
------------------------------------

             Summary: ParquetFileFormat fails to read column named with invalid characters
                 Key: SPARK-27442
                 URL: https://issues.apache.org/jira/browse/SPARK-27442
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.4.1, 2.0.0
            Reporter: Jan Vršovský


When reading a Parquet file whose column names contain characters considered 
invalid, the reader fails with an exception:

Name: org.apache.spark.sql.AnalysisException
Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
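
To make the failing condition concrete, here is a minimal standalone sketch of the kind of name check that the message above describes. It is not Spark's actual code: the helper names has_invalid_chars and sanitize_name are hypothetical, and the only thing taken from the report is the character set quoted in the error message (space, comma, semicolon, braces, parentheses, newline, tab, equals sign).

```python
import re

# Character set copied from the error message:
# space , ; { } ( ) newline tab =
INVALID_CHARS = " ,;{}()\n\t="

# Character class matching any single invalid character.
_INVALID_RE = re.compile("[" + re.escape(INVALID_CHARS) + "]")


def has_invalid_chars(name: str) -> bool:
    """Return True if the column name contains any listed character.

    Hypothetical helper, for illustration only.
    """
    return _INVALID_RE.search(name) is not None


def sanitize_name(name: str, replacement: str = "_") -> str:
    """Replace each listed character with `replacement`.

    Hypothetical helper showing the kind of correction a user
    might want to apply once the file can be read.
    """
    return _INVALID_RE.sub(replacement, name)


if __name__ == "__main__":
    for col in ["price", "unit price", "a=b", "x;y"]:
        print(repr(col), has_invalid_chars(col), repr(sanitize_name(col)))
```

A column such as "unit price" would trip a check of this kind, which is why a file written by another tool with such names hits the exception on read.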

Spark should not be able to write such files, but it should be able to read 
them (and allow the user to correct the names). However, the possible 
workarounds (such as using an alias to rename the column, or forcing another 
schema) do not work, since the check is done on the input.

(Possible fix: remove the superfluous 
{{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from 
{{buildReaderWithPartitionValues}}?)



