[jira] [Commented] (SPARK-27442) ParquetFileFormat fails to read column named with invalid characters

angerszhu (Jira) Mon, 17 Jan 2022 02:09:06 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477089#comment-17477089
 ]


angerszhu commented on SPARK-27442:
-----------------------------------

Yeah, I will try to fix this.

> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
>                 Key: SPARK-27442
>                 URL: https://issues.apache.org/jira/browse/SPARK-27442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.0, 2.4.1
>            Reporter: Jan Vršovský
>            Priority: Minor
>
> When reading a Parquet file whose column names contain characters that Spark
> considers invalid, the reader fails with the exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read
> them (and allow the user to correct the names). However, the possible
> workarounds (such as using an alias to rename the column, or forcing another
> schema) do not work, since the check is done on the input.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from
> {{buildReaderWithPartitionValues}}?)
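
For illustration only, here is a minimal Python sketch of the kind of name check the exception message quoted above describes. It is not Spark's code: the character set is copied from the message, and the names INVALID_CHARS, find_invalid_chars and sanitize are hypothetical. The sanitize helper shows the sort of correction a user would want to apply once the file can be read; as the report notes, the read currently fails before any rename is possible.

```python
# Characters listed in the AnalysisException message quoted in the report.
INVALID_CHARS = set(" ,;{}()\n\t=")


def find_invalid_chars(name):
    """Return the invalid characters found in `name`, in order of first appearance."""
    found = []
    for ch in name:
        if ch in INVALID_CHARS and ch not in found:
            found.append(ch)
    return found


def sanitize(name, replacement="_"):
    """Hypothetical helper: replace every invalid character with `replacement`."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


if __name__ == "__main__":
    column = "total (usd)"
    print(find_invalid_chars(column))  # the space and both parentheses
    print(sanitize(column))
```

A check like this rejects a name as soon as any listed character appears, which is why a column such as "total (usd)" cannot pass unless it is renamed first.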



