[ 
https://issues.apache.org/jira/browse/SPARK-36269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385930#comment-17385930
 ] 

Apache Spark commented on SPARK-36269:
--------------------------------------

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/33489

> Fix only set data columns to Hive column names config
> -----------------------------------------------------
>
>                 Key: SPARK-36269
>                 URL: https://issues.apache.org/jira/browse/SPARK-36269
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Cheng Su
>            Priority: Minor
>
> When reading a Hive table, Spark sets the Hive column id and column name 
> configs (`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`). 
> Both configs should list only non-partition columns (data columns), because 
> Spark always appends partition columns in its own reader: 
> [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240]
> Currently the column id config has only non-partition columns, but the 
> column name config has both partition and non-partition columns. The two 
> configs should be kept consistent, each with only non-partition columns. This 
> does not cause an issue for the public OSS Hive file formats, but it does for 
> a customized internal Hive file format that expects the two configs to match.
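
The intended behavior can be illustrated with a small sketch. This is not the Spark implementation (that lives in the Scala TableReader linked above); the function name `read_column_configs`, the example schema, and the comma-separated value format are assumptions made for illustration. Only the two config keys come from the issue text.

```python
# Config keys named in the issue description.
READ_COLUMN_IDS = "hive.io.file.readcolumn.ids"
READ_COLUMN_NAMES = "hive.io.file.readcolumn.names"


def read_column_configs(data_columns, partition_columns, requested):
    """Hypothetical helper: build both configs from data columns only.

    data_columns      -- ordered non-partition column names of the table
    partition_columns -- partition column names (never written to either config)
    requested         -- set of column names the query needs
    """
    partition_set = set(partition_columns)
    # Pair each requested data column with its position in the data schema,
    # skipping partition columns so ids and names stay aligned.
    selected = [
        (index, name)
        for index, name in enumerate(data_columns)
        if name in requested and name not in partition_set
    ]
    # Assumption: both values are comma-separated lists.
    return {
        READ_COLUMN_IDS: ",".join(str(index) for index, _ in selected),
        READ_COLUMN_NAMES: ",".join(name for _, name in selected),
    }


# Hypothetical table: three data columns, partitioned by "ds".
conf = read_column_configs(
    data_columns=["id", "name", "value"],
    partition_columns=["ds"],
    requested={"id", "value", "ds"},
)
```

Here the partition column `ds` is requested by the query but appears in neither config, so the id list and the name list describe the same two data columns.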



--
This message was sent by Atlassian Jira
(v8.3.4#803005)