[ https://issues.apache.org/jira/browse/SPARK-36269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-36269.
---------------------------------
Fix Version/s: 3.1.3, 3.2.0, 3.0.4
Resolution: Fixed
Issue resolved by pull request 33489
[https://github.com/apache/spark/pull/33489]
> Fix only set data columns to Hive column names config
> -----------------------------------------------------
>
> Key: SPARK-36269
> URL: https://issues.apache.org/jira/browse/SPARK-36269
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Cheng Su
> Assignee: Cheng Su
> Priority: Minor
> Fix For: 3.0.4, 3.2.0, 3.1.3
>
>
> When reading a Hive table, Spark sets the Hive column id and column name
> configs (`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`).
> Both configs should contain only non-partition columns (data columns),
> because Spark always appends partition columns in its own reader; see
> [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240].
> The column id config has only non-partition columns, but the column name
> config has both partition and non-partition columns. The two should be kept
> consistent, listing only non-partition columns. This does not cause a problem
> for the public OSS Hive file formats, but it does for a customized internal
> Hive file format that expects the two configs to match.
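
The intended behaviour described above can be illustrated with a minimal sketch. This is not the code from the pull request: the function name `hive_read_column_conf`, its parameters, and the plain dict standing in for the Hadoop configuration are all hypothetical, and the comma-separated value format is an assumption. Only the two config keys come from the issue text.

```python
def hive_read_column_conf(requested, data_columns, partition_columns):
    """Build both read-column configs from data (non-partition) columns only.

    Hypothetical helper for illustration; a dict stands in for the Hadoop conf.
    """
    partition_set = set(partition_columns)
    # Position of each data column within the table's data schema.
    data_index = {name: i for i, name in enumerate(data_columns)}

    # Drop partition columns: the reader appends those itself.
    needed = [name for name in requested if name not in partition_set]
    ids = [data_index[name] for name in needed]

    # Both values are derived from the same filtered list, so they stay aligned.
    return {
        "hive.io.file.readcolumn.ids": ",".join(str(i) for i in ids),
        "hive.io.file.readcolumn.names": ",".join(needed),
    }


# Example: table with data columns (a, b, c) partitioned by ds.
conf = hive_read_column_conf(
    requested=["b", "ds", "a"],
    data_columns=["a", "b", "c"],
    partition_columns=["ds"],
)
print(conf)
```

Deriving both values from one filtered list is what keeps the id and name configs the same length and in the same order, which is the consistency a custom file format might rely on.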