[jira] [Created] (SPARK-36269) Fix only set data columns to Hive column names config

Cheng Su (Jira) Thu, 22 Jul 2021 21:13:06 -0700

Cheng Su created SPARK-36269:
--------------------------------

             Summary: Fix only set data columns to Hive column names config
                 Key: SPARK-36269
                 URL: https://issues.apache.org/jira/browse/SPARK-36269
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Cheng Su



When reading a Hive table, we set the Hive column ID and column name configs
(`hive.io.file.readcolumn.ids` and `hive.io.file.readcolumn.names`). Both configs
should contain only the non-partition columns (data columns), because Spark
always appends the partition columns in its own reader:
[https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L240]
Currently the column ID config contains only non-partition columns, but the
column name config contains both partition and non-partition columns. The two
configs should be consistent, with both containing only non-partition columns.
This does not cause a problem for the public OSS Hive file formats, but it does
for our customized internal Hive file format, which expects the two configs to
match.
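To make the mismatch concrete, here is a minimal Python sketch of how the two settings could be derived from the columns a scan requests. It is not Spark's actual code (which is Scala). The helpers `read_column_conf` and `configs_consistent`, and the plain dict standing in for a Hadoop `Configuration`, are hypothetical; only the two config keys come from the issue. The comma-separated encoding of both values is an assumption.

```python
# Config keys named in the issue; the dict below is a hypothetical stand-in
# for a Hadoop Configuration object.
READ_COLUMN_IDS = "hive.io.file.readcolumn.ids"
READ_COLUMN_NAMES = "hive.io.file.readcolumn.names"


def read_column_conf(data_cols, requested, data_only_names=True):
    """Build both read-column settings for a scan (hypothetical helper).

    data_cols: the table's non-partition column names, in file order.
    requested: columns the query needs, possibly including partition columns.
    data_only_names=False mimics the inconsistency described in SPARK-36269.
    """
    ordinals = {name: i for i, name in enumerate(data_cols)}
    # Partition columns have no ordinal among the data columns, so they are
    # naturally dropped from the ID list.
    ids = [ordinals[c] for c in requested if c in ordinals]
    if data_only_names:
        # Fixed behaviour: keep only data columns so names line up with ids.
        names = [c for c in requested if c in ordinals]
    else:
        # Inconsistent behaviour: partition columns leak into the names.
        names = list(requested)
    return {
        READ_COLUMN_IDS: ",".join(str(i) for i in ids),
        READ_COLUMN_NAMES: ",".join(names),
    }


def configs_consistent(conf):
    """Check a custom reader might perform: one name per column ID."""
    ids = [s for s in conf[READ_COLUMN_IDS].split(",") if s]
    names = [s for s in conf[READ_COLUMN_NAMES].split(",") if s]
    return len(ids) == len(names)
```

For a table with data columns `id` and `value` partitioned by `dt`, a query reading `value` and `dt` yields IDs `1` in both modes, but names `value` only when partition columns are filtered out; with `data_only_names=False` the names become `value,dt` and the consistency check fails.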



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
