Chenxiao Mao created SPARK-25391:
------------------------------------

             Summary: Make behaviors consistent when converting parquet hive 
table to parquet data source
                 Key: SPARK-25391
                 URL: https://issues.apache.org/jira/browse/SPARK-25391
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Chenxiao Mao


Parquet data source tables and Hive Parquet tables resolve Parquet fields 
differently. So, when {{spark.sql.hive.convertMetastoreParquet}} is true, users 
might see inconsistent behavior. The differences are:
 * Whether {{spark.sql.caseSensitive}} is respected. Without SPARK-25132, 
neither data source tables nor Hive tables respect {{spark.sql.caseSensitive}}: 
data source tables always do case-sensitive Parquet field resolution, while 
Hive tables always do case-insensitive Parquet field resolution, regardless of 
whether {{spark.sql.caseSensitive}} is true or false. SPARK-25132 makes data 
source tables respect {{spark.sql.caseSensitive}}, while the Hive serde table 
behavior is unchanged.
 * How ambiguity is resolved in case-insensitive mode. Without SPARK-25132, 
data source tables do case-sensitive resolution and return the columns whose 
letter case matches exactly, while Hive tables always return the first column 
that matches ignoring case. SPARK-25132 makes data source tables throw an 
exception when there is ambiguity, while the Hive table behavior is unchanged.
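
The three resolution behaviors described above can be modeled with a small, 
self-contained Python sketch. This is only an illustration of the semantics, 
not Spark's actual implementation; the function names and the sample field 
list are hypothetical.

```python
from typing import List, Optional


def resolve_case_sensitive(parquet_fields: List[str], requested: str) -> Optional[str]:
    """Exact-case match only (data source tables without SPARK-25132)."""
    return requested if requested in parquet_fields else None


def resolve_first_match(parquet_fields: List[str], requested: str) -> Optional[str]:
    """Case-insensitive match, first hit wins (Hive table behavior)."""
    wanted = requested.lower()
    for field in parquet_fields:
        if field.lower() == wanted:
            return field
    return None


def resolve_unambiguous(parquet_fields: List[str], requested: str) -> Optional[str]:
    """Case-insensitive match that rejects duplicates
    (data source tables in case-insensitive mode with SPARK-25132)."""
    wanted = requested.lower()
    matches = [field for field in parquet_fields if field.lower() == wanted]
    if len(matches) > 1:
        raise ValueError(f"Ambiguous field {requested!r}: matches {matches}")
    return matches[0] if matches else None


# Hypothetical Parquet file schema with two fields that differ only in case.
fields = ["id", "ID", "name"]

exact = resolve_case_sensitive(fields, "ID")   # picks the exact-case field "ID"
first = resolve_first_match(fields, "ID")      # picks "id", the first match
```

With the same file, `resolve_unambiguous(fields, "ID")` raises an error instead 
of silently picking one of the two candidates, which is the source of the 
inconsistency this ticket describes.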

This ticket aims to make the behavior consistent when converting a Hive table 
to a data source table.
 * The conversion is only valid if the behavior stays the same, so we skip the 
conversion in case-sensitive mode, because Hive Parquet tables always do 
case-insensitive field resolution.
 * In case-insensitive mode, when converting a Hive Parquet table to a Parquet 
data source, we switch the duplicate-field resolution mode so that the Parquet 
data source picks the first matched field, the same behavior as a Hive Parquet 
table.
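
The proposed rule can be summarized as a small decision function. The sketch 
below is a plain-Python model under stated assumptions, not Spark code: 
`ConversionPlan`, `plan_conversion`, and the `"first_match"` mode label are 
hypothetical names; only the two configuration keys mentioned in the comments 
come from this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConversionPlan:
    convert: bool                         # read the Hive table via the Parquet data source?
    duplicate_field_mode: Optional[str]   # hypothetical label for duplicate-field handling


def plan_conversion(convert_metastore_parquet: bool, case_sensitive: bool) -> ConversionPlan:
    """Model of the proposal.

    convert_metastore_parquet stands for spark.sql.hive.convertMetastoreParquet,
    case_sensitive stands for spark.sql.caseSensitive.
    """
    if not convert_metastore_parquet:
        # Conversion is disabled, so the Hive serde path is used as before.
        return ConversionPlan(convert=False, duplicate_field_mode=None)
    if case_sensitive:
        # Hive Parquet tables are always case-insensitive, so converting here
        # would change results; the proposal is to skip the conversion.
        return ConversionPlan(convert=False, duplicate_field_mode=None)
    # Case-insensitive mode: convert, but pick the first matched field
    # on duplicates, as a Hive Parquet table would.
    return ConversionPlan(convert=True, duplicate_field_mode="first_match")
```

Under this model the converted path is taken only when it cannot be 
distinguished from the Hive serde path by field resolution results.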



