[jira] [Assigned] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

Ethan Guo (Jira) Fri, 02 Sep 2022 16:20:09 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ethan Guo reassigned HUDI-992:
------------------------------

    Assignee: Ethan Guo  (was: Udit Mehrotra)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-992
>                 URL: https://issues.apache.org/jira/browse/HUDI-992
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap, meta-sync
>    Affects Versions: 0.9.0
>            Reporter: Udit Mehrotra
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> Currently, the bootstrap implementation cannot handle partition columns 
> correctly when the source data uses *hive-style partitioning*, as also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred during bootstrap and stored in the commit metadata 
> does not include the partition column schema (in the case of 
> hive-partitioned data). As a result, when Hudi tries to determine the type 
> of a partition column from that schema during hive-sync, it does not find 
> the column and falls back to the default data type *string*.
> The partition column schema for hive-sync is determined here:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus, regardless of the partition column's data type in the source data 
> (at least the type Spark infers from the path), it is always synced as 
> string.
>  
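The failure mode described above can be modeled in a few lines. This is a hypothetical, simplified sketch, not Hudi's `HiveSchemaUtil` code: the function names, the dictionary-based schema, and the "all digits means int" inference rule are illustrative assumptions (Spark's real partition-type inference is richer). It shows how a type inferred from a hive-style path such as `year=2020/month=05` is lost when the stored schema omits the partition columns and the lookup falls back to string.

```python
from typing import Dict, List

# Hypothetical model of the HUDI-992 behavior; NOT Hudi's actual implementation.
DEFAULT_PARTITION_TYPE = "string"


def infer_partition_types(partition_path: str) -> Dict[str, str]:
    """Infer a type per column from a hive-style path like 'year=2020/month=05'.

    Assumed, simplified rule: values made only of digits become 'int',
    everything else becomes 'string'.
    """
    types: Dict[str, str] = {}
    for segment in partition_path.strip("/").split("/"):
        name, _, value = segment.partition("=")
        types[name] = "int" if value.isdigit() else "string"
    return types


def hive_partition_types(table_schema: Dict[str, str],
                         partition_fields: List[str]) -> Dict[str, str]:
    """Look up each partition field in the stored schema, falling back to
    string when the field is missing (the fallback the issue describes)."""
    return {f: table_schema.get(f, DEFAULT_PARTITION_TYPE) for f in partition_fields}


if __name__ == "__main__":
    source_types = infer_partition_types("year=2020/month=05")
    # Hypothetical bootstrap schema that lacks the partition columns.
    bootstrap_schema = {"id": "bigint", "name": "string"}
    synced = hive_partition_types(bootstrap_schema, ["year", "month"])
    print(source_types)  # {'year': 'int', 'month': 'int'}
    print(synced)        # {'year': 'string', 'month': 'string'}
```

If the stored schema did include the partition columns (for example `{"year": "int", "month": "int"}`), the same lookup would return the inferred types instead of the string fallback.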



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
