[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo reassigned HUDI-992:
------------------------------

    Assignee: Ethan Guo  (was: Udit Mehrotra)

> For hive-style partitioned source data, partition columns synced with Hive
> will always have String type
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-992
>                 URL: https://issues.apache.org/jira/browse/HUDI-992
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap, meta-sync
>    Affects Versions: 0.9.0
>            Reporter: Udit Mehrotra
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> Currently the bootstrap implementation cannot handle partition columns
> correctly when the source data has *hive-style partitioning*, as is also
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit
> metadata does not include the partition column schema (in the case of hive
> partitioned data). As a result, during hive-sync, when Hudi tries to
> determine the type of a partition column from that schema, it does not find
> it and assumes the default data type *string*.
> Here is where the partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>
> Thus, no matter what the data type of the partition column is in the source
> data (at least what Spark infers it as from the path), it will always be
> synced as string.
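
For illustration only, the fallback described in the issue can be sketched as a lookup with a default. This is not Hudi's code: the function name, the dictionaries, and the example column `year` are all hypothetical, and the only behaviour taken from the report is that a partition column missing from the stored schema ends up typed as string.

```python
# Illustrative sketch only: these names are hypothetical and are not Hudi APIs.
DEFAULT_PARTITION_TYPE = "string"


def partition_field_types(schema_types, partition_fields):
    """Map each partition field to its type in the schema, or to the default when absent."""
    return {
        field: schema_types.get(field, DEFAULT_PARTITION_TYPE)
        for field in partition_fields
    }


# Schema as recorded at bootstrap: data columns only, because with hive-style
# partitioning the partition value lives in the path (e.g. .../year=2020/...).
bootstrap_schema = {"id": "bigint", "name": "string"}

# A schema that also carries the type inferred for the partition column.
full_schema = {"id": "bigint", "name": "string", "year": "int"}

synced_from_bootstrap = partition_field_types(bootstrap_schema, ["year"])
synced_from_full = partition_field_types(full_schema, ["year"])
```

With the bootstrap schema the hypothetical `year` column resolves to string, while a schema that includes the partition column would preserve its inferred type.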