[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Wason updated HUDI-992:
Fix Version/s: 0.14.1 (was: 0.14.0)

> For hive-style partitioned source data, partition columns synced with Hive will always have String type
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap, meta-sync
> Affects Versions: 0.9.0
> Reporter: Udit Mehrotra
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.14.1
>
> Currently, the bootstrap implementation does not handle partition columns correctly when the source data uses *hive-style partitioning*, as also mentioned in https://jira.apache.org/jira/browse/HUDI-915.
> The schema inferred during bootstrap and stored in the commit metadata does not include the partition column schema (in the case of hive-partitioned data). As a result, during Hive sync, when Hudi tries to determine the type of a partition column from that schema, it does not find the column and falls back to the default data type *string*.
> Here is where the partition column schema is determined for Hive sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>
> Thus, no matter what the data type of the partition column is in the source data (at least as Spark infers it from the path), it is always synced as string.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
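The fallback described in the issue can be illustrated with a small Java model. This is a hypothetical sketch, not Hudi's actual HiveSchemaUtil logic: the class PartitionTypeSketch, its methods, and the simplified type inference (only int, bigint and string) are assumptions made for illustration. typesFromSchema mimics the reported behavior, where a partition field missing from the bootstrap schema becomes string, while typesFromHivePath shows one possible way a type could be recovered from hive-style key=value path segments.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the Hive-sync behavior described in HUDI-992.
// It is not Hudi's HiveSchemaUtil code; names and types here are illustrative only.
final class PartitionTypeSketch {

    // Reported behavior: each partition field is looked up in the table schema,
    // and "string" is used whenever the field is missing from that schema.
    static Map<String, String> typesFromSchema(Map<String, String> tableSchema,
                                               List<String> partitionFields) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String field : partitionFields) {
            types.put(field, tableSchema.getOrDefault(field, "string"));
        }
        return types;
    }

    // Illustrative alternative: derive types from a hive-style path such as
    // "year=2020/month=5". Only int, bigint and string are distinguished here;
    // Spark's own partition inference handles more types than this.
    static Map<String, String> typesFromHivePath(String partitionPath) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                continue; // skip segments that are not "key=value"
            }
            types.put(segment.substring(0, eq), inferType(segment.substring(eq + 1)));
        }
        return types;
    }

    private static String inferType(String value) {
        try {
            Integer.parseInt(value);
            return "int";
        } catch (NumberFormatException notInt) {
            // fall through and try a wider integral type
        }
        try {
            Long.parseLong(value);
            return "bigint";
        } catch (NumberFormatException notLong) {
            return "string";
        }
    }

    public static void main(String[] args) {
        // Schema inferred from the data files during bootstrap: no partition columns.
        Map<String, String> bootstrapSchema = new LinkedHashMap<>();
        bootstrapSchema.put("id", "bigint");
        bootstrapSchema.put("name", "string");

        List<String> partitionFields = List.of("year", "month");
        System.out.println(typesFromSchema(bootstrapSchema, partitionFields)); // {year=string, month=string}
        System.out.println(typesFromHivePath("year=2020/month=5"));            // {year=int, month=int}
    }
}
```

Running main prints {year=string, month=string} for the schema-based lookup, matching the symptom in the issue, and {year=int, month=int} for the path-based inference.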
Yue Zhang updated HUDI-992: Fix Version/s: 0.14.0 (was: 0.13.1)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.14.0
sivabalan narayanan updated HUDI-992: Fix Version/s: (was: 0.12.3)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.1
Raymond Xu updated HUDI-992: Fix Version/s: 0.12.3
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.1, 0.12.3
Alexey Kudinkin updated HUDI-992: Fix Version/s: 0.13.1 (was: 0.13.0)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.1
Raymond Xu updated HUDI-992: Sprint: 2022/09/05 (was: 2022/09/05, 2022/09/19)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.0
Raymond Xu updated HUDI-992: Sprint: 2022/09/05, 2022/09/19 (was: 2022/09/05)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.0
Ethan Guo updated HUDI-992: Status: In Progress (was: Open)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Ethan Guo; Priority: Blocker; Fix For: 0.13.0
Ethan Guo updated HUDI-992: Priority: Blocker (was: Major)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Blocker; Fix For: 0.13.0
Ethan Guo updated HUDI-992: Sprint: 2022/09/05
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.13.0
Ethan Guo updated HUDI-992: Story Points: 2
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.12.1
Ethan Guo updated HUDI-992: Epic Link: HUDI-1265 (was: HUDI-2519)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.12.1
Sagar Sumit updated HUDI-992: Fix Version/s: 0.12.1 (was: 0.12.0)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.12.1
Raymond Xu updated HUDI-992: Fix Version/s: (was: 0.11.0)
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.12.0
Raymond Xu updated HUDI-992: Component/s: bootstrap
> Issue Type: Bug; Components: bootstrap, meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0, 0.12.0
Raymond Xu updated HUDI-992: Fix Version/s: 0.12.0
> Issue Type: Bug; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0, 0.12.0
Raymond Xu updated HUDI-992: Component/s: meta-sync
> Issue Type: Bug; Components: meta-sync; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0, 0.12.0
Raymond Xu updated HUDI-992: Epic Link: HUDI-2519
> Issue Type: Bug; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0
Raymond Xu updated HUDI-992: Parent: (was: HUDI-1265) Issue Type: Bug (was: Sub-task)
> Issue Type: Bug; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0
Danny Chen updated HUDI-992: Fix Version/s: (was: 0.10.0)
> Issue Type: Sub-task; Assignee: Udit Mehrotra; Priority: Major; Fix For: 0.11.0
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-992: Fix Version/s: (was: 0.9.0) 0.10.0 > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Affects Versions: 0.9.0 >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > Fix For: 0.10.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-992: Fix Version/s: (was: 0.8.0) 0.9.0 > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Affects Versions: 0.9.0 >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > Fix For: 0.9.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-992: Affects Version/s: 0.9.0 > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Affects Versions: 0.9.0 >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > Fix For: 0.8.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-992: Priority: Major (was: Blocker) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > Fix For: 0.7.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Parent: HUDI-1265 (was: HUDI-242) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Blocker > Fix For: 0.6.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-992: Fix Version/s: (was: 0.6.0) 0.6.1 > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Blocker > Fix For: 0.6.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-992: Status: New (was: Open) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Priority: Blocker > Fix For: 0.6.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Fix Version/s: 0.6.0 > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Priority: Blocker > Fix For: 0.6.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Priority: Blocker (was: Major) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Priority: Blocker -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Status: Open (was: New) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-992: Parent: HUDI-242 Issue Type: Sub-task (was: Bug) > For hive-style partitioned source data, partition columns synced with Hive > will always have String type > --- > > Key: HUDI-992 > URL: https://issues.apache.org/jira/browse/HUDI-992 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Udit Mehrotra >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005)