[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2023-10-04 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-992:

Fix Version/s: 0.14.1
   (was: 0.14.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2023-05-22 Thread Yue Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-992:
---
Fix Version/s: 0.14.0
   (was: 0.13.1)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2023-04-23 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-992:
-
Fix Version/s: (was: 0.12.3)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2023-03-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Fix Version/s: 0.12.3

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.1, 0.12.3
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-12-20 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-992:
-
Fix Version/s: 0.13.1
   (was: 0.13.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-09-19 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Sprint: 2022/09/05  (was: 2022/09/05, 2022/09/19)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-09-19 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Sprint: 2022/09/05, 2022/09/19  (was: 2022/09/05)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-09-15 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-992:
---
Status: In Progress  (was: Open)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-08-22 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-992:
---
Priority: Blocker  (was: Major)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-08-22 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-992:
---
Sprint: 2022/09/05

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-08-19 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-992:
---
Story Points: 2

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.12.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-08-19 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-992:
---
Epic Link: HUDI-1265  (was: HUDI-2519)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.12.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-08-16 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-992:
-
Fix Version/s: 0.12.1
   (was: 0.12.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.12.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-04-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Fix Version/s: (was: 0.11.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.12.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-03-27 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Component/s: bootstrap

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0, 0.12.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-03-27 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Fix Version/s: 0.12.0

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0, 0.12.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-03-27 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Component/s: meta-sync

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0, 0.12.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-02-14 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Epic Link: HUDI-2519

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-02-14 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-992:

Parent: (was: HUDI-1265)
Issue Type: Bug  (was: Sub-task)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2021-11-26 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-992:

Fix Version/s: (was: 0.10.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.11.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2021-08-25 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-992:
---
Fix Version/s: (was: 0.9.0)
   0.10.0

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.10.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2021-03-23 Thread Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Li updated HUDI-992:
-
Fix Version/s: (was: 0.8.0)
   0.9.0

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.9.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2021-03-23 Thread Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Li updated HUDI-992:
-
Affects Version/s: 0.9.0

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.8.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-12-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-992:

Priority: Major  (was: Blocker)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.7.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-09-01 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-992:

Parent: HUDI-1265  (was: HUDI-242)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.6.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-08-14 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha updated HUDI-992:
---
Fix Version/s: (was: 0.6.0)
   0.6.1

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.6.1
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-30 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-992:

Status: New  (was: Open)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.6.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-992:

Fix Version/s: 0.6.0

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.6.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-992:

Priority: Blocker  (was: Major)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Blocker
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-06 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-992:

Status: Open  (was: New)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Major
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-03 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-992:
---
Parent: HUDI-242
Issue Type: Sub-task  (was: Bug)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Major
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)