[ https://issues.apache.org/jira/browse/HUDI-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-4453:
----------------------------
    Status: In Progress  (was: Open)

> Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables
> -------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4453
>                 URL: https://issues.apache.org/jira/browse/HUDI-4453
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Udit Mehrotra
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> As of now, the *Bootstrap* feature determines the source schema by reading it
> from the source Parquet files:
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]
> This does not account for Parquet tables that are Hive-style partitioned,
> where the partition column values are encoded in the directory paths rather
> than stored in the data files. As a result, the partition columns are missing
> from the source schema and are not written to the target Hudi table. For the
> same reason, partition pruning does not work, because we are unable to prune
> source partitions. We should improve this logic to derive the partition
> schema from the partition paths for Hive-style partitioned tables, and to
> write the partition column values correctly to the target Hudi table.
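
To illustrate what "determine partition schema from the partition paths" could mean, here is a minimal sketch in Python. It is not Hudi's implementation (which is in Java) and does not use any Hudi API. The function names `parse_hive_partition_path` and `prune_partitions` and the example paths are hypothetical, and all partition values are treated as strings, with no type inference.

```python
from urllib.parse import unquote


def parse_hive_partition_path(relative_path):
    """Return a dict of partition column -> value for a Hive-style path.

    Segments without '=' (for example the data file name) are ignored.
    Values are kept as strings; inferring column types is out of scope here.
    """
    partition_values = {}
    for segment in relative_path.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if sep and name:
            # Assumption: special characters in values are %XX-escaped,
            # as Hive does when it writes partition directories.
            partition_values[name] = unquote(value)
    return partition_values


def prune_partitions(partition_paths, wanted):
    """Keep only the paths whose partition values match every entry in `wanted`."""
    kept = []
    for path in partition_paths:
        values = parse_hive_partition_path(path)
        if all(values.get(col) == val for col, val in wanted.items()):
            kept.append(path)
    return kept


if __name__ == "__main__":
    # Hypothetical source partitions of a Hive-style partitioned Parquet table.
    paths = ["year=2021/month=12", "year=2022/month=07"]
    print(parse_hive_partition_path("year=2022/month=07/part-0001.parquet"))
    print(prune_partitions(paths, {"year": "2022"}))
```

With the column names and values recovered from the paths, a reader can both add the partition columns to the schema and skip source partitions that cannot match a filter, which is the pruning the issue asks for.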