[ https://issues.apache.org/jira/browse/HUDI-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lamber-ken resolved HUDI-353.
-----------------------------
    Resolution: Resolved

Fixed at master e555aa516de867a4faf0322e79defa1f52d887ef

> Add support for Hive style partitioning path
> --------------------------------------------
>
>                 Key: HUDI-353
>                 URL: https://issues.apache.org/jira/browse/HUDI-353
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Hive Integration
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, the partition folder name follows this format:
> <partition_column_name>=<partition_value>.
> But in Hudi, the partition folder name is just <partition_value>.
> For example, take a dataset partitioned by three columns: year, month and day.
> In Hive, the data is saved in:
> {{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
> In Hudi, the data is saved in:
> {{.../<table_name>/2019/05/01/xxx.parquet}}
> I added a new option to the Spark datasource named
> {{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which indicates whether to use
> Hive style partitioning. By default this option is false (Hive style
> partitioning is not used).
> Also, when Hive style partitioning is used, instead of scanning the dataset
> and manually adding/updating all partitions, we can run "MSCK REPAIR TABLE
> <table_name>" to automatically sync all the partition info with the Hive
> MetaStore.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
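
To make the two folder layouts described in the issue concrete, here is a minimal Python sketch. The helper `partition_path` and its `hive_style` flag are hypothetical names chosen for illustration; this is not Hudi's implementation or API, only the naming rule from the description applied to the year/month/day example.

```python
from typing import Sequence, Tuple


def partition_path(partitions: Sequence[Tuple[str, str]], hive_style: bool = False) -> str:
    """Build a relative partition path from (column, value) pairs.

    hive_style=False mimics the layout the issue attributes to Hudi: <value>/<value>/...
    hive_style=True mimics the Hive layout: <column>=<value>/<column>=<value>/...
    """
    if hive_style:
        return "/".join(f"{col}={val}" for col, val in partitions)
    return "/".join(val for _, val in partitions)


# The example from the issue: a dataset partitioned by year, month and day.
parts = [("year", "2019"), ("month", "05"), ("day", "01")]

hudi_default = partition_path(parts)                  # value-only folders
hive_style = partition_path(parts, hive_style=True)   # column=value folders

print(hudi_default)
print(hive_style)
```

Because the Hive style layout carries the column name in each folder, a command such as "MSCK REPAIR TABLE <table_name>" can discover the partitions from the directory names alone, which is the motivation given in the issue.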