[jira] [Resolved] (HUDI-353) Add support for Hive style partitioning path

lamber-ken (Jira) Tue, 03 Mar 2020 23:10:25 -0800


     [ https://issues.apache.org/jira/browse/HUDI-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


lamber-ken resolved HUDI-353.
-----------------------------
    Resolution: Resolved

Fixed in master at commit e555aa516de867a4faf0322e79defa1f52d887ef

> Add support for Hive style partitioning path
> --------------------------------------------
>
>                 Key: HUDI-353
>                 URL: https://issues.apache.org/jira/browse/HUDI-353
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Hive Integration
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, partition folder names follow the format
> <partition_column_name>=<partition_value>.
> In Hudi, however, the partition folder name is just <partition_value>.
> For example, consider a dataset partitioned by three columns: year, month and day.
> In Hive, the data is saved in:
> {{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
> In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}
> I added a new Spark datasource option named
> {{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which controls whether Hive-style
> partitioning is used. The option defaults to false (Hive-style partitioning
> disabled).
> In addition, with Hive-style partitioning enabled, instead of scanning the
> dataset and manually adding or updating every partition, we can run
> "MSCK REPAIR TABLE <table_name>" to sync all partition information with the
> Hive Metastore automatically.
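To make the two layouts concrete, here is a minimal sketch of how a partition path could be built in each style, and why the Hive-style form can be discovered from the directory names alone. It is not Hudi's actual key generator code: the function names are hypothetical, and the parser only illustrates the idea that "name=value" directories are self-describing, which is what lets MSCK REPAIR TABLE recover partitions without a separate scan.

```python
from typing import Dict, List, Tuple


def hudi_default_path(partition: List[Tuple[str, str]]) -> str:
    """Hudi default layout (hypothetical helper): only the values, e.g. 2019/05/01."""
    return "/".join(value for _, value in partition)


def hive_style_path(partition: List[Tuple[str, str]]) -> str:
    """Hive layout (hypothetical helper): <partition_column_name>=<partition_value> per level."""
    return "/".join(f"{name}={value}" for name, value in partition)


def parse_hive_style_path(path: str) -> Dict[str, str]:
    """Recover the partition spec from a relative Hive-style path.

    Each directory carries its own column name, so no external mapping is
    needed. A Hudi-default path like 2019/05/01 cannot be parsed this way
    because the column names are missing.
    """
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a Hive-style segment: {segment!r}")
        spec[name] = value
    return spec


partition = [("year", "2019"), ("month", "05"), ("day", "01")]
print(hudi_default_path(partition))  # 2019/05/01
print(hive_style_path(partition))    # year=2019/month=05/day=01
print(parse_hive_style_path(hive_style_path(partition)))
```

Calling parse_hive_style_path on the Hudi-default path raises ValueError at the first segment ("2019"), which mirrors why the default layout needs the partition columns to be supplied separately when syncing to the metastore.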



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
