[ 
https://issues.apache.org/jira/browse/HUDI-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zouxxyy updated HUDI-5057:
--------------------------
    Description: 
When `hoodie.datasource.write.hive_style_partitioning` is set to true, a Hudi 
table's partition name has the form "column=value", for example "dt=2022-10-06".

If it is set to false, the partition name is only the value, for example 
"2022-10-06".

In that case, repairing the partitions does not work, because the current 
`RepairTableCommand` does not support reading partition names like 
"2022-10-06".
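
To make the two naming schemes concrete, here is a minimal sketch in plain Scala of how a relative partition path could be mapped back to a partition spec in either mode. It is not the `RepairTableCommand` implementation; `PartitionPathSketch` and `toPartitionSpec` are hypothetical names used only for illustration. It assumes one path segment per partition column and ignores escaped characters in values.

```scala
// Hypothetical helper, not part of Hudi or Spark: maps a relative partition
// path to a partition spec (column name -> value).
object PartitionPathSketch {
  def toPartitionSpec(
      relativePath: String,
      partitionColumns: Seq[String],
      hiveStyle: Boolean): Map[String, String] = {
    // Assumption: exactly one path segment per partition column.
    val segments = relativePath.split("/").filter(_.nonEmpty).toSeq
    require(
      segments.length == partitionColumns.length,
      s"expected ${partitionColumns.length} path segments, got ${segments.length}")

    partitionColumns.zip(segments).map { case (column, segment) =>
      if (hiveStyle) {
        // Hive style: the segment carries the column name, e.g. "dt=2022-10-06".
        val prefix = column + "="
        require(segment.startsWith(prefix), s"segment '$segment' is not '$column=<value>'")
        column -> segment.substring(prefix.length)
      } else {
        // Non-Hive style: the segment is the bare value, e.g. "2022-10-06";
        // the column name has to come from the table's partition columns.
        column -> segment
      }
    }.toMap
  }
}
```

Both `toPartitionSpec("dt=2022-10-06", Seq("dt"), hiveStyle = true)` and `toPartitionSpec("2022-10-06", Seq("dt"), hiveStyle = false)` describe the same partition. The second form can only be resolved when the partition columns are known from the table definition, which is the case a repair has to handle when hive-style partitioning is disabled.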

 

  was:
When `hoodie.datasource.write.hive_style_partitioning` is disabled, running 
`msck repair table` in Spark SQL fails to sync the partitions in the file 
system to the catalog.

For example:

1. Create the table with Spark SQL
{code:java}
create table h0 (
id int,
name string,
ts long,
dt string) 
using hudi
partitioned by (dt)
location '/tmp/test'
tblproperties (
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.write.hive_style_partitioning = 'false');{code}
2. Modify the partitions by writing a row through the DataFrame API
{code:java}
import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, PRECOMBINE_FIELD, RECORDKEY_FIELD}
import org.apache.hudi.HoodieSparkUtils
import org.apache.hudi.common.table.HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE
import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME

import org.apache.spark.sql.SaveMode

val df = Seq((1, "a1", 1000, "2022-10-06")).toDF("id", "name", "ts", "dt");
df.write.format("hudi")
  .option(RECORDKEY_FIELD.key, "id")
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(PARTITIONPATH_FIELD.key, "dt")
  .option(HIVE_STYLE_PARTITIONING_ENABLE.key, "false")
  .mode(SaveMode.Append)
  .save("/tmp/test");{code}
3. Run `msck repair table` with Spark SQL
{code:java}
msck repair table h0;{code}
4. List the partition names
{code:java}
val table = spark.sessionState.sqlParser.parseTableIdentifier("h0");
spark.sessionState.catalog.listPartitionNames(table).toArray;{code}
It should return Array(dt=2022-10-06), but it returns Array().


> Fix msck repair hudi table
> --------------------------
>
>                 Key: HUDI-5057
>                 URL: https://issues.apache.org/jira/browse/HUDI-5057
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.12.0
>            Reporter: zouxxyy
>            Assignee: zouxxyy
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
