[ 
https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7480:
---------------------------------
    Labels: pull-request-available  (was: )

> initializeFunctionalIndexPartition is called multiple times
> -----------------------------------------------------------
>
>                 Key: HUDI-7480
>                 URL: https://issues.apache.org/jira/browse/HUDI-7480
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vinaykumar Bhat
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> This is due to an issue in 
> initializeFromFilesystem(), which checks whether an MDT (metadata table) 
> partition needs to be initialized based on the absence of its partition-type. 
> For a functional index, however, the partition-type stores only the prefix 
> (func_index_), so the check always fails and we try to re-initialize the same 
> functional index partition again.
>  
> Simple test:
> {quote}spark.sql(
> s"""
> |create table $tableName (
> | id int,
> | name string,
> | price double,
> | ts long
> |) using hudi
> | options (
> | primaryKey ='id',
> | type = '$tableType',
> | preCombineField = 'ts',
> | hoodie.metadata.record.index.enable = 'true',
> | hoodie.datasource.write.recordkey.field = 'id'
> | )
> | partitioned by(ts)
> | location '$basePath'
> """.stripMargin)
> spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
> spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
> spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
>  
> var createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='yyyy-MM-dd')"
> spark.sql(createIndexSql)
>  
> // This insert throws a NullPointerException
> spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}
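
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's actual code: the class name, both method names, and the concrete partition name `func_index_idx_datestr` (the prefix followed by the index name from the test) are assumptions made for illustration only. The sketch contrasts a check that treats the stored partition-type as a complete partition name with one possible prefix-aware variant, which is not necessarily the fix adopted in the pull request.

```java
import java.util.Set;

public class PartitionInitCheck {
    // The prefix the issue says is stored as the partition-type for functional indexes.
    static final String FUNC_INDEX_PREFIX = "func_index_";

    // Hypothetical naive check: treats the partition-type string as a full partition name.
    // For a prefix-style type it never matches, so it always reports "needs init".
    static boolean needsInitNaive(Set<String> initialized, String partitionType) {
        return !initialized.contains(partitionType);
    }

    // Hypothetical prefix-aware check: for the functional index type, look up the
    // concrete partition name (prefix + index name) instead of the bare prefix.
    static boolean needsInitPrefixAware(Set<String> initialized, String partitionType, String indexName) {
        if (partitionType.equals(FUNC_INDEX_PREFIX)) {
            return !initialized.contains(FUNC_INDEX_PREFIX + indexName);
        }
        return !initialized.contains(partitionType);
    }

    public static void main(String[] args) {
        // Assumed state after "create index idx_datestr" has already run once.
        Set<String> initialized = Set.of("files", "record_index", "func_index_idx_datestr");

        System.out.println(needsInitNaive(initialized, "record_index"));                        // false
        System.out.println(needsInitNaive(initialized, FUNC_INDEX_PREFIX));                     // true: re-init is attempted
        System.out.println(needsInitPrefixAware(initialized, FUNC_INDEX_PREFIX, "idx_datestr")); // false
    }
}
```

In this sketch the naive check reports that the functional index partition still needs initialization even though it already exists, which matches the repeated calls to initializeFunctionalIndexPartition reported in this issue.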



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
