Vinaykumar Bhat created HUDI-7480:
-------------------------------------

             Summary: initializeFunctionalIndexPartition is called multiple times
                 Key: HUDI-7480
                 URL: https://issues.apache.org/jira/browse/HUDI-7480
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Vinaykumar Bhat


This is due to an issue in initializeFromFilesystem(), which decides whether
an MDT (metadata table) partition needs to be initialized based on the absence
of its partition-type. For a functional index, however, partition-type stores
only the prefix (func_index_), so the check always fails and we try to
re-initialize the same functional index partition.
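
To make the mismatch concrete, here is a minimal, self-contained sketch of the
two ways such a check could behave. It is not Hudi's code and not necessarily
the right fix: the class, the method names, and the stored partition name
func_index_idx_datestr are hypothetical and chosen only for illustration. The
only detail taken from the description above is the func_index_ prefix.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hudi.
public class PartitionInitCheck {
    // Prefix quoted in the issue for functional index partitions.
    static final String FUNCTIONAL_INDEX_PREFIX = "func_index_";

    // Exact-match check: the partition counts as initialized only if a
    // stored name equals the partition-type string.
    static boolean needsInitExact(Set<String> initialized, String partitionType) {
        return !initialized.contains(partitionType);
    }

    // Prefix-aware check: when the partition-type is the functional index
    // prefix, any stored name starting with it counts as initialized.
    static boolean needsInitPrefixAware(Set<String> initialized, String partitionType) {
        boolean isPrefixType = partitionType.equals(FUNCTIONAL_INDEX_PREFIX);
        for (String name : initialized) {
            if (name.equals(partitionType)
                    || (isPrefixType && name.startsWith(partitionType))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed set of already-initialized partition names.
        Set<String> initialized = new LinkedHashSet<>(
                Arrays.asList("files", "record_index", "func_index_idx_datestr"));

        // prints "false": the name matches exactly
        System.out.println(needsInitExact(initialized, "record_index"));
        // prints "true": the bare prefix never equals a stored name
        System.out.println(needsInitExact(initialized, FUNCTIONAL_INDEX_PREFIX));
        // prints "false": the prefix matches the stored functional index
        System.out.println(needsInitPrefixAware(initialized, FUNCTIONAL_INDEX_PREFIX));
    }
}
```

Note that the prefix-aware variant treats any existing functional index as
"already initialized", so a real fix would likely need to compare full index
partition names rather than only the prefix.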
 
Simple test:
{quote}spark.sql(
s"""
|create table $tableName (
| id int,
| name string,
| price double,
| ts long
|) using hudi
| options (
| primaryKey ='id',
| type = '$tableType',
| preCombineField = 'ts',
| hoodie.metadata.record.index.enable = 'true',
| hoodie.datasource.write.recordkey.field = 'id'
| )
| partitioned by(ts)
| location '$basePath'
""".stripMargin)
spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
 
var createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='yyyy-MM-dd')"
spark.sql(createIndexSql)
 
// This insert throws a NullPointerException
spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
