[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times
[ https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo reassigned HUDI-7480:
---

    Assignee: Sagar Sumit  (was: Vinaykumar Bhat)

> initializeFunctionalIndexPartition is called multiple times
> ---
>
>                 Key: HUDI-7480
>                 URL: https://issues.apache.org/jira/browse/HUDI-7480
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vinaykumar Bhat
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> This is due to an issue in initializeFromFilesystem(), which decides whether
> an MDT partition needs to be initialized based on the absence of its
> partition-type. For a functional index, however, the partition-type stores
> only the prefix (func_index_), so the check always fails and we try to
> re-initialize the same functional index partition again.
>
> Simple test:
> {quote}spark.sql(
>   s"""
>      |create table $tableName (
>      |  id int,
>      |  name string,
>      |  price double,
>      |  ts long
>      |) using hudi
>      | options (
>      |  primaryKey ='id',
>      |  type = '$tableType',
>      |  preCombineField = 'ts',
>      |  hoodie.metadata.record.index.enable = 'true',
>      |  hoodie.datasource.write.recordkey.field = 'id'
>      | )
>      | partitioned by(ts)
>      | location '$basePath'
>   """.stripMargin)
> spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
> spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
> spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
>
> var createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='-MM-dd')"
> spark.sql(createIndexSql)
>
> // This insert throws a null-pointer exception
> spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
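The failing check described in HUDI-7480 can be illustrated with a small, self-contained sketch. This is not Hudi's code: `MdtInitSketch`, `PartitionType`, `needsInitNaive` and `needsInitPrefixAware` are hypothetical names made up for illustration, and the set of existing partition names is an assumed stand-in for whatever the table config records. It only shows why an exact-match lookup on a prefix-style partition-type would keep reporting "not initialized".

```scala
// Hypothetical model of the check; none of these names are Hudi APIs.
object MdtInitSketch {
  sealed trait PartitionType {
    def path: String       // value stored as the partition-type
    def isPrefix: Boolean  // true when `path` is only a prefix of the real partition name
  }
  case object Files extends PartitionType { val path = "files"; val isPrefix = false }
  case object RecordIndex extends PartitionType { val path = "record_index"; val isPrefix = false }
  // Assumption taken from the issue text: the functional index type stores only "func_index_".
  case object FunctionalIndex extends PartitionType { val path = "func_index_"; val isPrefix = true }

  // Naive check: a partition needs init when its partition-type is absent.
  // For FunctionalIndex the stored names look like "func_index_<index name>",
  // so the bare prefix is never found and this always returns true.
  def needsInitNaive(t: PartitionType, existing: Set[String]): Boolean =
    !existing.contains(t.path)

  // Prefix-aware variant: treat prefix-style types as present when any
  // existing partition name starts with the prefix.
  def needsInitPrefixAware(t: PartitionType, existing: Set[String]): Boolean =
    if (t.isPrefix) !existing.exists(_.startsWith(t.path))
    else !existing.contains(t.path)

  def main(args: Array[String]): Unit = {
    // Assumed state after `create index idx_datestr ...` has run once.
    val existing = Set("files", "record_index", "func_index_idx_datestr")
    println(s"naive:        ${needsInitNaive(FunctionalIndex, existing)}")
    println(s"prefix-aware: ${needsInitPrefixAware(FunctionalIndex, existing)}")
  }
}
```

With the assumed state above, the naive check asks for the functional index partition to be initialized again, while the prefix-aware variant does not. A real fix would probably need to compare full index partition names rather than just the prefix, since a table may carry more than one functional index.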
[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times
[ https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar reassigned HUDI-7480:
---

    Assignee: Vinaykumar Bhat  (was: Sagar Sumit)

> initializeFunctionalIndexPartition is called multiple times
> ---
>
>                 Key: HUDI-7480
>                 URL: https://issues.apache.org/jira/browse/HUDI-7480
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vinaykumar Bhat
>            Assignee: Vinaykumar Bhat
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times
[ https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinaykumar Bhat reassigned HUDI-7480:
---

    Assignee: Sagar Sumit

> initializeFunctionalIndexPartition is called multiple times
> ---
>
>                 Key: HUDI-7480
>                 URL: https://issues.apache.org/jira/browse/HUDI-7480
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vinaykumar Bhat
>            Assignee: Sagar Sumit
>            Priority: Major