[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times

2024-03-27 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7480:
---

Assignee: Sagar Sumit  (was: Vinaykumar Bhat)

> initializeFunctionalIndexPartition is called multiple times
> ---
>
> Key: HUDI-7480
> URL: https://issues.apache.org/jira/browse/HUDI-7480
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Vinaykumar Bhat
> Assignee: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
>
> This is due to an issue in 
> initializeFromFilesystem(), which checks whether an MDT (metadata table) 
> partition needs to be initialized based on the absence of its partition-type. 
> But for a functional index, the partition-type actually stores only the prefix 
> (func_index_), so the check always fails and we try to reinitialize the same 
> functional index partition again.
>  
> Simple test:
> {quote}spark.sql(
> s"""
> |create table $tableName (
> | id int,
> | name string,
> | price double,
> | ts long
> |) using hudi
> | options (
> | primaryKey ='id',
> | type = '$tableType',
> | preCombineField = 'ts',
> | hoodie.metadata.record.index.enable = 'true',
> | hoodie.datasource.write.recordkey.field = 'id'
> | )
> | partitioned by(ts)
> | location '$basePath'
> """.stripMargin)
> spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
> spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
> spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
>  
> val createIndexSql = s"create index idx_datestr on $tableName using " +
>   "column_stats(ts) options(func='from_unixtime', format='-MM-dd')"
> spark.sql(createIndexSql)
>  
> // This insert throws a NullPointerException
> spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}
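
The failing check described in the quoted issue can be illustrated with a small, self-contained sketch. This is not Hudi's actual code: the object name, both helper methods, and the example partition names are hypothetical, and the "fixed" variant is only one plausible way to make the check succeed, assuming the functional index partition is stored under the prefix plus the index name.

```scala
// Hypothetical model of the check described in HUDI-7480; not Hudi's real implementation.
object FunctionalIndexInitSketch {
  // From the issue: for functional indexes the partition-type holds only this prefix.
  val FuncIndexPrefix = "func_index_"

  // Models the problematic check: look for the bare partition-type among the
  // partitions that are already initialized. No initialized partition is named
  // exactly "func_index_", so this keeps reporting that initialization is needed.
  def needsInitByType(partitionType: String, initialized: Set[String]): Boolean =
    !initialized.contains(partitionType)

  // Assumed alternative: compare against the full partition name
  // (prefix + index name) instead of the bare prefix.
  def needsInitByName(indexName: String, initialized: Set[String]): Boolean =
    !initialized.contains(FuncIndexPrefix + indexName)

  def main(args: Array[String]): Unit = {
    // Illustrative state after "create index idx_datestr" has already run once.
    val initialized = Set("files", "record_index", "func_index_idx_datestr")

    println(needsInitByType(FuncIndexPrefix, initialized)) // true: would re-initialize
    println(needsInitByName("idx_datestr", initialized))   // false: already initialized
    println(needsInitByName("idx_other", initialized))     // true: genuinely new index
  }
}
```

Under these assumptions, every write after the index is created would take the first path and attempt initialization again, which matches the repeated initializeFunctionalIndexPartition calls reported above.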



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times

2024-03-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-7480:


Assignee: Vinaykumar Bhat  (was: Sagar Sumit)

> initializeFunctionalIndexPartition is called multiple times
> ---
>
> Key: HUDI-7480
> URL: https://issues.apache.org/jira/browse/HUDI-7480
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Vinaykumar Bhat
> Assignee: Vinaykumar Bhat
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>





[jira] [Assigned] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times

2024-03-05 Thread Vinaykumar Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinaykumar Bhat reassigned HUDI-7480:
-

Assignee: Sagar Sumit

> initializeFunctionalIndexPartition is called multiple times
> ---
>
> Key: HUDI-7480
> URL: https://issues.apache.org/jira/browse/HUDI-7480
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Vinaykumar Bhat
> Assignee: Sagar Sumit
> Priority: Major
>


