[jira] [Created] (HUDI-7018) Test case for spark catalog refresh table
HunterXHunter created HUDI-7018: --- Summary: Test case for spark catalog refresh table Key: HUDI-7018 URL: https://issues.apache.org/jira/browse/HUDI-7018 Project: Apache Hudi Issue Type: Test Reporter: HunterXHunter -- This message was sent by Atlassian Jira (v8.20.10#820010)
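For context, a minimal sketch of the kind of check such a test might make, assuming a Hudi table named `h0` already registered in the Spark catalog (the table name and local setup are illustrative, not from the issue):
{code:java}
import org.apache.spark.sql.SparkSession;

// Sketch only: table `h0` and the local master are assumptions.
public class RefreshTableSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("refresh-table-sketch")
        .master("local[1]")
        .getOrCreate();

    long before = spark.sql("select * from h0").count();
    // ... an external writer commits new data to the table here ...
    spark.catalog().refreshTable("h0"); // invalidate cached metadata and plans
    long after = spark.sql("select * from h0").count();
    System.out.println(before + " -> " + after);
    spark.stop();
  }
}
{code}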
[jira] [Updated] (HUDI-6032) Fix Read metafield '_hoodie_commit_time' multiple times from the parquet file when using flink
[ https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-6032: Summary: Fix Read metafield '_hoodie_commit_time' multiple times from the parquet file when using flink (was: Fix multiple reads metafield '_hoodie_commit_time' use flink.) > Fix Read metafield '_hoodie_commit_time' multiple times from the parquet > file when using flink > --- > > Key: HUDI-6032 > URL: https://issues.apache.org/jira/browse/HUDI-6032 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > Flink can't read the metafield '_hoodie_commit_time' from the parquet file. > [https://github.com/apache/hudi/issues/8371] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-6032) Fix Read metafield '_hoodie_commit_time' multiple times from the parquet file when using flink
[ https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-6032: --- Assignee: HunterXHunter > Fix Read metafield '_hoodie_commit_time' multiple times from the parquet > file when using flink > --- > > Key: HUDI-6032 > URL: https://issues.apache.org/jira/browse/HUDI-6032 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > Flink can't read the metafield '_hoodie_commit_time' from the parquet file. > [https://github.com/apache/hudi/issues/8371] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6032) Fix multiple reads metafield '_hoodie_commit_time' use flink.
[ https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-6032: Summary: Fix multiple reads metafield '_hoodie_commit_time' use flink. (was: Flink cant read metafield '_hoodie_commit_time' from parquet file) > Fix multiple reads metafield '_hoodie_commit_time' use flink. > - > > Key: HUDI-6032 > URL: https://issues.apache.org/jira/browse/HUDI-6032 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > https://github.com/apache/hudi/issues/8371 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6032) Fix multiple reads metafield '_hoodie_commit_time' use flink.
[ https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-6032: Description: Flink can't read the metafield '_hoodie_commit_time' from the parquet file. [https://github.com/apache/hudi/issues/8371] was:https://github.com/apache/hudi/issues/8371 > Fix multiple reads metafield '_hoodie_commit_time' use flink. > - > > Key: HUDI-6032 > URL: https://issues.apache.org/jira/browse/HUDI-6032 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > Flink can't read the metafield '_hoodie_commit_time' from the parquet file. > [https://github.com/apache/hudi/issues/8371] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6032) Flink cant read metafield '_hoodie_commit_time' from parquet file
HunterXHunter created HUDI-6032: --- Summary: Flink cant read metafield '_hoodie_commit_time' from parquet file Key: HUDI-6032 URL: https://issues.apache.org/jira/browse/HUDI-6032 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter https://github.com/apache/hudi/issues/8371 -- This message was sent by Atlassian Jira (v8.20.10#820010)
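For reference, a minimal sketch of the read path this issue reports, assuming a Flink SQL batch read of a MERGE_ON_READ table at an illustrative path; per the linked GitHub issue, selecting `_hoodie_commit_time` from the parquet base files failed before the fix:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Sketch only: the table path and schema are illustrative.
public class MetaFieldReadSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inBatchMode().build());
    tEnv.executeSql("create table t1 (id int, ts int) with ("
        + " 'connector' = 'hudi',"
        + " 'path' = '/tmp/t1',"
        + " 'table.type' = 'MERGE_ON_READ')");
    // The metafield read that the issue reports as failing:
    tEnv.executeSql("select _hoodie_commit_time, id from t1").print();
  }
}
{code}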
[jira] [Created] (HUDI-5996) We should verify the consistency of bucket num at job startup.
HunterXHunter created HUDI-5996: --- Summary: We should verify the consistency of bucket num at job startup. Key: HUDI-5996 URL: https://issues.apache.org/jira/browse/HUDI-5996 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter Users may sometimes modify the bucket num; an inconsistent bucket num will lead to data duplication and make the table unavailable. Some other parameters may also need to be checked before the job starts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
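A minimal sketch of the proposed startup validation; how the persisted bucket num is obtained is an assumption, not Hudi's actual API:
{code:java}
// Sketch only: where the persisted bucket num comes from is an assumption.
public class BucketNumCheckSketch {
  static void validateBucketNum(int configured, int persisted) {
    if (configured != persisted) {
      // Failing fast at startup avoids duplicated data from a changed bucket num.
      throw new IllegalStateException("Bucket num changed from " + persisted
          + " to " + configured + "; refusing to start the write job.");
    }
  }

  public static void main(String[] args) {
    validateBucketNum(4, 4); // ok
    try {
      validateBucketNum(8, 4); // inconsistent: would duplicate data
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}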
[jira] [Assigned] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5584: --- Assignee: HunterXHunter > When the table to be synchronized already exists in hive, need to update > serde/table properties > --- > > Key: HUDI-5584 > URL: https://issues.apache.org/jira/browse/HUDI-5584 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only > one table to be synchronized to Hive, without the _ro suffix. > But sometimes the table has already been created in Hive, > for example: > {code:java} > create table hive.test.HUDI_5584 ( > id int, > ts int) > using hudi > tblproperties ( > type = 'mor', > primaryKey = 'id', > preCombineField = 'ts', > hoodie.datasource.hive_sync.enable = 'true', > hoodie.datasource.hive_sync.table.strategy='ro' > ) location '/tmp/HUDI_5584' {code} > and show create table gives: > {code:java} > CREATE EXTERNAL TABLE `hudi_5584`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `id` int, > `ts` int) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > WITH SERDEPROPERTIES ( > 'path'='file:///tmp/HUDI_5584') > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'file:/tmp/HUDI_5584' > TBLPROPERTIES ( > 'hoodie.datasource.hive_sync.enable'='true', > 'hoodie.datasource.hive_sync.table.strategy'='ro', > 'preCombineField'='ts', > 'primaryKey'='id', > 'spark.sql.create.version'='3.3.1', > 'spark.sql.sources.provider'='hudi', > 'spark.sql.sources.schema.numParts'='1', > 'spark.sql.sources.schema.part.0'='xx' > 'transient_lastDdlTime'='1674108302', > 'type'='mor') {code} > *The table looks like a realtime table.* > > When we finish writing data and synchronize the ro table, the SERDEPROPERTIES > and OUTPUTFORMAT will not be modified because the table already exists. > This causes the table type to not match what is expected. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
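For illustration only, the kind of Hive DDL the sync would need to apply when a pre-existing table definition does not match the 'ro' strategy; `org.apache.hudi.hadoop.HoodieParquetInputFormat` is the read-optimized input format, while the JDBC URL, database name, and use of the Hive JDBC driver are assumptions:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch only: assumes a local HiveServer2 and the Hive JDBC driver on the classpath.
public class FixTableFormatSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/test");
         Statement stmt = conn.createStatement()) {
      // Switch the pre-existing definition from the realtime input format
      // to the read-optimized one that the 'ro' strategy expects.
      stmt.execute("ALTER TABLE hudi_5584 SET FILEFORMAT"
          + " INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'"
          + " OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'"
          + " SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'");
    }
  }
}
{code}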
[jira] [Assigned] (HUDI-5591) HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5591: --- Assignee: HunterXHunter > HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter > HIVE_SYNC_TABLE_STRATEGY > --- > > Key: HUDI-5591 > URL: https://issues.apache.org/jira/browse/HUDI-5591 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5591) HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter HIVE_SYNC_TABLE_STRATEGY
HunterXHunter created HUDI-5591: --- Summary: HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter HIVE_SYNC_TABLE_STRATEGY Key: HUDI-5591 URL: https://issues.apache.org/jira/browse/HUDI-5591 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5584: Description: when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to hive without suffix _ro. But sometimes tables have been created in hive early, like: {code:java} create table hive.test.HUDI_5584 ( id int, ts int) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts', hoodie.datasource.hive_sync.enable = 'true', hoodie.datasource.hive_sync.table.strategy='ro' ) location '/tmp/HUDI_5584' {code} and show create table . {code:java} CREATE EXTERNAL TABLE `hudi_5584`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` int, `ts` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='file:///tmp/HUDI_5584') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/HUDI_5584' TBLPROPERTIES ( 'hoodie.datasource.hive_sync.enable'='true', 'hoodie.datasource.hive_sync.table.strategy'='ro', 'preCombineField'='ts', 'primaryKey'='id', 'spark.sql.create.version'='3.3.1', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='xx' 'transient_lastDdlTime'='1674108302', 'type'='mor') {code} *The table like a realtime table.* When we finish writing data and synchronize ro table , because the table already exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. This causes the type of the table is not match expect. was: when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to hive without suffix _ro. But sometimes the table may have been created in hive early. like: {code:java} create table hive.test.HUDI_5584 ( id int, ts int) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts', hoodie.datasource.hive_sync.enable = 'true', hoodie.datasource.hive_sync.table.strategy='ro' ) location '/tmp/HUDI_5584' {code} and show create table . {code:java} CREATE EXTERNAL TABLE `hudi_5584`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` int, `ts` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='file:///tmp/HUDI_5584') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/HUDI_5584' TBLPROPERTIES ( 'hoodie.datasource.hive_sync.enable'='true', 'hoodie.datasource.hive_sync.table.strategy'='ro', 'preCombineField'='ts', 'primaryKey'='id', 'spark.sql.create.version'='3.3.1', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='xx' 'transient_lastDdlTime'='1674108302', 'type'='mor') {code} the table like a realtime table. When we finish writing data and synchronize tables, because the table already exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. This causes the type of the table to be unexpected. 
> When the table to be synchronized already exists in hive, need to update > serde/table properties > --- > > Key: HUDI-5584 > URL: https://issues.apache.org/jira/browse/HUDI-5584 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only > one table to be synchronized to hive without suffix _ro. > But sometimes tables have been created in hive early, > like: > {code:java} > create table hive.test.HUDI_5584 ( > id int, > ts int) > using hudi > tblproperties ( > type = 'mor', > primaryKey = 'id', > preCombineField = 'ts', > hoodie.datasource.hive_sync.enable = 'true', > hoodie.datasource.hive_sync.table.strategy='ro' > ) location '/tmp/HUDI_5584' {code} > and show create table . > {code:java} > CREATE EXTERNAL TABLE `hudi_5584`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `id` int, > `ts` int) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > WITH SERDEPROPERTIES ( >
[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5584: Description: when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to hive without suffix _ro. But sometimes tables have been created in hive early, like: {code:java} create table hive.test.HUDI_5584 ( id int, ts int) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts', hoodie.datasource.hive_sync.enable = 'true', hoodie.datasource.hive_sync.table.strategy='ro' ) location '/tmp/HUDI_5584' {code} and show create table . {code:java} CREATE EXTERNAL TABLE `hudi_5584`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` int, `ts` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='file:///tmp/HUDI_5584') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/HUDI_5584' TBLPROPERTIES ( 'hoodie.datasource.hive_sync.enable'='true', 'hoodie.datasource.hive_sync.table.strategy'='ro', 'preCombineField'='ts', 'primaryKey'='id', 'spark.sql.create.version'='3.3.1', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='xx' 'transient_lastDdlTime'='1674108302', 'type'='mor') {code} *The table like a realtime table.* When we finish writing data and synchronize ro table , because the table already exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. This causes the type of the table is not match as expect. was: when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to hive without suffix _ro. But sometimes tables have been created in hive early, like: {code:java} create table hive.test.HUDI_5584 ( id int, ts int) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts', hoodie.datasource.hive_sync.enable = 'true', hoodie.datasource.hive_sync.table.strategy='ro' ) location '/tmp/HUDI_5584' {code} and show create table . {code:java} CREATE EXTERNAL TABLE `hudi_5584`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` int, `ts` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='file:///tmp/HUDI_5584') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/HUDI_5584' TBLPROPERTIES ( 'hoodie.datasource.hive_sync.enable'='true', 'hoodie.datasource.hive_sync.table.strategy'='ro', 'preCombineField'='ts', 'primaryKey'='id', 'spark.sql.create.version'='3.3.1', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='xx' 'transient_lastDdlTime'='1674108302', 'type'='mor') {code} *The table like a realtime table.* When we finish writing data and synchronize ro table , because the table already exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. This causes the type of the table is not match expect. 
> When the table to be synchronized already exists in hive, need to update > serde/table properties > --- > > Key: HUDI-5584 > URL: https://issues.apache.org/jira/browse/HUDI-5584 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only > one table to be synchronized to hive without suffix _ro. > But sometimes tables have been created in hive early, > like: > {code:java} > create table hive.test.HUDI_5584 ( > id int, > ts int) > using hudi > tblproperties ( > type = 'mor', > primaryKey = 'id', > preCombineField = 'ts', > hoodie.datasource.hive_sync.enable = 'true', > hoodie.datasource.hive_sync.table.strategy='ro' > ) location '/tmp/HUDI_5584' {code} > and show create table . > {code:java} > CREATE EXTERNAL TABLE `hudi_5584`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `id` int, > `ts` int) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > WITH SERDEPROPERTIES
[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5584: Description: when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to hive without suffix _ro. But sometimes the table may have been created in hive early. like: {code:java} create table hive.test.HUDI_5584 ( id int, ts int) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts', hoodie.datasource.hive_sync.enable = 'true', hoodie.datasource.hive_sync.table.strategy='ro' ) location '/tmp/HUDI_5584' {code} and show create table . {code:java} CREATE EXTERNAL TABLE `hudi_5584`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` int, `ts` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='file:///tmp/HUDI_5584') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/HUDI_5584' TBLPROPERTIES ( 'hoodie.datasource.hive_sync.enable'='true', 'hoodie.datasource.hive_sync.table.strategy'='ro', 'preCombineField'='ts', 'primaryKey'='id', 'spark.sql.create.version'='3.3.1', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='xx' 'transient_lastDdlTime'='1674108302', 'type'='mor') {code} the table like a realtime table. When we finish writing data and synchronize tables, because the table already exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. This causes the type of the table to be unexpected. > When the table to be synchronized already exists in hive, need to update > serde/table properties > --- > > Key: HUDI-5584 > URL: https://issues.apache.org/jira/browse/HUDI-5584 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > > when we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only > one table to be synchronized to hive without suffix _ro. > But sometimes the table may have been created in hive early. > like: > {code:java} > create table hive.test.HUDI_5584 ( > id int, > ts int) > using hudi > tblproperties ( > type = 'mor', > primaryKey = 'id', > preCombineField = 'ts', > hoodie.datasource.hive_sync.enable = 'true', > hoodie.datasource.hive_sync.table.strategy='ro' > ) location '/tmp/HUDI_5584' {code} > and show create table . 
> {code:java} > CREATE EXTERNAL TABLE `hudi_5584`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `id` int, > `ts` int) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > WITH SERDEPROPERTIES ( > 'path'='file:///tmp/HUDI_5584') > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'file:/tmp/HUDI_5584' > TBLPROPERTIES ( > 'hoodie.datasource.hive_sync.enable'='true', > 'hoodie.datasource.hive_sync.table.strategy'='ro', > 'preCombineField'='ts', > 'primaryKey'='id', > 'spark.sql.create.version'='3.3.1', > 'spark.sql.sources.provider'='hudi', > 'spark.sql.sources.schema.numParts'='1', > 'spark.sql.sources.schema.part.0'='xx' > 'transient_lastDdlTime'='1674108302', > 'type'='mor') {code} > the table like a realtime table. > When we finish writing data and synchronize tables, because the table already > exists, so SERDEPROPERTIES and OUTPUTFORMAT will not be modified. > This causes the type of the table to be unexpected. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
HunterXHunter created HUDI-5584: --- Summary: When the table to be synchronized already exists in hive, need to update serde/table properties Key: HUDI-5584 URL: https://issues.apache.org/jira/browse/HUDI-5584 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint
[ https://issues.apache.org/jira/browse/HUDI-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678448#comment-17678448 ] HunterXHunter commented on HUDI-5580: - [~Bone An] Can you take a look at this test? > flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint > > > Key: HUDI-5580 > URL: https://issues.apache.org/jira/browse/HUDI-5580 > Project: Apache Hudi > Issue Type: Test >Reporter: HunterXHunter >Priority: Major > > {code:java} > 2023-01-18T15:37:37.0801896Z [ERROR] > TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321 > expected: <0> but was: <1> > {code} > https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint
[ https://issues.apache.org/jira/browse/HUDI-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5580: Description: {code:java} 2023-01-18T15:37:37.0801896Z [ERROR] TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321 expected: <0> but was: <1> {code} https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342 was: {code:java} 2023-01-18T15:37:37.0801896Z [ERROR] TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321 expected: <0> but was: <1> {code} https://pipelines.actions.githubusercontent.com/serviceHosts/624e4e79-816a-4c2a-80fd-f50e8b678dc8/_apis/pipelines/1/runs/25269/signedlogcontent/16?urlExpires=2023-01-19T01%3A31%3A15.1330329Z=HMACV1=lCtYU0ZRBi2xhWmBZP9OH42DNh7KGz7Z8x79IKhURZE%3D > flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint > > > Key: HUDI-5580 > URL: https://issues.apache.org/jira/browse/HUDI-5580 > Project: Apache Hudi > Issue Type: Test >Reporter: HunterXHunter >Priority: Major > > {code:java} > 2023-01-18T15:37:37.0801896Z [ERROR] > TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321 > expected: <0> but was: <1> > {code} > https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint
HunterXHunter created HUDI-5580: --- Summary: flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint Key: HUDI-5580 URL: https://issues.apache.org/jira/browse/HUDI-5580 Project: Apache Hudi Issue Type: Test Reporter: HunterXHunter {code:java} 2023-01-18T15:37:37.0801896Z [ERROR] TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321 expected: <0> but was: <1> {code} https://pipelines.actions.githubusercontent.com/serviceHosts/624e4e79-816a-4c2a-80fd-f50e8b678dc8/_apis/pipelines/1/runs/25269/signedlogcontent/16?urlExpires=2023-01-19T01%3A31%3A15.1330329Z=HMACV1=lCtYU0ZRBi2xhWmBZP9OH42DNh7KGz7Z8x79IKhURZE%3D -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5572: --- Assignee: HunterXHunter > Flink write need to skip check the compatibility of Schema#name > --- > > Key: HUDI-5572 > URL: https://issues.apache.org/jira/browse/HUDI-5572 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Attachments: image-2023-01-18-11-51-12-914.png > > > When we use spark to initialize the hudi table, > .hoodie#hoodie.properties#hoodie.table.create.schema will carry the information > 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. > But Flink does not carry this information when writing, > so there will be incompatibilities when doing `validateSchema`. > I think we should skip checking the compatibility of Schema#name when > writing with Flink. > !image-2023-01-18-11-51-12-914.png|width=851,height=399! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5572: Description: When we use spark to initialize the hudi table, .hoodie#hoodie.properties#hoodie.table.create.schema will carry information 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. But Flink will not carry this information when writing, so there will be incompatibilities when doing `validateSchema`. Here I think we should skip check the compatibility of Schema#name when using flink write. !image-2023-01-18-11-51-12-914.png|width=851,height=399! was: When we use spark to initialize the hudi table, .hoodie#hoodie.properties#hoodie.table.create.schema will carry information 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. But Flink will not carry this information when writing, so there will be incompatibilities when doing `validateSchema`. Here I think we should skip check the compatibility of Schema#name when using flink write. > Flink write need to skip check the compatibility of Schema#name > --- > > Key: HUDI-5572 > URL: https://issues.apache.org/jira/browse/HUDI-5572 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2023-01-18-11-51-12-914.png > > > When we use spark to initialize the hudi table, > .hoodie#hoodie.properties#hoodie.table.create.schema will carry information > 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. > But Flink will not carry this information when writing, > so there will be incompatibilities when doing `validateSchema`. > Here I think we should skip check the compatibility of Schema#name when using > flink write. > !image-2023-01-18-11-51-12-914.png|width=851,height=399! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5572: Attachment: image-2023-01-18-11-51-12-914.png > Flink write need to skip check the compatibility of Schema#name > --- > > Key: HUDI-5572 > URL: https://issues.apache.org/jira/browse/HUDI-5572 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2023-01-18-11-51-12-914.png > > > When we use spark to initialize the hudi table, > .hoodie#hoodie.properties#hoodie.table.create.schema will carry information > 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. > But Flink will not carry this information when writing, > so there will be incompatibilities when doing `validateSchema`. > Here I think we should skip check the compatibility of Schema#name when using > flink write. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name
HunterXHunter created HUDI-5572: --- Summary: Flink write need to skip check the compatibility of Schema#name Key: HUDI-5572 URL: https://issues.apache.org/jira/browse/HUDI-5572 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter When we use spark to initialize the hudi table, .hoodie#hoodie.properties#hoodie.table.create.schema will carry the information 'name=$tablename_record' and 'namespace'='hoodie.$tablename'. But Flink does not carry this information when writing, so there will be incompatibilities when doing `validateSchema`. I think we should skip checking the compatibility of Schema#name when writing with Flink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
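To illustrate the incompatibility, a small Avro sketch: two structurally identical record schemas that differ only in name/namespace fail full equality, while comparing only the field lists skips the name check. The record names here are illustrative stand-ins for the Spark- and Flink-side schemas:
{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Sketch only: record names/namespaces are illustrative.
public class SchemaNameCompatSketch {
  public static void main(String[] args) {
    Schema sparkCreated = SchemaBuilder.record("t1_record").namespace("hoodie.t1")
        .fields().requiredInt("id").requiredInt("ts").endRecord();
    Schema flinkWritten = SchemaBuilder.record("record")
        .fields().requiredInt("id").requiredInt("ts").endRecord();

    System.out.println(sparkCreated.equals(flinkWritten)); // false: only name/namespace differ
    System.out.println(sparkCreated.getFields()
        .equals(flinkWritten.getFields()));                 // true: structure matches
  }
}
{code}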
[jira] [Updated] (HUDI-5554) Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5554: Summary: Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY (was: add UT TestHiveSyncTool.testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY) > Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter > HIVE_SYNC_TABLE_STRATEGY > -- > > Key: HUDI-5554 > URL: https://issues.apache.org/jira/browse/HUDI-5554 > Project: Apache Hudi > Issue Type: Test >Reporter: HunterXHunter >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5554) add UT TestHiveSyncTool.testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY
HunterXHunter created HUDI-5554: --- Summary: add UT TestHiveSyncTool.testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY Key: HUDI-5554 URL: https://issues.apache.org/jira/browse/HUDI-5554 Project: Apache Hudi Issue Type: Test Reporter: HunterXHunter -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5554) Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5554: --- Assignee: HunterXHunter > Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter > HIVE_SYNC_TABLE_STRATEGY > -- > > Key: HUDI-5554 > URL: https://issues.apache.org/jira/browse/HUDI-5554 > Project: Apache Hudi > Issue Type: Test >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter resolved HUDI-5528. - > HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY > --- > > Key: HUDI-5528 > URL: https://issues.apache.org/jira/browse/HUDI-5528 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > `HiveSyncProcedure & HiveSyncTool` also needs to add > `HIVE_SYNC_TABLE_STRATEGY` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5528: --- Assignee: HunterXHunter > HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY > --- > > Key: HUDI-5528 > URL: https://issues.apache.org/jira/browse/HUDI-5528 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > `HiveSyncProcedure & HiveSyncTool` also needs to add > `HIVE_SYNC_TABLE_STRATEGY` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
[ https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5528: Summary: HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY (was: Support optional table synchronization to hive.) > HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY > --- > > Key: HUDI-5528 > URL: https://issues.apache.org/jira/browse/HUDI-5528 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > > `HiveSyncProcedure & HiveSyncTool` also needs to add > `HIVE_SYNC_TABLE_STRATEGY` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5528) Support optional table synchronization to hive.
[ https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5528: Description: `HiveSyncProcedure & HiveSyncTool` also needs to add `HIVE_SYNC_TABLE_STRATEGY` (was: `HiveSyncProcedure & HiveSyncTool & HoodieDeltaStreamer` also needs to add `HIVE_SYNC_TABLE_STRATEGY`) > Support optional table synchronization to hive. > --- > > Key: HUDI-5528 > URL: https://issues.apache.org/jira/browse/HUDI-5528 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > > `HiveSyncProcedure & HiveSyncTool` also needs to add > `HIVE_SYNC_TABLE_STRATEGY` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5528) Support optional table synchronization to hive.
HunterXHunter created HUDI-5528: --- Summary: Support optional table synchronization to hive. Key: HUDI-5528 URL: https://issues.apache.org/jira/browse/HUDI-5528 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter `HiveSyncProcedure & HiveSyncTool & HoodieDeltaStreamer` also needs to add `HIVE_SYNC_TABLE_STRATEGY` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit
[ https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655214#comment-17655214 ] HunterXHunter commented on HUDI-5505: - [~danny0405] Do you have time to confirm this issue? > Compaction NUM_COMMITS policy should only judge completed deltacommit > - > > Key: HUDI-5505 > URL: https://issues.apache.org/jira/browse/HUDI-5505 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2023-01-05-13-10-57-918.png > > > `compaction.delta_commits =1` > > {code:java} > 20230105115229301.deltacommit > 20230105115229301.deltacommit.inflight > 20230105115229301.deltacommit.requested > 20230105115253118.commit > 20230105115253118.compaction.inflight > 20230105115253118.compaction.requested > 20230105115330994.deltacommit.inflight > 20230105115330994.deltacommit.requested{code} > The return result of `ScheduleCompactionActionExecutor.needCompact ` is > `true`, > This should not be expected. > > And In the `Occ` or `lazy clean` mode,this will cause compaction trigger > early. > `compaction.delta_commits =3` > > {code:java} > 20230105125650541.deltacommit.inflight > 20230105125650541.deltacommit.requested > 20230105125715081.deltacommit > 20230105125715081.deltacommit.inflight > 20230105125715081.deltacommit.requested > 20230105130018070.deltacommit.inflight > 20230105130018070.deltacommit.requested {code} > > And compaction will be trigger, this should not be expected. > !image-2023-01-05-13-10-57-918.png|width=699,height=158! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit
[ https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5505: Issue Type: Bug (was: Improvement) > Compaction NUM_COMMITS policy should only judge completed deltacommit > - > > Key: HUDI-5505 > URL: https://issues.apache.org/jira/browse/HUDI-5505 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2023-01-05-13-10-57-918.png > > > `compaction.delta_commits =1` > > {code:java} > 20230105115229301.deltacommit > 20230105115229301.deltacommit.inflight > 20230105115229301.deltacommit.requested > 20230105115253118.commit > 20230105115253118.compaction.inflight > 20230105115253118.compaction.requested > 20230105115330994.deltacommit.inflight > 20230105115330994.deltacommit.requested{code} > The return result of `ScheduleCompactionActionExecutor.needCompact ` is > `true`, > This should not be expected. > > And In the `Occ` or `lazy clean` mode,this will cause compaction trigger > early. > `compaction.delta_commits =3` > > {code:java} > 20230105125650541.deltacommit.inflight > 20230105125650541.deltacommit.requested > 20230105125715081.deltacommit > 20230105125715081.deltacommit.inflight > 20230105125715081.deltacommit.requested > 20230105130018070.deltacommit.inflight > 20230105130018070.deltacommit.requested {code} > > And compaction will be trigger, this should not be expected. > !image-2023-01-05-13-10-57-918.png|width=699,height=158! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit
HunterXHunter created HUDI-5505: --- Summary: Compaction NUM_COMMITS policy should only judge completed deltacommit Key: HUDI-5505 URL: https://issues.apache.org/jira/browse/HUDI-5505 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter Attachments: image-2023-01-05-13-10-57-918.png `compaction.delta_commits = 1` {code:java} 20230105115229301.deltacommit 20230105115229301.deltacommit.inflight 20230105115229301.deltacommit.requested 20230105115253118.commit 20230105115253118.compaction.inflight 20230105115253118.compaction.requested 20230105115330994.deltacommit.inflight 20230105115330994.deltacommit.requested{code} The return result of `ScheduleCompactionActionExecutor.needCompact` is `true`, which is not expected. In OCC or lazy-clean mode, this will cause compaction to trigger early. `compaction.delta_commits = 3` {code:java} 20230105125650541.deltacommit.inflight 20230105125650541.deltacommit.requested 20230105125715081.deltacommit 20230105125715081.deltacommit.inflight 20230105125715081.deltacommit.requested 20230105130018070.deltacommit.inflight 20230105130018070.deltacommit.requested {code} Compaction will be triggered, which is not expected. !image-2023-01-05-13-10-57-918.png|width=699,height=158! -- This message was sent by Atlassian Jira (v8.20.10#820010)
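A sketch of the proposed check using the 0.x timeline API: count only completed deltacommits, so inflight/requested instants (like those listed above) do not satisfy NUM_COMMITS. The base path is an assumption:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

// Sketch only: the base path is an assumption.
public class CompletedDeltaCommitCountSketch {
  public static void main(String[] args) {
    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
        .setConf(new Configuration())
        .setBasePath("/tmp/t1")
        .build();
    HoodieTimeline completedDeltaCommits = metaClient.getActiveTimeline()
        .getDeltaCommitTimeline()
        .filterCompletedInstants();
    // NUM_COMMITS should be judged against this count; inflight/requested
    // deltacommits should not trigger compaction early.
    System.out.println(completedDeltaCommits.countInstants());
  }
}
{code}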
[jira] [Resolved] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#inittable
[ https://issues.apache.org/jira/browse/HUDI-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter resolved HUDI-5416. - > Skipping the lock in HoodieFlinkWriteClient#inittable > > > Key: HUDI-5416 > URL: https://issues.apache.org/jira/browse/HUDI-5416 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > Attachments: image-2022-12-19-17-44-19-289.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#inittable
HunterXHunter created HUDI-5416: --- Summary: Skipping the lock in HoodieFlinkWriteClient#inittable Key: HUDI-5416 URL: https://issues.apache.org/jira/browse/HUDI-5416 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter Attachments: image-2022-12-19-17-44-19-289.png -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5415) Support multi-writer for flink write.
HunterXHunter created HUDI-5415: --- Summary: Support multi-writer for flink write. Key: HUDI-5415 URL: https://issues.apache.org/jira/browse/HUDI-5415 Project: Apache Hudi Issue Type: New Feature Components: flink Reporter: HunterXHunter -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#inittable
[ https://issues.apache.org/jira/browse/HUDI-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5416: Issue Type: Improvement (was: Bug) > Skipping the lock in HoodieFlinkWriteClient#inittable > > > Key: HUDI-5416 > URL: https://issues.apache.org/jira/browse/HUDI-5416 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-19-17-44-19-289.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5377) Write call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter resolved HUDI-5377. - > Write call stack information to lock file > - > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > When OCC is enabled, sometimes an exception 'Unable to acquire lock' is > thrown, and we need to know which step caused the deadlock. > For example: > { > "lockCreateTime" : 1671017890189, > "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) > \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo > (FileSystemBasedLockProvider.java:212) \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:172) \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:116) \n", "\t > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:108) \n", "\t > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) \n", "\t > org.apache.hudi.client.BaseHoodieWriteClient.clean > (BaseHoodieWriteClient.java:891) \n", "\t > org.apache.hudi.client.BaseHoodieWriteClient.clean > (BaseHoodieWriteClient.java:858) \n", "\t > org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", > "\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) \n", "\t > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) \n", "\t > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) \n", "\t java.lang.Thread.run (Thread.java:750) > \n" ], > "lockThreadName" : "pool-8-thread-1" > } -- This message was sent by Atlassian Jira (v8.20.10#820010)
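A sketch of how the holder's call stack can be captured when the lock file is written, mirroring the `lockStackInfo` and `lockThreadName` fields shown above (the formatting is illustrative, not the actual Hudi code):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch only: mirrors the JSON fields above, not Hudi's implementation.
public class LockStackInfoSketch {
  public static void main(String[] args) {
    List<String> lockStackInfo = new ArrayList<>();
    for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
      lockStackInfo.add("\t " + e.getClassName() + "." + e.getMethodName()
          + " (" + e.getFileName() + ":" + e.getLineNumber() + ") \n");
    }
    System.out.println("lockCreateTime: " + System.currentTimeMillis());
    System.out.println("lockThreadName: " + Thread.currentThread().getName());
    lockStackInfo.forEach(System.out::print);
  }
}
{code}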
[jira] [Updated] (HUDI-5386) Cleaning conflicts in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Summary: Cleaning conflicts in occ mode (was: Rollback conflict in occ mode) > Cleaning conflicts in occ mode > -- > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. > !image-2022-12-14-11-26-37-252.png! > !image-2022-12-14-11-26-21-995.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5391) Modify the default value of parameter `hoodie.write.lock.client`
[ https://issues.apache.org/jira/browse/HUDI-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5391: --- Assignee: HunterXHunter > Modify the default value of parameter `hoodie.write.lock.client` > - > > Key: HUDI-5391 > URL: https://issues.apache.org/jira/browse/HUDI-5391 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > In OCC mode, many steps trigger the lock, which leads to frequent locking > and unlocking; however, the execution time of the locked operations is short. > So the default value of > `hoodie.write.lock.client.wait_time_ms_between_retry` should be reduced > from 10s to 2s to avoid unnecessary waiting, > and the default value of `hoodie.write.lock.client.num_retries` can be > increased to 50. > These adjustments have shown clear positive effects in actual use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5391) Modify the default value of parameter `hoodie.write.lock.client`
HunterXHunter created HUDI-5391: --- Summary: Modify the default value of parameter `hoodie.write.lock.client` Key: HUDI-5391 URL: https://issues.apache.org/jira/browse/HUDI-5391 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter In OCC mode, many steps trigger the lock, which leads to frequent locking and unlocking; however, the execution time of the locked operations is short. So the default value of `hoodie.write.lock.client.wait_time_ms_between_retry` should be reduced from 10s to 2s to avoid unnecessary waiting, and the default value of `hoodie.write.lock.client.num_retries` can be increased to 50. These adjustments have shown clear positive effects in actual use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
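The proposed defaults from this issue, expressed as writer properties (the surrounding setup is illustrative; the keys and values come from the description above):
{code:java}
import java.util.Properties;

// Sketch only: shows the values proposed in this issue.
public class LockRetryDefaultsSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("hoodie.write.lock.client.wait_time_ms_between_retry", "2000"); // was 10000
    props.setProperty("hoodie.write.lock.client.num_retries", "50");
    props.forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
{code}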
[jira] [Updated] (HUDI-5377) Write call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5377: Description: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. like : { "lockCreateTime" : 1671017890189, "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) \n", "\t org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo (FileSystemBasedLockProvider.java:212) \n", "\t org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:172) \n", "\t org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:116) \n", "\t org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:108) \n", "\t org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58) \n", "\t org.apache.hudi.client.BaseHoodieWriteClient.clean (BaseHoodieWriteClient.java:891) \n", "\t org.apache.hudi.client.BaseHoodieWriteClient.clean (BaseHoodieWriteClient.java:858) \n", "\t org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", "\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130) \n", "\t java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) \n", "\t java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) \n", "\t java.lang.Thread.run (Thread.java:750) \n" ], "lockThreadName" : "pool-8-thread-1" } was: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. like : LOCK-TIME : 2022-12-13 11:13:15.015 LOCK-STACK-INFO : org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148) org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100) org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102) org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58) org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425) org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037) org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72) org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250) org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130) java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) java.lang.Thread.run (Thread.java:750) > Write call stack information to lock file > - > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. 
> like : > { > "lockCreateTime" : 1671017890189, > "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) > \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo > (FileSystemBasedLockProvider.java:212) \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:172) \n", "\t > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:116) \n", "\t > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:108) \n", "\t > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) \n", "\t > org.apache.hudi.client.BaseHoodieWriteClient.clean > (BaseHoodieWriteClient.java:891) \n", "\t > org.apache.hudi.client.BaseHoodieWriteClient.clean > (BaseHoodieWriteClient.java:858) \n", "\t > org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", > "\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) \n", "\t > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) \n", "\t > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) \n",
[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Attachment: image-2022-12-14-11-26-37-252.png > Rollback conflict in occ mode > - > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Attachment: (was: 1670986960525.jpg) > Rollback conflict in occ mode > - > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. > !image-2022-12-14-11-26-37-252.png! > !image-2022-12-14-11-26-21-995.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Attachment: image-2022-12-14-11-26-21-995.png > Rollback conflict in occ mode > - > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Attachment: (was: WechatIMG70.jpeg) > Rollback conflict in occ mode > - > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. > !image-2022-12-14-11-26-37-252.png! > !image-2022-12-14-11-26-21-995.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Description: {code:java} configuration parameter: 'hoodie.cleaner.policy.failed.writes' = 'LAZY' 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} Because `getInstantsToRollback` is not locked, multiple writes get the same `instantsToRollback`, the same `instant` will be deleted multiple times and the same `rollback.inflight` will be created multiple times. !image-2022-12-14-11-26-37-252.png! !image-2022-12-14-11-26-21-995.png! was: {code:java} configuration parameter: 'hoodie.cleaner.policy.failed.writes' = 'LAZY' 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} Because `getInstantsToRollback` is not locked, multiple writes get the same `instantsToRollback`, the same `instant` will be deleted multiple times and the same `rollback.inflight` will be created multiple times. > Rollback conflict in occ mode > - > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writes get the same > `instantsToRollback`, the same `instant` will be deleted multiple times and > the same `rollback.inflight` will be created multiple times. > !image-2022-12-14-11-26-37-252.png! > !image-2022-12-14-11-26-21-995.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5386) Rollback conflict in occ mode
HunterXHunter created HUDI-5386: --- Summary: Rollback conflict in occ mode Key: HUDI-5386 URL: https://issues.apache.org/jira/browse/HUDI-5386 Project: Apache Hudi Issue Type: Bug Reporter: HunterXHunter Attachments: 1670986960525.jpg, WechatIMG70.jpeg {code:java} configuration parameter: 'hoodie.cleaner.policy.failed.writes' = 'LAZY' 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} Because `getInstantsToRollback` is not locked, multiple writers get the same `instantsToRollback`, so the same `instant` will be deleted multiple times and the same `rollback.inflight` will be created multiple times. -- This message was sent by Atlassian Jira (v8.20.10#820010)
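The race above can be sketched as follows. This is a minimal illustration of the idea behind the fix, not the actual Hudi patch: list the instants to roll back only while holding the table lock, so two concurrent writers can never observe and roll back the same failed instant. `TimelineStub` is a hypothetical stand-in for the timeline operations named in the issue, and a `ReentrantLock` stands in for the configured lock provider.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: take the lock *before* computing instantsToRollback, so the
// list-and-rollback step is atomic with respect to other writers.
class RollbackUnderLockSketch {
  // Stands in for the configured lock provider (e.g. a filesystem lock).
  private final ReentrantLock tableLock = new ReentrantLock();

  void rollbackFailedWrites(TimelineStub timeline) {
    tableLock.lock(); // begin transaction
    try {
      List<String> instantsToRollback = timeline.getInstantsToRollback();
      for (String instant : instantsToRollback) {
        // Each failed instant is now deleted exactly once, and its
        // rollback.inflight is created exactly once.
        timeline.rollback(instant);
      }
    } finally {
      tableLock.unlock(); // end transaction
    }
  }

  // Hypothetical stand-in for the timeline operations referenced above.
  interface TimelineStub {
    List<String> getInstantsToRollback();
    void rollback(String instant);
  }
}
{code}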
[jira] [Updated] (HUDI-5377) Write call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5377: Summary: Write call stack information to lock file (was: Add call stack information to lock file) > Write call stack information to lock file > - > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. > like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5377) Add call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5377: --- Assignee: HunterXHunter > Add call stack information to lock file > --- > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. > like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5377) Add call stack information to lock file
[ https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5377: Description: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. like : LOCK-TIME : 2022-12-13 11:13:15.015 LOCK-STACK-INFO : org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock (FileSystemBasedLockProvider.java:148) org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock (FileSystemBasedLockProvider.java:100) org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:102) org.apache.hudi.client.transaction.TransactionManager.beginTransaction (TransactionManager.java:58) org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService (BaseHoodieWriteClient.java:1425) org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant (BaseHoodieWriteClient.java:1037) org.apache.hudi.util.CompactionUtil.scheduleCompaction (CompactionUtil.java:72) org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 (StreamWriteOperatorCoordinator.java:250) org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 (NonThrownExecutor.java:130) java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) java.lang.Thread.run (Thread.java:750) was: When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire lock', We need to know which step caused the deadlock. > Add call stack information to lock file > --- > > Key: HUDI-5377 > URL: https://issues.apache.org/jira/browse/HUDI-5377 > Project: Apache Hudi > Issue Type: Improvement >Reporter: HunterXHunter >Priority: Major > > When Occ is enabled, Sometimes an exception is thrown 'Unable to acquire > lock', > We need to know which step caused the deadlock. > like : > > LOCK-TIME : 2022-12-13 11:13:15.015 > LOCK-STACK-INFO : > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock > (FileSystemBasedLockProvider.java:148) > > org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock > (FileSystemBasedLockProvider.java:100) > org.apache.hudi.client.transaction.lock.LockManager.lock > (LockManager.java:102) > org.apache.hudi.client.transaction.TransactionManager.beginTransaction > (TransactionManager.java:58) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService > (BaseHoodieWriteClient.java:1425) > org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant > (BaseHoodieWriteClient.java:1037) > org.apache.hudi.util.CompactionUtil.scheduleCompaction > (CompactionUtil.java:72) > > org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2 > (StreamWriteOperatorCoordinator.java:250) > org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 > (NonThrownExecutor.java:130) > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > java.lang.Thread.run (Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5377) Add call stack information to lock file
HunterXHunter created HUDI-5377: --- Summary: Add call stack information to lock file Key: HUDI-5377 URL: https://issues.apache.org/jira/browse/HUDI-5377 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter When OCC is enabled, an exception 'Unable to acquire lock' is sometimes thrown, and we need to know which step caused the deadlock. -- This message was sent by Atlassian Jira (v8.20.10#820010)
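A minimal sketch of what writing the acquirer's context into the lock file could look like, assuming a filesystem-based lock file. The LOCK-TIME / LOCK-STACK-INFO labels mirror the example in the issue; the class and method names here are illustrative, not the actual Hudi implementation.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;

class LockFileInfoSketch {
  // Called right after the lock is acquired: records when and from where it
  // was taken, so a later 'Unable to acquire lock' can be traced to the holder.
  static void writeLockInfo(Path lockFile) throws IOException {
    StringBuilder sb = new StringBuilder();
    sb.append("LOCK-TIME : ").append(LocalDateTime.now()).append('\n');
    sb.append("LOCK-STACK-INFO : \n");
    for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
      sb.append(frame.getClassName()).append('.').append(frame.getMethodName())
        .append(" (").append(frame.getFileName()).append(':')
        .append(frame.getLineNumber()).append(")\n");
    }
    Files.write(lockFile, sb.toString().getBytes(StandardCharsets.UTF_8));
  }
}
{code}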
[jira] [Updated] (HUDI-4961) Support optional table synchronization to hive.
[ https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4961: Description: By default, both the RT and RO tables are synchronized, named with the suffixes _rt and _ro, but sometimes the user only needs a single RO or RT table with no suffix in its name. An optional parameter is added to allow the user to synchronize only one table, with no suffix added to the table name. Add new parameter: {{hive_sync.table.strategy}} Available options: RO, RT, ALL {{hoodie.datasource.hive_sync.table.strategy}} Available options: RO, RT, ALL was: By default, both the RT and RO tables are synchronized, named with the suffixes _rt and _ro, but sometimes the user only needs a single RO or RT table with no suffix in its name. An optional parameter is added to allow the user to synchronize only one table, with no suffix added to the table name. > Support optional table synchronization to hive. > --- > > Key: HUDI-4961 > URL: https://issues.apache.org/jira/browse/HUDI-4961 > Project: Apache Hudi > Issue Type: Improvement > Components: hive >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > By default, both the RT and RO tables are synchronized, named with the > suffixes _rt and _ro, but sometimes the user only needs a single RO or RT > table with no suffix in its name. An optional parameter is added to allow > the user to synchronize only one table, with no suffix added to the table > name. > Add new parameter: > {{hive_sync.table.strategy}} Available options: RO, RT, ALL > {{hoodie.datasource.hive_sync.table.strategy}} Available options: RO, RT, > ALL -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4961) Support optional table synchronization to hive.
[ https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4961: --- Assignee: HunterXHunter > Support optional table synchronization to hive. > --- > > Key: HUDI-4961 > URL: https://issues.apache.org/jira/browse/HUDI-4961 > Project: Apache Hudi > Issue Type: Improvement > Components: hive >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > Labels: pull-request-available > > By default, both the RT and RO tables are synchronized, named with the > suffixes _rt and _ro, but sometimes the user only needs a single RO or RT > table with no suffix in its name. An optional parameter is added to allow > the user to synchronize only one table, with no suffix added to the table > name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4945) Add a test case for batch clean.
[ https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4945: Description: (was: h1. Add a test case for batch clean.) > Add a test case for batch clean. > > > Key: HUDI-4945 > URL: https://issues.apache.org/jira/browse/HUDI-4945 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4945) Add a test case for batch clean.
[ https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4945: Description: h1. Add a test case for batch clean. (was: Support to trigger the clean in the flink batch mode.) > Add a test case for batch clean. > > > Key: HUDI-4945 > URL: https://issues.apache.org/jira/browse/HUDI-4945 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > > h1. Add a test case for batch clean. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4945) Add a test case for batch clean.
[ https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4945: Summary: Add a test case for batch clean. (was: Support to trigger the clean in the flink batch mode.) > Add a test case for batch clean. > > > Key: HUDI-4945 > URL: https://issues.apache.org/jira/browse/HUDI-4945 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > > Support to trigger the clean in the flink batch mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4961) Support optional table synchronization to hive.
[ https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4961: Component/s: hive > Support optional table synchronization to hive. > --- > > Key: HUDI-4961 > URL: https://issues.apache.org/jira/browse/HUDI-4961 > Project: Apache Hudi > Issue Type: Improvement > Components: hive >Reporter: HunterXHunter >Priority: Major > > By default, both the RT and RO tables are synchronized, named with the > suffixes _rt and _ro, but sometimes the user only needs a single RO or RT > table with no suffix in its name. An optional parameter is added to allow > the user to synchronize only one table, with no suffix added to the table > name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4945) Support to trigger the clean in the flink batch mode.
[ https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4945: Component/s: flink > Support to trigger the clean in the flink batch mode. > - > > Key: HUDI-4945 > URL: https://issues.apache.org/jira/browse/HUDI-4945 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > > Support to trigger the clean in the flink batch mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4961) Support optional table synchronization to hive.
HunterXHunter created HUDI-4961: --- Summary: Support optional table synchronization to hive. Key: HUDI-4961 URL: https://issues.apache.org/jira/browse/HUDI-4961 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter By default, both the RT and RO tables are synchronized, named with the suffixes _rt and _ro, but sometimes the user only needs a single RO or RT table with no suffix in its name. An optional parameter is added to allow the user to synchronize only one table, with no suffix added to the table name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
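A minimal usage sketch of the proposed option. The property key and values come from this issue's description and are not a released API, so treat them as assumptions rather than documented configuration.

{code:java}
import java.util.Properties;

class HiveSyncStrategySketch {
  public static void main(String[] args) {
    Properties hiveSyncProps = new Properties();
    // Per the issue: RO syncs only the read-optimized table, RT only the
    // real-time table (neither gets a suffix), ALL keeps today's behaviour.
    hiveSyncProps.setProperty("hoodie.datasource.hive_sync.table.strategy", "RO");
    System.out.println(hiveSyncProps);
  }
}
{code}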
[jira] [Created] (HUDI-4945) Support to trigger the clean in the flink batch mode.
HunterXHunter created HUDI-4945: --- Summary: Support to trigger the clean in the flink batch mode. Key: HUDI-4945 URL: https://issues.apache.org/jira/browse/HUDI-4945 Project: Apache Hudi Issue Type: Improvement Reporter: HunterXHunter Support to trigger the clean in the flink batch mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-4405) Support to trigger the clean in the flink batch mode.
[ https://issues.apache.org/jira/browse/HUDI-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter resolved HUDI-4405. - > Support to trigger the clean in the flink batch mode. > - > > Key: HUDI-4405 > URL: https://issues.apache.org/jira/browse/HUDI-4405 > Project: Apache Hudi > Issue Type: New Feature > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > Support to trigger the clean in the flink batch mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4405) Support to trigger the clean in the flink batch mode.
[ https://issues.apache.org/jira/browse/HUDI-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4405: --- Assignee: HunterXHunter > Support to trigger the clean in the flink batch mode. > - > > Key: HUDI-4405 > URL: https://issues.apache.org/jira/browse/HUDI-4405 > Project: Apache Hudi > Issue Type: New Feature > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > Support to trigger the clean in the flink batch mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4746) Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
[ https://issues.apache.org/jira/browse/HUDI-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4746: --- Assignee: (was: HunterXHunter) > Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction > > > Key: HUDI-4746 > URL: https://issues.apache.org/jira/browse/HUDI-4746 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction > > [aug 25: > https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44] > aug 25: > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44] > > > {code:java} > 2022-08-25T10:48:57.4158416Z [ERROR] > testWriteMergeOnReadWithCompaction{String}[2] Time elapsed: 22.789 s <<< > FAILURE! > 2022-08-25T10:48:57.4159313Z org.opentest4j.AssertionFailedError: expected: > but was: > 2022-08-25T10:48:57.4160369Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) > 2022-08-25T10:48:57.4161127Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) > 2022-08-25T10:48:57.4161883Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) > 2022-08-25T10:48:57.4166292Z > 2022-08-25T10:48:58.0221317Z [INFO] > 2022-08-25T10:48:58.0222033Z [INFO] Results: > 2022-08-25T10:48:58.0228955Z [INFO] > 2022-08-25T10:48:58.0229555Z [ERROR] Failures: > 2022-08-25T10:48:58.0231472Z [ERROR] > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 > expected: but was: > 2022-08-25T10:48:58.0232489Z [INFO] > 2022-08-25T10:48:58.0233058Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, > Skipped: 0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4743) Flaky: ITTestHoodieDataSource crashes
[ https://issues.apache.org/jira/browse/HUDI-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598352#comment-17598352 ] HunterXHunter commented on HUDI-4743: - Can we use `{{{}-Xmx2024m -XX:MaxPermSize=256m` instead of `@\{argLine}` to solve this problem?{}}} > Flaky: ITTestHoodieDataSource crashes > - > > Key: HUDI-4743 > URL: https://issues.apache.org/jira/browse/HUDI-4743 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > > ITTestHoodieDataSource crashed > > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/11033/logs/39] > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10994/logs/31] > > {code:java} > 2022-08-30T06:18:11.2568236Z [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-failsafe-plugin:2.22.2:verify > (verify-integration-test) on project hudi-flink: There are test > failures.2022-08-30T06:18:11.2571112Z [ERROR] 2022-08-30T06:18:11.2573983Z > [ERROR] Please refer to > /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink/target/failsafe-reports > for the individual test results.2022-08-30T06:18:11.2577098Z [ERROR] Please > refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and > [date].dumpstream.2022-08-30T06:18:11.2579886Z [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM > terminated without properly saying goodbye. VM crash or System.exit > called?2022-08-30T06:18:11.2584190Z [ERROR] Command was /bin/sh -c cd > /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink && > /usr/lib/jvm/temurin-8-jdk-amd64/jre/bin/java -Xmx2g > org.apache.maven.surefire.booter.ForkedBooter > /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink/target/surefire > 2022-08-30T05-30-42_232-jvmRun1 surefire724291575167156tmp > surefire_23336829373297076850tmp2022-08-30T06:18:11.2588719Z [ERROR] Error > occurred in starting fork, check output in log2022-08-30T06:18:11.2593048Z > [ERROR] Process Exit Code: 2392022-08-30T06:18:11.2596938Z [ERROR] Crashed > tests:2022-08-30T06:18:11.2600707Z [ERROR] > org.apache.hudi.table.ITTestHoodieDataSource2022-08-30T06:18:11.2604657Z > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)2022-08-30T06:18:11.2608953Z > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:282)2022-08-30T06:18:11.2612284Z > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)2022-08-30T06:18:11.2612983Z > [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)2022-08-30T06:18:11.2613739Z > [ERROR]at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)2022-08-30T06:18:11.2614505Z > [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)2022-08-30T06:18:11.2615248Z > [ERROR] at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)2022-08-30T06:18:11.2615951Z > [ERROR]at > org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:370)2022-08-30T06:18:11.2616777Z > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:351)2022-08-30T06:18:11.2617439Z > [ERROR]at > 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:215)2022-08-30T06:18:11.2618097Z > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:171)2022-08-30T06:18:11.2618744Z > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:163)2022-08-30T06:18:11.2619458Z > [ERROR] at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)2022-08-30T06:18:11.2620222Z > [ERROR] at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)2022-08-30T06:18:11.2624164Z > [ERROR] at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)2022-08-30T06:18:11.2624944Z > [ERROR]at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)2022-08-30T06:18:11.2625581Z > [ERROR] at > org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:294)2022-08-30T06:18:11.2626157Z > [ERROR] at > org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)2022-08-30T06:18:11.2626724Z > [ERROR] at >
[jira] [Assigned] (HUDI-4746) Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
[ https://issues.apache.org/jira/browse/HUDI-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4746: --- Assignee: HunterXHunter > Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction > > > Key: HUDI-4746 > URL: https://issues.apache.org/jira/browse/HUDI-4746 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: HunterXHunter >Priority: Major > > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction > > [aug 25: > https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44] > aug 25: > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44] > > > {code:java} > 2022-08-25T10:48:57.4158416Z [ERROR] > testWriteMergeOnReadWithCompaction{String}[2] Time elapsed: 22.789 s <<< > FAILURE! > 2022-08-25T10:48:57.4159313Z org.opentest4j.AssertionFailedError: expected: > but was: > 2022-08-25T10:48:57.4160369Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) > 2022-08-25T10:48:57.4161127Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) > 2022-08-25T10:48:57.4161883Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) > 2022-08-25T10:48:57.4166292Z > 2022-08-25T10:48:58.0221317Z [INFO] > 2022-08-25T10:48:58.0222033Z [INFO] Results: > 2022-08-25T10:48:58.0228955Z [INFO] > 2022-08-25T10:48:58.0229555Z [ERROR] Failures: > 2022-08-25T10:48:58.0231472Z [ERROR] > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 > expected: but was: > 2022-08-25T10:48:58.0232489Z [INFO] > 2022-08-25T10:48:58.0233058Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, > Skipped: 0 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4745) Fix flaky: ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering
[ https://issues.apache.org/jira/browse/HUDI-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4745: --- Assignee: HunterXHunter > Fix flaky: ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering > --- > > Key: HUDI-4745 > URL: https://issues.apache.org/jira/browse/HUDI-4745 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: HunterXHunter >Priority: Major > > ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering > > aug 30: > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/11043/logs/40] > [aug 25: > https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44] > > > {code:java} > 2022-08-30T14:09:34.2164385Z [INFO] Running > org.apache.hudi.sink.ITTestDataStreamWrite2022-08-30T14:11:55.7830524Z > [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 141.51 s <<< FAILURE! - in > org.apache.hudi.sink.ITTestDataStreamWrite2022-08-30T14:11:55.7832415Z > [ERROR] testWriteCopyOnWriteWithClustering Time elapsed: 18.72 s <<< > FAILURE!2022-08-30T14:11:55.7843136Z org.opentest4j.AssertionFailedError: > expected: but was: 2022-08-30T14:11:55.7844163Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodieWithCluster(ITTestDataStreamWrite.java:298)2022-08-30T14:11:55.7845258Z > at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering(ITTestDataStreamWrite.java:166)2022-08-30T14:11:55.7845819Z > 2022-08-30T14:11:56.4989181Z [INFO] 2022-08-30T14:11:56.4990015Z [INFO] > Results:2022-08-30T14:11:56.4990891Z [INFO] 2022-08-30T14:11:56.4991209Z > [ERROR] Failures: 2022-08-30T14:11:56.4992974Z [ERROR] > ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering:166->testWriteToHoodieWithCluster:298 > expected: but was: 2022-08-30T14:11:56.5051270Z [INFO] > 2022-08-30T14:11:56.5052102Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, > Skipped: 02022-08-30T14:11:56.5052705Z [INFO] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: How to reproduct. {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. was: How to reproduct. {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. > Incremental input splits result is not as expected when flink incremental > read. 
> --- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > How to reproduct. > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1',
[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: How to reproduct. {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. was: When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. > Incremental input splits result is not as expected when flink incremental > read. 
> --- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > How to reproduct. > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) >
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. was: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) >
[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Summary: Incremental input splits result is not as expected when flink incremental read. (was: When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.) > Incremental input splits result is not as expected when flink incremental > read. > --- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, > par1]) > exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, > par1]) > exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, > par1]) > -- but > 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should > be like exp3. > 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should > be (true,+I[id1, t3, par1]). > 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) > this is right{code} > The root of the problem is `IncrementalInputSplits.inputSplits`, because > `startCommit` is out of range, `fullTableScan` is `true`, finally, the file > read is t6..parquet instead of t3.parquet. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Issue Type: Bug (was: Improvement) > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, > par1]) > exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, > par1]) > exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, > par1]) > -- but > 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should > be like exp3. > 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should > be (true,+I[id1, t3, par1]). > 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) > this is right{code} > The root of the problem is `IncrementalInputSplits.inputSplits`, because > `startCommit` is out of range, `fullTableScan` is `true`, finally, the file > read is t6..parquet instead of t3.parquet. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should be (true,+I[id1, t3, par1]). 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) this is right{code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. was: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. {code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, > par1]) > exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, > par1]) > exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, > par1]) > -- but > 'read.start-commit'='0',
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. {code} The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet. was: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, > par1]) > exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, > par1]) > exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, > par1]) > -- but > 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should > be like exp3. > {code} > The root of the problem is `IncrementalInputSplits.inputSplits`, because > `startCommit` is out of range,
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Major > > > {code:java} > -- create > CREATE TABLE hudi_4726( > id string, > msg string, > `partition` STRING, > PRIMARY KEY(id) NOT ENFORCED > )PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'write.operation'='upsert', > 'path' = 'hudi_4726', > 'index.type' = 'BUCKET', > 'hoodie.bucket.index.num.buckets' = '2', > 'compaction.delta_commits' = '2', > 'table.type' = 'MERGE_ON_READ', > 'compaction.async.enabled'='true') > -- insert > INSERT INTO hudi_4726 values ('id1','t1','par1') > INSERT INTO hudi_4726 values ('id1','t2','par1') > INSERT INTO hudi_4726 values ('id1','t3','par1') > INSERT INTO hudi_4726 values ('id1','t4','par1') > -- .hoodie > t1.deltacommit (t1) > t2.deltacommit (t2) > t3.commit (t2) > t4.deltacommit (t3) > t5.deltacommit (t4) > t6.commit (t4) > t3.parquet > t6.parquet > -- read > exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, > par1]) > exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, > par1]) > exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, > par1]) > -- but > 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should > be like exp3. > -- > The root of the problem is `IncrementalInputSplits.inputSplits`, because > `startCommit` is out of range, `fullTableScan` is `true`, finally, the file > read is t6..parquet instead of t3.parquet.{code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
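The behaviour this issue asks for can be sketched as a simple range filter over the active timeline. This is an illustration with hypothetical names, not the real `IncrementalInputSplits` code: the point is that an out-of-range `read.start-commit` should still bound the scan by `read.end-commit`, instead of falling back to a full table scan of the latest file slices (which is what made the end-commit bound appear to be ignored above).

{code:java}
import java.util.ArrayList;
import java.util.List;

class IncrementalRangeSketch {
  // Instant times are lexicographically ordered timestamps, so plain string
  // comparison gives timeline order.
  static List<String> instantsInRange(List<String> activeInstants,
                                      String startCommit, String endCommit) {
    List<String> result = new ArrayList<>();
    for (String instant : activeInstants) {
      boolean afterStart = startCommit == null || instant.compareTo(startCommit) >= 0;
      boolean beforeEnd = endCommit == null || instant.compareTo(endCommit) <= 0;
      if (afterStart && beforeEnd) {
        result.add(instant);
      }
    }
    // A startCommit earlier than every active instant (e.g. '0') simply keeps
    // all instants up to endCommit; it must NOT trigger a full table scan.
    return result;
  }
}
{code}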
[jira] [Assigned] (HUDI-4600) Hive synchronization failure: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HUDI-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4600:
---

    Assignee: HunterXHunter

> Hive synchronization failure: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> --
>
>                 Key: HUDI-4600
>                 URL: https://issues.apache.org/jira/browse/HUDI-4600
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hive
>            Reporter: HunterXHunter
>            Assignee: HunterXHunter
>            Priority: Blocker
>
> {code:java}
> 10:32:28.039 [pool-9-thread-1] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
>     at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
>     at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>     at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>     at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521)
>     at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550)
>     at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405)
>     at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>     at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>     at
> {code}
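The "Retrying HMSHandler after 2000 ms (attempt 1 of 10)" line at the top of the trace shows that the metastore handler is re-invoked with a fixed delay before the `JDOFatalInternalException` finally surfaces to Hudi's Hive sync. A minimal sketch of that retry pattern, with hypothetical names (this is not Hive's actual `RetryingHMSHandler`, which works through a reflective proxy):

{code:java}
// Sketch of the retry-with-fixed-delay pattern visible in the log above.
// Names are hypothetical and simplified for illustration.
public class RetryingInvoker {

    interface Call<T> {
        T run() throws Exception;
    }

    static <T> T invokeWithRetries(Call<T> call, int maxAttempts, long delayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (Exception e) {
                last = e; // e.g. JDOFatalInternalException wrapping the real cause
                System.err.printf("Retrying after %d ms (attempt %d of %d) with error: %s%n",
                        delayMs, attempt, maxAttempts, e);
                Thread.sleep(delayMs);
            }
        }
        throw last; // all attempts exhausted: the caller sees the final failure
    }

    public static void main(String[] args) throws Exception {
        // Simulate a handler whose initialization always fails, as in the trace:
        // 10 attempts, 2000 ms apart, then the failure propagates.
        invokeWithRetries(() -> {
            throw new IllegalStateException("Unexpected exception caught.");
        }, 10, 2000);
    }
}
{code}

Because the underlying JDO failure is deterministic (often a classpath or configuration problem in the metastore's persistence layer), all 10 attempts fail identically, which is why the sync ultimately reports "Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient".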