[jira] [Created] (HUDI-7018) Test case for spark catalog refresh table

2023-10-31 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-7018:
---

 Summary: Test case for spark catalog refresh table
 Key: HUDI-7018
 URL: https://issues.apache.org/jira/browse/HUDI-7018
 Project: Apache Hudi
  Issue Type: Test
Reporter: HunterXHunter






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6032) Fix reading metafield '_hoodie_commit_time' multiple times from the parquet file when using Flink

2023-04-05 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-6032:

Summary: Fix reading metafield '_hoodie_commit_time' multiple times from the 
parquet file when using Flink  (was: Fix multiple reads of metafield 
'_hoodie_commit_time' when using Flink)

> Fix reading metafield '_hoodie_commit_time' multiple times from the parquet 
> file when using Flink
> ---
>
> Key: HUDI-6032
> URL: https://issues.apache.org/jira/browse/HUDI-6032
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> Flink can't read metafield '_hoodie_commit_time' from the parquet file.
> [https://github.com/apache/hudi/issues/8371]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-6032) Fix reading metafield '_hoodie_commit_time' multiple times from the parquet file when using Flink

2023-04-05 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-6032:
---

Assignee: HunterXHunter

> Fix reading metafield '_hoodie_commit_time' multiple times from the parquet 
> file when using Flink
> ---
>
> Key: HUDI-6032
> URL: https://issues.apache.org/jira/browse/HUDI-6032
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> Flink can't read metafield '_hoodie_commit_time' from the parquet file.
> [https://github.com/apache/hudi/issues/8371]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6032) Fix multiple reads of metafield '_hoodie_commit_time' when using Flink.

2023-04-05 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-6032:

Summary: Fix multiple reads of metafield '_hoodie_commit_time' when using Flink.  
(was: Flink can't read metafield '_hoodie_commit_time' from parquet file)

> Fix multiple reads of metafield '_hoodie_commit_time' when using Flink.
> -
>
> Key: HUDI-6032
> URL: https://issues.apache.org/jira/browse/HUDI-6032
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> https://github.com/apache/hudi/issues/8371



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6032) Fix multiple reads of metafield '_hoodie_commit_time' when using Flink.

2023-04-05 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-6032:

Description: 
Flink can't read metafield '_hoodie_commit_time' from the parquet file.

[https://github.com/apache/hudi/issues/8371]

  was:https://github.com/apache/hudi/issues/8371


> Fix multiple reads of metafield '_hoodie_commit_time' when using Flink.
> -
>
> Key: HUDI-6032
> URL: https://issues.apache.org/jira/browse/HUDI-6032
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> Flink can't read metafield '_hoodie_commit_time' from the parquet file.
> [https://github.com/apache/hudi/issues/8371]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6032) Flink can't read metafield '_hoodie_commit_time' from parquet file

2023-04-04 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-6032:
---

 Summary: Flink can't read metafield '_hoodie_commit_time' from 
parquet file
 Key: HUDI-6032
 URL: https://issues.apache.org/jira/browse/HUDI-6032
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter


https://github.com/apache/hudi/issues/8371



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5996) We should verify the consistency of bucket num at job startup.

2023-03-29 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5996:
---

 Summary: We should verify the consistency of bucket num at job 
startup.
 Key: HUDI-5996
 URL: https://issues.apache.org/jira/browse/HUDI-5996
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


Users may sometimes modify the bucket num, and an inconsistent bucket num will 
lead to data duplication and make the table unavailable. There may be other 
parameters that should also be checked before the job starts.
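
A rough sketch of such a startup check (illustrative only; whether the bucket 
num is persisted in the table config under this key, and the helper class 
itself, are assumptions rather than the actual Hudi code):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.common.table.HoodieTableConfig;
import org.apache.hudi.exception.HoodieValidationException;

public final class BucketNumValidator {
  // Hypothetical check: fail fast if the job's bucket num differs from the one
  // the table was originally written with.
  public static void validateBucketNum(Configuration conf, HoodieTableConfig tableConfig) {
    int configured = conf.getInteger("hoodie.bucket.index.num.buckets", -1);
    String persisted = tableConfig.getString("hoodie.bucket.index.num.buckets");
    if (configured > 0 && persisted != null && Integer.parseInt(persisted) != configured) {
      throw new HoodieValidationException(
          "Bucket num mismatch: job configured with " + configured
              + " but the table was written with " + persisted);
    }
  }
}
{code}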



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties

2023-01-20 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5584:
---

Assignee: HunterXHunter

> When the table to be synchronized already exists in hive, need to update 
> serde/table properties
> ---
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only 
> one table to be synchronized to Hive, without the suffix _ro.
> But sometimes the table has already been created in Hive,
> like:
> {code:java}
> create table hive.test.HUDI_5584 (
>   id int,
>  ts int)
>  using hudi
>  tblproperties (
>   type = 'mor',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.datasource.hive_sync.enable = 'true',
> hoodie.datasource.hive_sync.table.strategy='ro'
> ) location '/tmp/HUDI_5584'  {code}
> and `show create table` shows:
> {code:java}
> CREATE EXTERNAL TABLE `hudi_5584`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `id` int,
>   `ts` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'path'='file:///tmp/HUDI_5584')
> STORED AS INPUTFORMAT
>   'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'file:/tmp/HUDI_5584'
> TBLPROPERTIES (
>   'hoodie.datasource.hive_sync.enable'='true',
>   'hoodie.datasource.hive_sync.table.strategy'='ro',
>   'preCombineField'='ts',
>   'primaryKey'='id',
>   'spark.sql.create.version'='3.3.1',
>   'spark.sql.sources.provider'='hudi',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='xx'
>   'transient_lastDdlTime'='1674108302',
>   'type'='mor') {code}
> *The table looks like a realtime table.*
>  
> When we finish writing data and synchronize the ro table, because the table 
> already exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.
> This causes the table type not to match what is expected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5591) HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter HIVE_SYNC_TABLE_STRATEGY

2023-01-20 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5591:
---

Assignee: HunterXHunter

> HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter 
> HIVE_SYNC_TABLE_STRATEGY
> ---
>
> Key: HUDI-5591
> URL: https://issues.apache.org/jira/browse/HUDI-5591
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5591) HoodieSparkSqlWriter#getHiveTableNames needs to consider parameter HIVE_SYNC_TABLE_STRATEGY

2023-01-19 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5591:
---

 Summary: HoodieSparkSqlWriter#getHiveTableNames needs to consider 
parameter HIVE_SYNC_TABLE_STRATEGY
 Key: HUDI-5591
 URL: https://issues.apache.org/jira/browse/HUDI-5591
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties

2023-01-18 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5584:

Description: 
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the suffix _ro.

But sometimes the table has already been created in Hive,

like:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
 ts int)
 using hudi
 tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584'  {code}
and `show create table` shows:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx'
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
*The table looks like a realtime table.*

 

When we finish writing data and synchronize the ro table, because the table 
already exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.

This causes the table type not to match what is expected.

 

 

  was:
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the suffix _ro.

But sometimes the table may have been created in Hive earlier.

like:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
 ts int)
 using hudi
 tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584'  {code}
and `show create table` shows:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx'
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
The table looks like a realtime table.

When we finish writing data and synchronize tables, because the table already 
exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.

This causes the type of the table to be unexpected.

 

 


> When the table to be synchronized already exists in hive, need to update 
> serde/table properties
> ---
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only 
> one table to be synchronized to Hive, without the suffix _ro.
> But sometimes the table has already been created in Hive,
> like:
> {code:java}
> create table hive.test.HUDI_5584 (
>   id int,
>  ts int)
>  using hudi
>  tblproperties (
>   type = 'mor',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.datasource.hive_sync.enable = 'true',
> hoodie.datasource.hive_sync.table.strategy='ro'
> ) location '/tmp/HUDI_5584'  {code}
> and `show create table` shows:
> {code:java}
> CREATE EXTERNAL TABLE `hudi_5584`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `id` int,
>   `ts` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   

[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties

2023-01-18 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5584:

Description: 
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the suffix _ro.

But sometimes the table has already been created in Hive,

like:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
 ts int)
 using hudi
 tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584'  {code}
and `show create table` shows:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx'
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
*The table looks like a realtime table.*

 

When we finish writing data and synchronize the ro table, because the table 
already exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.

This causes the table type not to match what is expected.

 

 

  was:
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the suffix _ro.

But sometimes the table has already been created in Hive,

like:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
 ts int)
 using hudi
 tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584'  {code}
and `show create table` shows:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx'
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
*The table looks like a realtime table.*

 

When we finish writing data and synchronize the ro table, because the table 
already exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.

This causes the table type not to match what is expected.

 

 


> When the table to be synchronized already exists in hive, need to update 
> serde/table properties
> ---
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only 
> one table to be synchronized to Hive, without the suffix _ro.
> But sometimes the table has already been created in Hive,
> like:
> {code:java}
> create table hive.test.HUDI_5584 (
>   id int,
>  ts int)
>  using hudi
>  tblproperties (
>   type = 'mor',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.datasource.hive_sync.enable = 'true',
> hoodie.datasource.hive_sync.table.strategy='ro'
> ) location '/tmp/HUDI_5584'  {code}
> and `show create table` shows:
> {code:java}
> CREATE EXTERNAL TABLE `hudi_5584`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `id` int,
>   `ts` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES 

[jira] [Updated] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties

2023-01-18 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5584:

Description: 
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the suffix _ro.

But sometimes the table may have been created in Hive earlier.

like:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
 ts int)
 using hudi
 tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584'  {code}
and `show create table` shows:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx'
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
The table looks like a realtime table.

When we finish writing data and synchronize tables, because the table already 
exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.

This causes the type of the table to be unexpected.
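
A rough sketch of the expected behavior (illustrative only; the JDBC URL, the 
table name and the exact DDL below are assumptions, not the statements 
hive-sync actually issues): force the serde/input format of the already 
existing table back to the read-optimized variants and refresh its table 
properties.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FixExistingHiveTableFormat {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/test");
         Statement stmt = conn.createStatement()) {
      // Hypothetical example: point the pre-created table at the read-optimized
      // input format instead of the realtime one.
      stmt.execute("ALTER TABLE hudi_5584 SET FILEFORMAT "
          + "INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' "
          + "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' "
          + "SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'");
      // Refresh the table properties so the table no longer looks like a realtime table.
      stmt.execute("ALTER TABLE hudi_5584 SET TBLPROPERTIES ('type'='mor')");
    }
  }
}
{code}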

 

 

> When the table to be synchronized already exists in hive, need to update 
> serde/table properties
> ---
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
>
> When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only 
> one table to be synchronized to Hive, without the suffix _ro.
> But sometimes the table may have been created in Hive earlier.
> like:
> {code:java}
> create table hive.test.HUDI_5584 (
>   id int,
>  ts int)
>  using hudi
>  tblproperties (
>   type = 'mor',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.datasource.hive_sync.enable = 'true',
> hoodie.datasource.hive_sync.table.strategy='ro'
> ) location '/tmp/HUDI_5584'  {code}
> and `show create table` shows:
> {code:java}
> CREATE EXTERNAL TABLE `hudi_5584`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `id` int,
>   `ts` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'path'='file:///tmp/HUDI_5584')
> STORED AS INPUTFORMAT
>   'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'file:/tmp/HUDI_5584'
> TBLPROPERTIES (
>   'hoodie.datasource.hive_sync.enable'='true',
>   'hoodie.datasource.hive_sync.table.strategy'='ro',
>   'preCombineField'='ts',
>   'primaryKey'='id',
>   'spark.sql.create.version'='3.3.1',
>   'spark.sql.sources.provider'='hudi',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='xx'
>   'transient_lastDdlTime'='1674108302',
>   'type'='mor') {code}
> The table looks like a realtime table.
> When we finish writing data and synchronize tables, because the table already 
> exists, SERDEPROPERTIES and OUTPUTFORMAT will not be modified.
> This causes the type of the table to be unexpected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties

2023-01-18 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5584:
---

 Summary: When the table to be synchronized already exists in hive, 
need to update serde/table properties
 Key: HUDI-5584
 URL: https://issues.apache.org/jira/browse/HUDI-5584
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint

2023-01-18 Thread HunterXHunter (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678448#comment-17678448
 ] 

HunterXHunter commented on HUDI-5580:
-

[~Bone An] Can you take a look at this test?

> flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint
> 
>
> Key: HUDI-5580
> URL: https://issues.apache.org/jira/browse/HUDI-5580
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: HunterXHunter
>Priority: Major
>
> {code:java}
> 2023-01-18T15:37:37.0801896Z [ERROR]   
> TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321
>  expected: <0> but was: <1>
>  {code}
> https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint

2023-01-18 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5580:

Description: 
{code:java}
2023-01-18T15:37:37.0801896Z [ERROR]   
TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321
 expected: <0> but was: <1>
 {code}
https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342

  was:
{code:java}
2023-01-18T15:37:37.0801896Z [ERROR]   
TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321
 expected: <0> but was: <1>
 {code}
https://pipelines.actions.githubusercontent.com/serviceHosts/624e4e79-816a-4c2a-80fd-f50e8b678dc8/_apis/pipelines/1/runs/25269/signedlogcontent/16?urlExpires=2023-01-19T01%3A31%3A15.1330329Z=HMACV1=lCtYU0ZRBi2xhWmBZP9OH42DNh7KGz7Z8x79IKhURZE%3D


> flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint
> 
>
> Key: HUDI-5580
> URL: https://issues.apache.org/jira/browse/HUDI-5580
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: HunterXHunter
>Priority: Major
>
> {code:java}
> 2023-01-18T15:37:37.0801896Z [ERROR]   
> TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321
>  expected: <0> but was: <1>
>  {code}
> https://github.com/apache/hudi/actions/runs/3949925387/jobs/6761767342



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5580) flaky test TestStructuredStreaming#testStructuredStreamingWithCheckpoint

2023-01-18 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5580:
---

 Summary: flaky test 
TestStructuredStreaming#testStructuredStreamingWithCheckpoint
 Key: HUDI-5580
 URL: https://issues.apache.org/jira/browse/HUDI-5580
 Project: Apache Hudi
  Issue Type: Test
Reporter: HunterXHunter


{code:java}
2023-01-18T15:37:37.0801896Z [ERROR]   
TestStructuredStreaming.testStructuredStreamingWithCheckpoint:308->assertLatestCheckpointInfoMatched:321
 expected: <0> but was: <1>
 {code}
https://pipelines.actions.githubusercontent.com/serviceHosts/624e4e79-816a-4c2a-80fd-f50e8b678dc8/_apis/pipelines/1/runs/25269/signedlogcontent/16?urlExpires=2023-01-19T01%3A31%3A15.1330329Z=HMACV1=lCtYU0ZRBi2xhWmBZP9OH42DNh7KGz7Z8x79IKhURZE%3D



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5572) Flink write needs to skip checking the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5572:
---

Assignee: HunterXHunter

> Flink write needs to skip checking the compatibility of Schema#name
> ---
>
> Key: HUDI-5572
> URL: https://issues.apache.org/jira/browse/HUDI-5572
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
> Attachments: image-2023-01-18-11-51-12-914.png
>
>
> When we use spark to initialize the hudi table, 
> .hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
> 'name=$tablename_record' and 'namespace'='hoodie.$tablename'.
> But Flink will not carry this information when writing,
> so there will be incompatibilities when doing `validateSchema`.
> Here I think we should skip checking the compatibility of Schema#name when 
> using Flink write.
> !image-2023-01-18-11-51-12-914.png|width=851,height=399!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5572) Flink write needs to skip checking the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5572:

Description: 
When we use spark to initialize the hudi table, 
.hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
'name=$tablename_record' and 'namespace'='hoodie.$tablename'.

But Flink will not carry this information when writing,

so there will be incompatibilities when doing `validateSchema`.

Here I think we should skip checking the compatibility of Schema#name when 
using Flink write.

!image-2023-01-18-11-51-12-914.png|width=851,height=399!

  was:
When we use spark to initialize the hudi table, 
.hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
'name=$tablename_record' and 'namespace'='hoodie.$tablename'.

But Flink will not carry this information when writing,

so there will be incompatibilities when doing `validateSchema`.

Here I think we should skip checking the compatibility of Schema#name when 
using Flink write.

 


> Flink write needs to skip checking the compatibility of Schema#name
> ---
>
> Key: HUDI-5572
> URL: https://issues.apache.org/jira/browse/HUDI-5572
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2023-01-18-11-51-12-914.png
>
>
> When we use spark to initialize the hudi table, 
> .hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
> 'name=$tablename_record' and 'namespace'='hoodie.$tablename'.
> But Flink will not carry this information when writing,
> so there will be incompatibilities when doing `validateSchema`.
> Here I think we should skip checking the compatibility of Schema#name when 
> using Flink write.
> !image-2023-01-18-11-51-12-914.png|width=851,height=399!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5572) Flink write needs to skip checking the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5572:

Attachment: image-2023-01-18-11-51-12-914.png

> Flink write needs to skip checking the compatibility of Schema#name
> ---
>
> Key: HUDI-5572
> URL: https://issues.apache.org/jira/browse/HUDI-5572
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2023-01-18-11-51-12-914.png
>
>
> When we use spark to initialize the hudi table, 
> .hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
> 'name=$tablename_record' and 'namespace'='hoodie.$tablename'.
> But Flink will not carry this information when writing,
> so there will be incompatibilities when doing `validateSchema`.
> Here I think we should skip checking the compatibility of Schema#name when 
> using Flink write.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5572) Flink write needs to skip checking the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5572:
---

 Summary: Flink write needs to skip checking the compatibility of 
Schema#name
 Key: HUDI-5572
 URL: https://issues.apache.org/jira/browse/HUDI-5572
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter


When we use spark to initialize the hudi table, 
.hoodie#hoodie.properties#hoodie.table.create.schema will carry information 
'name=$tablename_record' and 'namespace'='hoodie.$tablename'.

But Flink will not carry this information when writing,

so there will be incompatibilities when doing `validateSchema`.

Here I think we should skip checking the compatibility of Schema#name when 
using Flink write.
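
A minimal sketch of the idea (not the actual Hudi change; the class below is 
illustrative): rebuild the writer schema with the reader schema's 
name/namespace before running Avro's compatibility check, so only the field 
structure is compared.

{code:java}
import java.util.stream.Collectors;
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public final class SchemaNameAgnosticCheck {
  // Compare two record schemas while ignoring differences in name/namespace.
  public static boolean isCompatible(Schema writerSchema, Schema readerSchema) {
    Schema renamed = Schema.createRecord(
        readerSchema.getName(), writerSchema.getDoc(), readerSchema.getNamespace(),
        writerSchema.isError(),
        writerSchema.getFields().stream()
            .map(f -> new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()))
            .collect(Collectors.toList()));
    return SchemaCompatibility.checkReaderWriterCompatibility(readerSchema, renamed)
        .getType() == SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
  }
}
{code}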

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5554) Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY

2023-01-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5554:

Summary: Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for 
parameter HIVE_SYNC_TABLE_STRATEGY  (was: add UT 
TestHiveSyncTool.testSyncMergeOnReadWithStrategy for parameter 
HIVE_SYNC_TABLE_STRATEGY)

> Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter 
> HIVE_SYNC_TABLE_STRATEGY
> --
>
> Key: HUDI-5554
> URL: https://issues.apache.org/jira/browse/HUDI-5554
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: HunterXHunter
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5554) add UT TestHiveSyncTool.testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY

2023-01-13 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5554:
---

 Summary: add UT TestHiveSyncTool.testSyncMergeOnReadWithStrategy 
for parameter HIVE_SYNC_TABLE_STRATEGY
 Key: HUDI-5554
 URL: https://issues.apache.org/jira/browse/HUDI-5554
 Project: Apache Hudi
  Issue Type: Test
Reporter: HunterXHunter






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5554) Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter HIVE_SYNC_TABLE_STRATEGY

2023-01-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5554:
---

Assignee: HunterXHunter

> Add UT TestHiveSyncTool#testSyncMergeOnReadWithStrategy for parameter 
> HIVE_SYNC_TABLE_STRATEGY
> --
>
> Key: HUDI-5554
> URL: https://issues.apache.org/jira/browse/HUDI-5554
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY

2023-01-12 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter resolved HUDI-5528.
-

> HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
> ---
>
> Key: HUDI-5528
> URL: https://issues.apache.org/jira/browse/HUDI-5528
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> `HiveSyncProcedure & HiveSyncTool`  also needs to add 
> `HIVE_SYNC_TABLE_STRATEGY`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY

2023-01-11 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5528:
---

Assignee: HunterXHunter

> HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
> ---
>
> Key: HUDI-5528
> URL: https://issues.apache.org/jira/browse/HUDI-5528
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> `HiveSyncProcedure & HiveSyncTool`  also needs to add 
> `HIVE_SYNC_TABLE_STRATEGY`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5528) HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY

2023-01-11 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5528:

Summary: HiveSyncProcedure & HiveSyncTool also needs to add 
HIVE_SYNC_TABLE_STRATEGY  (was: Support optional table synchronization to hive.)

> HiveSyncProcedure & HiveSyncTool also needs to add HIVE_SYNC_TABLE_STRATEGY
> ---
>
> Key: HUDI-5528
> URL: https://issues.apache.org/jira/browse/HUDI-5528
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
>
> `HiveSyncProcedure & HiveSyncTool`  also needs to add 
> `HIVE_SYNC_TABLE_STRATEGY`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5528) Support optional table synchronization to hive.

2023-01-11 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5528:

Description: `HiveSyncProcedure & HiveSyncTool`  also needs to add 
`HIVE_SYNC_TABLE_STRATEGY`  (was: `HiveSyncProcedure & HiveSyncTool & 
HoodieDeltaStreamer`  also needs to add `HIVE_SYNC_TABLE_STRATEGY`)

> Support optional table synchronization to hive.
> ---
>
> Key: HUDI-5528
> URL: https://issues.apache.org/jira/browse/HUDI-5528
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
>
> `HiveSyncProcedure & HiveSyncTool`  also needs to add 
> `HIVE_SYNC_TABLE_STRATEGY`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5528) Support optional table synchronization to hive.

2023-01-10 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5528:
---

 Summary: Support optional table synchronization to hive.
 Key: HUDI-5528
 URL: https://issues.apache.org/jira/browse/HUDI-5528
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


`HiveSyncProcedure & HiveSyncTool & HoodieDeltaStreamer`  also needs to add 
`HIVE_SYNC_TABLE_STRATEGY`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit

2023-01-05 Thread HunterXHunter (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655214#comment-17655214
 ] 

HunterXHunter commented on HUDI-5505:
-

[~danny0405] Do you have time to confirm this issue?

> Compaction NUM_COMMITS policy should only judge completed deltacommit
> -
>
> Key: HUDI-5505
> URL: https://issues.apache.org/jira/browse/HUDI-5505
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2023-01-05-13-10-57-918.png
>
>
> `compaction.delta_commits =1`
>  
> {code:java}
> 20230105115229301.deltacommit
> 20230105115229301.deltacommit.inflight
> 20230105115229301.deltacommit.requested
> 20230105115253118.commit
> 20230105115253118.compaction.inflight
> 20230105115253118.compaction.requested
> 20230105115330994.deltacommit.inflight
> 20230105115330994.deltacommit.requested{code}
> The return result of `ScheduleCompactionActionExecutor.needCompact` is 
> `true`, 
> which is not expected.
>  
> And in `OCC` or `lazy clean` mode, this will cause compaction to trigger 
> early.
> `compaction.delta_commits =3`
>  
> {code:java}
> 20230105125650541.deltacommit.inflight
> 20230105125650541.deltacommit.requested
> 20230105125715081.deltacommit
> 20230105125715081.deltacommit.inflight
> 20230105125715081.deltacommit.requested
> 20230105130018070.deltacommit.inflight
> 20230105130018070.deltacommit.requested {code}
>  
> And compaction will be triggered, which is not expected.
> !image-2023-01-05-13-10-57-918.png|width=699,height=158!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit

2023-01-05 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5505:

Issue Type: Bug  (was: Improvement)

> Compaction NUM_COMMITS policy should only judge completed deltacommit
> -
>
> Key: HUDI-5505
> URL: https://issues.apache.org/jira/browse/HUDI-5505
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2023-01-05-13-10-57-918.png
>
>
> `compaction.delta_commits =1`
>  
> {code:java}
> 20230105115229301.deltacommit
> 20230105115229301.deltacommit.inflight
> 20230105115229301.deltacommit.requested
> 20230105115253118.commit
> 20230105115253118.compaction.inflight
> 20230105115253118.compaction.requested
> 20230105115330994.deltacommit.inflight
> 20230105115330994.deltacommit.requested{code}
> The return result of `ScheduleCompactionActionExecutor.needCompact` is 
> `true`, 
> which is not expected.
>  
> And in `OCC` or `lazy clean` mode, this will cause compaction to trigger 
> early.
> `compaction.delta_commits =3`
>  
> {code:java}
> 20230105125650541.deltacommit.inflight
> 20230105125650541.deltacommit.requested
> 20230105125715081.deltacommit
> 20230105125715081.deltacommit.inflight
> 20230105125715081.deltacommit.requested
> 20230105130018070.deltacommit.inflight
> 20230105130018070.deltacommit.requested {code}
>  
> And compaction will be triggered, which is not expected.
> !image-2023-01-05-13-10-57-918.png|width=699,height=158!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit

2023-01-04 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5505:
---

 Summary: Compaction NUM_COMMITS policy should only judge completed 
deltacommit
 Key: HUDI-5505
 URL: https://issues.apache.org/jira/browse/HUDI-5505
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter
 Attachments: image-2023-01-05-13-10-57-918.png

`compaction.delta_commits =1`

 
{code:java}
20230105115229301.deltacommit
20230105115229301.deltacommit.inflight
20230105115229301.deltacommit.requested
20230105115253118.commit
20230105115253118.compaction.inflight
20230105115253118.compaction.requested
20230105115330994.deltacommit.inflight
20230105115330994.deltacommit.requested{code}
The return result of `ScheduleCompactionActionExecutor.needCompact` is `true`, 

which is not expected.

 

And in `OCC` or `lazy clean` mode, this will cause compaction to trigger early.

`compaction.delta_commits =3`

 
{code:java}
20230105125650541.deltacommit.inflight
20230105125650541.deltacommit.requested
20230105125715081.deltacommit
20230105125715081.deltacommit.inflight
20230105125715081.deltacommit.requested
20230105130018070.deltacommit.inflight
20230105130018070.deltacommit.requested {code}
 

And compaction will be triggered, which is not expected.

!image-2023-01-05-13-10-57-918.png|width=699,height=158!
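
A minimal sketch of the intended trigger logic (simplified and illustrative, 
not the actual `ScheduleCompactionActionExecutor` code): only completed delta 
commits after the last compaction should count toward the NUM_COMMITS 
threshold, so inflight/requested instants no longer trigger compaction early.

{code:java}
import org.apache.hudi.common.table.timeline.HoodieTimeline;

public final class CompactionTriggerSketch {
  // Count only *completed* delta commits since the last compaction instant.
  public static boolean shouldScheduleCompaction(HoodieTimeline deltaCommitTimeline,
                                                 String lastCompactionInstant,
                                                 int deltaCommitsThreshold) {
    int completedDeltaCommits = deltaCommitTimeline
        .filterCompletedInstants()                            // drop .inflight / .requested
        .findInstantsAfter(lastCompactionInstant, Integer.MAX_VALUE)
        .countInstants();
    return completedDeltaCommits >= deltaCommitsThreshold;
  }
}
{code}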

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#initTable

2022-12-19 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter resolved HUDI-5416.
-

> Skipping the lock in HoodieFlinkWriteClient#initTable
> 
>
> Key: HUDI-5416
> URL: https://issues.apache.org/jira/browse/HUDI-5416
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-12-19-17-44-19-289.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#initTable

2022-12-19 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5416:
---

 Summary: Skipping the lock in HoodieFlinkWriteClient#initTable
 Key: HUDI-5416
 URL: https://issues.apache.org/jira/browse/HUDI-5416
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter
 Attachments: image-2022-12-19-17-44-19-289.png





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5415) Support multi-writer for flink write.

2022-12-19 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5415:
---

 Summary: Support multi-writer for flink write.
 Key: HUDI-5415
 URL: https://issues.apache.org/jira/browse/HUDI-5415
 Project: Apache Hudi
  Issue Type: New Feature
  Components: flink
Reporter: HunterXHunter






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5416) Skipping the lock in HoodieFlinkWriteClient#initTable

2022-12-19 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5416:

Issue Type: Improvement  (was: Bug)

> Skipping the lock in HoodieFlinkWriteClient#initTable
> 
>
> Key: HUDI-5416
> URL: https://issues.apache.org/jira/browse/HUDI-5416
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-19-17-44-19-289.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5377) Write call stack information to lock file

2022-12-18 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter resolved HUDI-5377.
-

> Write call stack information to lock file
> -
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When OCC is enabled, sometimes an exception 'Unable to acquire lock' is 
> thrown,
> and we need to know which step caused the deadlock.
> For example:
> {
>   "lockCreateTime" : 1671017890189,
>   "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) 
> \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo
>  (FileSystemBasedLockProvider.java:212) \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:172) \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:116) \n", "\t 
> org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:108) \n", "\t 
> org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58) \n", "\t 
> org.apache.hudi.client.BaseHoodieWriteClient.clean 
> (BaseHoodieWriteClient.java:891) \n", "\t 
> org.apache.hudi.client.BaseHoodieWriteClient.clean 
> (BaseHoodieWriteClient.java:858) \n", "\t 
> org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", 
> "\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130) \n", "\t 
> java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149) \n", "\t 
> java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624) \n", "\t java.lang.Thread.run (Thread.java:750) 
> \n" ],
>   "lockThreadName" : "pool-8-thread-1"
> }
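
For illustration, a minimal sketch of how such lock info can be captured 
(simplified; the class below is illustrative, not the actual 
`FileSystemBasedLockProvider` code):

{code:java}
import java.util.ArrayList;
import java.util.List;

public final class LockInfoSketch {
  // Capture the acquiring thread's stack so the lock file records who took the lock.
  public static List<String> currentStackInfo() {
    List<String> frames = new ArrayList<>();
    for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
      frames.add("\t " + e.getClassName() + "." + e.getMethodName()
          + " (" + e.getFileName() + ":" + e.getLineNumber() + ") \n");
    }
    return frames;
  }

  public static void main(String[] args) {
    System.out.println("lockCreateTime : " + System.currentTimeMillis());
    System.out.println("lockThreadName : " + Thread.currentThread().getName());
    System.out.println("lockStackInfo  : " + currentStackInfo());
  }
}
{code}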



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Cleaning conflicts in occ mode

2022-12-14 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Summary: Cleaning conflicts in occ mode  (was: Rollback conflict in occ 
mode)

> Cleaning conflicts in occ mode
> --
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writers get the same 
> `instantsToRollback`, so the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5391) Modify the default value of parameter `hoodie.write.lock.client`

2022-12-14 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5391:
---

Assignee: HunterXHunter

> Modify the default value of parameter  `hoodie.write.lock.client`
> -
>
> Key: HUDI-5391
> URL: https://issues.apache.org/jira/browse/HUDI-5391
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> In OCC mode, many steps trigger the lock, which leads to frequent locking 
> and unlocking; however, the execution time of the locked operations is short.
> So the default value of 
> `hoodie.write.lock.client.wait_time_ms_between_retry` should be adjusted 
> from 10s to 2s to reduce unnecessary waiting time,
> and the default value of `hoodie.write.lock.client.num_retries` can be 
> increased to 50.
> These adjustments have shown an obvious positive effect in actual use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5391) Modify the default value of parameter `hoodie.write.lock.client`

2022-12-14 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5391:
---

 Summary: Modify the default value of parameter  
`hoodie.write.lock.client`
 Key: HUDI-5391
 URL: https://issues.apache.org/jira/browse/HUDI-5391
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


In OCC mode, many steps trigger the lock, which leads to frequent locking and 
unlocking; however, the execution time of the locked operations is short.

So the default value of `hoodie.write.lock.client.wait_time_ms_between_retry` 
should be adjusted from 10s to 2s to reduce unnecessary waiting time,

and the default value of `hoodie.write.lock.client.num_retries` can be 
increased to 50.

These adjustments have shown an obvious positive effect in actual use.
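
For illustration (a minimal sketch; these are simply the proposed values set 
through the existing lock-client properties, not a new API), a writer can 
already override the current defaults like this:

{code:java}
import java.util.Properties;

public final class LockClientRetryDefaults {
  public static Properties proposedLockClientProps() {
    Properties props = new Properties();
    // Proposed default: wait 2s (instead of 10s) between lock-acquire retries.
    props.setProperty("hoodie.write.lock.client.wait_time_ms_between_retry", "2000");
    // Proposed default: more retries, so the overall acquire window stays similar.
    props.setProperty("hoodie.write.lock.client.num_retries", "50");
    return props;
  }
}
{code}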



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5377) Write call stack information to lock file

2022-12-14 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5377:

Description: 
When OCC is enabled, sometimes an exception 'Unable to acquire lock' is thrown,

and we need to know which step caused the deadlock.

For example:

{
  "lockCreateTime" : 1671017890189,
  "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) 
\n", "\t 
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo
 (FileSystemBasedLockProvider.java:212) \n", "\t 
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock 
(FileSystemBasedLockProvider.java:172) \n", "\t 
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
(FileSystemBasedLockProvider.java:116) \n", "\t 
org.apache.hudi.client.transaction.lock.LockManager.lock (LockManager.java:108) 
\n", "\t org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
(TransactionManager.java:58) \n", "\t 
org.apache.hudi.client.BaseHoodieWriteClient.clean 
(BaseHoodieWriteClient.java:891) \n", "\t 
org.apache.hudi.client.BaseHoodieWriteClient.clean 
(BaseHoodieWriteClient.java:858) \n", "\t 
org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", 
"\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
(NonThrownExecutor.java:130) \n", "\t 
java.util.concurrent.ThreadPoolExecutor.runWorker 
(ThreadPoolExecutor.java:1149) \n", "\t 
java.util.concurrent.ThreadPoolExecutor$Worker.run 
(ThreadPoolExecutor.java:624) \n", "\t java.lang.Thread.run (Thread.java:750) 
\n" ],
  "lockThreadName" : "pool-8-thread-1"
}

  was:
When OCC is enabled, sometimes an exception 'Unable to acquire lock' is thrown,

and we need to know which step caused the deadlock.

For example:

 

LOCK-TIME : 2022-12-13 11:13:15.015
LOCK-STACK-INFO :
     
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock 
(FileSystemBasedLockProvider.java:148)
     
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
(FileSystemBasedLockProvider.java:100)
     org.apache.hudi.client.transaction.lock.LockManager.lock 
(LockManager.java:102)
     org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
(TransactionManager.java:58)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
(BaseHoodieWriteClient.java:1425)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
(BaseHoodieWriteClient.java:1037)
     org.apache.hudi.util.CompactionUtil.scheduleCompaction 
(CompactionUtil.java:72)
     
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
 (StreamWriteOperatorCoordinator.java:250)
     org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
(NonThrownExecutor.java:130)
     java.util.concurrent.ThreadPoolExecutor.runWorker 
(ThreadPoolExecutor.java:1149)
     java.util.concurrent.ThreadPoolExecutor$Worker.run 
(ThreadPoolExecutor.java:624)
     java.lang.Thread.run (Thread.java:750)


> Write call stack information to lock file
> -
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When OCC is enabled, sometimes an exception 'Unable to acquire lock' is 
> thrown,
> and we need to know which step caused the deadlock.
> For example:
> {
>   "lockCreateTime" : 1671017890189,
>   "lockStackInfo" : [ "\t java.lang.Thread.getStackTrace (Thread.java:1564) 
> \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.initLockInfo
>  (FileSystemBasedLockProvider.java:212) \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:172) \n", "\t 
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:116) \n", "\t 
> org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:108) \n", "\t 
> org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58) \n", "\t 
> org.apache.hudi.client.BaseHoodieWriteClient.clean 
> (BaseHoodieWriteClient.java:891) \n", "\t 
> org.apache.hudi.client.BaseHoodieWriteClient.clean 
> (BaseHoodieWriteClient.java:858) \n", "\t 
> org.apache.hudi.sink.CleanFunction.lambda$open$0 (CleanFunction.java:67) \n", 
> "\t org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130) \n", "\t 
> java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149) \n", "\t 
> java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624) \n", 

[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: image-2022-12-14-11-26-37-252.png

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writers get the same 
> `instantsToRollback`, so the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: (was: 1670986960525.jpg)

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not guarded by a lock, multiple writers can 
> get the same `instantsToRollback`, so the same `instant` may be deleted 
> multiple times and the same `rollback.inflight` may be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: image-2022-12-14-11-26-21-995.png

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not guarded by a lock, multiple writers can 
> get the same `instantsToRollback`, so the same `instant` may be deleted 
> multiple times and the same `rollback.inflight` may be created multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: (was: WechatIMG70.jpeg)

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not guarded by a lock, multiple writers can 
> get the same `instantsToRollback`, so the same `instant` may be deleted 
> multiple times and the same `rollback.inflight` may be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Description: 
{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not guarded by a lock, multiple writers can get 
the same `instantsToRollback`, so the same `instant` may be deleted multiple times 
and the same `rollback.inflight` may be created multiple times.

!image-2022-12-14-11-26-37-252.png!

!image-2022-12-14-11-26-21-995.png!

  was:
{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not guarded by a lock, multiple writers can get 
the same `instantsToRollback`, so the same `instant` may be deleted multiple times 
and the same `rollback.inflight` may be created multiple times.


> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not guarded by a lock, multiple writers can 
> get the same `instantsToRollback`, so the same `instant` may be deleted 
> multiple times and the same `rollback.inflight` may be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5386:
---

 Summary: Rollback conflict in occ mode
 Key: HUDI-5386
 URL: https://issues.apache.org/jira/browse/HUDI-5386
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter
 Attachments: 1670986960525.jpg, WechatIMG70.jpeg

{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not guarded by a lock, multiple writers can get 
the same `instantsToRollback`, so the same `instant` may be deleted multiple times 
and the same `rollback.inflight` may be created multiple times.
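
A minimal sketch of the intended fix: compute the rollback plan while holding the 
table-level lock. The `Timeline` interface and helper names below are illustrative 
placeholders, not Hudi's actual API; only the need to lock around 
`getInstantsToRollback` comes from this issue.
{code:java}
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedRollbackSketch {

  // Placeholder for the table-level lock supplied by the configured lock provider.
  private final Lock tableLock = new ReentrantLock();

  // Placeholder for the pieces of the timeline the rollback planner needs.
  interface Timeline {
    List<String> getInstantsToRollback();   // failed instants eligible for rollback
    void scheduleRollback(String instant);  // creates <instant>.rollback.inflight
  }

  public void rollbackFailedWrites(Timeline timeline) {
    // Acquire the lock BEFORE computing the plan, so concurrent writers cannot
    // pick up the same failed instants and delete them twice.
    tableLock.lock();
    try {
      for (String instant : timeline.getInstantsToRollback()) {
        timeline.scheduleRollback(instant);
      }
    } finally {
      tableLock.unlock();
    }
  }
}
{code}
With the plan computed under the lock, two writers can no longer schedule a 
rollback for the same instant, so the duplicate deletes and duplicate 
rollback.inflight files described above should disappear.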



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5377) Write call stack information to lock file

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5377:

Summary: Write call stack information to lock file  (was: Add call stack 
information to lock file)

> Write call stack information to lock file
> -
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When OCC is enabled, an exception 'Unable to acquire lock' is sometimes 
> thrown. We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:148)
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
> (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
> (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction 
> (CompactionUtil.java:72)
>      
> org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
>  (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-5377:
---

Assignee: HunterXHunter

> Add call stack information to lock file
> ---
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> When OCC is enabled, an exception 'Unable to acquire lock' is sometimes 
> thrown. We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:148)
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
> (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
> (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction 
> (CompactionUtil.java:72)
>      
> org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
>  (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5377:

Description: 
When OCC is enabled, an exception 'Unable to acquire lock' is sometimes thrown.

We need to know which step caused the deadlock.

For example:

 

LOCK-TIME : 2022-12-13 11:13:15.015
LOCK-STACK-INFO :
     
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock 
(FileSystemBasedLockProvider.java:148)
     
org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
(FileSystemBasedLockProvider.java:100)
     org.apache.hudi.client.transaction.lock.LockManager.lock 
(LockManager.java:102)
     org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
(TransactionManager.java:58)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
(BaseHoodieWriteClient.java:1425)
     org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
(BaseHoodieWriteClient.java:1037)
     org.apache.hudi.util.CompactionUtil.scheduleCompaction 
(CompactionUtil.java:72)
     
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
 (StreamWriteOperatorCoordinator.java:250)
     org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
(NonThrownExecutor.java:130)
     java.util.concurrent.ThreadPoolExecutor.runWorker 
(ThreadPoolExecutor.java:1149)
     java.util.concurrent.ThreadPoolExecutor$Worker.run 
(ThreadPoolExecutor.java:624)
     java.lang.Thread.run (Thread.java:750)

  was:
When OCC is enabled, an exception 'Unable to acquire lock' is sometimes thrown.

We need to know which step caused the deadlock.
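
A minimal, self-contained sketch of the idea using plain JDK APIs (this is not the 
actual FileSystemBasedLockProvider code, and the lock file path is a placeholder): 
capture the acquiring thread's stack trace and write it into the lock file next to 
the lock time, in the format shown above.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

public class LockInfoWriter {

  // Serializes the lock time and the current call stack into the lock file.
  public static void writeLockInfo(Path lockFile) throws IOException {
    StringBuilder sb = new StringBuilder();
    sb.append("LOCK-TIME : ").append(Instant.now()).append(System.lineSeparator());
    sb.append("LOCK-STACK-INFO :").append(System.lineSeparator());
    for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
      // getFileName() may be null for synthetic frames; acceptable for a diagnostic dump.
      sb.append("     ")
        .append(frame.getClassName()).append('.').append(frame.getMethodName())
        .append(" (").append(frame.getFileName()).append(':')
        .append(frame.getLineNumber()).append(')')
        .append(System.lineSeparator());
    }
    Files.write(lockFile, sb.toString().getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    writeLockInfo(Paths.get("/tmp/hoodie.lock"));
  }
}
{code}
When 'Unable to acquire lock' is later thrown, reading this file shows which step 
(clean, compaction scheduling, and so on) is still holding the lock.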


> Add call stack information to lock file
> ---
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Priority: Major
>
> When OCC is enabled, an exception 'Unable to acquire lock' is sometimes 
> thrown. We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:148)
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
> (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
> (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction 
> (CompactionUtil.java:72)
>      
> org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
>  (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5377) Add call stack information to lock file

2022-12-12 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5377:
---

 Summary: Add call stack information to lock file
 Key: HUDI-5377
 URL: https://issues.apache.org/jira/browse/HUDI-5377
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


When OCC is enabled, an exception 'Unable to acquire lock' is sometimes thrown.

We need to know which step caused the deadlock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4961) Support optional table synchronization to hive.

2022-12-09 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4961:

Description: 
By default, both the RT and RO tables are synchronized, named with the suffixes 
_rt and _ro, but sometimes the user only needs a single RO or RT table and does 
not want the suffix in the table name. An optional parameter is added to let the 
user synchronize only one table, without the suffix.

New parameters:
{{hive_sync.table.strategy}} (available options: RO, RT, ALL)
{{hoodie.datasource.hive_sync.table.strategy}} (available options: RO, RT, ALL)
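
For illustration, a sketch of how the new option could be set from a Flink job. 
Only 'hive_sync.table.strategy' and its RO/RT/ALL values come from this issue; the 
table name, path and the omitted hive sync connection options are placeholders.
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveSyncStrategyExample {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());
    // 'hive_sync.table.strategy' = 'RO' would sync only the read-optimized table,
    // registered under the plain table name (no _ro suffix).
    tEnv.executeSql(
        "CREATE TABLE hudi_4961 (\n"
            + "  id STRING,\n"
            + "  msg STRING,\n"
            + "  `partition` STRING,\n"
            + "  PRIMARY KEY (id) NOT ENFORCED\n"
            + ") PARTITIONED BY (`partition`) WITH (\n"
            + "  'connector' = 'hudi',\n"
            + "  'path' = '/tmp/hudi_4961',\n"
            + "  'table.type' = 'MERGE_ON_READ',\n"
            + "  'hive_sync.table.strategy' = 'RO'\n"
            + ")");
  }
}
{code}
With strategy ALL the current behaviour (both _rt and _ro tables) would be kept, 
so the default stays backward compatible.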

  was:By default, both the RT and RO tables are synchronized, named with the 
suffixes _rt and _ro, but sometimes the user only needs a single RO or RT table 
and does not want the suffix in the table name. An optional parameter is added to 
let the user synchronize only one table, without the suffix.


> Support optional table synchronization to hive.
> ---
>
> Key: HUDI-4961
> URL: https://issues.apache.org/jira/browse/HUDI-4961
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> By default, both the RT and RO tables are synchronized, named with the suffixes 
> _rt and _ro, but sometimes the user only needs a single RO or RT table and does 
> not want the suffix in the table name. An optional parameter is added to let 
> the user synchronize only one table, without the suffix.
> New parameters:
> {{hive_sync.table.strategy}} (available options: RO, RT, ALL)
> {{hoodie.datasource.hive_sync.table.strategy}} (available options: RO, RT, ALL)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4961) Support optional table synchronization to hive.

2022-12-09 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4961:
---

Assignee: HunterXHunter

> Support optional table synchronization to hive.
> ---
>
> Key: HUDI-4961
> URL: https://issues.apache.org/jira/browse/HUDI-4961
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> By default, both the RT and RO tables are synchronized, named with the suffixes 
> _rt and _ro, but sometimes the user only needs a single RO or RT table and does 
> not want the suffix in the table name. An optional parameter is added to let 
> the user synchronize only one table, without the suffix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4945) Add a test case for batch clean.

2022-10-09 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4945:

Description: (was: h1. Add a test case for batch clean.)

> Add a test case for batch clean.
> 
>
> Key: HUDI-4945
> URL: https://issues.apache.org/jira/browse/HUDI-4945
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4945) Add a test case for batch clean.

2022-10-09 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4945:

Description: h1. Add a test case for batch clean.  (was: Support to trigger 
the clean in the flink batch mode.)

> Add a test case for batch clean.
> 
>
> Key: HUDI-4945
> URL: https://issues.apache.org/jira/browse/HUDI-4945
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> h1. Add a test case for batch clean.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4945) Add a test case for batch clean.

2022-10-09 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4945:

Summary: Add a test case for batch clean.  (was: Support to trigger the 
clean in the flink batch mode.)

> Add a test case for batch clean.
> 
>
> Key: HUDI-4945
> URL: https://issues.apache.org/jira/browse/HUDI-4945
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> Support to trigger the clean in the flink batch mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4961) Support optional table synchronization to hive.

2022-09-30 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4961:

Component/s: hive

> Support optional table synchronization to hive.
> ---
>
> Key: HUDI-4961
> URL: https://issues.apache.org/jira/browse/HUDI-4961
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: HunterXHunter
>Priority: Major
>
> By default, both the RT and RO tables are synchronized, named with the suffixes 
> _rt and _ro, but sometimes the user only needs a single RO or RT table and does 
> not want the suffix in the table name. An optional parameter is added to let 
> the user synchronize only one table, without the suffix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4945) Support to trigger the clean in the flink batch mode.

2022-09-30 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4945:

Component/s: flink

> Support to trigger the clean in the flink batch mode.
> -
>
> Key: HUDI-4945
> URL: https://issues.apache.org/jira/browse/HUDI-4945
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> Support to trigger the clean in the flink batch mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4961) Support optional table synchronization to hive.

2022-09-30 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-4961:
---

 Summary: Support optional table synchronization to hive.
 Key: HUDI-4961
 URL: https://issues.apache.org/jira/browse/HUDI-4961
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


By default, both the RT and RO tables are synchronized, named with the suffixes 
_rt and _ro, but sometimes the user only needs a single RO or RT table and does 
not want the suffix in the table name. An optional parameter is added to let the 
user synchronize only one table, without the suffix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4945) Support to trigger the clean in the flink batch mode.

2022-09-28 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-4945:
---

 Summary: Support to trigger the clean in the flink batch mode.
 Key: HUDI-4945
 URL: https://issues.apache.org/jira/browse/HUDI-4945
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: HunterXHunter


Support to trigger the clean in the flink batch mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-4405) Support to trigger the clean in the flink batch mode.

2022-09-06 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter resolved HUDI-4405.
-

> Support to trigger the clean in the flink batch mode.
> -
>
> Key: HUDI-4405
> URL: https://issues.apache.org/jira/browse/HUDI-4405
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> Support to trigger the clean in the flink batch mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4405) Support to trigger the clean in the flink batch mode.

2022-09-06 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4405:
---

Assignee: HunterXHunter

> Support to trigger the clean in the flink batch mode.
> -
>
> Key: HUDI-4405
> URL: https://issues.apache.org/jira/browse/HUDI-4405
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> Support to trigger the clean in the flink batch mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4746) Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction

2022-08-31 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4746:
---

Assignee: (was: HunterXHunter)

> Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
> 
>
> Key: HUDI-4746
> URL: https://issues.apache.org/jira/browse/HUDI-4746
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Priority: Major
>
> ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
>  
> [aug 25: 
> https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44]
> aug 25: 
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44]
>  
>  
> {code:java}
> 2022-08-25T10:48:57.4158416Z [ERROR] 
> testWriteMergeOnReadWithCompaction{String}[2]  Time elapsed: 22.789 s  <<< 
> FAILURE!
> 2022-08-25T10:48:57.4159313Z org.opentest4j.AssertionFailedError: expected: 
>  but was: 
> 2022-08-25T10:48:57.4160369Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252)
> 2022-08-25T10:48:57.4161127Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182)
> 2022-08-25T10:48:57.4161883Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156)
> 2022-08-25T10:48:57.4166292Z 
> 2022-08-25T10:48:58.0221317Z [INFO] 
> 2022-08-25T10:48:58.0222033Z [INFO] Results:
> 2022-08-25T10:48:58.0228955Z [INFO] 
> 2022-08-25T10:48:58.0229555Z [ERROR] Failures: 
> 2022-08-25T10:48:58.0231472Z [ERROR]   
> ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252
>  expected:  but was: 
> 2022-08-25T10:48:58.0232489Z [INFO] 
> 2022-08-25T10:48:58.0233058Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, 
> Skipped: 0 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-4743) Flaky: ITTestHoodieDataSource crashes

2022-08-31 Thread HunterXHunter (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598352#comment-17598352
 ] 

HunterXHunter commented on HUDI-4743:
-

Can we use `-Xmx2024m -XX:MaxPermSize=256m` instead of `@{argLine}` to solve this 
problem?

> Flaky: ITTestHoodieDataSource crashes
> -
>
> Key: HUDI-4743
> URL: https://issues.apache.org/jira/browse/HUDI-4743
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Priority: Major
>
> ITTestHoodieDataSource crashed
>  
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/11033/logs/39]
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10994/logs/31]
>  
> {code:java}
> 2022-08-30T06:18:11.2568236Z [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-failsafe-plugin:2.22.2:verify 
> (verify-integration-test) on project hudi-flink: There are test 
> failures.2022-08-30T06:18:11.2571112Z [ERROR] 2022-08-30T06:18:11.2573983Z 
> [ERROR] Please refer to 
> /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink/target/failsafe-reports 
> for the individual test results.2022-08-30T06:18:11.2577098Z [ERROR] Please 
> refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and 
> [date].dumpstream.2022-08-30T06:18:11.2579886Z [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
> terminated without properly saying goodbye. VM crash or System.exit 
> called?2022-08-30T06:18:11.2584190Z [ERROR] Command was /bin/sh -c cd 
> /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink && 
> /usr/lib/jvm/temurin-8-jdk-amd64/jre/bin/java -Xmx2g 
> org.apache.maven.surefire.booter.ForkedBooter 
> /home/vsts/work/1/s/hudi-flink-datasource/hudi-flink/target/surefire 
> 2022-08-30T05-30-42_232-jvmRun1 surefire724291575167156tmp 
> surefire_23336829373297076850tmp2022-08-30T06:18:11.2588719Z [ERROR] Error 
> occurred in starting fork, check output in log2022-08-30T06:18:11.2593048Z 
> [ERROR] Process Exit Code: 2392022-08-30T06:18:11.2596938Z [ERROR] Crashed 
> tests:2022-08-30T06:18:11.2600707Z [ERROR] 
> org.apache.hudi.table.ITTestHoodieDataSource2022-08-30T06:18:11.2604657Z 
> [ERROR]   at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)2022-08-30T06:18:11.2608953Z
>  [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:282)2022-08-30T06:18:11.2612284Z
>  [ERROR]  at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)2022-08-30T06:18:11.2612983Z
>  [ERROR]  at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)2022-08-30T06:18:11.2613739Z
>  [ERROR]at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)2022-08-30T06:18:11.2614505Z
>  [ERROR]   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)2022-08-30T06:18:11.2615248Z
>  [ERROR] at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)2022-08-30T06:18:11.2615951Z
>  [ERROR]at 
> org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:370)2022-08-30T06:18:11.2616777Z
>  [ERROR]   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:351)2022-08-30T06:18:11.2617439Z
>  [ERROR]at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:215)2022-08-30T06:18:11.2618097Z
>  [ERROR]  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:171)2022-08-30T06:18:11.2618744Z
>  [ERROR]  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:163)2022-08-30T06:18:11.2619458Z
>  [ERROR]  at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)2022-08-30T06:18:11.2620222Z
>  [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)2022-08-30T06:18:11.2624164Z
>  [ERROR]  at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)2022-08-30T06:18:11.2624944Z
>  [ERROR]at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)2022-08-30T06:18:11.2625581Z
>  [ERROR]  at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:294)2022-08-30T06:18:11.2626157Z
>  [ERROR]   at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)2022-08-30T06:18:11.2626724Z
>  [ERROR]   at 
> 

[jira] [Assigned] (HUDI-4746) Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction

2022-08-30 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4746:
---

Assignee: HunterXHunter

> Fix flaky : ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
> 
>
> Key: HUDI-4746
> URL: https://issues.apache.org/jira/browse/HUDI-4746
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: HunterXHunter
>Priority: Major
>
> ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
>  
> [aug 25: 
> https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10940/logs/44]
> aug 25: 
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44]
>  
>  
> {code:java}
> 2022-08-25T10:48:57.4158416Z [ERROR] 
> testWriteMergeOnReadWithCompaction{String}[2]  Time elapsed: 22.789 s  <<< 
> FAILURE!
> 2022-08-25T10:48:57.4159313Z org.opentest4j.AssertionFailedError: expected: 
>  but was: 
> 2022-08-25T10:48:57.4160369Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252)
> 2022-08-25T10:48:57.4161127Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182)
> 2022-08-25T10:48:57.4161883Z  at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156)
> 2022-08-25T10:48:57.4166292Z 
> 2022-08-25T10:48:58.0221317Z [INFO] 
> 2022-08-25T10:48:58.0222033Z [INFO] Results:
> 2022-08-25T10:48:58.0228955Z [INFO] 
> 2022-08-25T10:48:58.0229555Z [ERROR] Failures: 
> 2022-08-25T10:48:58.0231472Z [ERROR]   
> ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252
>  expected:  but was: 
> 2022-08-25T10:48:58.0232489Z [INFO] 
> 2022-08-25T10:48:58.0233058Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, 
> Skipped: 0 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4745) Fix flaky: ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering

2022-08-30 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4745:
---

Assignee: HunterXHunter

> Fix flaky: ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering
> ---
>
> Key: HUDI-4745
> URL: https://issues.apache.org/jira/browse/HUDI-4745
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: HunterXHunter
>Priority: Major
>
> ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering
>  
> aug 30: 
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/11043/logs/40]
> [aug 25: 
> https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44|https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10928/logs/44]
>  
>  
> {code:java}
> 2022-08-30T14:09:34.2164385Z [INFO] Running 
> org.apache.hudi.sink.ITTestDataStreamWrite2022-08-30T14:11:55.7830524Z 
> [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 141.51 s <<< FAILURE! - in 
> org.apache.hudi.sink.ITTestDataStreamWrite2022-08-30T14:11:55.7832415Z 
> [ERROR] testWriteCopyOnWriteWithClustering  Time elapsed: 18.72 s  <<< 
> FAILURE!2022-08-30T14:11:55.7843136Z org.opentest4j.AssertionFailedError: 
> expected:  but was: 2022-08-30T14:11:55.7844163Z   at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodieWithCluster(ITTestDataStreamWrite.java:298)2022-08-30T14:11:55.7845258Z
>   at 
> org.apache.hudi.sink.ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering(ITTestDataStreamWrite.java:166)2022-08-30T14:11:55.7845819Z
>  2022-08-30T14:11:56.4989181Z [INFO] 2022-08-30T14:11:56.4990015Z [INFO] 
> Results:2022-08-30T14:11:56.4990891Z [INFO] 2022-08-30T14:11:56.4991209Z 
> [ERROR] Failures: 2022-08-30T14:11:56.4992974Z [ERROR]   
> ITTestDataStreamWrite.testWriteCopyOnWriteWithClustering:166->testWriteToHoodieWithCluster:298
>  expected:  but was: 2022-08-30T14:11:56.5051270Z [INFO] 
> 2022-08-30T14:11:56.5052102Z [ERROR] Tests run: 114, Failures: 1, Errors: 0, 
> Skipped: 02022-08-30T14:11:56.5052705Z [INFO] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.

2022-08-28 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
How to reproduce.
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.
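
A simplified, self-contained illustration of the expected boundary handling (not 
the actual `IncrementalInputSplits` code): when `read.start-commit` is earlier 
than the earliest instant on the timeline, the start should be clamped to the 
earliest instant so the read stays incremental, instead of falling back to a full 
table scan of the latest file slices.
{code:java}
import java.util.List;

public class IncrementalRangeSketch {

  // completedInstants: instant times on the active timeline in ascending order;
  // instant times are timestamps, so lexicographic comparison preserves their order.
  static String clampStartCommit(String startCommit, List<String> completedInstants) {
    if (completedInstants.isEmpty()) {
      return startCommit;
    }
    String earliest = completedInstants.get(0);
    // Out-of-range start commit: read from the earliest instant instead of
    // switching to a full table scan.
    return startCommit.compareTo(earliest) < 0 ? earliest : startCommit;
  }

  public static void main(String[] args) {
    List<String> instants = List.of("t1", "t2", "t3", "t4", "t5", "t6");
    // 'read.start-commit'='0' is out of range; clamping it to t1 means the range
    // 0..t3 behaves like exp3 above instead of returning nothing.
    System.out.println(clampStartCommit("0", instants)); // prints t1
  }
}
{code}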

  was:
How to reproduce.
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

When using Flink for an incremental query and `read.start-commit` is out of range, 
a full table scan should not be performed.


> Incremental input splits result is not as expected when flink incremental 
> read.
> ---
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> How to reproduce.
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 

[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.

2022-08-28 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
How to reproduce.
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

When using Flink for an incremental query and `read.start-commit` is out of range, 
a full table scan should not be performed.

  was:
When using Flink for an incremental query and `read.start-commit` is out of range, 
a full table scan should not be performed.
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

 


> Incremental input splits result is not as expected when flink incremental 
> read.
> ---
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> How to reproduce.
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> 

[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.

2022-08-28 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
When using Flink for an incremental query and `read.start-commit` is out of range, 
a full table scan should not be performed.
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

 

  was:
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

 


> When using Flink for incremental query, when `read.start-commit is out of 
> range`, full table scanning should not be performed.
> --
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> When using Flink for an incremental query and `read.start-commit` is out of 
> range, a full table scan should not be performed.
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> 

[jira] [Updated] (HUDI-4726) Incremental input splits result is not as expected when flink incremental read.

2022-08-28 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Summary: Incremental input splits result is not as expected when flink 
incremental read.  (was: When using Flink for incremental query, when 
`read.start-commit is out of range`, full table scanning should not be 
performed.)

> Incremental input splits result is not as expected when flink incremental 
> read.
> ---
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
> When using Flink for an incremental query and `read.start-commit` is out of 
> range, a full table scan should not be performed.
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
> par1])
> exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
> par1])
> exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
> par1])
> -- but 
> 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
> be like exp3.
> 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
> be (true,+I[id1, t3, par1]). 
> 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
> this is right{code}
> The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
> `startCommit` is out of range, `fullTableScan` becomes `true`, so the file that 
> is finally read is t6.parquet instead of t3.parquet.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.

2022-08-27 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Issue Type: Bug  (was: Improvement)

> When using Flink for incremental query, when `read.start-commit is out of 
> range`, full table scanning should not be performed.
> --
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
>  
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
> par1])
> exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
> par1])
> exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
> par1])
> -- but 
> 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
> be like exp3.
> 'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
> be (true,+I[id1, t3, par1]). 
> 'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
> this is right{code}
> The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
> `startCommit` is out of range, `fullTableScan` becomes `true`, so the file that 
> is finally read is t6.parquet instead of t3.parquet.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.

2022-08-27 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1'  -- (true,+I[id1, t1, 
par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, 
par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, 
par1])
-- but 
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should 
be like exp3.
'read.start-commit'='0', 'read.end-commit'='t4' -- (nothing) -- expect should 
be (true,+I[id1, t3, par1]). 
'read.start-commit'='0', 'read.end-commit'='t5' -- (true,+I[id1, t4, par1]) 
this is right{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because 
`startCommit` is out of range, `fullTableScan` becomes `true`, so the file that is 
finally read is t6.parquet instead of t3.parquet.

 

  was:
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
-- but
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.
{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file that is read in the end is t6.parquet instead of t3.parquet.

 


> When using Flink for incremental query, when `read.start-commit is out of 
> range`, full table scanning should not be performed.
> --
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
>  
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
> exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
> exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
> -- but 
> 'read.start-commit'='0', 

[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.

2022-08-26 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
-- but
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.
{code}
The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file that is read in the end is t6.parquet instead of t3.parquet.
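
As a worked illustration of the expected reads above (an assumption-labelled sketch, not Hudi's actual file-slice resolution; the helper name baseFileFor is made up): a bounded incremental read should anchor on the latest base file whose instant does not exceed 'read.end-commit', no matter how small 'read.start-commit' is, which is why the reads bounded by t3 and t4 should resolve against t3.parquet rather than t6.parquet.

{code:java}
// Hypothetical illustration only -- not the actual Hudi file-slice selection.
// Base files from the reproduction above: t3.parquet and t6.parquet.
import java.util.List;

public class BaseFileSelectionSketch {

  // Returns the latest base-file instant that is <= endCommit.
  static String baseFileFor(String endCommit, List<String> baseFileInstants) {
    String chosen = null;
    for (String instant : baseFileInstants) {      // sorted ascending: t3, t6
      if (instant.compareTo(endCommit) <= 0) {
        chosen = instant;                          // keep the latest eligible instant
      }
    }
    return chosen == null ? "(no base file)" : chosen + ".parquet";
  }

  public static void main(String[] args) {
    List<String> baseFiles = List.of("t3", "t6");
    System.out.println(baseFileFor("t3", baseFiles)); // t3.parquet -- expected for end-commit t3
    System.out.println(baseFileFor("t4", baseFiles)); // t3.parquet -- expected for end-commit t4
    System.out.println(baseFileFor("t6", baseFiles)); // t6.parquet
  }
}
{code}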

 

  was:
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
-- but
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.

-- 
The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file that is read in the end is t6.parquet instead of t3.parquet.{code}
 

 


> When using Flink for incremental query, when `read.start-commit is out of 
> range`, full table scanning should not be performed.
> --
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
>  
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
> exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
> exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
> -- but
> 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.
> {code}
> The root of the problem is `IncrementalInputSplits.inputSplits`, because 
> `startCommit` is out of range, 

[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.

2022-08-26 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-4726:

Description: 
 
{code:java}
-- create
CREATE TABLE hudi_4726(
id string,
msg string,
`partition` STRING,
PRIMARY KEY(id) NOT ENFORCED
)PARTITIONED BY (`partition`)
 WITH (
        'connector' = 'hudi',
        'write.operation'='upsert',
        'path' = 'hudi_4726',
        'index.type' = 'BUCKET',
        'hoodie.bucket.index.num.buckets' = '2', 
       'compaction.delta_commits' = '2', 
       'table.type' = 'MERGE_ON_READ', 
       'compaction.async.enabled'='true')
-- insert 
INSERT INTO hudi_4726 values ('id1','t1','par1')
INSERT INTO hudi_4726 values ('id1','t2','par1')
INSERT INTO hudi_4726 values ('id1','t3','par1')
INSERT INTO hudi_4726 values ('id1','t4','par1')
-- .hoodie
t1.deltacommit  (t1)
t2.deltacommit  (t2)
t3.commit   (t2)
t4.deltacommit  (t3)
t5.deltacommit  (t4)
t6.commit       (t4)

t3.parquet
t6.parquet
-- read

exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
-- but
'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.

-- 
The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file that is read in the end is t6.parquet instead of t3.parquet.{code}
 

 

> When using Flink for incremental query, when `read.start-commit is out of 
> range`, full table scanning should not be performed.
> --
>
> Key: HUDI-4726
> URL: https://issues.apache.org/jira/browse/HUDI-4726
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>
>  
> {code:java}
> -- create
> CREATE TABLE hudi_4726(
> id string,
> msg string,
> `partition` STRING,
> PRIMARY KEY(id) NOT ENFORCED
> )PARTITIONED BY (`partition`)
>  WITH (
>         'connector' = 'hudi',
>         'write.operation'='upsert',
>         'path' = 'hudi_4726',
>         'index.type' = 'BUCKET',
>         'hoodie.bucket.index.num.buckets' = '2', 
>        'compaction.delta_commits' = '2', 
>        'table.type' = 'MERGE_ON_READ', 
>        'compaction.async.enabled'='true')
> -- insert 
> INSERT INTO hudi_4726 values ('id1','t1','par1')
> INSERT INTO hudi_4726 values ('id1','t2','par1')
> INSERT INTO hudi_4726 values ('id1','t3','par1')
> INSERT INTO hudi_4726 values ('id1','t4','par1')
> -- .hoodie
> t1.deltacommit  (t1)
> t2.deltacommit  (t2)
> t3.commit   (t2)
> t4.deltacommit  (t3)
> t5.deltacommit  (t4)
> t6.commit       (t4)
> t3.parquet
> t6.parquet
> -- read
> exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1])
> exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1])
> exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1])
> -- but
> 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected result should be the same as exp3.
> -- 
> The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file that is read in the end is t6.parquet instead of t3.parquet.{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4600) Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2022-08-26 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter reassigned HUDI-4600:
---

Assignee: HunterXHunter

> Hive synchronization failure : Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> --
>
> Key: HUDI-4600
> URL: https://issues.apache.org/jira/browse/HUDI-4600
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Blocker
>
>  
> {code:java}
> 10:32:28.039 [pool-9-thread-1] ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught.
>   at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:164)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>   at 
>