[jira] [Created] (HUDI-7168) Add test cases for HUDI-7165

2023-12-03 Thread kwang (Jira)
kwang created HUDI-7168:
---

 Summary: Add test cases for HUDI-7165
 Key: HUDI-7168
 URL: https://issues.apache.org/jira/browse/HUDI-7168
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7165) Flink multi writer does not close the failed instant heartbeat

2023-11-30 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7165:

Summary: Flink multi writer does not close the failed instant heartbeat  (was: 
Flink multi writer not close the instant heartbeat)

> Flink multi writer does not close the failed instant heartbeat
> -
>
> Key: HUDI-7165
> URL: https://issues.apache.org/jira/browse/HUDI-7165
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: kwang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7165) Flink multi writer does not close the instant heartbeat

2023-11-30 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7165:

Summary: Flink multi writer does not close the instant heartbeat  (was: 
Coordinator restart not close the heartbeat client)

> Flink multi writer does not close the instant heartbeat
> --
>
> Key: HUDI-7165
> URL: https://issues.apache.org/jira/browse/HUDI-7165
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: kwang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7165) Coordinator restart does not close the heartbeat client

2023-11-30 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7165:

Component/s: flink

> Coordinator restart does not close the heartbeat client
> --
>
> Key: HUDI-7165
> URL: https://issues.apache.org/jira/browse/HUDI-7165
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: kwang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7165) Coordinator restart does not close the heartbeat client

2023-11-30 Thread kwang (Jira)
kwang created HUDI-7165:
---

 Summary: Coordinator restart does not close the heartbeat client
 Key: HUDI-7165
 URL: https://issues.apache.org/jira/browse/HUDI-7165
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7105) Make FileSystemViewManager configurable

2023-11-15 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7105:

Description: If there are many partitions and files when generating the clean 
plan, it's easy to throw an OOM exception. Using secondaryFileSystemView first 
is more stable than remoteFileSystemView.  (was: If there exists mang 
partitions and files When generating the clean plan, it's easy to throw oom 
exception. Using secondaryFileSystemView first is more stable than 
remoteFileSystemView.)

> Make FileSystemViewManager configurable
> -
>
> Key: HUDI-7105
> URL: https://issues.apache.org/jira/browse/HUDI-7105
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: clean
>
> If there are many partitions and files when generating the clean plan, it's 
> easy to throw an OOM exception. Using secondaryFileSystemView first is more 
> stable than remoteFileSystemView.
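
A minimal sketch of the intended usage, assuming the standard file-system-view storage keys (the exact switch this ticket adds may differ):
{code:java}
import org.apache.hudi.common.config.TypedProperties;

public class ViewConfigSketch {
  public static void main(String[] args) {
    // Prefer a locally built, spillable file-system view over the remote
    // timeline-server view when generating the clean plan, to avoid OOM.
    TypedProperties props = new TypedProperties();
    props.setProperty("hoodie.filesystem.view.type", "SPILLABLE_DISK");
    props.setProperty("hoodie.filesystem.view.spillable.dir", "/tmp/hudi-view"); // illustrative path
    System.out.println(props);
  }
}
{code}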



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7105) Make FileSystemViewManager configurable

2023-11-15 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7105:

Description: If there are many partitions and files when generating the clean 
plan, it's easy to throw an OOM exception. Using secondaryFileSystemView first 
is more stable than remoteFileSystemView.  (was: If there exists mang 
partitions and files When generating the clean plan, it's easy to throw oom 
exception. Using secondaryFileSystemView is more stable than 
remoteFileSystemView.)

> Make FileSystemViewManager configurable
> -
>
> Key: HUDI-7105
> URL: https://issues.apache.org/jira/browse/HUDI-7105
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: clean
>
> If there are many partitions and files when generating the clean plan, it's 
> easy to throw an OOM exception. Using secondaryFileSystemView first is more 
> stable than remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7105) Make FileSystemViewManager configurable

2023-11-15 Thread kwang (Jira)
kwang created HUDI-7105:
---

 Summary: Make FileSystemViewManager configurable
 Key: HUDI-7105
 URL: https://issues.apache.org/jira/browse/HUDI-7105
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang


If there are many partitions and files when generating the clean plan, it's 
easy to throw an OOM exception. Using secondaryFileSystemView is more stable 
than remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7071) Compaction/Clustering job does not fail when throwing HoodieException

2023-11-10 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7071:

Description: 
The clustering/compaction job throws the following exception; the final result 
returns -1, yet the job's state is success.
{code:java}
ERROR UtilHelpers: Cluster failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 
5) : org.apache.hudi.exception.HoodieException: unable to read next record from 
parquet file 
at 
org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)
 {code}

  was:
Clustering/Compaction job throw follow exception, the final result returns -1 
and the job's state is success.
{code:java}
ERROR UtilHelpers: Cluster failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 
5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): 
org.apache.hudi.exception.HoodieException: unable to read next record from 
parquet file 
at 
org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)
 {code}


> Compaction/Clustering job does not fail when throwing HoodieException
> -
>
> Key: HUDI-7071
> URL: https://issues.apache.org/jira/browse/HUDI-7071
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>
> The clustering/compaction job throws the following exception; the final 
> result returns -1, yet the job's state is success.
> {code:java}
> ERROR UtilHelpers: Cluster failed
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 
> (TID 5) : org.apache.hudi.exception.HoodieException: unable to read next 
> record from parquet file 
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
>   at 
> org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
>   at 
> org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
>   at 
> org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7071) Compaction/Clustering job does not fail when throwing HoodieException

2023-11-10 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7071:

Description: 
The clustering/compaction job throws the following exception; the final result 
returns -1, yet the job's state is success.
{code:java}
ERROR UtilHelpers: Cluster failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 
5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): 
org.apache.hudi.exception.HoodieException: unable to read next record from 
parquet file 
at 
org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
at 
org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)
 {code}

  was:Clustering/Compaction job throw exception ` 
org.apache.hudi.exception.HoodieException: unable to read next record from 
parquet file `, the final result returns -1 and the job's state is success.


> Compaction/Clustering job does not fail when throwing HoodieException
> -
>
> Key: HUDI-7071
> URL: https://issues.apache.org/jira/browse/HUDI-7071
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>
> The clustering/compaction job throws the following exception; the final 
> result returns -1, yet the job's state is success.
> {code:java}
> ERROR UtilHelpers: Cluster failed
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 
> (TID 5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): 
> org.apache.hudi.exception.HoodieException: unable to read next record from 
> parquet file 
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54)
>   at 
> org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
>   at 
> org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39)
>   at 
> org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7071) Compaction/Clustering job does not fail when throwing HoodieException

2023-11-10 Thread kwang (Jira)
kwang created HUDI-7071:
---

 Summary: Compaction/Clustering job does not fail when throwing 
HoodieException
 Key: HUDI-7071
 URL: https://issues.apache.org/jira/browse/HUDI-7071
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang


The clustering/compaction job throws the exception 
`org.apache.hudi.exception.HoodieException: unable to read next record from 
parquet file`; the final result returns -1, yet the job's state is success.
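
A minimal sketch of the desired behavior, not the actual patch; the helper below is a hypothetical stand-in for the job's cluster/compact call:
{code:java}
public class ExitCodeSketch {
  // Hypothetical stand-in for the table-service call that returns -1 on failure.
  static int runClustering() {
    return -1;
  }

  public static void main(String[] args) {
    int result = runClustering();
    if (result == -1) {
      // Fail the driver so the application state is FAILED instead of SUCCEEDED.
      throw new RuntimeException("Clustering failed, result code: " + result);
    }
  }
}
{code}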



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7038) RunCompactionProcedure support limit parameter

2023-11-07 Thread kwang (Jira)
kwang created HUDI-7038:
---

 Summary: RunCompactionProcedure support limit parameter
 Key: HUDI-7038
 URL: https://issues.apache.org/jira/browse/HUDI-7038
 Project: Apache Hudi
  Issue Type: Improvement
  Components: compaction
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6990) Configurable clustering task parallelism

2023-11-05 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Summary: Configurable clustering task parallelism  (was: Spark clustering 
job reads records support control the parallelism)

> Configurable clustering task parallelism
> 
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.14.1
>
> Attachments: after-subtasks.png, before-subtasks.png
>
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, 
> those files are read through Spark's parallelize method, and every file read 
> generates one sub-task. That's unreasonable.
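
A minimal sketch of the idea, assuming a Java Spark context; the cap below is illustrative, and the real change lives in the clustering execution strategy:
{code:java}
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadParallelismSketch {
  static JavaRDD<String> readFiles(JavaSparkContext jsc, List<String> filePaths) {
    // Pass an explicit slice count so the read does not fan out into one
    // sub-task per file.
    int readParallelism = Math.min(filePaths.size(), 100); // 100 is an illustrative cap
    return jsc.parallelize(filePaths, readParallelism);
  }
}
{code}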



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6991) Fix hoodie.parquet.max.file.size conf reset error

2023-11-02 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6991:

Summary: Fix hoodie.parquet.max.file.size conf reset error  (was: Fix 
parquet file size reset error in 
SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD)

> Fix hoodie.parquet.max.file.size conf reset error
> -
>
> Key: HUDI-6991
> URL: https://issues.apache.org/jira/browse/HUDI-6991
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7022) RunClusteringProcedure support limit parameter

2023-11-02 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7022:

Component/s: clustering

> RunClusteringProcedure support limit parameter
> --
>
> Key: HUDI-7022
> URL: https://issues.apache.org/jira/browse/HUDI-7022
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: clustering
>Reporter: kwang
>Priority: Major
>
> Since clustering plan generation is non-blocking, all pending clustering 
> plans will be executed at once when using `call run_clustering(table => 
> '$table', op => 'execute')` by default. Add a limit parameter to control it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7010) Reduce redundant traversals when building clustering groups

2023-11-02 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7010:

Component/s: clustering

> Reduce redundant traversals when building clustering groups
> ---
>
> Key: HUDI-7010
> URL: https://issues.apache.org/jira/browse/HUDI-7010
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7022) RunClusteringProcedure support limit parameter

2023-11-02 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7022:

Labels:   (was: clustering)

> RunClusteringProcedure support limit parameter
> --
>
> Key: HUDI-7022
> URL: https://issues.apache.org/jira/browse/HUDI-7022
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>
> Since clustering plan generation is non-blocking, all pending clustering 
> plans will be executed at once when using `call run_clustering(table => 
> '$table', op => 'execute')` by default. Add a limit parameter to control it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7022) RunClusteringProcedure support limit parameter

2023-11-02 Thread kwang (Jira)
kwang created HUDI-7022:
---

 Summary: RunClusteringProcedure support limit parameter
 Key: HUDI-7022
 URL: https://issues.apache.org/jira/browse/HUDI-7022
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang


Since clustering plan generation is non-blocking, all pending clustering plans 
will be executed at once when using `call run_clustering(table => '$table', op 
=> 'execute')` by default. Add a limit parameter to control it.
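
A minimal usage sketch, assuming the procedure gains a `limit` option (the parameter name is illustrative) and that the Hudi Spark SQL extensions are enabled:
{code:java}
import org.apache.spark.sql.SparkSession;

public class RunClusteringSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("run-clustering-sketch")
        .getOrCreate();
    // Hypothetical `limit` option: execute at most 2 pending clustering plans per call.
    spark.sql("call run_clustering(table => 'hudi_tbl', op => 'execute', limit => 2)");
  }
}
{code}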



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7014) Follow up on HUDI-6975: optimize the code of BoundedPartitionAwareCompactionStrategy

2023-10-31 Thread kwang (Jira)
kwang created HUDI-7014:
---

 Summary: Follow up on HUDI-6975: optimize the code of 
BoundedPartitionAwareCompactionStrategy
 Key: HUDI-7014
 URL: https://issues.apache.org/jira/browse/HUDI-7014
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7010) Reduce redundant traversals when building clustering groups

2023-10-30 Thread kwang (Jira)
kwang created HUDI-7010:
---

 Summary: Reduce redundant traversals when building clustering groups
 Key: HUDI-7010
 URL: https://issues.apache.org/jira/browse/HUDI-7010
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6991:

Component/s: clustering

> Fix parquet file size reset error in 
> SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
> --
>
> Key: HUDI-6991
> URL: https://issues.apache.org/jira/browse/HUDI-6991
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6990) Support controlling the read parallelism in the Spark clustering job

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Component/s: clustering

> Support controlling the read parallelism in the Spark clustering job
> --
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: clustering
>Reporter: kwang
>Priority: Major
> Attachments: after-subtasks.png, before-subtasks.png
>
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, 
> those files are read through Spark's parallelize method, and every file read 
> generates one sub-task. That's unreasonable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6991:

Summary: Fix parquet file size reset error in 
SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD  (was: Fix 
parquet max file size error in 
SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD)

> Fix parquet file size reset error in 
> SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
> --
>
> Key: HUDI-6991
> URL: https://issues.apache.org/jira/browse/HUDI-6991
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: kwang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6991) Fix parquet max file size error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD

2023-10-26 Thread kwang (Jira)
kwang created HUDI-6991:
---

 Summary: Fix parquet max file size error in 
SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
 Key: HUDI-6991
 URL: https://issues.apache.org/jira/browse/HUDI-6991
 Project: Apache Hudi
  Issue Type: Bug
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6990) Support controlling the read parallelism in the Spark clustering job

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Attachment: after-subtasks.png
before-subtasks.png

> Support controlling the read parallelism in the Spark clustering job
> --
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Attachments: after-subtasks.png, before-subtasks.png
>
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, 
> those files are read through Spark's parallelize method, and every file read 
> generates one sub-task. That's unreasonable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6990) Support controlling the read parallelism in the Spark clustering job

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Description: 
When Spark executes a clustering job, it reads the clustering plan, which 
contains multiple groups. Each group processes many base files or log files. 
When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, 
those files are read through Spark's parallelize method, and every file read 
generates one sub-task. That's unreasonable.

  was:Spark executes clustering job will read clustering plan which contains 
multiple groups. Each group process many base files or log files. When we read 
those files through spark's parallelize method, every file will generate one 
sub task. It's unreasonable.


> Support controlling the read parallelism in the Spark clustering job
> --
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, 
> those files are read through Spark's parallelize method, and every file read 
> generates one sub-task. That's unreasonable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6990) Support controlling the read parallelism in the Spark clustering job

2023-10-26 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Description: When Spark executes a clustering job, it reads the clustering 
plan, which contains multiple groups. Each group processes many base files or 
log files. When we read those files through Spark's parallelize method, every 
file generates one sub-task. That's unreasonable.

> Support controlling the read parallelism in the Spark clustering job
> --
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we read those files through Spark's parallelize method, every file 
> generates one sub-task. That's unreasonable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6990) Support controlling the read parallelism in the Spark clustering job

2023-10-26 Thread kwang (Jira)
kwang created HUDI-6990:
---

 Summary: Support controlling the read parallelism in the Spark 
clustering job
 Key: HUDI-6990
 URL: https://issues.apache.org/jira/browse/HUDI-6990
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6975) Optimize the implementation of DayBasedCompactionStrategy

2023-10-24 Thread kwang (Jira)
kwang created HUDI-6975:
---

 Summary: Optimize the implementation of DayBasedCompactionStrategy
 Key: HUDI-6975
 URL: https://issues.apache.org/jira/browse/HUDI-6975
 Project: Apache Hudi
  Issue Type: Improvement
  Components: compaction
Reporter: kwang


DayBasedCompactionStrategy#orderAndFilter is not needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-6940) OutputStream may not be closed in HoodieHeartbeatClient

2023-10-15 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang closed HUDI-6940.
---
Resolution: Invalid

> OutputStream may not be closed in HoodieHeartbeatClient
> -
>
> Key: HUDI-6940
> URL: https://issues.apache.org/jira/browse/HUDI-6940
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6940) OutputStream may not be closed in HoodieHeartbeatClient

2023-10-12 Thread kwang (Jira)
kwang created HUDI-6940:
---

 Summary: OutputStream may not be closed in HoodieHeartbeatClient
 Key: HUDI-6940
 URL: https://issues.apache.org/jira/browse/HUDI-6940
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: kwang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6937:

Fix Version/s: (was: 0.14.1)

> CopyOnWriteInsertHandler#consume will cause clustering performance degradation
> --
>
> Key: HUDI-6937
> URL: https://issues.apache.org/jira/browse/HUDI-6937
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: kwang
>Priority: Major
> Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, 
> hudi-0.14-flamegraph.png, hudi-0.14-log.png
>
>
> We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering 
> performance dropped by half. Comparing the two versions' flame graphs, we 
> found that TypedProperties object instantiation takes too much time, causing 
> the clustering performance degradation.
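
A minimal sketch of the kind of fix the flame graph points at, not the actual patch: build TypedProperties once per handler instead of once per record on the consume path:
{code:java}
import org.apache.hudi.common.config.TypedProperties;

public class ConsumeSketch {
  // Built once per handler; allocating a new TypedProperties per consumed
  // record is what dominated the 0.14 flame graph.
  private final TypedProperties payloadProps = new TypedProperties();

  public void consume(String record) {
    // Reuse payloadProps here instead of instantiating it per record.
  }
}
{code}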



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6937:

Description: We upgraded Hudi from 0.12 to 0.14, and found that the offline 
clustering performance dropped by half. We compared and analyzed this two 
versions of flame graphs, found TypedProperties object instantiation takes too 
much time. This will cause clustering 

> CopyOnWriteInsertHandler#consume will cause clustering performance degradation
> --
>
> Key: HUDI-6937
> URL: https://issues.apache.org/jira/browse/HUDI-6937
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.1
>
> Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, 
> hudi-0.14-flamegraph.png, hudi-0.14-log.png
>
>
> We upgraded Hudi from 0.12 to 0.14, and found that the offline clustering 
> performance dropped by half. We compared and analyzed this two versions of 
> flame graphs, found TypedProperties object instantiation takes too much time. 
> This will cause clustering 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6937:

Description: We upgraded Hudi from 0.12 to 0.14 and found that the offline 
clustering performance dropped by half. Comparing the two versions' flame 
graphs, we found that TypedProperties object instantiation takes too much 
time, causing the clustering performance degradation.  (was: We 
upgraded Hudi from 0.12 to 0.14, and found that the offline clustering 
performance dropped by half. We compared and analyzed this two versions of 
flame graphs, found TypedProperties object instantiation takes too much time. 
This will cause clustering )

> CopyOnWriteInsertHandler#consume will cause clustering performance degradation
> --
>
> Key: HUDI-6937
> URL: https://issues.apache.org/jira/browse/HUDI-6937
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.1
>
> Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, 
> hudi-0.14-flamegraph.png, hudi-0.14-log.png
>
>
> We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering 
> performance dropped by half. Comparing the two versions' flame graphs, we 
> found that TypedProperties object instantiation takes too much time, causing 
> the clustering performance degradation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6937:

Attachment: hudi-0.12-flamegraph.png
hudi-0.12-log.png
hudi-0.14-flamegraph.png
hudi-0.14-log.png

> CopyOnWriteInsertHandler#consume will cause clustering performance degradation
> --
>
> Key: HUDI-6937
> URL: https://issues.apache.org/jira/browse/HUDI-6937
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
> Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, 
> hudi-0.14-flamegraph.png, hudi-0.14-log.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6937:

Component/s: spark
 (was: hudi-utilities)

> CopyOnWriteInsertHandler#consume will cause clustering performance degradation
> --
>
> Key: HUDI-6937
> URL: https://issues.apache.org/jira/browse/HUDI-6937
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation

2023-10-12 Thread kwang (Jira)
kwang created HUDI-6937:
---

 Summary: CopyOnWriteInsertHandler#consume will cause clustering 
performance degradation
 Key: HUDI-6937
 URL: https://issues.apache.org/jira/browse/HUDI-6937
 Project: Apache Hudi
  Issue Type: Improvement
  Components: hudi-utilities
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback

2023-08-06 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6659:

Description: 
When MDT is enabled, Spark offline compaction throws the following exception 
while rolling back a failed inflight instant.
{code:java}
org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
20230802174236306 is earlier than the latest compaction 20230803201423881001. 
There are 3 deltacommits after this compaction: 
[[20230803201545303__deltacommit__COMPLETED__20230803201721395], 
[20230803201729007__deltacommit__COMPLETED__20230803201848687], 
[20230803201852499__deltacommit__COMPLETED__20230803202010862]]
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002)
    at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139)
    at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623)
    at 
org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80)
    at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307)
    at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034)
    at 
org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) 
{code}

  was:
Spark offline compaction rollbacks failed inflight instant throw this error 
info when mdt enabled.
{code:java}
org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
20230802174236306 is earlier than the latest compaction 20230803201423881001. 
There are 3 deltacommits after this compaction: 
[[20230803201545303__deltacommit__COMPLETED__20230803201721395], 
[20230803201729007__deltacommit__COMPLETED__20230803201848687], 
[20230803201852499__deltacommit__COMPLETED__20230803202010862]]
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002)
    at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139)
    at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623)
    at 
org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80)
    at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307)
    at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034)
    at 
org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) 
{code}


> Remove the validateRollback restriction for Spark/Flink MDT rollback
> 
>
> Key: HUDI-6659
> URL: https://issues.apache.org/jira/browse/HUDI-6659
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> When MDT is enabled, Spark offline compaction throws the following exception 
> while rolling back a failed inflight instant.
> {code:java}
> org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
> 20230802174236306 is earlier than the latest 

[jira] [Updated] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback

2023-08-06 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6659:

Description: 
When MDT is enabled, Spark offline compaction throws the following error 
while rolling back a failed inflight instant.
{code:java}
org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
20230802174236306 is earlier than the latest compaction 20230803201423881001. 
There are 3 deltacommits after this compaction: 
[[20230803201545303__deltacommit__COMPLETED__20230803201721395], 
[20230803201729007__deltacommit__COMPLETED__20230803201848687], 
[20230803201852499__deltacommit__COMPLETED__20230803202010862]]
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002)
    at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139)
    at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623)
    at 
org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80)
    at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307)
    at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034)
    at 
org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) 
{code}

  was:
Spark offline compaction rollback failed inflight instant throw next error info 
when mdt enabled.
{code:java}
org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
20230802174236306 is earlier than the latest compaction 20230803201423881001. 
There are 3 deltacommits after this compaction: 
[[20230803201545303__deltacommit__COMPLETED__20230803201721395], 
[20230803201729007__deltacommit__COMPLETED__20230803201848687], 
[20230803201852499__deltacommit__COMPLETED__20230803202010862]]
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002)
    at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139)
    at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623)
    at 
org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80)
    at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307)
    at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034)
    at 
org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) 
{code}


> Remove the validateRollback restriction for Spark/Flink MDT rollback
> 
>
> Key: HUDI-6659
> URL: https://issues.apache.org/jira/browse/HUDI-6659
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> When MDT is enabled, Spark offline compaction throws the following error 
> while rolling back a failed inflight instant.
> {code:java}
> org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
> 20230802174236306 is earlier than the latest compaction 

[jira] [Created] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback

2023-08-06 Thread kwang (Jira)
kwang created HUDI-6659:
---

 Summary: Remove the validateRollback restriction for Spark/Flink 
MDT rollback
 Key: HUDI-6659
 URL: https://issues.apache.org/jira/browse/HUDI-6659
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0


When MDT is enabled, Spark offline compaction throws the following error while 
rolling back a failed inflight instant.
{code:java}
org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 
20230802174236306 is earlier than the latest compaction 20230803201423881001. 
There are 3 deltacommits after this compaction: 
[[20230803201545303__deltacommit__COMPLETED__20230803201721395], 
[20230803201729007__deltacommit__COMPLETED__20230803201848687], 
[20230803201852499__deltacommit__COMPLETED__20230803202010862]]
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002)
    at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118)
    at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139)
    at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650)
    at 
org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623)
    at 
org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80)
    at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307)
    at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034)
    at 
org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) 
{code}
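
A paraphrase of the restriction being removed, assuming only that instant times are lexicographically comparable timestamps (this models the check; it is not the real validateRollback code):
{code:java}
public class ValidateRollbackSketch {
  public static void main(String[] args) {
    String commitToRollback = "20230802174236306";
    String latestMdtCompaction = "20230803201423881001";
    // MDT currently rejects the rollback because the commit predates its
    // latest compaction; this is the condition the ticket removes.
    boolean rejected = commitToRollback.compareTo(latestMdtCompaction) < 0;
    System.out.println("rollback rejected: " + rejected); // prints true
  }
}
{code}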



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-6605) Add compaction/logcompaction writestatus error check and advance it

2023-08-01 Thread kwang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749477#comment-17749477
 ] 

kwang commented on HUDI-6605:
-

Hi [~danny0405], you merged this incorrectly; this pull request is linked to HUDI-6604.

> Add compaction/logcompaction writestatus error check and advance it
> 
>
> Key: HUDI-6605
> URL: https://issues.apache.org/jira/browse/HUDI-6605
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6605) Add compaction/logcompaction writestatus error check and advance it

2023-07-28 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6605:

Summary: Add compaction/logcompaction writestatus error check and advance 
it  (was: Add compaction/logcompaction writestatus errors checking and advance 
it to Inflight transition)

> Add compaction/logcompaction writestatus error check and advance it
> 
>
> Key: HUDI-6605
> URL: https://issues.apache.org/jira/browse/HUDI-6605
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6605) Add compaction/logcompaction writestatus error checking and advance it to Inflight transition

2023-07-28 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6605:

Summary: Add compaction/logcompaction writestatus error checking and 
advance it to Inflight transition  (was: Compaction writestatus errors checking 
should come before transitionReplaceInflightToComplete)

> Add compaction/logcompaction writestatus error checking and advance it to 
> Inflight transition
> --
>
> Key: HUDI-6605
> URL: https://issues.apache.org/jira/browse/HUDI-6605
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6605) Compaction writestatus error checking should come before transitionReplaceInflightToComplete

2023-07-28 Thread kwang (Jira)
kwang created HUDI-6605:
---

 Summary: Compaction writestatus error checking should come before 
transitionReplaceInflightToComplete
 Key: HUDI-6605
 URL: https://issues.apache.org/jira/browse/HUDI-6605
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6565) Add failure retry mechanism for Spark offline compaction

2023-07-19 Thread kwang (Jira)
kwang created HUDI-6565:
---

 Summary: Add failure retry mechanism for Spark offline compaction
 Key: HUDI-6565
 URL: https://issues.apache.org/jira/browse/HUDI-6565
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6482) Support new compaction strategy DayBasedAndBoundedIOCompactionStrategy

2023-07-05 Thread kwang (Jira)
kwang created HUDI-6482:
---

 Summary: Support new compaction strategy 
DayBasedAndBoundedIOCompactionStrategy
 Key: HUDI-6482
 URL: https://issues.apache.org/jira/browse/HUDI-6482
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0


When the accumulated traffic is too large, the newly generated compaction plan 
will handle too much data if we use DayBasedCompactionStrategy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6458) Scheduling jobs should not fail when there are no completed commits

2023-06-29 Thread kwang (Jira)
kwang created HUDI-6458:
---

 Summary: Scheduling jobs should not fail when there are no 
completed commits
 Key: HUDI-6458
 URL: https://issues.apache.org/jira/browse/HUDI-6458
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned

2023-06-29 Thread kwang (Jira)
kwang created HUDI-6457:
---

 Summary: Keep JavaSizeBasedClusteringPlanStrategy and 
SparkSizeBasedClusteringPlanStrategy aligned
 Key: HUDI-6457
 URL: https://issues.apache.org/jira/browse/HUDI-6457
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6396) Flink supports scheduling the clustering in batch execution mode and code refactor

2023-06-17 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6396:

Summary: Flink supports scheduling the clustering in batch execution mode and 
code refactor  (was: Flink supports schedule the clustering plan in batch 
execution mode and code refactor)

> Flink supports scheduling the clustering in batch execution mode and code 
> refactor
> 
>
> Key: HUDI-6396
> URL: https://issues.apache.org/jira/browse/HUDI-6396
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> Flink currently only supports scheduling the compaction plan, but not the 
> clustering plan, in batch execution mode.
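
On the write-client side, batch-mode scheduling boils down to something like the following sketch (names assumed; the Flink pipeline wiring is omitted):
{code:java}
import org.apache.hudi.client.HoodieFlinkWriteClient;
import org.apache.hudi.common.util.Option;

public class ScheduleClusteringSketch {
  static void scheduleClustering(HoodieFlinkWriteClient<?> writeClient) {
    // Ask the write client for a clustering plan, mirroring what batch mode
    // already does for compaction.
    Option<String> instant = writeClient.scheduleClustering(Option.empty());
    instant.ifPresent(t -> System.out.println("Scheduled clustering at " + t));
  }
}
{code}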



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode and code refactor

2023-06-17 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6396:

Summary: Flink supports scheduling the clustering plan in batch execution 
mode and code refactor  (was: Flink supports schedule the clustering plan in 
batch execution mode)

> Flink supports scheduling the clustering plan in batch execution mode and 
> code refactor
> -
>
> Key: HUDI-6396
> URL: https://issues.apache.org/jira/browse/HUDI-6396
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> Flink currently only supports scheduling the compaction plan, but not the 
> clustering plan, in batch execution mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode

2023-06-16 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6396:

Summary: Flink supports scheduling the clustering plan in batch execution 
mode  (was: Flink support schedule the clustering plan in batch execution mode)

> Flink supports scheduling the clustering plan in batch execution mode
> ---
>
> Key: HUDI-6396
> URL: https://issues.apache.org/jira/browse/HUDI-6396
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> Flink currently only supports scheduling the compaction plan, but not the 
> clustering plan, in batch execution mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode

2023-06-16 Thread kwang (Jira)
kwang created HUDI-6396:
---

 Summary: Flink supports scheduling the clustering plan in batch 
execution mode
 Key: HUDI-6396
 URL: https://issues.apache.org/jira/browse/HUDI-6396
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0


Flink currently only supports scheduling the compaction plan, but not the 
clustering plan, in batch execution mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6395) Scheduling jobs should not fail when there is no scheduled compaction or clustering plan

2023-06-16 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6395:

Summary: Scheduling jobs should not fail when there is no scheduled 
compaction or clustering plan  (was: Schedule jobs should not fail when there 
is no scheduled compaction or clustering plan)

> Scheduling jobs should not fail when there is no scheduled compaction or 
> clustering plan
> 
>
> Key: HUDI-6395
> URL: https://issues.apache.org/jira/browse/HUDI-6395
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> We use --mode to execute the compaction or clustering job; the job should 
> not fail when there is no scheduled compaction or clustering plan.
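
A minimal sketch of the desired behavior; the lookup helper is a hypothetical stand-in for the job's pending-plan query:
{code:java}
import org.apache.hudi.common.util.Option;

public class NoPlanSketch {
  // Hypothetical stand-in for the pending-plan lookup in the compaction/clustering job.
  static Option<String> earliestPendingInstant() {
    return Option.empty();
  }

  public static void main(String[] args) {
    Option<String> instant = earliestPendingInstant();
    if (!instant.isPresent()) {
      // Exit normally instead of failing when there is nothing to execute.
      System.out.println("No scheduled compaction/clustering plan; exiting with success.");
      return;
    }
    System.out.println("Executing plan at instant " + instant.get());
  }
}
{code}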



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering plan

2023-06-16 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6395:

Description: We use --mode to execute the compaction or clustering job; the 
job should not fail when there is no scheduled compaction or clustering plan.

> Schedule jobs should not fail when there is no scheduled compaction or 
> clustering plan
> --
>
> Key: HUDI-6395
> URL: https://issues.apache.org/jira/browse/HUDI-6395
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>
> We use --mode to execute the compaction or clustering job; the job should 
> not fail when there is no scheduled compaction or clustering plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering in the table

2023-06-16 Thread kwang (Jira)
kwang created HUDI-6395:
---

 Summary: Schedule jobs should not fail when there is no scheduled 
compaction or clustering in the table
 Key: HUDI-6395
 URL: https://issues.apache.org/jira/browse/HUDI-6395
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering plan

2023-06-16 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6395:

Summary: Schedule jobs should not fail when there is no scheduled 
compaction or clustering plan  (was: Schedule jobs should not fail when there 
is no scheduled compaction or clustering in the table)

> Schedule jobs should not fail when there is no scheduled compaction or 
> clustering plan
> --
>
> Key: HUDI-6395
> URL: https://issues.apache.org/jira/browse/HUDI-6395
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6360) spark-memory parameter should not be required in compactor or clustering job

2023-06-14 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6360:

Summary: spark-memory parameter should not be required in compactor or 
clustering job  (was: spark-memory parmater should not be forced to be true)

> spark-memory parameter should not be required in compactor or clustering job
> 
>
> Key: HUDI-6360
> URL: https://issues.apache.org/jira/browse/HUDI-6360
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When using Spark to submit a compaction or clustering job, the spark-memory 
> parameter should not be required, because Spark's executor-memory parameter 
> and Hudi's spark-memory parameter will conflict.
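
A minimal sketch of one way to resolve the conflict, falling back to Spark's own executor memory when --spark-memory is absent (names illustrative):
{code:java}
import org.apache.spark.api.java.JavaSparkContext;

public class SparkMemorySketch {
  static String resolveSparkMemory(JavaSparkContext jsc, String sparkMemoryArg) {
    // If --spark-memory was not passed, fall back to spark.executor.memory
    // instead of requiring the user to set both and risk a conflict.
    return sparkMemoryArg != null
        ? sparkMemoryArg
        : jsc.getConf().get("spark.executor.memory", "1g");
  }
}
{code}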



--
This message was sent by Atlassian Jira
(v8.20.10#820010)