[jira] [Created] (HUDI-7168) Add test cases for HUDI-7165
kwang created HUDI-7168: --- Summary: Add test cases for HUDI-7165 Key: HUDI-7168 URL: https://issues.apache.org/jira/browse/HUDI-7168 Project: Apache Hudi Issue Type: Improvement Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7165) Flink multi-writer does not close the failed instant's heartbeat
[ https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7165: Summary: Flink multi-writer does not close the failed instant's heartbeat (was: Flink multi writer not close the instant heartbeat) > Flink multi-writer does not close the failed instant's heartbeat > - > > Key: HUDI-7165 > URL: https://issues.apache.org/jira/browse/HUDI-7165 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: kwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
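A minimal sketch of the behavior this ticket asks for: when an instant fails, stop its heartbeat so other writers no longer see it as live. The HeartbeatClient interface and WriteCoordinator class below are hypothetical stand-ins; only org.apache.hudi.client.heartbeat.HoodieHeartbeatClient is the real class, and its exact method names should be verified against the release in use.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Hudi's heartbeat API; method names are assumptions,
// not the verified HoodieHeartbeatClient surface.
interface HeartbeatClient {
  void start(String instantTime);
  void stop(String instantTime); // stop and clean up the heartbeat of one instant
}

class WriteCoordinator {
  private final HeartbeatClient heartbeats;
  private final Map<String, Boolean> inflight = new ConcurrentHashMap<>();

  WriteCoordinator(HeartbeatClient heartbeats) {
    this.heartbeats = heartbeats;
  }

  void onInstantFailed(String instantTime) {
    // The point of HUDI-7165: stop the heartbeat of the *failed* instant too,
    // so concurrent writers do not keep treating it as a live writer.
    inflight.remove(instantTime);
    heartbeats.stop(instantTime);
  }
}
{code}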
[jira] [Updated] (HUDI-7165) Flink multi-writer does not close the instant heartbeat
[ https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7165: Summary: Flink multi-writer does not close the instant heartbeat (was: Coordinator restart not close the heartbeat client) > Flink multi-writer does not close the instant heartbeat > -- > > Key: HUDI-7165 > URL: https://issues.apache.org/jira/browse/HUDI-7165 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: kwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7165) Coordinator restart does not close the heartbeat client
[ https://issues.apache.org/jira/browse/HUDI-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7165: Component/s: flink > Coordinator restart does not close the heartbeat client > -- > > Key: HUDI-7165 > URL: https://issues.apache.org/jira/browse/HUDI-7165 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: kwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7165) Coordinator restart does not close the heartbeat client
kwang created HUDI-7165: --- Summary: Coordinator restart does not close the heartbeat client Key: HUDI-7165 URL: https://issues.apache.org/jira/browse/HUDI-7165 Project: Apache Hudi Issue Type: Improvement Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7105) Make FileSystemViewManager configurable
[ https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7105: Description: If there are many partitions and files when generating the clean plan, it is easy to throw an OOM exception. Using secondaryFileSystemView first is more stable than remoteFileSystemView. (was: If there exists mang partitions and files When generating the clean plan, it's easy to throw oom exception. Using secondaryFileSystemView first is more stable than remoteFileSystemView.) > Make FileSystemViewManager configurable > - > > Key: HUDI-7105 > URL: https://issues.apache.org/jira/browse/HUDI-7105 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Labels: clean > > If there are many partitions and files when generating the clean plan, > it is easy to throw an OOM exception. Using secondaryFileSystemView first is more > stable than remoteFileSystemView. -- This message was sent by Atlassian Jira (v8.20.10#820010)
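A sketch of the kind of configuration this ticket wants clean planning to honor. The hoodie.filesystem.view.type and hoodie.filesystem.view.spillable.dir keys come from Hudi's FileSystemViewStorageConfig; whether FileSystemViewManager respects them on the clean-planning path is exactly what the ticket proposes, so treat the wiring as an assumption.
{code:java}
import java.util.Properties;

public class CleanPlanViewConfig {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Prefer a disk-spillable (secondary) view so a huge partition/file listing
    // is not materialized in memory. Keys mirror FileSystemViewStorageConfig;
    // making clean planning pick them up is the proposal, not current behavior.
    props.setProperty("hoodie.filesystem.view.type", "SPILLABLE_DISK");
    props.setProperty("hoodie.filesystem.view.spillable.dir", "/tmp/hudi-view");
    System.out.println(props);
  }
}
{code}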
[jira] [Updated] (HUDI-7105) Make FileSystemViewManager configurable
[ https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7105: Description: If there are many partitions and files when generating the clean plan, it is easy to throw an OOM exception. Using secondaryFileSystemView first is more stable than remoteFileSystemView. (was: If there exists mang partitions and files When generating the clean plan, it's easy to throw oom exception. Using secondaryFileSystemView is more stable than remoteFileSystemView.) > Make FileSystemViewManager configurable > - > > Key: HUDI-7105 > URL: https://issues.apache.org/jira/browse/HUDI-7105 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Labels: clean > > If there are many partitions and files when generating the clean plan, > it is easy to throw an OOM exception. Using secondaryFileSystemView first is more > stable than remoteFileSystemView. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7105) Make FileSystemViewManager configurable
kwang created HUDI-7105: --- Summary: Make FileSystemViewManager configurable Key: HUDI-7105 URL: https://issues.apache.org/jira/browse/HUDI-7105 Project: Apache Hudi Issue Type: Improvement Reporter: kwang If there are many partitions and files when generating the clean plan, it is easy to throw an OOM exception. Using secondaryFileSystemView is more stable than remoteFileSystemView. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7071) Compaction/Clustering job does not fail when HoodieException is thrown
[ https://issues.apache.org/jira/browse/HUDI-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7071: Description: The clustering/compaction job throws the following exception, the final result returns -1, and the job's state is success. {code:java} ERROR UtilHelpers: Cluster failed org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 5) : org.apache.hudi.exception.HoodieException: unable to read next record from parquet file at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) {code} was: Clustering/Compaction job throw follow exception, the final result returns -1 and the job's state is success. {code:java} ERROR UtilHelpers: Cluster failed org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): org.apache.hudi.exception.HoodieException: unable to read next record from parquet file at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) {code} > Compaction/Clustering job does not fail when HoodieException is thrown > - > > Key: HUDI-7071 > URL: https://issues.apache.org/jira/browse/HUDI-7071 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > > The clustering/compaction job throws the following exception, the final result > returns -1, and the job's state is success. > {code:java} > ERROR UtilHelpers: Cluster failed > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 > (TID 5) : org.apache.hudi.exception.HoodieException: unable to read next > record from parquet file > at > org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) > at > org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) > at > org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) > at > org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
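A minimal sketch of the fix direction, assuming the driver currently swallows the -1 result: the entry point should turn a non-zero result into a real failure so the application state is not reported as success. The cluster() method below is a stand-in for the actual UtilHelpers/HoodieClusteringJob call; only the exit-code handling is the point.
{code:java}
public class ClusteringMain {
  static int cluster() {
    // Placeholder: pretend the Spark clustering job ran and failed.
    // In the real job this is the UtilHelpers-driven execution, returning
    // 0 on success and -1 on failure.
    return -1;
  }

  public static void main(String[] args) {
    int result = cluster();
    if (result != 0) {
      // Without this, the YARN/K8s application stays SUCCEEDED even though
      // the clustering/compaction itself failed.
      throw new RuntimeException("Clustering failed with result " + result);
    }
  }
}
{code}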
[jira] [Updated] (HUDI-7071) Compaction/Clustering job does not fail when HoodieException is thrown
[ https://issues.apache.org/jira/browse/HUDI-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7071: Description: The clustering/compaction job throws the following exception, the final result returns -1, and the job's state is success. {code:java} ERROR UtilHelpers: Cluster failed org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 (TID 5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): org.apache.hudi.exception.HoodieException: unable to read next record from parquet file at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) {code} was: Clustering/Compaction job throw exception ` org.apache.hudi.exception.HoodieException: unable to read next record from parquet file `, the final result returns -1 and the job's state is success. > Compaction/Clustering job does not fail when HoodieException is thrown > - > > Key: HUDI-7071 > URL: https://issues.apache.org/jira/browse/HUDI-7071 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > > The clustering/compaction job throws the following exception, the final result > returns -1, and the job's state is success. > {code:java} > ERROR UtilHelpers: Cluster failed > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 2 times, most recent failure: Lost task 0.1 in stage 4.0 > (TID 5) (lenghong-slave-prd-10-197-3-139.v-bj-5.vivo.lan executor 1): > org.apache.hudi.exception.HoodieException: unable to read next record from > parquet file > at > org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) > at > org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) > at > org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) > at > org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:45) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7071) Compaction/Clustering job does not fail when HoodieException is thrown
kwang created HUDI-7071: --- Summary: Compaction/Clustering job does not fail when HoodieException is thrown Key: HUDI-7071 URL: https://issues.apache.org/jira/browse/HUDI-7071 Project: Apache Hudi Issue Type: Improvement Reporter: kwang The clustering/compaction job throws the exception `org.apache.hudi.exception.HoodieException: unable to read next record from parquet file`, the final result returns -1, and the job's state is success. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7038) RunCompactionProcedure should support a limit parameter
kwang created HUDI-7038: --- Summary: RunCompactionProcedure should support a limit parameter Key: HUDI-7038 URL: https://issues.apache.org/jira/browse/HUDI-7038 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
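A sketch of the proposed usage; the limit option is the new parameter this ticket asks for, so its name and semantics are assumptions until the PR lands. The rest of the call mirrors the procedure syntax quoted in HUDI-7022 below.
{code:java}
public class RunCompactionLimit {
  public static void main(String[] args) {
    // `limit` is the parameter proposed by this ticket (name is an assumption
    // until merged); op/table mirror the existing procedure syntax.
    String sql = "CALL run_compaction(op => 'execute', table => 'hudi_tbl', limit => 2)";
    System.out.println(sql); // run via spark.sql(sql) in a real session
  }
}
{code}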
[jira] [Updated] (HUDI-6990) Configurable clustering task parallelism
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6990: Summary: Configurable clustering task parallelism (was: Spark clustering job reads records support control the parallelism) > Configurable clustering task parallelism > > > Key: HUDI-6990 > URL: https://issues.apache.org/jira/browse/HUDI-6990 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: kwang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 0.14.1 > > Attachments: after-subtasks.png, before-subtasks.png > > > A Spark clustering job reads the clustering plan, which contains multiple > groups, and each group processes many base files or log files. When we configure > the param `hoodie.clustering.plan.strategy.sort.columns`, we read those files > through Spark's parallelize method, and every file read generates one sub task. > This is unreasonable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
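A sketch of the fix direction: cap the parallelize() call with a configurable upper bound instead of one Spark sub task per file. The helper and the configuredMax value are illustrative, not the actual config key the PR introduces.
{code:java}
import java.util.List;

class ReadParallelism {
  // before: jsc.parallelize(filesToRead, filesToRead.size())  -> one sub task per file
  // after:  jsc.parallelize(filesToRead, resolve(filesToRead, configuredMax))
  static int resolve(List<String> filesToRead, int configuredMax) {
    // Cap the slice count so a plan with thousands of files does not explode
    // into thousands of sub tasks; always at least 1 slice.
    return Math.max(1, Math.min(filesToRead.size(), configuredMax));
  }
}
{code}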
[jira] [Updated] (HUDI-6991) Fix hoodie.parquet.max.file.size conf reset error
[ https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6991: Summary: Fix hoodie.parquet.max.file.size conf reset error (was: Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD) > Fix hoodie.parquet.max.file.size conf reset error > - > > Key: HUDI-6991 > URL: https://issues.apache.org/jira/browse/HUDI-6991 > Project: Apache Hudi > Issue Type: Bug > Components: clustering >Reporter: kwang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7022) RunClusteringProcedure should support a limit parameter
[ https://issues.apache.org/jira/browse/HUDI-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7022: Component/s: clustering > RunClusteringProcedure should support a limit parameter > -- > > Key: HUDI-7022 > URL: https://issues.apache.org/jira/browse/HUDI-7022 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: kwang >Priority: Major > > Since clustering plan generation is non-blocking, all pending clustering > plans will be executed at once when using `call run_clustering(table => > '$table', op => 'execute')` by default. Add a limit parameter to control it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
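A sketch of the proposed usage, mirroring the call quoted in the description; the limit parameter is the addition proposed here, and its exact name is an assumption until the merged procedure code confirms it.
{code:java}
public class RunClusteringLimit {
  public static void main(String[] args) {
    // Without a limit, op => 'execute' runs every pending clustering plan at
    // once; `limit` (name per this ticket, unverified) bounds that.
    String sql = "CALL run_clustering(table => 'hudi_tbl', op => 'execute', limit => 1)";
    System.out.println(sql); // run via spark.sql(sql) in a real session
  }
}
{code}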
[jira] [Updated] (HUDI-7010) Reduce redundant traversals when building clustering groups
[ https://issues.apache.org/jira/browse/HUDI-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7010: Component/s: clustering > Reduce redundant traversals when building clustering groups > --- > > Key: HUDI-7010 > URL: https://issues.apache.org/jira/browse/HUDI-7010 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: kwang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7022) RunClusteringProcedure should support a limit parameter
[ https://issues.apache.org/jira/browse/HUDI-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-7022: Labels: (was: clustering) > RunClusteringProcedure should support a limit parameter > -- > > Key: HUDI-7022 > URL: https://issues.apache.org/jira/browse/HUDI-7022 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > > Since clustering plan generation is non-blocking, all pending clustering > plans will be executed at once when using `call run_clustering(table => > '$table', op => 'execute')` by default. Add a limit parameter to control it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7022) RunClusteringProcedure should support a limit parameter
kwang created HUDI-7022: --- Summary: RunClusteringProcedure should support a limit parameter Key: HUDI-7022 URL: https://issues.apache.org/jira/browse/HUDI-7022 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Since clustering plan generation is non-blocking, all pending clustering plans will be executed at once when using `call run_clustering(table => '$table', op => 'execute')` by default. Add a limit parameter to control it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7014) Follow up on HUDI-6975: optimize the code of BoundedPartitionAwareCompactionStrategy
kwang created HUDI-7014: --- Summary: Follow up on HUDI-6975: optimize the code of BoundedPartitionAwareCompactionStrategy Key: HUDI-7014 URL: https://issues.apache.org/jira/browse/HUDI-7014 Project: Apache Hudi Issue Type: Improvement Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7010) Reduce redundant traversals when building clustering groups
kwang created HUDI-7010: --- Summary: Reduce redundant traversals when building clustering groups Key: HUDI-7010 URL: https://issues.apache.org/jira/browse/HUDI-7010 Project: Apache Hudi Issue Type: Improvement Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
[ https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6991: Component/s: clustering > Fix parquet file size reset error in > SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD > -- > > Key: HUDI-6991 > URL: https://issues.apache.org/jira/browse/HUDI-6991 > Project: Apache Hudi > Issue Type: Bug > Components: clustering >Reporter: kwang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6990) Support controlling the read parallelism of Spark clustering jobs
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6990: Component/s: clustering > Support controlling the read parallelism of Spark clustering jobs > -- > > Key: HUDI-6990 > URL: https://issues.apache.org/jira/browse/HUDI-6990 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: kwang >Priority: Major > Attachments: after-subtasks.png, before-subtasks.png > > > A Spark clustering job reads the clustering plan, which contains multiple > groups, and each group processes many base files or log files. When we configure > the param `hoodie.clustering.plan.strategy.sort.columns`, we read those files > through Spark's parallelize method, and every file read generates one sub task. > This is unreasonable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6991) Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
[ https://issues.apache.org/jira/browse/HUDI-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6991: Summary: Fix parquet file size reset error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD (was: Fix parquet max file size error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD) > Fix parquet file size reset error in > SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD > -- > > Key: HUDI-6991 > URL: https://issues.apache.org/jira/browse/HUDI-6991 > Project: Apache Hudi > Issue Type: Bug >Reporter: kwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6991) Fix parquet max file size error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD
kwang created HUDI-6991: --- Summary: Fix parquet max file size error in SparkSortAndSizeExecutionStrategy#performClusteringWithRecordsRDD Key: HUDI-6991 URL: https://issues.apache.org/jira/browse/HUDI-6991 Project: Apache Hudi Issue Type: Bug Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6990) Support controlling the read parallelism of Spark clustering jobs
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6990: Attachment: after-subtasks.png before-subtasks.png > Support controlling the read parallelism of Spark clustering jobs > -- > > Key: HUDI-6990 > URL: https://issues.apache.org/jira/browse/HUDI-6990 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Attachments: after-subtasks.png, before-subtasks.png > > > A Spark clustering job reads the clustering plan, which contains multiple > groups, and each group processes many base files or log files. When we configure > the param `hoodie.clustering.plan.strategy.sort.columns`, we read those files > through Spark's parallelize method, and every file read generates one sub task. > This is unreasonable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6990) Support controlling the read parallelism of Spark clustering jobs
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6990: Description: A Spark clustering job reads the clustering plan, which contains multiple groups, and each group processes many base files or log files. When we configure the param `hoodie.clustering.plan.strategy.sort.columns`, we read those files through Spark's parallelize method, and every file read generates one sub task. This is unreasonable. was: Spark executes clustering job will read clustering plan which contains multiple groups. Each group process many base files or log files. When we read those files through spark's parallelize method, every file will generate one sub task. It's unreasonable. > Support controlling the read parallelism of Spark clustering jobs > -- > > Key: HUDI-6990 > URL: https://issues.apache.org/jira/browse/HUDI-6990 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > > A Spark clustering job reads the clustering plan, which contains multiple > groups, and each group processes many base files or log files. When we configure > the param `hoodie.clustering.plan.strategy.sort.columns`, we read those files > through Spark's parallelize method, and every file read generates one sub task. > This is unreasonable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6990) Support controlling the read parallelism of Spark clustering jobs
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6990: Description: A Spark clustering job reads the clustering plan, which contains multiple groups, and each group processes many base files or log files. When we read those files through Spark's parallelize method, every file generates one sub task. This is unreasonable. > Support controlling the read parallelism of Spark clustering jobs > -- > > Key: HUDI-6990 > URL: https://issues.apache.org/jira/browse/HUDI-6990 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > > A Spark clustering job reads the clustering plan, which contains multiple > groups, and each group processes many base files or log files. When we read > those files through Spark's parallelize method, every file generates one sub > task. This is unreasonable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6990) Support controlling the read parallelism of Spark clustering jobs
kwang created HUDI-6990: --- Summary: Support controlling the read parallelism of Spark clustering jobs Key: HUDI-6990 URL: https://issues.apache.org/jira/browse/HUDI-6990 Project: Apache Hudi Issue Type: Improvement Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6975) Optimize the implementation of DayBasedCompactionStrategy
kwang created HUDI-6975: --- Summary: Optimize the implementation of DayBasedCompactionStrategy Key: HUDI-6975 URL: https://issues.apache.org/jira/browse/HUDI-6975 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: kwang DayBasedCompactionStrategy#orderAndFilter is unnecessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-6940) OutputStream may not be closed in HoodieHeartbeatClient
[ https://issues.apache.org/jira/browse/HUDI-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang closed HUDI-6940. --- Resolution: Invalid > OutputStream may not be closed in HoodieHeartbeatClient > - > > Key: HUDI-6940 > URL: https://issues.apache.org/jira/browse/HUDI-6940 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: kwang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6940) OutputStream may not be closed in HoodieHeartbeatClient
kwang created HUDI-6940: --- Summary: OutputStream may not be closed in HoodieHeartbeatClient Key: HUDI-6940 URL: https://issues.apache.org/jira/browse/HUDI-6940 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: kwang -- This message was sent by Atlassian Jira (v8.20.10#820010)
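Although the ticket was closed as Invalid, the pattern it questioned is worth showing: write the heartbeat file with try-with-resources so the stream is closed on the exceptional path too. Hadoop's FileSystem API is real; the heartbeat path layout is illustrative.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HeartbeatWriter {
  static void updateHeartbeat(FileSystem fs, String basePath, String instantTime)
      throws IOException {
    // Illustrative path; the actual heartbeat folder layout belongs to Hudi.
    Path heartbeatFile = new Path(basePath + "/.hoodie/.heartbeat/" + instantTime);
    // try-with-resources closes the stream on both the normal and the
    // exceptional path, which is the concern this ticket raised.
    try (FSDataOutputStream out = fs.create(heartbeatFile, true)) {
      out.writeLong(System.currentTimeMillis());
    }
  }
}
{code}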
[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
[ https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6937: Fix Version/s: (was: 0.14.1) > CopyOnWriteInsertHandler#consume will cause clustering performance degradation > -- > > Key: HUDI-6937 > URL: https://issues.apache.org/jira/browse/HUDI-6937 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: kwang >Priority: Major > Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, > hudi-0.14-flamegraph.png, hudi-0.14-log.png > > > We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering > performance dropped by half. We compared and analyzed the two versions' flame > graphs and found that TypedProperties object instantiation takes too much time. > This will cause clustering performance degradation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
[ https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6937: Description: We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering performance dropped by half. We compared and analyzed the two versions' flame graphs and found that TypedProperties object instantiation takes too much time. This will cause clustering > CopyOnWriteInsertHandler#consume will cause clustering performance degradation > -- > > Key: HUDI-6937 > URL: https://issues.apache.org/jira/browse/HUDI-6937 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: kwang >Priority: Major > Fix For: 0.14.1 > > Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, > hudi-0.14-flamegraph.png, hudi-0.14-log.png > > > We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering > performance dropped by half. We compared and analyzed the two versions' flame > graphs and found that TypedProperties object instantiation takes too much time. > This will cause clustering -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
[ https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6937: Description: We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering performance dropped by half. We compared and analyzed the two versions' flame graphs and found that TypedProperties object instantiation takes too much time. This will cause clustering performance degradation. (was: We upgraded Hudi from 0.12 to 0.14, and found that the offline clustering performance dropped by half. We compared and analyzed this two versions of flame graphs, found TypedProperties object instantiation takes too much time. This will cause clustering ) > CopyOnWriteInsertHandler#consume will cause clustering performance degradation > -- > > Key: HUDI-6937 > URL: https://issues.apache.org/jira/browse/HUDI-6937 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: kwang >Priority: Major > Fix For: 0.14.1 > > Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, > hudi-0.14-flamegraph.png, hudi-0.14-log.png > > > We upgraded Hudi from 0.12 to 0.14 and found that the offline clustering > performance dropped by half. We compared and analyzed the two versions' flame > graphs and found that TypedProperties object instantiation takes too much time. > This will cause clustering performance degradation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
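A hedged model of the regression and the fix: build the properties object once per handler instead of once per consumed record. The InsertHandler class below is a stand-in, not the actual CopyOnWriteInsertHandler code; java.util.Properties stands in for Hudi's TypedProperties.
{code:java}
import java.util.Properties;

class InsertHandler {
  private final Properties props; // built once, reused for every record

  InsertHandler(Properties writeConfigProps) {
    // The 0.14 regression described in this ticket: allocating a
    // TypedProperties-like object inside consume(), i.e. once per record,
    // dominates the flame graph at clustering scale. Hoist it here instead.
    this.props = writeConfigProps;
  }

  void consume(Object record) {
    // Use this.props; no per-record allocation on the hot path.
  }
}
{code}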
[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
[ https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6937: Attachment: hudi-0.12-flamegraph.png hudi-0.12-log.png hudi-0.14-flamegraph.png hudi-0.14-log.png > CopyOnWriteInsertHandler#consume will cause clustering performance degradation > -- > > Key: HUDI-6937 > URL: https://issues.apache.org/jira/browse/HUDI-6937 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > Attachments: hudi-0.12-flamegraph.png, hudi-0.12-log.png, > hudi-0.14-flamegraph.png, hudi-0.14-log.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
[ https://issues.apache.org/jira/browse/HUDI-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6937: Component/s: spark (was: hudi-utilities) > CopyOnWriteInsertHandler#consume will cause clustering performance degradation > -- > > Key: HUDI-6937 > URL: https://issues.apache.org/jira/browse/HUDI-6937 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6937) CopyOnWriteInsertHandler#consume will cause clustering performance degradation
kwang created HUDI-6937: --- Summary: CopyOnWriteInsertHandler#consume will cause clustering performance degradation Key: HUDI-6937 URL: https://issues.apache.org/jira/browse/HUDI-6937 Project: Apache Hudi Issue Type: Improvement Components: hudi-utilities Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback
[ https://issues.apache.org/jira/browse/HUDI-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6659: Description: When MDT is enabled, Spark offline compaction throws this exception while rolling back a failed inflight instant. {code:java} org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 20230802174236306 is earlier than the latest compaction 20230803201423881001. There are 3 deltacommits after this compaction: [[20230803201545303__deltacommit__COMPLETED__20230803201721395], [20230803201729007__deltacommit__COMPLETED__20230803201848687], [20230803201852499__deltacommit__COMPLETED__20230803202010862]] at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002) at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77) at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218) at org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650) at org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623) at org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80) at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307) at org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034) at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) {code}
was: Spark offline compaction rollbacks failed inflight instant throw this error info when mdt enabled. {code:java} org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 20230802174236306 is earlier than the latest compaction 20230803201423881001. There are 3 deltacommits after this compaction: [[20230803201545303__deltacommit__COMPLETED__20230803201721395], [20230803201729007__deltacommit__COMPLETED__20230803201848687], [20230803201852499__deltacommit__COMPLETED__20230803202010862]] at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002) at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77) at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218) at org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650) at org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623) at org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80) at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307) at org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034) at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) {code} > Remove the validateRollback restriction for Spark/Flink MDT rollback > > > Key: HUDI-6659 > URL: https://issues.apache.org/jira/browse/HUDI-6659 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > When MDT is enabled, Spark offline compaction throws this exception while > rolling back a failed inflight instant. > {code:java} > org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back > 20230802174236306 is earlier than the latest
[jira] [Updated] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback
[ https://issues.apache.org/jira/browse/HUDI-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6659: Description: Spark offline compaction rollbacks failed inflight instant throw this error info when mdt enabled. {code:java} org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 20230802174236306 is earlier than the latest compaction 20230803201423881001. There are 3 deltacommits after this compaction: [[20230803201545303__deltacommit__COMPLETED__20230803201721395], [20230803201729007__deltacommit__COMPLETED__20230803201848687], [20230803201852499__deltacommit__COMPLETED__20230803202010862]] at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002) at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77) at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218) at org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650) at org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623) at org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80) at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307) at org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034) at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) {code}
was: Spark offline compaction rollback failed inflight instant throw next error info when mdt enabled. {code:java} org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 20230802174236306 is earlier than the latest compaction 20230803201423881001. There are 3 deltacommits after this compaction: [[20230803201545303__deltacommit__COMPLETED__20230803201721395], [20230803201729007__deltacommit__COMPLETED__20230803201848687], [20230803201852499__deltacommit__COMPLETED__20230803202010862]] at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002) at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77) at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218) at org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650) at org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623) at org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80) at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307) at org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034) at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) {code} > Remove the validateRollback restriction for Spark/Flink MDT rollback > > > Key: HUDI-6659 > URL: https://issues.apache.org/jira/browse/HUDI-6659 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > When MDT is enabled, Spark offline compaction throws this error while rolling > back a failed inflight instant. > {code:java} > org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back > 20230802174236306 is earlier than the latest compaction
[jira] [Created] (HUDI-6659) Remove the validateRollback restriction for Spark/Flink MDT rollback
kwang created HUDI-6659: --- Summary: Remove the validateRollback restriction for Spark/Flink MDT rollback Key: HUDI-6659 URL: https://issues.apache.org/jira/browse/HUDI-6659 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 When MDT is enabled, Spark offline compaction throws the following error while rolling back a failed inflight instant. {code:java} org.apache.hudi.exception.HoodieMetadataException: Commit being rolled back 20230802174236306 is earlier than the latest compaction 20230803201423881001. There are 3 deltacommits after this compaction: [[20230803201545303__deltacommit__COMPLETED__20230803201721395], [20230803201729007__deltacommit__COMPLETED__20230803201848687], [20230803201852499__deltacommit__COMPLETED__20230803202010862]] at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateRollback(HoodieBackedTableMetadataWriter.java:1034) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1002) at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77) at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:256) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:118) at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:139) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:218) at org.apache.hudi.table.HoodieTable.rollbackInflightInstant(HoodieTable.java:650) at org.apache.hudi.table.HoodieTable.rollbackInflightCompaction(HoodieTable.java:623) at org.apache.hudi.client.SparkRDDTableServiceClient.compact(SparkRDDTableServiceClient.java:80) at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:307) at org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:1034) at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:306) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-6605) Add compaction/log compaction write status error checks and move them earlier
[ https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749477#comment-17749477 ] kwang commented on HUDI-6605: - Hi [~danny0405], you merged this in error; that pull request is linked to HUDI-6604. > Add compaction/log compaction write status error checks and move them earlier > > > Key: HUDI-6605 > URL: https://issues.apache.org/jira/browse/HUDI-6605 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
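A sketch of the check this ticket adds: sum the per-write-status error counts and abort before the instant is transitioned to completed. The long counts stand in for values from WriteStatus#getTotalErrorRecords; the surrounding flow is a model, not the actual commit path.
{code:java}
import java.util.List;

class WriteStatusCheck {
  // errorCounts stands in for writeStatuses.map(WriteStatus::getTotalErrorRecords).
  static void validate(List<Long> errorCounts, String instantTime) {
    long totalErrors = errorCounts.stream().mapToLong(Long::longValue).sum();
    if (totalErrors > 0) {
      // Fail BEFORE the instant is transitioned to COMPLETED, so a bad
      // compaction/log compaction never becomes a committed instant.
      throw new IllegalStateException(
          "Compaction " + instantTime + " produced " + totalErrors + " error records");
    }
  }
}
{code}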
[jira] [Updated] (HUDI-6605) Add compaction/log compaction write status error checks and move them earlier
[ https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6605: Summary: Add compaction/log compaction write status error checks and move them earlier (was: Add compaction/logcompaction writestatus errors checking and advance it to Inflight transition) > Add compaction/log compaction write status error checks and move them earlier > > > Key: HUDI-6605 > URL: https://issues.apache.org/jira/browse/HUDI-6605 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6605) Add compaction/log compaction write status error checks and advance them to the inflight transition
[ https://issues.apache.org/jira/browse/HUDI-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6605: Summary: Add compaction/log compaction write status error checks and advance them to the inflight transition (was: Compaction writestatus errors checking should come before transitionReplaceInflightToComplete) > Add compaction/log compaction write status error checks and advance them to > the inflight transition > -- > > Key: HUDI-6605 > URL: https://issues.apache.org/jira/browse/HUDI-6605 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6605) Compaction write status error checking should come before transitionReplaceInflightToComplete
kwang created HUDI-6605: --- Summary: Compaction write status error checking should come before transitionReplaceInflightToComplete Key: HUDI-6605 URL: https://issues.apache.org/jira/browse/HUDI-6605 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6565) Add a failure retry mechanism to Spark offline compaction
kwang created HUDI-6565: --- Summary: Add a failure retry mechanism to Spark offline compaction Key: HUDI-6565 URL: https://issues.apache.org/jira/browse/HUDI-6565 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
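A minimal sketch of a bounded retry loop for the offline compaction driver; the retry count, backoff, and the Attempt hook are illustrative, not the knobs the eventual change exposes.
{code:java}
class CompactionRetry {
  interface Attempt {
    void run() throws Exception; // e.g. one offline compaction execution
  }

  static void runWithRetries(Attempt attempt, int maxRetries) throws Exception {
    for (int i = 0; ; i++) {
      try {
        attempt.run();
        return; // succeeded
      } catch (Exception e) {
        if (i >= maxRetries) {
          throw e; // retries exhausted: fail the job for real
        }
        Thread.sleep(1000L * (i + 1)); // simple linear backoff between attempts
      }
    }
  }
}
{code}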
[jira] [Created] (HUDI-6482) Support a new compaction strategy: DayBasedAndBoundedIOCompactionStrategy
kwang created HUDI-6482: --- Summary: Support a new compaction strategy: DayBasedAndBoundedIOCompactionStrategy Key: HUDI-6482 URL: https://issues.apache.org/jira/browse/HUDI-6482 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 When the accumulated traffic is too large, the newly generated compaction plan will handle too much data if we use DayBasedCompactionStrategy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
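A self-contained model of the strategy's idea, under the assumption that it combines day-based ordering with a bounded-IO cut-off: newest partitions first, stop once the plan's IO budget is spent. The Op record is an illustrative stand-in for Hudi's compaction operation type.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DayBasedBoundedIo {
  // Stand-in for a compaction operation: its partition day and estimated IO.
  record Op(String partitionDay, long ioMb) {}

  static List<Op> plan(List<Op> ops, long ioBudgetMb) {
    List<Op> byDayDesc = new ArrayList<>(ops);
    byDayDesc.sort(Comparator.comparing(Op::partitionDay).reversed()); // newest day first
    List<Op> picked = new ArrayList<>();
    long spentMb = 0;
    for (Op op : byDayDesc) {
      if (spentMb + op.ioMb() > ioBudgetMb) {
        break; // bounded-IO cut-off: one plan never exceeds its budget
      }
      picked.add(op);
      spentMb += op.ioMb();
    }
    return picked;
  }
}
{code}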
[jira] [Created] (HUDI-6458) Scheduling jobs should not fail when there are no completed commits
kwang created HUDI-6458: --- Summary: Scheduling jobs should not fail when there are no completed commits Key: HUDI-6458 URL: https://issues.apache.org/jira/browse/HUDI-6458 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
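A sketch of the guard, assuming the scheduler can see the count of completed commits: an empty table should make the job exit cleanly rather than throw.
{code:java}
class ScheduleGuard {
  // completedCommits stands in for the size of the completed commits timeline.
  static int scheduleIfPossible(long completedCommits) {
    if (completedCommits == 0) {
      System.out.println("No completed commits yet; nothing to schedule.");
      return 0; // success: an empty table is not an error
    }
    // ... schedule the compaction/clustering plan here ...
    return 0;
  }
}
{code}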
[jira] [Created] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned
kwang created HUDI-6457: --- Summary: Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned Key: HUDI-6457 URL: https://issues.apache.org/jira/browse/HUDI-6457 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6396) Flink supports scheduling clustering in batch execution mode, plus code refactoring
[ https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6396: Summary: Flink supports scheduling clustering in batch execution mode, plus code refactoring (was: Flink supports schedule the clustering plan in batch execution mode and code refactor) > Flink supports scheduling clustering in batch execution mode, plus code refactoring > > > Key: HUDI-6396 > URL: https://issues.apache.org/jira/browse/HUDI-6396 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > Flink currently only supports scheduling the compaction plan, not the > clustering plan, in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode, plus code refactoring
[ https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6396: Summary: Flink supports scheduling the clustering plan in batch execution mode, plus code refactoring (was: Flink supports schedule the clustering plan in batch execution mode) > Flink supports scheduling the clustering plan in batch execution mode, plus > code refactoring > - > > Key: HUDI-6396 > URL: https://issues.apache.org/jira/browse/HUDI-6396 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > Flink currently only supports scheduling the compaction plan, not the > clustering plan, in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode
[ https://issues.apache.org/jira/browse/HUDI-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6396: Summary: Flink supports scheduling the clustering plan in batch execution mode (was: Flink support schedule the clustering plan in batch execution mode) > Flink supports scheduling the clustering plan in batch execution mode > --- > > Key: HUDI-6396 > URL: https://issues.apache.org/jira/browse/HUDI-6396 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > Flink currently only supports scheduling the compaction plan, not the > clustering plan, in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6396) Flink supports scheduling the clustering plan in batch execution mode
kwang created HUDI-6396: --- Summary: Flink supports scheduling the clustering plan in batch execution mode Key: HUDI-6396 URL: https://issues.apache.org/jira/browse/HUDI-6396 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 Flink currently only supports scheduling the compaction plan, not the clustering plan, in batch execution mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6395) Scheduling jobs should not fail when there is no scheduled compaction or clustering plan
[ https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6395: Summary: Scheduling jobs should not fail when there is no scheduled compaction or clustering plan (was: Schedule jobs should not fail when there is no scheduled compaction or clustering plan) > Scheduling jobs should not fail when there is no scheduled compaction or > clustering plan > > > Key: HUDI-6395 > URL: https://issues.apache.org/jira/browse/HUDI-6395 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > We use --mode to execute the compactor or clustering job; the job should not > fail when there is no scheduled compaction or clustering plan. -- This message was sent by Atlassian Jira (v8.20.10#820010)
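A sketch of the guard for the --mode execute path: if scheduling produced no plan, return success instead of failing. java.util.Optional stands in for Hudi's Option-wrapped pending instant.
{code:java}
import java.util.Optional;

class ExecuteGuard {
  static int execute(Optional<String> pendingPlanInstant) {
    if (pendingPlanInstant.isEmpty()) {
      System.out.println("No scheduled compaction/clustering plan; exiting cleanly.");
      return 0; // nothing to execute is not a failure
    }
    // ... execute the plan at pendingPlanInstant.get() ...
    return 0;
  }
}
{code}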
[jira] [Updated] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering plan
[ https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6395: Description: We use --mode to execute the compactor or clustering job; the job should not fail when there is no scheduled compaction or clustering plan. > Schedule jobs should not fail when there is no scheduled compaction or > clustering plan > -- > > Key: HUDI-6395 > URL: https://issues.apache.org/jira/browse/HUDI-6395 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > > We use --mode to execute the compactor or clustering job; the job should not > fail when there is no scheduled compaction or clustering plan. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering in the table
kwang created HUDI-6395: --- Summary: Schedule jobs should not fail when there is no scheduled compaction or clustering in the table Key: HUDI-6395 URL: https://issues.apache.org/jira/browse/HUDI-6395 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6395) Schedule jobs should not fail when there is no scheduled compaction or clustering plan
[ https://issues.apache.org/jira/browse/HUDI-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6395: Summary: Schedule jobs should not fail when there is no scheduled compaction or clustering plan (was: Schedule jobs should not fail when there is no scheduled compaction or clustering in the table) > Schedule jobs should not fail when there is no scheduled compaction or > clustering plan > -- > > Key: HUDI-6395 > URL: https://issues.apache.org/jira/browse/HUDI-6395 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6360) spark-memory parameter should not be required in compactor or clustering jobs
[ https://issues.apache.org/jira/browse/HUDI-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kwang updated HUDI-6360: Summary: spark-memory parameter should not be required in compactor or clustering jobs (was: spark-memory parmater should not be forced to be true) > spark-memory parameter should not be required in compactor or clustering jobs > > > Key: HUDI-6360 > URL: https://issues.apache.org/jira/browse/HUDI-6360 > Project: Apache Hudi > Issue Type: Improvement >Reporter: kwang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > When using Spark to submit a compaction or clustering job, the spark-memory > parameter should not be marked as required, because Spark's executor-memory > parameter and Hudi's spark-memory parameter conflict. -- This message was sent by Atlassian Jira (v8.20.10#820010)
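A sketch of the change, assuming Hudi's utility jobs keep using JCommander for CLI parsing: drop required from --spark-memory and fall back to whatever spark-submit already set. The fallback behavior in the comment is illustrative.
{code:java}
import com.beust.jcommander.Parameter;

class CompactorConfig {
  @Parameter(names = {"--spark-memory"},
      description = "Spark memory for the job; optional, so it cannot conflict "
          + "with spark-submit's --executor-memory",
      required = false)
  public String sparkMemory = null; // null -> inherit the submit-time setting
}
{code}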