[jira] [Updated] (HUDI-6703) StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit
[ https://issues.apache.org/jira/browse/HUDI-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6703:
    Description: StreamWriteOperatorCoordinator should refresh the last txn metadata firstly to prepare resolution of write conflict for recommit.  (was: StreamWriteOperatorCoordinator should refresh the last txn metadata firstly to prepare resolution of write conflict for recommit.)

> StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit
>
> Key: HUDI-6703
> URL: https://issues.apache.org/jira/browse/HUDI-6703
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
>
> StreamWriteOperatorCoordinator should refresh the last txn metadata firstly to prepare resolution of write conflict for recommit.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6703) StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit
Nicholas Jiang created HUDI-6703:

Summary: StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit
Key: HUDI-6703
URL: https://issues.apache.org/jira/browse/HUDI-6703
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang

StreamWriteOperatorCoordinator should refresh the last txn metadata firstly to prepare resolution of write conflict for recommit.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
[ https://issues.apache.org/jira/browse/HUDI-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6669:
    Description:
HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:
{code:java}
Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}

was:
HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:

Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

> HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
>
> Key: HUDI-6669
> URL: https://issues.apache.org/jira/browse/HUDI-6669
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Fix For: 0.14.0
>
> HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:
> {code:java}
> Caused by: java.lang.OutOfMemoryError
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.Nativ
[jira] [Updated] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
[ https://issues.apache.org/jira/browse/HUDI-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6669:
    Description:
HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:

Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

was:
HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:

Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

> HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
>
> Key: HUDI-6669
> URL: https://issues.apache.org/jira/browse/HUDI-6669
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Fix For: 0.14.0
>
> HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:
> Caused by: java.lang.OutOfMemoryError
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
[jira] [Created] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
Nicholas Jiang created HUDI-6669:

Summary: HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores
Key: HUDI-6669
URL: https://issues.apache.org/jira/browse/HUDI-6669
Project: Apache Hudi
Issue Type: Improvement
Components: core
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores, to avoid an {{OutOfMemoryError}} from {{ForkJoinTask}} with the following stack trace:

Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

--
This message was sent by Atlassian Jira (v8.20.10#820010)
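The OutOfMemoryError in HUDI-6669 comes from requesting far more parallelism from the ForkJoinPool than the host has CPU cores. The remedy can be sketched as follows; this is a minimal illustration of the idea, not Hudi's actual code (the helper name and the dedicated pool are assumptions): clamp the requested parallelism to the available processors and run the parallel stream in a bounded pool.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoundedParallelism {
    // Illustrative helper: never request more parallelism than the host
    // actually has CPU cores, no matter how many partitions or files exist.
    static int clamp(int requestedParallelism) {
        return Math.min(requestedParallelism, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        // Submitting the terminal operation from inside a dedicated ForkJoinPool
        // makes the parallel stream use that pool, bounding its worker threads.
        ForkJoinPool pool = new ForkJoinPool(clamp(10_000));
        try {
            int sum = pool.submit(() -> data.parallelStream().mapToInt(Integer::intValue).sum()).get();
            System.out.println(sum); // 499500
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that running a parallel stream inside a custom ForkJoinPool is a widely used but only semi-documented behavior of the streams implementation; capping the parallelism value passed to the engine context achieves the same bound more directly.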
[jira] [Created] (HUDI-6667) ClientIds should generate next id automatically with random uuid instead of incremental id
Nicholas Jiang created HUDI-6667:

Summary: ClientIds should generate next id automatically with random uuid instead of incremental id
Key: HUDI-6667
URL: https://issues.apache.org/jira/browse/HUDI-6667
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

ClientIds should generate the next id automatically with a random UUID instead of an incremental id, to avoid client id conflicts between concurrent batch insert overwrite jobs.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
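The idea in HUDI-6667 can be illustrated with a toy sketch (the method below is hypothetical, not the actual ClientIds API): a random UUID makes collisions between concurrent writers vanishingly unlikely, whereas two writers independently computing "last id + 1" can race to the same value.

```java
import java.util.UUID;

public class ClientIdSketch {
    // Hypothetical sketch: each writer mints its own id locally; with no
    // shared counter there is no race between concurrent overwrite jobs.
    static String nextClientId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Two concurrent writers get distinct ids without any coordination.
        String a = nextClientId();
        String b = nextClientId();
        System.out.println(a.equals(b)); // false, with overwhelming probability
    }
}
```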
[jira] [Created] (HUDI-6592) Flink insert overwrite should support dynamic partition instead of whole table
Nicholas Jiang created HUDI-6592:

Summary: Flink insert overwrite should support dynamic partition instead of whole table
Key: HUDI-6592
URL: https://issues.apache.org/jira/browse/HUDI-6592
Project: Apache Hudi
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang

Flink insert overwrite should overwrite only the dynamic partitions being written instead of the whole table, a behavior that is consistent with the semantics of INSERT OVERWRITE in Flink.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6541) Multiple writers should create new and different instant time to avoid marker conflict of same instant
Nicholas Jiang created HUDI-6541:

Summary: Multiple writers should create new and different instant time to avoid marker conflict of same instant
Key: HUDI-6541
URL: https://issues.apache.org/jira/browse/HUDI-6541
Project: Apache Hudi
Issue Type: Bug
Components: core
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

Even if the write results of the commits have no conflict, multiple writers should create different instant times to avoid marker conflicts on the same instant. Meanwhile, multiple writers could create new instant times under the file system lock to guarantee that the generated instants are distinct.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
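The instant-generation scheme described in HUDI-6541 can be sketched as follows; this is an illustrative model only (a plain JVM lock stands in for the file system lock, and the class is hypothetical, not Hudi's timeline code): instant times are handed out under the lock and bumped on collision, so no two writers ever share one.

```java
import java.util.concurrent.locks.ReentrantLock;

public class InstantTimeGenerator {
    // Hypothetical sketch: serialize instant-time creation under a lock
    // (a file system lock in the ticket; a JVM lock here for illustration)
    // and bump the time when it collides with the last one handed out.
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long lastInstant = -1L;

    static long newInstantTime() {
        LOCK.lock();
        try {
            long candidate = System.currentTimeMillis();
            if (candidate <= lastInstant) {
                candidate = lastInstant + 1; // never reuse an instant time
            }
            lastInstant = candidate;
            return candidate;
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        long a = newInstantTime();
        long b = newInstantTime();
        System.out.println(b > a); // true: instants are strictly increasing
    }
}
```

The same ordering guarantee holds across processes only if the lock is itself shared (e.g. the file system lock mentioned in the ticket), which is the point of the proposal.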
[jira] [Created] (HUDI-6540) Support failed writes clean policy for Flink
Nicholas Jiang created HUDI-6540:

Summary: Support failed writes clean policy for Flink
Key: HUDI-6540
URL: https://issues.apache.org/jira/browse/HUDI-6540
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

The failed writes clean policy should be lazy when the write concurrency mode is optimistic concurrency control. FlinkOptions should support configuring the failed writes clean policy, and FlinkStreamerConfig, FlinkCompactionConfig and FlinkClusteringConfig should also expose a failed writes clean policy parameter. Meanwhile, append mode without inline clustering should add a clean operator to the pipeline to roll back failed writes.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6519) The default value of read.streaming.enabled is determined by execution.runtime-mode
Nicholas Jiang created HUDI-6519:

Summary: The default value of read.streaming.enabled is determined by execution.runtime-mode
Key: HUDI-6519
URL: https://issues.apache.org/jira/browse/HUDI-6519
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

The default value of read.streaming.enabled could be determined by execution.runtime-mode, which is itself chosen based on the requirements of the use case and the characteristics of the job.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
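The proposed defaulting rule in HUDI-6519 amounts to a small derivation, sketched below under stated assumptions (the class, method, and enum are hypothetical stand-ins, not the actual FlinkOptions logic): an explicit user setting always wins, and only in its absence is the default inferred from the runtime mode.

```java
public class StreamingReadDefault {
    enum RuntimeMode { STREAMING, BATCH, AUTOMATIC }

    // Hypothetical derivation: when the user has not set
    // read.streaming.enabled explicitly, fall back to a default inferred
    // from the job's execution.runtime-mode.
    static boolean resolveReadStreamingEnabled(Boolean explicitValue, RuntimeMode mode) {
        if (explicitValue != null) {
            return explicitValue; // an explicit setting always wins
        }
        return mode == RuntimeMode.STREAMING;
    }

    public static void main(String[] args) {
        System.out.println(resolveReadStreamingEnabled(null, RuntimeMode.STREAMING));  // true
        System.out.println(resolveReadStreamingEnabled(false, RuntimeMode.STREAMING)); // false
    }
}
```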
[jira] [Updated] (HUDI-6501) StreamWriteOperatorCoordinator should recommit with starting heartbeat for lazy failed writes clean policy
[ https://issues.apache.org/jira/browse/HUDI-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6501:
    Summary: StreamWriteOperatorCoordinator should recommit with starting heartbeat for lazy failed writes clean policy  (was: Recommit should not abort for heartbeat expired caused by the last failed write)

> StreamWriteOperatorCoordinator should recommit with starting heartbeat for lazy failed writes clean policy
>
> Key: HUDI-6501
> URL: https://issues.apache.org/jira/browse/HUDI-6501
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Affects Versions: 0.14.0
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> When the last write had failed, HoodieHeartbeatClient would close with stopping all heartbeat which includes deleting heartbeat file. Therefore StreamWriteOperatorCoordinator should start heartbeat for lazy failed writes clean policy to avoid aborting for heartbeat expired when recommitting.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6501) Recommit should not abort for heartbeat expired caused by the last failed write
[ https://issues.apache.org/jira/browse/HUDI-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6501:
    Description: When the last write had failed, HoodieHeartbeatClient would close with stopping all heartbeat which includes deleting heartbeat file. Therefore StreamWriteOperatorCoordinator should start heartbeat for lazy failed writes clean policy to avoid aborting for heartbeat expired when recommitting.  (was: When the last write had failed, HoodieHeartbeatClient would close with stopping all heartbeat which includes deleting heartbeat file in flink job. Therefore, it isn't heartbeat expired for the last failed write when HoodieHeartbeatClient recommits. Fix HoodieHeartbeatClient does not check whether heartbeat is expired with the last failed writes for recommit.)

> Recommit should not abort for heartbeat expired caused by the last failed write
>
> Key: HUDI-6501
> URL: https://issues.apache.org/jira/browse/HUDI-6501
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Affects Versions: 0.14.0
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> When the last write had failed, HoodieHeartbeatClient would close with stopping all heartbeat which includes deleting heartbeat file. Therefore StreamWriteOperatorCoordinator should start heartbeat for lazy failed writes clean policy to avoid aborting for heartbeat expired when recommitting.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6501) Recommit should not abort for heartbeat expired caused by the last failed write
Nicholas Jiang created HUDI-6501:

Summary: Recommit should not abort for heartbeat expired caused by the last failed write
Key: HUDI-6501
URL: https://issues.apache.org/jira/browse/HUDI-6501
Project: Apache Hudi
Issue Type: Bug
Components: core
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

When the last write had failed, HoodieHeartbeatClient would close with stopping all heartbeat which includes deleting heartbeat file in flink job. Therefore, it isn't heartbeat expired for the last failed write when HoodieHeartbeatClient recommits. Fix HoodieHeartbeatClient does not check whether heartbeat is expired with the last failed writes for recommit.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6384) The commit action type of inflight instant for replacecommit should be replacecommit
Nicholas Jiang created HUDI-6384:

Summary: The commit action type of inflight instant for replacecommit should be replacecommit
Key: HUDI-6384
URL: https://issues.apache.org/jira/browse/HUDI-6384
Project: Apache Hudi
Issue Type: Bug
Components: core
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
Fix For: 0.14.0

BaseHoodieWriteClient#commitStats creates the inflight instant with an incorrect commit action type determined by the table type when committing a replacecommit; it should create the inflight instant with the replacecommit action type.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
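The fix described in HUDI-6384 reduces to one decision, sketched below with illustrative names (this is not the actual BaseHoodieWriteClient code): the inflight instant's action type must follow the action being committed, falling back to the table-type default only for ordinary commits.

```java
public class InflightActionType {
    // Illustrative only: "commit" is the copy-on-write default and
    // "deltacommit" the merge-on-read default, but a replacecommit must
    // remain a replacecommit when transitioning the instant to inflight.
    static String forInflight(String commitActionType, String tableTypeDefault) {
        return "replacecommit".equals(commitActionType) ? "replacecommit" : tableTypeDefault;
    }

    public static void main(String[] args) {
        System.out.println(forInflight("replacecommit", "deltacommit")); // replacecommit
        System.out.println(forInflight("commit", "commit"));             // commit
    }
}
```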
[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Description: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. Same to `read.streaming.skip_compaction`.  (was: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. The same to `read.streaming.skip_compaction`.)

> Streaming read should skip compaction and clustering instants to avoid duplicates
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. Same to `read.streaming.skip_compaction`.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Description: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. The same to `read.streaming.skip_compaction`.  (was: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.)

> Streaming read should skip compaction and clustering instants to avoid duplicates
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. The same to `read.streaming.skip_compaction`.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Summary: Streaming read should skip compaction and clustering instants to avoid duplicates  (was: Streaming read should skip clustering instants to avoid duplicated reading)

> Streaming read should skip compaction and clustering instants to avoid duplicates
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid duplicated reading
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Status: In Progress  (was: Open)

> Streaming read should skip clustering instants to avoid duplicated reading
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid duplicated reading
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Summary: Streaming read should skip clustering instants to avoid duplicated reading  (was: Streaming read should skip clustering instants to avoid deplicated reading)

> Streaming read should skip clustering instants to avoid duplicated reading
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid deplicated reading
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Summary: Streaming read should skip clustering instants to avoid deplicated reading  (was: Streaming read should skip clustering instants)

> Streaming read should skip clustering instants to avoid deplicated reading
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang updated HUDI-6317:
    Description: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.  (was: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.)

> Streaming read should skip clustering instants
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Fix For: 0.14.0
>
> At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering, so that streaming reading may read T-1 day data when clustering the data of T-1 day to cause duplicated data. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6317: - Description: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore (was: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore read.streaming.skip_clustering should be true.) > Streaming read should skip clustering instants > -- > > Key: HUDI-6317 > URL: https://issues.apache.org/jira/browse/HUDI-6317 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > At present, the default value of read.streaming.skip_clustering is false, > which could cause the situation that streaming reading reads the replaced > file slices of clustering so that streaming reading may read T-1 day data > when clustering the data of T-1 day. Therefore -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6317: - Description: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore streaming read should skip clustering instants for all cases to avoid reading the replaced file slices. (was: At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore ) > Streaming read should skip clustering instants > -- > > Key: HUDI-6317 > URL: https://issues.apache.org/jira/browse/HUDI-6317 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > At present, the default value of read.streaming.skip_clustering is false, > which could cause the situation that streaming reading reads the replaced > file slices of clustering so that streaming reading may read T-1 day data > when clustering the data of T-1 day. Therefore streaming read should skip > clustering instants for all cases to avoid reading the replaced file slices. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants
[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6317: - Summary: Streaming read should skip clustering instants (was: The default value of read.streaming.skip_clustering should be true) > Streaming read should skip clustering instants > -- > > Key: HUDI-6317 > URL: https://issues.apache.org/jira/browse/HUDI-6317 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > At present, the default value of read.streaming.skip_clustering is false, > which could cause the situation that streaming reading reads the replaced > file slices of clustering so that streaming reading may read T-1 day data > when clustering the data of T-1 day. Therefore read.streaming.skip_clustering > should be true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6317) The default value of read.streaming.skip_clustering should be true
Nicholas Jiang created HUDI-6317: Summary: The default value of read.streaming.skip_clustering should be true Key: HUDI-6317 URL: https://issues.apache.org/jira/browse/HUDI-6317 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.14.0 At present, the default value of read.streaming.skip_clustering is false, which could cause the situation that streaming reading reads the replaced file slices of clustering so that streaming reading may read T-1 day data when clustering the data of T-1 day. Therefore read.streaming.skip_clustering should be true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
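The skip logic behind read.streaming.skip_clustering can be sketched as follows. This is an illustrative model, not Hudi's actual API: clustering completes as a "replacecommit" whose output file slices replace already-read data, so the streaming source should filter those instants out rather than emit them a second time.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of skipping clustering instants on the read path.
// The Instant record and method names are illustrative stand-ins.
public class SkipClusteringSketch {
  // Minimal stand-in for a timeline instant: time plus action type.
  public record Instant(String time, String action) {}

  public static List<Instant> instantsToRead(List<Instant> timeline, boolean skipClustering) {
    return timeline.stream()
        // Drop clustering replacecommits when skipping is enabled, so the
        // replaced file slices are never re-read as fresh data.
        .filter(i -> !(skipClustering && "replacecommit".equals(i.action())))
        .collect(Collectors.toList());
  }
}
```

With skipping enabled, only the ordinary commits survive the filter, which is why the issue argues the skip should apply in all cases.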
[jira] [Closed] (HUDI-6292) HoodieRealtimeRecordReader#constructRecordReader leads memory leak
[ https://issues.apache.org/jira/browse/HUDI-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-6292. Resolution: Duplicate > HoodieRealtimeRecordReader#constructRecordReader leads memory leak > -- > > Key: HUDI-6292 > URL: https://issues.apache.org/jira/browse/HUDI-6292 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core >Affects Versions: 0.14.0 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > > The exception caused by HoodieRealtimeRecordReader, which constructs the record > reader based on the job configuration, leads to a memory leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6292) HoodieRealtimeRecordReader#constructRecordReader leads memory leak
[ https://issues.apache.org/jira/browse/HUDI-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6292: - Summary: HoodieRealtimeRecordReader#constructRecordReader leads memory leak (was: HoodieRealtimeRecordReader leads memory leak) > HoodieRealtimeRecordReader#constructRecordReader leads memory leak > -- > > Key: HUDI-6292 > URL: https://issues.apache.org/jira/browse/HUDI-6292 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core >Affects Versions: 0.14.0 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > > The exception caused by HoodieRealtimeRecordReader, which constructs the record > reader based on the job configuration, leads to a memory leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6292) HoodieRealtimeRecordReader leads memory leak
Nicholas Jiang created HUDI-6292: Summary: HoodieRealtimeRecordReader leads memory leak Key: HUDI-6292 URL: https://issues.apache.org/jira/browse/HUDI-6292 Project: Apache Hudi Issue Type: Bug Components: reader-core Affects Versions: 0.14.0 Reporter: Nicholas Jiang Assignee: Nicholas Jiang The exception caused by HoodieRealtimeRecordReader, which constructs the record reader based on the job configuration, leads to a memory leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6281) Comprehensive schema evolution supports column change with a default value
[ https://issues.apache.org/jira/browse/HUDI-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6281: - Status: In Progress (was: Open) > Comprehensive schema evolution supports column change with a default value > -- > > Key: HUDI-6281 > URL: https://issues.apache.org/jira/browse/HUDI-6281 > Project: Apache Hudi > Issue Type: New Feature > Components: core >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > Comprehensive schema evolution should support column change with a default > value, which could add column with a default value etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6281) Comprehensive schema evolution supports column change with a default value
Nicholas Jiang created HUDI-6281: Summary: Comprehensive schema evolution supports column change with a default value Key: HUDI-6281 URL: https://issues.apache.org/jira/browse/HUDI-6281 Project: Apache Hudi Issue Type: New Feature Components: core Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.14.0 Comprehensive schema evolution should support column change with a default value, which could add column with a default value etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
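The "add column with a default value" behavior from HUDI-6281 can be illustrated with a minimal read-time sketch, assuming records are modeled as simple maps. Records written before the column existed get back-filled with the declared default; records that already carry a value keep it. This is not Hudi's schema-evolution implementation, just the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of schema evolution with a column default:
// old records lack the new column, so the reader fills in the default.
public class DefaultValueSketch {
  public static Map<String, Object> readWithDefault(
      Map<String, Object> storedRecord, String newColumn, Object defaultValue) {
    Map<String, Object> out = new HashMap<>(storedRecord);
    // Only records written before the column was added receive the default.
    out.putIfAbsent(newColumn, defaultValue);
    return out;
  }
}
```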
[jira] [Created] (HUDI-6229) HoodieInternalWriteStatus marks failure with totalErrorRecords increment
Nicholas Jiang created HUDI-6229: Summary: HoodieInternalWriteStatus marks failure with totalErrorRecords increment Key: HUDI-6229 URL: https://issues.apache.org/jira/browse/HUDI-6229 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.0 Reporter: Nicholas Jiang Assignee: Nicholas Jiang HoodieInternalWriteStatus should mark failure with a totalErrorRecords increment. Otherwise BulkInsertWriterHelper#toWriteStatus could not get the correct value of totalErrorRecords, which causes ClusteringCommitSink to be unable to roll back clustering when a ClusteringCommitEvent has errors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
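The bug described in HUDI-6229 boils down to a failure marker that does not bump its error counter. A minimal sketch of the fixed behavior, with illustrative class and method names rather than Hudi's actual signatures:

```java
// Hypothetical sketch: marking a record as failed must also increment
// totalErrorRecords, otherwise downstream logic that inspects the error
// count (e.g. to decide whether to roll back clustering) sees zero.
public class WriteStatusSketch {
  private long totalErrorRecords = 0;

  public void markFailure(String recordKey, Throwable cause) {
    // The increment that, per the issue, was missing.
    totalErrorRecords++;
  }

  public long getTotalErrorRecords() {
    return totalErrorRecords;
  }

  public boolean hasErrors() {
    return totalErrorRecords > 0;
  }
}
```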
[jira] [Created] (HUDI-6218) Support instant_time/commit_time of savepoint procedure optional
Nicholas Jiang created HUDI-6218: Summary: Support instant_time/commit_time of savepoint procedure optional Key: HUDI-6218 URL: https://issues.apache.org/jira/browse/HUDI-6218 Project: Apache Hudi Issue Type: Improvement Components: spark Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.14.0 The instant_time/commit_time parameter of the savepoint procedure could be optional, falling back to the latest instant when the value of instant_time/commit_time is null or empty. -- This message was sent by Atlassian Jira (v8.20.10#820010)
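The fallback rule proposed above is small enough to sketch directly; the method name and the list-of-instants model are illustrative, not the savepoint procedure's real signature:

```java
import java.util.List;

// Hypothetical sketch of the proposed behavior: when instant_time /
// commit_time is null or empty, fall back to the latest completed instant.
public class SavepointParamSketch {
  public static String resolveInstant(String instantTime, List<String> completedInstants) {
    if (instantTime == null || instantTime.isEmpty()) {
      // Instants are time-ordered, so the last element is the latest.
      return completedInstants.get(completedInstants.size() - 1);
    }
    return instantTime;
  }
}
```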
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode (was: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode) > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode > - > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Description: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode of Flink offline compaction and clustering. (was: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode of Flink offline compaction and clustering.) > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode > - > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Description: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode of Flink offline compaction and clustering. (was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode of Flink offline compaction and clustering.) > HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in > service mode > -- > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Summary: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode (was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode) > HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in > service mode > -- > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Description: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode of Flink offline compaction and clustering. (was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode for service mode of Flink offline compaction and clustering.) > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode > - > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode
[ https://issues.apache.org/jira/browse/HUDI-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-6192: - Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode (was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode) > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in > service mode > - > > Key: HUDI-6192 > URL: https://issues.apache.org/jira/browse/HUDI-6192 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode for > service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode
Nicholas Jiang created HUDI-6192: Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode Key: HUDI-6192 URL: https://issues.apache.org/jira/browse/HUDI-6192 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.14.0 HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode for service mode of Flink offline compaction and clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6158) Strengthen Flink clustering commit and rollback strategy
Nicholas Jiang created HUDI-6158: Summary: Strengthen Flink clustering commit and rollback strategy Key: HUDI-6158 URL: https://issues.apache.org/jira/browse/HUDI-6158 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.14.0 `ClusteringCommitSink` could strengthen its commit and rollback strategy in two ways: * Commit: Introduce `clusteringPlanCache`, a cache that stores the clustering plan for each instant. `clusteringPlanCache` stores the mapping of instant_time -> clusteringPlan. * Rollback: Update `commitBuffer` to store the mapping of instant_time -> file_ids -> event. Use a map to collect the events because the rolling back of intermediate clustering tasks generates corrupt events. -- This message was sent by Atlassian Jira (v8.20.10#820010)
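The rollback-hardening idea (instant_time -> file_ids -> event) can be sketched with a nested map: a re-sent event from a restarted clustering task overwrites the stale entry for the same file id instead of being double-counted. The `Event` record and method names below are illustrative, not Hudi's classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the commitBuffer structure proposed above.
public class CommitBufferSketch {
  public record Event(String instant, String fileId, boolean failed) {}

  // instant_time -> file_id -> latest event for that file group
  private final Map<String, Map<String, Event>> commitBuffer = new HashMap<>();

  public void add(Event e) {
    commitBuffer.computeIfAbsent(e.instant(), k -> new HashMap<>())
        // Keyed by file id, so a replayed event replaces the stale one.
        .put(e.fileId(), e);
  }

  public int bufferedEvents(String instant) {
    return commitBuffer.getOrDefault(instant, Map.of()).size();
  }
}
```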
[jira] [Assigned] (HUDI-6135) FlinkClusteringConfig adds --sort-memory option to support write.sort.memory config
[ https://issues.apache.org/jira/browse/HUDI-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-6135: Assignee: Nicholas Jiang > FlinkClusteringConfig adds --sort-memory option to support write.sort.memory > config > --- > > Key: HUDI-6135 > URL: https://issues.apache.org/jira/browse/HUDI-6135 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > FlinkClusteringConfig should add --sort-memory option to support > write.sort.memory config, otherwise FlinkClusteringJob couldn't config the > sort memory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6135) FlinkClusteringConfig adds --sort-memory option to support write.sort.memory config
Nicholas Jiang created HUDI-6135: Summary: FlinkClusteringConfig adds --sort-memory option to support write.sort.memory config Key: HUDI-6135 URL: https://issues.apache.org/jira/browse/HUDI-6135 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Fix For: 0.14.0 FlinkClusteringConfig should add --sort-memory option to support write.sort.memory config, otherwise FlinkClusteringJob couldn't config the sort memory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
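Wiring a `--sort-memory` CLI option through to the `write.sort.memory` config key, as HUDI-6135 requests, amounts to a small piece of argument parsing. The option and key names follow the issue text; the parser itself is a hypothetical sketch, not FlinkClusteringConfig's real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map a --sort-memory CLI argument onto the
// write.sort.memory config entry.
public class SortMemoryOptionSketch {
  public static Map<String, String> parse(String[] args) {
    Map<String, String> conf = new HashMap<>();
    for (int i = 0; i < args.length - 1; i++) {
      if ("--sort-memory".equals(args[i])) {
        conf.put("write.sort.memory", args[i + 1]);
      }
    }
    return conf;
  }
}
```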
[jira] [Assigned] (HUDI-6066) HoodieTableSource supports parquet predicate push down
[ https://issues.apache.org/jira/browse/HUDI-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-6066: Assignee: Nicholas Jiang > HoodieTableSource supports parquet predicate push down > -- > > Key: HUDI-6066 > URL: https://issues.apache.org/jira/browse/HUDI-6066 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > > HoodieTableSource supports the SupportsFilterPushDown > interface, which pushes filters down into FileIndex. HoodieTableSource should > also support parquet predicate push down for better query performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6066) HoodieTableSource supports parquet predicate push down
Nicholas Jiang created HUDI-6066: Summary: HoodieTableSource supports parquet predicate push down Key: HUDI-6066 URL: https://issues.apache.org/jira/browse/HUDI-6066 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang HoodieTableSource supports the SupportsFilterPushDown interface, which pushes filters down into FileIndex. HoodieTableSource should also support parquet predicate push down for better query performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
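The SupportsFilterPushDown contract that HUDI-6066 builds on splits a query's filters into those the source accepts (evaluated inside the scan) and those it returns as remaining (evaluated by the engine afterwards). A self-contained sketch of that split, with an illustrative `Filter` model rather than Flink's expression classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the filter push-down split: pushable filters go
// to the reader, the rest are handed back for post-scan evaluation.
public class FilterPushDownSketch {
  public record Filter(String column, Predicate<Integer> test, boolean pushable) {}

  public static List<Filter> applyFilters(List<Filter> filters, List<Filter> accepted) {
    List<Filter> remaining = new ArrayList<>();
    for (Filter f : filters) {
      if (f.pushable()) {
        accepted.add(f);      // evaluated inside the parquet reader / FileIndex
      } else {
        remaining.add(f);     // the engine evaluates these after the scan
      }
    }
    return remaining;
  }
}
```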
[jira] [Reopened] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reopened HUDI-5728: -- > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit > > > Key: HUDI-5728 > URL: https://issues.apache.org/jira/browse/HUDI-5728 > Project: Apache Hudi > Issue Type: Bug > Components: table-service >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > When inline or async clustering is enabled, we need to ensure that there is a > commit in the active timeline to check whether the file slice generated in > pending clustering after archive isn't committed via > {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore > HoodieTimelineArchiver archive the latest instant before inflight > replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang resolved HUDI-5728. -- > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit > > > Key: HUDI-5728 > URL: https://issues.apache.org/jira/browse/HUDI-5728 > Project: Apache Hudi > Issue Type: Bug > Components: table-service >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > When inline or async clustering is enabled, we need to ensure that there is a > commit in the active timeline to check whether the file slice generated in > pending clustering after archive isn't committed via > {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore > HoodieTimelineArchiver archive the latest instant before inflight > replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reopened HUDI-5772: -- > Align Flink clustering configuration with HoodieClusteringConfig > > > Key: HUDI-5772 > URL: https://issues.apache.org/jira/browse/HUDI-5772 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.13.1 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > > In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are > 'clustering.plan.strategy.cluster.begin.partition', > 'clustering.plan.strategy.cluster.end.partition', > 'clustering.plan.strategy.partition.regex.pattern', > 'clustering.plan.strategy.partition.selected' options which do not align the > clustering configuration of HoodieClusteringConfig. FlinkOptions, > FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering > configuration with HoodieClusteringConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang resolved HUDI-5772. -- > Align Flink clustering configuration with HoodieClusteringConfig > > > Key: HUDI-5772 > URL: https://issues.apache.org/jira/browse/HUDI-5772 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.13.1 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > > In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are > 'clustering.plan.strategy.cluster.begin.partition', > 'clustering.plan.strategy.cluster.end.partition', > 'clustering.plan.strategy.partition.regex.pattern', > 'clustering.plan.strategy.partition.selected' options which do not align the > clustering configuration of HoodieClusteringConfig. FlinkOptions, > FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering > configuration with HoodieClusteringConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
[ https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-5531. Resolution: Won't Fix > RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to > RECENT_PARTITIONS > > > Key: HUDI-5531 > URL: https://issues.apache.org/jira/browse/HUDI-5531 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.1 > > > The javadoc of `ClusteringPlanPartitionFilter` mentions that RECENT DAYS: > output recent partition given skip num and days lookback config, therefore > the RECENT_DAYS strategy doesn't match the semantics because it assumes that > Hudi partitions are partitioned by day, but partitioning by hour can also use > this strategy. RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode > should rename to RECENT_PARTITIONS for the semantics match. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-2503) HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service
[ https://issues.apache.org/jira/browse/HUDI-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-2503. Resolution: Fixed > HoodieFlinkWriteClient supports to allow parallel writing to tables using > Locking service > - > > Key: HUDI-2503 > URL: https://issues.apache.org/jira/browse/HUDI-2503 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > The strategy interface for conflict resolution with multiple writers is > introduced and the SparkRDDWriteClient has integrated with the > ConflictResolutionStrategy. HoodieFlinkWriteClient should also support to > allow parallel writing to tables using Locking service based on > ConflictResolutionStrategy. -- This message was sent by Atlassian Jira (v8.20.10#820010)
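At the heart of the multi-writer support in HUDI-2503 is a conflict check: two concurrent commits conflict when they touched overlapping file groups, and the lock service serializes the check-and-commit step. A minimal sketch of that overlap test, assuming file ids are modeled as string sets (not the real ConflictResolutionStrategy API):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of write-conflict detection between two writers.
public class ConflictCheckSketch {
  public static boolean hasConflict(Set<String> filesOfOurCommit,
                                    Set<String> filesOfConcurrentCommit) {
    Set<String> overlap = new HashSet<>(filesOfOurCommit);
    overlap.retainAll(filesOfConcurrentCommit);
    // Any shared file group means the two commits cannot both apply cleanly.
    return !overlap.isEmpty();
  }
}
```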
[jira] [Closed] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-5772. Resolution: Fixed > Align Flink clustering configuration with HoodieClusteringConfig > > > Key: HUDI-5772 > URL: https://issues.apache.org/jira/browse/HUDI-5772 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.13.1 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > > In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are > 'clustering.plan.strategy.cluster.begin.partition', > 'clustering.plan.strategy.cluster.end.partition', > 'clustering.plan.strategy.partition.regex.pattern', > 'clustering.plan.strategy.partition.selected' options which do not align the > clustering configuration of HoodieClusteringConfig. FlinkOptions, > FlinkClusteringConfig and FlinkStreamerConfig should align Flink clustering > configuration with HoodieClusteringConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-5728. Resolution: Fixed > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit > > > Key: HUDI-5728 > URL: https://issues.apache.org/jira/browse/HUDI-5728 > Project: Apache Hudi > Issue Type: Bug > Components: table-service >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > When inline or async clustering is enabled, we need to ensure that there is a > commit in the active timeline to check whether the file slice generated in > pending clustering after archive isn't committed via > {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore > HoodieTimelineArchiver archive the latest instant before inflight > replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5787: - Description: HMSDDLExecutor should set the table type of Hive table to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is set to false. (was: HoodieHiveCatalog should not delete data when dropping the Hive external table, for example, the value of the 'hoodie.datasource.hive_sync.create_managed_table' config is false.) > HMSDDLExecutor should set table type to EXTERNAL_TABLE when > hoodie.datasource.hive_sync.create_managed_table of sync config is false > > > Key: HUDI-5787 > URL: https://issues.apache.org/jira/browse/HUDI-5787 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.1 > > > HMSDDLExecutor should set the table type of Hive table to EXTERNAL_TABLE when > hoodie.datasource.hive_sync.create_managed_table of sync config is set to > false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5787: - Summary: HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false (was: HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting hoodie.datasource.hive_sync.create_managed_table to false) > HMSDDLExecutor should set table type to EXTERNAL_TABLE when > hoodie.datasource.hive_sync.create_managed_table of sync config is false > > > Key: HUDI-5787 > URL: https://issues.apache.org/jira/browse/HUDI-5787 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.1 > > > HoodieHiveCatalog should not delete data when dropping the Hive external > table, for example, the value of the > 'hoodie.datasource.hive_sync.create_managed_table' config is false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting hoodie.datasource.hive_sync.create_managed_table to false
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5787: - Summary: HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting hoodie.datasource.hive_sync.create_managed_table to false (was: HoodieHiveCatalog should not delete data for dropping external table) > HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting > hoodie.datasource.hive_sync.create_managed_table to false > - > > Key: HUDI-5787 > URL: https://issues.apache.org/jira/browse/HUDI-5787 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.1 > > > HoodieHiveCatalog should not delete data when dropping the Hive external > table, for example, the value of the > 'hoodie.datasource.hive_sync.create_managed_table' config is false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5787: Assignee: Nicholas Jiang > HoodieHiveCatalog should not delete data for dropping external table > > > Key: HUDI-5787 > URL: https://issues.apache.org/jira/browse/HUDI-5787 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.13.1 > > > HoodieHiveCatalog should not delete data when dropping the Hive external > table, for example, the value of the > 'hoodie.datasource.hive_sync.create_managed_table' config is false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table
Nicholas Jiang created HUDI-5787: Summary: HoodieHiveCatalog should not delete data for dropping external table Key: HUDI-5787 URL: https://issues.apache.org/jira/browse/HUDI-5787 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Fix For: 0.13.1 HoodieHiveCatalog should not delete data when dropping the Hive external table, for example, the value of the 'hoodie.datasource.hive_sync.create_managed_table' config is false. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-1741) Row Level TTL Support for records stored in Hudi
[ https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-1741: Assignee: Nicholas Jiang > Row Level TTL Support for records stored in Hudi > > > Key: HUDI-1741 > URL: https://issues.apache.org/jira/browse/HUDI-1741 > Project: Apache Hudi > Issue Type: New Feature > Components: Utilities >Reporter: Balaji Varadarajan >Assignee: Nicholas Jiang >Priority: Major > > For e.g.: have records only updated in the last month > > GH: https://github.com/apache/hudi/issues/2743 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
Nicholas Jiang created HUDI-5772: Summary: Align Flink clustering configuration with HoodieClusteringConfig Key: HUDI-5772 URL: https://issues.apache.org/jira/browse/HUDI-5772 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.13.1 Reporter: Nicholas Jiang Assignee: Nicholas Jiang In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern', 'clustering.plan.strategy.partition.selected' options which do not align with the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align the Flink clustering configuration with HoodieClusteringConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5728: - Summary: HoodieTimelineArchiver archives the latest instant before inflight replacecommit (was: HoodieTimelineArchiver archive the latest instant before inflight replacecommit) > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit > > > Key: HUDI-5728 > URL: https://issues.apache.org/jira/browse/HUDI-5728 > Project: Apache Hudi > Issue Type: Bug > Components: table-service >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > When inline or async clustering is enabled, we need to ensure that there is a > commit in the active timeline to check whether the file slice generated by > pending clustering after archiving isn't committed via > {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5728) HoodieTimelineArchiver archive the latest instant before inflight replacecommit
Nicholas Jiang created HUDI-5728: Summary: HoodieTimelineArchiver archive the latest instant before inflight replacecommit Key: HUDI-5728 URL: https://issues.apache.org/jira/browse/HUDI-5728 Project: Apache Hudi Issue Type: Bug Components: table-service Reporter: Nicholas Jiang Fix For: 0.14.0 When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline to check whether the file slice generated by pending clustering after archiving isn't committed via {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore HoodieTimelineArchiver archives the latest instant before inflight replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5728) HoodieTimelineArchiver archive the latest instant before inflight replacecommit
[ https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5728: Assignee: Nicholas Jiang > HoodieTimelineArchiver archive the latest instant before inflight > replacecommit > --- > > Key: HUDI-5728 > URL: https://issues.apache.org/jira/browse/HUDI-5728 > Project: Apache Hudi > Issue Type: Bug > Components: table-service >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.14.0 > > > When inline or async clustering is enabled, we need to ensure that there is a > commit in the active timeline to check whether the file slice generated by > pending clustering after archiving isn't committed via > {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore > HoodieTimelineArchiver archives the latest instant before inflight > replacecommit. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5663) The pending table service operation should check whether the partition corresponding to the filegroup exists
Nicholas Jiang created HUDI-5663: Summary: The pending table service operation should check whether the partition corresponding to the filegroup exists Key: HUDI-5663 URL: https://issues.apache.org/jira/browse/HUDI-5663 Project: Apache Hudi Issue Type: Improvement Components: table-service Reporter: Nicholas Jiang Fix For: 0.13.1 At present, DeletePartitionCommitActionExecutor prevents the partition from being dropped when there is a pending table service operation. The pending table service operation should check whether the partition corresponding to the filegroup exists, rather than block the execution of the DDL operation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-5553) ALTER TABLE DROP PARTITION DDL may cause data inconsistencies when table service actions are performed
[ https://issues.apache.org/jira/browse/HUDI-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-5553. Resolution: Fixed > ALTER TABLE DROP PARTITION DDL may cause data inconsistencies when table > service actions are performed > -- > > Key: HUDI-5553 > URL: https://issues.apache.org/jira/browse/HUDI-5553 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: voon >Priority: Major > Labels: pull-request-available > > Issue described in detail here: > https://github.com/apache/hudi/issues/7663 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5558) Serializable interface implementations don't explicitly declare serialVersionUID
Nicholas Jiang created HUDI-5558: Summary: Serializable interface implementations don't explicitly declare serialVersionUID Key: HUDI-5558 URL: https://issues.apache.org/jira/browse/HUDI-5558 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Assignee: Nicholas Jiang Fix For: 0.13.0 Serializable interface implementations don't explicitly declare serialVersionUID, which causes InvalidClassException during deserialization. Every Serializable interface implementation, including subclass implementations, should explicitly declare serialVersionUID. -- This message was sent by Atlassian Jira (v8.20.10#820010)
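The problem HUDI-5558 describes can be shown with a minimal sketch. The class below is hypothetical (not a Hudi class): it declares serialVersionUID explicitly, so the JVM won't derive one from the class structure, and previously serialized instances keep deserializing after compatible class changes instead of failing with InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical Serializable implementation following the HUDI-5558 guidance.
public class ClusteringEvent implements Serializable {
    // Explicit serialVersionUID: without it, the JVM computes an implicit one
    // from the class shape, and any structural change to the class breaks
    // deserialization of old bytes with InvalidClassException.
    private static final long serialVersionUID = 1L;

    private final String instantTime;

    public ClusteringEvent(String instantTime) {
        this.instantTime = instantTime;
    }

    public String getInstantTime() {
        return instantTime;
    }

    // Serialize then deserialize a value, returning the reconstructed copy.
    public static ClusteringEvent roundTrip(ClusteringEvent event)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(event);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (ClusteringEvent) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ClusteringEvent copy = roundTrip(new ClusteringEvent("20230101000000"));
        System.out.println(copy.getInstantTime()); // 20230101000000
    }
}
```

The same rule applies transitively: a subclass of a Serializable class needs its own serialVersionUID, since the implicit UID is computed per class.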
[jira] [Assigned] (HUDI-5543) Description of clustering.plan.partition.filter.mode supports DAY_ROLLING strategy
[ https://issues.apache.org/jira/browse/HUDI-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5543: Assignee: Nicholas Jiang > Description of clustering.plan.partition.filter.mode supports DAY_ROLLING > strategy > -- > > Key: HUDI-5543 > URL: https://issues.apache.org/jira/browse/HUDI-5543 > Project: Apache Hudi > Issue Type: Sub-task > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > The description of clustering.plan.partition.filter.mode doesn't support > DAY_ROLLING strategy, which has been supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5543) Description of clustering.plan.partition.filter.mode supports DAY_ROLLING strategy
Nicholas Jiang created HUDI-5543: Summary: Description of clustering.plan.partition.filter.mode supports DAY_ROLLING strategy Key: HUDI-5543 URL: https://issues.apache.org/jira/browse/HUDI-5543 Project: Apache Hudi Issue Type: Sub-task Components: flink Reporter: Nicholas Jiang Fix For: 0.13.0 The description of clustering.plan.partition.filter.mode doesn't support DAY_ROLLING strategy, which has been supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
[ https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675992#comment-17675992 ] Nicholas Jiang commented on HUDI-5531: -- [~yihua], [~xleesf], WDYT? > RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to > RECENT_PARTITIONS > > > Key: HUDI-5531 > URL: https://issues.apache.org/jira/browse/HUDI-5531 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > The javadoc of `ClusteringPlanPartitionFilter` describes RECENT_DAYS as > "output recent partitions given the skip num and days lookback config". The > RECENT_DAYS name doesn't match these semantics: it assumes that Hudi tables > are partitioned by day, but tables partitioned by hour can also use this > strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode > should therefore be renamed to RECENT_PARTITIONS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
[ https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5531: - Issue Type: Improvement (was: Task) > RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to > RECENT_PARTITIONS > > > Key: HUDI-5531 > URL: https://issues.apache.org/jira/browse/HUDI-5531 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > The javadoc of `ClusteringPlanPartitionFilter` describes RECENT_DAYS as > "output recent partitions given the skip num and days lookback config". The > RECENT_DAYS name doesn't match these semantics: it assumes that Hudi tables > are partitioned by day, but tables partitioned by hour can also use this > strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode > should therefore be renamed to RECENT_PARTITIONS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
Nicholas Jiang created HUDI-5531: Summary: RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS Key: HUDI-5531 URL: https://issues.apache.org/jira/browse/HUDI-5531 Project: Apache Hudi Issue Type: Task Reporter: Nicholas Jiang Fix For: 0.13.0 The javadoc of `ClusteringPlanPartitionFilter` describes RECENT_DAYS as "output recent partitions given the skip num and days lookback config". The RECENT_DAYS name doesn't match these semantics: it assumes that Hudi tables are partitioned by day, but tables partitioned by hour can also use this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should therefore be renamed to RECENT_PARTITIONS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS
[ https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5531: Assignee: Nicholas Jiang > RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to > RECENT_PARTITIONS > > > Key: HUDI-5531 > URL: https://issues.apache.org/jira/browse/HUDI-5531 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > The javadoc of `ClusteringPlanPartitionFilter` describes RECENT_DAYS as > "output recent partitions given the skip num and days lookback config". The > RECENT_DAYS name doesn't match these semantics: it assumes that Hudi tables > are partitioned by day, but tables partitioned by hour can also use this > strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode > should therefore be renamed to RECENT_PARTITIONS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5506) StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata event
Nicholas Jiang created HUDI-5506: Summary: StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata event Key: HUDI-5506 URL: https://issues.apache.org/jira/browse/HUDI-5506 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.12.2 Reporter: Nicholas Jiang AbstractStreamWriteFunction may get a different pending instant for the checkpoint among the subtasks, because the StreamWriteOperatorCoordinator may be committing the instant of the last completed checkpoint when AbstractStreamWriteFunction invokes snapshotState. StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata events when handling the last bootstrap event, which is an empty bootstrap event. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5506) StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata event
[ https://issues.apache.org/jira/browse/HUDI-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5506: Assignee: Nicholas Jiang > StreamWriteOperatorCoordinator may not recommit with partial uncommitted > write metadata event > - > > Key: HUDI-5506 > URL: https://issues.apache.org/jira/browse/HUDI-5506 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.12.2 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > > AbstractStreamWriteFunction may get a different pending instant for the > checkpoint among the subtasks, because the StreamWriteOperatorCoordinator may > be committing the instant of the last completed checkpoint when > AbstractStreamWriteFunction invokes snapshotState. > StreamWriteOperatorCoordinator may not recommit with partial uncommitted > write metadata events when handling the last bootstrap event, which is an > empty bootstrap event. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode
[ https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5343: - Description: HoodieFlinkStreamer supports async clustering for append mode, which keeps it consistent with the pipeline of HoodieTableSink. (was: HoodieFlinkStreamer supports async clustering for append mode, which keep the consistent with the pipeline of HoodieTableSink.) > HoodieFlinkStreamer supports async clustering for append mode > - > > Key: HUDI-5343 > URL: https://issues.apache.org/jira/browse/HUDI-5343 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.12.2 > > > HoodieFlinkStreamer supports async clustering for append mode, which keeps > it consistent with the pipeline of HoodieTableSink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode
[ https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5343: Assignee: Nicholas Jiang > HoodieFlinkStreamer supports async clustering for append mode > - > > Key: HUDI-5343 > URL: https://issues.apache.org/jira/browse/HUDI-5343 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.12.2 > > > HoodieFlinkStreamer supports async clustering for append mode, which keep the > consistent with the pipeline of HoodieTableSink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode
Nicholas Jiang created HUDI-5343: Summary: HoodieFlinkStreamer supports async clustering for append mode Key: HUDI-5343 URL: https://issues.apache.org/jira/browse/HUDI-5343 Project: Apache Hudi Issue Type: Improvement Reporter: Nicholas Jiang Fix For: 0.12.2 HoodieFlinkStreamer supports async clustering for append mode, which keep the consistent with the pipeline of HoodieTableSink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5252) ClusteringCommitSink supports rollback of clustering
Nicholas Jiang created HUDI-5252: Summary: ClusteringCommitSink supports rollback of clustering Key: HUDI-5252 URL: https://issues.apache.org/jira/browse/HUDI-5252 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Fix For: 0.13.0 When the commit buffer contains a failed ClusteringCommitEvent, ClusteringCommitSink invokes CompactionUtil#rollbackCompaction to roll back the clustering. ClusteringCommitSink should instead call ClusteringUtil#rollbackClustering to roll back the clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5252) ClusteringCommitSink supports rollback of clustering
[ https://issues.apache.org/jira/browse/HUDI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5252: Assignee: Nicholas Jiang > ClusteringCommitSink supports rollback of clustering > > > Key: HUDI-5252 > URL: https://issues.apache.org/jira/browse/HUDI-5252 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.13.0 > > > When the commit buffer contains a failed ClusteringCommitEvent, > ClusteringCommitSink invokes CompactionUtil#rollbackCompaction to roll back > the clustering. ClusteringCommitSink should instead call > ClusteringUtil#rollbackClustering to roll back the clustering. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5206) RowColumnReader should not return null value for certain null child columns
[ https://issues.apache.org/jira/browse/HUDI-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5206: Assignee: Nicholas Jiang > RowColumnReader should not return null value for certain null child columns > --- > > Key: HUDI-5206 > URL: https://issues.apache.org/jira/browse/HUDI-5206 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.13.0 > > > When reading the vector of certain null child columns of a row-type column, > RowColumnReader should not return a null value, because the value of the > row-type column may not be null; returning null results in incorrect values > for the row-type column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5206) RowColumnReader should not return null value for certain null child columns
Nicholas Jiang created HUDI-5206: Summary: RowColumnReader should not return null value for certain null child columns Key: HUDI-5206 URL: https://issues.apache.org/jira/browse/HUDI-5206 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Nicholas Jiang Fix For: 0.13.0 When reading the vector of certain null child columns of a row-type column, RowColumnReader should not return a null value, because the value of the row-type column may not be null; returning null results in incorrect values for the row-type column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
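The distinction HUDI-5206 hinges on can be sketched with plain types. This is a hedged illustration, not Hudi's actual RowColumnReader: the `Row` class and the two checks below are hypothetical stand-ins showing why "all child columns are null" must not be treated as "the row itself is null".

```java
import java.util.Arrays;

// Hypothetical sketch of HUDI-5206's point: a non-null row value may have
// every child column null, so a reader must not infer row-level null from
// child-level nulls.
public class RowNullability {
    // Minimal stand-in for a row-type value with nullable child columns.
    static final class Row {
        final Integer[] children; // each child column value, possibly null
        Row(Integer... children) { this.children = children; }
    }

    // Correct check: the row is null only when the row itself is absent.
    static boolean isRowNull(Row row) {
        return row == null;
    }

    // Buggy check of the kind the issue describes: it drops non-null rows
    // whose fields all happen to be null.
    static boolean isRowNullBuggy(Row row) {
        return row == null || Arrays.stream(row.children).allMatch(c -> c == null);
    }

    public static void main(String[] args) {
        Row allNullChildren = new Row(null, null);
        System.out.println(isRowNull(allNullChildren));      // false: the row exists
        System.out.println(isRowNullBuggy(allNullChildren)); // true: incorrect
    }
}
```

With the buggy check, a column of type `ROW<a INT, b INT>` holding the value `(null, null)` would be read back as a null row, which is the incorrect-value symptom the issue reports.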
[jira] [Comment Edited] (HUDI-1741) Row Level TTL Support for records stored in Hudi
[ https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625400#comment-17625400 ] Nicholas Jiang edited comment on HUDI-1741 at 10/28/22 3:10 AM: [~shivnarayan], IMO, each Hudi record carries its commit time. The solution is to first honor the TTL: do not show expired data at query time (or even push the filter down to the data source directly), and then delete the expired data during operations such as clustering that need to rewrite the data. WDYT? cc [~xleesf] was (Author: nicholasjiang): [~shivnarayan], IMO, each Hudi record carries its commit time. The solution is to first honor the TTL: do not show expired data at query time (or even push the filter down to the data source directly), and then delete the expired data during operations such as clustering that need to rewrite the data. WDYT? > Row Level TTL Support for records stored in Hudi > > > Key: HUDI-1741 > URL: https://issues.apache.org/jira/browse/HUDI-1741 > Project: Apache Hudi > Issue Type: New Feature > Components: Utilities >Reporter: Balaji Varadarajan >Priority: Major > > For e.g.: have records only updated in the last month > > GH: https://github.com/apache/hudi/issues/2743 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-1741) Row Level TTL Support for records stored in Hudi
[ https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625400#comment-17625400 ] Nicholas Jiang commented on HUDI-1741: -- [~shivnarayan], IMO, each Hudi record carries its commit time. The solution is to first honor the TTL: do not show expired data at query time (or even push the filter down to the data source directly), and then delete the expired data during operations such as clustering that need to rewrite the data. WDYT? > Row Level TTL Support for records stored in Hudi > > > Key: HUDI-1741 > URL: https://issues.apache.org/jira/browse/HUDI-1741 > Project: Apache Hudi > Issue Type: New Feature > Components: Utilities >Reporter: Balaji Varadarajan >Priority: Major > > For e.g.: have records only updated in the last month > > GH: https://github.com/apache/hudi/issues/2743 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition
[ https://issues.apache.org/jira/browse/HUDI-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-5049: - Description: HoodieCatalog doesn't support the implementation of dropPartition at present, which is needed for the scenario where the current partition is backfilled. (was: HoodieCatalog doesn't support the implementation of dropPartition at present, which is useful for the Hudi current partition backfill scenario.) > HoodieCatalog supports the implementation of dropPartition > -- > > Key: HUDI-5049 > URL: https://issues.apache.org/jira/browse/HUDI-5049 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > HoodieCatalog doesn't support the implementation of dropPartition at present, > which is needed for the scenario where the current partition is backfilled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition
[ https://issues.apache.org/jira/browse/HUDI-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5049: Assignee: Nicholas Jiang > HoodieCatalog supports the implementation of dropPartition > -- > > Key: HUDI-5049 > URL: https://issues.apache.org/jira/browse/HUDI-5049 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > Fix For: 0.13.0 > > > HoodieCatalog doesn't support the implementation of dropPartition at present, > which is useful for the Hudi current partition backfill scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition
Nicholas Jiang created HUDI-5049: Summary: HoodieCatalog supports the implementation of dropPartition Key: HUDI-5049 URL: https://issues.apache.org/jira/browse/HUDI-5049 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang Fix For: 0.13.0 HoodieCatalog doesn't support the implementation of dropPartition at present, which is useful for the Hudi current partition backfill scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
[ https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-4914: - Status: In Progress (was: Open) > Managed memory weight should be set when sort clustering is enabled > --- > > Key: HUDI-4914 > URL: https://issues.apache.org/jira/browse/HUDI-4914 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.1 > > > Managed memory weight should be set when sort clustering is enabled, > otherwise the fraction of memory to allocate is 0, which throws the following > exception when initializing the sorter: > {code:java} > java.lang.IllegalArgumentException: The fraction of memory to allocate should > not be 0. Please make sure that all types of managed memory consumers > contained in the job are configured with a non-negative weight via > `taskmanager.memory.managed.consumer-weights`. at > org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160) > at > org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672) > at > org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653) > at > org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66) > at > org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351) > at > org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157) > at > org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93) > at > org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107) > at > org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140) > at > org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458) > at > 
org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > at > 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210) > at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131) > at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65) > at > org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$execut
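The exception quoted above points at the fix: when sort clustering is enabled, the ClusteringOperator allocates managed memory for its sorter, so the operator-scope managed memory weight must be non-zero. As a hedged illustration only (the key comes from the error message itself; the weight values are examples, not verified Hudi or Flink defaults), the consumer weights can be set in flink-conf.yaml:

```yaml
# flink-conf.yaml -- illustrative values only.
# OPERATOR covers operators that request managed memory, such as the sorter
# used by sort clustering; STATE_BACKEND covers RocksDB state. An OPERATOR
# weight of 0 reproduces the "fraction of memory to allocate should not
# be 0" error quoted above.
taskmanager.memory.managed.consumer-weights: OPERATOR:70,STATE_BACKEND:70
```

Alternatively, the weight can be declared on the specific transformation in the job itself, which is what "managed memory weight should be set" in the issue title refers to.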
[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
[ https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-4914: - Affects Version/s: (was: 0.12.0) > Managed memory weight should be set when sort clustering is enabled > --- > > Key: HUDI-4914 > URL: https://issues.apache.org/jira/browse/HUDI-4914 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Critical > Fix For: 0.12.1
[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
[ https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-4914: - Priority: Critical (was: Major) > Managed memory weight should be set when sort clustering is enabled > --- > > Key: HUDI-4914 > URL: https://issues.apache.org/jira/browse/HUDI-4914 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.12.0 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Critical
[jira] [Assigned] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
[ https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-4914: Assignee: Nicholas Jiang > Managed memory weight should be set when sort clustering is enabled > --- > > Key: HUDI-4914 > URL: https://issues.apache.org/jira/browse/HUDI-4914 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.12.0 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major
[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
[ https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang updated HUDI-4914: - Fix Version/s: 0.12.1 > Managed memory weight should be set when sort clustering is enabled > --- > > Key: HUDI-4914 > URL: https://issues.apache.org/jira/browse/HUDI-4914 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.12.0 >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Critical > Fix For: 0.12.1
[jira] [Created] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled
Nicholas Jiang created HUDI-4914: Summary: Managed memory weight should be set when sort clustering is enabled Key: HUDI-4914 URL: https://issues.apache.org/jira/browse/HUDI-4914 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.12.0 Reporter: Nicholas Jiang Managed memory weight should be set when sort clustering is enabled, otherwise the fraction of memory to allocate is 0, which throws the following exception when initializing the sorter: {code:java} java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`. at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160) at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672) at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653) at org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66) at org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351) at org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157) at org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93) at org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107) at org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140) at org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458) at org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206) at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127) at org.junit.pla
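For context, the exception above names the knob involved. A hedged sketch of the user-side workaround (weights illustrative, not tuned recommendations) is to give OPERATOR consumers a non-zero weight in flink-conf.yaml:

```yaml
# flink-conf.yaml -- illustrative values only; check the defaults of your
# Flink version before copying.
taskmanager.memory.managed.consumer-weights: OPERATOR:70,STATE_BACKEND:70,PYTHON:30
```

The cleaner fix this ticket describes is for the clustering operator itself to declare its managed memory weight on its transformation (Flink exposes `Transformation#declareManagedMemoryUseCaseAtOperatorScope` for this), so that users do not have to tune consumer weights by hand.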
[jira] (HUDI-4269) Support multiple precombine fields
[ https://issues.apache.org/jira/browse/HUDI-4269 ] Nicholas Jiang deleted comment on HUDI-4269: -- was (Author: nicholasjiang): [~danny0405], I am interested in supporting multiple precombine fields. Could you please assign this ticket to me? > Support multiple precombine fields > -- > > Key: HUDI-4269 > URL: https://issues.apache.org/jira/browse/HUDI-4269 > Project: Apache Hudi > Issue Type: New Feature > Components: core >Reporter: Danny Chen >Assignee: Nicholas Jiang >Priority: Major > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table
[ https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-1161: Assignee: Jian Feng (was: Nicholas Jiang) > Support update partial fields for MoR table > --- > > Key: HUDI-1161 > URL: https://issues.apache.org/jira/browse/HUDI-1161 > Project: Apache Hudi > Issue Type: Sub-task > Components: writer-core >Reporter: leesf >Assignee: Jian Feng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-2545) Flink compaction source supports the Source interface based on FLIP-27
[ https://issues.apache.org/jira/browse/HUDI-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-2545: Assignee: yuemeng (was: Nicholas Jiang) > Flink compaction source supports the Source interface based on FLIP-27 > -- > > Key: HUDI-2545 > URL: https://issues.apache.org/jira/browse/HUDI-2545 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: yuemeng >Priority: Major > > The CompactionPlanSourceFunction is the Flink hudi compaction source function > which implements the SourceFunction interface. The new Source interface is > introduced in Flink and the compaction source could support the Source based > on the FLIP-27. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-2441) To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition
[ https://issues.apache.org/jira/browse/HUDI-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-2441: Assignee: David_Liang (was: Nicholas Jiang) > To support partial update function which can move and update the data from > the old partition to the new partition , when the data with same key change > it's partition > - > > Key: HUDI-2441 > URL: https://issues.apache.org/jira/browse/HUDI-2441 > Project: Apache Hudi > Issue Type: Improvement > Components: storage-management >Reporter: David_Liang >Assignee: David_Liang >Priority: Major > > Consider such a scene: there are 2 records *in different batches* as follows > ||post_id ||position||weight||ts||day || > | 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}| > | 1|beijing|3KG|1630652828|{color:#ff}20210903{color}| > > when using the {color:#ff}*Global Index*{color} with such SQL > > {code:java} > merge into target_hudi_table t > using ( > select post_id, position, ts , day from source_table > ) as s > on t.id = s.id > when matched then update set t.position = s.position, t.ts=s.ts, t.day = > s.day > when not matched then insert * > {code} > > Because the hudi engine does not yet support *cross-partition partial merge > into,* the result in the target table is > > ||post_id (as primary key)||position||weight||ts||day|| > | 1|beijing| |1630652828|*{color:#ff}20210903{color}*| > the record is still in the old partition. > > but the *expected* result is > ||post_id (as primary key)||position||weight||ts||day|| > | > 1|beijing|*{color:#ff}3KG{color}*|1630652828|{color:#ff}*20210903*{color}| > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
[ https://issues.apache.org/jira/browse/HUDI-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-4551: Assignee: Nicholas Jiang > The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the > parallelism of the execution environment > -- > > Key: HUDI-4551 > URL: https://issues.apache.org/jira/browse/HUDI-4551 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Minor > > The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is 4, which > could be the parallelism of the execution environment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
Nicholas Jiang created HUDI-4551: Summary: The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment Key: HUDI-4551 URL: https://issues.apache.org/jira/browse/HUDI-4551 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Nicholas Jiang The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is 4, which could be the parallelism of the execution environment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
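These parallelism knobs surface as Hudi Flink table options. Until the defaults fall back to the environment parallelism as this ticket proposes, they can be pinned explicitly; a sketch (table schema, path, and values are illustrative assumptions, not from the ticket):

```sql
-- Illustrative only: overriding the fixed default of 4 per operator group.
CREATE TABLE hudi_sink (
  id   INT,
  name STRING,
  ts   TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_sink',
  'read.tasks' = '8',        -- parallelism of the read operators
  'write.tasks' = '8',       -- parallelism of the write operators
  'clustering.tasks' = '8'   -- parallelism of the clustering operators
);
```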
[jira] [Closed] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering
[ https://issues.apache.org/jira/browse/HUDI-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang closed HUDI-4536. Reviewers: Danny Chen Resolution: Fixed > ClusteringOperator causes the NullPointerException when writing with > BulkInsertWriterHelper in clustering > - > > Key: HUDI-4536 > URL: https://issues.apache.org/jira/browse/HUDI-4536 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > ClusteringOperator causes the NullPointerException when writing with > BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper > isn't set to null after close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering
[ https://issues.apache.org/jira/browse/HUDI-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-4536: Assignee: Nicholas Jiang > ClusteringOperator causes the NullPointerException when writing with > BulkInsertWriterHelper in clustering > - > > Key: HUDI-4536 > URL: https://issues.apache.org/jira/browse/HUDI-4536 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > > ClusteringOperator causes the NullPointerException when writing with > BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper > isn't set to null after close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering
Nicholas Jiang created HUDI-4536: Summary: ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering Key: HUDI-4536 URL: https://issues.apache.org/jira/browse/HUDI-4536 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Nicholas Jiang ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper isn't set to null after close. -- This message was sent by Atlassian Jira (v8.20.10#820010)
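The lifecycle bug can be reduced to a self-contained sketch. The classes below are hypothetical stand-ins, not the actual `ClusteringOperator`/`BulkInsertWriterHelper`: if the helper reference is not reset to null after close, the next round writes through an already-closed helper instead of lazily creating a fresh one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bulk-insert writer helper.
class WriterHelper {
    private boolean closed = false;
    private final List<String> rows = new ArrayList<>();

    void write(String row) {
        if (closed) {
            // The NPE in the real operator stems from the same root cause:
            // reusing a helper whose resources were released on close.
            throw new IllegalStateException("write on a closed helper");
        }
        rows.add(row);
    }

    List<String> close() {
        closed = true;
        return new ArrayList<>(rows);
    }
}

// Hypothetical stand-in for the clustering operator's write path.
class OperatorSketch {
    private WriterHelper writerHelper;

    void processElement(String record) {
        if (writerHelper == null) {   // lazily (re)create the helper
            writerHelper = new WriterHelper();
        }
        writerHelper.write(record);
    }

    List<String> endInput() {
        List<String> written = writerHelper.close();
        writerHelper = null;          // the fix: drop the closed helper
        return written;
    }
}
```

With the `writerHelper = null` line removed, the second round of `processElement` would hit the closed helper; with it, each round gets a fresh one.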