[jira] [Updated] (HUDI-6703) StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit

2023-08-15 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6703:
-
Description: StreamWriteOperatorCoordinator should refresh the last txn 
metadata first, to prepare for resolving write conflicts on recommit.  (was: 
StreamWriteOperatorCoordinator should refresh the last txn metadata first, to 
prepare for resolving write conflicts on recommit.)

> StreamWriteOperatorCoordinator should refresh the last txn metadata firstly 
> for recommit
> 
>
> Key: HUDI-6703
> URL: https://issues.apache.org/jira/browse/HUDI-6703
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> StreamWriteOperatorCoordinator should refresh the last txn metadata first, 
> to prepare for resolving write conflicts on recommit.
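
As a hedged illustration of the intended ordering, here is a minimal sketch; 
reloadActiveTimeline exists on HoodieTableMetaClient, while the other helper 
names are assumptions made for the example:

{code:java}
// Hypothetical coordinator-side sketch: refresh the last txn metadata
// *before* resolving conflicts, so the recommit sees the newest state.
private void recommitWithRefreshedTxnMetadata(String instantTime) {
  // 1. Refresh first: reload the timeline so the last completed txn
  //    metadata reflects any transaction that finished in the meantime.
  this.metaClient.reloadActiveTimeline();
  this.lastCompletedTxnMetadata = readLastCompletedTxnMetadata(metaClient); // assumed helper
  // 2. Only then resolve potential write conflicts against the refreshed metadata.
  resolveWriteConflict(instantTime, this.lastCompletedTxnMetadata);         // assumed helper
  // 3. Finally recommit the buffered write results.
  commitInstant(instantTime);                                               // assumed helper
}
{code}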





[jira] [Created] (HUDI-6703) StreamWriteOperatorCoordinator should refresh the last txn metadata firstly for recommit

2023-08-15 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6703:


 Summary: StreamWriteOperatorCoordinator should refresh the last 
txn metadata firstly for recommit
 Key: HUDI-6703
 URL: https://issues.apache.org/jira/browse/HUDI-6703
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang


StreamWriteOperatorCoordinator should refresh the last txn metadata first, to 
prepare for resolving write conflicts on recommit.





[jira] [Updated] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores

2023-08-08 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6669:
-
Description: 
HoodieEngineContext should not use a parallel stream with parallelism greater 
than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
{{ForkJoinTask}}; the stack trace is as follows:
{code:java}
Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}
 

  was:
HoodieEngineContext should not use a parallel stream with parallelism greater 
than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
{{ForkJoinTask}}; the stack trace is as follows:


Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)


> HoodieEngineContext should not use parallel stream with parallelism greater 
> than CPU cores
> --
>
> Key: HUDI-6669
> URL: https://issues.apache.org/jira/browse/HUDI-6669
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieEngineContext should not use a parallel stream with parallelism 
> greater than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
> {{ForkJoinTask}}; the stack trace is as follows:
> {code:java}
> Caused by: java.lang.OutOfMemoryError at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.Nativ

[jira] [Updated] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores

2023-08-08 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6669:
-
Description: 
HoodieEngineContext should not use a parallel stream with parallelism greater 
than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
{{ForkJoinTask}}; the stack trace is as follows:


Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

  was:
HoodieEngineContext should not use a parallel stream with parallelism greater 
than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
{{ForkJoinTask}}; the stack trace is as follows:
Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)


> HoodieEngineContext should not use parallel stream with parallelism greater 
> than CPU cores
> --
>
> Key: HUDI-6669
> URL: https://issues.apache.org/jira/browse/HUDI-6669
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieEngineContext should not use a parallel stream with parallelism 
> greater than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
> {{ForkJoinTask}}; the stack trace is as follows:
> Caused by: java.lang.OutOfMemoryError at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native

[jira] [Created] (HUDI-6669) HoodieEngineContext should not use parallel stream with parallelism greater than CPU cores

2023-08-08 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6669:


 Summary: HoodieEngineContext should not use parallel stream with 
parallelism greater than CPU cores
 Key: HUDI-6669
 URL: https://issues.apache.org/jira/browse/HUDI-6669
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


HoodieEngineContext should not use a parallel stream with parallelism greater 
than the number of CPU cores, to avoid an {{OutOfMemoryError}} in 
{{ForkJoinTask}}; the stack trace is as follows:
Caused by: java.lang.OutOfMemoryError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.map(HoodieFlinkEngineContext.java:101)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:117)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:145)
    at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:170)
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:353)
    at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1434)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:68)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
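
One conventional way to enforce the cap is to clamp the requested parallelism 
and run the parallel stream on a dedicated ForkJoinPool; a minimal 
self-contained sketch (not the actual HoodieFlinkEngineContext code):

{code:java}
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BoundedParallelMap {
  // Runs the mapping with parallelism clamped to the CPU core count, on a
  // dedicated pool so the caller cannot flood the common ForkJoinPool.
  public static <I, O> List<O> map(List<I> input, Function<I, O> fn,
                                   int requestedParallelism) throws Exception {
    int parallelism = Math.min(requestedParallelism,
        Runtime.getRuntime().availableProcessors());
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      // A parallel stream whose terminal operation is submitted to a custom
      // ForkJoinPool executes on that pool's workers.
      return pool.submit(
          () -> input.parallelStream().map(fn).collect(Collectors.toList())).get();
    } finally {
      pool.shutdown();
    }
  }
}
{code}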





[jira] [Created] (HUDI-6667) ClientIds should generate next id automatically with random uuid instead of incremental id

2023-08-07 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6667:


 Summary: ClientIds should generate next id automatically with 
random uuid instead of incremental id
 Key: HUDI-6667
 URL: https://issues.apache.org/jira/browse/HUDI-6667
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


ClientIds should generate the next id automatically with a random UUID instead 
of an incremental id, to avoid client-id conflicts during concurrent batch 
insert overwrite.
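
A hedged sketch of the proposed generation (the method name is illustrative): 
a random UUID cannot collide across concurrently starting jobs the way an 
incremented counter can:

{code:java}
import java.util.UUID;

// Instead of e.g. String.valueOf(lastObservedId + 1), which two jobs starting
// concurrently can both compute from the same observed state, draw a random
// UUID so ids are unique without coordination.
public static String nextClientId() {
  return UUID.randomUUID().toString();
}
{code}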





[jira] [Created] (HUDI-6592) Flink insert overwrite should support dynamic partition instead of whole table

2023-07-25 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6592:


 Summary: Flink insert overwrite should support dynamic partition 
instead of whole table
 Key: HUDI-6592
 URL: https://issues.apache.org/jira/browse/HUDI-6592
 Project: Apache Hudi
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang


Flink insert overwrite should overwrite dynamic partitions instead of the 
whole table, a behavior consistent with the semantics of INSERT OVERWRITE in 
Flink.
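
A hedged illustration of the expected semantics (Flink SQL submitted from 
Java; the table and column names are made up for the example):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class InsertOverwriteExample {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
    // With dynamic partition semantics, this should replace only the
    // partitions present in the query result (here dt='2023-07-25'),
    // leaving all other partitions of hudi_table untouched.
    tEnv.executeSql(
        "INSERT OVERWRITE hudi_table "
      + "SELECT id, name, dt FROM source_table WHERE dt = '2023-07-25'");
  }
}
{code}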





[jira] [Created] (HUDI-6541) Multiple writers should create new and different instant time to avoid marker conflict of same instant

2023-07-16 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6541:


 Summary: Multiple writers should create new and different instant 
time to avoid marker conflict of same instant
 Key: HUDI-6541
 URL: https://issues.apache.org/jira/browse/HUDI-6541
 Project: Apache Hudi
  Issue Type: Bug
  Components: core
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


Even if the write results of the commits do not conflict, multiple writers 
should create different instant times to avoid marker conflicts on the same 
instant. Writers can create a new instant time under the file-system lock, so 
that each generated instant is distinct.
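
A hedged sketch of the proposal; HoodieActiveTimeline.createNewInstantTime() 
and LockManager are existing Hudi classes, but the wiring here is illustrative:

{code:java}
// Generate the instant time while holding the (e.g. file-system based) lock:
// the timestamp generator is monotonically increasing, so serializing the
// calls means every writer gets a distinct instant, and the marker
// directories of different writers can never collide.
String createNewInstantTimeUnderLock(LockManager lockManager) {
  lockManager.lock();
  try {
    return HoodieActiveTimeline.createNewInstantTime();
  } finally {
    lockManager.unlock();
  }
}
{code}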





[jira] [Created] (HUDI-6540) Support failed writes clean policy for Flink

2023-07-16 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6540:


 Summary: Support failed writes clean policy for Flink
 Key: HUDI-6540
 URL: https://issues.apache.org/jira/browse/HUDI-6540
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


The failed-writes clean policy should be LAZY when the write concurrency mode 
is optimistic concurrency control. FlinkOptions should support configuring the 
failed-writes clean policy, and the parameters of FlinkStreamerConfig, 
FlinkCompactionConfig and FlinkClusteringConfig should also expose it. 
Meanwhile, append mode without inline clustering should add a clean operator 
to the pipeline to roll back failed writes.
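
A hedged example of what the resulting configuration could look like in a 
Flink SQL DDL; the two hoodie.* keys follow Hudi's general configuration 
naming, so verify them against the target version:

{code:java}
tEnv.executeSql(
    "CREATE TABLE hudi_sink (id INT, name STRING) WITH ("
  + "  'connector' = 'hudi',"
  + "  'path' = '/tmp/hudi_sink',"
  + "  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',"
  + "  'hoodie.cleaner.policy.failed.writes' = 'LAZY'"  // lazy rollback of failed writes
  + ")");
{code}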





[jira] [Created] (HUDI-6519) The default value of read.streaming.enabled is determined by execution.runtime-mode

2023-07-10 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6519:


 Summary: The default value of read.streaming.enabled is determined 
by execution.runtime-mode
 Key: HUDI-6519
 URL: https://issues.apache.org/jira/browse/HUDI-6519
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


The default value of read.streaming.enabled could be determined by 
execution.runtime-mode, which users already choose according to the 
requirements of the use case and the characteristics of the job.
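
A hedged sketch of the defaulting rule; FlinkOptions.READ_AS_STREAMING and 
ExecutionOptions.RUNTIME_MODE are the existing option constants, but treat the 
exact wiring as an assumption:

{code:java}
// If the user set read.streaming.enabled explicitly, honor it; otherwise
// inherit the default from execution.runtime-mode.
static boolean streamingReadEnabled(org.apache.flink.configuration.Configuration conf) {
  if (conf.contains(FlinkOptions.READ_AS_STREAMING)) {
    return conf.get(FlinkOptions.READ_AS_STREAMING);
  }
  return conf.get(ExecutionOptions.RUNTIME_MODE) == RuntimeExecutionMode.STREAMING;
}
{code}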





[jira] [Updated] (HUDI-6501) StreamWriteOperatorCoordinator should recommit with starting heartbeat for lazy failed writes clean policy

2023-07-06 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6501:
-
Summary: StreamWriteOperatorCoordinator should recommit with starting 
heartbeat for lazy failed writes clean policy  (was: Recommit should not abort 
for heartbeat expired caused by the last failed write)

> StreamWriteOperatorCoordinator should recommit with starting heartbeat for 
> lazy failed writes clean policy
> --
>
> Key: HUDI-6501
> URL: https://issues.apache.org/jira/browse/HUDI-6501
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.14.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When the last write has failed, HoodieHeartbeatClient closes and stops all 
> heartbeats, which includes deleting the heartbeat file. Therefore 
> StreamWriteOperatorCoordinator should start the heartbeat under the lazy 
> failed-writes clean policy, to avoid aborting on an expired heartbeat when 
> recommitting.
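
A hedged sketch of the recommit path described above (the accessor and 
predicate names are assumptions made for the example):

{code:java}
// Restart the heartbeat for the instant before committing, so the lazy
// failed-writes clean policy does not see an expired heartbeat and abort.
private void recommitInstant(String instantTime) throws Exception {
  if (failedWritesCleanPolicyIsLazy()) {                   // assumed predicate
    writeClient.getHeartbeatClient().start(instantTime);   // assumed accessor
  }
  writeClient.commit(instantTime, bufferedWriteResults);   // recommit as usual
}
{code}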





[jira] [Updated] (HUDI-6501) Recommit should not abort for heartbeat expired caused by the last failed write

2023-07-06 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6501:
-
Description: When the last write had failed, HoodieHeartbeatClient would 
close with stopping all heartbeat which includes deleting heartbeat file. 
Therefore StreamWriteOperatorCoordinator should start heartbeat for lazy failed 
writes clean policy to avoid aborting for heartbeat expired when recommitting.  
(was: When the last write had failed, HoodieHeartbeatClient would close with 
stopping all heartbeat which includes deleting heartbeat file in flink job. 
Therefore, it isn't heartbeat expired for the last failed write when 
HoodieHeartbeatClient recommits. Fix HoodieHeartbeatClient does not check 
whether heartbeat is expired with the last failed writes for recommit.)

> Recommit should not abort for heartbeat expired caused by the last failed 
> write
> ---
>
> Key: HUDI-6501
> URL: https://issues.apache.org/jira/browse/HUDI-6501
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.14.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When the last write has failed, HoodieHeartbeatClient closes and stops all 
> heartbeats, which includes deleting the heartbeat file. Therefore 
> StreamWriteOperatorCoordinator should start the heartbeat under the lazy 
> failed-writes clean policy, to avoid aborting on an expired heartbeat when 
> recommitting.





[jira] [Created] (HUDI-6501) Recommit should not abort for heartbeat expired caused by the last failed write

2023-07-06 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6501:


 Summary: Recommit should not abort for heartbeat expired caused by 
the last failed write
 Key: HUDI-6501
 URL: https://issues.apache.org/jira/browse/HUDI-6501
 Project: Apache Hudi
  Issue Type: Bug
  Components: core
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


When the last write has failed, HoodieHeartbeatClient closes and stops all 
heartbeats, which includes deleting the heartbeat file in the Flink job. The 
heartbeat therefore has not actually expired for the last failed write when 
HoodieHeartbeatClient recommits. Fix HoodieHeartbeatClient so that it does not 
treat the heartbeat of the last failed writes as expired for recommit.





[jira] [Created] (HUDI-6384) The commit action type of inflight instant for replacecommit should be replacecommit

2023-06-15 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6384:


 Summary: The commit action type of inflight instant for 
replacecommit should be replacecommit 
 Key: HUDI-6384
 URL: https://issues.apache.org/jira/browse/HUDI-6384
 Project: Apache Hudi
  Issue Type: Bug
  Components: core
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


When committing a replacecommit, BaseHoodieWriteClient#commitStats creates the 
inflight instant with an incorrect commit action type derived from the table 
type; it should create the inflight instant with the replacecommit commit 
action type instead.
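
A hedged sketch of the fix; HoodieInstant and 
HoodieTimeline.REPLACE_COMMIT_ACTION are the existing timeline types:

{code:java}
// When committing a replacecommit, pin the action explicitly instead of
// deriving it from the table type (which yields commit/deltacommit).
HoodieInstant inflight = new HoodieInstant(
    HoodieInstant.State.INFLIGHT,
    HoodieTimeline.REPLACE_COMMIT_ACTION,  // "replacecommit"
    instantTime);
{code}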





[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates

2023-06-05 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Description: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering, so that 
streaming reading may read T-1 day data when clustering the data of T-1 day to 
cause duplicated data. Therefore streaming read should skip clustering instants 
for all cases to avoid reading the replaced file slices. Same to 
`read.streaming.skip_compaction`.  (was: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering, so that 
streaming reading may read T-1 day data when clustering the data of T-1 day to 
cause duplicated data. Therefore streaming read should skip clustering instants 
for all cases to avoid reading the replaced file slices. The same to 
`read.streaming.skip_compaction`.)

> Streaming read should skip compaction and clustering instants to avoid 
> duplicates
> -
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices. The same applies to `read.streaming.skip_compaction`.
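
A hedged DDL example of the two skip options (the option keys are taken from 
the issue text; the rest of the table definition is illustrative):

{code:java}
tEnv.executeSql(
    "CREATE TABLE hudi_source (id INT, name STRING) WITH ("
  + "  'connector' = 'hudi',"
  + "  'path' = '/tmp/hudi_table',"
  + "  'read.streaming.enabled' = 'true',"
  + "  'read.streaming.skip_compaction' = 'true',"  // do not re-read compacted slices
  + "  'read.streaming.skip_clustering' = 'true'"   // do not re-read replaced slices
  + ")");
{code}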





[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates

2023-06-05 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Description: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering, so that 
streaming reading may read T-1 day data when clustering the data of T-1 day to 
cause duplicated data. Therefore streaming read should skip clustering instants 
for all cases to avoid reading the replaced file slices. The same to 
`read.streaming.skip_compaction`.  (was: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering, so that 
streaming reading may read T-1 day data when clustering the data of T-1 day to 
cause duplicated data. Therefore streaming read should skip clustering instants 
for all cases to avoid reading the replaced file slices.)

> Streaming read should skip compaction and clustering instants to avoid 
> duplicates
> -
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices. The same applies to `read.streaming.skip_compaction`.





[jira] [Updated] (HUDI-6317) Streaming read should skip compaction and clustering instants to avoid duplicates

2023-06-05 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Summary: Streaming read should skip compaction and clustering instants to 
avoid duplicates  (was: Streaming read should skip clustering instants to avoid 
duplicated reading)

> Streaming read should skip compaction and clustering instants to avoid 
> duplicates
> -
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid duplicated reading

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Status: In Progress  (was: Open)

> Streaming read should skip clustering instants to avoid duplicated reading
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid duplicated reading

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Summary: Streaming read should skip clustering instants to avoid duplicated 
reading  (was: Streaming read should skip clustering instants to avoid 
deplicated reading)

> Streaming read should skip clustering instants to avoid duplicated reading
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants to avoid deplicated reading

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Summary: Streaming read should skip clustering instants to avoid deplicated 
reading  (was: Streaming read should skip clustering instants)

> Streaming read should skip clustering instants to avoid deplicated reading
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Description: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering, so that 
streaming reading may read T-1 day data when clustering the data of T-1 day to 
cause duplicated data. Therefore streaming read should skip clustering instants 
for all cases to avoid reading the replaced file slices.  (was: At present, the 
default value of read.streaming.skip_clustering is false, which could cause the 
situation that streaming reading reads the replaced file slices of clustering 
so that streaming reading may read T-1 day data when clustering the data of T-1 
day. Therefore streaming read should skip clustering instants for all cases to 
avoid reading the replaced file slices.)

> Streaming read should skip clustering instants
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may emit that 
> day's data again, producing duplicated data. Therefore streaming read should 
> skip clustering instants in all cases to avoid reading the replaced file 
> slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Description: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering so that 
streaming reading may read T-1 day data when clustering the data of T-1 day. 
Therefore   (was: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering so that 
streaming reading may read T-1 day data when clustering the data of T-1 day. 
Therefore read.streaming.skip_clustering should be true.)

> Streaming read should skip clustering instants
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, 
> which could cause the situation that streaming reading reads the replaced 
> file slices of clustering so that streaming reading may read T-1 day data 
> when clustering the data of T-1 day. Therefore 





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Description: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering so that 
streaming reading may read T-1 day data when clustering the data of T-1 day. 
Therefore streaming read should skip clustering instants for all cases to avoid 
reading the replaced file slices.  (was: At present, the default value of 
read.streaming.skip_clustering is false, which could cause the situation that 
streaming reading reads the replaced file slices of clustering so that 
streaming reading may read T-1 day data when clustering the data of T-1 day. 
Therefore )

> Streaming read should skip clustering instants
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may read that 
> day's data again. Therefore streaming read should skip clustering instants 
> in all cases to avoid reading the replaced file slices.





[jira] [Updated] (HUDI-6317) Streaming read should skip clustering instants

2023-06-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6317:
-
Summary: Streaming read should skip clustering instants  (was: The default 
value of read.streaming.skip_clustering should be true)

> Streaming read should skip clustering instants
> --
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> At present, the default value of read.streaming.skip_clustering is false, so 
> a streaming read can pick up the file slices that clustering has replaced: 
> when the data of day T-1 is clustered, the streaming read may read that 
> day's data again. Therefore read.streaming.skip_clustering should default to 
> true.





[jira] [Created] (HUDI-6317) The default value of read.streaming.skip_clustering should be true

2023-06-04 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6317:


 Summary: The default value of read.streaming.skip_clustering 
should be true
 Key: HUDI-6317
 URL: https://issues.apache.org/jira/browse/HUDI-6317
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


At present, the default value of read.streaming.skip_clustering is false, so a 
streaming read can pick up the file slices that clustering has replaced: when 
the data of day T-1 is clustered, the streaming read may read that day's data 
again. Therefore read.streaming.skip_clustering should default to true.





[jira] [Closed] (HUDI-6292) HoodieRealtimeRecordReader#constructRecordReader leads memory leak

2023-05-31 Thread Nicholas Jiang (Jira)



Nicholas Jiang closed HUDI-6292.

Resolution: Duplicate

> HoodieRealtimeRecordReader#constructRecordReader leads memory leak
> --
>
> Key: HUDI-6292
> URL: https://issues.apache.org/jira/browse/HUDI-6292
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Affects Versions: 0.14.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> An exception thrown while HoodieRealtimeRecordReader constructs the record 
> reader from the job configuration leads to a memory leak.





[jira] [Updated] (HUDI-6292) HoodieRealtimeRecordReader#constructRecordReader leads memory leak

2023-05-31 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6292:
-
Summary: HoodieRealtimeRecordReader#constructRecordReader leads memory leak 
 (was: HoodieRealtimeRecordReader leads memory leak)

> HoodieRealtimeRecordReader#constructRecordReader leads memory leak
> --
>
> Key: HUDI-6292
> URL: https://issues.apache.org/jira/browse/HUDI-6292
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Affects Versions: 0.14.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> An exception thrown while HoodieRealtimeRecordReader constructs the record 
> reader from the job configuration leads to a memory leak.





[jira] [Created] (HUDI-6292) HoodieRealtimeRecordReader leads memory leak

2023-05-31 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6292:


 Summary: HoodieRealtimeRecordReader leads memory leak
 Key: HUDI-6292
 URL: https://issues.apache.org/jira/browse/HUDI-6292
 Project: Apache Hudi
  Issue Type: Bug
  Components: reader-core
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang


An exception thrown while HoodieRealtimeRecordReader constructs the record 
reader from the job configuration leads to a memory leak.
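
A hedged sketch of a leak-safe construction pattern (the helper names are 
illustrative, not the actual Hudi internals):

{code:java}
// If building the merged realtime reader throws, close the already-opened
// base reader instead of leaking its buffers and file handles.
public static RecordReader constructRecordReader(JobConf jobConf) throws IOException {
  RecordReader baseReader = openBaseFileReader(jobConf);  // assumed helper
  try {
    return mergeWithLogFiles(baseReader, jobConf);        // assumed helper
  } catch (Exception e) {
    baseReader.close();                                   // release resources on failure
    throw new IOException("Failed to construct realtime record reader", e);
  }
}
{code}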





[jira] [Updated] (HUDI-6281) Comprehensive schema evolution supports column change with a default value

2023-05-29 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6281:
-
Status: In Progress  (was: Open)

> Comprehensive schema evolution supports column change with a default value
> --
>
> Key: HUDI-6281
> URL: https://issues.apache.org/jira/browse/HUDI-6281
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: core
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> Comprehensive schema evolution should support column changes with a default 
> value, e.g. adding a column with a default value.





[jira] [Created] (HUDI-6281) Comprehensive schema evolution supports column change with a default value

2023-05-29 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6281:


 Summary: Comprehensive schema evolution supports column change 
with a default value
 Key: HUDI-6281
 URL: https://issues.apache.org/jira/browse/HUDI-6281
 Project: Apache Hudi
  Issue Type: New Feature
  Components: core
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


Comprehensive schema evolution should support column changes with a default 
value, e.g. adding a column with a default value.
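
At the Avro schema level, adding a column with a default looks like the 
following hedged sketch (record and field names are made up); the default is 
what lets files written before the change be read back with the new column 
populated:

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

Schema evolved = SchemaBuilder.record("hudi_record").fields()
    .requiredLong("id")
    .requiredString("name")
    // New column with a default value for records written before the change.
    .name("status").type().stringType().stringDefault("UNKNOWN")
    .endRecord();
{code}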





[jira] [Created] (HUDI-6229) HoodieInternalWriteStatus marks failure with totalErrorRecords increment

2023-05-17 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6229:


 Summary: HoodieInternalWriteStatus marks failure with 
totalErrorRecords increment
 Key: HUDI-6229
 URL: https://issues.apache.org/jira/browse/HUDI-6229
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 0.14.0
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang


HoodieInternalWriteStatus should increment totalErrorRecords when marking a 
failure. Otherwise BulkInsertWriterHelper#toWriteStatus cannot obtain the 
correct value of totalErrorRecords, so ClusteringCommitSink cannot roll back 
clustering when a ClusteringCommitEvent has errors.
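
A hedged sketch of the expected bookkeeping (the field names are assumptions):

{code:java}
// Marking a record as failed must also bump totalErrorRecords; otherwise
// toWriteStatus() reports zero errors and the commit path cannot tell that
// the clustering event actually failed.
public void markFailure(String recordKey, Throwable t) {
  failedRecordKeys.add(recordKey);  // assumed field
  globalError = t;                  // assumed field
  totalErrorRecords++;              // the missing increment this issue adds
}
{code}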





[jira] [Created] (HUDI-6218) Support instant_time/commit_time of savepoint procedure optional

2023-05-15 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6218:


 Summary: Support instant_time/commit_time of savepoint procedure 
optional
 Key: HUDI-6218
 URL: https://issues.apache.org/jira/browse/HUDI-6218
 Project: Apache Hudi
  Issue Type: Improvement
  Components: spark
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


The instant_time/commit_time parameter of the savepoint procedures could be 
made optional, using the latest instant when the value of 
instant_time/commit_time is null or empty.
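
A hedged usage example (Spark SQL submitted from Java); the procedure and 
parameter names follow Hudi's Spark procedures but should be verified against 
the target version:

{code:java}
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();
// With the parameter optional, omitting commit_time savepoints the latest instant.
spark.sql("CALL create_savepoint(table => 'hudi_table')");
// Passing it explicitly still savepoints a specific instant.
spark.sql("CALL create_savepoint(table => 'hudi_table', commit_time => '20230515120000000')");
{code}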





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode

2023-05-10 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports 
streaming mode in service mode  (was: HoodieFlinkCompactor and 
HoodieFlinkClustering supports streaming mode in service mode)

> HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in 
> service mode
> -
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClustering should support streaming mode 
> in the service mode of Flink offline compaction and clustering.





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode

2023-05-10 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Description: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports 
streaming mode in service mode of Flink offline compaction and clustering.  
(was: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in 
service mode of Flink offline compaction and clustering.)

> HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in 
> service mode
> -
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClusteringJob should support streaming 
> mode in the service mode of Flink offline compaction and clustering.





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode

2023-05-09 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Description: HoodieFlinkCompactor and HoodieFlinkClustering supports 
streaming mode in service mode of Flink offline compaction and clustering.  
(was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode 
in service mode of Flink offline compaction and clustering.)

> HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in 
> service mode
> --
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClustering should support streaming mode 
> in the service mode of Flink offline compaction and clustering.





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in service mode

2023-05-09 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Summary: HoodieFlinkCompactor and HoodieFlinkClustering supports streaming 
mode in service mode  (was: HoodieFlinkCompactor and HoodieFlinkClusteringJob 
supports streaming mode in service mode)

> HoodieFlinkCompactor and HoodieFlinkClustering supports streaming mode in 
> service mode
> --
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClusteringJob should support streaming 
> mode in the service mode of Flink offline compaction and clustering.





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode

2023-05-09 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Description: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports 
streaming mode in service mode of Flink offline compaction and clustering.  
(was: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode 
for service mode of Flink offline compaction and clustering.)

> HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in 
> service mode
> -
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClusteringJob should support streaming 
> mode in the service mode of Flink offline compaction and clustering.





[jira] [Updated] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in service mode

2023-05-09 Thread Nicholas Jiang (Jira)



Nicholas Jiang updated HUDI-6192:
-
Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob supports 
streaming mode in service mode  (was: HoodieFlinkCompactor and 
HoodieFlinkClusteringJob supports streaming mode)

> HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode in 
> service mode
> -
>
> Key: HUDI-6192
> URL: https://issues.apache.org/jira/browse/HUDI-6192
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> HoodieFlinkCompactor and HoodieFlinkClusteringJob should support streaming 
> mode for the service mode of Flink offline compaction and clustering.





[jira] [Created] (HUDI-6192) HoodieFlinkCompactor and HoodieFlinkClusteringJob supports streaming mode

2023-05-08 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6192:


 Summary: HoodieFlinkCompactor and HoodieFlinkClusteringJob 
supports streaming mode
 Key: HUDI-6192
 URL: https://issues.apache.org/jira/browse/HUDI-6192
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


HoodieFlinkCompactor and HoodieFlinkClusteringJob should support streaming 
mode for the service mode of Flink offline compaction and clustering.
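
A hedged sketch of what service mode means for these jobs (the helper and 
field names are illustrative):

{code:java}
// Instead of scheduling/executing one compaction (or clustering) round and
// exiting, the job stays alive and keeps polling the timeline for new work.
while (serviceMode) {
  boolean didWork = scheduleAndExecuteOneRound();  // assumed helper
  if (!didWork) {
    Thread.sleep(checkIntervalMs);                 // nothing pending; wait and re-check
  }
}
{code}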





[jira] [Created] (HUDI-6158) Strengthen Flink clustering commit and rollback strategy

2023-04-30 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6158:


 Summary: Strengthen Flink clustering commit and rollback strategy
 Key: HUDI-6158
 URL: https://issues.apache.org/jira/browse/HUDI-6158
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.14.0


`ClusteringCommitSink` could strengthen the commit and rollback strategy in 
two ways (see the sketch below):
 * Commit: Introduce `clusteringPlanCache`, a cache that stores the clustering 
plan for each instant, i.e. the mapping of instant_time -> clusteringPlan.
 * Rollback: Update `commitBuffer` to store the mapping of instant_time -> 
file_ids -> event. A map is used to collect the events because rolling back 
intermediate clustering tasks generates corrupt events.
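
A hedged sketch of the two structures (the value types are the existing 
Hudi/Flink classes; the shape is illustrative):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// instant_time -> clustering plan, so the plan is read once per instant
// instead of being re-fetched for every incoming event.
Map<String, ClusteringPlan> clusteringPlanCache = new ConcurrentHashMap<>();

// instant_time -> file_id -> event: keying by file id de-duplicates the
// corrupt events emitted when intermediate clustering tasks roll back.
Map<String, Map<String, ClusteringCommitEvent>> commitBuffer = new ConcurrentHashMap<>();
{code}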





[jira] [Assigned] (HUDI-6135) FlinkClusteringConfig adds --sort-memory option to support write.sort.memory config

2023-04-24 Thread Nicholas Jiang (Jira)



Nicholas Jiang reassigned HUDI-6135:


Assignee: Nicholas Jiang

> FlinkClusteringConfig adds --sort-memory option to support write.sort.memory 
> config
> ---
>
> Key: HUDI-6135
> URL: https://issues.apache.org/jira/browse/HUDI-6135
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> FlinkClusteringConfig should add a --sort-memory option to support the 
> write.sort.memory config; otherwise FlinkClusteringJob cannot configure the 
> sort memory.





[jira] [Created] (HUDI-6135) FlinkClusteringConfig adds --sort-memory option to support write.sort.memory config

2023-04-24 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6135:


 Summary: FlinkClusteringConfig adds --sort-memory option to 
support write.sort.memory config
 Key: HUDI-6135
 URL: https://issues.apache.org/jira/browse/HUDI-6135
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.14.0


FlinkClusteringConfig should add a --sort-memory option to support the 
write.sort.memory config; otherwise FlinkClusteringJob cannot configure the 
sort memory.
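
A hedged sketch of the new option in JCommander style, matching how the 
existing Flink config classes declare parameters (the description text and the 
128 default are illustrative, mirroring the write.sort.memory default):

{code:java}
import com.beust.jcommander.Parameter;

@Parameter(names = {"--sort-memory"},
    description = "Sort memory in MB for clustering, maps to write.sort.memory, default 128")
public Integer sortMemory = 128;
{code}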





[jira] [Assigned] (HUDI-6066) HoodieTableSource supports parquet predicate push down

2023-04-12 Thread Nicholas Jiang (Jira)



Nicholas Jiang reassigned HUDI-6066:


Assignee: Nicholas Jiang

> HoodieTableSource supports parquet predicate push down
> --
>
> Key: HUDI-6066
> URL: https://issues.apache.org/jira/browse/HUDI-6066
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> HoodieTableSource implements the SupportsFilterPushDown interface, which 
> pushes filters down into the FileIndex. HoodieTableSource should also push 
> the predicates down to the parquet reader for query performance.





[jira] [Created] (HUDI-6066) HoodieTableSource supports parquet predicate push down

2023-04-12 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-6066:


 Summary: HoodieTableSource supports parquet predicate push down
 Key: HUDI-6066
 URL: https://issues.apache.org/jira/browse/HUDI-6066
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang


HoodieTableSource implements the SupportsFilterPushDown interface, which 
pushes filters down into the FileIndex. HoodieTableSource should also push the 
predicates down to the parquet reader for query performance.
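
A hedged, simplified sketch of the push-down hook; SupportsFilterPushDown and 
Result.of(...) are the real Flink interfaces, while the field handling is 
illustrative:

{code:java}
@Override
public Result applyFilters(List<ResolvedExpression> filters) {
  // Remember the filters so they can be converted to parquet predicates for
  // the FileIndex/reader, and report them back as remaining so Flink still
  // re-applies them after the (best-effort) push-down.
  this.filters = new ArrayList<>(filters);
  return Result.of(new ArrayList<>(filters), new ArrayList<>(filters));
}
{code}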





[jira] [Reopened] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)



Nicholas Jiang reopened HUDI-5728:
--

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is 
> a commit in the active timeline, so that 
> {{HoodieFileGroup#isFileSliceCommitted(slice)}} can check, after archiving, 
> whether a file slice generated by pending clustering is committed. Therefore 
> HoodieTimelineArchiver should archive only up to the latest instant before 
> an inflight replacecommit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang resolved HUDI-5728.
--

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore, 
> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reopened HUDI-5772:
--

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', and 
> 'clustering.plan.strategy.partition.selected' options which do not align with 
> the clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align the Flink 
> clustering configuration with HoodieClusteringConfig.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang resolved HUDI-5772.
--

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', and 
> 'clustering.plan.strategy.partition.selected' options which do not align with 
> the clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align the Flink 
> clustering configuration with HoodieClusteringConfig.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5531.

Resolution: Won't Fix

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to 
> RECENT_PARTITIONS
> 
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.1
>
>
> The javadoc of `ClusteringPlanPartitionFilter` states that RECENT_DAYS 
> outputs recent partitions given the skip-num and days-lookback configs. The 
> RECENT_DAYS name therefore doesn't match the semantics: it implies that Hudi 
> tables are partitioned by day, yet tables partitioned by hour can also use 
> this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
> should be renamed to RECENT_PARTITIONS so the name matches the semantics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-2503) HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-2503.

Resolution: Fixed

> HoodieFlinkWriteClient supports to allow parallel writing to tables using 
> Locking service
> -
>
> Key: HUDI-2503
> URL: https://issues.apache.org/jira/browse/HUDI-2503
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The strategy interface for conflict resolution with multiple writers has been 
> introduced, and SparkRDDWriteClient is already integrated with 
> ConflictResolutionStrategy. HoodieFlinkWriteClient should also allow parallel 
> writing to tables using the locking service, based on 
> ConflictResolutionStrategy.
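
For reference, a minimal sketch of the write configs that optimistic 
concurrency control relies on; the keys are standard Hudi configs, while the 
ZooKeeper lock provider is only an illustrative choice:

{code:java}
import java.util.Properties;

public final class OccConfigSketch {
  static Properties occProps() {
    Properties props = new Properties();
    props.put("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
    // Lazy failed-write cleaning is required for multi-writer setups.
    props.put("hoodie.cleaner.policy.failed.writes", "LAZY");
    props.put("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
    return props;
  }
}
{code}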



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5772.

Resolution: Fixed

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', and 
> 'clustering.plan.strategy.partition.selected' options which do not align with 
> the clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align the Flink 
> clustering configuration with HoodieClusteringConfig.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5728.

Resolution: Fixed

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore, 
> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false

2023-02-14 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5787:
-
Description: HMSDDLExecutor should set the table type of the Hive table to 
EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of the 
sync config is set to false.  (was: HoodieHiveCatalog should not delete data 
when dropping the Hive external table, for example, the value of the 
'hoodie.datasource.hive_sync.create_managed_table' config is false.)

> HMSDDLExecutor should set table type to EXTERNAL_TABLE when 
> hoodie.datasource.hive_sync.create_managed_table of sync config is false
> 
>
> Key: HUDI-5787
> URL: https://issues.apache.org/jira/browse/HUDI-5787
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> HMSDDLExecutor should set the table type of the Hive table to EXTERNAL_TABLE 
> when hoodie.datasource.hive_sync.create_managed_table of the sync config is 
> set to false.
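
A minimal sketch of the intended table-type handling, assuming the Hive 
metastore Table API; the helper name is illustrative and the surrounding 
HMSDDLExecutor wiring is omitted:

{code:java}
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.api.Table;

public final class ExternalTableSketch {
  static void applyTableType(Table table, boolean createManagedTable) {
    if (createManagedTable) {
      table.setTableType(TableType.MANAGED_TABLE.toString());
    } else {
      table.setTableType(TableType.EXTERNAL_TABLE.toString());
      // Hive treats a table as external only when this property is also set.
      table.putToParameters("EXTERNAL", "TRUE");
    }
  }
}
{code}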



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false

2023-02-14 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5787:
-
Summary: HMSDDLExecutor should set table type to EXTERNAL_TABLE when 
hoodie.datasource.hive_sync.create_managed_table of sync config is false  (was: 
HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting 
hoodie.datasource.hive_sync.create_managed_table to false)

> HMSDDLExecutor should set table type to EXTERNAL_TABLE when 
> hoodie.datasource.hive_sync.create_managed_table of sync config is false
> 
>
> Key: HUDI-5787
> URL: https://issues.apache.org/jira/browse/HUDI-5787
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> HoodieHiveCatalog should not delete data when dropping the Hive external 
> table, for example, when the value of the 
> 'hoodie.datasource.hive_sync.create_managed_table' config is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting hoodie.datasource.hive_sync.create_managed_table to false

2023-02-14 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5787:
-
Summary: HMSDDLExecutor should set table type to EXTERNAL_TABLE when 
setting hoodie.datasource.hive_sync.create_managed_table to false  (was: 
HoodieHiveCatalog should not delete data for dropping external table)

> HMSDDLExecutor should set table type to EXTERNAL_TABLE when setting 
> hoodie.datasource.hive_sync.create_managed_table to false
> -
>
> Key: HUDI-5787
> URL: https://issues.apache.org/jira/browse/HUDI-5787
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> HoodieHiveCatalog should not delete data when dropping the Hive external 
> table, for example, when the value of the 
> 'hoodie.datasource.hive_sync.create_managed_table' config is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table

2023-02-13 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5787:


Assignee: Nicholas Jiang

> HoodieHiveCatalog should not delete data for dropping external table
> 
>
> Key: HUDI-5787
> URL: https://issues.apache.org/jira/browse/HUDI-5787
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.13.1
>
>
> HoodieHiveCatalog should not delete data when dropping the Hive external 
> table, for example, when the value of the 
> 'hoodie.datasource.hive_sync.create_managed_table' config is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table

2023-02-13 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5787:


 Summary: HoodieHiveCatalog should not delete data for dropping 
external table
 Key: HUDI-5787
 URL: https://issues.apache.org/jira/browse/HUDI-5787
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.13.1


HoodieHiveCatalog should not delete data when dropping the Hive external table, 
for example, when the value of the 
'hoodie.datasource.hive_sync.create_managed_table' config is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-1741) Row Level TTL Support for records stored in Hudi

2023-02-13 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-1741:


Assignee: Nicholas Jiang

> Row Level TTL Support for records stored in Hudi
> 
>
> Key: HUDI-1741
> URL: https://issues.apache.org/jira/browse/HUDI-1741
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Balaji Varadarajan
>Assignee: Nicholas Jiang
>Priority: Major
>
> For e.g., keep only records that were updated within the last month 
>  
> GH: https://github.com/apache/hudi/issues/2743



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-02-12 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5772:


 Summary: Align Flink clustering configuration with 
HoodieClusteringConfig
 Key: HUDI-5772
 URL: https://issues.apache.org/jira/browse/HUDI-5772
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 0.13.1
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang


In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
'clustering.plan.strategy.cluster.begin.partition', 
'clustering.plan.strategy.cluster.end.partition', 
'clustering.plan.strategy.partition.regex.pattern', and 
'clustering.plan.strategy.partition.selected' options which do not align with 
the clustering configuration of HoodieClusteringConfig. FlinkOptions, 
FlinkClusteringConfig and FlinkStreamerConfig should align the Flink clustering 
configuration with HoodieClusteringConfig.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-02-07 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5728:
-
Summary: HoodieTimelineArchiver archives the latest instant before inflight 
replacecommit  (was: HoodieTimelineArchiver archive the latest instant before 
inflight replacecommit)

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore, 
> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5728) HoodieTimelineArchiver archive the latest instant before inflight replacecommit

2023-02-07 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5728:


 Summary: HoodieTimelineArchiver archive the latest instant before 
inflight replacecommit
 Key: HUDI-5728
 URL: https://issues.apache.org/jira/browse/HUDI-5728
 Project: Apache Hudi
  Issue Type: Bug
  Components: table-service
Reporter: Nicholas Jiang
 Fix For: 0.14.0


When inline or async clustering is enabled, we need to ensure that there is a 
commit in the active timeline to check whether the file slice generated in 
pending clustering after archive isn't committed via 
{{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore, 
HoodieTimelineArchiver archives the latest instant before inflight replacecommit.
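
A minimal sketch of the archival boundary this implies; the helper is an 
assumption for illustration, not the actual archiver code, relying only on the 
lexicographic ordering of instant times:

{code:java}
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.util.Option;

public final class ArchiveBoundarySketch {
  // Archival must stop strictly before the oldest inflight replacecommit so a
  // completed commit stays on the active timeline for isFileSliceCommitted.
  static boolean canArchive(HoodieInstant candidate,
      Option<HoodieInstant> oldestPendingReplace) {
    return !oldestPendingReplace.isPresent()
        || candidate.getTimestamp()
            .compareTo(oldestPendingReplace.get().getTimestamp()) < 0;
  }
}
{code}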



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5728) HoodieTimelineArchiver archive the latest instant before inflight replacecommit

2023-02-07 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5728:


Assignee: Nicholas Jiang

> HoodieTimelineArchiver archive the latest instant before inflight 
> replacecommit
> ---
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore, 
> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5663) The pending table service operation should check whether the partition corresponding to the filegroup exists

2023-01-31 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5663:


 Summary: The pending table service operation should check whether 
the partition corresponding to the filegroup exists
 Key: HUDI-5663
 URL: https://issues.apache.org/jira/browse/HUDI-5663
 Project: Apache Hudi
  Issue Type: Improvement
  Components: table-service
Reporter: Nicholas Jiang
 Fix For: 0.13.1


At present, DeletePartitionCommitActionExecutor prevents the partition from 
being dropped when there is a pending table service operation. The pending table 
service should check whether the partition corresponding to the file group 
exists, rather than block the execution of the DDL operation.
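
A minimal sketch of one possible existence check, probing the partition path on 
the file system; a metadata-table lookup would be an alternative, and the helper 
name is illustrative:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class PartitionCheckSketch {
  // Returns true when the partition path still exists under the base path.
  static boolean partitionExists(FileSystem fs, String basePath,
      String partitionPath) throws IOException {
    return fs.exists(new Path(basePath, partitionPath));
  }
}
{code}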



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5553) ALTER TABLE DROP PARTITION DDL may cause data inconsistencies when table service actions are performed

2023-01-31 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5553.

Resolution: Fixed

> ALTER TABLE DROP PARTITION DDL may cause data inconsistencies when table 
> service actions are performed
> --
>
> Key: HUDI-5553
> URL: https://issues.apache.org/jira/browse/HUDI-5553
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Assignee: voon
>Priority: Major
>  Labels: pull-request-available
>
> Issue described in detail here:
> https://github.com/apache/hudi/issues/7663



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5558) Serializable interface implementation don't explicitly declare serialVersionUID

2023-01-15 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5558:


 Summary: Serializable interface implementation don't explicitly 
declare serialVersionUID
 Key: HUDI-5558
 URL: https://issues.apache.org/jira/browse/HUDI-5558
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
Assignee: Nicholas Jiang
 Fix For: 0.13.0


Serializable interface implementations don't explicitly declare 
serialVersionUID, which causes InvalidClassException during deserialization. 
Every Serializable implementation, including subclasses, should explicitly 
declare serialVersionUID.
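
A minimal sketch of the convention; the class name is illustrative, the point 
being the explicit serialVersionUID so compatible class changes don't break 
deserialization:

{code:java}
import java.io.Serializable;

public class ExampleWriteFunctionState implements Serializable {
  private static final long serialVersionUID = 1L;

  private String instantTime;
}
{code}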



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5543) Description of clustering.plan.partition.filter.mode supports DAY_ROLLING strategy

2023-01-12 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5543:


Assignee: Nicholas Jiang

> Description of clustering.plan.partition.filter.mode supports DAY_ROLLING 
> strategy
> --
>
> Key: HUDI-5543
> URL: https://issues.apache.org/jira/browse/HUDI-5543
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> The description of clustering.plan.partition.filter.mode doesn't mention the 
> DAY_ROLLING strategy, which is already supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5543) Description of clustering.plan.partition.filter.mode supports DAY_ROLLING strategy

2023-01-12 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5543:


 Summary: Description of clustering.plan.partition.filter.mode 
supports DAY_ROLLING strategy
 Key: HUDI-5543
 URL: https://issues.apache.org/jira/browse/HUDI-5543
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.13.0


The description of clustering.plan.partition.filter.mode doesn't mention the 
DAY_ROLLING strategy, which is already supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-01-12 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675992#comment-17675992
 ] 

Nicholas Jiang commented on HUDI-5531:
--

[~yihua], [~xleesf], WDYT?

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to 
> RECENT_PARTITIONS
> 
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> The javadoc of `ClusteringPlanPartitionFilter` states that RECENT_DAYS 
> outputs recent partitions given the skip-num and days-lookback configs. The 
> RECENT_DAYS name therefore doesn't match the semantics: it implies that Hudi 
> tables are partitioned by day, yet tables partitioned by hour can also use 
> this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
> should be renamed to RECENT_PARTITIONS so the name matches the semantics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-01-11 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5531:
-
Issue Type: Improvement  (was: Task)

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to 
> RECENT_PARTITIONS
> 
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> The javadoc of `ClusteringPlanPartitionFilter` states that RECENT_DAYS 
> outputs recent partitions given the skip-num and days-lookback configs. The 
> RECENT_DAYS name therefore doesn't match the semantics: it implies that Hudi 
> tables are partitioned by day, yet tables partitioned by hour can also use 
> this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
> should be renamed to RECENT_PARTITIONS so the name matches the semantics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-01-11 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5531:


 Summary: RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
should rename to RECENT_PARTITIONS
 Key: HUDI-5531
 URL: https://issues.apache.org/jira/browse/HUDI-5531
 Project: Apache Hudi
  Issue Type: Task
Reporter: Nicholas Jiang
 Fix For: 0.13.0


The javadoc of `ClusteringPlanPartitionFilter` states that RECENT_DAYS outputs 
recent partitions given the skip-num and days-lookback configs. The RECENT_DAYS 
name therefore doesn't match the semantics: it implies that Hudi tables are 
partitioned by day, yet tables partitioned by hour can also use this strategy. 
The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should be renamed 
to RECENT_PARTITIONS so the name matches the semantics.
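
A minimal sketch of the rename; the surrounding enum values reflect the modes 
mentioned in these issues, and dropping the old name without an alias is just 
one possible compatibility choice:

{code:java}
public enum ClusteringPlanPartitionFilterMode {
  NONE,
  RECENT_PARTITIONS,   // formerly RECENT_DAYS
  SELECTED_PARTITIONS,
  DAY_ROLLING
}
{code}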



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-01-11 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5531:


Assignee: Nicholas Jiang

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to 
> RECENT_PARTITIONS
> 
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> The javadoc of `ClusteringPlanPartitionFilter` states that RECENT_DAYS 
> outputs recent partitions given the skip-num and days-lookback configs. The 
> RECENT_DAYS name therefore doesn't match the semantics: it implies that Hudi 
> tables are partitioned by day, yet tables partitioned by hour can also use 
> this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
> should be renamed to RECENT_PARTITIONS so the name matches the semantics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5506) StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata event

2023-01-04 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5506:


 Summary: StreamWriteOperatorCoordinator may not recommit with 
partial uncommitted write metadata event
 Key: HUDI-5506
 URL: https://issues.apache.org/jira/browse/HUDI-5506
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 0.12.2
Reporter: Nicholas Jiang


AbstractStreamWriteFunction may get different pending instants for the 
checkpoint among the subtasks, because the StreamWriteOperatorCoordinator may be 
committing the instant of the last completed checkpoint when 
AbstractStreamWriteFunction invokes snapshotState. 
StreamWriteOperatorCoordinator may not recommit the partially uncommitted write 
metadata events when handling the last bootstrap event, which is an empty 
bootstrap event.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5506) StreamWriteOperatorCoordinator may not recommit with partial uncommitted write metadata event

2023-01-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5506:


Assignee: Nicholas Jiang

> StreamWriteOperatorCoordinator may not recommit with partial uncommitted 
> write metadata event
> -
>
> Key: HUDI-5506
> URL: https://issues.apache.org/jira/browse/HUDI-5506
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.12.2
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> AbstractStreamWriteFunction may get different pending instants for the 
> checkpoint among the subtasks, because the StreamWriteOperatorCoordinator may 
> be committing the instant of the last completed checkpoint when 
> AbstractStreamWriteFunction invokes snapshotState. 
> StreamWriteOperatorCoordinator may not recommit the partially uncommitted 
> write metadata events when handling the last bootstrap event, which is an 
> empty bootstrap event.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode

2022-12-07 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5343:
-
Description: HoodieFlinkStreamer supports async clustering for append mode, 
which keeps it consistent with the pipeline of HoodieTableSink.  (was: 
HoodieFlinkStreamer supports async clustering for append mode, which keep the 
consistent with the pipeline of HoodieTableSink.)

> HoodieFlinkStreamer supports async clustering for append mode
> -
>
> Key: HUDI-5343
> URL: https://issues.apache.org/jira/browse/HUDI-5343
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.12.2
>
>
> HoodieFlinkStreamer supports async clustering for append mode, which keeps 
> it consistent with the pipeline of HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode

2022-12-07 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5343:


Assignee: Nicholas Jiang

> HoodieFlinkStreamer supports async clustering for append mode
> -
>
> Key: HUDI-5343
> URL: https://issues.apache.org/jira/browse/HUDI-5343
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.12.2
>
>
> HoodieFlinkStreamer supports async clustering for append mode, which keeps 
> it consistent with the pipeline of HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5343) HoodieFlinkStreamer supports async clustering for append mode

2022-12-07 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5343:


 Summary: HoodieFlinkStreamer supports async clustering for append 
mode
 Key: HUDI-5343
 URL: https://issues.apache.org/jira/browse/HUDI-5343
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Nicholas Jiang
 Fix For: 0.12.2


HoodieFlinkStreamer supports async clustering for append mode, which keeps it 
consistent with the pipeline of HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5252) ClusteringCommitSink supports to rollback clustering

2022-11-21 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5252:


 Summary: ClusteringCommitSink supports to rollback clustering
 Key: HUDI-5252
 URL: https://issues.apache.org/jira/browse/HUDI-5252
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.13.0


When the commit buffer contains a failed ClusteringCommitEvent, 
ClusteringCommitSink invokes CompactionUtil#rollbackCompaction to roll back the 
clustering. ClusteringCommitSink should instead call 
ClusteringUtil#rollbackClustering to roll back the clustering.
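
A sketch of the intended commit-buffer handling; the event and util names come 
from this issue, but the method signatures and surrounding fields (table, 
writeClient, instant) are assumptions for illustration, not the actual Hudi API:

{code:java}
private void commitIfNecessary(String instant, List<ClusteringCommitEvent> events) {
  // Roll back with the clustering-specific util rather than the compaction one.
  if (events.stream().anyMatch(ClusteringCommitEvent::isFailed)) {
    ClusteringUtil.rollbackClustering(table, writeClient, instant);
    return;
  }
  // ... otherwise commit the replacecommit instant.
}
{code}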



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5252) ClusteringCommitSink supports to rollback clustering

2022-11-21 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5252:


Assignee: Nicholas Jiang

> ClusteringCommitSink supports to rollback clustering
> 
>
> Key: HUDI-5252
> URL: https://issues.apache.org/jira/browse/HUDI-5252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.13.0
>
>
> When the commit buffer contains a failed ClusteringCommitEvent, 
> ClusteringCommitSink invokes CompactionUtil#rollbackCompaction to roll back 
> the clustering. ClusteringCommitSink should instead call 
> ClusteringUtil#rollbackClustering to roll back the clustering.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5206) RowColumnReader should not return null value for certain null child columns

2022-11-13 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5206:


Assignee: Nicholas Jiang

> RowColumnReader should not return null value for certain null child columns
> ---
>
> Key: HUDI-5206
> URL: https://issues.apache.org/jira/browse/HUDI-5206
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.13.0
>
>
> When reading into the vector for a row-type column that has some null child 
> columns, RowColumnReader should not return a null value, because the value of 
> the row-type column itself may not be null; doing so results in incorrect 
> values for the row-type column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5206) RowColumnReader should not return null value for certain null child columns

2022-11-13 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5206:


 Summary: RowColumnReader should not return null value for certain 
null child columns
 Key: HUDI-5206
 URL: https://issues.apache.org/jira/browse/HUDI-5206
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.13.0


When reading into the vector for a row-type column that has some null child 
columns, RowColumnReader should not return a null value, because the value of 
the row-type column itself may not be null; doing so results in incorrect 
values for the row-type column.
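
A self-contained illustration of the intended null semantics (these are not 
Hudi's actual reader classes): a row value is null only when the struct itself 
is null, not when some of its child fields are null:

{code:java}
public final class RowNullSemanticsSketch {
  static Object[] readRow(boolean structIsNull, Object[] childValues) {
    if (structIsNull) {
      return null;        // the whole row is null
    }
    return childValues;   // children may be null; the row is still non-null
  }
}
{code}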



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HUDI-1741) Row Level TTL Support for records stored in Hudi

2022-10-27 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625400#comment-17625400
 ] 

Nicholas Jiang edited comment on HUDI-1741 at 10/28/22 3:10 AM:


[~shivnarayan], IMO, each Hudi record carries the Hudi commit time. The 
solution is to first honor the TTL by not returning expired data at query time, 
or even pushing the filter down to the data source directly, and then to delete 
the expired data during operations such as clustering that need to rewrite the 
data. WDYT?

cc [~xleesf] 


was (Author: nicholasjiang):
[~shivnarayan], IMO, each record of hudi has the commit time of hudi. The 
solution is to first follow the TTL, do not display expired data when checking, 
or even push down to the data source directly, and then delete it when doing 
operations such as clustering that need to rewrite the data. WDYT?

> Row Level TTL Support for records stored in Hudi
> 
>
> Key: HUDI-1741
> URL: https://issues.apache.org/jira/browse/HUDI-1741
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Balaji Varadarajan
>Priority: Major
>
> For e.g., keep only records that were updated within the last month 
>  
> GH: https://github.com/apache/hudi/issues/2743



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-1741) Row Level TTL Support for records stored in Hudi

2022-10-27 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625400#comment-17625400
 ] 

Nicholas Jiang commented on HUDI-1741:
--

[~shivnarayan], IMO, each Hudi record carries the Hudi commit time. The 
solution is to first honor the TTL by not returning expired data at query time, 
or even pushing the filter down to the data source directly, and then to delete 
the expired data during operations such as clustering that need to rewrite the 
data. WDYT?

> Row Level TTL Support for records stored in Hudi
> 
>
> Key: HUDI-1741
> URL: https://issues.apache.org/jira/browse/HUDI-1741
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Balaji Varadarajan
>Priority: Major
>
> For e.g., keep only records that were updated within the last month 
>  
> GH: https://github.com/apache/hudi/issues/2743



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition

2022-10-18 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-5049:
-
Description: HoodieCatalog doesn't support the implementation of 
dropPartition at present, which is useful for the scenario where the current 
partition is backfilled.  (was: HoodieCatalog doesn't support the implementation of 
dropPartition at present, which is useful for the Hudi current partition 
backfill scenario.)

> HoodieCatalog supports the implementation of dropPartition
> --
>
> Key: HUDI-5049
> URL: https://issues.apache.org/jira/browse/HUDI-5049
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> HoodieCatalog doesn't support the implementation of dropPartition at present, 
> which is useful for the scenario where the current partition is backfilled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition

2022-10-18 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-5049:


Assignee: Nicholas Jiang

> HoodieCatalog supports the implementation of dropPartition
> --
>
> Key: HUDI-5049
> URL: https://issues.apache.org/jira/browse/HUDI-5049
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.0
>
>
> HoodieCatalog doesn't support the implementation of dropPartition at present, 
> which is useful for the Hudi current partition backfill scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5049) HoodieCatalog supports the implementation of dropPartition

2022-10-18 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5049:


 Summary: HoodieCatalog supports the implementation of dropPartition
 Key: HUDI-5049
 URL: https://issues.apache.org/jira/browse/HUDI-5049
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang
 Fix For: 0.13.0


HoodieCatalog doesn't support the implementation of dropPartition at present, 
which is useful for the Hudi current partition backfill scenario.
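
A minimal sketch of the Catalog method HoodieCatalog would implement, using 
Flink's public Catalog interface; the body describing the delete_partition 
write is an assumption about the eventual implementation:

{code:java}
import org.apache.flink.table.catalog.CatalogPartitionSpec;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.CatalogException;
import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;

@Override
public void dropPartition(ObjectPath tablePath, CatalogPartitionSpec partitionSpec,
    boolean ignoreIfNotExists) throws PartitionNotExistException, CatalogException {
  // Resolve the Hudi table behind tablePath and trigger a delete_partition
  // write for partitionSpec; throw PartitionNotExistException when the
  // partition is missing and ignoreIfNotExists is false.
}
{code}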



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-25 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-4914:
-
Status: In Progress  (was: Open)

> Managed memory weight should be set when sort clustering is enabled
> ---
>
> Key: HUDI-4914
> URL: https://issues.apache.org/jira/browse/HUDI-4914
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Managed memory weight should be set when sort clustering is enabled; 
> otherwise the fraction of memory to allocate is 0, which throws the following 
> exception when initializing the sorter:
> {code:java}
> java.lang.IllegalArgumentException: The fraction of memory to allocate should 
> not be 0. Please make sure that all types of managed memory consumers 
> contained in the job are configured with a non-negative weight via 
> `taskmanager.memory.managed.consumer-weights`.    at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
>     at 
> org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
>     at 
> org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
>     at 
> org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
>     at 
> org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$execut
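
A minimal sketch of declaring the weight through Flink's public Transformation 
API; how Hudi actually wires this into the clustering pipeline is not shown, 
and the weight value is illustrative:

{code:java}
import org.apache.flink.core.memory.ManagedMemoryUseCase;
import org.apache.flink.streaming.api.datastream.DataStream;

public final class SortMemorySketch {
  // Declares a non-zero managed-memory weight for the operator so the
  // computed fraction is no longer 0.
  static void declareSortMemory(DataStream<?> stream, int weightMb) {
    stream.getTransformation()
        .declareManagedMemoryUseCaseAtOperatorScope(ManagedMemoryUseCase.OPERATOR, weightMb);
  }
}
{code}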

[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-24 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-4914:
-
Affects Version/s: (was: 0.12.0)

> Managed memory weight should be set when sort clustering is enabled
> ---
>
> Key: HUDI-4914
> URL: https://issues.apache.org/jira/browse/HUDI-4914
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Critical
> Fix For: 0.12.1
>
>
> Managed memory weight should be set when sort clustering is enabled; 
> otherwise the fraction of memory to allocate is 0, which throws the following 
> exception when initializing the sorter:
> {code:java}
> java.lang.IllegalArgumentException: The fraction of memory to allocate should 
> not be 0. Please make sure that all types of managed memory consumers 
> contained in the job are configured with a non-negative weight via 
> `taskmanager.memory.managed.consumer-weights`.    at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
>     at 
> org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
>     at 
> org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
>     at 
> org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
>     at 
> org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
>   

[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-24 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-4914:
-
Priority: Critical  (was: Major)

> Managed memory weight should be set when sort clustering is enabled
> ---
>
> Key: HUDI-4914
> URL: https://issues.apache.org/jira/browse/HUDI-4914
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.12.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Critical
>
> Managed memory weight should be set when sort clustering is enabled; 
> otherwise the fraction of memory to allocate is 0, which throws the following 
> exception when initializing the sorter:
> {code:java}
> java.lang.IllegalArgumentException: The fraction of memory to allocate should 
> not be 0. Please make sure that all types of managed memory consumers 
> contained in the job are configured with a non-negative weight via 
> `taskmanager.memory.managed.consumer-weights`.    at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
>     at 
> org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
>     at 
> org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
>     at 
> org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
>     at 
> org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
>     at 

[jira] [Assigned] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-24 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-4914:


Assignee: Nicholas Jiang

> Managed memory weight should be set when sort clustering is enabled
> ---
>
> Key: HUDI-4914
> URL: https://issues.apache.org/jira/browse/HUDI-4914
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.12.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> Managed memory weight should be set when sort clustering is enabled; 
> otherwise the fraction of memory to allocate is 0, which throws the following 
> exception when initializing the sorter:
> {code:java}
> java.lang.IllegalArgumentException: The fraction of memory to allocate should 
> not be 0. Please make sure that all types of managed memory consumers 
> contained in the job are configured with a non-negative weight via 
> `taskmanager.memory.managed.consumer-weights`.    at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
>     at 
> org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
>     at 
> org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
>     at 
> org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
>     at 
> org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
>     at 
> or

[jira] [Updated] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-24 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang updated HUDI-4914:
-
Fix Version/s: 0.12.1

> Managed memory weight should be set when sort clustering is enabled
> ---
>
> Key: HUDI-4914
> URL: https://issues.apache.org/jira/browse/HUDI-4914
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.12.0
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Critical
> Fix For: 0.12.1
>
>
> Managed memory weight should be set when sort clustering is enabled; 
> otherwise the fraction of memory to allocate is 0, which throws the following 
> exception when initializing the sorter:
> {code:java}
> java.lang.IllegalArgumentException: The fraction of memory to allocate should 
> not be 0. Please make sure that all types of managed memory consumers 
> contained in the job are configured with a non-negative weight via 
> `taskmanager.memory.managed.consumer-weights`.    at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
>     at 
> org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
>     at 
> org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
>     at 
> org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
>     at 
> org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
>     at 
> org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
>     at 
> org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
>     at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask

[jira] [Created] (HUDI-4914) Managed memory weight should be set when sort clustering is enabled

2022-09-24 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-4914:


 Summary: Managed memory weight should be set when sort clustering 
is enabled
 Key: HUDI-4914
 URL: https://issues.apache.org/jira/browse/HUDI-4914
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 0.12.0
Reporter: Nicholas Jiang


Managed memory weight should be set when sort clustering is enabled; otherwise 
the fraction of memory to allocate is 0, which throws the following exception 
when initializing the sorter (see the sketch after the stack trace below):
{code:java}
java.lang.IllegalArgumentException: The fraction of memory to allocate should 
not be 0. Please make sure that all types of managed memory consumers contained 
in the job are configured with a non-negative weight via 
`taskmanager.memory.managed.consumer-weights`.    at 
org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
    at 
org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
    at 
org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
    at 
org.apache.flink.table.runtime.operators.TableStreamOperator.computeMemorySize(TableStreamOperator.java:66)
    at 
org.apache.hudi.sink.clustering.ClusteringOperator.initSorter(ClusteringOperator.java:351)
    at 
org.apache.hudi.sink.clustering.ClusteringOperator.open(ClusteringOperator.java:157)
    at 
org.apache.hudi.sink.utils.ClusteringFunctionWrapper.openFunction(ClusteringFunctionWrapper.java:93)
    at 
org.apache.hudi.sink.utils.InsertFunctionWrapper.openFunction(InsertFunctionWrapper.java:107)
    at 
org.apache.hudi.sink.utils.TestWriteBase$TestHarness.preparePipeline(TestWriteBase.java:140)
    at 
org.apache.hudi.sink.TestWriteCopyOnWrite.prepareInsertPipeline(TestWriteCopyOnWrite.java:458)
    at 
org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertAsyncClustering(TestWriteCopyOnWrite.java:298)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
    at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
    at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
    at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
    at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
    at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
    at 
org.junit.pla
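
A minimal sketch of the kind of fix this points at, assuming only Flink's 
public Transformation API (available since Flink 1.12); the helper class name 
and the choice of weight are illustrative, not the actual Hudi patch:

{code:java}
import org.apache.flink.core.memory.ManagedMemoryUseCase;
import org.apache.flink.streaming.api.datastream.DataStream;

public class ManagedMemoryWeights {
  // Declare a non-zero OPERATOR-scoped managed memory weight on the clustering
  // transformation so that MemoryManager.computeMemorySize receives a non-zero
  // fraction and the sorter can be initialized.
  public static <T> void declareSortMemory(DataStream<T> clusteringStream, int weight) {
    clusteringStream.getTransformation()
        .declareManagedMemoryUseCaseAtOperatorScope(ManagedMemoryUseCase.OPERATOR, weight);
  }
}
{code}

A zero declared weight is what makes the computed fraction 0 in the stack 
trace above; declaring any positive weight lets the memory manager hand the 
sorter its share of managed memory.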

[jira] (HUDI-4269) Support multiple precombine fields

2022-08-05 Thread Nicholas Jiang (Jira)


[ https://issues.apache.org/jira/browse/HUDI-4269 ]


Nicholas Jiang deleted comment on HUDI-4269:
--

was (Author: nicholasjiang):
[~danny0405], I am interested in multiple precombine fields. Could you please 
assign this ticket to me?

> Support multiple precombine fields
> --
>
> Key: HUDI-4269
> URL: https://issues.apache.org/jira/browse/HUDI-4269
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: core
>Reporter: Danny Chen
>Assignee: Nicholas Jiang
>Priority: Major
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table

2022-08-05 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-1161:


Assignee: Jian Feng  (was: Nicholas Jiang)

> Support update partial fields for MoR table
> ---
>
> Key: HUDI-1161
> URL: https://issues.apache.org/jira/browse/HUDI-1161
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: writer-core
>Reporter: leesf
>Assignee: Jian Feng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-2545) Flink compaction source supports the Source interface based on FLIP-27

2022-08-05 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-2545:


Assignee: yuemeng  (was: Nicholas Jiang)

> Flink compaction source supports the Source interface based on FLIP-27
> --
>
> Key: HUDI-2545
> URL: https://issues.apache.org/jira/browse/HUDI-2545
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: yuemeng
>Priority: Major
>
> CompactionPlanSourceFunction is the Flink Hudi compaction source, which 
> implements the legacy SourceFunction interface. Flink has introduced the new 
> Source interface, so the compaction source could be migrated to the 
> FLIP-27-based Source, as sketched below.
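
A hypothetical skeleton of such a FLIP-27 source, assuming only Flink's public 
connector-source API (Flink 1.11+). The event and split types below are 
placeholder stand-ins, not the real Hudi classes, and the reader, enumerator, 
and serializer bodies are elided:

{code:java}
import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

// Placeholder for one compaction operation handed to the downstream executor.
class PlanEvent {}

// Placeholder split: one assignable unit of the compaction plan.
class PlanSplit implements SourceSplit {
  private final String id;

  PlanSplit(String id) {
    this.id = id;
  }

  @Override
  public String splitId() {
    return id;
  }
}

public class CompactionPlanSource implements Source<PlanEvent, PlanSplit, Void> {

  @Override
  public Boundedness getBoundedness() {
    // A compaction plan is a finite set of operations, so the source is bounded.
    return Boundedness.BOUNDED;
  }

  @Override
  public SourceReader<PlanEvent, PlanSplit> createReader(SourceReaderContext context) {
    // A real implementation would turn assigned splits into PlanEvent records.
    throw new UnsupportedOperationException("Reader elided in this sketch");
  }

  @Override
  public SplitEnumerator<PlanSplit, Void> createEnumerator(
      SplitEnumeratorContext<PlanSplit> context) {
    // A real implementation would read the pending compaction plan from the
    // timeline and hand one split per compaction operation to the readers.
    throw new UnsupportedOperationException("Enumerator elided in this sketch");
  }

  @Override
  public SplitEnumerator<PlanSplit, Void> restoreEnumerator(
      SplitEnumeratorContext<PlanSplit> context, Void checkpoint) {
    // The plan is bounded, so restoring can simply re-enumerate.
    return createEnumerator(context);
  }

  @Override
  public SimpleVersionedSerializer<PlanSplit> getSplitSerializer() {
    throw new UnsupportedOperationException("Serializer elided in this sketch");
  }

  @Override
  public SimpleVersionedSerializer<Void> getEnumeratorCheckpointSerializer() {
    throw new UnsupportedOperationException("Serializer elided in this sketch");
  }
}
{code}

Since the plan is bounded, split discovery happens once at enumeration time, 
which is the main behavioral difference from the legacy SourceFunction loop.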



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-2441) To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition

2022-08-05 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-2441:


Assignee: David_Liang  (was: Nicholas Jiang)

> To support partial update function which can move and update the data from 
> the old partition to the new partition , when the data with same key change 
> it's partition
> -
>
> Key: HUDI-2441
> URL: https://issues.apache.org/jira/browse/HUDI-2441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: David_Liang
>Assignee: David_Liang
>Priority: Major
>
> Consider the following scenario: there are 2 records *in different batches*, 
> as follows:
> ||post_id ||position||weight||ts||day ||
> | 1|shengzhen|3KG|1630480027|*20210901*|
> | 1|beijing|3KG|1630652828|*20210903*|
>  
> When using the *Global Index* with the following SQL:
>  
> {code:java}
> merge into target_hudi_table t
>    using (
>         select post_id, position, ts, day from source_table
>    ) as s
> on t.post_id = s.post_id
> when matched then update set t.position = s.position, t.ts = s.ts, t.day = 
> s.day
> when not matched then insert *
> {code}
>  
> Because the Hudi engine does not yet support *cross-partition partial merge 
> into*, the result in the target table is:
>  
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing| |1630652828|*20210903*|
> and the record is still stored in the old partition.
>  
> But the *expected* result is:
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing|*3KG*|1630652828|*20210903*|
>  
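
For context, a minimal sketch of the closest existing knob, assuming a Hudi 
writer configured through string options. Both keys below are existing Hudi 
configs; whether they cover the partial MERGE INTO case above is exactly what 
this issue raises:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class GlobalIndexOptions {
  // With a global index, update.partition.path=true makes an upsert that
  // changes the partition path delete the record under the old partition
  // and insert it under the new one, instead of updating it in place.
  public static Map<String, String> crossPartitionUpdateOptions() {
    Map<String, String> options = new HashMap<>();
    options.put("hoodie.index.type", "GLOBAL_BLOOM");
    options.put("hoodie.bloom.index.update.partition.path", "true");
    return options;
  }
}
{code}

This moves the whole record across partitions; the missing piece described 
above is preserving the untouched columns (here, weight) during that move.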



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment

2022-08-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-4551:


Assignee: Nicholas Jiang

> The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the 
> parallelism of the execution environment
> --
>
> Key: HUDI-4551
> URL: https://issues.apache.org/jira/browse/HUDI-4551
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
>
> The default value of READ_TASKS, WRITE_TASKS, and CLUSTERING_TASKS is 
> currently 4; it could instead default to the parallelism of the execution 
> environment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment

2022-08-04 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-4551:


 Summary: The default value of READ_TASKS, WRITE_TASKS, 
CLUSTERING_TASKS is the parallelism of the execution environment
 Key: HUDI-4551
 URL: https://issues.apache.org/jira/browse/HUDI-4551
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Nicholas Jiang


The default value of READ_TASKS, WRITE_TASKS, and CLUSTERING_TASKS is currently 
4; it could instead default to the parallelism of the execution environment, as 
sketched below.
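
A minimal sketch of that fallback, assuming Flink's public Configuration API; 
the option key below is illustrative rather than the actual FlinkOptions 
constant:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TaskDefaults {
  // Illustrative option key; the real option lives in Hudi's FlinkOptions.
  private static final ConfigOption<Integer> WRITE_TASKS =
      ConfigOptions.key("write.tasks").intType().noDefaultValue();

  // Use the configured value when present, otherwise fall back to the
  // parallelism of the execution environment instead of a hard-coded 4.
  public static int writeTasks(Configuration conf, StreamExecutionEnvironment env) {
    return conf.getOptional(WRITE_TASKS).orElse(env.getParallelism());
  }
}
{code}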



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering

2022-08-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-4536.

 Reviewers: Danny Chen
Resolution: Fixed

> ClusteringOperator causes the NullPointerException when writing with 
> BulkInsertWriterHelper in clustering
> -
>
> Key: HUDI-4536
> URL: https://issues.apache.org/jira/browse/HUDI-4536
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> ClusteringOperator causes a NullPointerException when writing with 
> BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper 
> isn't set to null after close.
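
A minimal sketch of the guard this fix implies; the class and field below are 
illustrative stand-ins, not the actual ClusteringOperator code:

{code:java}
public class WriterHelperGuard {
  // Illustrative stand-in for Hudi's BulkInsertWriterHelper.
  interface WriterHelper extends AutoCloseable {}

  private WriterHelper writerHelper;

  public void close() throws Exception {
    if (writerHelper != null) {
      writerHelper.close();
      // Reset the reference so later lifecycle calls see a fresh state
      // instead of dereferencing a disposed helper.
      writerHelper = null;
    }
  }
}
{code}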



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering

2022-08-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reassigned HUDI-4536:


Assignee: Nicholas Jiang

> ClusteringOperator causes the NullPointerException when writing with 
> BulkInsertWriterHelper in clustering
> -
>
> Key: HUDI-4536
> URL: https://issues.apache.org/jira/browse/HUDI-4536
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> ClusteringOperator causes a NullPointerException when writing with 
> BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper 
> isn't set to null after close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering

2022-08-04 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-4536:


 Summary: ClusteringOperator causes the NullPointerException when 
writing with BulkInsertWriterHelper in clustering
 Key: HUDI-4536
 URL: https://issues.apache.org/jira/browse/HUDI-4536
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Nicholas Jiang


ClusteringOperator causes a NullPointerException when writing with 
BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper isn't 
set to null after close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

