[ANNOUNCE] Hudi Community Update (2023-03-20 ~ 2023-04-02)

2023-04-02 Thread leesf
Dear community,

I'm pleased to share the Hudi community update for 2023-03-20 ~ 2023-04-02,
covering new features and bug fixes.


===
Feature

[Flink] Automatically infer key generator type [1]
[Core] Infer cleaning policy based on clean configs [2]


[1] https://issues.apache.org/jira/browse/HUDI-5929
[2] https://issues.apache.org/jira/browse/HUDI-5954


Bugs

[Core] Fixing pending instant deduction to trigger compaction in MDT [1]
[Flink] Fix bucket stream writer fileId not found exception [2]
[Spark] Add partition ordering for full table scans [3]
[Spark] Support savepoint call procedure with base path in Spark [4]
[Spark] [HUDI-5978] Update timeline timezone when write in spark [5]
[Core] Fix clustering on bootstrapped tables [6]
[Core] Fix Date to String column schema evolution [7]
[Core] Empty preCombineKey should never be stored in hoodie.properties [8]
[Core] Fixing shutting down deltastreamer properly when post write
termination strategy is enabled [9]
[Core] Connection leak for lock provider [10]
[Flink] Auto generate client id for Flink multi writer [11]
[Flink] Always write parquets for insert overwrite operation [12]




[1] https://issues.apache.org/jira/browse/HUDI-5950
[2] https://issues.apache.org/jira/browse/HUDI-5822
[3] https://issues.apache.org/jira/browse/HUDI-5967
[4] https://issues.apache.org/jira/browse/HUDI-5941
[5] https://issues.apache.org/jira/browse/HUDI-5978
[6] https://issues.apache.org/jira/browse/HUDI-5891
[7] https://issues.apache.org/jira/browse/HUDI-5977
[8] https://issues.apache.org/jira/browse/HUDI-5986
[9] https://issues.apache.org/jira/browse/HUDI-5928
[10] https://issues.apache.org/jira/browse/HUDI-5993
[11] https://issues.apache.org/jira/browse/HUDI-6005
[12] https://issues.apache.org/jira/browse/HUDI-6010




Best,
Leesf


Re: When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of cycles? For example, if I cycle 5 times, I stop accessing data

2023-04-02 Thread lee
I tried using
'org.apache.hudi.utilities.deltastreamer.NoNewDataTerminationStrategy' to stop
the task, but it did not behave as I expected. I assumed that once it stops
the ExecutorService, the SparkContext would stop as well. Instead, the
SparkContext keeps running and no further logs appear.








李杰
leedd1...@163.com
Replied Message
From: Sivabalan
Date: 4/1/2023 01:07
Subject: Re: When using the HoodieDeltaStreamer, is there a corresponding
parameter that can control the number of cycles? For example, if I cycle 5
times, I stop accessing data

Deltastreamer does support graceful termination in continuous mode.
Please check the post-write termination strategy. You can also implement
your own termination strategy. Hope that helps.
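
To make the idea concrete, here is a minimal sketch of a strategy that stops after a fixed number of rounds (for example 5). It is only an illustration: `TerminationStrategy`, `MaxRoundsTerminationStrategy` and `ContinuousLoop` are hypothetical names, and the interface is a simplified stand-in, not Hudi's actual post-write termination interface, whose method signature differs and should be checked against the Hudi version in use.

```java
// Hypothetical stand-in for a post-write termination hook. Hudi's real
// interface receives write results as an argument and may differ by version.
interface TerminationStrategy {
    // Called once after each completed sync round; true means "stop the loop".
    boolean shouldShutdown();
}

// Requests shutdown once a fixed number of sync rounds has completed.
final class MaxRoundsTerminationStrategy implements TerminationStrategy {
    private final int maxRounds;
    private int completedRounds = 0;

    MaxRoundsTerminationStrategy(int maxRounds) {
        if (maxRounds < 1) {
            throw new IllegalArgumentException("maxRounds must be >= 1");
        }
        this.maxRounds = maxRounds;
    }

    @Override
    public boolean shouldShutdown() {
        completedRounds++;
        return completedRounds >= maxRounds;
    }
}

// Simplified model of a continuous-mode loop: run one round, then ask the strategy.
final class ContinuousLoop {
    // Runs syncOnce until the strategy requests shutdown; returns the rounds executed.
    static int run(Runnable syncOnce, TerminationStrategy strategy) {
        int rounds = 0;
        do {
            syncOnce.run(); // stands in for fetch -> transform -> write
            rounds++;
        } while (!strategy.shouldShutdown());
        return rounds;
    }
}
```

In this model, `ContinuousLoop.run(job, new MaxRoundsTerminationStrategy(5))` executes `job` five times and then returns. A real strategy would keep the same counting logic but implement Hudi's own interface.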

On Thu, 30 Mar 2023 at 20:16, Vinoth Chandar wrote:

I believe there is no control today. You could hack a precommit validator
and call System.exit if you want ;) (ugly, I know)

But maybe we could introduce some abstraction to do a check between loops,
or allow users to plug in some logic to decide whether to continue or exit?

I'd love to understand the use case better.

On Wed, Mar 29, 2023 at 7:32 AM lee wrote:

The HoodieDeltaStreamer "--continuous" parameter is described as: "Delta
Streamer runs in continuous mode running source-fetch -> Transform -> Hudi
Write in loop". Is there a corresponding parameter that controls the number
of cycles, for example to stop ingesting data after 5 cycles?



李杰
leedd1...@163.com






--
Regards,
-Sivabalan