hudi-bot opened a new issue, #15543:
URL: https://github.com/apache/hudi/issues/15543

   When a Flink streaming job consumes data from a message queue and writes it 
to a partitioned Hudi table, there is no way to know when a partition has been 
fully written. This signal is often needed so that a downstream offline task 
scheduler can start once the partition is finished.
   
   I think we can use the Flink watermark mechanism to implement this. A 
watermark marks the event time before which the streaming job expects no more 
records, so when the watermark is greater than the Hudi partition time, the 
data for that partition has been fully written (assuming the stream is 
ordered). At that point a success file can be written to the partition path to 
mark the partition as finished.
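
   As a minimal sketch of the comparison described above, assuming daily 
partitions named in ISO `yyyy-MM-dd` form and event time in UTC (neither is 
stated in this issue), the check could look like the following. The class 
`PartitionCompletion` and its methods are hypothetical helpers, not Hudi or 
Flink APIs. The sketch compares the watermark against the end of the 
partition's time range, which is one conservative reading of "partition time".

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of Hudi or Flink.
final class PartitionCompletion {
    private PartitionCompletion() {}

    /** Epoch millis (UTC) at which a daily partition such as "2022-11-07" ends, exclusive. */
    static long partitionEndMillis(String partitionValue) {
        // Assumption: the partition value is an ISO date (yyyy-MM-dd).
        LocalDate day = LocalDate.parse(partitionValue);
        return day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /** True once the watermark has reached the end of the partition's time range. */
    static boolean isComplete(String partitionValue, long watermarkMillis) {
        return watermarkMillis >= partitionEndMillis(partitionValue);
    }
}
```

   With these assumptions, `isComplete("2022-11-07", watermark)` only becomes 
true once the watermark reaches midnight UTC on 2022-11-08.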
   
   It can be designed as follows:
    # Extract the partition fields and values from the Flink append streaming 
data (this can be implemented in AppendWriteFunction), then emit them 
downstream;
    # Implement a SuccessFileWriteSink that receives these partition values and 
stores them in an activePartitions set;
    # Compare the watermark timestamp with the partition timestamps derived 
from the activePartitions set; if the watermark is greater, add the partition 
to the finished partitions set;
    # Iterate over the finished partitions set, get each partition path, and 
write a success file to it when the Flink job takes a checkpoint;
    # Store the active partitions set and the finished partitions set in Flink 
state to avoid data loss when the job fails over.
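
   The bookkeeping in the steps above can be sketched in plain Java as 
follows. This is an illustration under assumptions, not the proposed 
implementation: `PartitionTracker` and its method names are hypothetical, the 
mapping from a partition value to its end time is injected by the caller, and 
the two sets are ordinary in-memory collections standing in for Flink state. 
Writing the success file itself is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToLongFunction;

// Hypothetical tracker for illustration; a real sink would keep both sets in Flink state.
final class PartitionTracker {
    private final Set<String> active = new TreeSet<>();
    private final Set<String> finished = new TreeSet<>();
    // Caller-supplied mapping from a partition value to the end of its time range (epoch millis).
    private final ToLongFunction<String> partitionEndMillis;

    PartitionTracker(ToLongFunction<String> partitionEndMillis) {
        this.partitionEndMillis = partitionEndMillis;
    }

    /** Step 2: remember every partition value seen in the stream. */
    void onPartitionSeen(String partition) {
        active.add(partition);
    }

    /** Step 3: move partitions whose time range the watermark has passed to the finished set. */
    void onWatermark(long watermarkMillis) {
        Iterator<String> it = active.iterator();
        while (it.hasNext()) {
            String partition = it.next();
            if (watermarkMillis >= partitionEndMillis.applyAsLong(partition)) {
                it.remove();
                finished.add(partition);
            }
        }
    }

    /** Step 4: on checkpoint, hand back the partitions that need a success file and clear them. */
    List<String> onCheckpoint() {
        List<String> toMark = new ArrayList<>(finished);
        finished.clear();
        return toMark;
    }

    /** Step 5: copies of the sets, as they would be snapshotted into state. */
    Set<String> activePartitions() {
        return new TreeSet<>(active);
    }

    Set<String> finishedPartitions() {
        return new TreeSet<>(finished);
    }
}
```

   One design choice in this sketch is that a partition seen again after its 
success file was written is simply tracked again and marked again at the next 
checkpoint, so the marker write would need to be idempotent.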
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5159
   - Type: New Feature
   
   
   ---
   
   
   ## Comments
   
   07/Nov/22 14:52;complone;Hi KevinyhZou, this feature would also help with 
our company's needs. Do you mind if I take part in the development of this 
task with you?;;;
   
   ---
   
   08/Nov/22 05:16;zouyunhe;OK, I have implemented this feature and will 
submit a PR in the next few days. You can help review it or see what else 
needs to be added. [~complone] ;;;
   
   ---
   
   11/Nov/22 12:33;complone;[~zouyunhe] Okay, I know;;;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
