[ https://issues.apache.org/jira/browse/HUDI-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Forward Xu updated HUDI-5095:
-----------------------------
    Attachment: image-2022-10-26-16-35-38-078.png

> Flink: Stores a special watermark(flag) to identify the current progress of writing data
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-5095
>                 URL: https://issues.apache.org/jira/browse/HUDI-5095
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: flink, flink-sql
>            Reporter: Forward Xu
>            Assignee: Forward Xu
>            Priority: Major
>         Attachments: image-2022-10-26-16-34-52-245.png,
> image-2022-10-26-16-35-38-078.png, image-2022-10-26-16-35-56-609.png
>
>
> In some cases we need a flag to measure the progress of data writing. I
> think storing the watermark as an attribute of the Hudi commit metadata is
> a reasonable way to provide one.
> One of our scenarios is that Flink writes data to a Hudi table in real time,
> and we then use this Hudi table to support batch computation, so we need a
> flag to evaluate whether a partition's data is complete.
> For example, job1 is scheduled every hour. At 2022-01-19 02:01:00, job1
> starts to check whether the partition (20220119/01) of hudi_table1 is
> complete (Flink writes data to hudi_table1 in real time). When the watermark
> property of hudi_table1's commit metadata is higher than 2022-01-19
> 02:05:00 (allowing 5 minutes of out-of-order data), we consider partition
> (20220119/01) complete and can safely execute Hive or Flink SQL for batch
> computation (basically insert table2 select xx from hudi_table1...).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
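
The completeness check described in the example (partition 20220119/01 counts as complete once the committed watermark is higher than 2022-01-19 02:05:00) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Hudi implementation: the function names `partition_end` and `is_partition_complete`, the `yyyyMMdd/HH` partition layout, and the idea that the caller has already read the watermark from the latest commit metadata as a `datetime` are all hypothetical.

```python
from datetime import datetime, timedelta


def partition_end(partition: str) -> datetime:
    """Return the exclusive end time of an hourly partition.

    Assumes (hypothetically) that partitions are named 'yyyyMMdd/HH',
    as in the example partition 20220119/01.
    """
    start = datetime.strptime(partition, "%Y%m%d/%H")
    return start + timedelta(hours=1)


def is_partition_complete(
    partition: str,
    watermark: datetime,
    allowed_lateness: timedelta = timedelta(minutes=5),
) -> bool:
    """Decide whether a partition can be treated as complete.

    `watermark` stands for the value the batch job has read from the
    table's latest commit metadata (how it is stored and read is not
    specified here). The partition is complete once the watermark is
    strictly higher than the partition end plus the out-of-order allowance.
    """
    return watermark > partition_end(partition) + allowed_lateness


if __name__ == "__main__":
    # Watermark still at the threshold: job1 should keep waiting.
    print(is_partition_complete("20220119/01", datetime(2022, 1, 19, 2, 5, 0)))
    # Watermark has passed 02:05:00: batch computation can start.
    print(is_partition_complete("20220119/01", datetime(2022, 1, 19, 2, 5, 1)))
```

With a check like this, job1 would re-run the test on each schedule until it returns true and only then start the insert into table2.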