+1. This looks great, but in real business scenarios, data anomalies can make the event time inaccurate. Wouldn't that affect the monitoring of this metric?
Best,
liujinhui

------------------ Original Message ------------------
From: "dev" <xu.shiyan.raym...@gmail.com>;
Sent: February 3, 2021 (Wednesday), 9:55
To: "dev"<dev@hudi.apache.org>;
Subject: [DISCUSS] Measure latency by storing event time in WriteStatus

Hi all,

It is a common requirement to measure data latency in Hudi tables. There isn't a metric reporting latency directly from HoodieMetrics. I'm proposing to measure the latency for each commit by this formula

    latency = commitTime + commitDuration - earliest event time of the incoming records

There are 4 major parts to make this available (thanks to Vinoth's hints)

- To store the earliest event time, we need to extract the event times from Hoodie payloads. We can make it available in org.apache.hudi.common.model.DefaultHoodieRecordPayload#getMetadata()
- then org.apache.hudi.client.WriteStatus#markSuccess() can perform the comparison and store the min value in org.apache.hudi.common.model.HoodieWriteStat
- org.apache.hudi.common.model.HoodieCommitMetadata can then aggregate all the min values and returns a global min of all the partitions.
- lastly, in org.apache.hudi.metrics.HoodieMetrics#updateCommitMetrics we can compute the latency using the formula above

I have a draft implementation shown in the diff
https://github.com/apache/hudi/compare/master...xushiyan:measure-latency

I think this metric will be commonly used so I made those changes on default classes like DefaultHoodieRecordPayload and HoodieWriteStat.

Hope to get some early feedback on the implementation. Thank you.

Best,
Raymond
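
For illustration only, the four steps in the proposal can be sketched in a few lines of Python. This is not Hudi code and not the draft implementation linked in the message: WriteStat, mark_success, global_min_event_time and commit_latency_ms are hypothetical names, and all times are assumed to be epoch milliseconds.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WriteStat:
    """Hypothetical stand-in for a per-partition write stat (not the Hudi class)."""
    min_event_time_ms: Optional[int] = None

    def mark_success(self, event_time_ms: Optional[int]) -> None:
        # Keep the smallest event time seen so far; skip records without one.
        if event_time_ms is None:
            return
        if self.min_event_time_ms is None or event_time_ms < self.min_event_time_ms:
            self.min_event_time_ms = event_time_ms


def global_min_event_time(stats: Iterable[WriteStat]) -> Optional[int]:
    # Aggregate the per-partition minimums into one minimum for the commit.
    values = [s.min_event_time_ms for s in stats if s.min_event_time_ms is not None]
    return min(values) if values else None


def commit_latency_ms(commit_time_ms: int,
                      commit_duration_ms: int,
                      min_event_time_ms: Optional[int]) -> Optional[int]:
    # latency = commitTime + commitDuration - earliest event time
    if min_event_time_ms is None:
        return None  # no event time was recorded for this commit
    return commit_time_ms + commit_duration_ms - min_event_time_ms
```

Because the formula subtracts the minimum, a single record with an abnormally old event time dominates the result for its commit, which is the situation the reply above asks about.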