+1. This looks great, but in real business scenarios, data anomalies can
make the event time inaccurate.
Wouldn't that affect monitoring based on this metric?


 Best,
 liujinhui




------------------ Original Message ------------------
From: "dev" <xu.shiyan.raym...@gmail.com>;
Sent: Wednesday, February 3, 2021, 9:55
To: "dev" <dev@hudi.apache.org>;

Subject: [DISCUSS] Measure latency by storing event time in WriteStatus



Hi all,

Measuring data latency in Hudi tables is a common requirement, but
HoodieMetrics does not report latency directly. I'm proposing to measure
the latency of each commit with this formula:

latency = commitTime + commitDuration - earliest event time of the incoming
records
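
As a worked example of the formula, here is a minimal sketch in plain Java.
The class and method names (CommitLatencyExample, latency) are made up for
illustration and are not part of Hudi; the timestamps are arbitrary sample
values.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not a Hudi class: applies the formula above.
final class CommitLatencyExample {

    // latency = commitTime + commitDuration - earliest event time
    static Duration latency(Instant commitTime,
                            Duration commitDuration,
                            Instant earliestEventTime) {
        Instant commitEnd = commitTime.plus(commitDuration);
        return Duration.between(earliestEventTime, commitEnd);
    }

    public static void main(String[] args) {
        Instant commitTime = Instant.parse("2021-02-03T09:55:00Z");
        Duration commitDuration = Duration.ofSeconds(30);
        Instant earliestEventTime = Instant.parse("2021-02-03T09:50:00Z");

        // Commit ends at 09:55:30; the oldest record is from 09:50:00.
        System.out.println(
            latency(commitTime, commitDuration, earliestEventTime).getSeconds()); // 330
    }
}
```

Note that an event time later than the end of the commit (for example from
a skewed clock) would yield a negative duration in this sketch.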

There are 4 major parts to make this available (thanks to Vinoth's hints)

- To store the earliest event time, we need to extract the event times from
Hoodie payloads. We can make it available in
org.apache.hudi.common.model.DefaultHoodieRecordPayload#getMetadata()

- then org.apache.hudi.client.WriteStatus#markSuccess() can perform the
comparison and store the min value
in org.apache.hudi.common.model.HoodieWriteStat

- org.apache.hudi.common.model.HoodieCommitMetadata can then aggregate all
the min values and return a global min across all the partitions.

- lastly, in org.apache.hudi.metrics.HoodieMetrics#updateCommitMetrics we
can compute the latency using the formula above
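
To make the flow of the four parts concrete, here is a rough, self-contained
sketch of the min-tracking and aggregation logic. It does not use the real
Hudi classes: PartitionStat, markSuccess, globalMin and latencyMs below are
simplified stand-ins I made up for illustration, and event times are assumed
to be epoch milliseconds. The actual changes are in the draft diff.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

final class MinEventTimeSketch {

    // Hypothetical stand-in for a per-partition write stat (not HoodieWriteStat).
    static final class PartitionStat {
        private long minEventTimeMs = Long.MAX_VALUE;
        private boolean seen = false;

        // Called once per successfully written record; null means the
        // record carried no event time and is skipped.
        void markSuccess(Long eventTimeMs) {
            if (eventTimeMs == null) {
                return;
            }
            minEventTimeMs = Math.min(minEventTimeMs, eventTimeMs);
            seen = true;
        }

        OptionalLong minEventTime() {
            return seen ? OptionalLong.of(minEventTimeMs) : OptionalLong.empty();
        }
    }

    // Aggregates the per-partition minimums into a global minimum.
    static OptionalLong globalMin(List<PartitionStat> stats) {
        return stats.stream()
            .map(PartitionStat::minEventTime)
            .filter(OptionalLong::isPresent)
            .mapToLong(OptionalLong::getAsLong)
            .min();
    }

    // latency = commitTime + commitDuration - earliest event time;
    // empty when no record in the commit had an event time.
    static OptionalLong latencyMs(long commitTimeMs, long commitDurationMs,
                                  List<PartitionStat> stats) {
        OptionalLong min = globalMin(stats);
        return min.isPresent()
            ? OptionalLong.of(commitTimeMs + commitDurationMs - min.getAsLong())
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        PartitionStat p1 = new PartitionStat();
        p1.markSuccess(1_000L);
        p1.markSuccess(400L);

        PartitionStat p2 = new PartitionStat();
        p2.markSuccess(700L);
        p2.markSuccess(null); // record without an event time

        // Global min is 400, so latency = 2000 + 100 - 400.
        System.out.println(latencyMs(2_000L, 100L, Arrays.asList(p1, p2)).getAsLong()); // 1700
    }
}
```

Returning an empty value when no event time was seen is a choice made for
this sketch, so that a commit without event times reports no latency rather
than a misleading number.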

I have a draft implementation shown in the diff
https://github.com/apache/hudi/compare/master...xushiyan:measure-latency

I think this metric will be commonly used so I made those changes on
default classes like DefaultHoodieRecordPayload and HoodieWriteStat. Hope
to get some early feedback on the implementation. Thank you.

Best,
Raymond
