hudi-bot opened a new issue, #14766: URL: https://github.com/apache/hudi/issues/14766
As a follow-up enhancement to the latency and freshness metrics, this issue proposes persisting the latencies of a batch of records as a histogram in the commit metadata. This would help implement watermarks and facilitate stream-stream joins.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-1654
- Type: Improvement

---

## Comments

04/Mar/21 02:39, xushiyan:

Some previous implementation notes from the PR for HUDI-1587:

- Consider supporting this feature in OverwriteWithLatestAvroPayload; currently it is only available when configured to use DefaultHoodieRecordPayload.
- To persist the histogram, the Avro schema for commit metadata needs to be updated, along with its supporting Java class.
- It is better to reclassify this as commit metadata instead of metrics. Commit metadata can then optionally be emitted as metrics.
- Some notes from the [email discussion](https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E) by [~vinoth]:
  - If we can keep the time interval (i.e., the 1 min) configurable and also encode it along with the histogram, we can control the storage footprint better. Maybe also consider using something like t-digest for the histogram?
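As a rough illustration of the configurable-interval idea from the email discussion, here is a minimal Python sketch. It assumes that a record's latency is the commit time minus the record's event time, buckets latencies into fixed-width intervals (a simpler stand-in for a t-digest, which is not shown), and encodes the interval alongside the sparse bucket counts as JSON. `LatencyHistogram`, `intervalMs`, `percentile_upper_bound_ms`, and the JSON layout are hypothetical names chosen for illustration; they are not Hudi's API or its commit metadata Avro schema.

```python
import json
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class LatencyHistogram:
    """Hypothetical fixed-width latency histogram; the bucket width travels with the counts."""
    interval_ms: int = 60_000  # configurable bucket width, e.g. 1 minute
    counts: Counter = field(default_factory=Counter)  # bucket index -> record count

    def add_batch(self, commit_time_ms: int, event_times_ms: Iterable[int]) -> None:
        # Assumed definition: latency = commit time - event time, clamped at 0
        # so records with event times after the commit land in bucket 0.
        for event_time in event_times_ms:
            latency = max(0, commit_time_ms - event_time)
            self.counts[latency // self.interval_ms] += 1

    def total(self) -> int:
        return sum(self.counts.values())

    def percentile_upper_bound_ms(self, p: float) -> int:
        """Upper edge of the first bucket whose cumulative count reaches fraction p."""
        target = math.ceil(p * self.total())
        running = 0
        for bucket in sorted(self.counts):
            running += self.counts[bucket]
            if running >= target:
                return (bucket + 1) * self.interval_ms
        raise ValueError("empty histogram")

    def to_json(self) -> str:
        # Encode the interval next to the non-empty buckets so readers can decode them.
        buckets = {str(b): c for b, c in sorted(self.counts.items())}
        return json.dumps({"intervalMs": self.interval_ms, "buckets": buckets})

    @classmethod
    def from_json(cls, payload: str) -> "LatencyHistogram":
        data = json.loads(payload)
        counts = Counter({int(b): c for b, c in data["buckets"].items()})
        return cls(interval_ms=data["intervalMs"], counts=counts)


# Usage: four records committed at t = 10 min, with latencies of 5 s, 70 s, 130 s and 30 s.
commit_ms = 10 * 60_000
hist = LatencyHistogram(interval_ms=60_000)
hist.add_batch(commit_ms, [commit_ms - 5_000, commit_ms - 70_000,
                           commit_ms - 130_000, commit_ms - 30_000])
encoded = hist.to_json()  # could be stored under a hypothetical commit metadata key
restored = LatencyHistogram.from_json(encoded)
```

Storing only non-empty buckets keeps the footprint proportional to the spread of latencies, and widening `interval_ms` trades precision for size. A watermark could, for example, be estimated as the commit time minus `percentile_upper_bound_ms(p)` for a high `p`, with precision limited to one bucket width.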
