hudi-bot opened a new issue, #14766:
URL: https://github.com/apache/hudi/issues/14766

   As a follow-up enhancement to the latency and freshness metrics, persist the 
latencies of a batch of records as a histogram in the commit metadata. 
This would help implement watermarks and facilitate stream-stream joins.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1654
   - Type: Improvement
   
   
   ---
   
   
   ## Comments
   
   04/Mar/21 02:39;xushiyan;Some previous implementation notes from the PR for 
HUDI-1587:
    * Consider supporting this feature in OverwriteWithLatestAvroPayload; 
currently it is only available when configured to use DefaultHoodieRecordPayload.
    * To persist the histogram, the Avro schema for commit metadata needs to be 
updated, along with its supporting Java class.
    * It would be better to reclassify this as commit metadata instead of metrics. 
Commit metadata can then optionally be emitted as metrics.
    * some notes from the [email 
discussion|https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E]
 by [~vinoth]
    ** If we can keep the time interval (i.e., the 1 min) configurable and also
   encode it along with the histogram,
   we can control the storage footprint better. Maybe also consider using
   something like t-digest for the histogram?
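
To make the last note concrete, here is a minimal sketch of a fixed-width latency histogram that carries its bucket interval with it. Everything in it is an assumption for illustration: the class name `LatencyHistogram`, its methods, and the string encoding are hypothetical and are not part of Hudi or of the HUDI-1587 PR. It uses plain fixed-width buckets rather than t-digest.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Hudi code: fixed-width latency histogram. */
class LatencyHistogram {
    // Bucket width in milliseconds; stored with the histogram so readers can decode it.
    private final long intervalMs;
    // Bucket index -> number of records whose latency fell into that bucket.
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    LatencyHistogram(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        this.intervalMs = intervalMs;
    }

    /** Adds one record's latency; negative latencies are clamped to zero. */
    void record(long latencyMs) {
        long bucket = Math.floorDiv(Math.max(latencyMs, 0L), intervalMs);
        counts.merge(bucket, 1L, Long::sum);
    }

    long intervalMs() {
        return intervalMs;
    }

    Map<Long, Long> counts() {
        return Collections.unmodifiableMap(counts);
    }

    /** Encodes interval and buckets into one string (an assumed, ad hoc format). */
    String encode() {
        StringBuilder sb = new StringBuilder("intervalMs=").append(intervalMs).append(';');
        boolean first = true;
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }
}
```

With a 1-minute interval, recording latencies of 30 s, 45 s, and 200 s yields two records in bucket 0 and one in bucket 3. A wider interval gives fewer buckets and a smaller footprint at the cost of resolution, which is the trade-off the configurable interval is meant to expose.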

