navina commented on code in PR #9621:
URL: https://github.com/apache/pinot/pull/9621#discussion_r1000970918


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/PartitionLagState.java:
##########
@@ -27,12 +27,20 @@ public class PartitionLagState {
   protected final static String NOT_CALCULATED = "NOT_CALCULATED";
 
   /**
-   * Defines how far away the current record's offset / pointer is from 
upstream latest record
+   * Defines how far behind the current record's offset / pointer is from 
upstream latest record
    * The distance is based on actual record count.
    */
   public String getRecordsLag() {
     return NOT_CALCULATED;
   }
 
-  // TODO: Define record availability lag ($latest_record_consumption_time - 
$latest_record_ingestion_time)
+  /**
+   * Defines how soon after record ingestion was the record consumed by Pinot. 
That is, the difference between the
+   * time the record was consumed and the time at which the record was 
ingested upstream.
+   *
+   * @return Lag value in milliseconds
+   */
+  public String getRecordAvailabilityLag() {

Review Comment:
   > Maybe getRecordsLagMs? 
   
   This would make one think that this lag is computed relative to the 
timestamp of the latest record in upstream (ie. 
latest_upstream_record_timestamp minus current_consumed_record_timestamp).  
This is the lag in terms of record time relative to upstream. This is actually 
a metric that kinesis provides and recommends applications to monitor - 
`millisBehindLatest` 
([here](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html#API_GetRecords_ResponseSyntax)).
 
   
   Where as, what we are computing here is relative to consumption time and the 
time at which the record was ingested upstream. I think it should be treated 
differently than the above and hence, I used the term `recordAvailabilityLag` 
as it shows how soon the record was consumed after it arrived in the stream. 
   
   > Not sure if availability lag is a common term in streaming
   
   I thought it was. At least it was in the kafka ecosystem. 
   
   I am fine renaming the metric in this PR to `recordLagMs` and then, actually 
add another metric called `millisBehindLatest` (similar to kinesis). Wdyt? 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to