mcvsubbu commented on a change in pull request #3979: Track Indexed timestamp across consuming segments URL: https://github.com/apache/incubator-pinot/pull/3979#discussion_r266514769
########## File path: pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java ########## @@ -204,20 +205,22 @@ public boolean index(GenericRow row) { // If metrics aggregation is enabled and if the dimension values were already seen, this will return existing docId, // else this will return a new docId. int docId = getOrCreateDocId(dictIdMap); - + boolean canTakeMore = false; // docId == numDocs implies new docId. if (docId == numDocs) { // Add forward and inverted indices for new document. addForwardIndex(row, docId, dictIdMap); addInvertedIndex(docId, dictIdMap); // Update number of document indexed at last to make the latest record queryable - return _numDocsIndexed++ < _capacity; + canTakeMore = _numDocsIndexed++ < _capacity; } else { - Preconditions - .checkState(_aggregateMetrics, "Invalid document-id during indexing: " + docId + " expected: " + numDocs); + Preconditions.checkState(_aggregateMetrics, "Invalid document-id during indexing: " + docId + " expected: " + numDocs); // Update metrics for existing document. - return aggregateMetrics(row, docId); + canTakeMore = aggregateMetrics(row, docId); } + // update indexing time + _lastIndexedTimestamp = System.currentTimeMillis(); Review comment: If there is a delay in the stream, that will show up as a pinot delay. Another choice is to take the last _polled_ timestamp. That will ensure that as long as Pinot polls the stream, the available data is considered fresh. Not sure why we are not doing that. A stream might (and probably should) have its own staleness metric being tracked and monitored so that an application does not need to track a staleness metric via pinot and root cause it to the stream if that is indeed the case. If you want to continue with this approach, then I suggest that you extend the interface to retrieve rows from a stream to include the highest available timestamp of a record _in_ the stream. That way, streams that provide the staleness metric can translate (via Pinot) to the staleness monitoring application. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org