jadami10 opened a new pull request, #17598:
URL: https://github.com/apache/pinot/pull/17598

   The freshness checker was something we added almost 4 years ago to better 
control ingestion lag as servers restarted. Since then, we've discovered that using 
"latest" as the threshold leads to unexpected results. Pinot ingestion follows 
a sawtooth pattern, repeating these steps:
   - request data from Kafka
   - transform
   - index
   
   The time spent transforming and indexing adds lag, which in turn produces 
a sawtooth lag pattern. For most use cases, this is imperceptible: the sawtooth 
amplitude will be milliseconds to seconds. But there are use cases where the 
sawtooth can have an extremely large amplitude (minutes or more):
   - extreme slowness on Pinot or the upstream stream provider. This one is 
rare.
   - transactional publishes upstream. This one is more common. If you have a 
Flink app with a 1 minute checkpoint, the freshest data you see can be up to 
1 minute old. And if you're not sampling quickly enough in the freshness 
check, you may miss the moment when the "latest" event was fresh enough.
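
   To make the sampling problem concrete, here is a minimal simulation sketch. 
It is not code from this PR: the 60 second commit interval, the 5 second 
threshold, the 20 second polling interval, and all function names are 
assumptions chosen for illustration. It models an upstream that only makes 
data visible at commit boundaries and compares polling the "latest" lag with 
taking the minimum lag seen over the same window.

```python
CHECKPOINT_S = 60     # assumed upstream commit interval (e.g. a Flink checkpoint)
THRESHOLD_S = 5       # assumed freshness threshold
SAMPLE_EVERY_S = 20   # assumed polling interval of the freshness check
SAMPLE_OFFSET_S = 10  # assumed phase of the first poll relative to a commit


def latest_event_ts(now_s):
    """Timestamp of the newest visible event: the most recent commit boundary."""
    return (now_s // CHECKPOINT_S) * CHECKPOINT_S


def sampled_latest_lags(duration_s):
    """Lag of the "latest" event as seen by a checker that polls periodically."""
    return [
        t - latest_event_ts(t)
        for t in range(SAMPLE_OFFSET_S, duration_s, SAMPLE_EVERY_S)
    ]


def minimum_lag(duration_s):
    """Smallest lag over the window, evaluated every second for simplicity."""
    return min(t - latest_event_ts(t) for t in range(1, duration_s))


if __name__ == "__main__":
    polled = sampled_latest_lags(180)
    print("polled lags:", polled)
    print("any poll under threshold:", any(lag <= THRESHOLD_S for lag in polled))
    print("minimum lag under threshold:", minimum_lag(180) <= THRESHOLD_S)
```

   With these assumed numbers, every poll lands mid-sawtooth and never sees a 
lag under the threshold, even though the lag drops to zero at each commit.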
   
   All of that said, "minimum" freshness is really what we intended in the 
first place when making this feature. This issue was just exposed later.
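
   One possible reading of "minimum" freshness, sketched below, is to record 
the lag at ingestion time and keep the smallest value seen, so the check no 
longer depends on when it happens to poll. This is an illustrative sketch 
only: `MinLagTracker`, `on_ingest`, and `is_caught_up` are hypothetical names, 
not Pinot classes or methods, and the actual change in this PR may be 
structured differently.

```python
class MinLagTracker:
    """Hypothetical helper: tracks the smallest ingestion lag observed so far."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms
        self.min_lag_ms = None  # None until the first event is ingested

    def on_ingest(self, event_ts_ms, ingest_ts_ms):
        # Lag of this event at the moment it was indexed; clamp clock skew to 0.
        lag_ms = max(0, ingest_ts_ms - event_ts_ms)
        if self.min_lag_ms is None or lag_ms < self.min_lag_ms:
            self.min_lag_ms = lag_ms

    def is_caught_up(self):
        # Caught up once any ingested event was within the threshold.
        return self.min_lag_ms is not None and self.min_lag_ms <= self.threshold_ms


if __name__ == "__main__":
    tracker = MinLagTracker(threshold_ms=5_000)
    tracker.on_ingest(event_ts_ms=0, ingest_ts_ms=45_000)       # stale event
    tracker.on_ingest(event_ts_ms=59_000, ingest_ts_ms=60_000)  # fresh at commit
    tracker.on_ingest(event_ts_ms=59_500, ingest_ts_ms=90_000)  # polled late
    print(tracker.min_lag_ms, tracker.is_caught_up())
```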
   
   We've run this internally at Stripe for over a year with no issues and no 
unexplained ingestion lag. I (and Claude) finally had time to reconcile 
these changes into an OSS PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

