jadami10 opened a new pull request, #17598: URL: https://github.com/apache/pinot/pull/17598
The freshness checker is something we added almost 4 years ago to better control ingestion lag as servers restart. Since then, we've discovered that using "latest" as the threshold leads to unexpected results.

Pinot ingestion follows a sawtooth pattern:
- request data from Kafka
- transform
- index

While a batch is being transformed and indexed, no new data is fetched, so lag builds up and then drops when the next batch is requested. This produces a sawtooth lag. For most use cases, this is imperceptible: the sawtooth amplitude is milliseconds to seconds. But there are use cases where the sawtooth can have an extremely large amplitude (minutes or more):
- extreme slowness on Pinot or the upstream stream provider. This one is rare.
- transactional publishes upstream. This one is more common. If you have a Flink app with a 1 minute checkpoint, the freshest data you see can be up to 1 minute old. And if you're not sampling quickly enough in the freshness check, you may miss the moment where the "latest" event was fresh enough.

All of that said, "minimum" freshness is really what we intended in the first place when making this feature. This issue was just exposed later.

We've run this internally at Stripe for over a year with no issues and no unexplained ingestion lag. I (and Claude) finally had time to reconcile these changes into an OSS PR.
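To make the sampling problem concrete, here is a minimal toy simulation. It is not Pinot code: the function names, the 200 ms base lag, the 10 second threshold, and the sampling schedule are all assumptions chosen for illustration. It models a 1 minute upstream commit interval and compares a check that only looks at the lag at each sampling instant with one that tracks the minimum lag seen across ingestion ticks.

```python
# Toy model only: every name and constant here is hypothetical, not Pinot's API.
CHECKPOINT_MS = 60_000   # assumed upstream transactional commit interval
BASE_LAG_MS = 200        # assumed transform + index overhead
THRESHOLD_MS = 10_000    # assumed freshness threshold


def sawtooth_lag_ms(now_ms: int) -> int:
    """Lag of the newest visible event at now_ms when data lands once per commit."""
    return BASE_LAG_MS + (now_ms % CHECKPOINT_MS)


def latest_check(sample_times_ms, threshold_ms: int) -> bool:
    """Pass only if the lag is within the threshold at some sampling instant."""
    return any(sawtooth_lag_ms(t) <= threshold_ms for t in sample_times_ms)


def minimum_check(ingest_times_ms, threshold_ms: int) -> bool:
    """Pass if the smallest lag seen at any ingestion tick is within the threshold."""
    min_lag = min(sawtooth_lag_ms(t) for t in ingest_times_ms)
    return min_lag <= threshold_ms


# The checker samples once a minute, always 30 s after a commit (worst case).
sample_times = range(30_000, 300_000, 60_000)
# The consumer itself observes lag far more often, here once per second.
ingest_times = range(0, 300_000, 1_000)

latest_ok = latest_check(sample_times, THRESHOLD_MS)
minimum_ok = minimum_check(ingest_times, THRESHOLD_MS)
print(f"latest: {latest_ok}, minimum: {minimum_ok}")
```

In this model the sampled check never sees the low point of the sawtooth, while the minimum-based check does, which mirrors the behavior described above.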
