Hi, > My thinking is that any partition could go stale if there are no records being produced into it.
1. Why would the value become stale if there are no new records? The lag should stay the same, no? > If enough of such partitions are present and are owned by a single MM task, > an OOM could happen. 2. We already have a dozen of metrics per partition [0]. Why do you think adding a few more would cause OutOfMemory errors? Each task should only emit metrics for partitions it owns. > Regarding the scenario where the TTL value is lower than the refresh interval > - I believe that this is an edge that we need to document and prevent > against, for example either failing to start on such a combination or > resorting to a default value that would satisfy the constraint and logging an > error. 3. Can you add the behavior you propose in the KIP? Thanks, Mickael 0: https://github.com/apache/kafka/blob/trunk/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceMetrics.java#L71-L106 On Wed, May 22, 2024 at 9:18 PM Elxan Eminov <elxanemino...@gmail.com> wrote: > > Hey Mickael, > Just checking to see if you have any thoughts on this. > thanks! > > On Thu, 11 Apr 2024 at 15:11, Elxan Eminov <elxanemino...@gmail.com> wrote: > > > Hi Mickael! > > Any thoughts on this? > > Thanks! > > > > On Wed, 3 Apr 2024 at 13:21, Elxan Eminov <elxanemino...@gmail.com> wrote: > > > >> Hi Mickael, > >> Thanks for your response and apologies for a huge delay in mine. > >> > >> My thinking is that any partition could go stale if there are no records > >> being produced into it. If enough of such partitions are present and are > >> owned by a single MM task, an OOM could happen. > >> > >> Regarding the scenario where the TTL value is lower than the refresh > >> interval - I believe that this is an edge that we need to document and > >> prevent against, for example either failing to start on such a combination > >> or resorting to a default value that would satisfy the constraint and > >> logging an error. > >> > >> Thanks, > >> Elkhan > >> > >> On Thu, 8 Feb 2024 at 14:17, Mickael Maison <mickael.mai...@gmail.com> > >> wrote: > >> > >>> Hi, > >>> > >>> Thanks for the updates. > >>> I'm wondering whether we really need the ttl eviction mechanism. The > >>> motivation is to "avoid storing stale LRO entries which can cause an > >>> eventual OOM error". How could it contain stake entries? I would > >>> expect its cache to only contain entries for partitions assigned to > >>> the task that owns it. Also what is the expected behavior if there's > >>> no available LRO in the cache? If we keep this mechanism what happens > >>> if its value is lower than > >>> replication.record.lag.metric.refresh.interval? > >>> > >>> Thanks, > >>> Mickael > >>> > >>> On Mon, Feb 5, 2024 at 5:23 PM Elxan Eminov <elxanemino...@gmail.com> > >>> wrote: > >>> > > >>> > Hi Mickael! > >>> > Any further thoughts on this? > >>> > > >>> > Thanks, > >>> > Elkhan > >>> > > >>> > On Thu, 18 Jan 2024 at 11:53, Mickael Maison <mickael.mai...@gmail.com > >>> > > >>> > wrote: > >>> > > >>> > > Hi Elxan, > >>> > > > >>> > > Thanks for the updates. > >>> > > > >>> > > We used dots to separate words in configuration names, so I think > >>> > > replication.offset.lag.metric.last-replicated-offset.ttl should be > >>> > > named replication.offset.lag.metric.last.replicated.offset.ttl > >>> > > instead. > >>> > > > >>> > > About the names of the metrics, fair enough if you prefer keeping the > >>> > > replication prefix. Out of the alternatives you mentioned, I think I > >>> > > prefer replication-record-lag. I think the metrics and configuration > >>> > > names should match too. Let's see what the others think about it. > >>> > > > >>> > > Thanks, > >>> > > Mickael > >>> > > > >>> > > On Mon, Jan 15, 2024 at 9:50 PM Elxan Eminov < > >>> elxanemino...@gmail.com> > >>> > > wrote: > >>> > > > > >>> > > > Apologies, forgot to reply on your last comment about the metric > >>> name. > >>> > > > I believe both replication-lag and record-lag are a little too > >>> abstract - > >>> > > > what do you think about either leaving it as > >>> replication-offset-lag or > >>> > > > renaming to replication-record-lag? > >>> > > > > >>> > > > Thanks > >>> > > > > >>> > > > On Wed, 10 Jan 2024 at 15:31, Mickael Maison < > >>> mickael.mai...@gmail.com> > >>> > > > wrote: > >>> > > > > >>> > > > > Hi Elxan, > >>> > > > > > >>> > > > > Thanks for the KIP, it looks like a useful addition. > >>> > > > > > >>> > > > > Can you add to the KIP the default value you propose for > >>> > > > > replication.lag.metric.refresh.interval? In MirrorMaker most > >>> interval > >>> > > > > configs can be set to -1 to disable them, will it be the case > >>> for this > >>> > > > > new feature or will this setting only accept positive values? > >>> > > > > I also wonder if replication-lag, or record-lag would be clearer > >>> names > >>> > > > > instead of replication-offset-lag, WDYT? > >>> > > > > > >>> > > > > Thanks, > >>> > > > > Mickael > >>> > > > > > >>> > > > > On Wed, Jan 3, 2024 at 6:15 PM Elxan Eminov < > >>> elxanemino...@gmail.com> > >>> > > > > wrote: > >>> > > > > > > >>> > > > > > Hi all, > >>> > > > > > Here is the vote thread: > >>> > > > > > > >>> https://lists.apache.org/thread/ftlnolcrh858dry89sjg06mdcdj9mrqv > >>> > > > > > > >>> > > > > > Cheers! > >>> > > > > > > >>> > > > > > On Wed, 27 Dec 2023 at 11:23, Elxan Eminov < > >>> elxanemino...@gmail.com> > >>> > > > > wrote: > >>> > > > > > > >>> > > > > > > Hi all, > >>> > > > > > > I've updated the KIP with the details we discussed in this > >>> thread. > >>> > > > > > > I'll call in a vote after the holidays if everything looks > >>> good. > >>> > > > > > > Thanks! > >>> > > > > > > > >>> > > > > > > On Sat, 26 Aug 2023 at 15:49, Elxan Eminov < > >>> > > elxanemino...@gmail.com> > >>> > > > > > > wrote: > >>> > > > > > > > >>> > > > > > >> Relatively minor change with a new metric for MM2 > >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >>> > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric > >>> > > > > > >> > >>> > > > > > > > >>> > > > > > >>> > > > >>> > >>