On Fri, Apr 17, 2026 at 9:02 AM Ricardo Robaina <[email protected]> wrote:
> On Thu, Apr 16, 2026 at 5:58 PM Paul Moore <[email protected]> wrote:
> > On Thu, Apr 16, 2026 at 4:51 PM Paul Moore <[email protected]> wrote:
> > > On Thu, Apr 16, 2026 at 4:33 PM Steve Grubb <[email protected]> wrote:
> > > > On Wednesday, April 15, 2026 11:21:52 AM Eastern Daylight Time Paul Moore wrote:
> > > > > On Wed, Apr 15, 2026 at 11:19 AM Paul Moore <[email protected]> wrote:
> > > > > > On Tue, Apr 14, 2026 at 11:45 PM Steve Grubb <[email protected]> wrote:
> > > > > > > On Friday, April 10, 2026 5:34:08 PM Eastern Daylight Time Paul Moore wrote:
> > > > > > > > On Mon, Mar 23, 2026 at 11:07 AM Ricardo Robaina <[email protected]> wrote:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > ... compliance-driven systems that must use a finite backlog
> > > > > > > limit for memory safety but cannot tolerate dropped events ...
> > > > > >
> > > > > > You must pick one of those two requirements, or at the very least
> > > > > > prioritize them; it is simply impossible to both limit the backlog
> > > > > > queue and require zero dropped events.
> > > > >
> > > > > To be perfectly honest, it's also impossible to require zero dropped
> > > > > events. Even in the most extreme configurations where the admin
> > > > > decides to panic the system, that only happens once the system reaches
> > > > > the point where it is dropping events. We try *really* hard to not
> > > > > drop events, but it is always going to be a possibility.
> > > >
> > > > You're helping make the point. Those administrators have decided
> > > > reliable auditing is more important than system availability.
> > >
> > > Users prioritizing reliable auditing over system availability should
> > > not run with a backlog limit. It's that simple.
> >
> > To clarify this further, even on systems without a backlog limit and a
> > panic-on-loss configuration, there is still a possibility that the
> > system could lose an event when it hits the edge before it panics. A
> > maximum backlog stat won't help here. Even if you had a way to
> > capture the backlog size before the system took itself out, the size
> > is flirting with the maximum resource limits of the system, it would
> > be silly to use that as a configured backlog limit, you would still
> > want to leave the limit at 0/disabled.
> >
> > > Regardless, I'm still not convinced this maximum backlog stat alone
> > > will solve any meaningful problems. If your audit log is predictable
> > > enough that this metric has value, it should be possible to either
> > > capture the backlog size during periods of high audit load or simply
> > > run the system through that load and verify it doesn't crash and go to
> > > hell. If your audit log isn't predictable, capturing a maximum
> > > backlog size doesn't really mean anything since it is still a snapshot
> > > of one instance of the system and there is always the possibility of
> > > the system exceeding it.
> >
> > --
> > paul-moore.com
>
> Hi Paul,
>
> Thanks for reviewing the patch and giving your perspective on it.
>
> I understand your point that if a system truly prioritizes auditing
> over everything else, it shouldn't run with a limit. However, in
> practice, there is a middle ground where compliance frameworks or
> internal infrastructure policies require a finite backlog limit to
> ensure memory safety, while still demanding reliable auditing.

It is important that those users understand they are believing a lie
if they think one can demand reliable auditing with a finite backlog
limit.

> I'd like to ask what specific metric or combination of metrics you
> would be willing to consider? You mentioned average queue length
> earlier, and Steve suggested combining the max depth with a
> backlog_lost_since_reset counter. I'm happy to work on a v2 that
> addresses your concerns while still delivering the metrics audit users
> currently lack.

My suggestion would be to put forth a proposal explaining the problems
you want to solve and what metrics you believe are important towards
solving those problems. I agree that the current list of audit
metrics is rather sparse, but as we've seen here, I don't think we
yet have agreement on what metrics would be useful. My hope is that
having a discussion on the metrics first could avoid false starts as
we've seen here.

--
paul-moore.com

