On 2026-01-13, Alisdair MacLeod <[email protected]> wrote:
>
>> 
>>> I'd monitor this over time and see if the rise is sudden or gradual.
>>> And whether you can correlate with log entries. Is it happening after
>>> the LCP timeout? Is the LCP timeout happening after this has already
>>> risen?
>>> 
>>> e.g.
>>> 
>>> $ while true; do netstat -m | grep mbuf.2048; sleep 5; done | ts
>>> 
>>> Capture of the output of 'systat mbuf' might also give a clue (it updates
>>> frequently, you could leave it running in ssh).
>> 
>> I have set up some capture around this and I will report back once
>> it fails again and hopefully that will give some additional hints.
>
> I don’t think we need to wait until it fails over again. The number of
> mbufs is currently increasing by roughly 50 per minute, which puts it on
> a collision course with the 2-3 day LCP timeouts. So perhaps the LCP
> timeout is actually being caused by the kernel running out of space for
> any more allocations.
>
> Is there some way to dig deeper into what is creating and not releasing
> these?

I don't know offhand. Perhaps someone else will have other ideas.

> # snapshot of recent netstat -m | grep mbuf.2048
> Jan 13 14:08:01 23123/23200 mbuf 2048 byte clusters in use (current/peak)
> Jan 13 14:09:02 23168/23256 mbuf 2048 byte clusters in use (current/peak)

ooh that's pretty fast - at least it's good news for trying to figure
out what might be triggering it!
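
For a quick number to compare before and after each change, the growth
rate can be worked out from the timestamped lines. This is only a rough
sketch: mbuf_rate is a made-up helper name, it assumes the "ts" line
format shown in the snapshot above (time in field 3, current/peak in
field 4), and it assumes the samples do not cross midnight.

```shell
# mbuf_rate: read timestamped "netstat -m | grep mbuf.2048" lines on stdin
# and print the average growth in clusters per minute between the first
# and last sample. Hypothetical helper, not part of any base tool.
mbuf_rate() {
    awk '
        {
            split($3, t, ":")                       # HH:MM:SS added by ts
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            split($4, c, "/")                       # current/peak
            if (NR == 1) { first_s = secs; first_c = c[1] }
            last_s = secs; last_c = c[1]
        }
        END {
            # Only report when at least two samples with different times were seen.
            if (last_s > first_s)
                printf "%.1f clusters/min\n", (last_c - first_c) * 60 / (last_s - first_s)
        }
    '
}
```

Fed the two lines quoted above, this gives 44.3 clusters/min (45 clusters
in 61 seconds), which is in line with the rough figure of 50 per minute.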

two quick things I would try to see if behaviour changes:

- temporarily disabling wg
- sysctl net.inet.tcp.tso=0

you could also collect netstat -ss output twice, a minute apart, and
diff the two to see if any unexpected counters are increasing at a
similar rate to the mbuf clusters
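
A plain diff of the two snapshots works, but if it helps, something like
this prints the size of each change so it can be compared with the mbuf
growth. It is an untested-on-your-box sketch: counter_deltas is a made-up
name, and it assumes the usual layout of a section header in column one
("tcp:") followed by indented "<count> <description>" lines. Sub-counters
with identical text under different parents would collide.

```shell
# counter_deltas BEFORE AFTER
# Print "delta<TAB>section counter-text" for every counter whose value
# differs between two saved "netstat -ss" snapshots. Hypothetical helper.
counter_deltas() {
    awk '
        # Section headers such as "tcp:" start in column one.
        /^[^ \t]/ { section = $1; next }
        # Counter lines look like "<whitespace>1234 some description".
        $1 ~ /^[0-9]+$/ {
            value = $1
            key = $0
            sub(/^[ \t]*[0-9]+[ \t]*/, "", key)    # drop the leading count
            gsub(/[0-9]+/, "N", key)               # mask numbers inside the text
            key = section " " key
            if (FILENAME == ARGV[1])
                before[key] = value                # first file: remember value
            else if (value != before[key])         # second file: report changes
                printf "%d\t%s\n", value - before[key], key
        }
    ' "$1" "$2"
}
```

e.g.

$ netstat -ss > /tmp/ss.1; sleep 60; netstat -ss > /tmp/ss.2
$ counter_deltas /tmp/ss.1 /tmp/ss.2 | sort -rn

Counters that only appear in the second snapshot are reported with their
full value, since -ss leaves out counters that are still zero.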

