On 2026-01-13, Alisdair MacLeod <[email protected]> wrote:
>
>>
>>> I'd monitor this over time and see if the rise is sudden or gradual.
>>> And whether you can correlate with log entries. Is it happening after
>>> the LCP timeout? Is the LCP timeout happening after this has already
>>> risen?
>>>
>>> e.g.
>>>
>>> $ while true; do netstat -m | grep mbuf.2048; sleep 5; done | ts
>>>
>>> Capture of the output of 'systat mbuf' might also give a clue (it updates
>>> frequently, you could leave it running in ssh).
>>
>> I have set up some capture around this and I will report back once
>> it fails again and hopefully that will give some additional hints.
>
> I don’t think we need to wait until it fails over again, looking at the
> current rate that the number of mbufs is increasing it is roughly
> 50 per minute, this puts it on a collision course with the 2-3 day
> LCP timeouts. So perhaps the LCP timeout is actually being caused by
> the kernel running out of space for any more allocations.
>
> Is there some way to dig deeper into what is creating and not releasing
> these?

I don't know offhand. Perhaps someone else will have other ideas.

> # snapshot of recent netstat -m | grep mbuf.2048
> Jan 13 14:08:01 23123/23200 mbuf 2048 byte clusters in use (current/peak)
> Jan 13 14:09:02 23168/23256 mbuf 2048 byte clusters in use (current/peak)

ooh that's pretty fast - at least it's good news for trying to figure
out what might be triggering it!

two quick things I would try to see if behaviour changes:

- temporarily disabling wg
- sysctl net.inet.tcp.tso=0

you could also collect netstat -ss a minute apart and diff the two,
see if any unexpected counters are increasing at a similar rate
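
For the netstat -ss comparison, a plain diff -u of the two captures is
probably enough. If that turns out to be noisy, here is a rough sketch of
a helper that prints only the counters that moved, along with how much
they moved. The function name counter_deltas is made up for illustration.
It assumes each counter line starts with an integer followed by a
description, with unindented header lines such as "ip:" in between; lines
whose description itself contains a changing number (e.g. a byte count in
parentheses) will not match up between snapshots and are silently skipped.

```shell
#!/bin/sh
# counter_deltas BEFORE AFTER
# Hypothetical helper: print "+delta<TAB>section description" for every
# counter line whose leading number differs between two snapshot files.
counter_deltas() {
    awk '
        # Counter lines: first field is a plain integer.
        $1 ~ /^[0-9]+$/ {
            n = $1
            $1 = ""                      # rest of the line is the description
            key = section $0             # prefix with the current section header
            if (NR == FNR) {             # first file: remember the value
                before[key] = n
                next
            }
            # second file: report only counters seen before that changed
            if (key in before && n != before[key])
                printf "%+d\t%s\n", n - before[key], key
            next
        }
        # Anything else (e.g. "tcp:") is treated as a section header.
        { section = $0 }
    ' "$1" "$2"
}
```

Usage would be something like:

  netstat -ss > /tmp/ns.before; sleep 60; netstat -ss > /tmp/ns.after
  counter_deltas /tmp/ns.before /tmp/ns.after

Anything growing by roughly the same amount per minute as the mbuf
cluster count would be worth a closer look.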

