On 2022/11/01 16:03, Ryan Freeman wrote:
> On Tue, Nov 01, 2022 at 11:04:03AM +0100, Martijn van Duren wrote:
> > On Mon, 2022-10-31 at 20:14 -0700, Ryan Freeman wrote:
> > >
> > > I can confirm the snmpd process is no longer disappearing with this
> > > patch. Almost 24 hours on one VM and 16 hours on another. Thanks!
> > >
> > > -Ryan
> >
> > To be complete, what happens is the following:
> > - snmpd sends a getnext request to the backend on a scalar
> > - libagentx increments the current OID to the OID of the table
> >   column following the scalar, which contains no elements and after
> >   reaching the last column it has reached the end OID of the original
> >   request, resulting in an endOfMibView, but forgets to reset the OID to
> >   the original start OID (as per RFC3416 section 4.2.2)
> > - snmpd validates the output from the backend and sees that the
> >   OID of the EOMV doesn't match the requested OID and decides that
> >   it doesn't trust the backend anymore; it is then closed with a
> >   "too many parse errors" notification.
> > - Upon the closing of the agentx socket the backend shuts itself
> >   down:
> >   1) It gets its fd from snmpd and it doesn't know where to connect
> >      to.
> >   2) We don't want lingering processes if snmpd itself goes away
> > - Once a backend disappears snmpd shuts itself down. Basically for
> >   the fail, fail loud reasons.
> >
> > Note that this only goes for backends under libexec/snmpd, not for
> > backends that connect over the agentx listener, like vmd or relayd.
> >
> > So there's no crash, just a backend that's being kicked for returning a
> > non-compliant varbind, which escalated to a premature exit. I also don't
> > expect too many people will actually hit this, because it's quite a
> > specific set of circumstances: I've had to set up an instance under kvm
> > and disable viomb(4) to get an empty sensors table, although there might
> > be other ways to trigger this.
>
> Ah, there it is. Our KVM platform is Proxmox, and we go out of our way
> to untick the 'memory ballooning' option every time we make a VM. Up to
> now, I've been wondering how we managed to have such a unique setup.
>
> I will probably keep a local build of libagentx for the duration of the
> 7.2 lifetime and fan that out, in lieu of turning on memory ballooning
> just to get a sensor to exist. Also keep some instances running -current
> in our LibreNMS to help catch this sort of thing before next release.
>
> Thanks for the detailed explanation, and thanks again for the work to
> figure out cause+solution.
> -Ryan

BTW, I have VMs under both ESXi and Proxmox that run snmpd, with LibreNMS
pointed at them, but I left the memory ballooning setting at its default.
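
For anyone following along, the sequence Martijn describes can be modelled
roughly like this. It is only a toy Python sketch, not the actual libagentx
or snmpd code: the OIDs, the NODES layout, and the names getnext(),
master_accepts() and reset_on_eomv are all made up for illustration, and the
master-side check is my assumption of what "validates the output" amounts to.

```python
# Toy model of the getnext/endOfMibView exchange; not libagentx or snmpd code.
# All names and OIDs here are hypothetical.

END_OF_MIB_VIEW = "endOfMibView"

# Backend objects in OID order: (object OID, {instance suffix: value}).
NODES = [
    ((1, 1), {(0,): 5}),   # a scalar with its single .0 instance
    ((1, 2, 1, 1), {}),    # table column with no rows
    ((1, 2, 1, 2), {}),    # last table column, also empty
]


def getnext(nodes, start, end, reset_on_eomv):
    """Return (name, value) of the first instance with start < name < end."""
    current = start
    for oid, instances in nodes:
        if oid >= end:
            break
        if oid > start:
            current = oid  # the search position moves on to this object
        for suffix in sorted(instances):
            name = oid + suffix
            if start < name < end:
                return name, instances[suffix]
    # Nothing found in range. RFC 3416 section 4.2.2 wants the requested
    # name back; reset_on_eomv=False models forgetting that reset.
    return (start if reset_on_eomv else current), END_OF_MIB_VIEW


def master_accepts(requested, name, value):
    """Master-side sanity check, as assumed for this sketch."""
    if value == END_OF_MIB_VIEW:
        return name == requested  # EOMV must carry the requested OID
    return name > requested       # a real answer must sort after the request
```

With a getnext on the scalar instance (1, 1, 0) and an end OID of (1, 3),
reset_on_eomv=False hands back the endOfMibView under the last empty column's
OID, which master_accepts() rejects; with reset_on_eomv=True the name is the
requested OID again and the reply is accepted.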