On 2022/11/01 16:03, Ryan Freeman wrote:
> On Tue, Nov 01, 2022 at 11:04:03AM +0100, Martijn van Duren wrote:
> > On Mon, 2022-10-31 at 20:14 -0700, Ryan Freeman wrote:
> > > 
> > > I can confirm the snmpd process is no longer disappearing with this
> > > patch.  Almost 24 hours on one VM and 16 hours on another. Thanks!
> > > 
> > > -Ryan
> > 
> > To be complete, what happens is the following:
> > - snmpd sends a getnext request to the backend on a scalar
> > - libagentx increments the current OID to the OID of the table
> >   column following the scalar, which contains no elements and after
> >   reaching the last column it has reached the end OID of the original
> >   request, resulting in an endOfMibView, but forgets to reset the OID to
> >   the original start OID (as per RFC3416 section 4.2.2)
> > - snmpd validates the output from the backend and sees that the
> >   OID of the EOMV doesn't match the requested OID and decides that
> >   it doesn't trust the backend anymore; it is then closed with a
> >   "too many parse errors" notification.
> > - Upon the closing of the agentx socket the backend shuts itself
> >   down:
> >   1) It gets its fd from snmpd and it doesn't know where to connect
> >      to.
> >   2) We don't want lingering processes if snmpd itself goes away
> > - Once a backend disappears snmpd shuts itself down. Basically for
> >   "if you fail, fail loud" reasons.
> > 
> > Note that this only goes for backends under libexec/snmpd, not for
> > backends that connect over the agentx listener, like vmd or relayd.
> > 
> > So there's no crash, just a backend that's being kicked for returning a
> > non-compliant varbind, which escalated to a premature exit. I also don't
> > expect too many people will actually hit this, because it's quite a
> > specific set of circumstances: I've had to set up an instance under kvm
> > and disable viomb(4) to get an empty sensors table, although there might
> > be other ways to trigger this.
> 
> Ah, there it is.  Our KVM platform is Proxmox, and we go out of our way
> to untick the 'memory ballooning' option every time we make a VM.  Up to
> now, I've been wondering how we managed to have such a unique setup.
> 
> I will probably keep a local build of libagentx for the duration of the
> 7.2 lifetime and fan that out, in lieu of turning on memory ballooning
> just to get a sensor to exist.  I'll also keep some instances running
> -current in our LibreNMS to help catch this sort of thing before the next
> release.
> 
> Thanks for the detailed explanation, and thanks again for the work to
> figure out cause+solution.
> -Ryan

btw I have VMs under both ESXi and Proxmox which run snmpd, and have
LibreNMS pointed at them, but I left that setting at default.
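
For anyone following along, the getnext sequence Martijn describes in the
quoted message can be sketched roughly as below. This is a toy model in
Python, not the libagentx or snmpd code: the OIDs, the registry layout and
the names backend_getnext, master_accepts and reset_oid are all made up
for illustration. It only shows why an endOfMibView carrying the
walked-to OID instead of the requested one gets rejected.

```python
# Toy model only: not the actual libagentx/snmpd implementation.
END_OF_MIB_VIEW = object()  # sentinel standing in for the endOfMibView value


def backend_getnext(registry, start, end, reset_oid=True):
    """Return (oid, value) for the first instance with start < oid < end.

    registry is a list of (object_oid, instances) sorted by object_oid;
    instances maps an index tuple to a value and is empty for a table
    column that has no rows. reset_oid=False mimics the described bug.
    """
    current = start
    for oid, instances in registry:
        if oid >= end:
            break
        for index in sorted(instances):
            candidate = oid + index
            if start < candidate < end:
                return candidate, instances[index]
        # Nothing usable under this object: advance to the next one.
        current = max(current, oid)
    # No successor inside the range. RFC 3416 section 4.2.2 wants the
    # name from the request here, not the OID we walked to.
    return (start if reset_oid else current), END_OF_MIB_VIEW


def master_accepts(requested, reply):
    """Hypothetical check in the style of snmpd's validation of a reply."""
    name, value = reply
    if value is END_OF_MIB_VIEW:
        return name == requested
    return name > requested


# Hypothetical layout: one scalar followed by a table with two empty columns.
registry = [
    ((1, 1), {(0,): "scalar"}),
    ((1, 2, 1, 1), {}),
    ((1, 2, 1, 2), {}),
]
start, end = (1, 1, 0), (1, 3)

fixed = backend_getnext(registry, start, end)
buggy = backend_getnext(registry, start, end, reset_oid=False)
print(master_accepts(start, fixed), master_accepts(start, buggy))
```

With the OID reset the reply names the requested OID and passes the
check; without it the reply names the last empty column and fails, which
in the real daemon is what gets the backend closed.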
