Re: Bug - remote DNS monitoring

2022-09-13 Thread Casey Deccio


> On Aug 30, 2022, at 1:12 PM, Casey Deccio  wrote:
> 
> I am having trouble tracking down a bug in my monitoring setup.  It all 
> started when I upgraded the monitored host (Host B in my example below) to 
> bullseye.  Note that Host A is also running bullseye, but the problem didn't 
> show itself until Host B was upgraded.

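With some help over at the bind-users mailing list [1], I discovered that 
nrpe-ng closes stdin when launching the command [2], and that the new version 
of nslookup in bullseye (invoked by check_dns) fails when stdin is closed: it 
trips a check in libuv's src/unix/core.c [3], which matches the 
"./src/unix/core.c:570" error in my strace output.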
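To make the mechanism concrete, here is a minimal Python sketch (not nrpe-ng or libuv code). POSIX open() returns the lowest free descriptor, so in a child started with fd 0 closed, the next file it opens lands on fd 0. A library that assumes descriptors 0-2 are always stdio, as libuv appears to in [3], can then trip over that. The names `first_opened_fd` and `_close_stdin` are hypothetical, and closing fd 0 in `preexec_fn` is only an approximation of how nrpe-ng launches commands.

```python
import os
import subprocess
import sys

# Child program: open a file and print the descriptor number it received.
CHILD = "import os; print(os.open(os.devnull, os.O_RDONLY))"


def _close_stdin() -> None:
    # Runs in the child between fork() and exec(). Hypothetical stand-in
    # for a launcher that leaves the command with fd 0 closed.
    try:
        os.close(0)
    except OSError:
        pass  # fd 0 was already closed


def first_opened_fd(stdin_closed: bool) -> int:
    """Return the fd number the child gets from its first open() call."""
    if stdin_closed:
        extra = {"preexec_fn": _close_stdin}
    else:
        extra = {"stdin": subprocess.DEVNULL}  # same idea as the fix below
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
        text=True,
        **extra,
    )
    return int(proc.stdout)


if __name__ == "__main__":
    print("stdin closed:  first open() ->", first_opened_fd(True))
    print("stdin=DEVNULL: first open() ->", first_opened_fd(False))
```

On a typical Linux system the first call should report fd 0, and the second a descriptor of 3 or higher.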

Redirecting stdin to /dev/null fixes the issue:

$ diff -u /usr/lib/python3/dist-packages/nrpe_ng/commands.py{.old,}
--- /usr/lib/python3/dist-packages/nrpe_ng/commands.py.old  2017-08-08 13:05:02.0 -0600
+++ /usr/lib/python3/dist-packages/nrpe_ng/commands.py  2022-09-13 17:00:36.767239885 -0600
@@ -85,6 +85,7 @@

 proc = tornado.process.Subprocess(
     run_args,
+    stdin=subprocess.DEVNULL,
     stdout=tornado.process.Subprocess.STREAM,
     close_fds=True,
     env=env)
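tornado.process.Subprocess passes its keyword arguments through to subprocess.Popen, so the one-line change should behave as it would in plain Python, assuming commands.py already imports subprocess (the diff does not show the imports). The sketch below swaps Tornado for the standard library's asyncio to show the same pattern in an async launcher; `run_check` is a hypothetical name, not nrpe-ng's API.

```python
import asyncio
import sys
from typing import Dict, Optional, Sequence, Tuple


async def run_check(
    run_args: Sequence[str], env: Optional[Dict[str, str]] = None
) -> Tuple[int, str]:
    """Run a check command and return (exit status, stdout).

    Hypothetical asyncio analogue of nrpe-ng's Tornado-based launcher.
    """
    proc = await asyncio.create_subprocess_exec(
        *run_args,
        stdin=asyncio.subprocess.DEVNULL,  # fd 0 is /dev/null, never closed
        stdout=asyncio.subprocess.PIPE,
        close_fds=True,
        env=env,  # None means "inherit the parent's environment"
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()


if __name__ == "__main__":
    # A child that reads stdin sees immediate EOF instead of a closed fd.
    probe = [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
    status, out = asyncio.run(run_check(probe))
    print(status, out.strip())  # -> 0 ''
```

With stdin on /dev/null, any read returns EOF immediately, and any file the check opens gets a descriptor above 2.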

I've filed a bug report [4].

Thanks,
Casey

[1] https://lists.isc.org/pipermail/bind-users/2022-September/10.html
[2] https://github.com/bootc/nrpe-ng/blob/master/nrpe_ng/commands.py#L86
[3] https://github.com/libuv/libuv/blob/v1.x/src/unix/core.c#L602
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019718


Re: Bug - remote DNS monitoring

2022-08-30 Thread Casey Deccio

> On Aug 30, 2022, at 1:40 PM, Nicholas Geovanis  wrote:
> 
> When you run check_dns by hand on Host B, you don't say who you are logged in 
> as. That can make a difference. Nagios runs its scripts in a known 
> environment which may be different from what you expect.
> 


Thanks for the question.  I have run the check_dns script with:

 - an arbitrary, unprivileged user
 - the nagios user (also unprivileged)
 - root (gasp!)

They all work fine.  Also, in every case I ran tcpdump, and I can see the DNS 
queries and responses going back and forth as expected.  The strace output 
likewise shows that the DNS messages were written and read properly.  I think 
the issue is in nslookup, some time *after* the send/recv, but I can't narrow 
it down much more than that.

Casey

Re: Bug - remote DNS monitoring

2022-08-30 Thread Nicholas Geovanis
On Tue, Aug 30, 2022, 2:13 PM Casey Deccio  wrote:

> Hi all,
>
> I am having trouble tracking down a bug in my monitoring setup.  It all
> started when I upgraded the monitored host (Host B in my example below) to
> bullseye.  Note that Host A is also running bullseye, but the problem
> didn't show itself until Host B was upgraded.
>
> Here is the setup:
>
> Host A (monitoring):
> Installed: nagios4, nrpe-ng
> IP address: 192.0.2.1
>
> Host B (monitored):
> Installed: nrpe-ng, monitoring-plugins-standard, bind9-dnsutils
> IP address: 192.0.2.2
>
> Host C (monitored through host B):
> Installed: bind9
> IP address: 192.0.2.3
> Configured to answer authoritatively for example.com on port 53.
>
>           nrpe
>        over HTTPS          DNS
> Host A ----------> Host B ----> Host C
>

When you run check_dns by hand on Host B, you don't say who you are
logged in as. That can make a difference. Nagios runs its scripts in a
known environment which may be different from what you expect.



Bug - remote DNS monitoring

2022-08-30 Thread Casey Deccio
Hi all,

I am having trouble tracking down a bug in my monitoring setup.  It all 
started when I upgraded the monitored host (Host B in my example below) to 
bullseye.  Note that Host A is also running bullseye, but the problem didn't 
show itself until Host B was upgraded.

Here is the setup:

Host A (monitoring):
Installed: nagios4, nrpe-ng
IP address: 192.0.2.1

Host B (monitored):
Installed: nrpe-ng, monitoring-plugins-standard, bind9-dnsutils
IP address: 192.0.2.2

Host C (monitored through host B):
Installed: bind9
IP address: 192.0.2.3
Configured to answer authoritatively for example.com on port 53.

          nrpe
       over HTTPS          DNS
Host A ----------> Host B ----> Host C

On Host B, I run the following:
sudo /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config /etc/nagios/nrpe-ng.cfg

While that is running, I run the following on Host A:
/usr/lib/nagios/plugins/check_nrpe_ng -H 192.0.2.2 -c check_dns -a example.com 192.0.2.3 0.1 1.0

The result of running the command on Host A is:
DNS CRITICAL - '/usr/bin/nslookup -sil' msg parsing exited with no address

On Host B, I see the following debug output:
200 POST /v1/check/check_dns (192.0.2.1) 78.05ms
Executing: /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0

When I run this exact command on Host B, I get:
$ /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
DNS OK: 0.070 seconds response time. example.com returns 192.0.2.10,2001:db8::10|time=0.069825s;0.10;1.00;0.00
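The part after the `|` is check_dns's performance data, which per the Nagios plugin guidelines has the form `'label'=value[UOM];warn;crit;min;max`. Here it echoes the `-w 0.1 -c 1.0` thresholds, so 0.0698 s is under the 0.1 s warning level and the result is OK. A minimal Python parser for illustration; `parse_perfdata` and `PerfDatum` are hypothetical names, and the sketch handles only plain numeric thresholds (not range syntax such as `10:20`) and labels without spaces.

```python
import re
from typing import List, NamedTuple, Optional


class PerfDatum(NamedTuple):
    label: str
    value: float
    uom: str
    warn: Optional[float]
    crit: Optional[float]
    min: Optional[float]
    max: Optional[float]


# 'label'=value[UOM]  (quotes optional; label must not contain spaces here)
_HEAD = re.compile(r"'?(?P<label>[^'=]+)'?=(?P<value>-?[0-9.]+)(?P<uom>[A-Za-z%]*)")


def _num(field: str) -> Optional[float]:
    # Empty fields are allowed and mean "not set".
    return float(field) if field else None


def parse_perfdata(output: str) -> List[PerfDatum]:
    """Parse the perfdata after the first '|' in a plugin's output line."""
    _, sep, perf = output.partition("|")
    if not sep:
        return []
    data = []
    for item in perf.split():
        head, *fields = item.split(";")
        m = _HEAD.fullmatch(head)
        if m is None:
            continue  # skip items this simplified parser does not understand
        # Pad to the four optional fields: warn, crit, min, max.
        fields = (fields + [""] * 4)[:4]
        data.append(PerfDatum(m["label"], float(m["value"]), m["uom"],
                              *(_num(f) for f in fields)))
    return data


if __name__ == "__main__":
    line = ("DNS OK: 0.070 seconds response time. example.com returns "
            "192.0.2.10,2001:db8::10|time=0.069825s;0.10;1.00;0.00")
    (t,) = parse_perfdata(line)
    print(t.value < t.warn)  # -> True: below the 0.1 s warning threshold
```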

Looks good!  When I run nslookup (run by check_dns), it looks good too:
$ /usr/bin/nslookup -sil example.com 192.0.2.3
Server:         192.0.2.3
Address:        192.0.2.3#53

Name:   example.com
Address: 192.0.2.10
Name:   example.com
Address: 2001:db8::10

After rerunning nrpe-ng with strace -f, I see the following:

[pid 1183842] write(2, "nslookup: ./src/unix/core.c:570:"..., 83) = 83
...
[pid 1183841] read(4, "nslookup: ./src/unix/core.c:570:"..., 4096) = 83

So it appears that the nslookup process is writing an error message to stderr 
(fd 2).  But I cannot reproduce it outside of nrpe-ng.

Any suggestions?

Casey