I upgraded our DNS servers when the 9.18.28 release came out, and today I ran
into a problem. I would like to know whether anyone else has seen it or has
any suggestions about how to debug it.
We have our DNS configured in a hidden primary configuration, where the primary
has internal and external views and serves an internal and an external copy of
one of our domains. The external version is fairly small, while the internal
version is significantly larger. We use the same DNSSEC keys to sign both
versions of the domain. Every once in a while, we have encountered an issue
where the unsigned and signed versions of the domain get out of sync, which
causes this message to appear in our logs (note that I have modified all of the
following log entries to replace our domain with example.org):
25-Jul-2024 10:12:32.202 general: error: zone example.org/IN/internal (signed):
receive_secure_serial: not exact
The solution I’ve always been able to follow previously is to comment out the
DNSSEC config options in named.conf, restart named with the zone unsigned,
retransfer the unsigned zone to our secondaries, and then put back the DNSSEC
config options, restart named, and let it re-sign the zone. It takes a little
while, but normally everything has returned to a working state afterwards.
Today, however, when I tried to do that, it started to sign the zone – and then
named just hung. It stopped updating any of the log files, stopped sending any
notifies, and stopped returning DNS data of any sort. When I tried to restart
named via systemctl it had to kill the process because named would not respond.
I was able to undo the DNSSEC changes, restart named, and it continued to
work. I tried it again, and named hung once again in the middle of signing the
zone. Throughout all of these restarts, the signed version of the external
zone continued to work normally.
This is frustrating because when named hangs, there are no error messages in
the logs that I can see, and no indication of why it has failed. If I try
running rndc commands locally I get this error:
rndc: recv failed: timed out
Remote servers showed a timeout, and then I saw this in some of their transfer
logs:
25-Jul-2024 10:27:01.827 general: info: zone example.org/IN: refresh: skipping
zone transfer as primary A.B.C.D#53 (source E.F.G.H#0) is unreachable (cached)
I was able to solve that one by sending notifies from the primary after
restarting it without DNSSEC, but I really need to get DNSSEC working again.
The configuration for the zone in named.conf is (and yes, I know I need to
update to dnssec-policy):
view "internal" {
    ...
    zone "example.org" {
        type primary;
        file "/path/to/internal/example.org";
        key-directory "/path/to/keys";
        auto-dnssec maintain;
        inline-signing yes;
    };
    ...
};
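As a rough sketch, the temporary unsigned state used in the workaround
described above would look something like this. Exactly which lines get
commented out is an assumption here: this version disables only auto-dnssec
and inline-signing and leaves key-directory in place.

    zone "example.org" {
        type primary;
        file "/path/to/internal/example.org";
        key-directory "/path/to/keys";
        // auto-dnssec maintain;   // disabled while the zone is served unsigned
        // inline-signing yes;     // restored once the secondaries are back in sync
    };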
Does anyone have any suggestions for putting named into a debug mode to try to
get more data if it hangs again? I was thinking of turning the DNSSEC options
back on but setting “notify no” so it doesn’t try to notify the secondaries, in
case all of the notifies and zone transfers going on while it was signing were
part of the problem.
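A minimal sketch of that idea, assuming the option is set at the zone level
rather than for the whole view or server:

    zone "example.org" {
        type primary;
        file "/path/to/internal/example.org";
        key-directory "/path/to/keys";
        auto-dnssec maintain;
        inline-signing yes;
        notify no;    // temporarily suppress NOTIFY messages during re-signing
    };

This only stops the outgoing notifies. The secondaries would still poll on
their SOA refresh timers, so some zone transfers could still happen while the
zone is being signed.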
The memory and CPU resources of the system should be sufficient – it has 2
virtual CPUs and 8 GB of memory, it is not close to using up the memory, and
since it doesn’t have clients, the CPU has never been an issue before. I tried
replicating this issue on our test server but it managed to sign the zone with
no problems – though it doesn’t have as many clients.
I don’t think the new max-records-per-type or max-types-per-name options are
involved as we don’t have any cases where we have that many records with the
same name.
Thanks,
Brian
--
Brian Sebby (he/him/his) | Lead Systems Engineer
Email: [email protected] | Information Technology Infrastructure
Phone: +1 630.252.9935 | Business Information Services
Cell: +1 630.921.4305 | Argonne National Laboratory