Re: rpz testing -> shut down hung fetch while resolving
>> I recently made an upgrade of BIND to version 9.18.11 on our
>> resolver cluster, following the recent announcement. Shortly
>> thereafter I received reports that the validation that lookups of
>> "known entries" in our quite small RPZ feed (it's around 1MB
>> on-disk) no longer succeeds as expected, but instead take a long
>> time, finally gives SERVFAIL to the client, and associated with
>> this we get this log message:
>>
>> Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving
>> 'known-rpz-entry.no/A'
>
> This usually means there's a circular dependency somewhere in the
> resolution or validation process. For example, we can't resolve a name
> without looking up the address of a name server, but that lookup can't
> succeed until the original name is resolved. The two lookups will wait on
> each other for ten seconds, and then the whole query times out and issues
> that log message.
>
> The log message is new in 9.18, but the 10-second delay and SERVFAIL
> response would probably have happened in earlier releases as well.

This turned out to be related to the fact that we had configured
query forwarding from two of our nodes to the other two, with the
intention of building a larger central cache and improving query
response time for the resolvers that did the forwarding. Once I
commented out the query forwarding, the problem no longer occurred.

Our forwarding config was of this form:

    forwarders { 128.39.x.y; 158.38.z.r; };
    // But if both are dead (unlikely), do resolution ourselves
    forward first;

This part is now commented out and I've done "rndc reconfig", and
the SERVFAIL responses for the "known rpz-blocked entries" no longer
occur.

But ... the two resolvers will now have to build a cache of their
own, and do not benefit from the cache built on the two more
"central" nodes.

Regards,

- Håvard
--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: rpz testing -> shut down hung fetch while resolving
On Thu, Jan 26, 2023 at 07:03:37PM +0100, Havard Eidnes via bind-users wrote:
> Hi,
>
> I recently made an upgrade of BIND to version 9.18.11 on our
> resolver cluster, following the recent announcement. Shortly
> thereafter I received reports that the validation that lookups of
> "known entries" in our quite small RPZ feed (it's around 1MB
> on-disk) no longer succeeds as expected, but instead take a long
> time, finally gives SERVFAIL to the client, and associated with
> this we get this log message:
>
> Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving
> 'known-rpz-entry.no/A'

This usually means there's a circular dependency somewhere in the
resolution or validation process. For example, we can't resolve a name
without looking up the address of a name server, but that lookup can't
succeed until the original name is resolved. The two lookups will wait on
each other for ten seconds, and then the whole query times out and issues
that log message.

The log message is new in 9.18, but the 10-second delay and SERVFAIL
response would probably have happened in earlier releases as well.

--
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.

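To make the "two lookups wait on each other" description concrete, here is a toy Python sketch of such a circular dependency. It is not how named works internally: per the explanation above, named simply lets the fetches wait and gives up after ten seconds. The dependency table, the names in it, and the find_cycle helper are all made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical table: each fetch maps to the fetches that must finish
# before it can complete. All names here are invented for illustration.
DEPENDS_ON: Dict[str, List[str]] = {
    # Resolving the name needs the address of its name server ...
    "www.example.test/A": ["ns1.example.test/A"],
    # ... but that address lookup cannot finish until the original does.
    "ns1.example.test/A": ["www.example.test/A"],
    # A fetch with no outstanding dependencies, for comparison.
    "other.example.test/A": [],
}


def find_cycle(start: str, deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the chain of fetches that loops back on itself, or None."""
    path: List[str] = []   # fetches currently being waited on, in order
    on_path = set()        # same fetches, for fast membership tests
    finished = set()       # fetches already shown to be loop-free

    def visit(name: str) -> Optional[List[str]]:
        if name in on_path:
            # We came back to a fetch we are already waiting on.
            return path[path.index(name):] + [name]
        if name in finished:
            return None
        on_path.add(name)
        path.append(name)
        for dep in deps.get(name, []):
            cycle = visit(dep)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(name)
        finished.add(name)
        return None

    return visit(start)


if __name__ == "__main__":
    for fetch in ("www.example.test/A", "other.example.test/A"):
        print(fetch, "->", find_cycle(fetch, DEPENDS_ON))
```

In this model the first fetch can never complete because it ends up waiting on itself, which is the situation that would end in the timeout and the "shut down hung fetch" message.
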
rpz testing -> shut down hung fetch while resolving
Hi,

I recently made an upgrade of BIND to version 9.18.11 on our
resolver cluster, following the recent announcement. Shortly
thereafter I received reports that our validation lookups of
"known entries" in our quite small RPZ feed (it's around 1MB
on disk) no longer succeed as expected. Instead they take a long
time and finally return SERVFAIL to the client, and along with
this we get the following log message:

Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving
'known-rpz-entry.no/A'

Initially I thought that this was new behaviour between BIND
9.18.10 and 9.18.11, but after downgrading to 9.18.10 on one of
the affected nodes, the problem is still observable there. Also,
only a subset of our 4 nodes exhibit this behaviour, even though
the unaffected ones are also running 9.18.11, which is quite strange.

None of the name servers are under severe strain by any measure:
one affected node sees around 200qps, another around 50qps at the
time of writing.

I want to ask if this sort of issue is already known (I briefly
searched the issues on ISC's gitlab and came up empty), and also
to ask if there is any particular sort of information I should
collect to narrow this down if it is a new issue.

Regards,

- Håvard