Re: [Dnsmasq-discuss] Internal error in cache
On Fri, 24 Dec 2021 at 20:12, Simon Kelley <si...@thekelleys.org.uk> wrote:
>
> Reassurance that the bug is fixed for you too would be appreciated.
>
It looks like it's fixed now. In the past, it took ~12 hours to trigger the issue. It may be related to my configuration: 300 cache entries and an adblock list with 50k entries like 'address=/googleanalytics.com/'. When I run Steve Gibson's DNS benchmark utility, the issue is triggered immediately. The utility sends ~350 DNS queries to the local DNS server/resolver, and ~100 of them are expected to fail with NXDOMAIN.

Regards and Merry Christmas,
Hartmut
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] Internal error in cache
On 24/12/2021 09:24, Hartmut Birr wrote:
> It looks like this commit introduced the error:
>
> Commit: 1ce1c6beae9f683bec54cba4c0d375f85b209b95
> Caching cleanup. Use cached NXDOMAIN to answer queries of any type.
>
> These earlier commits are fine:
>
> Commit: 51d56df7a3a125e117b3278cab16281c85500287
> Add RFC 4833 DHCP options "posix-timezone" and "tzdb-timezone".
>
> Commit: cac9ca38f62437c65464f58fc54342c7f294c40b
> Treat ANY queries the same as CNAME queries WRT to DNSSEC on CNAME targets.
>
> Regards,
> Hartmut
>
Nice work finding that.

My hypothesis on this goes like this.

1) The "internal error" is triggered during cache insertion when the cache is full and a record has to be deleted. cache_scan_free() gets called with the contents of the least recently used record in the cache, and it deletes all instances of this (so, all A records of the correct name, or all records, or whatever).

2) Since there's at least one record which should have been deleted by this (the least recently used record that started the process), after this process there should be at least one free cache record, and the insertion can be retried and should succeed. If nothing gets deleted by cache_scan_free(), then there will again be no free records, and rather than going into an infinite loop, the internal error gets logged and the insertion is abandoned.

3) The commit you found changes the way NXDOMAIN records are stored. These used to be stored with a type: if a query for an A record returned NXDOMAIN, then a cache record would be stored with F_NXDOMAIN and F_IPV4 set in the flags. This is a historical anachronism. If the domain doesn't exist, it doesn't exist for all query types. The code therefore now stores a cache entry with only F_NXDOMAIN set, and that's good to answer a query of any type.
4) The problem is that cache_scan_free() fails to delete a cache record with only F_NXDOMAIN set, so if such a record falls to the end of the LRU list and then needs to be deleted, the deletion fails and the internal error is triggered.

Given the above, I found a way to reproduce the bug: start dnsmasq with a small cache, then make more queries with NXDOMAIN answers than the cache can hold. The (cache_size + 1)th query triggers the bug.

The fix is tiny, and it fixes the problem for me, at least with my method of reproduction. Please see
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=ea33a0130366d316f01be4c891e4f5b247f97171

Reassurance that the bug is fixed for you too would be appreciated.

Cheers, and Happy Christmas.

Simon.
Re: [Dnsmasq-discuss] Internal error in cache
It looks like this commit introduced the error:

Commit: 1ce1c6beae9f683bec54cba4c0d375f85b209b95
Caching cleanup. Use cached NXDOMAIN to answer queries of any type.

These earlier commits are fine:

Commit: 51d56df7a3a125e117b3278cab16281c85500287
Add RFC 4833 DHCP options "posix-timezone" and "tzdb-timezone".

Commit: cac9ca38f62437c65464f58fc54342c7f294c40b
Treat ANY queries the same as CNAME queries WRT to DNSSEC on CNAME targets.

Regards,
Hartmut
Re: [Dnsmasq-discuss] Internal error in cache
On 19.12.2021 at 14:49, Dominik Derigs wrote:
> Hey Hartmut,
>
>> I'm using dnsmasq on OpenWrt. Since updating dnsmasq from
>>
>> commit 51d56df7a3a125e117b3278cab16281c85500287
>> Add RFC 4833 DHCP options "posix-timezone" and "tzdb-timezone".
>>
>> to
>>
>> commit 4ac517e4ac19eca65910c145868914587ea46b3b
>> Fix coverity issues in dnssec.c
>>
>> I get the following error message:
>>
>> Sun Dec 19 12:22:25 2021 daemon.err dnsmasq[3321]: Internal error in cache.
>
> This is a somewhat concerning warning and points to a bug in the cache.
>
> I'm not very familiar with OpenWrt: can you pick any individual commit, or are you limited to specific ones?

I replace the version shipped with OpenWrt with the latest version from the dnsmasq repo. Sometimes I have to delete or adjust a few of the OpenWrt patches.

> In the former case, would you be willing to test a few more commits in between them? This would allow us to isolate the commit that introduced the error.

I will try to test the commits in between.

>> This occurs ~12 hours after booting the router.
>
> This suggests maybe a correlation with a domain that is requested early and has a TTL of 12 hours (entirely hypothetical at this point).
>
>> Currently I'm using this version:
>>
>> commit 1176cd58c90fc37bf98a6f774b26fc1adc8fd8e9
>> Fix regression in --rebind-domain-ok in 2.86
>
> Does it show the error? I guess the answer is yes, as this is the most recent commit.

Yes, it shows the error too.
Re: [Dnsmasq-discuss] Internal error in cache
Hey Hartmut,

> I'm using dnsmasq on OpenWrt. Since updating dnsmasq from
>
> commit 51d56df7a3a125e117b3278cab16281c85500287
> Add RFC 4833 DHCP options "posix-timezone" and "tzdb-timezone".
>
> to
>
> commit 4ac517e4ac19eca65910c145868914587ea46b3b
> Fix coverity issues in dnssec.c
>
> I get the following error message:
>
> Sun Dec 19 12:22:25 2021 daemon.err dnsmasq[3321]: Internal error in cache.

This is a somewhat concerning warning and points to a bug in the cache.

I'm not very familiar with OpenWrt: can you pick any individual commit, or are you limited to specific ones? In the former case, would you be willing to test a few more commits in between them? This would allow us to isolate the commit that introduced the error.

> This occurs ~12 hours after booting the router.

This suggests maybe a correlation with a domain that is requested early and has a TTL of 12 hours (entirely hypothetical at this point).

> Currently I'm using this version:
>
> commit 1176cd58c90fc37bf98a6f774b26fc1adc8fd8e9
> Fix regression in --rebind-domain-ok in 2.86

Does it show the error? I guess the answer is yes, as this is the most recent commit.

Best,
Dominik
[Dnsmasq-discuss] Internal error in cache
Hi,

I'm using dnsmasq on OpenWrt. Since updating dnsmasq from

commit 51d56df7a3a125e117b3278cab16281c85500287
Add RFC 4833 DHCP options "posix-timezone" and "tzdb-timezone".

to

commit 4ac517e4ac19eca65910c145868914587ea46b3b
Fix coverity issues in dnssec.c

I get the following error message:

Sun Dec 19 12:22:25 2021 daemon.err dnsmasq[3321]: Internal error in cache.

This occurs ~12 hours after booting the router.

Currently I'm using this version:

commit 1176cd58c90fc37bf98a6f774b26fc1adc8fd8e9
Fix regression in --rebind-domain-ok in 2.86

Any idea why this occurs?

Regards,
Hartmut