Cached negative responses
Hi!

I am investigating the memory usage of a BIND 9.9 instance. Yes, I know I should update to 9.11 or 9.12, but the situation there will probably not be much different.

When checking the BIND XML statistics, I see the following in the "Cache DB RRsets" section, which I think could be the reason for the high memory usage:

  ! 18446744073709551559
  !A6 18446744073709551607

Is this an overflow of the counters for these negative responses in the cache, or could there really be that many objects in the cache? max-ncache-ttl is not set, so it should be at the default of 3 hours, but I don't really see those numbers decreasing.

max-cache-size is set, but I see the named process easily exceeding that limit, and not just by a bit: sometimes it doubles the limit or even more.

How can I better find out what is currently using most of the memory allocated by BIND, instead of using trial and error on some settings to see if that helps?

Regards
Marc

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
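As a quick check on the overflow question in the message above, the two reported values can be reinterpreted as signed 64-bit integers. This is only an arithmetic sketch: the names in `COUNTERS` are copied from the message as they appear there, and the helper `as_signed_64` is introduced here for illustration. It says nothing about how BIND maintains these counters internally.

```python
# Reinterpret the two reported counter values as signed 64-bit integers.
# The keys are the (possibly truncated) names exactly as shown in the message.
COUNTERS = {
    "!": 18446744073709551559,
    "!A6": 18446744073709551607,
}


def as_signed_64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a two's-complement signed value."""
    return value - 2**64 if value >= 2**63 else value


for name, raw in COUNTERS.items():
    print(f"{name}: unsigned={raw} signed={as_signed_64(raw)}")
```

Both values sit just below 2^64 and come out as small negative numbers (-57 and -9) when read as signed. That is what a counter decremented below zero would look like, rather than a real count of cached objects.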
BIND 9.11.4 crashing with SIGBUS error shortly after starting
Hi,

while trying to update from 9.11.3 to 9.11.4 I ran into a strange issue. Shortly after starting, BIND crashes with a SIGBUS error. The debugger shows me the following:

  > ::status
  debugging core file of named (64-bit)
  file: named
  threading model: native threads
  status: process terminated by SIGBUS (Bus Error), addr=7b2ff92f
  > $c
  named`process_cmsg+0x174(11250fda0, 7b2ffa78, 112511da0, 100819118, 3c, 100870440)
  named`doio_recv+0x958(11250fda0, 112511da0, 8000, 100819118, 3c, 100870440)
  named`internal_recv+0x50c(106bf8f90, 11250fe58, 11250fe58, 1006e77e0, 7c005240, ff00)
  named`dispatch+0xb50(100892700, 0, 0, 7c005240, 7c005240, ff00)
  named`run+0x18(100892700, 0, 0, 1006b64b0, 0, 1)
  libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)

Checking with truss on the thread that seems to be causing the issue shows:

  20325/7:connect(531, 0x10886C3F8, 32, SOV_XPG4_2) = 0
  20325/7:sendmsg(531, 0x7AAFEB50, 32768) = 40
  20325/7:lwp_park(0x, 0) = 0
  20325/7:lwp_park(0x, 0) = 0
  20325/7:lwp_park(0x, 0) = 0
  20325/7:recvmsg(529, 0x7AAFFA78, 32768) = 28
  20325/7:Incurred fault #5, FLTACCESS %pc = 0x1006DC384
  20325/7: siginfo: SIGBUS BUS_ADRALN addr=0x7AAFF92F
  20325/7:Received signal #10, SIGBUS [default]
  20325/7: siginfo: SIGBUS BUS_ADRALN addr=0x7AAFF92F

BIND has been compiled as follows (exact compile flags removed):

  # named -V
  BIND 9.11.4-P1 (Extended Support Version)
  running on SunOS sun4v 5.11 11.3
  compiled by Solaris Studio 5150
  compiled with OpenSSL version: OpenSSL 1.1.0h 27 Mar 2018
  linked to OpenSSL version: OpenSSL 1.1.0h 27 Mar 2018
  compiled with libxml2 version: 2.9.5
  linked to libxml2 version: 20905
  compiled with libjson-c version: 0.12
  linked to libjson-c version: 0.12
  compiled with zlib version: 1.2.11
  linked to zlib version: 1.2.11
  threads support is enabled

The BIND logs don't give any indication of why it is crashing, not even when running it at debug level 3.

The environment is:
+ OS: Solaris 11.3 SRU 32
+ Compiler: Oracle DeveloperStudio12.6
+ BIND running via chroot

As mentioned initially, this worked fine up to and including BIND 9.11.3. But when using the same build environment and the same build flags to build 9.11.4, I started getting these issues.

Could someone give me a clue whether this is more likely to be an issue with my environment or with the code?

Regards
Marc
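The siginfo code in the truss output above, BUS_ADRALN, denotes an invalid address alignment, and SPARC generally requires multi-byte loads and stores to be naturally aligned. The sketch below only checks the reported fault address against common access sizes. The helper `is_aligned` is introduced here for illustration, and the check does not identify which access inside `process_cmsg` actually faulted.

```python
# Fault address as reported by truss in the message above.
FAULT_ADDR = 0x7AAFF92F


def is_aligned(addr: int, size: int) -> bool:
    """Return True if addr is a multiple of size (natural alignment)."""
    return addr % size == 0


# Check the address against 2-, 4- and 8-byte access sizes.
for size in (2, 4, 8):
    print(f"{size}-byte aligned: {is_aligned(FAULT_ADDR, size)}")
```

The address is odd, so it fails every check. That fits a multi-byte read through a pointer into a byte buffer, but it does not settle whether the cause lies in the build environment or in the code.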
Re: strange problem with query being dropped/ignored by the BIND process
Hi Dennis,

> Do you have any adjustments in /etc/system ?

No. And as mentioned before, this is a Solaris 11 system, so /etc/system is (mostly) irrelevant, as the IP settings are all done with ipadm now.

> # ndd -get /dev/ip \? | grep "read"
> # ndd -get /dev/tcp \? | grep "read"

That, as well as the script and examples you provided, won't help me much, as I am looking at UDP receive buffer overflows, not TCP.

I have now set udp_max_buf to 4 MB and udp_send_buf and udp_recv_buf to 2 MB each, then restarted BIND. It seems to be working better now, as I don't see as many receive buffer overflows anymore.

However, the initial question still stands: how can I reconfigure BIND to pick up the data from the receive buffer faster?

> Since you are on contract ( me too .. aren't we all these days ) then I
> have to assume you have reasonable kernel updates and tcp patches in
> this Solaris server ?

Yes, of course.

Regards
Marc
Re: strange problem with query being dropped/ignored by the BIND process
Hi again,

I have checked this again today. Send and receive buffers are both 1 MB, the server has 8 CPUs, and during startup BIND reports this:

  found 8 CPUs, using 8 worker threads
  using 7 UDP listeners per interface
  using up to 32768 sockets

We only have about 1,500 queries per second on this server. CPU (30%) and memory (50%) usage are also not an issue here.

Now Oracle support is saying that the buffer sizes are fine and that we need to "speed up the application" so that it reads the data from the receive buffer faster and thus prevents packet drops.

Do you think that is a reasonable statement in this environment? What would be the best way to "speed up the application"? Just increase the number of worker threads?

Regards
Marc

--
Marc Richter
Engr III Cslt-Ntwk Eng
Sebrathweg 20
44149 Dortmund
Germany
O +49 231 972 1293
F +49 231 972 2587
E marc.rich...@de.verizon.com
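To put the numbers from the message above into perspective, here is a rough back-of-the-envelope sketch of how long a 1 MB receive buffer could absorb 1,500 queries per second if the application read nothing at all. The packet size of 100 bytes is purely an assumption for illustration, and the function name `seconds_until_full` is made up here. The kernel also charges per-packet overhead against the buffer and real traffic arrives in bursts, so the real figure will be lower.

```python
def seconds_until_full(buffer_bytes: int,
                       packets_per_second: float,
                       bytes_per_packet: int) -> float:
    """Time for a steady packet stream to fill an unread buffer."""
    return buffer_bytes / (packets_per_second * bytes_per_packet)


BUFFER_BYTES = 1 * 1024 * 1024   # 1 MB receive buffer, from the message
QUERIES_PER_SECOND = 1500        # from the message
ASSUMED_PACKET_BYTES = 100       # assumption: not stated anywhere in the thread

headroom = seconds_until_full(BUFFER_BYTES, QUERIES_PER_SECOND, ASSUMED_PACKET_BYTES)
print(f"approx. {headroom:.1f} s until the buffer is full")
```

Under these assumptions the buffer would take roughly seven seconds to fill at the average rate. That suggests the overflows are more likely tied to short bursts or to periods where the sockets are not being read than to the sustained query rate.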
Re: [E] Re: strange problem with query being dropped/ignored by the BIND process
Hi Ben,

thanks for the answer.

Yeah, I think you are right. I see a lot of udpInOverflows on the system, which suggests that the receive buffer is indeed too small.

Is there any recommendation or best-practice advice on what the buffers should ideally be set to on Solaris? I did search the ISC Knowledge Base, but didn't find any useful advice.

Regards
Marc

On 06/28/17 14:37, Ben Croswell wrote:
> Have you checked deeper at the OS level? I have seen on Linux DNS servers
> silent drops of queries on very busy servers that were exhausting UDP
> receive buffers.
strange problem with query being dropped/ignored by the BIND process
Hi,

we have a setup here consisting of a recursive DNS server and two monitoring servers. The monitoring servers send a test query to the DNS server once every two minutes to check whether it is answering properly.

We now have the problem that these test queries time out from time to time, (correctly) resulting in alarms in our monitoring system.

I have checked this now and noticed that each time we see that alarm, the query sent by the monitoring server is not being answered at all. To debug that, I ran tcpdump on both the monitoring server and the recursive DNS server. I see the query being sent out on the monitoring server and I also see the query being received on the DNS server; however, no response is sent to this query at all.

Looking at the query log, which I enabled temporarily, the query is not logged there either, so it looks like BIND is ignoring that query somewhere, although it is properly received by the IP stack of the server.

Do you have any suggestions on how to debug this further, to hopefully find out where these queries are stuck, dropped or ignored? I have run out of ideas.

The environment is:
BIND 9.9.9-P5 (Extended Support Version)
running on SunOS sun4v 5.11 11.3

Thanks!
Marc