Cached negative responses

2018-12-20 Thread Marc Richter
Hi !

I am investigating the memory usage of a BIND 9.9 instance.
Yes, I know I should update to 9.11 or 9.12, but the situation there will
probably not be much different.

When checking the BIND XML statistics, I see the following in the "Cache DB
RRsets" section, which I think could be the reason for the high memory usage:

!   18446744073709551559
!A6 18446744073709551607

Is this an overflow of the counter for these negative responses in the cache,
or could there really be that many objects in the cache?
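Both values sit just below 2^64 (18446744073709551616), which is what an unsigned 64-bit counter shows after it has been decremented below zero. A minimal Python sketch, assuming the statistics channel prints these counters as unsigned 64-bit integers (an assumption, not checked against the BIND 9.9 source), that reinterprets them as two's-complement signed values:

```python
MASK_64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    value &= MASK_64                      # keep only the low 64 bits
    return value - (1 << 64) if value & SIGN_BIT else value

# Counter values copied from the "Cache DB RRsets" output above.
for rrset, raw in [("!", 18446744073709551559), ("!A6", 18446744073709551607)]:
    print(f"{rrset:<4} {raw} -> {as_signed64(raw)}")
```

Read this way, the two counters are -57 and -9, which is consistent with a counter underflow rather than with roughly 1.8e19 cached entries.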

max-ncache-ttl is not set, so it should be the default of 3 hours, but I don't
really see those numbers decreasing.

max-cache-size is set, but I see the named process easily exceeding that limit,
and not just by a bit: sometimes it reaches double the limit or even more.

How can I better find out what is currently using most of the memory allocated
by BIND, instead of using trial and error on some settings to see if that helps?

Regards
Marc





___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


BIND 9.11.4 crashing with SIGBUS error shortly after starting

2018-08-29 Thread Marc Richter
Hi,

While trying to update from 9.11.3 to 9.11.4, I ran into a strange issue.

Shortly after starting, BIND crashes with a SIGBUS error. The
debugger shows the following:

> ::status
debugging core file of named (64-bit)
file: named
threading model: native threads
status: process terminated by SIGBUS (Bus Error), addr=7b2ff92f

> $c
named`process_cmsg+0x174(11250fda0, 7b2ffa78, 112511da0, 100819118, 3c, 100870440)
named`doio_recv+0x958(11250fda0, 112511da0, 8000, 100819118, 3c, 100870440)
named`internal_recv+0x50c(106bf8f90, 11250fe58, 11250fe58, 1006e77e0, 7c005240, ff00)
named`dispatch+0xb50(100892700, 0, 0, 7c005240, 7c005240, ff00)
named`run+0x18(100892700, 0, 0, 1006b64b0, 0, 1)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)

Running truss on the thread that seems to be causing the issue shows:

20325/7:connect(531, 0x10886C3F8, 32, SOV_XPG4_2)   = 0
20325/7:sendmsg(531, 0x7AAFEB50, 32768) = 40
20325/7:lwp_park(0x, 0) = 0
20325/7:lwp_park(0x, 0) = 0
20325/7:lwp_park(0x, 0) = 0
20325/7:recvmsg(529, 0x7AAFFA78, 32768) = 28
20325/7:Incurred fault #5, FLTACCESS  %pc = 0x1006DC384
20325/7:  siginfo: SIGBUS BUS_ADRALN addr=0x7AAFF92F
20325/7:Received signal #10, SIGBUS [default]
20325/7:  siginfo: SIGBUS BUS_ADRALN addr=0x7AAFF92F
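BUS_ADRALN is the si_code for an invalid address alignment, and on SPARC a multi-byte load or store must be naturally aligned or it traps. A quick Python check of the fault addresses reported above; it only illustrates the alignment arithmetic and does not identify which field process_cmsg was reading:

```python
# Fault addresses as reported above by mdb (::status) and by truss (siginfo).
FAULT_ADDRS = {"mdb": 0x7B2FF92F, "truss": 0x7AAFF92F}

def misaligned_widths(addr: int, widths=(2, 4, 8)) -> list:
    """Return the access widths, in bytes, for which addr is NOT naturally aligned."""
    return [w for w in widths if addr % w != 0]

for source, addr in FAULT_ADDRS.items():
    print(f"{source}: {addr:#x} misaligned for {misaligned_widths(addr)} byte accesses")

# For comparison: the msghdr pointer passed to recvmsg() in the truss output.
print(f"recvmsg: {0x7AAFFA78:#x} misaligned for {misaligned_widths(0x7AAFFA78)} byte accesses")
```

Both fault addresses are odd, so any 2-, 4- or 8-byte access at them would trap, while the msghdr address passed to recvmsg() is 8-byte aligned. That is consistent with a misaligned access inside process_cmsg, but confirming it would need the 9.11.4 source and a disassembly around process_cmsg+0x174.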

BIND has been compiled as follows (exact compile flags removed):

# named -V
BIND 9.11.4-P1 (Extended Support Version) 
running on SunOS sun4v 5.11 11.3
compiled by Solaris Studio 5150
compiled with OpenSSL version: OpenSSL 1.1.0h  27 Mar 2018
linked to OpenSSL version: OpenSSL 1.1.0h  27 Mar 2018
compiled with libxml2 version: 2.9.5
linked to libxml2 version: 20905
compiled with libjson-c version: 0.12
linked to libjson-c version: 0.12
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled

The BIND logs don't give any indication of why it is crashing, not even
when running it with debug level 3.

The environment is:

+ OS: Solaris 11.3 SRU 32
+ Compiler: Oracle Developer Studio 12.6
+ BIND running via chroot

As mentioned above, this worked fine up to and including BIND 9.11.3. But
after building 9.11.4 with the same build environment and the same build
flags, I started getting these crashes.

Could someone give me a clue whether this is more likely an issue with my
environment or with the code?

Regards
Marc






Re: strange problem with query being dropped/ignored by the BIND process

2017-06-29 Thread Marc Richter
Hi Dennis,

> Do you have any adjustments in /etc/system ?

No. And as mentioned before, this is a Solaris 11 system, so /etc/system is
(mostly) irrelevant, as the IP settings are all done with ipadm now.

> 
> # ndd -get /dev/ip \? | grep "read"
> # ndd -get /dev/tcp \? | grep "read"
> 

That, as well as the script and examples you provided, won't help me much,
as I am looking at UDP receive buffer overflows, not TCP.

I have now set udp_max_buf to 4 MB and udp_send_buf and udp_recv_buf to
2 MB each, then restarted BIND.
It seems to be working better now, as I no longer see as many receive
buffer overflows.
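The system-wide values only set the default size and the ceiling; an application can also request its own per-socket size with SO_RCVBUF. A minimal Python sketch, independent of BIND, to confirm what a UDP socket actually gets after changing the tunables (the 2 MB request is just an example value):

```python
import socket

REQUESTED = 2 * 1024 * 1024  # 2 MB, mirroring the udp_recv_buf value mentioned above

def granted_rcvbuf(requested: int) -> int:
    """Request a UDP receive buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        except OSError as exc:
            # Some kernels refuse sizes above the system-wide maximum with an error;
            # Linux instead silently caps the request and reports twice the accepted value.
            print(f"request for {requested} bytes refused: {exc}")
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(f"requested {REQUESTED} bytes, kernel granted {granted_rcvbuf(REQUESTED)}")
```

If the request is refused or comes back smaller than expected, the ceiling has not been raised as intended.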

However, the initial question still stands: how can I reconfigure BIND to
pick up the data from the receive buffer faster?

> Since you are on contract ( me too .. aren't we all these days ) then I
> have to assume you have reasonable kernel updates and tcp patches in
> this Solaris server ?

Yes, of course.

Regards
Marc


Re: strange problem with query being dropped/ignored by the BIND process

2017-06-29 Thread Marc Richter
Hi again,

I have checked this again today.

The send and receive buffers are both 1 MB, the server has 8 CPUs, and
during startup BIND reports this:

found 8 CPUs, using 8 worker threads
using 7 UDP listeners per interface
using up to 32768 sockets

We only have about 1,500 queries per second on this server. CPU (30%) and
memory (50%) usage are not an issue here either.

Now Oracle support is saying that the buffer sizes are fine and that we need
to "speed up the application" so that it reads the data from the receive
buffer faster and thus prevents packet drops.

Do you think that is a reasonable statement in this environment?
What would be the best way to "speed up the application"? Just increase
the number of worker threads?
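One way to sanity-check the "speed up the application" advice is to estimate how long named could stop reading before a 1 MB buffer fills at about 1,500 queries per second. A back-of-the-envelope Python sketch; the per-datagram cost is a hypothetical figure, since the kernel charges header and bookkeeping overhead on top of the DNS payload and that overhead is not known here:

```python
def seconds_until_full(buffer_bytes: int, qps: float, bytes_per_query: int) -> float:
    """Time a completely stalled reader has before the receive buffer overflows."""
    return buffer_bytes / (qps * bytes_per_query)

BUFFER_BYTES = 1024 * 1024   # 1 MB receive buffer, as reported above
QPS = 1500                   # approximate average query rate, as reported above
PER_QUERY = 512              # ASSUMPTION: bytes charged per queued datagram, incl. kernel overhead

print(f"{seconds_until_full(BUFFER_BYTES, QPS, PER_QUERY):.2f} s")  # prints "1.37 s"
```

With these assumed numbers the buffer holds roughly 1.4 seconds of traffic, assuming all queries arrive at a steady rate on a single socket. Under those assumptions, overflows would point at reading stalls on the order of a second or at short bursts far above the average rate, rather than at a general lack of throughput.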

Regards
Marc



-- 
Marc Richter
Engr III Cslt-Ntwk Eng

Sebrathweg 20
44149 Dortmund
Germany

O +49 231 972 1293
F +49 231 972 2587
E marc.rich...@de.verizon.com


Re: [E] Re: strange problem with query being dropped/ignored by the BIND process

2017-06-28 Thread Marc Richter
Hi Ben,

thanks for the answer.

Yeah, I think you are right. I see a lot of udpInOverflows on the system,
which suggests that the receive buffer is indeed too small.

Is there any recommendation or best-practice advice on what the
buffers should ideally be set to on Solaris?
I did search the ISC Knowledge Base, but didn't find any useful advice.

Regards
Marc

On 06/28/17 14:37, Ben Croswell wrote:
> Have you checked deeper at the OS level? I have seen on Linux DNS servers
> silent drops of queries on very busy servers that were exhausting UDP
> receive buffers.



strange problem with query being dropped/ignored by the BIND process

2017-06-28 Thread Marc Richter
Hi,

we have a setup here consisting of a recursive DNS server and two
monitoring servers. The monitoring servers send a test query to the DNS
server once every two minutes to check that it is answering properly.

We now have the problem that these test queries time out from time
to time, (correctly) resulting in alarms in our monitoring system.

I have checked this now and noticed that each time we see that alarm, the
query sent by the monitoring server is not answered at all.
To debug this, I ran tcpdump on both the monitoring server and the recursive
DNS server. I see the query being sent out on the monitoring server, and I
also see the query being received on the DNS server, but no response is
ever sent to this query.
The query is also missing from the query log, which I enabled temporarily,
so it looks like BIND is ignoring that query somewhere, although it is
properly received by the IP stack of the server.
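To line up a missed probe with the tcpdump captures on both hosts, it helps if the monitoring side logs the DNS message ID of every unanswered query, since tcpdump prints that ID for each DNS packet. A minimal stand-alone Python probe sketch; it is not the monitoring tool used here, and the server address and query name in the usage line are hypothetical:

```python
import secrets
import socket
import struct

def build_query(name: str, qtype: int = 1) -> tuple:
    """Build a minimal DNS query (RD bit set, one question, class IN). Returns (id, packet)."""
    qid = secrets.randbelow(0x10000)
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # ID, flags=RD, QDCOUNT=1
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return qid, header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN

def probe(server: str, name: str, timeout: float = 2.0) -> bool:
    """Send one query; return True if a reply carrying the same ID arrives in time."""
    qid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)            # applies to each recv() call
        sock.connect((server, 53))          # only accept datagrams from this server
        sock.send(packet)
        try:
            while True:
                reply = sock.recv(4096)
                if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == qid:
                    return True
        except socket.timeout:
            # Log the ID in decimal, the same way tcpdump prints it.
            print(f"no reply from {server} for {name}, query id {qid}")
            return False

# Usage (hypothetical address and name):
# probe("192.0.2.53", "www.example.com")
```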

Do you have any suggestions on how to debug this further, to hopefully find
out where these queries are stuck/dropped/ignored? I have run out of ideas.

The environment is:
BIND 9.9.9-P5 (Extended Support Version) 
running on SunOS sun4v 5.11 11.3


Thanks !
Marc