Re: BIND and UDP tuning
On 9/27/18, Alex wrote:
> [...]
> No, I actually upgraded from a 65/20mbit to a 165/35mbit recently,
> thinking it was too slow because it was happening at the slower speeds
> as well. I've also implemented some basic QoS to throttle outgoing
> smtp and prioritize DNS but it made no difference.

Has your provider enabled QoS? I'd bet their dropping packets that
exceed QoS rate limits would be considered "working as expected".

Which brings up the question of what exactly SERVFAIL means. Can no
response to a query result in a SERVFAIL? Is there a way to tell the
difference between no response and getting a response indicating a
failure?

Lee
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
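On Lee's question, one way to look at it: from the client's side, a SERVFAIL is an actual reply (RCODE 2 in the DNS header), while a dropped packet produces no reply at all, so the two can be told apart by whether anything arrives before the client's timeout. A recursive resolver such as BIND can, however, answer its own client with SERVFAIL when its queries to upstream servers go unanswered, so an upstream drop can surface as a SERVFAIL further down. The following is a minimal sketch using only the Python standard library, assuming an IPv4 resolver reachable on UDP port 53; `probe`, `build_query` and `parse_rcode` are hypothetical helper names, not part of any existing tool.

```python
import secrets
import socket
import struct
from typing import Optional

# RCODE values 0-5 from RFC 1035 section 4.1.1; anything else is shown numerically.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(qname: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal recursive DNS query (one question, class IN)."""
    if query_id is None:
        query_id = secrets.randbelow(0x10000)
    # Header: ID, flags with RD (recursion desired) set, QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by the zero-length root label.
    qname_wire = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname_wire + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def parse_rcode(response: bytes, expected_id: int) -> str:
    """Return the RCODE name carried in a DNS response header."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    resp_id, flags = struct.unpack("!HH", response[:4])
    if resp_id != expected_id:
        raise ValueError("response ID does not match the query ID")
    rcode = flags & 0x000F  # low four bits of the flags word
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def probe(server: str, qname: str, timeout: float = 2.0) -> str:
    """Send one UDP query; report the RCODE, or 'NO RESPONSE' on timeout."""
    query = build_query(qname)
    (query_id,) = struct.unpack("!H", query[:2])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            response, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return "NO RESPONSE"  # nothing arrived within the timeout
    return parse_rcode(response, query_id)
```

For example, `probe("127.0.0.1", "46.36.47.104.wl.mailspike.net")` returns one of "NOERROR", "NXDOMAIN", "SERVFAIL" or "NO RESPONSE". Run in a loop during a burst of mail, it would show which of the two failure modes the mail server is actually seeing.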
Re: BIND and UDP tuning
Hi Alex,

Have you tried it on a separate physical server, to rule out the
hardware itself as the problem? Is this a consumer-grade PC with either
an onboard or external Ethernet interface, or proper server-grade
equipment? How old is the equipment? What else does that machine do?

Cheers

On 28/09/2018 02:07, Alex wrote:
> [...]
> No, I actually upgraded from a 65/20mbit to a 165/35mbit recently,
> thinking it was too slow because it was happening at the slower speeds
> as well. I've also implemented some basic QoS to throttle outgoing
> smtp and prioritize DNS but it made no difference.

--
Kind Regards,

Noel Butler
Re: BIND and UDP tuning
Hi,

> Just a wild thought:
> It works with a lower speed line (at least I read it that way) but has
> problems with higher speeds.
> Could it be that the line is so fast that it "overtakes" the host in question?
>
> A faster incoming line will give less time between the packets for processing.

No, it was happening at the slower speed as well. I actually upgraded
from a 65/20mbit to a 165/35mbit line recently, thinking the old line
was too slow. I've also implemented some basic QoS to throttle outgoing
SMTP and prioritize DNS, but it made no difference.

Thanks,
Alex
Re: BIND and UDP tuning
When we ran into UDP tuning issues on high-traffic devices, they
presented as silent discards rather than as SERVFAILs.

On Thu, Sep 27, 2018, 12:04 PM Alex wrote:
> [...] I usually run something like "dig
> +all +trace +nodnssec ". It sometimes times out in the
> middle, with something like "cannot resolve xyz host", which may even
> be one of the root servers.
> [...]
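Silent discards of this kind show up in the kernel's UDP counters rather than in BIND's logs. On Linux, `netstat -us` reads them from /proc/net/snmp, where "packet receive errors" is `InErrors` and "receive buffer errors" is `RcvbufErrors`; in Alex's report, 8248 receive errors is consistent with 8243 buffer errors plus 5 checksum errors. Because the counters are cumulative since boot, the useful signal is how fast they grow during a burst. A minimal sketch, assuming a Linux /proc filesystem; the function names are illustrative, and the counters are host-wide, not specific to named.

```python
import time
from pathlib import Path
from typing import Dict, Iterable

# Counters worth watching during a burst (names as they appear in /proc/net/snmp).
WATCHED = ("InDatagrams", "InErrors", "RcvbufErrors")


def read_udp_counters(text: str) -> Dict[str, int]:
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp."""
    # "UdpLite:" lines do not match because the fourth character differs.
    rows = [line.split()[1:] for line in text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no 'Udp:' header/value pair found")
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def counter_deltas(before: Dict[str, int], after: Dict[str, int],
                   keys: Iterable[str] = WATCHED) -> Dict[str, int]:
    """Difference of selected counters between two snapshots."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}


def watch(interval: float = 5.0, rounds: int = 12, path: str = "/proc/net/snmp") -> None:
    """Print per-interval counter growth for the WATCHED counters."""
    previous = read_udp_counters(Path(path).read_text())
    for _ in range(rounds):
        time.sleep(interval)
        current = read_udp_counters(Path(path).read_text())
        deltas = counter_deltas(previous, current)
        print(time.strftime("%H:%M:%S"),
              " ".join(f"{k}=+{v}" for k, v in deltas.items()))
        previous = current
```

Running `watch(interval=1.0, rounds=60)` while mail is flowing prints one line per second. A non-zero `RcvbufErrors` delta that lines up with a burst of SERVFAILs would point at socket buffer overruns on the host rather than at the circuit.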
Re: BIND and UDP tuning
Hi,

>> This is also only happening on the two identical systems connected
>> to the 165/35mbit cable modem.
>> ...
>> I really hope there is someone with some additional ideas.
>
> Is it the modem?

No, it's been replaced at least once, and I've been assured by both the
cable tech who was here and the dimwits on the other end that it's
operating normally. I really wish it were that easy.

Thanks,
Alex
Re: BIND and UDP tuning
Hi,

> On Thu, Sep 27, 2018 at 10:53:25AM -0400, Alex wrote:
>> Many of these values I've already tweaked and have had no effect on my
>> SERVFAIL issues :-(
>
> If you are getting SERVFAILs from a BIND resolver you administer, then
> it has responded to your query. If you turn up the log level to
> something like -d 99, it'll print the steps that led to that SERVFAIL.
> Usually you'll find something there that directs you to next steps.
>
> On this topic, my home resolver is also a stock packaged BIND version as
> you, and I too see spurious SERVFAILs sometimes. I used to think this
> was due to too much indirection, e.g., when named starts up and you run:
>
> dig -x 176.9.81.50
>
> on a cold cache. However it seems to be returning SERVFAIL sometimes for
> what should be a cached answer. I'll also turn up the debug logging and
> watch it.
>
> Mukund

It doesn't typically happen when running from the command line, though
it does occasionally. I usually run something like "dig +all +trace
+nodnssec ". It sometimes times out in the middle with something like
"cannot resolve xyz host", where the host may even be one of the root
servers.

I also typically run it with "rndc trace 11", which shows me quite a bit
of debugging info, too much to look through manually. With trace 99, I
imagine the amount of info would be overwhelming. Do you have any ideas
of what to look for? "query-errors"?

I also see other SERVFAIL errors that really are SERVFAIL errors: when I
query the host manually, it still responds immediately with SERVFAIL.

Thanks,
Alex
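On what to look for: the `query-errors` line Alex quotes in his original report records each SERVFAIL with a timestamp and query name, and he notes the failures arrive in bursts. Grouping those lines into bursts makes it easier to line them up against netstat or dropwatch data collected over the same window. This Python sketch assumes the log channel prints time, category and severity in the same layout as the quoted line; the two-second gap that separates one burst from the next is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches lines laid out like the quoted entry, e.g.
# 26-Sep-2018 12:42:49.743 query-errors: info: client ...: query failed (SERVFAIL) for name/IN/A at ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) query-errors: \w+: "
    r"client .*?: query failed \((?P<rcode>[A-Z]+)\) "
    r"for (?P<qname>[^/\s]+)/(?P<qclass>\w+)/(?P<qtype>\w+)"
)

Event = Tuple[datetime, str]


def parse_failures(lines: Iterable[str], rcode: str = "SERVFAIL") -> List[Event]:
    """Extract (timestamp, query name) for each failure with the given RCODE."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("rcode") == rcode:
            ts = datetime.strptime(m.group("ts"), "%d-%b-%Y %H:%M:%S.%f")
            events.append((ts, m.group("qname")))
    return events


def find_bursts(events: Iterable[Event],
                max_gap: timedelta = timedelta(seconds=2)) -> List[List[Event]]:
    """Group events whose spacing is at most max_gap into bursts."""
    bursts: List[List[Event]] = []
    for ts, qname in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((ts, qname))
        else:
            bursts.append([(ts, qname)])
    return bursts
```

For example, `find_bursts(parse_failures(open("query-errors.log")))` (a hypothetical log path) returns a list of bursts, and the first timestamp and size of each can be compared with the other measurements.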
Re: BIND and UDP tuning
Hi there,

On Thu, 27 Sep 2018, Alex wrote:

> This is also only happening on the two identical systems connected
> to the 165/35mbit cable modem.
> ...
> I really hope there is someone with some additional ideas.

Is it the modem?

--

73,
Ged.
Re: BIND and UDP tuning
On Thu, Sep 27, 2018 at 10:53:25AM -0400, Alex wrote:
> Many of these values I've already tweaked and have had no effect on my
> SERVFAIL issues :-(

If you are getting SERVFAILs from a BIND resolver you administer, then
it has responded to your query. If you turn up the log level to
something like -d 99, it'll print the steps that led to that SERVFAIL.
Usually you'll find something there that directs you to next steps.

On this topic, my home resolver also runs a stock packaged BIND, the
same version as yours, and I too see spurious SERVFAILs sometimes. I
used to think this was due to too much indirection, e.g., when named
starts up and you run:

dig -x 176.9.81.50

on a cold cache. However, it sometimes seems to return SERVFAIL for
what should be a cached answer. I'll also turn up the debug logging and
watch it.

Mukund
Re: BIND and UDP tuning
On 27/09/2018 16.53, Alex wrote:
> Hi,
>
>>> I reported a few weeks ago that I was experiencing a really high
>>> number of "SERVFAIL" messages in my bind-9.11.4-P1 system running on
>>> fedora28, and I haven't yet found a solution. This is all now running
>>> on a 165/35 cable system.
>>>
>>> I found a program named dropwatch which is showing a significant
>>> number of dropped UDP packets, particularly when there are bursts of
>>> email traffic:
>>>
>>> 12 drops at skb_queue_purge+13 (0x9f79a0c3)
>>> 1 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> 4 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> 5 drops at nf_hook_slow+a7 (0x9f7faff7)
>>> 3 drops at sk_stream_kill_queues+48 (0x9f7a1158)
>>> 3 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> ...
> [...]
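Because the dropwatch output is continuous, it can help to total the counts per kernel function over a window instead of reading them line by line. The function name shows which code path discarded the packet (for example, `nf_hook_slow` is the netfilter hook path, so drops there come from firewall rules), which makes the tally a pointer for further digging rather than a diagnosis. A minimal sketch, assuming dropwatch output saved to a file in the format shown above; `tally_drops` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# dropwatch prints lines like "3 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)".
DROP_RE = re.compile(r"^\s*(?P<count>\d+) drops? at (?P<symbol>[^+\s]+)")


def tally_drops(lines: Iterable[str]) -> Counter:
    """Sum dropwatch counts per kernel function, ignoring the +offset."""
    totals: Counter = Counter()
    for line in lines:
        m = DROP_RE.match(line)
        if m:
            totals[m.group("symbol")] += int(m.group("count"))
    return totals
```

For the six lines in Alex's sample, `tally_drops(...).most_common()` ranks `skb_queue_purge` (12) ahead of `__udp4_lib_rcv` (8), `nf_hook_slow` (5) and `sk_stream_kill_queues` (3); over a longer capture the ranking is what matters.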
Re: NTP through DNS?
Having multiple CNAME records for the same hostname is a violation of
RFC 1034 (that, and BIND 9 won't allow it...).

Surely there must be some creative solution that a) doesn't violate the
DNS specs and b) doesn't involve deprecated software (BIND 8).

Regards,
Bob
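RFC 1034 (section 3.6.2) says that if a CNAME is present at a name, no other data should be present there, which rules out both a second CNAME and a CNAME next to, say, an A record; DNSSEC (RFC 4035) later allowed RRSIG and NSEC to sit alongside a CNAME. A small checker over (owner, type) pairs, as a sketch; `cname_violations` and the example names are hypothetical, and it does not parse zone files itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# DNSSEC record types permitted at the same owner name as a CNAME.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}


def cname_violations(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """records: (owner name, RR type) pairs. Returns {owner: problem}."""
    types_by_owner: Dict[str, List[str]] = defaultdict(list)
    for owner, rrtype in records:
        # DNS names compare case-insensitively; ignore a trailing root dot.
        types_by_owner[owner.lower().rstrip(".")].append(rrtype.upper())
    problems = {}
    for owner, types in types_by_owner.items():
        cnames = types.count("CNAME")
        others = [t for t in types if t != "CNAME" and t not in ALLOWED_WITH_CNAME]
        if cnames > 1:
            problems[owner] = "multiple CNAME records"
        elif cnames == 1 and others:
            problems[owner] = "CNAME alongside other data"
    return problems
```

Feeding it two CNAMEs for a hypothetical ntp.example.com flags exactly the setup Bob objects to; a name that needs several targets would instead get several A/AAAA records.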
Re: BIND and UDP tuning
Hi,

>> I reported a few weeks ago that I was experiencing a really high
>> number of "SERVFAIL" messages in my bind-9.11.4-P1 system running on
>> fedora28, and I haven't yet found a solution. This is all now running
>> on a 165/35 cable system.
>>
>> I found a program named dropwatch which is showing a significant
>> number of dropped UDP packets, particularly when there are bursts of
>> email traffic:
>>
>> 12 drops at skb_queue_purge+13 (0x9f79a0c3)
>> 1 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>> 4 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>> 5 drops at nf_hook_slow+a7 (0x9f7faff7)
>> 3 drops at sk_stream_kill_queues+48 (0x9f7a1158)
>> 3 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>> ...
>>
>> # netstat -us
>> ...
>> Udp:
>>     23449482 packets received
>>     1724269 packets to unknown port received
>>     8248 packet receive errors
>>     31394909 packets sent
>>     8243 receive buffer errors
>>     0 send buffer errors
>>     InCsumErrors: 5
>>     IgnoredMulti: 43247
>>
>> The SERVFAIL messages don't necessarily correspond to the UDP packet
>> errors shown by netstat, but the dropwatch output is continuous. The
>> netstat packet receive errors also don't seem to correspond to
>> "SERVFAIL" or "Name service" errors:
>>
>> 26-Sep-2018 12:42:49.743 query-errors: info: client @0x7fb3c41634d0
>> 127.0.0.1#44104 (46.36.47.104.wl.mailspike.net): query failed
>> (SERVFAIL) for 46.36.47.104.wl.mailspike.net/IN/A at
>> ../../../bin/named/query.c:8580
>>
>> Sep 26 12:47:11 mail03 postfix/dnsblog[22821]: warning: dnsblog_query:
>> lookup error for DNS query 196.91.107.80.bl.spameatingmonkey.net: Host
>> or domain name not found. Name service error for
>> name=196.91.107.80.bl.spameatingmonkey.net type=A: Host not found, try
>> again
>>
>> I've been following this thread from some time ago, but nothing I've
>> done has made a difference. I really don't know what the buffer sizes
>> should be.
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__bind-2Dusers-2Dforum.2342410.n4.nabble.com_Tuning-2Dsuggestions-2Dfor-2Dhigh-2Dcore-2Dcount-2DLinux-2Dservers-2Dtd3899.html&d=DwICAg&c=MOptNlVtIETeDALC_lULrw&r=udvvbouEjrWNUMab5xo_vLbUE6LRGu5fmxLhrDvVJS8&m=5XQNuuRQ4kxK03zqoWaJHIdaJvNdsyTKHuFlDKedbpc&s=5Dqhne-5w5V_1coBTBvTITwK2EFeankOegTaofy8S5w&e=
>>
>> Are there specific bind tunables you might recommend? edns-udp-size,
>> perhaps?
>>
>> Any ideas on other tunables such as net.core.*mem_default etc?
>
> *chuckles to self*
>
> I was just referring back to that thread myself to try to remember what I did.
>
> I ended up tuning the following items:
>
>    - name: SYSCTL system tuning, basics
>      sysctl:
>        name: "{{ item.name }}"
>        value: "{{ item.value }}"
>        sysctl_set: yes
>        state: present
>      with_items:
>        - { name: 'vm.swappiness', value: 0 }
>        - { name: 'net.core.netdev_max_backlog', value: 32768 }
>        - { name: 'net.core.netdev_budget', value: 2700 }
>        - { name: 'net.ipv4.tcp_sack', value: 0 }
>        - { name: 'net.core.somaxconn', value: 2048 }
>        - { name: 'net.core.rmem_default', value: 16777216 }
>        - { name: 'net.core.rmem_max', value: 16777216 }
>        - { name: 'net.core.wmem_default', value: 16777216 }
>        - { name: 'net.core.wmem_max', value: 16777216 }

Were you troubleshooting the same problems I'm experiencing?

Many of these values I've already tweaked, and they have had no effect
on my SERVFAIL issues :-(

I've also been following the performance tuning variables in this RH
document:
https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf

These errors appear to occur in spurts: there are typically ten or more
in a row, then anywhere from a few seconds to several minutes before the
next batch.

It looks like there are periods of as many as 500 queries per second,
although the usual rate is closer to 200 per second.

I don't believe this is a BIND configuration problem, as the "Name
service error" errors from postfix also occur when testing with unbound.

This is also only happening on the two identical systems connected to
the 165/35mbit cable modem. I've verified with Oponline, and they've
emphatically asserted there are no problems with the circuit. The
systems are 8-core Xeon E31240 with 16GB RAM. I've also tried other
systems, including a 12-core i7 with 32GB.

We have several other systems connected to a 10mbit DIA Ethernet
circuit where these errors don't generally occur. They are similarly
configured Fedora systems with the same version of BIND.

I'm really at a loss as to what the problem is, but it feels like it's
really impacting our ability to query RBLs when processing mail.

> Whilst mentioned in passing on that thread, there was also poking around with
> TOE, pause, coalesce adaptive and ring size settings (look at ethtool -K,
> ethtool -A, ethtool -C and ethtool -G)
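Stuart's pointer to `ethtool -G` concerns NIC ring buffers: if the receive ring is much smaller than the hardware maximum, a burst can overflow it before the kernel's UDP code ever sees the packets, and such drops would typically show up in the NIC's own statistics (`ethtool -S`) rather than in the `netstat -us` counters. The sketch below runs `ethtool -g` (lower case, which only reads the settings) and reports rings set below their pre-set maximum. It assumes the usual "Pre-set maximums" / "Current hardware settings" layout; newer ethtool versions print extra fields, which the parser skips unless they hold plain integers. The interface name in the usage note is hypothetical.

```python
import subprocess
from typing import Dict, Iterable, Tuple


def parse_ring_params(text: str) -> Dict[str, Dict[str, int]]:
    """Split `ethtool -g` output into {'max': {...}, 'current': {...}}."""
    sections: Dict[str, Dict[str, int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            current = sections.setdefault("max", {})
        elif line.startswith("Current hardware settings"):
            current = sections.setdefault("current", {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():  # skip "n/a" and non-numeric fields
                current[key.strip()] = int(value)
    return sections


def undersized_rings(text: str, keys: Iterable[str] = ("RX", "TX")) -> Dict[str, Tuple[int, int]]:
    """Return {ring: (current, maximum)} for rings below their maximum."""
    s = parse_ring_params(text)
    cur, mx = s.get("current", {}), s.get("max", {})
    return {k: (cur[k], mx[k]) for k in keys if k in cur and k in mx and cur[k] < mx[k]}


def ring_report(iface: str) -> Dict[str, Tuple[int, int]]:
    """Run `ethtool -g IFACE` and report undersized rings."""
    out = subprocess.run(["ethtool", "-g", iface],
                         capture_output=True, text=True, check=True).stdout
    return undersized_rings(out)
```

On a box whose RX ring is at a driver default of 256 out of 4096, `ring_report("eth0")` would report `{"RX": (256, 4096)}`; the ring could then be raised with something like `ethtool -G eth0 rx 4096`, keeping in mind that the right size is hardware- and driver-specific.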
RE: BIND and UDP tuning
> -----Original Message-----
> From: Tony Finch [mailto:d...@dotat.at]
>
>> - { name: 'net.ipv4.tcp_sack', value: 0 }
>
> Why? SACK is super important for TCP performance over links that have any
> degree of lossiness, and I don't recall hearing of any caveats.
>
> Tony.
> --
> f.anthony.n.finch

If I recall correctly, it had to do with the fact that we were in a
very network-close test environment with very small packets, so it
wasn't necessary to even consider resends. I don't recall whether it
did anything at all to the results; it was just one of the various
things I threw into the blender to see if it made a difference, and it
was still in at the end of testing.

The number of test iterations I went through was in the hundreds, and
most of it was "Moar! MOAR!" rather than good arguments; it was more
about proving a design could reach a theoretical limit than about
whether it would be 100% stable in production.

The environment these tests were preparing for hasn't been implemented
yet; that's what I'm working on over the next few weeks. I'll be going
over these settings with kid gloves and being a little gentler, as we
don't need a single location churning out 2.5M qps; we're quite happy
with 2M.

Let's hear it for overkill!

Stuart
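Since Alex reports having already tweaked many of these values, it may help to check what a box is actually running with before changing anything. The sketch below reads the live values from /proc/sys (the same place `sysctl` reads them) and lists those that differ from a target set. The targets are copied from Stuart's Ansible task; `net.ipv4.tcp_sack` is deliberately left out, since Stuart describes it as a lab-only setting and Tony points out SACK matters on lossy links. The helper names are illustrative, and the values are a starting point from this thread, not a recommendation.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Targets from the Ansible task quoted in this thread (tcp_sack intentionally omitted).
DESIRED: Dict[str, int] = {
    "vm.swappiness": 0,
    "net.core.netdev_max_backlog": 32768,
    "net.core.netdev_budget": 2700,
    "net.core.somaxconn": 2048,
    "net.core.rmem_default": 16777216,
    "net.core.rmem_max": 16777216,
    "net.core.wmem_default": 16777216,
    "net.core.wmem_max": 16777216,
}


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its /proc/sys file."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the raw value text, or None if this kernel lacks the key."""
    try:
        return sysctl_path(name, root).read_text()
    except FileNotFoundError:
        return None


def differences(desired: Dict[str, int],
                reader: Callable[[str], Optional[str]] = read_sysctl,
                ) -> Dict[str, Tuple[Optional[str], int]]:
    """Return {name: (current value or None, desired value)} for mismatches."""
    diffs: Dict[str, Tuple[Optional[str], int]] = {}
    for name, want in desired.items():
        have = reader(name)
        if have is None:
            diffs[name] = (None, want)
        elif have.split() != str(want).split():
            diffs[name] = (" ".join(have.split()), want)
    return diffs
```

Calling `differences(DESIRED)` on a stock system typically lists the `rmem`/`wmem` entries, since distribution defaults are far below 16 MiB. Because the reader is a parameter, the comparison can also be pointed at values captured from another host.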
RE: BIND and UDP tuning
Browne, Stuart via bind-users wrote:
>        - { name: 'net.ipv4.tcp_sack', value: 0 }

Why? SACK is super important for TCP performance over links that have
any degree of lossiness, and I don't recall hearing of any caveats.

Tony.
--
f.anthony.n.finch  http://dotat.at/
a just distribution of the rewards of success