Re: DNS and RBL problems
Alex, if you want, I can give you a temporary SSH tunnel for DNS traffic so you can rule out an Optonline/Cablevision/Altice problem...

Regards!
PedroD.

On Saturday, September 15, 2018, 6:42:07 PM GMT+2, Axb wrote:

So this is the moment where this becomes off-topic for SA, and your ISP or networking guys/support, plus Wireshark, hping, etc., should help you out.

On 9/15/18 6:28 PM, Alex wrote:
>> Just in case of unexpected throttling by someone/something in the middle...
>> have you tried with a VPN (only for DNS traffic)?
>
> I'll try that to see if somehow Optonline/Cablevision/Altice is
> dropping my packets. However, it does also happen to our DIA ethernet
> circuit, so I'm not hopeful.
Re: DNS and RBL problems
Hi,

On Sat, Sep 15, 2018 at 5:31 AM Benny Pedersen wrote:
>
> Pedro David Marco wrote on 2018-09-15 09:46:
> > Sorry, typo issue.. i meant 512 bytes
>
> and with EDNS0 it's up to 4096
>
> but not all dns servers support it
>
> one could force tcp if wanted
>
> or drop buggy rbl zones

Thank you all so much for your help. The only thing between this system and the Internet is the Optonline modem/router. I've even gone without any local firewall rules to eliminate that possibility.

Just last night I implemented htb shaping to limit the outgoing SMTP traffic rate, to be sure it's not consuming the entire pipe and preventing UDP traffic from being received. I don't think that's the problem, though, as it happens at all times of the day.

> zone "hostkarma.junkemailfilter.com" { type forward; forward first;
> forwarders {}; };

I'm not sure this would help, as our nameservers aren't set up for forwarding at all.

> Can you place a sniffer on LAN and WAN interfaces of your Firewall?

I've done this, and even posted packets for people to look at on the bind-users list, and it was inconclusive. The packet involving the "SERVFAIL" error doesn't provide any info as to why it failed. It appears there was just never a response to the packet and the query timed out.

> Just in case of unexpected throttling by someone/something in the middle...
> have you tried with a VPN (only for DNS traffic)?

I'll try that to see if somehow Optonline/Cablevision/Altice is dropping my packets. However, it also happens on our DIA Ethernet circuit, so I'm not hopeful.

Here's the packet trace of one of the failed packets, in case someone has some ideas or is curious.

No.   Time       Source     Destination  Protocol  Length  Info
9083  11.730327  127.0.0.1  127.0.0.1    DNS       104     Standard query response 0xded6 Server failure A 25.188.223.216.wl.mailspike.net OPT

Frame 9083: 104 bytes on wire (832 bits), 104 bytes captured (832 bits)
    Encapsulation type: Linux cooked-mode capture (25)
    Arrival Time: Sep 13, 2018 15:46:36.633305000 EDT
    [Time shift for this packet: 0.0 seconds]
    Epoch Time: 1536867996.633305000 seconds
    [Time delta from previous captured frame: 0.000969000 seconds]
    [Time delta from previous displayed frame: 0.006367000 seconds]
    [Time since reference or first frame: 11.730327000 seconds]
    Frame Number: 9083
    Frame Length: 104 bytes (832 bits)
    Capture Length: 104 bytes (832 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Linux cooked capture
    Packet type: Unicast to us (0)
    Link-layer address type: 772
    Link-layer address length: 6
    Source: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Unused: 6fc0
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 = Version: 4
    0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        00.. = Differentiated Services Codepoint: Default (0)
        ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 88
    Identification: 0x2dff (11775)
    Flags: 0x
        0... = Reserved bit: Not set
        .0.. = Don't fragment: Not set
        ..0. = More fragments: Not set
    ...0 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x4e94 [validation disabled]
    [Header checksum status: Unverified]
    Source: 127.0.0.1
    Destination: 127.0.0.1
User Datagram Protocol, Src Port: 53, Dst Port: 12304
    Source Port: 53
    Destination Port: 12304
    Length: 68
    Checksum: 0xfe57 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 320]
Domain Name System (response)
    Transaction ID: 0xded6
    Flags: 0x8182 Standard query response, Server failure
        1... = Response: Message is a response
        .000 0... = Opcode: Standard query (0)
        .0.. = Authoritative: Server is not an authority for domain
        ..0. = Truncated: Message is not truncated
        ...1 = Recursion desired: Do query recursively
        1... = Recursion available: Server can do recursive queries
        .0.. = Z: reserved (0)
        ..0. = Answer authenticated: Answer/authority portion was not authenticated by the server
        ...0 = Non-authenticated data: Unacceptable
        0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        25.188.223.216.wl.mailspike.net: type A, class IN
            Name: 25.188.223.216.wl.mailspike.net
            [Name Length: 31]
            [Label Count: 7]
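You can also read the header flags from the trace above without Wireshark. The following minimal Python sketch (stdlib only; `decode_flags` and `RCODE_NAMES` are illustrative names, not from any existing tool) splits the 16-bit word using the RFC 1035 layout. For 0x8182 it gives QR=1, RD=1, RA=1, TC=0 and RCODE 2 (SERVFAIL). This is the loopback reply from the local bind to its client, so it only shows that bind gave up. It says nothing about the upstream exchange.

```python
# Illustrative sketch: split a DNS header flags word into its fields.
# Bit layout per RFC 1035 section 4.1.1; AD/CD bits per RFC 4035.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_flags(flags: int) -> dict:
    """Return the individual header fields packed into a 16-bit flags word."""
    rcode = flags & 0xF
    return {
        "qr": (flags >> 15) & 1,        # 1 = this message is a response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated (client should retry over TCP)
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "z": (flags >> 6) & 1,          # reserved, must be zero
        "ad": (flags >> 5) & 1,         # answer authenticated (DNSSEC)
        "cd": (flags >> 4) & 1,         # checking disabled (DNSSEC)
        "rcode": rcode,
        "rcode_name": RCODE_NAMES.get(rcode, "OTHER"),
    }


if __name__ == "__main__":
    # Flags word from the Wireshark trace above (transaction 0xded6).
    print(decode_flags(0x8182))
```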
Re: DNS and RBL problems
Pedro David Marco wrote on 2018-09-15 09:46:
> Sorry, typo issue.. i meant 512 bytes

And with EDNS0 it's up to 4096, but not all DNS servers support it.

One could force TCP if wanted, or drop buggy RBL zones.
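To put numbers on the 512-byte and EDNS0 points, here is a hedged sketch. It builds an A query in wire format with and without an EDNS0 OPT record (RFC 6891). The function names are hypothetical and only the Python standard library is used.

For the mailspike name in the trace above, the query is 49 bytes without EDNS0 and 60 bytes with it. 60 is also the size of the SERVFAIL reply in that trace: its UDP length of 68 minus the 8-byte UDP header. That reply just echoes the question plus one OPT record. This suggests, though the thread does not establish it, that message size is unlikely to matter for plain A lookups like this one.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qid: int = 0, udp_size: Optional[int] = None) -> bytes:
    """Build a recursive A/IN query; append an EDNS0 OPT record if udp_size is set."""
    arcount = 0 if udp_size is None else 1
    # ID, flags (0x0100 = RD), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    message = header + question
    if udp_size is not None:
        # OPT pseudo-RR (RFC 6891): root owner name, TYPE=41,
        # CLASS=advertised UDP payload size, TTL=0 (ext. RCODE/version/flags), RDLEN=0
        message += b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return message


if __name__ == "__main__":
    name = "25.188.223.216.wl.mailspike.net"
    print(len(build_query(name)))                 # 49: plain query, no EDNS0
    print(len(build_query(name, udp_size=4096)))  # 60: with an OPT record
```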
Re: DNS and RBL problems
Sorry, typo: I meant 512 bytes.

-PedroD
Re: DNS and RBL problems
> Maybe something in your setup is throttling UDP traffic.
> I've seen Zyxel DSL modems do this.
> Some new IDS in your firewall?

Do not forget that DNS can also use TCP when the response is longer than 521 bytes...

-PedroD
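The TCP fallback works roughly like this. A server that cannot fit its answer within the UDP payload limit sets the TC bit. The client then repeats the query over TCP, where each message is preceded by a 2-byte length field (RFC 1035 section 4.2.2). Here is a minimal Python sketch of those two pieces, with hypothetical helper names:

```python
import struct

TC_BIT = 0x0200  # "Truncated" bit in the DNS header flags word


def is_truncated(udp_response: bytes) -> bool:
    """True if a UDP DNS response has TC set, meaning the client should retry over TCP."""
    if len(udp_response) < 12:
        raise ValueError("too short to contain a DNS header")
    (flags,) = struct.unpack("!H", udp_response[2:4])
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with the 2-byte length field used for DNS over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for a 16-bit length prefix")
    return struct.pack("!H", len(message)) + message


def unframe_tcp(data: bytes) -> bytes:
    """Strip the 2-byte length prefix from one complete TCP-framed DNS message."""
    if len(data) < 2:
        raise ValueError("missing length prefix")
    (length,) = struct.unpack("!H", data[:2])
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("incomplete DNS message")
    return body
```

On a real connection you would `sendall(frame_for_tcp(query))` to port 53, then read the 2-byte length before the body. If TCP/53 is filtered anywhere on the path, truncated answers cannot be fetched at all, so that may be worth ruling out alongside the UDP questions.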
Re: DNS and RBL problems
On 15/09/2018 02:44, Alex wrote:
> This isn't a spamassassin message, but does anyone with a postfix system
> ever see similar "Name service error" messages such as the one below?
>
> Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query: lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or domain name not found. Name service error for name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try again
>
> It appears to occur quite frequently, and on multiple unrelated systems.
> I'd love to find out what's causing it. The postfix people ascribed it to
> a remote server problem, but I can't believe virtually all RBLs, including
> spamhaus, would have such intermittent problems with *their* name servers.

On one of our mailservers (but not the others, which are at different locations with different ISPs) we had a problem with queries to RBLs being blocked, either by the RBLs themselves or by one of the intermediate DNS servers. So we set up a local bind9 resolver. It uses forwarding for normal queries, but for the RBLs we set up special zones to prevent forwarding.
Example:

zone "hostkarma.junkemailfilter.com" {
    type forward;
    forward first;
    forwarders {};
};

This solved nearly all our problems. We still see b.barracuda.org refusing some queries from this mailserver (despite this IP being registered with them), but not from our other mailservers, and no other RBL refuses them.
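If you go this route for several RBLs, a small Python helper (hypothetical, not part of bind) can emit one such stanza per zone so they stay consistent. The empty `forwarders {}` list in a zone overrides any global forwarders, so named resolves names under that zone itself. The zones in the test below are RBLs mentioned in this thread. Whether each one needs this treatment depends on your own forwarders.

```python
def rbl_forward_zones(zones) -> str:
    """Return named.conf zone stanzas that disable forwarding for each RBL zone.

    An empty forwarders list in a forward zone overrides the global forwarders,
    so named resolves names under that zone itself.
    """
    lines = []
    for zone in zones:
        zone = zone.strip().rstrip(".")
        if not zone:
            continue  # skip blank entries
        lines.append(f'zone "{zone}" {{ type forward; forward first; forwarders {{}}; }};')
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    print(rbl_forward_zones(["hostkarma.junkemailfilter.com", "bl.mailspike.net"]), end="")
```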
Re: DNS and RBL problems
On 9/15/18 3:44 AM, Alex wrote:
> It appears to occur quite frequently, and on multiple unrelated systems.
> I'd love to find out what's causing it. The postfix people ascribed it to
> a remote server problem, but I can't believe virtually all RBLs, including
> spamhaus, would have such intermittent problems with *their* name servers.

Maybe something in your setup is throttling UDP traffic.
I've seen Zyxel DSL modems do this.
Some new IDS in your firewall?
Re: DNS and RBL problems
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as my local caching resolver.
> Sinister issues like this are hard. I'll try and escalate our plans for
> rsync access.

Alex, I also bet on a comms problem, on purpose or not.

Can you place a sniffer on the LAN and WAN interfaces of your firewall?

Just in case of unexpected throttling by someone/something in the middle... have you tried a VPN (only for DNS traffic)?

---PedroD
Re: DNS and RBL problems
On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke wrote:
>
> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail wrote:
> > On 9/14/2018 3:22 PM, Alex wrote:
> >> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> >> which is bind configured as my local caching resolver.
> > Sinister issues like this are hard. I'll try and escalate our plans for
> > rsync access.
>
> Alex - have you looked at bad checksum counters on the host? (netstat -s) -
> I've seen strange issues before with broken network hardware (or bugs in
> switch/router code) caused changes to packets as they passed through the
> 'bad' device. The first hints were those counters increasing at the same time
> as the mysterious issue happening.

I don't see anything relating to bad checksums with netstat :-( I've also tried numerous ethtool config changes, and I've looked through hundreds of packets with tcpdump and wireshark.

This isn't a spamassassin message, but does anyone with a postfix system ever see "Name service error" messages similar to the one below?

Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query: lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or domain name not found. Name service error for name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try again

It appears to occur quite frequently, and on multiple unrelated systems. I'd love to find out what's causing it. The postfix people ascribed it to a remote server problem, but I can't believe virtually all RBLs, including spamhaus, would have such intermittent problems with *their* name servers.
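One way to test the "virtually all RBLs" impression is to tally the dnsblog warnings per DNSBL zone. Here is a minimal Python sketch. It assumes IPv4 lookups and the exact warning wording quoted above; other postfix versions may phrase it differently. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumes IPv4 DNSBL queries and the dnsblog warning wording quoted above.
QUERY_RE = re.compile(r"lookup error for DNS query (\S+?):\s")


def parse_dnsblog_line(line: str):
    """Return (ip, dnsbl_zone) for a dnsblog lookup-error line, else None."""
    m = QUERY_RE.search(line)
    if m is None:
        return None
    labels = m.group(1).split(".")
    if len(labels) < 5:
        return None  # not reversed-IPv4-plus-zone shaped
    ip = ".".join(reversed(labels[:4]))  # undo the DNSBL octet reversal
    zone = ".".join(labels[4:])
    return ip, zone


def failures_by_zone(lines) -> Counter:
    """Count lookup errors per DNSBL zone across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_dnsblog_line(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Passing an open mail log file to `failures_by_zone` and printing `.most_common()` shows whether the errors are spread across many lists or concentrated on a few.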
Re: DNS and RBL problems
On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail wrote:
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as my local caching resolver.
> Sinister issues like this are hard. I'll try and escalate our plans for
> rsync access.

Alex, have you looked at the bad checksum counters on the host (netstat -s)? I've seen strange issues before where broken network hardware (or bugs in switch/router code) caused changes to packets as they passed through the 'bad' device. The first hints were those counters increasing at the same time as the mysterious issue was happening.

-- 
Daniel J. Luke
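On Linux, `netstat -s` takes these counters from /proc/net/snmp. Here is a hedged Python sketch (helper names are hypothetical) that pulls out the UDP InErrors, RcvbufErrors and InCsumErrors values, so they can be sampled before and after a burst of SERVFAILs. RcvbufErrors counts datagrams dropped because a socket's receive buffer was full, which would also look like replies that never arrived. Which counters exist depends on the kernel version.

```python
def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}.

    Each protocol is a line of counter names followed by a line of values,
    both starting with the same "Proto:" prefix.
    """
    stats = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for names, values in zip(rows[0::2], rows[1::2]):
        if names[0] != values[0]:
            raise ValueError(f"unpaired rows: {names[0]} / {values[0]}")
        proto = names[0].rstrip(":")
        stats[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return stats


def udp_error_counters(stats: dict) -> dict:
    """Pick out the UDP counters most relevant to lost DNS replies."""
    udp = stats.get("Udp", {})
    wanted = ("InErrors", "RcvbufErrors", "InCsumErrors")
    return {name: udp[name] for name in wanted if name in udp}


if __name__ == "__main__":
    try:
        with open("/proc/net/snmp") as f:
            print(udp_error_counters(parse_snmp(f.read())))
    except FileNotFoundError:
        print("no /proc/net/snmp here (not Linux?)")
```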
Re: DNS and RBL problems
On 9/14/2018 3:22 PM, Alex wrote:
> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> which is bind configured as my local caching resolver.

Sinister issues like this are hard. I'll try and escalate our plans for rsync access.
Re: DNS and RBL problems
Hi,

On Fri, Sep 14, 2018 at 1:51 PM Rob McEwen wrote:
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem?

I should have added that I'm aware of that Net::DNS bug, and I'm using a version long since fixed.

> I will also mention that if you are using a server such as 8.8.8.8, you MUST
> change. I found that if you use 8.8.8.8, you cannot even pass a test for
> spamassassin builds. They are doing some interesting things, likely
> anti-abuse, that just screw with things.

I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1, which is bind configured as my local caching resolver.

It also fails for one out of every thousand queries of the PCCC RBL, for no clear reason.
14-Sep-2018 15:16:39.333 query-errors: info: client @0x7ff797169d70 68.195.193.45#34244 (hungryhowies.com.wild.pccc.com): query failed (SERVFAIL) for hungryhowies.com.wild.pccc.com/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 15:16:39.333 query-errors: debug 2: fetch completed at ../../../lib/dns/resolver.c:3927 for hungryhowies.com.wild.pccc.com/A in 30.000163: timed out/success [domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The check for hungryhowies.com succeeded at that time for a dozen other RBLs, but later checks could fail for any one of those.
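The bracketed counters in the "fetch completed" line are easier to compare across many failures once parsed. Here is a minimal Python sketch that assumes the BIND 9 log format shown above; the regular expression and function name are mine, not bind's. For the pccc.com line it yields qrysent 7 and timeout 6, with lame, neterr and badresp all 0. That reads as bind sending queries and getting no answers back, rather than receiving errors.

```python
import re

# Assumes the BIND 9 "fetch completed" debug line format quoted above.
FETCH_RE = re.compile(
    r"fetch completed at \S+ for (?P<query>\S+) in (?P<secs>[\d.]+): "
    r"(?P<result>[^\[]+?)\s*\[(?P<counters>[^\]]*)\]"
)


def parse_fetch_completed(line: str):
    """Return the query, elapsed seconds, result and counters, or None if no match."""
    m = FETCH_RE.search(line)
    if m is None:
        return None
    counters = {}
    for item in m.group("counters").split(","):
        key, _, value = item.partition(":")
        # numeric counters become ints; "domain" stays a string
        counters[key] = int(value) if value.isdigit() else value
    return {"query": m.group("query"), "seconds": float(m.group("secs")),
            "result": m.group("result"), **counters}
```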
Re: DNS and RBL problems
I will also mention that if you are using a server such as 8.8.8.8, you MUST change. I found that if you use 8.8.8.8, you cannot even pass a test for spamassassin builds. They are doing some interesting things, likely anti-abuse, that just screw with things.

Regards,
KAM

-- 
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Fri, Sep 14, 2018 at 1:50 PM, Rob McEwen wrote:
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem?
>
> --
> Rob McEwen
> https://www.invaluement.com
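A quick way to confirm which resolvers a host's stub resolver uses is to read the nameserver lines from /etc/resolv.conf. Here is a small Python sketch with hypothetical names. The set of flagged addresses contains only Google Public DNS (8.8.8.8 and 8.8.4.4) and is an assumption you would extend.

```python
# Assumption: only Google Public DNS is flagged here; extend the set as needed.
PUBLIC_RESOLVERS = {"8.8.8.8", "8.8.4.4"}


def nameservers(resolv_conf: str) -> list:
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        # resolv.conf comments start with '#' or ';' (stripped anywhere here for simplicity)
        line = line.split("#", 1)[0].split(";", 1)[0]
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def public_resolvers_in_use(resolv_conf: str) -> list:
    """Return any configured nameservers that are in PUBLIC_RESOLVERS."""
    return [s for s in nameservers(resolv_conf) if s in PUBLIC_RESOLVERS]


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""
    print(nameservers(text), public_resolvers_in_use(text))
```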
Re: DNS and RBL problems
On 9/14/2018 1:36 PM, Alex wrote:
> Hi,
>
> For the past few weeks I've been having problems with queries to many
> of the common RBLs, including barracuda, mailspike and unsubscore. My
> logs are filled with "Name service error", SERVFAIL and lame-server
> messages for RBLs I know to be valid.

Alex,

Coincidentally, a recent new invaluement subscriber was initially having similar problems that didn't make sense. I was stumped: everything looked correct, yet it wasn't working. But then he figured out that the following bug was the cause, and fixing it enabled the queries to start working again:

NOTICE: SpamAssassin installations affected by a bug, due to a change Net::DNS made in an earlier version; here is the bug for reference:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223

So you should definitely check whether this is causing your problem.

-- 
Rob McEwen
https://www.invaluement.com
DNS and RBL problems
Hi,

For the past few weeks I've been having problems with queries to many of the common RBLs, including barracuda, mailspike and unsubscore. My logs are filled with "Name service error", SERVFAIL and lame-server messages for RBLs I know to be valid.

14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0 127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed (SERVFAIL) for 139.33.47.104.bl.mailspike.net/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650 127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL) for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: debug 2: fetch completed at ../../../lib/dns/resolver.c:3927 for 139.33.47.104.bl.mailspike.net/A in 30.000146: timed out/success [domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

This shows a failure, while at other times these same queries succeed. This is using bind set up as a standard recursive name server on Fedora 28.

These are bind logs, but does anyone know why spamassassin queries to these RBLs would time out? There's no firewall involved. It appears to happen at all times of the day.

I really have no other ideas after staring at the logs for weeks, seeing it happen on all my systems, and asking on numerous other lists (including postfix and bind-users).
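To reproduce individual failures by hand, it helps to build the DNSBL query name the same way the filters do: the IPv4 octets reversed, followed by the list's zone. Here is a minimal Python sketch (the function name is hypothetical):

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNSBL lookup name for an IPv4 address: reversed octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


if __name__ == "__main__":
    # IP and zone taken from the bind log lines above
    print(dnsbl_query_name("104.47.33.139", "bl.mailspike.net"))
```

You can then query the resulting name directly against the local resolver, for example `dig @127.0.0.1 139.33.47.104.bl.mailspike.net A`. That lets you compare bind's behaviour with what the mail filters see.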