Re: DNS and RBL problems

2018-09-15 Thread Pedro David Marco
 Alex, 
If you want, I can give you a temporary SSH tunnel for DNS traffic so you can 
rule out an Optonline/Cablevision/Altice problem...
Regards!
PedroD.

On Saturday, September 15, 2018, 6:42:07 PM GMT+2, Axb 
 wrote:  
 
 So this is the point where this becomes off-topic for SA, and your ISP or 
networking guys/support, plus Wireshark, hping, etc., should help you out.


On 9/15/18 6:28 PM, Alex wrote:
> Hi,
> 
> On Sat, Sep 15, 2018 at 5:31 AM Benny Pedersen  wrote:
>>
>> Pedro David Marco wrote on 2018-09-15 09:46:
>>> Sorry, typo issue.. i meant 512 bytes
>>
>> and with EDNS0 it's up to 4096
>>
>> but not all dns servers support it
>>
>> one could force tcp if wanted
>>
>> or drop buggy rbl zones
> 
> Thank you all so much for your help. The only thing between this
> system and the Internet is the Optonline modem/router. I've even gone
> without any local firewall rules to eliminate that possibility.
> 
> Just last night I implemented htb shaping to limit the outgoing SMTP
> traffic rate to be sure it's not consuming the entire pipe, preventing
> UDP traffic from being received. I don't think that's the problem,
> though, as it happens during all times of the day.
> 
>> zone "hostkarma.junkemailfilter.com" { type forward; forward first;
>> forwarders {}; };
> 
> I'm not sure this would help, as our nameservers aren't set up for
> forwarding at all.
> 
>> Can you place a sniffer on LAN and WAN interfaces of your Firewall?
> 
> I've done this, and even posted packets for people to look at on the
> bind-users list, and it was inconclusive. The packet involving the
> "SERVFAIL" error doesn't provide any info as to why it failed. It
> appears there was just never a response to the packet and the query
> timed out.
> 
>> Just in case of unexpected throttling by someone/something in the middle... 
>> have you tried with a VPN (only for DNS traffic)?
> 
> I'll try that to see if somehow Optonline/Cablevision/Altice is
> dropping my packets. However, it does also happen to our DIA ethernet
> circuit, so I'm not hopeful.

Re: DNS and RBL problems

2018-09-15 Thread Alex
Hi,

On Sat, Sep 15, 2018 at 5:31 AM Benny Pedersen  wrote:
>
> Pedro David Marco wrote on 2018-09-15 09:46:
> > Sorry, typo issue.. i meant 512 bytes
>
> and with EDNS0 it's up to 4096
>
> but not all dns servers support it
>
> one could force tcp if wanted
>
> or drop buggy rbl zones

Thank you all so much for your help. The only thing between this
system and the Internet is the Optonline modem/router. I've even gone
without any local firewall rules to eliminate that possibility.

Just last night I implemented htb shaping to limit the outgoing SMTP
traffic rate to be sure it's not consuming the entire pipe, preventing
UDP traffic from being received. I don't think that's the problem,
though, as it happens during all times of the day.

> zone "hostkarma.junkemailfilter.com" { type forward; forward first;
> forwarders {}; };

I'm not sure this would help, as our nameservers aren't set up for
forwarding at all.

> Can you place a sniffer on LAN and WAN interfaces of your Firewall?

I've done this, and even posted packets for people to look at on the
bind-users list, and it was inconclusive. The packet involving the
"SERVFAIL" error doesn't provide any info as to why it failed. It
appears there was just never a response to the packet and the query
timed out.

> Just in case of unexpected throttling by someone/something in the middle... 
> have you tried with a VPN (only for DNS traffic)?

I'll try that to see if somehow Optonline/Cablevision/Altice is
dropping my packets. However, it does also happen to our DIA ethernet
circuit, so I'm not hopeful.

Here's the packet trace of one of the failed packets, in case someone
has some ideas or was curious.

No.    Time        Source       Destination  Protocol  Length  Info
9083   11.730327   127.0.0.1    127.0.0.1    DNS       104     Standard query response 0xded6 Server failure A 25.188.223.216.wl.mailspike.net OPT

Frame 9083: 104 bytes on wire (832 bits), 104 bytes captured (832 bits)
    Encapsulation type: Linux cooked-mode capture (25)
    Arrival Time: Sep 13, 2018 15:46:36.633305000 EDT
    [Time shift for this packet: 0.0 seconds]
    Epoch Time: 1536867996.633305000 seconds
    [Time delta from previous captured frame: 0.000969000 seconds]
    [Time delta from previous displayed frame: 0.006367000 seconds]
    [Time since reference or first frame: 11.730327000 seconds]
    Frame Number: 9083
    Frame Length: 104 bytes (832 bits)
    Capture Length: 104 bytes (832 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Linux cooked capture
    Packet type: Unicast to us (0)
    Link-layer address type: 772
    Link-layer address length: 6
    Source: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Unused: 6fc0
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 = Version: 4
    0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        00.. = Differentiated Services Codepoint: Default (0)
        ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 88
    Identification: 0x2dff (11775)
    Flags: 0x
        0... = Reserved bit: Not set
        .0.. = Don't fragment: Not set
        ..0. = More fragments: Not set
        ...0 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x4e94 [validation disabled]
    [Header checksum status: Unverified]
    Source: 127.0.0.1
    Destination: 127.0.0.1
User Datagram Protocol, Src Port: 53, Dst Port: 12304
    Source Port: 53
    Destination Port: 12304
    Length: 68
    Checksum: 0xfe57 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 320]
Domain Name System (response)
    Transaction ID: 0xded6
    Flags: 0x8182 Standard query response, Server failure
        1... = Response: Message is a response
        .000 0... = Opcode: Standard query (0)
        .0.. = Authoritative: Server is not an authority for domain
        ..0. = Truncated: Message is not truncated
        ...1 = Recursion desired: Do query recursively
        1... = Recursion available: Server can do recursive queries
        .0.. = Z: reserved (0)
        ..0. = Answer authenticated: Answer/authority portion was not authenticated by the server
        ...0 = Non-authenticated data: Unacceptable
        0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        25.188.223.216.wl.mailspike.net: type A, class IN
            Name: 25.188.223.216.wl.mailspike.net
            [Name Length: 31]
            [Label Count: 7]
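
For readers who want to check the flags word in the trace by hand, here is a minimal sketch in Python (standard library only) that splits the 16-bit DNS header flags into the standard header fields. The function name and the small RCODE table are illustrative choices, not taken from any tool used in this thread.

```python
# Illustrative subset of DNS response codes (RFC 1035); not exhaustive.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_dns_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its individual fields."""
    rcode = flags & 0xF
    return {
        "qr": (flags >> 15) & 0x1,      # 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "ad": (flags >> 5) & 0x1,       # answer authenticated
        "cd": (flags >> 4) & 0x1,       # checking disabled
        "rcode": RCODES.get(rcode, str(rcode)),
    }


# 0x8182 is the flags value shown in the packet trace above.
fields = decode_dns_flags(0x8182)
for name, value in fields.items():
    print(f"{name:7s} {value}")
```

The decoded fields agree with the dissection: a non-truncated response with recursion desired and available, and a SERVFAIL reply code, which by itself says nothing about why the upstream lookup failed.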
   

Re: DNS and RBL problems

2018-09-15 Thread Benny Pedersen

Pedro David Marco wrote on 2018-09-15 09:46:
> Sorry, typo issue.. i meant 512 bytes

and with EDNS0 it's up to 4096

but not all DNS servers support it

one could force TCP if wanted

or drop buggy RBL zones
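
To make the size limits above concrete, here is a minimal sketch in Python (standard library only). It builds a query that advertises a 4096-byte UDP payload through an EDNS0 OPT record, checks the TC (truncated) bit that tells a client to retry over TCP, and adds the two-byte length prefix used when a DNS message is sent over TCP. All function names are illustrative, and nothing here is sent on the network.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, txid: int, edns_payload: Optional[int] = 4096) -> bytes:
    """Build an A/IN query; append an EDNS0 OPT record unless edns_payload is None."""
    arcount = 0 if edns_payload is None else 1
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    opt = b""
    if edns_payload is not None:
        # OPT RR: root name, TYPE=41, CLASS carries the UDP payload size,
        # TTL (extended RCODE and flags) = 0, RDLENGTH = 0
        opt = b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return header + question + opt


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set, meaning the client should retry over TCP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes every message with its length as a 16-bit integer."""
    return struct.pack("!H", len(message)) + message


query = build_query("25.188.223.216.wl.mailspike.net", 0xDED6)
print(len(query), "bytes with EDNS0")
print(len(build_query("25.188.223.216.wl.mailspike.net", 0xDED6, None)), "bytes without")
```

RBL lookups like the ones in this thread are far below 512 bytes, so the size limit alone would not explain timeouts on them.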


Re: DNS and RBL problems

2018-09-15 Thread Pedro David Marco
 Sorry, typo issue.. i meant 512 bytes
   
-PedroD

Re: DNS and RBL problems

2018-09-15 Thread Pedro David Marco
 

>Maybe something in your setup is throttling UDP traffic.
>I've seen Zyxel DSL modems do this.
>Some new IDS in your firewall?

Do not forget that DNS can also use TCP when the query is longer than 521 
bytes...


-PedroD  

Re: DNS and RBL problems

2018-09-15 Thread Dominic Raferd




On 15/09/2018 02:44, Alex wrote:
> On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke  wrote:
>> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
>>> On 9/14/2018 3:22 PM, Alex wrote:
>>>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>>>> which is bind configured as my local caching resolver.
>>> Sinister issues like this are hard.  I'll try and escalate our plans for
>>> rsync access.
>>
>> Alex - have you looked at bad checksum counters on the host? (netstat -s) -
>> I've seen strange issues before where broken network hardware (or bugs in
>> switch/router code) caused changes to packets as they passed through the 'bad'
>> device. The first hints were those counters increasing at the same time as the
>> mysterious issue was happening.
>
> I don't see anything relating to bad checksums with netstat :-( I've
> also tried numerous ethtool config changes. I've also looked through
> hundreds of packets with tcpdump and wireshark.
>
> This isn't a spamassassin message, but does anyone with a postfix
> system ever see similar "Name service error" messages such as the one
> below?
>
> Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query:
> lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or
> domain name not found. Name service error for
> name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try
> again
>
> It appears to occur quite frequently, and on multiple unrelated
> systems. I'd love to find out what's causing it. The postfix people
> ascribed it to a remote server problem, but I can't believe virtually
> all RBLs, including spamhaus, would have such intermittent problems
> with *their* name servers.


On one of our mailservers (but not others, which are at different 
locations with different ISPs) we had a problem with queries to RBLs 
being blocked, either by the RBLs themselves or by one of the 
intermediate DNS servers. So we set up a local bind9 resolver; it uses 
forwarding for normal queries, but for the RBLs we set up special zones 
to prevent forwarding. Example:


zone "hostkarma.junkemailfilter.com" { type forward; forward first; 
forwarders {}; };


This solved nearly all our problems. We still see b.barracuda.org 
refusing some queries from this mailserver (despite this IP being 
registered with them), but not from our other mailservers, and not from 
any other RBLs.
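
If the per-zone override quoted above has to be repeated for many lists, the stanzas can be generated rather than typed. A minimal sketch in Python, assuming the zone list below (an illustrative selection of zones named in this thread, not a recommendation) and assuming the output is pasted into named.conf by hand:

```python
def forward_zone_stanza(zone: str) -> str:
    """Render a forward zone with an empty forwarders list.

    An empty list turns forwarding off for that zone, so named resolves it
    itself instead of passing the query to the globally configured forwarders.
    """
    return f'zone "{zone}" {{ type forward; forward first; forwarders {{}}; }};'


# Hypothetical selection; replace with the RBL zones actually queried.
RBL_ZONES = [
    "hostkarma.junkemailfilter.com",
    "bl.mailspike.net",
    "wl.mailspike.net",
    "ubl.unsubscore.com",
]

stanzas = [forward_zone_stanza(zone) for zone in RBL_ZONES]
print("\n".join(stanzas))
```

This only matters on resolvers that forward by default; on a purely recursive setup there is nothing for the override to switch off.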


Re: DNS and RBL problems

2018-09-15 Thread Axb

On 9/15/18 3:44 AM, Alex wrote:
> On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke  wrote:
>> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
>>> On 9/14/2018 3:22 PM, Alex wrote:
>>>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>>>> which is bind configured as my local caching resolver.
>>> Sinister issues like this are hard.  I'll try and escalate our plans for
>>> rsync access.
>>
>> Alex - have you looked at bad checksum counters on the host? (netstat -s) -
>> I've seen strange issues before where broken network hardware (or bugs in
>> switch/router code) caused changes to packets as they passed through the 'bad'
>> device. The first hints were those counters increasing at the same time as the
>> mysterious issue was happening.
>
> I don't see anything relating to bad checksums with netstat :-( I've
> also tried numerous ethtool config changes. I've also looked through
> hundreds of packets with tcpdump and wireshark.
>
> This isn't a spamassassin message, but does anyone with a postfix
> system ever see similar "Name service error" messages such as the one
> below?
>
> Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query:
> lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or
> domain name not found. Name service error for
> name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try
> again
>
> It appears to occur quite frequently, and on multiple unrelated
> systems. I'd love to find out what's causing it. The postfix people
> ascribed it to a remote server problem, but I can't believe virtually
> all RBLs, including spamhaus, would have such intermittent problems
> with *their* name servers.



Maybe something in your setup is throttling UDP traffic.
I've seen Zyxel DSL modems do this.
Some new IDS in your firewall?


Re: DNS and RBL problems

2018-09-15 Thread Pedro David Marco
 
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as my local caching resolver.
> Sinister issues like this are hard.  I'll try and escalate our plans for
> rsync access.

Alex, I would also bet on a comms problem, on purpose or not.
Can you place a sniffer on the LAN and WAN interfaces of your firewall?
Just in case of unexpected throttling by someone/something in the middle... 
have you tried with a VPN (only for DNS traffic)? 

---PedroD
  

Re: DNS and RBL problems

2018-09-14 Thread Alex
On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke  wrote:
>
> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
> > On 9/14/2018 3:22 PM, Alex wrote:
> >> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> >> which is bind configured as my local caching resolver.
> > Sinister issues like this are hard.  I'll try and escalate our plans for
> > rsync access.
>
> Alex - have you looked at bad checksum counters on the host? (netstat -s) - 
> I've seen strange issues before where broken network hardware (or bugs in 
> switch/router code) caused changes to packets as they passed through the 
> 'bad' device. The first hints were those counters increasing at the same time 
> as the mysterious issue was happening.

I don't see anything relating to bad checksums with netstat :-( I've
also tried numerous ethtool config changes. I've also looked through
hundreds of packets with tcpdump and wireshark.

This isn't a spamassassin message, but does anyone with a postfix
system ever see similar "Name service error" messages such as the one
below?

Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query:
lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or
domain name not found. Name service error for
name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try
again

It appears to occur quite frequently, and on multiple unrelated
systems. I'd love to find out what's causing it. The postfix people
ascribed it to a remote server problem, but I can't believe virtually
all RBLs, including spamhaus, would have such intermittent problems
with *their* name servers.
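
For anyone reproducing these lookups by hand with dig or host: the name in the postfix warning above is the client IPv4 address with its octets reversed, followed by the list's zone. A minimal sketch in Python (standard library only); the function name is illustrative.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name an IPv4 DNSBL lookup asks for: reversed octets plus zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"


# The address behind the postfix warning quoted above.
name = dnsbl_query_name("54.238.242.239", "ubl.unsubscore.com")
print(name)
```

Querying that name repeatedly against 127.0.0.1 and against the list's authoritative servers directly is one way to see on which side the intermittent failures occur.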


Re: DNS and RBL problems

2018-09-14 Thread Daniel J. Luke
On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail  wrote:
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is bind configured as my local caching resolver.
> Sinister issues like this are hard.  I'll try and escalate our plans for
> rsync access.

Alex - have you looked at bad checksum counters on the host? (netstat -s) - 
I've seen strange issues before where broken network hardware (or bugs in 
switch/router code) caused changes to packets as they passed through the 'bad' 
device. The first hints were those counters increasing at the same time as the 
mysterious issue was happening.
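
One way to watch those counters over time on Linux is to snapshot /proc/net/snmp, one of the files that `netstat -s` summarises, and compare snapshots taken before and after a burst of failures. A minimal sketch in Python (standard library only); the two sample snapshots are made-up numbers for illustration, and the helper names are not from any existing tool.

```python
def parse_proc_net_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # The file alternates a header row and a value row for each protocol.
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats


def udp_deltas(before: dict, after: dict,
               names=("InErrors", "RcvbufErrors", "InCsumErrors")) -> dict:
    """Return how much each UDP error counter grew between two snapshots."""
    return {n: after["Udp"][n] - before["Udp"][n] for n in names}


# Invented sample data in the real file's layout; on a live host use
# open("/proc/net/snmp").read() instead.
HEADER = ("Udp: InDatagrams NoPorts InErrors OutDatagrams "
          "RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti\n")
before = parse_proc_net_snmp(HEADER + "Udp: 1200 3 7 1100 5 0 2 0\n")
after = parse_proc_net_snmp(HEADER + "Udp: 1500 3 9 1400 5 0 4 0\n")

deltas = udp_deltas(before, after)
print(deltas)
```

Counters that stay flat while SERVFAILs pile up would point away from local checksum or receive-buffer problems.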

-- 
Daniel J. Luke





Re: DNS and RBL problems

2018-09-14 Thread Kevin A. McGrail
On 9/14/2018 3:22 PM, Alex wrote:
> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> which is bind configured as my local caching resolver.
Sinister issues like this are hard.  I'll try and escalate our plans for
rsync access.


Re: DNS and RBL problems

2018-09-14 Thread Alex
Hi,

On Fri, Sep 14, 2018 at 1:51 PM Rob McEwen  wrote:
>
> On 9/14/2018 1:36 PM, Alex wrote:
> > Hi,
> >
> > For the past few weeks I've been having problems with queries to many
> > of the common RBLs, including barracuda, mailspike and unsubscore. My
> > logs are filled with "Name service error", SERVFAIL and lame-server
> > messages for RBLs I know to be valid.
> > 
>
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made
> no sense that it wasn't working because everything looked correct. But
> then he figured out that the following bug was the cause, and fixing
> this bug enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem.

I should have added that I'm aware of that Net::DNS bug, and I'm using
a version long-since fixed.

> I will also mention that if you are using a server such as 8.8.8.8, you
> MUST change.  I found that if you use 8.8.8.8, you cannot even pass a test
> for spamassassin builds.  They are doing some interesting things, likely
> for anti-abuse, that just screw with things.

I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
which is bind configured as my local caching resolver.

It also fails for one out of every thousand queries of the PCCC RBL
for no clear reason.

14-Sep-2018 15:16:39.333 query-errors: info: client @0x7ff797169d70
68.195.193.45#34244 (hungryhowies.com.wild.pccc.com): query failed
(SERVFAIL) for hungryhowies.com.wild.pccc.com/IN/A at
../../../bin/named/query.c:8580

14-Sep-2018 15:16:39.333 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for hungryhowies.com.wild.pccc.com/A
in 30.000163: timed out/success
[domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The check for hungryhowies.com succeeded at that time for a dozen
other RBLs, but later checks could fail for even one of those.


Re: DNS and RBL problems

2018-09-14 Thread Kevin A. McGrail
I will also mention that if you are using a server such as 8.8.8.8, you
MUST change.  I found that if you use 8.8.8.8, you cannot even pass a test
for spamassassin builds.  They are doing some interesting things, likely
for anti-abuse, that just screw with things.
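
A quick way to confirm which resolvers a host actually uses is to list the nameserver lines in /etc/resolv.conf and flag any that are not loopback addresses. A minimal sketch in Python (standard library only), assuming plain IPv4 or IPv6 addresses without scope suffixes; the function names are illustrative.

```python
import ipaddress


def nameservers(resolv_conf_text: str) -> list:
    """Return the addresses on 'nameserver' lines, skipping comment lines."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def non_local_nameservers(resolv_conf_text: str) -> list:
    """Return configured resolvers that are not loopback addresses."""
    return [s for s in nameservers(resolv_conf_text)
            if not ipaddress.ip_address(s).is_loopback]


# Sample text for illustration; on a live host read /etc/resolv.conf instead.
sample = "# local bind first\nnameserver 127.0.0.1\nnameserver 8.8.8.8\n"
print(non_local_nameservers(sample))
```

An empty result means every lookup goes through a resolver on the host itself, which is the setup usually recommended for RBL queries.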

Regards,
KAM

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Fri, Sep 14, 2018 at 1:50 PM, Rob McEwen  wrote:

> On 9/14/2018 1:36 PM, Alex wrote:
>
>> Hi,
>>
>> For the past few weeks I've been having problems with queries to many
>> of the common RBLs, including barracuda, mailspike and unsubscore. My
>> logs are filled with "Name service error", SERVFAIL and lame-server
>> messages for RBLs I know to be valid.
>> 
>>
>
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made no
> sense that it wasn't working because everything looked correct. But then he
> figured out that the following bug was the cause, and fixing this bug
> enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations affected by a bug, due to a change
> Net::DNS made in an earlier version, here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check to see if this is causing your problem.
>
> --
> Rob McEwen
> https://www.invaluement.com
>
>
>


Re: DNS and RBL problems

2018-09-14 Thread Rob McEwen

On 9/14/2018 1:36 PM, Alex wrote:
> Hi,
>
> For the past few weeks I've been having problems with queries to many
> of the common RBLs, including barracuda, mailspike and unsubscore. My
> logs are filled with "Name service error", SERVFAIL and lame-server
> messages for RBLs I know to be valid.




Alex,

Coincidentally, a recent new invaluement subscriber was initially having 
at least similar problems that didn't make sense. I was stumped. It made 
no sense that it wasn't working because everything looked correct. But 
then he figured out that the following bug was the cause, and fixing 
this bug enabled the queries to start working again:


NOTICE: SpamAssassin installations affected by a bug, due to a change 
Net::DNS made in an earlier version, here is the bug for reference:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223

So you should definitely check to see if this is causing your problem.

--
Rob McEwen
https://www.invaluement.com




DNS and RBL problems

2018-09-14 Thread Alex
Hi,

For the past few weeks I've been having problems with queries to many
of the common RBLs, including barracuda, mailspike and unsubscore. My
logs are filled with "Name service error", SERVFAIL and lame-server
messages for RBLs I know to be valid.

14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0
127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed
(SERVFAIL) for 139.33.47.104.bl.mailspike.net/IN/A at
../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650
127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL)
for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for 139.33.47.104.bl.mailspike.net/A
in 30.000146: timed out/success
[domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

This shows a failure while other times these same queries succeed.
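
The bracketed counters at the end of the "fetch completed" line are easier to compare across many failures once they are parsed. A minimal sketch in Python (standard library only) that extracts them; the function name is illustrative, and reading `timeout` against `qrysent` as "timed-out queries out of queries sent" is an interpretation of the field names rather than something confirmed in this thread.

```python
import re

# Matches the trailing "[domain:...,key:value,...]" block of the log line.
COUNTERS_RE = re.compile(r"\[domain:(?P<domain>[^,\]]+),(?P<rest>[^\]]*)\]")


def parse_fetch_counters(line: str):
    """Return (domain, {counter: int}) from a fetch-completed line, or None."""
    match = COUNTERS_RE.search(line)
    if match is None:
        return None
    counters = {}
    for item in match.group("rest").split(","):
        key, _, value = item.partition(":")
        counters[key] = int(value)
    return match.group("domain"), counters


# Counter block copied from the log excerpt above.
LINE = ("[domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,"
        "lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]")

domain, counters = parse_fetch_counters(LINE)
print(domain, counters["timeout"], "of", counters["qrysent"])
```

Here neterr, badresp and lame are all zero while almost every query sent is counted as a timeout, which fits queries or replies being lost somewhere on the path rather than rejected by the remote servers.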

This is using bind set up as a standard recursive name server on
Fedora 28. These are bind logs, but does anyone know why spamassassin
queries to these RBLs would time out? There's no firewall involved. It
appears to happen at all times during the day.

I really have no other ideas after staring at the logs for weeks,
seeing it happen on all my systems, and asking on numerous other lists
(including postfix and bind-users).