Re: Incorrect NAT translation for sip traffic ?

Magnus Rixtorp Thu, 23 Jun 2011 10:14:13 -0700

On 2011-06-23 14:09, Magnus Rixtorp wrote:

On 2011-06-23 11:52, Stuart Henderson wrote:

On 2011-06-23, Magnus Rixtorp <mag...@tokra.org> wrote:

pass out quick log on $ext_if inet from 192.168.0.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.230.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.231.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.239.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.240.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.241.0/24 nat-to $ext_if
pass out quick log on $ext_if inet from 192.168.242.0/24 nat-to $ext_if

This probably isn't your problem, but that seems quite a lot of networks
to be natting behind a single IP especially with the default port range
(50001:65535). If you've got a lot of active natted states, the
search for a free port could involve a bunch of state searches
(pick a random port, lookup state to see if it's used, then search
sequentially for a free port looking up state each time).
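To illustrate why a crowded port range makes that search more expensive, here is a minimal sketch of the probing strategy described above. It is a simplified model and not PF's actual code: the function name `find_free_port`, the `used` set standing in for the state table, and the 14000 busy ports are all assumptions made for illustration.

```python
import random

def find_free_port(used, low=50001, high=65535, rng=random):
    """Return (port, lookups) for the first free port found in [low, high].

    Simplified model: pick a random starting port, then probe
    sequentially (wrapping around) until a port not in `used` turns up.
    Each membership test stands in for one state lookup.
    """
    span = high - low + 1
    start = rng.randrange(span)
    for lookups in range(1, span + 1):
        port = low + (start + lookups - 1) % span
        if port not in used:
            return port, lookups
    return None, span  # every port in the range is taken

# Hypothetical load: 14000 of the 15535 ports in the default range are busy.
rng = random.Random(1)
used = set(rng.sample(range(50001, 65536), 14000))
print(find_free_port(used, rng=rng))                       # default range
print(find_free_port(used, low=20000, rng=rng))            # widened range
```

With the same number of busy ports, widening the range to 20000:65535 leaves far more free slots, so the sequential probe tends to stop after fewer lookups.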


So if you do have a lot of states you might want to either add more IPs
or increase the port range available (e.g. pass...nat-to $ext_if \
port 20000:65535) and adjust the net.inet.ip.port* sysctls for
connections coming from the firewall itself (to make sure you have
some free ports which don't conflict with the range used by that
PF rule).

No, that's not a real issue: there may be a lot of networks/IPs in
those NAT rules, but there are only 1-2 active hosts on those networks.

Jun 15 09:41:21 pbxfw /bsd: pf: state key linking mismatch! dir=OUT,
if=re0, stored af=2, a0: 130.244.190.46:5060, a1: 192.168.230.101:5060,
proto=17, found af=2, a0: 192.168.230.101:5060, a1:
187.170.255.239:5060, proto=17
Jun 17 12:02:55 pbxfw /bsd: pf: state key linking mismatch! dir=OUT,
if=re0, stored af=2, a0: 130.244.190.46:5060, a1: 192.168.230.101:5060,
proto=17, found af=2, a0: 192.168.230.101:5060, a1:
187.170.255.239:5060, proto=17

That is the only error output I've found relating to the problem.
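For anyone comparing the two keys in that message by eye, a small sketch like the following pulls the endpoints out of the log line and shows which one the stored and found keys have in common. The regular expression and the helper name `state_keys` are assumptions written for this one log format; af=2 is AF_INET and proto=17 is UDP.

```python
import re

# The second log entry from above, with the wrapped lines joined.
LOG = ("Jun 17 12:02:55 pbxfw /bsd: pf: state key linking mismatch! dir=OUT, "
       "if=re0, stored af=2, a0: 130.244.190.46:5060, a1: 192.168.230.101:5060, "
       "proto=17, found af=2, a0: 192.168.230.101:5060, a1: "
       "187.170.255.239:5060, proto=17")

KEY = re.compile(
    r"(stored|found) af=\d+, a0: ([\d.]+:\d+), a1: ([\d.]+:\d+), proto=(\d+)"
)

def state_keys(line):
    """Map 'stored'/'found' to the set of endpoints named in that key."""
    return {kind: {a0, a1} for kind, a0, a1, _proto in KEY.findall(line)}

keys = state_keys(LOG)
shared = keys["stored"] & keys["found"]
print("shared endpoint:   ", shared)
print("only in stored key:", keys["stored"] - shared)
print("only in found key: ", keys["found"] - shared)
```

Both keys contain the PBX at 192.168.230.101:5060; they differ only in the remote endpoint, the provider's gateway in the stored key and 187.170.255.239 in the found key.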

So the problem has to do with the IP 187.170.255.239:
239.255.170.187.in-addr.arpa domain name pointer
dsl-187-170-255-239-dyn.prod-infinitum.com.mx.
Our system has no relation at all with this IP.
But somehow our NAT translation, at random intervals, decides to
redirect traffic to that IP instead of the intended destination.
So far we have primarily noted the problem towards 130.244.190.46 and
130.244.190.42, which are our provider's SIP gateways,
since the only thing being used on the connection is a PBX solution.

A Google search on that particular IP gives a similar dmesg error
output in this post:
http://www.mail-archive.com/misc@openbsd.org/msg95116.html
But in his case the system hangs; our system keeps on going
and instead interferes with the connection of phone calls.

Since the problem was discovered I've set up pf to log the first packet
of every new state,
and that is then run through tcpdump -n -e -ttt -s 1600 -vvv -XX to an
ASCII log using the
http://www.openbsd.org/faq/pf/logging.html syslog method.

Jun 22 15:40:06.212694 rule 26/(match) [uid 0, pid 20284] pass in on
bge0: 130.244.190.46.5060>  212.247.80.66.5060: udp 442 (DF) [tos 0xb8]
(ttl 56, id 0, len 470)
    0000: 45b8 01d6 0000 4000 3811 da02 82f4 be2e
E\M-8.\M-V..@.8.\M-Z..\M-t\M->.
    0010: d4f7 5042 13c4 13c4 01c2 f6b9 4259 4520
\M-T\M-wPB.\M-D.\M-D.\M-B\M-v\M-9BYE
    0020: 7369 703a 3835 3933 4032 3132 2e32 3437 sip:8593@212.247
    0030: 2e38 302e 3636 2053 4950 2f32            .80.66 SIP/2

Jun 22 15:40:06.307515 rule 60/(match) [uid 0, pid 20284] pass in on
re0: 192.168.230.101.5060>  187.170.255.239.5060: udp 550 (ttl 64, id
33961, len 578)
    0000: 4500 0242 84a9 0000 4011 9159 c0a8 e665
E..B.\M-)..@..Y\M-@\M-(\M-fe
    0010: bbaa ffef 13c4 13c4 022e 9dc3 5349 502f
\M-;\M-*\M^?\M-o.\M-D.\M-D...\M-CSIP/
    0020: 322e 3020 3230 3020 4f4b 0d0a 5669 613a  2.0 200 OK..Via:
    0030: 2053 4950 2f32 2e30 2f55 4450             SIP/2.0/UDP

Considering this snippet alone, there's no indication of a problem
with PF; it looks to me like 192.168.230.101 is itself sending
packets to 187.170.255.239, maybe your PBX software is confused.
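To double-check the summary lines against the raw bytes, the first 20 bytes of each hex dump can be decoded as an IPv4 header. This is only a sketch; the helper name `parse_ipv4_header` is made up here, and it assumes a header without IP options (IHL of 5), which is what both dumps show.

```python
import ipaddress
import struct

def parse_ipv4_header(hex_words):
    """Decode the fixed 20-byte IPv4 header from a tcpdump -X style hex string."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))[:20]
    (_ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw)
    return {
        "ttl": ttl,
        "proto": proto,
        "len": total_len,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }

# First 20 bytes of the two packets logged above (in on bge0, then in on re0).
on_bge0 = parse_ipv4_header("45b8 01d6 0000 4000 3811 da02 82f4 be2e d4f7 5042")
on_re0 = parse_ipv4_header("4500 0242 84a9 0000 4011 9159 c0a8 e665 bbaa ffef")
print(on_bge0)
print(on_re0)
```

The bytes agree with what tcpdump printed: the packet logged inbound on re0 has source 192.168.230.101 and already carries 187.170.255.239 in its destination field.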

I would look at packets on the inbound/outbound interfaces rather
than pflog and see what addresses show up there. ("tcpdump -Xs1500
-nire0 port 5060" or something, and same for bge0).

The xxx.255.239 makes me wonder if the PBX is trying to do some
multicast thing and getting the byte-order wrong (239.255.xxx would
be a multicast address).
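The byte-order idea is easy to check with Python's standard ipaddress module. This sketch only reverses the four octets and asks whether the result falls in the multicast range; the helper name `byte_swapped` is an invention for this example, and the check says nothing about whether the PBX really does this.

```python
import ipaddress

def byte_swapped(addr):
    """Return the IPv4 address with its four octets in reverse order."""
    return ipaddress.IPv4Address(ipaddress.IPv4Address(addr).packed[::-1])

suspect = "187.170.255.239"
swapped = byte_swapped(suspect)
print(suspect, "is multicast:", ipaddress.IPv4Address(suspect).is_multicast)
print(swapped, "is multicast:", swapped.is_multicast)
```

Reversed, the address is 239.255.170.187, which is inside 224.0.0.0/4, while the address as logged is an ordinary unicast one.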

I have been taking a closer look at the packets, both on the external
bge0 interface and the internal re0.

And the problem happens when the packet passes through pf from bge0 to
re0.

using the rule

pass in quick log on $ext_if proto {tcp,udp} from any to $ext_if port
5060 rdr-to 192.168.230.101

the packet on bge0 looks like this:

There was some interest in seeing the raw tcpdump rather than the
Wireshark output, and since it's raw and not smallish,
here's a link to the files: http://www.tokra.org/tcpdump/

So it seems to be a PF issue, although not a PF NAT issue,
since the problem is ingress, not egress.

Jun 22 15:40:06.307526 rule 0/(match) [uid 0, pid 20284] pass out on
bge0: 192.168.230.101.5060>  187.170.255.239.5060: udp 550 (ttl 63, id
33961, len 578, bad cksum 9159! differs by 100)
    0000: 4500 0242 84a9 0000 3f11 9159 c0a8 e665
E..B.\M-)..?..Y\M-@\M-(\M-fe
    0010: bbaa ffef 13c4 13c4 022e 9dc3 5349 502f
\M-;\M-*\M^?\M-o.\M-D.\M-D...\M-CSIP/
    0020: 322e 3020 3230 3020 4f4b 0d0a 5669 613a  2.0 200 OK..Via:
    0030: 2053 4950 2f32 2e30 2f55 4450             SIP/2.0/UDP
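The "bad cksum 9159!" note on that packet refers to the IP header checksum, and it can be reproduced from the dumped bytes. The sketch below recomputes the header checksum for the same packet as logged on re0 (ttl 64) and on bge0 (ttl 63). The function name `ipv4_checksum` is hypothetical, and reading tcpdump's "differs by 100" as a hexadecimal difference is an assumption.

```python
def ipv4_checksum(header):
    """Header checksum per RFC 791, with the checksum field treated as zero."""
    data = header[:10] + b"\x00\x00" + header[12:]
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# The same 20-byte header as dumped on re0 (ttl 0x40) and on bge0 (ttl 0x3f).
on_re0 = bytes.fromhex("4500 0242 84a9 0000 4011 9159 c0a8 e665 bbaa ffef")
on_bge0 = bytes.fromhex("4500 0242 84a9 0000 3f11 9159 c0a8 e665 bbaa ffef")

for name, hdr in (("re0", on_re0), ("bge0", on_bge0)):
    stored = int.from_bytes(hdr[10:12], "big")
    print(name, "ttl", hdr[8], "stored", hex(stored),
          "computed", hex(ipv4_checksum(hdr)))
```

The stored value 0x9159 is correct for ttl 64 but not for ttl 63, where the computed value is 0x9259. That is consistent with the TTL having been decremented without the header checksum being recomputed yet at the point where the packet was logged.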

And on a side note, if anyone has a suggestion for how to actually get
the complete packet logged, and not just the first snap, it would be
nice; OpenBSD tcpdump seems not to support -s 0 as snaplen to get the
whole thing.

see tcpdump(8) about -s (or ngrep has fairly clear formatting for
reading inside sip packets, "ngrep -d re0 -W byline port 5060",
though less information from the IP/TCP header is displayed).

Anyway, that log snippet is 130.244.190.46 asking us to set up a SIP
connection with them on 5060,
but our response to that IP goes to 187.170.255.239, and the connection
fails.

Another side note would be about the rampant amount of bad cksum on UDP
traffic, if anyone would care to chime in about that,
since about 98% of all UDP packets get a bad cksum.

see tcpdump(8) about IP Checksum Offload.

But my main problem and concern is this 187.170.255.239, and why they
should get my phone calls.

Regards

Magnus
