Hi all

(Sorry for flooding, this seems related to the question I asked earlier. Please bear with me.)

I am using relayd on 7.3-release as an IP load balancer in front of some dual-stack backend hosts. This setup has worked for some years now.

After upgrading to 7.3 about 4 weeks ago I noticed a steady decline in IPv6 sessions reaching the backend servers, to the point where none have arrived at all for the past 2 days.

Users are now complaining that their connections to the servers (public IP) either time out or are established only after a very long delay (usually the TCP start timeout, after which the client switches from IPv6 to trying IPv4). The IPv4 connections succeed immediately.
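To put numbers on the difference between the two address families, the handshake time can be measured separately for IPv6 and IPv4. The following is only a minimal sketch: the function name time_connect and the 60-second default timeout are assumptions made for illustration, not something taken from relayd or pf.

```python
import socket
import time


def time_connect(host, port, family, timeout=60.0):
    """Return the seconds needed to complete a TCP handshake, or None on failure."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # the host has no address of this family
    sockaddr = infos[0][4]
    start = time.monotonic()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sockaddr)
    except OSError:
        return None  # refused, unreachable or timed out
    return time.monotonic() - start
```

Calling time_connect(name, 443, socket.AF_INET6) and then time_connect(name, 443, socket.AF_INET) with the public name of the service should show the IPv6 attempt returning None (or a very large value) while the IPv4 attempt returns within milliseconds, if the symptoms are as described.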

pflog shows that the IPv6 SYN-ACK replies from the backend servers are being dropped by pf. Oddly, the blocks are logged more than 30 seconds after the SYN is allowed through:


Jun 20 14:12:49.489707 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50210 > [Server.IP6].443: S 2508622700:2508622700(0) win 64800 <[|tcp]> [flowlabel 0xd4400] (len 32, hlim 52)
Jun 20 14:12:49.493267 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50211 > [Server.IP6].443: S 806421981:806421981(0) win 64800 <[|tcp]> [flowlabel 0x162e5] (len 32, hlim 52)
Jun 20 14:12:49.507508 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50212 > [Server.IP6].443: S 3945655871:3945655871(0) win 64800 <[|tcp]> [flowlabel 0x8abc6] (len 32, hlim 52)
Jun 20 14:12:49.517783 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50213 > [Server.IP6].443: S 1191028748:1191028748(0) win 64800 <[|tcp]> [flowlabel 0xa7d6] (len 32, hlim 52)

Jun 20 14:13:20.943370 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50213: S 3650589557:3650589557(0) ack 209077342 win 64800 <[|tcp]> [flowlabel 0xd922c] (len 32, hlim 64)
Jun 20 14:13:20.943433 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50212: S 2068945110:2068945110(0) ack 2313561433 win 64800 <[|tcp]> [flowlabel 0xf8c9c] (len 32, hlim 64)
Jun 20 14:13:20.943476 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50211: S 3395939328:3395939328(0) ack 1849611325 win 64800 <[|tcp]> [flowlabel 0xb519e] (len 32, hlim 64)
Jun 20 14:13:20.943518 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50210: S 106368970:106368970(0) ack 1534267447 win 64800 <[|tcp]> [flowlabel 0xca19a] (len 32, hlim 64)

(The rule 2 that is logged is the rule number of the relayd/* anchor.)
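The gap between each passed SYN and the corresponding blocked SYN-ACK can be computed directly from the pflog text. The following is a rough sketch, not a tested tool: the names LINE_RE, parse and synack_delays are made up for illustration, the placeholder year is an assumption (pflog timestamps carry none), and the sample lines are two of the pairs above with the trailing TCP details trimmed.

```python
import re
from datetime import datetime

# pflog timestamps carry no year, so a placeholder is prepended (assumption).
ASSUMED_YEAR = "2000"

# Captures the timestamp, the pf action and both endpoints of a log line.
LINE_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d\.\d+) .*?"
    r"(?P<action>pass|block) (?:in|out) on \S+: "
    r"(?P<src>\S+) > (?P<dst>\S+): "
)


def parse(line):
    """Return (timestamp, action, src, dst), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
    return ts, m["action"], m["src"], m["dst"]


def synack_delays(lines):
    """Map (client, server) to the seconds between a passed SYN and its blocked reply."""
    syn_seen = {}
    delays = {}
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, action, src, dst = rec
        if action == "pass":
            syn_seen[(src, dst)] = ts
        elif (dst, src) in syn_seen:
            # A blocked packet travelling in the reverse direction of a seen SYN.
            delays[(dst, src)] = (ts - syn_seen[(dst, src)]).total_seconds()
    return delays


SAMPLE = [
    "Jun 20 14:12:49.489707 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50210 > [Server.IP6].443: S",
    "Jun 20 14:12:49.517783 rule 2/(match) [uid 0, pid 85766] pass out on vlanX: [Client.IP6].50213 > [Server.IP6].443: S",
    "Jun 20 14:13:20.943370 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50213: S",
    "Jun 20 14:13:20.943518 rule 2/(match) [uid 0, pid 85766] block in on vlanX: [Server.IP6].443 > [Client.IP6].50210: S",
]

delays = synack_delays(SAMPLE)
for (client, server), seconds in delays.items():
    print(f"{client} -> {server}: reply blocked {seconds:.3f}s after SYN")
```

For the entries shown, the delay comes out at a little over 31 seconds per connection, which matches the "over 30 seconds" observation.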

tcpdump on vlanX shows the backend server sends the SYN-ACK immediately.

The IPv4 addresses are NATed from public to RFC 1918 space and work.

For IPv6, the address of backend server.A is used as the public IP (service.pub). Packets are redirected to server.B only if server.A becomes unavailable.

relayd.conf:
...
table <server.A> {
   Server.A.IP6 retry 2
}
table <server.B> {
   Server.B.IP6 retry 2
}
redirect "service.pub.80.v6" {
  listen on Server.A.IP6 tcp port 80 interface trunk0
  forward to <server.A> port 80 \
    check http "/" host "server.A" code 200
  forward to <server.B> port 80 \
    check http "/" host "server.B" code 200
}
redirect "service.pub.443.v6" {
  listen on Server.A.IP6 tcp port 443 interface trunk0
  forward to <server.A> port 443 \
    check https "/" host "server.A" code 200
  forward to <server.B> port 443 \
    check https "/" host "server.B" code 200
}

I am not 100% sure that the IPv6 failover actually worked before, but the connections to Server.A.IP6 were definitely working.
I do see the http and https checks succeed on both backend servers.

I've tried flushing the states and rebooting the firewall, to no avail.

relayctl shows all redirects/tables as active and all hosts as up:

2       redirect        service.pub.80.v6      active
3       table           server.A:80            active (1 hosts)
3       host            Server.A.IP6   100.00% up
4       table           server.B:80            active (1 hosts)
4       host            Server.B.IP6   100.00% up

3       redirect        service.pub.443.v6     active
5       table           server.A:443           active (1 hosts)
5       host            Server.A.IP6   100.00% up
6       table           server.B:443           active (1 hosts)
6       host            Server.B.IP6   100.00% up


Now I'm out of ideas on how to debug this further.

Has anyone been experiencing something similar?
Has something fundamental changed in relayd or pf that could cause this?
Does anybody spot an error in my configuration?

Thanks for any pointers!

Best regards
Markus
