Hi all
(Sorry for flooding the list; this seems related to the question I asked
earlier. Please bear with me.)
I am using relayd on OpenBSD 7.3-release as an IP load balancer in front
of some dual-stack backend hosts. This setup has worked for some years.
After upgrading to 7.3 about 4 weeks ago, I noticed a steady decline in
IPv6 sessions reaching the backend servers, to the point where none have
arrived at all for the past 2 days.
Users are now complaining that their connections to the servers
(public IP) either time out or are established only after a very long
delay (usually the TCP connect timeout, after which the client falls
back from IPv6 to IPv4). The IPv4 connections succeed immediately.
pflog shows that the IPv6 SYN-ACK replies from the backend servers are
being dropped by pf. Oddly, the blocks are logged more than 30 seconds
after the corresponding SYNs are allowed through:
Jun 20 14:12:49.489707 rule 2/(match) [uid 0, pid 85766] pass out on vlanX:
    [Client.IP6].50210 > [Server.IP6].443: S 2508622700:2508622700(0)
    win 64800 <[|tcp]> [flowlabel 0xd4400] (len 32, hlim 52)
Jun 20 14:12:49.493267 rule 2/(match) [uid 0, pid 85766] pass out on vlanX:
    [Client.IP6].50211 > [Server.IP6].443: S 806421981:806421981(0)
    win 64800 <[|tcp]> [flowlabel 0x162e5] (len 32, hlim 52)
Jun 20 14:12:49.507508 rule 2/(match) [uid 0, pid 85766] pass out on vlanX:
    [Client.IP6].50212 > [Server.IP6].443: S 3945655871:3945655871(0)
    win 64800 <[|tcp]> [flowlabel 0x8abc6] (len 32, hlim 52)
Jun 20 14:12:49.517783 rule 2/(match) [uid 0, pid 85766] pass out on vlanX:
    [Client.IP6].50213 > [Server.IP6].443: S 1191028748:1191028748(0)
    win 64800 <[|tcp]> [flowlabel 0xa7d6] (len 32, hlim 52)
Jun 20 14:13:20.943370 rule 2/(match) [uid 0, pid 85766] block in on vlanX:
    [Server.IP6].443 > [Client.IP6].50213: S 3650589557:3650589557(0)
    ack 209077342 win 64800 <[|tcp]> [flowlabel 0xd922c] (len 32, hlim 64)
Jun 20 14:13:20.943433 rule 2/(match) [uid 0, pid 85766] block in on vlanX:
    [Server.IP6].443 > [Client.IP6].50212: S 2068945110:2068945110(0)
    ack 2313561433 win 64800 <[|tcp]> [flowlabel 0xf8c9c] (len 32, hlim 64)
Jun 20 14:13:20.943476 rule 2/(match) [uid 0, pid 85766] block in on vlanX:
    [Server.IP6].443 > [Client.IP6].50211: S 3395939328:3395939328(0)
    ack 1849611325 win 64800 <[|tcp]> [flowlabel 0xb519e] (len 32, hlim 64)
Jun 20 14:13:20.943518 rule 2/(match) [uid 0, pid 85766] block in on vlanX:
    [Server.IP6].443 > [Client.IP6].50210: S 106368970:106368970(0)
    ack 1534267447 win 64800 <[|tcp]> [flowlabel 0xca19a] (len 32, hlim 64)
(The rule 2 that is logged is the rule number of the relayd/* anchor.)
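To put a number on that delay, here is a rough sketch that pairs each
passed SYN with the blocked SYN-ACK for the same client port and prints
the gap. The timestamps are copied from the pflog excerpt above; the year
is not in the log, so the 2023 used here is only a placeholder assumption
and does not affect the differences.

```python
from datetime import datetime

# Timestamps keyed by client port, copied from the pflog excerpt.
SYN_PASSED = {
    50210: "Jun 20 14:12:49.489707",
    50211: "Jun 20 14:12:49.493267",
    50212: "Jun 20 14:12:49.507508",
    50213: "Jun 20 14:12:49.517783",
}
SYNACK_BLOCKED = {
    50210: "Jun 20 14:13:20.943518",
    50211: "Jun 20 14:13:20.943476",
    50212: "Jun 20 14:13:20.943433",
    50213: "Jun 20 14:13:20.943370",
}

ASSUMED_YEAR = "2023"  # placeholder: pflog output carries no year


def parse(ts: str) -> datetime:
    """Parse a pflog-style timestamp, prefixing the placeholder year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {ts}", "%Y %b %d %H:%M:%S.%f")


def delays() -> dict:
    """Seconds between the passed SYN and the logged block, per client port."""
    return {
        port: (parse(SYNACK_BLOCKED[port]) - parse(SYN_PASSED[port])).total_seconds()
        for port in sorted(SYN_PASSED)
    }


if __name__ == "__main__":
    for port, gap in delays().items():
        print(f"client port {port}: block logged {gap:.3f} s after SYN")
```

All four gaps come out between 31 and 32 seconds, so the blocks are
logged as a batch roughly half a minute after the SYNs were passed.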
tcpdump on vlanX shows that the backend server sends the SYN-ACK
immediately.
The IPv4 addresses are NATed from public to RFC 1918 space and work.
For IPv6, the address of backend server.A is used as the public IP
(service.pub). Packets are redirected to server.B only if server.A
becomes unavailable.
relayd.conf:
...
table <server.A> {
        Server.A.IP6 retry 2
}
table <server.B> {
        Server.B.IP6 retry 2
}
redirect "service.pub.80.v6" {
        listen on Server.A.IP6 tcp port 80 interface trunk0
        forward to <server.A> port 80 \
                check http "/" host "server.A" code 200
        forward to <server.B> port 80 \
                check http "/" host "server.B" code 200
}
redirect "service.pub.443.v6" {
        listen on Server.A.IP6 tcp port 443 interface trunk0
        forward to <server.A> port 443 \
                check https "/" host "server.A" code 200
        forward to <server.B> port 443 \
                check https "/" host "server.B" code 200
}
I am not 100% sure that the IPv6 failover actually worked before, but
the connections to Server.A.IP6 were definitely working.
I do see the http and https checks succeed on both backend servers.
I've tried flushing the states and rebooting the firewall, to no avail.
relayctl shows all redirects/tables as active and all hosts as up:
2    redirect    service.pub.80.v6               active
3    table       server.A:80                     active (1 hosts)
3    host        Server.A.IP6           100.00%  up
4    table       server.B:80                     active (1 hosts)
4    host        Server.B.IP6           100.00%  up
3    redirect    service.pub.443.v6              active
5    table       server.A:443                    active (1 hosts)
5    host        Server.A.IP6           100.00%  up
6    table       server.B:443                    active (1 hosts)
6    host        Server.B.IP6           100.00%  up
I'm now out of ideas on how to debug this further.
Has anyone experienced something similar?
Has something fundamental changed in relayd or pf that could cause this?
Does anybody spot an error in my configuration?
Thanks for any pointers!
Best regards
Markus