Hi All,
We ran into unexpected health-check behavior in a load balancer with a
Windows server as the backend, for example:
ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv RST
A tcpdump capture where 172.20.2.2 is ovn-controller and 172.20.2.49 is
the Windows server backend:
17:18:20.324464 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [S], seq
2289025863, win 65160, length 0
17:18:20.324572 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq
2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:20.325233 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq
2, ack 1, win 65160, length 0
17:18:23.336091 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq
2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:23.336559 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq
2, ack 1, win 65160, length 0
17:18:29.335992 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq
2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:29.336423 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq
2, ack 1, win 65160, length 0
17:18:41.335919 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [R], seq
2447380614, win 0, length 0
Linux backends are not affected, because Linux simply ignores the
RST_ACK and does not retransmit the SYN_ACK; the exchange looks like:
ovn-controller --> linux_srv SYN
ovn-controller <-- linux_srv SYN_ACK
ovn-controller --> linux_srv RST_ACK
The Linux server does not make any TCP retransmissions.
The main issue:
With a Windows server backend, the status in svc_mon flaps between
online and offline. Every status change in svc_mon triggers
ovn-northd and drives its CPU usage up to 100%.
I reproduced this on a Windows server running a simple HTTP server as
the backend, so it looks like all Windows backends are affected.
I want to make a patch to fix this behavior, but I don't know which
approach is preferred:
1) Add a new boolean field to struct svc_mon. After svc_monitor_run()
is called for the init/online/offline cases, set this field in
svc_monitor_send_health_check(). When process_packet_in() is called for
the ACTION_OPCODE_HANDLE_SVC_CHECK case: if the packet_in has the
SYN_ACK flags, update the boolean field and send RST_ACK to the
backend; if the packet_in has the RST flag, decide based on the new
boolean field whether to change the svc_mon->state field.
The function where the decision is made:
https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L7810-L7858
2) Implement a fully established TCP connection, for example:
ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv ACK
ovn-controller --> win_srv FIN_ACK
ovn-controller <-- win_srv FIN_ACK
ovn-controller --> win_srv ACK
3) Send RST instead of RST_ACK
It would be great if you could advise which method is better, or
whether I should fix it in another way.
Thanks.
--
Regards,
Evgenii Kovalev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev