Hi All,

We ran into unexpected health-check behavior in the load balancer when the backend is a Windows server, for example:
ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv RST

Dump, where 172.20.2.2 is ovn-controller and 172.20.2.49 is the Windows server backend:

17:18:20.324464 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [S], seq 2289025863, win 65160, length 0
17:18:20.324572 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:20.325233 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:23.336091 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:23.336559 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:29.335992 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:29.336423 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:41.335919 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [R], seq 2447380614, win 0, length 0

Linux is not affected, because a Linux backend simply ignores the RST_ACK and does not retransmit the SYN_ACK. It looks like:
ovn-controller --> linux_srv SYN
ovn-controller <-- linux_srv SYN_ACK
ovn-controller --> linux_srv RST_ACK
The Linux server does not make any TCP retransmissions.

The main issue:
The status in svc_mon flaps between online and offline because of this Windows backend behavior. Every status change in svc_mon triggers ovn-northd and drives its CPU usage up to 100%.

I reproduced this with a simple HTTP server as the backend on a Windows server, so it looks like all Windows backends are affected.

I want to make a patch to fix this behavior, but I don't know which approach is preferred:

1) Add a new boolean field to struct svc_mon. After svc_monitor_run() is called for the init/online/offline cases, set this field in svc_monitor_send_health_check(). Then, when process_packet_in() handles the ACTION_OPCODE_HANDLE_SVC_CHECK case: if the packet_in has the SYN_ACK flags set, flip the boolean field and send an RST_ACK to the backend; if the packet_in has the RST flag set, use the new boolean field to decide whether or not to change svc_mon->state.
The function where the decision is made:
https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L7810-L7858

2) Implement a fully established TCP connection with a graceful close, for example:
ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv ACK
ovn-controller --> win_srv FIN_ACK
ovn-controller <-- win_srv FIN_ACK
ovn-controller --> win_srv ACK

3) Send an RST instead of an RST_ACK.

It would be great if you could advise which method is better, or whether I should fix it in another way.
Thanks.

--
Regards,
Evgenii Kovalev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
