[ovs-dev] [PATCH ovn 2/2] pinctrl: reset success and failure counters regardless of svc state
This patch resets both counters regardless of the svc state, and does so
only when the relevant counter reaches its configured threshold.  The
health check with a Windows server as a load-balancer backend shows an
issue with the counters, e.g.:

  ovn-controller --> win_srv | SYN
  ovn-controller <-- win_srv | SYN_ACK                      <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  n_success == success_count => status = online; n_success = 0
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
  increase n_failures (count = 1)

After wait_time:

  ovn-controller --> win_srv | SYN
  ovn-controller <-- win_srv | SYN_ACK                      <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  n_success == success_count => status = online; n_success = 0
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
  increase n_failures (count = 2)

After wait_time:

  ovn-controller --> win_srv | SYN
  ovn-controller <-- win_srv | SYN_ACK                      <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
  n_success == success_count => status = online; n_success = 0
  ovn-controller --> win_srv | RST_ACK
  ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
  increase n_failures (count = 3)
  n_failures == failure_count => status = offline; n_failures = 0

So the main point is that svc resets the counters only for the current
state: n_success is reset once n_success >= success_count, but if a
backend of the load balancer for some reason sends RST to ovn-controller,
it only increments n_failures, which is reset only when the state is
offline and n_failures is greater than or equal to failure_count.  The
same applies to SYN_ACK.

Signed-off-by: Evgenii Kovalev
---
 controller/pinctrl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f98f6d70c..158e0ee75 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn,
         if (svc_mon->n_success >= svc_mon->success_count) {
             svc_mon->status = SVC_MON_ST_ONLINE;
             svc_mon->n_success = 0;
+            svc_mon->n_failures = 0;
         }
+
         if (current_time >= svc_mon->next_send_time) {
             svc_monitor_send_health_check(swconn, svc_mon);
             next_run_time = svc_mon->wait_time;
@@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn,
     case SVC_MON_S_OFFLINE:
         if (svc_mon->n_failures >= svc_mon->failure_count) {
             svc_mon->status = SVC_MON_ST_OFFLINE;
+            svc_mon->n_success = 0;
             svc_mon->n_failures = 0;
         }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 1/2] pinctrl: send RST instead of RST_ACK bit for lb hc
Currently ovn-controller sends RST_ACK after receiving SYN_ACK in the
load balancer health check, but for a Windows load-balancer backend this
behavior is unexpected: Windows resends the SYN_ACK as a TCP
retransmission after every RST_ACK and, after a few retries, finally
sends a RST to ovn-controller.  In this case the svc_mon status flaps
between online and offline, which triggers ovn-northd on every change
and increases ovn-northd CPU usage to 100%.

This patch fixes the behavior for Windows, and does not affect Linux, by
sending RST instead of RST_ACK after a SYN_ACK is received.

RFC 1122, section 4.2.2.13, describes the ways of closing a connection:
https://datatracker.ietf.org/doc/html/rfc1122#page-87

  A TCP connection may terminate in two ways: (1) the normal TCP close
  sequence using a FIN handshake, and (2) an "abort" in which one or
  more RST segments are sent and the connection state is immediately
  discarded.

For example, the nmap tool uses the same RST logic when scanning ports.

Signed-off-by: Evgenii Kovalev
---
 controller/pinctrl.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 3c1cecfde..f98f6d70c 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7818,7 +7818,6 @@ pinctrl_handle_tcp_svc_check(struct rconn *swconn,
         return false;
     }

-    uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
     uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));

     if (th->tcp_dst != svc_mon->tp_src) {
@@ -7836,9 +7835,9 @@ pinctrl_handle_tcp_svc_check(struct rconn *swconn,
     svc_mon->state = SVC_MON_S_ONLINE;

     /* Send RST-ACK packet. */
-    svc_monitor_send_tcp_health_check__(swconn, svc_mon, TCP_RST | TCP_ACK,
-                                        htonl(tcp_ack + 1),
-                                        htonl(tcp_seq + 1), th->tcp_dst);
+    svc_monitor_send_tcp_health_check__(swconn, svc_mon, TCP_RST,
+                                        htonl(tcp_ack),
+                                        htonl(0), th->tcp_dst);

     /* Calculate next_send_time. */
     svc_mon->next_send_time = time_msec() + svc_mon->interval;
     return true;
Re: [ovs-dev] [ovn] pinctrl: unexpected behavior LB health check with windows server
On 18.10.2023 19:49, Evgenii Kovalev wrote:
> [original report quoted in full; see the original message in this thread]

Gentle ping. I'd appreciate your advice.

--
Regards,
Evgenii Kovalev
[ovs-dev] [ovn] pinctrl: unexpected behavior LB health check with windows server
Hi All,

We met unexpected behavior of the health check in a load balancer with a
Windows server as backend, for example:

  ovn-controller --> win_srv  SYN
  ovn-controller <-- win_srv  SYN_ACK
  ovn-controller --> win_srv  RST_ACK
  ovn-controller <-- win_srv  SYN_ACK (TCP retransmission)
  ovn-controller --> win_srv  RST_ACK
  ovn-controller <-- win_srv  SYN_ACK (TCP retransmission)
  ovn-controller --> win_srv  RST_ACK
  ovn-controller <-- win_srv  RST

Dump where 172.20.2.2 is ovn-controller and 172.20.2.49 is the Windows
server backend:

  17:18:20.324464 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [S], seq 2289025863, win 65160, length 0
  17:18:20.324572 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
  17:18:20.325233 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
  17:18:23.336091 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
  17:18:23.336559 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
  17:18:29.335992 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
  17:18:29.336423 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
  17:18:41.335919 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [R], seq 2447380614, win 0, length 0

Linux is not affected, because a Linux backend simply ignores the
RST_ACK and does not retransmit from the client side; it looks like:

  ovn-controller --> linux_srv  SYN
  ovn-controller <-- linux_srv  SYN_ACK
  ovn-controller --> linux_srv  RST_ACK

The Linux client did not retransmit.

The main issue: the status in svc_mon flaps between online and offline
because of this Windows server backend behavior.  Every status change in
svc_mon triggers ovn-northd and increases ovn-northd CPU usage to 100%.

I checked this on a Windows server with a simple HTTP server as backend;
the behavior reproduced, and it looks like all Windows backends are
affected.

I want to make a patch to fix this behavior, but I don't know which
method is preferred:

1) Add a new boolean field to struct svc_mon.  After svc_monitor_run()
   is called with the init/online/offline cases, set this field in
   svc_monitor_send_health_check().  When process_packet_in() is called
   in the ACTION_OPCODE_HANDLE_SVC_CHECK case, we change the boolean
   field and send RST_ACK to the client if the packet_in has the SYN_ACK
   flag, or, based on the new boolean field, decide whether or not to
   change svc_mon->state if the packet_in has the RST flag.  The
   function where the decision is made:
   https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L7810-L7858

2) Implement an established TCP connection, for example:

     ovn-controller --> win_srv  SYN
     ovn-controller <-- win_srv  SYN_ACK
     ovn-controller --> win_srv  ACK
     ovn-controller --> win_srv  FIN_ACK
     ovn-controller <-- win_srv  FIN_ACK
     ovn-controller --> win_srv  ACK

3) Send RST instead of RST_ACK.

It would be great if you could advise which method is better, or whether
I should fix it another way.  Thanks.

--
Regards,
Evgenii Kovalev
[ovs-dev] [PATCH ovn] nbctl: Add optional ha-chassis-group arg to ha-chassis-group-list
This commit adds support for an optional ha-chassis-group argument to
the ha-chassis-group-list command to show a single record.  Also add a
few tests to validate the output of ha-chassis-group-list.

Signed-off-by: Evgenii Kovalev
---
 tests/ovn-nbctl.at        | 53 +
 utilities/ovn-nbctl.8.xml |  8 +++---
 utilities/ovn-nbctl.c     | 55 +++
 3 files changed, 96 insertions(+), 20 deletions(-)

diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
index fde3a28ee..7206d1fe5 100644
--- a/tests/ovn-nbctl.at
+++ b/tests/ovn-nbctl.at
@@ -2741,3 +2741,56 @@ cp $PKIDIR/$cert $cert
 OVS_WAIT_UNTIL([ovn-appctl -t ovn-nbctl run show])

 AT_CLEANUP
+
+dnl -
+
+AT_SETUP([ovn-nbctl - ha-chassis-group-list group])
+OVN_NBCTL_TEST_START daemon
+check ovn-nbctl ha-chassis-group-add chg1
+chg1uuid=$(fetch_column nb:HA_Chassis_Group _uuid name=chg1)
+check ovn-nbctl ha-chassis-group-add chg2
+chg2uuid=$(fetch_column nb:HA_Chassis_Group _uuid name=chg2)
+check ovn-nbctl ha-chassis-group-add-chassis chg1 hv1 1
+check ovn-nbctl ha-chassis-group-add-chassis chg1 hv2 2
+check ovn-nbctl ha-chassis-group-add-chassis chg2 hv3 3
+check ovn-nbctl ha-chassis-group-add-chassis chg2 hv4 4
+AT_CHECK([ovn-nbctl ha-chassis-group-list], [0], [ignore])
+
+AT_CHECK_UNQUOTED([ovn-nbctl ha-chassis-group-list $chg1uuid | grep chg1 | awk '{print $1}'], [0], [dnl
+$chg1uuid
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep chg], [0], [dnl
+(chg1)
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep -A1 hv1], [0], [dnl
+(hv1)
+1
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep -A1 hv2], [0], [dnl
+(hv2)
+2
+])
+
+AT_CHECK_UNQUOTED([ovn-nbctl ha-chassis-group-list $chg2uuid | grep chg2 | awk '{print $1}'], [0], [dnl
+$chg2uuid
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep chg], [0], [dnl
+(chg2)
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep -A1 hv3], [0], [dnl
+(hv3)
+3
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep -A1 hv4], [0], [dnl
+(hv4)
+4
+])
+
+AT_CHECK([ovn-nbctl ha-chassis-group-list negative], [1], [], [dnl
+ovn-nbctl: negative: ha_chassis_group name not found
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list ----], [1], [], [dnl
+ovn-nbctl: ----: ha_chassis_group UUID not found
+])
+OVN_NBCTL_TEST_STOP "/terminating with signal 15/d"
+AT_CLEANUP
diff --git a/utilities/ovn-nbctl.8.xml b/utilities/ovn-nbctl.8.xml
index b4af43185..6f74bd557 100644
--- a/utilities/ovn-nbctl.8.xml
+++ b/utilities/ovn-nbctl.8.xml
@@ -1466,10 +1466,12 @@ group does not exist.
-      ha-chassis-group-list
+      ha-chassis-group-list [ha-chassis-group]

-        Lists the HA chassis group group along with the
-        HA chassis if any associated with it.
+        Lists all HA chassis groups along with the HA chassis
+        if any associated with it.
+        If ha-chassis-group is also specified, then only the
+        specified ha-chassis-group will be listed.

       ha-chassis-group-add-chassis group
diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
index 444fbd2fe..9029e8133 100644
--- a/utilities/ovn-nbctl.c
+++ b/utilities/ovn-nbctl.c
@@ -464,9 +464,9 @@ Port group commands:\n\
   pg-set-ports PG PORTS     Set PORTS on port group PG\n\
   pg-del PG                 Delete port group PG\n\
 HA chassis group commands:\n\
-  ha-chassis-group-add GRP  Create an HA chassis group GRP\n\
-  ha-chassis-group-del GRP  Delete the HA chassis group GRP\n\
-  ha-chassis-group-list     List the HA chassis groups\n\
+  ha-chassis-group-add GRP    Create an HA chassis group GRP\n\
+  ha-chassis-group-del GRP    Delete the HA chassis group GRP\n\
+  ha-chassis-group-list [GRP] Print the supplied HA chassis group or all if none supplied\n\
   ha-chassis-group-add-chassis GRP CHASSIS PRIORITY Adds an HA\
 chassis with mandatory PRIORITY to the HA chassis group GRP\n\
   ha-chassis-group-remove-chassis GRP CHASSIS Removes the HA chassis\
@@ -7260,7 +7260,7 @@ ha_chassis_group_by_name_or_uuid(struct ctl_context *ctx, const char *id,
     }

     if (!ha_ch_grp && must_exist) {
-        ctx->error = xasprintf("%s: ha_chassi_group %s not found",
+        ctx->error = xasprintf("%s: ha_chassis_group %s not found",
                                id, is_uuid ? "UUID" : "name");
     }

@@ -7306,23 +7306,44 @@ pre_ha_ch_grp_list(struct ctl_context *ctx)
 }

 static void
-cmd_ha_ch_grp_list(struct ctl_context *ctx)
+ha_ch_grp_info_print(struct ctl_context *ctx,
+                     const struct nbrec_ha_chassis_group *ha_ch_grp)
+{