[ovs-dev] [PATCH ovn 2/2] pinctrl: reset success and failures n_count regardless of svc state

2023-11-07 Thread Evgenii Kovalev
This patch resets both the n_success and n_failures counters regardless of
the svc state, and does so only when a counter reaches its configured
threshold.

Health checks (hc) against a Windows server used as a load-balancer backend
show an issue with the counters, e.g.:

ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
    n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
    increase n_failures (count = 1)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
    n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
    increase n_failures (count = 2)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP retransmission) <- increase n_success
    n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv did not receive the ACK and sent RST)
    increase n_failures (count = 3)
    n_failures == failure_count => status = offline; n_failures = 0

So, the main point is that the svc monitor resets a counter only in its own
state's branch: n_success is cleared once it reaches success_count, but if a
load-balancer backend for some reason sends RST to ovn-controller, n_failures
just keeps accumulating across rounds and is only reset once the state goes
offline, i.e. once it reaches failure_count. The same applies to n_success
and SYN_ACK.

Signed-off-by: Evgenii Kovalev 
---
 controller/pinctrl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f98f6d70c..158e0ee75 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn,
             if (svc_mon->n_success >= svc_mon->success_count) {
                 svc_mon->status = SVC_MON_ST_ONLINE;
                 svc_mon->n_success = 0;
+                svc_mon->n_failures = 0;
             }
+
             if (current_time >= svc_mon->next_send_time) {
                 svc_monitor_send_health_check(swconn, svc_mon);
                 next_run_time = svc_mon->wait_time;
@@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn,
         case SVC_MON_S_OFFLINE:
             if (svc_mon->n_failures >= svc_mon->failure_count) {
                 svc_mon->status = SVC_MON_ST_OFFLINE;
+                svc_mon->n_success = 0;
                 svc_mon->n_failures = 0;
             }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/2] pinctrl: send RST instead of RST_ACK bit for lb hc

2023-11-07 Thread Evgenii Kovalev
Currently ovn-controller sends RST_ACK after receiving SYN_ACK during a
load-balancer health check, but for a Windows load-balancer backend this
behavior is unexpected: Windows re-sends the SYN_ACK as a TCP retransmission
after every RST_ACK and, after a few retries, finally sends RST to
ovn-controller. In this case the svc_mon status flaps between online and
offline, which triggers ovn-northd on every change and drives ovn-northd's
CPU usage to 100%.

This patch fixes the behavior for Windows backends, without affecting Linux
ones, by sending RST instead of RST_ACK after SYN_ACK is received.

RFC 1122, section 4.2.2.13, describes the ways a connection may be closed:
https://datatracker.ietf.org/doc/html/rfc1122#page-87
    A TCP connection may terminate in two ways: (1) the normal
    TCP close sequence using a FIN handshake, and (2) an "abort"
    in which one or more RST segments are sent and the
    connection state is immediately discarded.
For example, the nmap tool uses the same RST logic when scanning ports.

Signed-off-by: Evgenii Kovalev 
---
 controller/pinctrl.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 3c1cecfde..f98f6d70c 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7818,7 +7818,6 @@ pinctrl_handle_tcp_svc_check(struct rconn *swconn,
         return false;
     }

-    uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
     uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));

     if (th->tcp_dst != svc_mon->tp_src) {
@@ -7836,9 +7835,9 @@ pinctrl_handle_tcp_svc_check(struct rconn *swconn,
         svc_mon->state = SVC_MON_S_ONLINE;

         /* Send RST-ACK packet. */
-        svc_monitor_send_tcp_health_check__(swconn, svc_mon, TCP_RST | TCP_ACK,
-                                            htonl(tcp_ack + 1),
-                                            htonl(tcp_seq + 1), th->tcp_dst);
+        svc_monitor_send_tcp_health_check__(swconn, svc_mon, TCP_RST,
+                                            htonl(tcp_ack),
+                                            htonl(0), th->tcp_dst);
         /* Calculate next_send_time. */
         svc_mon->next_send_time = time_msec() + svc_mon->interval;
         return true;


Re: [ovs-dev] [ovn] pinctrl: unexpected behavior LB health check with windows server

2023-10-26 Thread Evgenii Kovalev



On 18.10.2023 19:49, Evgenii Kovalev wrote:

Hi All,

We ran into unexpected health-check behavior in a load balancer with a
Windows server as the backend, for example:

ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv SYN_ACK TCP Retransmission
ovn-controller --> win_srv RST_ACK
ovn-controller <-- win_srv RST

Dump where 172.20.2.2 is ovn-controller and 172.20.2.49 is the Windows server backend:
17:18:20.324464 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [S], seq 2289025863, win 65160, length 0
17:18:20.324572 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:20.325233 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:23.336091 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:23.336559 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:29.335992 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [S.], seq 2447380613, ack 2289025864, win 8192, options [mss 1460], length 0
17:18:29.336423 IP 172.20.2.2.29636 > 172.20.2.49.7045: Flags [R.], seq 2, ack 1, win 65160, length 0
17:18:41.335919 IP 172.20.2.49.7045 > 172.20.2.2.29636: Flags [R], seq 2447380614, win 0, length 0


Linux backends are not affected because Linux simply ignores the RST_ACK and
does not retransmit from the backend side; it looks like:

ovn-controller --> linux_srv SYN
ovn-controller <-- linux_srv SYN_ACK
ovn-controller --> linux_srv RST_ACK
The Linux server makes no TCP retransmission.

The main issue:
The status in svc_mon flaps between online and offline because of this
Windows server backend behavior. Every svc_mon status change triggers
ovn-northd and drives ovn-northd's CPU usage to 100%.


I checked this on a Windows server running a simple HTTP server as the
backend; the behavior reproduces there, so it looks like all Windows
backends are affected.


I want to make a patch to fix this behavior, but I don't know which method
is preferred:


1) Add a new boolean field to struct svc_mon. When svc_monitor_run() handles
the init/online/offline cases, set this field in
svc_monitor_send_health_check(). When process_packet_in() handles the
ACTION_OPCODE_HANDLE_SVC_CHECK case, toggle the boolean field and send
RST_ACK to the client if the packet_in carries SYN_ACK, or use the new
boolean field to decide whether or not to change svc_mon->state if the
packet_in carries RST.

The function where the decision is made:
https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L7810-L7858

2) Implement established tcp connection, for example:
ovn-controller --> win_srv SYN
ovn-controller <-- win_srv SYN_ACK
ovn-controller --> win_srv ACK
ovn-controller --> win_srv FIN_ACK
ovn-controller <-- win_srv FIN_ACK
ovn-controller --> win_srv ACK

3) Send RST instead of RST_ACK

It would be great if you could advise which method is better, or whether I
need to fix it another way.

Thanks.



Gentle ping.
I'd appreciate your advice.

--
Regards,
Evgenii Kovalev


[ovs-dev] [ovn] pinctrl: unexpected behavior LB health check with windows server

2023-10-18 Thread Evgenii Kovalev
(Original posting; its body is quoted in full in the 2023-10-26 reply above.)


[ovs-dev] [PATCH ovn] nbctl: Add optional ha-chassis-group arg to ha-chassis-group-list

2023-10-17 Thread Evgenii Kovalev
This commit adds support for an optional ha-chassis-group argument to the
ha-chassis-group-list command, to show a single record.

Also add a few tests to validate the output of ha-chassis-group-list.

Signed-off-by: Evgenii Kovalev 
---
 tests/ovn-nbctl.at| 53 +
 utilities/ovn-nbctl.8.xml |  8 +++---
 utilities/ovn-nbctl.c | 55 +++
 3 files changed, 96 insertions(+), 20 deletions(-)

diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
index fde3a28ee..7206d1fe5 100644
--- a/tests/ovn-nbctl.at
+++ b/tests/ovn-nbctl.at
@@ -2741,3 +2741,56 @@ cp $PKIDIR/$cert $cert
 OVS_WAIT_UNTIL([ovn-appctl -t ovn-nbctl run show])
 
 AT_CLEANUP
+
+dnl -
+
+AT_SETUP([ovn-nbctl - ha-chassis-group-list group])
+OVN_NBCTL_TEST_START daemon
+check ovn-nbctl ha-chassis-group-add chg1
+chg1uuid=$(fetch_column nb:HA_Chassis_Group _uuid name=chg1)
+check ovn-nbctl ha-chassis-group-add chg2
+chg2uuid=$(fetch_column nb:HA_Chassis_Group _uuid name=chg2)
+check ovn-nbctl ha-chassis-group-add-chassis chg1 hv1 1
+check ovn-nbctl ha-chassis-group-add-chassis chg1 hv2 2
+check ovn-nbctl ha-chassis-group-add-chassis chg2 hv3 3
+check ovn-nbctl ha-chassis-group-add-chassis chg2 hv4 4
+AT_CHECK([ovn-nbctl ha-chassis-group-list], [0], [ignore])
+
+AT_CHECK_UNQUOTED([ovn-nbctl ha-chassis-group-list $chg1uuid | grep chg1 | awk '{print $1}'], [0], [dnl
+$chg1uuid
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep chg], [0], [dnl
+(chg1)
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep -A1 hv1], [0], [dnl
+(hv1)
+1
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg1 | awk '{print $2}' | grep -A1 hv2], [0], [dnl
+(hv2)
+2
+])
+
+AT_CHECK_UNQUOTED([ovn-nbctl ha-chassis-group-list $chg2uuid | grep chg2 | awk '{print $1}'], [0], [dnl
+$chg2uuid
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep chg], [0], [dnl
+(chg2)
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep -A1 hv3], [0], [dnl
+(hv3)
+3
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list chg2 | awk '{print $2}' | grep -A1 hv4], [0], [dnl
+(hv4)
+4
+])
+
+AT_CHECK([ovn-nbctl ha-chassis-group-list negative], [1], [], [dnl
+ovn-nbctl: negative: ha_chassis_group name not found
+])
+AT_CHECK([ovn-nbctl ha-chassis-group-list ----], [1], [], [dnl
+ovn-nbctl: ----: ha_chassis_group UUID not found
+])
+OVN_NBCTL_TEST_STOP "/terminating with signal 15/d"
+AT_CLEANUP
diff --git a/utilities/ovn-nbctl.8.xml b/utilities/ovn-nbctl.8.xml
index b4af43185..6f74bd557 100644
--- a/utilities/ovn-nbctl.8.xml
+++ b/utilities/ovn-nbctl.8.xml
@@ -1466,10 +1466,12 @@
 group does not exist.
   
 
-  ha-chassis-group-list
+  ha-chassis-group-list [ha-chassis-group]
   
-Lists the HA chassis group group along with the
-HA chassis if any associated with it.
+Lists all HA chassis groups along with the HA chassis
+if any associated with it.
+If ha-chassis-group is also specified, then only the
+specified ha-chassis-group will be listed.
   
 
   ha-chassis-group-add-chassis group
diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
index 444fbd2fe..9029e8133 100644
--- a/utilities/ovn-nbctl.c
+++ b/utilities/ovn-nbctl.c
@@ -464,9 +464,9 @@ Port group commands:\n\
   pg-set-ports PG PORTS   Set PORTS on port group PG\n\
   pg-del PG   Delete port group PG\n\
 HA chassis group commands:\n\
-  ha-chassis-group-add GRP  Create an HA chassis group GRP\n\
-  ha-chassis-group-del GRP  Delete the HA chassis group GRP\n\
-  ha-chassis-group-list List the HA chassis groups\n\
+  ha-chassis-group-add GRP    Create an HA chassis group GRP\n\
+  ha-chassis-group-del GRP    Delete the HA chassis group GRP\n\
+  ha-chassis-group-list [GRP] Print the supplied HA chassis group or all if none supplied\n\
   ha-chassis-group-add-chassis GRP CHASSIS PRIORITY Adds an HA\
 chassis with mandatory PRIORITY to the HA chassis group GRP\n\
   ha-chassis-group-remove-chassis GRP CHASSIS Removes the HA chassis\
@@ -7260,7 +7260,7 @@ ha_chassis_group_by_name_or_uuid(struct ctl_context *ctx, const char *id,
 }
 
 if (!ha_ch_grp && must_exist) {
-        ctx->error = xasprintf("%s: ha_chassi_group %s not found",
+        ctx->error = xasprintf("%s: ha_chassis_group %s not found",
                                id, is_uuid ? "UUID" : "name");
 }
 
@@ -7306,23 +7306,44 @@ pre_ha_ch_grp_list(struct ctl_context *ctx)
 }
 
 static void
-cmd_ha_ch_grp_list(struct ctl_context *ctx)
+ha_ch_grp_info_print(struct ctl_context *ctx, const struct nbrec_ha_chassis_group *ha_ch_grp)
+{