[ovs-dev] [PATCH ovn] northd: Allow need frag to be SNATed

2023-09-19 Thread Ales Musil
Considering following topology:
client - sw0 - lrp0 - lr - lrp1 - sw1 - server
sw0 in subnet 192.168.0.0/24
sw1 in subnet 172.168.0.0/24
SNAT configured for sw0 subnet
gateway_mtu=1400 configured for lrp0
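
A rough sketch of how this configuration could be applied (the SNAT
external address is illustrative and not part of the patch):

  # SNAT the sw0 subnet behind an assumed external address on lr.
  ovn-nbctl lr-nat-add lr snat 172.168.0.100 192.168.0.0/24
  # Enforce the 1400 byte MTU on lrp0.
  ovn-nbctl set logical_router_port lrp0 options:gateway_mtu=1400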

If we send UDP traffic from the client to the server
and the server responds with a packet bigger than 1400 bytes,
the following sequence happens:

1) Packet is coming into lr via lrp1
2) unSNAT
3) Routing, the outport is set to lrp0
4) The check for packets larger than gateway_mtu fails
5) We generate an ICMP "need frag" error

However, the last step is wrong from the server's
perspective. The ICMP message has the lrp1 IP address as
its source, which means SNAT won't be applied because that
source is not within the sw0 subnet, while the inner packet
carries a sw0 subnet address because it was unSNATed. As a
result, the server ignores the ICMP message, since it never
sent any packet to the sw0 subnet.

To fix this, use the outport IP address instead of the
inport one as the source of the ICMP error message. The
error packet then gets SNATed, which results in correct
addresses on the sw1 side.

Reported-at: https://issues.redhat.com/browse/FDP-39
Signed-off-by: Ales Musil 
---
 northd/northd.c  |  30 +---
 tests/ovn-northd.at  |  65 -
 tests/ovn.at |   6 +-
 tests/system-ovn-kmod.at | 152 +++
 4 files changed, 210 insertions(+), 43 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 2a1f23fb9..1d306bb47 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -13894,7 +13894,15 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap *lflows,
   outport->json_key)
   : NULL;
 
-    if (op->lrp_networks.ipv4_addrs) {
+    char *ip4_src = NULL;
+
+    if (outport && outport->lrp_networks.ipv4_addrs) {
+        ip4_src = outport->lrp_networks.ipv4_addrs[0].addr_s;
+    } else if (op->lrp_networks.ipv4_addrs) {
+        ip4_src = op->lrp_networks.ipv4_addrs[0].addr_s;
+    }
+
+    if (ip4_src) {
 ds_clear(match);
 ds_put_format(match, "inport == %s && %sip4 && "REGBIT_PKT_LARGER
   " && "REGBIT_EGRESS_LOOPBACK" == 0", op->json_key,
@@ -13914,9 +13922,8 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap *lflows,
 "icmp4.code = 4; /* Frag Needed and DF was Set. */ "
 "icmp4.frag_mtu = %d; "
 "next(pipeline=ingress, table=%d); };",
-op->lrp_networks.ea_s,
-op->lrp_networks.ipv4_addrs[0].addr_s,
-mtu, ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
+op->lrp_networks.ea_s, ip4_src, mtu,
+ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
 ovn_lflow_add_with_hint__(lflows, op->od, stage, 150,
   ds_cstr(match), ds_cstr(actions),
   NULL,
@@ -13927,7 +13934,15 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap *lflows,
   &op->nbrp->header_);
 }
 
-    if (op->lrp_networks.ipv6_addrs) {
+    char *ip6_src = NULL;
+
+    if (outport && outport->lrp_networks.ipv6_addrs) {
+        ip6_src = outport->lrp_networks.ipv6_addrs[0].addr_s;
+    } else if (op->lrp_networks.ipv6_addrs) {
+        ip6_src = op->lrp_networks.ipv6_addrs[0].addr_s;
+    }
+
+    if (ip6_src) {
 ds_clear(match);
 ds_put_format(match, "inport == %s && %sip6 && "REGBIT_PKT_LARGER
   " && "REGBIT_EGRESS_LOOPBACK" == 0", op->json_key,
@@ -13947,9 +13962,8 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu, struct hmap *lflows,
 "icmp6.code = 0; "
 "icmp6.frag_mtu = %d; "
 "next(pipeline=ingress, table=%d); };",
-op->lrp_networks.ea_s,
-op->lrp_networks.ipv6_addrs[0].addr_s,
-mtu, ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
+op->lrp_networks.ea_s, ip6_src, mtu,
+ovn_stage_get_table(S_ROUTER_IN_ADMISSION));
 ovn_lflow_add_with_hint__(lflows, op->od, stage, 150,
   ds_cstr(match), ds_cstr(actions),
   NULL,
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 13c8a0d42..044d9c2a2 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -6106,10 +6106,10 @@ AT_CHECK([grep -e "chk_pkt_len" -e "lr_in_larger_pkts" lr0flows | sed 's/table=.
   table=??(lr_in_chk_pkt_len  ), priority=0, match=(1), action=(next;)
   table=??(lr_in_chk_pkt_len  ), priority=50   , match=(outport == 
"lr0-public"), action=(reg9[[1]] = check_pkt_larger(1514); next;)
   table=??(lr_in_larger_pkts  ), priority=0, match=(1), action=(next;)
-  table=??(lr_in_larger_pkts  ), priority=150  , match=(inport == "lr0-sw0" && 
outport == "lr0-public" && ip4 && reg9[[1]] && reg9[[0]] == 0), 
action=(icmp4_error {reg9[[0]] = 1; reg9[

[ovs-dev] Re: [PATCH] netdev-dpdk: fix tso bug

2023-09-19 Thread Dexia Li via dev


-----Original Message-----
From: Mike Pattrick 
Sent: September 19, 2023 23:58
To: Dexia Li 
Cc: David Marchand ; ovs-dev@openvswitch.org; 
i.maxim...@ovn.org; Eelco Chaudron ; Qingmin Liu 
; Joey Xing 
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug

On Tue, Sep 19, 2023 at 7:51 AM Dexia Li  wrote:
>
> Hi, David
>
> Sorry, I am missing something too...
>
> Firstly, RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags implies 
> RTE_MBUF_F_TX_TCP_CKSUM in last letter.
> Some drivers such as ice can use RTE_MBUF_F_TX_TCP_SEG with 
> RTE_MBUF_F_TX_TCP_CKSUM but Some drivers such as Iavf will result in 
> driver hang when using this two flags meanwhile. Drivers like ice Can segment 
> tcp packets when only RTE_MBUF_F_TX_TCP_SEG flag is set.
>
> Secondly, testpmd csum engine support example, we can see as follows 
> RTE_MBUF_F_TX_TCP_CKSUM is unset When RTE_MBUF_F_TX_TCP_SEG flag is set. 
> Should ovs follow this example? I suggest so.
> if (tso_segsz)
> ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
> else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
> ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
> }
>
> Thirdly, I use ovs 2.17 and dpdk 21.11 for test. In dpdk 21.11, 
> iavf_build_data_desc_cmd_offset_fields func may set 
> RTE_MBUF_F_TX_TCP_SEG and RTE_MBUF_F_TX_TCP_CKSUM. Code as follows
>
> if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> offset |= (m->l4_len >> 2) <<
>   IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> }
>
> /* Enable L4 checksum offloads */
> switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> case RTE_MBUF_F_TX_TCP_CKSUM:
> command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> break;
>
> and dpdk 22.11 only set RTE_MBUF_F_TX_TCP_SEG truly.

This conflicts with the Mbuf lib programming guide, which includes an example 
with both flags set:
https://doc.dpdk.org/guides/prog_guide/mbuf_lib.html

Other locations in the DPDK documentation suggest that both flags can or should 
be set, that they're redundant, or one implies the other. I can't find anywhere 
that suggests they are exclusive.


Thanks,
Mike


I don't mean they're exclusive. I think setting only the RTE_MBUF_F_TX_TCP_SEG
flag provides better compatibility.
However, we could also view this as a bug in the DPDK 21.11 iavf driver when
both RTE_MBUF_F_TX_TCP_SEG and RTE_MBUF_F_TX_TCP_CKSUM are set at the same
time; it has been corrected in DPDK 22.11.

OK, I accept your point of view.

Thanks
Dexia

>
> I think, upper application like ovs should provide compatibility for various 
> drivers.
> Only setting RTE_MBUF_F_TX_TCP_SEG can support compatibility for various 
> drivers tso fuction.
>
>
>
> -----Original Message-----
> From: Dexia Li
> Sent: September 18, 2023 17:47
> To: David Marchand 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
> ; Eelco Chaudron 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug
>
> Hi,david
>
> In dpdk 22.11 rte_mbuf_core.h file, there are notes as follows:
>
> /**
>  * TCP segmentation offload. To enable this offload feature for a
>  * packet to be transmitted on hardware supporting TSO:
>  *  - set the RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>  *RTE_MBUF_F_TX_TCP_CKSUM)
>  *  - set the flag RTE_MBUF_F_TX_IPV4 or RTE_MBUF_F_TX_IPV6
>  *  - if it's IPv4, set the RTE_MBUF_F_TX_IP_CKSUM flag
>  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, 
> tso_segsz  */
>
> And also, in testpmd example csum engine file csumonly.c, we can see  
> func process_inner_cksums partial code as follows
>
> if (tso_segsz)
> ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
> else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
> ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
> } else {
> if (info->is_tunnel)
> l4_off = info->outer_l2_len + 
> info->outer_l3_len +
>  info->l2_len + info->l3_len;
> else
> l4_off = info->l2_len + info->l3_len;
> tcp_hdr->cksum = 0;
> tcp_hdr->cksum =
> get_udptcp_checksum(m, l3_hdr, l4_off,
> info->ethertype);
> }
>
> We can see when RTE_MBUF_F_TX_TCP_SEG is set, and 
> RTE_ETH_TX_OFFLOAD_TCP_CKSUM is not set.
>
> Best regards
>
> dexia
>
>
> -----Original Message-----
> From: David Marchand 
> Sent: September 14, 2023 19:23
> To: Dexia Li 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
> ; Eelco Chaudron 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug
>
> Hello,
>
> 

Re: [ovs-dev] [PATCH ovn] Fix missing flows in ls_in_dhcp_options table

2023-09-19 Thread Mark Michelson

Thanks Xavier,

Acked-by: Mark Michelson 

On 9/18/23 12:46, Xavier Simonart wrote:

When ha-chassis-group is updated, causing a new hv to become master,
the port moves to a different chassis and flows get created in table
ls_in_dhcp_options.
If pb->chassis is reported empty by the SB (because it was set to [] by
the hv that previously owned the port), those flows were deleted.
However, when pb->chassis was finally updated by the SB (to the new hv),
the flows were not re-installed, as runtime_data was not seen as updated.

This caused some random failures in test "external logical port".

Signed-off-by: Xavier Simonart 
---
  controller/binding.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/controller/binding.c b/controller/binding.c
index fd08aaafa..d48fd99aa 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1362,6 +1362,7 @@ claim_lport(const struct sbrec_port_binding *pb,
  register_claim_timestamp(pb->logical_port, now);
  sset_find_and_delete(postponed_ports, pb->logical_port);
  } else {
+update_tracked = true;
  if (!notify_up) {
  if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
  return false;


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] tests: offload scapy transformations to a separate unixctl daemon

2023-09-19 Thread Mark Michelson

Thank you Ihar and Ales.

I pushed the change to main and all branches back to 22.03.

The speedup is amazing even with the daemon restarting for each test. 
Out of curiosity, I tried to determine how we could run the scapy daemon 
globally for the entire testsuite run. In my (very brief) look at 
autotest and our makefiles, the only thing that made sense to me was to 
update the "check-local" Makefile target in tests/automake.mk to invoke 
something that can start/stop the scapy daemon. I have no idea if this 
is the "proper" way to do it though.


On 9/15/23 03:36, Ales Musil wrote:

On Thu, Sep 14, 2023 at 9:56 PM Ihar Hrachyshka  wrote:


The daemon life cycle spans over the whole test case life time, which
significantly speeds up test cases that rely on fmt_pkt for packet byte
representations.

The speed-up comes from the fact that we no longer start a Python
environment with all scapy modules imported on every fmt_pkt invocation,
but only on the first call to fmt_pkt.

The daemon is not started for test cases that don't trigger fmt_pkt.
(The server is started lazily as part of fmt_pkt call.)
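
For reference, a typical fmt_pkt invocation in a test looks roughly like
this (port name and packet contents are illustrative):

  packet=$(fmt_pkt "Ether(dst='ff:ff:ff:ff:ff:ff', src='50:54:00:00:00:01')/ \
                    IP(src='10.0.0.1', dst='10.0.0.2')/UDP(sport=1234, dport=5678)")
  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet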

For example, without the daemon, all tests that use fmt_pkt took the
following on my vagrant box:

real17m23.092s
user26m27.935s
sys 5m25.486s

With the daemon, the same set of tests run in:

real2m16.741s
user2m40.155s
sys 0m47.514s

We may want to make the daemon global, so that it's invoked once per
test suite run. But I haven't figured out yet how to run a trap to
clean up the daemon and its socket and pid files on suite exit (and not
on test case exit).

Signed-off-by: Ihar Hrachyshka 

unixctl impl
---
  tests/automake.mk   |  3 +-
  tests/ovn-macros.at | 16 ---
  tests/scapy-server  | 69 +
  3 files changed, 83 insertions(+), 5 deletions(-)
  create mode 100755 tests/scapy-server

diff --git a/tests/automake.mk b/tests/automake.mk
index eea0d00f4..f23ec353e 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -310,7 +310,8 @@ CHECK_PYFILES = \
 tests/test-l7.py \
 tests/uuidfilt.py \
 tests/test-tcp-rst.py \
-   tests/check_acl_log.py
+   tests/check_acl_log.py \
+   tests/scapy-server

  EXTRA_DIST += $(CHECK_PYFILES)
  PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 13d5dc3d4..b1b2a8156 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -833,10 +833,18 @@ ovn_trace_client() {
  # ovs-appctl netdev-dummy/receive $vif $packet
  #
 fmt_pkt() {
-    echo "from scapy.all import *; \
-          import binascii; \
-          out = binascii.hexlify(raw($1)); \
-          print(out.decode())" | $PYTHON3
+    ctlfile=$ovs_base/scapy.ctl
+    if [[ ! -e $ctlfile ]]; then
+        start_scapy_server
+    fi
+    echo $(ovs-appctl -t $ctlfile payload "$1")
+}
+
+start_scapy_server() {
+    pidfile=$ovs_base/scapy.pid
+    ctlfile=$ovs_base/scapy.ctl
+    "$top_srcdir"/tests/scapy-server --pidfile=$pidfile --unixctl=$ctlfile --detach
+    on_exit "test -e \"$pidfile\" && ovs-appctl -t $ctlfile exit"
 }

  sleep_sb() {
diff --git a/tests/scapy-server b/tests/scapy-server
new file mode 100755
index 0..a7255c84d
--- /dev/null
+++ b/tests/scapy-server
@@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+
+import argparse
+
+import ovs.daemon
+import ovs.unixctl
+import ovs.unixctl.server
+
+import binascii
+from scapy.all import *  # noqa: F401,F403
+from scapy.all import raw
+
+
+vlog = ovs.vlog.Vlog("scapy-server")
+exiting = False
+
+
+def exit(conn, argv, aux):
+    global exiting
+
+    exiting = True
+    conn.reply(None)
+
+
+def process(data):
+    try:
+        data = data.replace('\n', '')
+        return binascii.hexlify(raw(eval(data))).decode()
+    except Exception:
+        return ""
+
+
+def payload(conn, argv, aux):
+    conn.reply(process(argv[0]))
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Scapy-based Frame Payload Generator")
+    parser.add_argument("--unixctl", help="UNIXCTL socket location or 'none'.")
+
+    ovs.daemon.add_args(parser)
+    ovs.vlog.add_args(parser)
+    args = parser.parse_args()
+    ovs.daemon.handle_args(args)
+    ovs.vlog.handle_args(args)
+
+    ovs.daemon.daemonize_start()
+    error, server = ovs.unixctl.server.UnixctlServer.create(args.unixctl)
+    if error:
+        ovs.util.ovs_fatal(error, "could not create unixctl server at %s"
+                           % args.unixctl, vlog)
+
+    ovs.unixctl.command_register("exit", "", 0, 0, exit, None)
+    ovs.unixctl.command_register("payload", "", 1, 1, payload, None)
+    ovs.daemon.daemonize_complete()
+
+    poller = ovs.poller.Poller()
+    while not exiting:
+        server.run()
+        server.wait(poller)
+        if exiting:
+            poller.immediate_wake()
+        poller.block()
+    server.close()
+
+
+if __name__ == '__main__':
+    main()
--
2.38.1

___

Re: [ovs-dev] [PATCH ovn] pinctrl: Reply with correct destination for ICMPv6 RA packets

2023-09-19 Thread Mark Michelson

Thanks Ales,

Acked-by: Mark Michelson 

On 9/15/23 08:15, Ales Musil wrote:

When a client sends an RS, we reply with an RA to the source address;
however, the client is allowed to send an RS with the unspecified
source address ("::"). In that case we would send the RA to
that unspecified address, which is not correct.

According to RFC 2461 [0] the server is allowed to reply to the
source address directly only if the address isn't unspecified:

A router MAY choose to unicast the
response directly to the soliciting host's address (if the
solicitation's source address is not the unspecified address)

Make sure we change the source to the all-nodes address when it
is unspecified.
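
For illustration, the kind of solicitation this covers looks roughly like
this in scapy terms (addresses are only an example):

  # RS from a host that has not configured any source address yet.
  rs = Ether(dst='33:33:00:00:00:02', src='50:54:00:00:00:10') / \
       IPv6(src='::', dst='ff02::2') / ICMPv6ND_RS()

The RA built in reply can then no longer be unicast back to the source,
so it is sent to the all-nodes address ff02::1 instead.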

[0] https://www.ietf.org/rfc/rfc2461.txt
Reported-at: https://issues.redhat.com/browse/FDP-43
Signed-off-by: Ales Musil 
---
  controller/pinctrl.c |   6 ++
  tests/ovn.at | 230 +--
  2 files changed, 116 insertions(+), 120 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index ff5a3444c..3c1cecfde 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -6226,6 +6226,12 @@ pinctrl_handle_put_nd_ra_opts(
  
  /* Set the IPv6 payload length and calculate the ICMPv6 checksum. */

  struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(&pkt_out);
+
+    /* Set the source to "ff02::1" if the original source is "::". */
+    if (!memcmp(&nh->ip6_src, &in6addr_any, sizeof in6addr_any)) {
+        memcpy(&nh->ip6_src, &in6addr_all_hosts, sizeof in6addr_all_hosts);
+    }
+
  nh->ip6_plen = htons(userdata->size);
  struct ovs_ra_msg *ra = dp_packet_l4(&pkt_out);
  ra->icmph.icmp6_cksum = 0;
diff --git a/tests/ovn.at b/tests/ovn.at
index e127530f6..34c3eb06f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -12643,76 +12643,122 @@ ovs-vsctl -- add-port br-int hv1-vif3 -- \
  wait_for_ports_up
  check ovn-nbctl --wait=hv sync
  
+n_resume=1

+
  # Make sure that ovn-controller has installed the corresponding OF Flow.
  OVS_WAIT_UNTIL([test 1 = `as hv1 ovs-ofctl dump-flows br-int | grep -c 
"ipv6_dst=ff02::2,nw_ttl=255,icmp_type=133,icmp_code=0"`])
  
  # This shell function sends a Router Solicitation packet.

-# test_ipv6_ra INPORT SRC_MAC SRC_LLA ADDR_MODE MTU RA_PREFIX_OPT RDNSS DNSSL 
ROUTE_INFO
+# test_ipv6_ra INPORT SRC_MAC SRC_LLA RA_OPT MTU PREFIX RDNSS DNSSL ROUTES
  test_ipv6_ra() {
-local inport=$1 src_mac=$2 src_lla=$3 addr_mode=$4 mtu=$5 prefix_opt=$6
-local rdnss=$7 dnssl=$8 route_info=$9
-local 
request=0002${src_mac}86dd60103aff${src_lla}ff02000285000efc0101${src_mac}
+local inport=$1 src_mac=$2 src_ip=$3 ra=$4 mtu=$5 prefix=$6
+local rdnss=$7 dnssl=$8 routes=$9
  
-local len=24

-local mtu_opt=""
-if test $mtu != 0; then
-len=`expr $len + 8`
-mtu_opt=0501${mtu}
+local request=$(fmt_pkt "Ether(dst='33:33:00:00:00:02', src='${src_mac}')/ 
\
+ IPv6(dst='ff02::2', src='${src_ip}')/ \
+ ICMPv6ND_RS()/ \
+ ICMPv6NDOptSrcLLAddr(lladdr='${src_mac}')")
+
+local reply_dst=$src_ip
+if test "${reply_dst}" = "::"; then
+reply_dst="ff02::1"
  fi
  
-if test ${#rdnss} != 0; then

-len=`expr $len + ${#rdnss} / 2`
+local rep_scapy="Ether(dst='${src_mac}', src='fa:16:3e:00:00:01')/ \
+ IPv6(dst='${reply_dst}', src='fe80::f816:3eff:fe00:1')/ \
+ ${ra}/ \
+ ICMPv6NDOptSrcLLAddr(lladdr='fa:16:3e:00:00:01')"
+
+if test "${mtu}" != "0"; then
+rep_scapy="${rep_scapy}/ICMPv6NDOptMTU(mtu=${mtu})"
+fi
+
+if test -n "${rdnss}"; then
+rep_scapy="${rep_scapy}/ICMPv6NDOptRDNSS(dns=[['${rdnss}']])"
  fi
  
-if test ${#dnssl} != 0; then

-len=`expr $len + ${#dnssl} / 2`
+if test -n "${dnssl}"; then
+rep_scapy="${rep_scapy}/ICMPv6NDOptDNSSL(searchlist=[['${dnssl}']])"
  fi
  
-if test ${#route_info} != 0; then

-len=`expr $len + ${#route_info} / 2`
+if test -n "${routes}"; then
+rep_scapy="${rep_scapy}/${routes}"
  fi
  
-if test ${#prefix_opt} != 0; then

-prefix_opt=${prefix_opt}fdad12345678
-len=`expr $len + ${#prefix_opt} / 2`
+if test -n "${prefix}"; then
+local a_flag=$(echo $ra | grep -vc "M=1")
+rep_scapy="${rep_scapy}/ICMPv6NDOptPrefixInfo(prefix='${prefix}', 
A=${a_flag})"
  fi
  
-len=$(printf "%x" $len)

-local lrp_mac=fa163e01
-local lrp_lla=fe80f8163efffe01
-local 
reply=${src_mac}${lrp_mac}86dd60${len}3aff${lrp_lla}${src_lla}8600ff${addr_mode}0101${lrp_mac}${mtu_opt}${rdnss}${dnssl}${route_info}${prefix_opt}
+local reply=$(fmt_pkt "${rep_scapy}")
  echo $reply >> $inport.expected
  
  as hv1 ovs-appctl netdev-dummy/receive hv1-vif${inport} $request

+
+OVS_WAI

Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug

2023-09-19 Thread Mike Pattrick
On Tue, Sep 19, 2023 at 7:51 AM Dexia Li  wrote:
>
> Hi, David
>
> Sorry, I am missing something too...
>
> Firstly, RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags implies 
> RTE_MBUF_F_TX_TCP_CKSUM in last letter.
> Some drivers such as ice can use RTE_MBUF_F_TX_TCP_SEG with 
> RTE_MBUF_F_TX_TCP_CKSUM but
> Some drivers such as Iavf will result in driver hang when using this two 
> flags meanwhile. Drivers like ice
> Can segment tcp packets when only RTE_MBUF_F_TX_TCP_SEG flag is set.
>
> Secondly, testpmd csum engine support example, we can see as follows 
> RTE_MBUF_F_TX_TCP_CKSUM is
> unset When RTE_MBUF_F_TX_TCP_SEG flag is set. Should ovs follow this example? 
> I suggest so.
> if (tso_segsz)
> ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
> else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
> ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
> }
>
> Thirdly, I use ovs 2.17 and dpdk 21.11 for test. In dpdk 21.11, 
> iavf_build_data_desc_cmd_offset_fields func may
> set RTE_MBUF_F_TX_TCP_SEG and RTE_MBUF_F_TX_TCP_CKSUM. Code as follows
>
> if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> offset |= (m->l4_len >> 2) <<
>   IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> }
>
> /* Enable L4 checksum offloads */
> switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> case RTE_MBUF_F_TX_TCP_CKSUM:
> command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> break;
>
> and dpdk 22.11 only set RTE_MBUF_F_TX_TCP_SEG truly.

This conflicts with the Mbuf lib programming guide, which includes an
example with both flags set:
https://doc.dpdk.org/guides/prog_guide/mbuf_lib.html

Other locations in the DPDK documentation suggest that both flags can
or should be set, that they're redundant, or one implies the other. I
can't find anywhere that suggests they are exclusive.
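
For completeness, the recipe from the mbuf API notes quoted elsewhere in
this thread amounts to roughly the following (sketch only; header sizes
and the segment size are illustrative):

    /* Request TSO; per the mbuf API this implies TCP checksum offload. */
    m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
                   RTE_MBUF_F_TX_TCP_SEG;
    m->l2_len = sizeof(struct rte_ether_hdr);
    m->l3_len = sizeof(struct rte_ipv4_hdr);
    m->l4_len = sizeof(struct rte_tcp_hdr);
    m->tso_segsz = 1448;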


Thanks,
Mike


>
> I think, upper application like ovs should provide compatibility for various 
> drivers.
> Only setting RTE_MBUF_F_TX_TCP_SEG can support compatibility for various 
> drivers tso fuction.
>
>
>
> -----Original Message-----
> From: Dexia Li
> Sent: September 18, 2023 17:47
> To: David Marchand 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
> ; Eelco Chaudron 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug
>
> Hi,david
>
> In dpdk 22.11 rte_mbuf_core.h file, there are notes as follows:
>
> /**
>  * TCP segmentation offload. To enable this offload feature for a
>  * packet to be transmitted on hardware supporting TSO:
>  *  - set the RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>  *RTE_MBUF_F_TX_TCP_CKSUM)
>  *  - set the flag RTE_MBUF_F_TX_IPV4 or RTE_MBUF_F_TX_IPV6
>  *  - if it's IPv4, set the RTE_MBUF_F_TX_IP_CKSUM flag
>  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz  */
>
> And also, in testpmd example csum engine file csumonly.c, we can see  func 
> process_inner_cksums partial code as follows
>
> if (tso_segsz)
> ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
> else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
> ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
> } else {
> if (info->is_tunnel)
> l4_off = info->outer_l2_len + 
> info->outer_l3_len +
>  info->l2_len + info->l3_len;
> else
> l4_off = info->l2_len + info->l3_len;
> tcp_hdr->cksum = 0;
> tcp_hdr->cksum =
> get_udptcp_checksum(m, l3_hdr, l4_off,
> info->ethertype);
> }
>
> We can see when RTE_MBUF_F_TX_TCP_SEG is set, and 
> RTE_ETH_TX_OFFLOAD_TCP_CKSUM is not set.
>
> Best regards
>
> dexia
>
>
> -----Original Message-----
> From: David Marchand 
> Sent: September 14, 2023 19:23
> To: Dexia Li 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
> ; Eelco Chaudron 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug
>
> Hello,
>
> On Wed, Sep 13, 2023 at 8:55 AM Dexia Li via dev  
> wrote:
> >
> > when userspace tso enabled, mbuf RTE_MBUF_F_TX_TCP_SEG and
> > RTE_MBUF_F_TX_TCP_CKSUM
> >
> > flag bits all positioned will result in driver hang using intel E810
> > vf
> >
> > driver iavf. As refered to dpdk csum example, RTE_MBUF_F_TX_TCP_SEG
> >
> > should only be positioned when tso is open.
> >
> > Signed-off-by: Dexia Li 
>
> Sorry, I am missing something...
>
> The mbuf API mentions that it is implied that RTE_MBUF_F_TX_TCP_CKSUM is set 
> along RTE_MBUF_F_TX_TCP_SEG.
> Meaning that an application may requ

Re: [ovs-dev] [PATCH ovn 00/15] Fixed another set of flaky Unit Tests

2023-09-19 Thread Mark Michelson

Thanks for the patch set Xavier.

With the exception of patch 13:

Acked-by: Mark Michelson 

I'm acking the patch set, but the nature of these changes worries me. I 
could easily see another patch series like this needed in the future 
because it's too easy to write flaky tests. Here are some high-level 
ideas I have based on changes I see in these patches.


1) From patches 12 and 15: Should we change ovn_start() so that by 
default it does not start a backup northd process? I suspect the number 
of tests that rely on a backup northd process are very small.


2) From patch 5 and possibly patch 14: Should we add a new --wait option 
to ovn-nbctl (e.g. "--wait=ovs") that waits for OVS to provide a 
confirmation that flows from this ovn-nbctl call have been installed?


3) From patch 4: Would it be possible to provide a "blocking" version of 
`ovn-appctl -t ovn-controller exit` ? In other words, could we make a 
version that will not return control to the shell until the process has 
exited? Barring that, we could write a macro for ovn-macros.at that will 
call `ovn-appctl -t ovn-controller exit` and then block until the 
process has exited. We could then always call that macro when we want to 
shut ovn-controller down.
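
A rough sketch of such a helper for ovn-macros.at (untested; the pidfile
path is an assumption about the sandbox layout):

    stop_ovn_controller_and_wait() {
        local pidfile=$1
        local pid=$(cat "$pidfile")
        ovn-appctl -t ovn-controller exit
        # Block until the process has actually gone away.
        while kill -0 "$pid" 2>/dev/null; do sleep 0.1; done
    }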


On 9/18/23 12:46, Xavier Simonart wrote:

Xavier Simonart (15):
   tests: fixed multiple ovn-ic tests
   tests: fixed "Logical router policy packet marking"
   tests: fixed "options:requested-chassis for logical port"
   tests: tests fixed "Encaps tunnel cleanup ..." and "ovn-controller
 exit"
   tests: wait for all flows to be installed before sending packets
   tests: fixed multiple tests not  properly waiting for packets to
 be received
   tests: fixed multiple tests missing ovn-nbctl "wait"
   tests: fixed "L2 Drop and Allow ACL w/ Stateful ACL"
   tests: skip test "MAC binding aging" if scapy not available.
   tests: move trim_zeros() to ovn-macros
   tests: fixed "send gratuitous ARP for NAT rules on HA distributed
 router"
   tests: fixed "SCTP Load balancer health checks"a
   tests: fixed "LSP incremental processing"
   tests: fixed "ipsec -- basic configuration"
   tests: do not start northd-backup for northd tests querying northd

  tests/ovn-ic.at |  14 +-
  tests/ovn-ipsec.at  |   4 +-
  tests/ovn-macros.at |   4 +
  tests/ovn-northd.at |  27 ++-
  tests/ovn.at| 484 
  5 files changed, 207 insertions(+), 326 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 13/15] tests: fixed "LSP incremental processing"

2023-09-19 Thread Mark Michelson

Of the 15 patches in this set, this is the only one that I disagree with.

The problem with this change is that the test writer likely has a 
specific number of recomputes they expect to see based on the changes 
that are applied. With the "-le" change, the test could pass in a 
situation that should probably actually fail.


Here are a couple of modification ideas:

1) Instead of providing a single number for the expected recompute 
count, provide a range instead. The test currently does 
`check_recompute_counter 5 5 5` , but it could be changed to something 
like `check_recompute_counter 4 5 4 5 4 5`, giving a minimum and maximum 
number of recomputes.
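
A sketch of that first idea, adapting the helper shown in the patch below
(argument handling is illustrative):

    check_recompute_counter() {
        # $1/$2, $3/$4, $5/$6 are min/max bounds for the three counters.
        northd_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats northd recompute)
        AT_CHECK([test $northd_recomp -ge $1 -a $northd_recomp -le $2])
        lflow_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute)
        AT_CHECK([test $lflow_recomp -ge $3 -a $lflow_recomp -le $4])
        sync_sb_pb_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats sync_to_sb_pb recompute)
        AT_CHECK([test $sync_sb_pb_recomp -ge $5 -a $sync_sb_pb_recomp -le $6])
    }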


2) Provide an optional "tolerance" parameter. The test currently does 
`check_recompute_counter 5 5 5` but it could be something like 
`check_recompute_counter 5 5 5 -1` to indicate that the number could 
actually be 5 - 1 and we'd still consider that a pass.


What do you think?

On 9/18/23 12:47, Xavier Simonart wrote:

We might get fewer recomputes than expected: e.g., Port_Binding->chassis and
Port_Binding->up might be received by northd within the same IDL transaction.

Signed-off-by: Xavier Simonart 
---
  tests/ovn-northd.at | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index f51e29b11..abb425a8c 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -10010,13 +10010,13 @@ ovn_attach n1 br-phys 192.168.0.11
  
  check_recompute_counter() {

  northd_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats 
northd recompute)
-AT_CHECK([test x$northd_recomp = x$1])
+AT_CHECK([test $northd_recomp -le $1])
  
  lflow_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute)

-AT_CHECK([test x$lflow_recomp = x$2])
+AT_CHECK([test $lflow_recomp -le $2])
  
  sync_sb_pb_recomp=$(as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats sync_to_sb_pb recompute)

-AT_CHECK([test x$sync_sb_pb_recomp = x$3])
+AT_CHECK([test $sync_sb_pb_recomp -le $3])
  }
  
  check ovn-nbctl --wait=hv ls-add ls0


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 4/4] netdev-dpdk: Add support for ingress packet-per-second policing.

2023-09-19 Thread Eelco Chaudron

On 26 Aug 2023, at 8:01, mit...@outlook.com wrote:


From: Lin Huang 

OvS has supported a packet-per-second policer that can be set on the
ingress and egress side in the kernel datapath, but the userspace
datapath does not support ingress or egress packet-per-second policing
yet.

So, this patch adds support for userspace ingress pps policing using the
native OVS token bucket library. The token bucket is refilled with 'rate'
tokens per millisecond and holds at most 'burst' tokens.

One token in the bucket corresponds to one packet, which the policer
either passes or drops.
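
Conceptually, the per-packet check might look roughly like this (sketch
only; it assumes the three-argument token_bucket_withdraw() introduced in
patch 1/4, and the real helper added by this patch may differ in detail):

    /* Withdraw one token per packet; drop packets that exceed the limit. */
    static int
    kpkts_policer_run_single_packet(struct token_bucket *tb,
                                    struct rte_mbuf **pkts, int pkt_cnt,
                                    bool should_steal)
    {
        int cnt = 0;

        for (int i = 0; i < pkt_cnt; i++) {
            if (token_bucket_withdraw(tb, 1, time_msec())) {
                pkts[cnt++] = pkts[i];      /* Under the limit: keep it. */
            } else if (should_steal) {
                rte_pktmbuf_free(pkts[i]);  /* Over the limit: drop it. */
            }
        }
        return cnt;
    }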

This patch reuses the 'ingress_policing_kpkts_rate' and
'ingress_policing_kpkts_burst' options in the Interface table. Now the
userspace ingress policer supports setting packet-per-second limits in
addition to the previously configurable byte rate settings.

Examples:
$ ovs-vsctl set interface dpdk0 ingress_policing_rate=12300
$ ovs-vsctl set interface dpdk0 ingress_policing_burst=12300
$ ovs-vsctl set interface dpdk0 ingress_policing_kpkts_rate=123
$ ovs-vsctl set interface dpdk0 ingress_policing_kpkts_burst=123

Add some unit tests for ingress packet-per-second policing.

Signed-off-by: Lin Huang 
---
 Documentation/topics/dpdk/qos.rst |  21 ++
 NEWS  |   2 +
 lib/netdev-dpdk.c |  91 +++-
 tests/system-dpdk.at  | 337 
++

 4 files changed, 443 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/qos.rst 
b/Documentation/topics/dpdk/qos.rst

index 37f482cc5..3c5d14e4c 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -120,6 +120,9 @@ Refer to ``vswitch.xml`` for more details on 
egress policer.

 Rate Limiting (Ingress Policing)
 

+Bytes Per Second Policer
+
+
 Assuming you have a :doc:`vhost-user port ` receiving 
traffic
 consisting of packets of size 64 bytes, the following command would 
limit the

 reception rate of the port to ~1,000,000 packets per second::
@@ -135,6 +138,24 @@ To clear the ingress policer configuration from 
the port::


 $ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0

+Packets Per Second Policer
+~~
+
+Assuming you have a :doc:`vhost-user port ` receiving 
traffic,
+the following command would limit the reception rate of the port to 
~1,000,000

+packets per second::
+
+$ ovs-vsctl set interface dpdk0 ingress_policing_kpkts_rate=1000 
\

+ingress_policing_kpkts_burst=1000`
+
+To examine the ingress policer configuration of the port::
+
+$ ovs-vsctl list interface vhost-user0
+
+To clear the ingress policer configuration from the port::
+
+$ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0


Cut/paste error: this should be ingress_policing_kpkts_rate=0.


+
 Refer to ``vswitch.xml`` for more details on ingress policer.

 Flow Control
diff --git a/NEWS b/NEWS
index 430c3daaf..39195768a 100644
--- a/NEWS
+++ b/NEWS
@@ -64,6 +64,8 @@ v3.2.0 - 17 Aug 2023
max sleep configuration of PMD thread cores.
  * Removed experimental tag from PMD load based sleeping.
  * Added new Qos type 'pkts-policer' to support kilo 
packet-per-second policing.
+ * Added support for ingress kilo packet-per-second policing 
configured by

+   ingress_policing_kpkts_rate/burst options.
- Linux TC offload:
  * Add support for offloading VXLAN tunnels with the GBP 
extensions.

- Python
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c6a26dc7e..06bac720a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -406,10 +406,17 @@ struct dpdk_tx_queue {
 );
 };

+enum policer_type {
+POLICER_BPS   = 1 << 0,   /* Rate value in bytes/sec. */
+POLICER_PKTPS = 1 << 1,   /* Rate value in packet/sec. */
+};
+
 struct ingress_policer {
 struct rte_meter_srtcm_params app_srtcm_params;
 struct rte_meter_srtcm in_policer;
 struct rte_meter_srtcm_profile in_prof;
+struct token_bucket tb;
+enum policer_type type;
 rte_spinlock_t policer_lock;
 };

@@ -516,6 +523,9 @@ struct netdev_dpdk {
 uint32_t policer_rate;
 uint32_t policer_burst;

+uint32_t policer_kpkts_rate;
+uint32_t policer_kpkts_burst;
+
 /* Array of vhost rxq states, see vring_state_changed. */
 bool *vhost_rxq_enabled;

@@ -615,6 +625,14 @@ is_dpdk_class(const struct netdev_class *class)
|| class->destruct == netdev_dpdk_vhost_destruct;
 }

+static int
+kpkts_policer_run_single_packet(struct token_bucket *tb, struct 
rte_mbuf **pkts,


For function (forward) declarations we leave out the parameter names for structures. 
So this would become (same below):


  kpkts_policer_run_single_packet(struct token_bucket *, struct 
rte_mbuf **,



+int pkt_cnt, bool should_steal);
+
+static int
+kpkts_policer_profile_config(struct token_bucket *tb,
+   

Re: [ovs-dev] [PATCH v10 3/4] netdev-dpdk: Add support for egress packet-per-second policing.

2023-09-19 Thread Eelco Chaudron

On 26 Aug 2023, at 8:01, mit...@outlook.com wrote:


From: Lin Huang 

OvS has supported a packet-per-second policer that can be set on the
ingress and egress side in the kernel datapath, but the userspace
datapath does not support ingress or egress packet-per-second policing
yet.

So, this patch adds support for userspace egress pps policing using the
native OVS token bucket library. The token bucket is refilled with 'rate'
tokens per millisecond and holds at most 'burst' tokens.

One token in the bucket corresponds to one packet, which the policer
either passes or drops.

This patch adds a new egress QoS type called 'kpkts-policer'. The policer
limits the rate, in kilo-packets per second, at which the token bucket is
refilled via 'kpkts_rate', and the policer's burst size is defined by
'kpkts_burst'.

Examples:
$ovs-vsctl set port vhost-user0 qos=@newqos --
   --id=@newqos create qos type=kpkts-policer \
   other-config:kpkts_rate=123 other-config:kpkts_burst=123


I think the name of the policer is odd, it should reflect the actual 
policer type.

Maybe it should be called pps-policer?

Also, the parameters should be more in line with what they are. No need to 
add kpkts in front of it. Other egress policers use things like cir, 
max-rate, etc., without specifying the unit.


So you example would become:

 $ovs-vsctl set port vhost-user0 qos=@newqos --
--id=@newqos create qos type=pps-policer \
other-config:rate=123 other-config:burst=123



Add some unit tests for egress packet-per-second policing.

Signed-off-by: Lin Huang 
---
 Documentation/topics/dpdk/qos.rst |  21 +++
 NEWS  |   1 +
 lib/netdev-dpdk.c | 159 +++
 tests/system-dpdk.at  | 255 
++

 vswitchd/vswitch.xml  |  32 
 5 files changed, 468 insertions(+)

diff --git a/Documentation/topics/dpdk/qos.rst 
b/Documentation/topics/dpdk/qos.rst

index a98ec672f..37f482cc5 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -36,6 +36,9 @@ QoS (Egress Policing)
 Single Queue Policer
 

+Bytes Per Second Policer
+
+
 Assuming you have a :doc:`vhost-user port ` transmitting 
traffic
 consisting of packets of size 64 bytes, the following command would 
limit the
 egress transmission rate of the port to ~1,000,000 packets per 
second::
@@ -52,6 +55,24 @@ To clear the QoS configuration from the port and 
ovsdb, run::


 $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

+Packets Per Second Policer
+++
+
+Assuming you have a :doc:`vhost-user port ` transmitting 
traffic,
+the following command would limit the egress transmission rate of the 
port to

+~1,000,000 packets per second::
+
+   ovs-vsctl set port vhost-user0 qos=@newqos -- \
+  --id=@newqos create qos type=kpkts-policer \
+  other-config:kpkts_rate=1000 other-config:kpkts_burst=1000


I will only add this once, but if we move to pps-policer rather than 
kpkts-policer we have to change the naming in the documentation and 
throughout the code to make it consistent.


This sounds obvious, but there are some confusing pieces of code in OVS 
due to last minute naming change of the commands :)



+To examine the QoS configuration of the port, run::
+
+$ ovs-appctl -t ovs-vswitchd qos/show vhost-user0
+
+To clear the QoS configuration from the port and ovsdb, run::
+
+$ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos

 Multi Queue Policer
 ~~~
diff --git a/NEWS b/NEWS
index b582bdbbc..430c3daaf 100644
--- a/NEWS
+++ b/NEWS
@@ -63,6 +63,7 @@ v3.2.0 - 17 Aug 2023
  * 'ovs-appctl dpif-netdev/pmd-sleep-show' command was added to 
get the

max sleep configuration of PMD thread cores.
  * Removed experimental tag from PMD load based sleeping.
+ * Added new Qos type 'pkts-policer' to support kilo 
packet-per-second policing.

- Linux TC offload:
  * Add support for offloading VXLAN tunnels with the GBP 
extensions.

- Python
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bc8204f7e..c6a26dc7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -19,6 +19,7 @@

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -59,6 +60,7 @@
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/ofp-print.h"
 #include "openvswitch/shash.h"
+#include "openvswitch/token-bucket.h"
 #include "openvswitch/vlog.h"
 #include "ovs-numa.h"
 #include "ovs-rcu.h"
@@ -91,6 +93,8 @@ static bool per_port_memory = false; /* Status of 
per port memory support */

 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
 #define OVS_VPORT_DPDK "ovs_dpdk"

+#define MAX_KPKTS_PARAMETER 4294967U  /* UINT32_MAX / 1000 */


If UINT32_MAX / 1000 is what you need, why not just define it that way?

  #define MAX_KPKTS_PARAMETER UINT32_MAX / 1000


+
 /*
  * nee

Re: [ovs-dev] [PATCH v10 2/4] netdev-dpdk: Make srtcm_policer to free pkts by bulk.

2023-09-19 Thread Eelco Chaudron

On 26 Aug 2023, at 8:01, mit...@outlook.com wrote:


From: Lin Huang 

Currently srtcm_policer frees packets one by one when packets exceed the
rate limit. That is an inefficient way to free memory, since we have to
call rte_pktmbuf_free() pkt_cnt times.

To improve this, we can use the rte_pktmbuf_free_bulk() function to free
arrays of mbufs instead of freeing packets one by one.

So, this patch changes srtcm_policer to free packets in bulk using
rte_pktmbuf_free_bulk(). It gives a slight performance improvement.


Hi Lin,

This patch looks good to me, however one small nit.

//Eelco


Signed-off-by: Lin Huang 
---
 lib/netdev-dpdk.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8f1361e21..bc8204f7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2542,13 +2542,13 @@ srtcm_policer_run_single_packet(struct 
rte_meter_srtcm *meter,

 struct rte_mbuf **pkts, int pkt_cnt,
 bool should_steal)
 {
-int i = 0;
-int cnt = 0;
-struct rte_mbuf *pkt = NULL;
+struct rte_mbuf *should_steal_batch[NETDEV_MAX_BURST];
 uint64_t current_time = rte_rdtsc();
+int i = 0, n = 0;
+int cnt = 0;

 for (i = 0; i < pkt_cnt; i++) {
-pkt = pkts[i];
+struct rte_mbuf *pkt = pkts[i];
 /* Handle current packet */
 if (netdev_dpdk_srtcm_policer_pkt_handle(meter, profile,
  pkt, current_time)) 
{
@@ -2556,13 +2556,15 @@ srtcm_policer_run_single_packet(struct 
rte_meter_srtcm *meter,

 pkts[cnt] = pkt;
 }
 cnt++;
-} else {
-if (should_steal) {
-rte_pktmbuf_free(pkt);
-}
+} else if (should_steal) {
+should_steal_batch[n++] = pkt;


Here, I would keep the nested else if as is, as the "else if" is mostly 
used for related/continued "if else if" statements. Here 
netdev_dpdk_srtcm_policer_pkt_handle() and should_steal are not really a 
continuation.



 }
 }

+if (n) {
+rte_pktmbuf_free_bulk(should_steal_batch, n);
+}
+
 return cnt;
 }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10 1/4] token-bucket: Make token-bucket timestamp updated by caller.

2023-09-19 Thread Eelco Chaudron

On 26 Aug 2023, at 8:01, mit...@outlook.com wrote:


From: Lin Huang 

Currently, the token-bucket 'last_fill' timestamp is updated by
token_bucket_withdraw() itself.

Add a new function parameter 'now' so that the caller supplies the
timestamp.

Signed-off-by: Lin Huang 


Thanks Lin for following up, and sorry for the late review.

Two small comments inline.

Cheers,

Eelco


---
 include/openvswitch/token-bucket.h |  3 ++-
 lib/token-bucket.c |  4 ++--
 lib/vlog.c | 17 ++---
 ofproto/pinsched.c |  2 +-
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/openvswitch/token-bucket.h 
b/include/openvswitch/token-bucket.h

index 580747f61..d1191e956 100644
--- a/include/openvswitch/token-bucket.h
+++ b/include/openvswitch/token-bucket.h
@@ -40,7 +40,8 @@ void token_bucket_init(struct token_bucket *,
unsigned int rate, unsigned int burst);
 void token_bucket_set(struct token_bucket *,
unsigned int rate, unsigned int burst);
-bool token_bucket_withdraw(struct token_bucket *, unsigned int n);
+bool token_bucket_withdraw(struct token_bucket *tb, unsigned int n,


No need to add *tb, just leave it as *.


+   long long int now);
 void token_bucket_wait_at(struct token_bucket *, unsigned int n,
   const char *where);
 #define token_bucket_wait(bucket, n)\
diff --git a/lib/token-bucket.c b/lib/token-bucket.c
index 0badeb46b..60eb26e53 100644
--- a/lib/token-bucket.c
+++ b/lib/token-bucket.c
@@ -59,10 +59,10 @@ token_bucket_set(struct token_bucket *tb,
  * if 'tb' contained fewer than 'n' tokens (and thus 'n' tokens could 
not be

  * removed) . */
 bool
-token_bucket_withdraw(struct token_bucket *tb, unsigned int n)
+token_bucket_withdraw(struct token_bucket *tb, unsigned int n,
+  long long int now)
 {
 if (tb->tokens < n) {
-long long int now = time_msec();
 if (now > tb->last_fill) {
 unsigned long long int elapsed_ull
 = (unsigned long long int) now - tb->last_fill;
diff --git a/lib/vlog.c b/lib/vlog.c
index b2653142f..7a46f6eb7 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -1312,6 +1312,9 @@ bool
 vlog_should_drop(const struct vlog_module *module, enum vlog_level 
level,

  struct vlog_rate_limit *rl)
 {
+long long int now;
+time_t now_sec;
+
 if (!module->honor_rate_limits) {
 return false;
 }
@@ -1321,12 +1324,13 @@ vlog_should_drop(const struct vlog_module 
*module, enum vlog_level level,

 }

 ovs_mutex_lock(&rl->mutex);
-if (!token_bucket_withdraw(&rl->token_bucket, VLOG_MSG_TOKENS)) {
-time_t now = time_now();
+now = time_msec();
+now_sec = now / 1000;


Can we move this above the mutex lock, especially the time_now(), which 
might result in a syscall?


+if (!token_bucket_withdraw(&rl->token_bucket, VLOG_MSG_TOKENS, 
now)) {

 if (!rl->n_dropped) {
-rl->first_dropped = now;
+rl->first_dropped = now_sec;
 }
-rl->last_dropped = now;
+rl->last_dropped = now_sec;
 rl->n_dropped++;
 ovs_mutex_unlock(&rl->mutex);
 return true;
@@ -1335,10 +1339,9 @@ vlog_should_drop(const struct vlog_module 
*module, enum vlog_level level,

 if (!rl->n_dropped) {
 ovs_mutex_unlock(&rl->mutex);
 } else {
-time_t now = time_now();
 unsigned int n_dropped = rl->n_dropped;
-unsigned int first_dropped_elapsed = now - rl->first_dropped;
-unsigned int last_dropped_elapsed = now - rl->last_dropped;
+unsigned int first_dropped_elapsed = now_sec - 
rl->first_dropped;
+unsigned int last_dropped_elapsed = now_sec - 
rl->last_dropped;

 rl->n_dropped = 0;
 ovs_mutex_unlock(&rl->mutex);

diff --git a/ofproto/pinsched.c b/ofproto/pinsched.c
index 59561f076..a39e4d2ee 100644
--- a/ofproto/pinsched.c
+++ b/ofproto/pinsched.c
@@ -184,7 +184,7 @@ get_tx_packet(struct pinsched *ps)
 static bool
 get_token(struct pinsched *ps)
 {
-return token_bucket_withdraw(&ps->token_bucket, 1000);
+return token_bucket_withdraw(&ps->token_bucket, 1000, 
time_msec());

 }

 void
--
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-21.12 1/2] Split out code to handle port binding db updates

2023-09-19 Thread Dumitru Ceara
On 9/12/23 11:53, Ales Musil wrote:
> From: Ihar Hrachyshka 
> 
> This function will later be used to handle port binding updates for
> postponed (throttled) bindings.
> 
> Conflicts:
>   controller/binding.c
> 
> Signed-off-by: Ihar Hrachyshka 
> Acked-by: Mark Michelson 
> Signed-off-by: Numan Siddique 
> (cherry picked from commit 3103487e087b27b1b3577afba016403fd1ac3093)
> (cherry picked from commit 79cba201419cfbfa00876404dc9021b6d135f6da)
> Signed-off-by: Mark Michelson 
> (cherry picked from commit 3af319c8ef5e732f4d1159ca23c276e0846b9003)
> ---

Thanks, Ales!  Applied to 21.12.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Remove hosting-chassis only if it's specified

2023-09-19 Thread Dumitru Ceara
On 9/18/23 14:48, Ales Musil wrote:
> To avoid any warning spam in the northd.log remove
> the "hosting-chassis" status only if it was previously
> specified.
> 
> Fixes: 19164b030404 ("Expose distributed gateway port information in NB DB")
> Reported-at: https://issues.redhat.com/browse/FDP-54
> Signed-off-by: Ales Musil 
> ---

Thanks, Ales!

I made a small cosmetic change and applied the patch to main and 23.09.

diff --git a/northd/northd.c b/northd/northd.c
index a3da30189b..2a1f23fb94 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -18067,9 +18067,7 @@ handle_cr_port_binding_changes(const struct 
sbrec_port_binding *sb,
 nbrec_logical_router_port_update_status_setkey(nbrec_lrp,
"hosting-chassis",
sb->chassis->name);
-}
-
-if (!sb->chassis && smap_get(&nbrec_lrp->status, "hosting-chassis")) {
+} else if (smap_get(&nbrec_lrp->status, "hosting-chassis")) {
 nbrec_logical_router_port_update_status_delkey(nbrec_lrp,
"hosting-chassis");
 }
--

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-21.12 2/2] controller: throttle port claim attempts

2023-09-19 Thread Dumitru Ceara
On 9/12/23 11:53, Ales Musil wrote:
> From: Ihar Hrachyshka 
> 
> When multiple chassis are fighting for the same port (requested-chassis
> is not set, e.g. for gateway ports), they may produce an unreasonable
> number of chassis field updates in a very short time frame (hundreds of
> updates in several seconds). This puts unnecessary load on OVN as well
> as any db notification consumers trying to keep up with the barrage.
> 
> This patch throttles port claim attempts so that they don't happen more
> frequently than once per 0.5 seconds.
> 
> Conflicts:
>   controller/binding.c
>   controller/binding.h
>   controller/ovn-controller.c
> 
> Reported: https://bugzilla.redhat.com/show_bug.cgi?id=1974898
> Signed-off-by: Ihar Hrachyshka 
> Acked-by: Mark Michelson 
> Signed-off-by: Numan Siddique 
> (cherry picked from commit 4dc4bc7fdb848bcc626becbd2c80ffef8a39ff9a)
> (cherry picked from commit 887a8df4f4aa08a4a87b42f7aa684ed7e9aff9a1)
> Signed-off-by: Mark Michelson 
> (cherry picked from commit 2c98163e024f0543d84df44f9c0840ce0347e2bc)
> ---

Thanks, Ales!  Applied to 21.12.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC OVN] DHCP Proxy Agent support for overlay subnets

2023-09-19 Thread Numan Siddique
On Fri, Sep 8, 2023 at 12:04 AM Naveen Yerramneni
 wrote:
>
> Hi Team,
>
> Could someone please look into this ?
>
> Thanks,
> Naveen
>
> > On 23-Aug-2023, at 8:06 AM, Naveen Yerramneni 
> >  wrote:
> >
> > This patch contains changes to enable DHCP Proxy Agent support for overlay 
> > subnets.
> >
> > NOTE:
> > -
> >  - This patch is not complete, sending this to get the initial feedback 
> > about the approach taken.
> >
> > USE CASE:
> > --
> >  - Enable IP address assignment for overlay subnets from the centralized 
> > DHCP server present in the underlay network.
> >
> >
> > PREREQUISITES
> > --
> >  - Logical Router Port IP should be assigned (statically) from the same 
> > overlay subnet which is managed by DHCP server.
> >  - LRP IP is used for the GIADDR field when proxying the DHCP packets and also 
> > same IP needs to be configured as default gateway for the overlay subnet.
> >  - Overlay subnets managed by external DHCP server are expected to be 
> > directly reachable from the underlay network.
> >
> > EXPECTED PACKET FLOW:
> > --
> > Following is the expected packet flow in order to support DHCP Proxy 
> > functionality in OVN.
> >  1. VM originates DHCP discovery (broadcast).
> >  2. DHCP Proxy (running on the OVN) receives the broadcast and forwards the 
> > packet to the DHCP server by converting it to unicast. While forwarding the 
> > packet, it updates the GIADDR in DHCP header to its
> > interface IP on which DHCP packet is received.
> >  3. DHCP server uses GIADDR field to decide the IP address pool from which 
> > IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
> >  4. DHCP proxy resets the GIADDR field when forwarding the offer to the 
> > client and it also replaces DHCP server ID option (54) with its IP address.
> >  5. VM/Host sends DHCP request (broadcast) packet.
> >  6. DHCP Proxy (running on the OVN) receives the broadcast and forwards the 
> > packet to the DHCP server by converting it to unicast. While forwarding the 
> > packet, it updates the GIADDR in DHCP header to its
> > interface IP on which DHCP packet is received and also replaces the 
> > DHCP server ID option (54) with DHCP server IP address.
> >  7. DHCP Server sends the ACK packet.
> >  8. DHCP Proxy resets the GIADDR field when forwarding the ACK to the 
> > client and it also replaces DHCP server ID option (54) with its IP address.
> >  9. By setting DHCP server ID option (54) to its IP address, DHCP Proxy 
> > completely hides the DHCP server from the end DHCP clients and all the 
> > future renew/release packets come to DHCP proxy.
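> >
> > As a concrete illustration of step 2, the relayed discovery would look
> > roughly like this in scapy terms (all addresses and the xid are
> > placeholders):
> >
> >   relayed = Ether(src=lrp_mac, dst=server_mac) / \
> >             IP(src=lrp_ip, dst=dhcp_server_ip) / UDP(sport=67, dport=67) / \
> >             BOOTP(op=1, giaddr=lrp_ip, chaddr=client_mac, xid=0x1234) / \
> >             DHCP(options=[('message-type', 'discover'), 'end'])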
> >
> > OVN DHCP PROXY PACKET FLOW:
> > 
> > To add DHCP Proxy support on OVN, we need to replicate all the behavior 
> > described above using distributed logical switch and logical router.
> > At a high level, packet flow is distributed among Logical Switch and Logical 
> > Router on source node (where VM is deployed) and redirect chassis(RC) node.
> >  1. Request packet gets processed on the source node. Required info is 
> > extracted from the DHCP payload and stored in the in-memory hash map (key: 
> > MAC + DHCP_xid) of the OVN controller.
> >  2. Response packet is first processed on RC node (which first receives the 
> > packet from underlay network). RC node forwards the packet to the right 
> > node by filling in the dest MAC and IP.
> >  3. Response packet is processed on the source node which originated the 
> > DHCP request. Hash map lookup is done to find the associative DHCP session 
> > and stateful checks are performed to validate the response.
> >
> > OVN Packet flow with DHCP Proxy is explained below.
> >  1. VM sends the DHCP discover packet (broadcast).
> >  2. Logical switch converts the packet to L2 unicast by setting the 
> > destination MAC to LRP's MAC
> >  3. Logical Router receives the packet and redirects it to the OVN 
> > controller.
> >  4. OVN controller updates the required information in the DHCP payload 
> > (GIADDR, SERVER ID) after doing the required checks. It stores required 
> > information from the packet to do some stateful checks on
> > response packets and reinjects the packet back to the datapath provided 
> > if all sanity checks are passed.
> >  5. Logical Router converts the packet to L3 unicast and forwards it to the 
> > server. This packet gets routed like any other packet (via RC node).
> >  6. Server replies with DHCP offer.
> >  7. RC node processes the DHCP offer and forwards it to the OVN controller.
> >  8. OVN controller updates the destination MAC (available in DHCP header) 
> > and destination IP (available in DHCP header) and reinjects the packet to 
> > datapath.
> >  9. Logical router outputs  the packet to the logical switch.
> >  10.Logical switch processes the packet and redirects it to the OVN 
> > controller.
> >  11.OVN controller does the required checks on the packet, updates DHCP 
> > payload (GIADDR, Server 

Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Robin Jarry
Ilya Maximets, Sep 19, 2023 at 13:47:
> With flexibility of appctl comes absolutely no guarantee for API
> stability.  But as soon as we have structured output, someone will
> expect it.  If we can agree that users cannot rely on the structure
> of that structured output, then it's fine.  Otherwise, OVSDB with
> its defined schema is a much better choice, IMO.  Constructing a
> single 'select' transaction for OVSDB isn't really much more
> difficult than constructing an appctl JSON-PRC request.

I would argue that the ovsdb schema could also be modified, so I guess
this comes down to deciding whether the API can be broken or not.

However, going through ovsdb simply as an API proxy to query live stats
to ovs-vswitchd seems complex and not resource efficient. Especially if
the appctl socket is already available and allows reaching vswitchd
directly.

I think that for statistics, it would make more sense to go with the
lightweight option.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH] netdev-dpdk: fix tso bug

2023-09-19 Thread Dexia Li via dev
Hi, David

Sorry, I am missing something too...

Firstly, as noted in the previous mail, the RTE_MBUF_F_TX_TCP_SEG flag in
mbuf->ol_flags implies RTE_MBUF_F_TX_TCP_CKSUM.
Some drivers such as ice can handle RTE_MBUF_F_TX_TCP_SEG together with
RTE_MBUF_F_TX_TCP_CKSUM, but
other drivers such as iavf hang when both flags are set at the same time.
Drivers like ice
can segment TCP packets when only the RTE_MBUF_F_TX_TCP_SEG flag is set.

Secondly, in the testpmd csum engine example, we can see below that
RTE_MBUF_F_TX_TCP_CKSUM is
left unset when the RTE_MBUF_F_TX_TCP_SEG flag is set. Should OVS follow this
example? I suggest so.
if (tso_segsz)
ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
}

Thirdly, I use OVS 2.17 and DPDK 21.11 for testing. In DPDK 21.11, the
iavf_build_data_desc_cmd_offset_fields function may
act on both RTE_MBUF_F_TX_TCP_SEG and RTE_MBUF_F_TX_TCP_CKSUM. The code is as follows:

if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
offset |= (m->l4_len >> 2) <<
  IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
}

/* Enable L4 checksum offloads */
switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
case RTE_MBUF_F_TX_TCP_CKSUM:
command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
break;

In DPDK 22.11, only RTE_MBUF_F_TX_TCP_SEG is truly acted on.

I think an upper application like OVS should provide compatibility with various
drivers.
Setting only RTE_MBUF_F_TX_TCP_SEG keeps the TSO function compatible with
various drivers.



-----Original Message-----
From: Dexia Li 
Sent: September 18, 2023 17:47
To: David Marchand 
Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
; Eelco Chaudron 
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug

Hi, David

In dpdk 22.11 rte_mbuf_core.h file, there are notes as follows:

/**
 * TCP segmentation offload. To enable this offload feature for a
 * packet to be transmitted on hardware supporting TSO:
 *  - set the RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
 *RTE_MBUF_F_TX_TCP_CKSUM)
 *  - set the flag RTE_MBUF_F_TX_IPV4 or RTE_MBUF_F_TX_IPV6
 *  - if it's IPv4, set the RTE_MBUF_F_TX_IP_CKSUM flag
 *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz  */

Also, in the testpmd csum engine example file csumonly.c, we can see partial
code of the process_inner_cksums function as follows:

if (tso_segsz)
ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
} else {
if (info->is_tunnel)
l4_off = info->outer_l2_len + 
info->outer_l3_len +
 info->l2_len + info->l3_len;
else
l4_off = info->l2_len + info->l3_len;
tcp_hdr->cksum = 0;
tcp_hdr->cksum =
get_udptcp_checksum(m, l3_hdr, l4_off,
info->ethertype);
}

We can see that when RTE_MBUF_F_TX_TCP_SEG is set, RTE_MBUF_F_TX_TCP_CKSUM is
not set, even if RTE_ETH_TX_OFFLOAD_TCP_CKSUM is enabled on the port.

Best regards 

dexia


-----Original Message-----
From: David Marchand 
Sent: September 14, 2023 19:23
To: Dexia Li 
Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Mike Pattrick 
; Eelco Chaudron 
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug

Hello,

On Wed, Sep 13, 2023 at 8:55 AM Dexia Li via dev  
wrote:
>
> When userspace TSO is enabled, having both the RTE_MBUF_F_TX_TCP_SEG and
> RTE_MBUF_F_TX_TCP_CKSUM flag bits set in the mbuf results in a driver hang
> with the Intel E810 VF driver iavf. As in the DPDK csum example,
> RTE_MBUF_F_TX_TCP_SEG should only be set when TSO is enabled.
>
> Signed-off-by: Dexia Li 

Sorry, I am missing something...

The mbuf API mentions that RTE_MBUF_F_TX_TCP_CKSUM is implied when
RTE_MBUF_F_TX_TCP_SEG is set.
Meaning that an application may request either RTE_MBUF_F_TX_TCP_SEG alone or
RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_TCP_CKSUM.
And iirc the combination of both RTE_MBUF_F_TX_TCP_CKSUM and 
RTE_MBUF_F_TX_TCP_SEG is accepted in some dpdk drivers.


Looking at the 22.11 iavf driver, the scalar tx handler looks at TSO requests
first, prepares the descriptor and stops there.
It won't look at the TCP checksum flags.
https://git.dpdk.org/dpdk-stable/tree/drivers/net/iavf/iavf_rxtx.c?h=v22.11.3#n2623
https://git.dpdk.org/dpdk-stable/tree/drivers/net/iavf/iavf_rxtx.c?h=v22.11.3#n2637

As far as the vector tx handlers are concerned, I think they are not usable 
with TSO requests, an

Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Ilya Maximets
On 9/19/23 13:27, Eelco Chaudron wrote:
> 
> 
> On 19 Sep 2023, at 10:42, Robin Jarry wrote:
> 
>> Adrian Moreno, Sep 19, 2023 at 09:18:
>>>> Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
>>>> need to re-invent anything.
>>>
>>> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
>>> the requirements: client decides update interval, flexible schema, etc
>>> with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
>>> communication, no need to store data in both places) and probably
>>> involving less internal code change.
>>>
>>> Just to clarify: I'm referring to allowing JSON output of the (already
>>> JSON-RPC) appctl protocol.
>>
>> I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and

OVSDB already contains a lot of ephemeral stats.

>> a more fluid schema/data structure would be suitable.
>>
>> What kind of API did you have in mind to structure the JSON output?
> 
> I guess we should use the appctl json API. I’m including Jakob who is going 
> to come up with a POC to see if we can add json support for the existing 
> appctl command. Probably starting with the “dpif-netdev/pmd-perf-show” 
> output, but I will let him comment.

With the flexibility of appctl comes absolutely no guarantee of API
stability.  But as soon as we have structured output, someone will
expect it.  If we can agree that users cannot rely on the structure
of that structured output, then it's fine.  Otherwise, OVSDB with
its defined schema is a much better choice, IMO.  Constructing a
single 'select' transaction for OVSDB isn't really much more
difficult than constructing an appctl JSON-RPC request.
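
For illustration, a minimal self-contained sketch of such a 'select'
transact request over the local OVSDB unix socket (the socket path, table
and columns are only examples, and a real client would keep reading until
the whole JSON reply has arrived):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    /* One "transact" call carrying a single "select" operation,
     * as defined by RFC 7047.  Table and columns are examples only. */
    const char *req =
        "{\"method\":\"transact\",\"id\":0,"
        "\"params\":[\"Open_vSwitch\","
        "{\"op\":\"select\",\"table\":\"Interface\","
        "\"where\":[],\"columns\":[\"name\",\"statistics\"]}]}";

    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, "/var/run/openvswitch/db.sock",
            sizeof addr.sun_path - 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        perror("connect");
        return 1;
    }

    write(fd, req, strlen(req));

    /* Single read for brevity; enough for a small reply. */
    char buf[65536];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%s\n", buf);
    }
    close(fd);
    return 0;
}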

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v3] checkpatch: Add checks for the subject line.

2023-09-19 Thread Eelco Chaudron


On 19 Sep 2023, at 13:25, Simon Horman wrote:

> On Thu, Sep 14, 2023 at 03:52:50PM +0200, Eelco Chaudron wrote:
>>
>>
>> On 14 Sep 2023, at 15:44, Eelco Chaudron wrote:
>>
>>> This patch adds WARNINGs for the subject line length and the format,
>>> i.e., the sentence should start with a capital and end with a dot.
>>>
>>> Acked-by: Simon Horman 
>>> Signed-off-by: Eelco Chaudron 
>>
>> Simon, let me know if you are ok with the change, and I’ll commit it.
>
> Sorry for the delay. Yes, I am fine with this.

Thanks, applied to master.

//Eelco



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Eelco Chaudron


On 19 Sep 2023, at 10:42, Robin Jarry wrote:

> Adrian Moreno, Sep 19, 2023 at 09:18:
>>> Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
>>> need to re-invent anything.
>>
>> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
>> the requirements: client decides update interval, flexible schema, etc
>> with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
>> communication, no need to store data in both places) and probably
>> involving less internal code change.
>>
>> Just to clarify: I'm referring to allowing JSON output of the (already
>> JSON-RPC) appctl protocol.
>
> I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and
> a more fluid schema/data structure would be suitable.
>
> What kind of API did you have in mind to structure the JSON output?

I guess we should use the appctl json API. I’m including Jakob who is going to 
come up with a POC to see if we can add json support for the existing appctl 
command. Probably starting with the “dpif-netdev/pmd-perf-show” output, but I
will let him comment.

//Eelco



Re: [ovs-dev] [PATCH v3] checkpatch: Add checks for the subject line.

2023-09-19 Thread Simon Horman
On Thu, Sep 14, 2023 at 03:52:50PM +0200, Eelco Chaudron wrote:
> 
> 
> On 14 Sep 2023, at 15:44, Eelco Chaudron wrote:
> 
> > This patch adds WARNINGs for the subject line length and the format,
> > i.e., the sentence should start with a capital and end with a dot.
> > 
> > Acked-by: Simon Horman 
> > Signed-off-by: Eelco Chaudron 
> 
> Simon, let me know if you are ok with the change, and I’ll commit it.

Sorry for the delay. Yes, I am fine with this.


Re: [ovs-dev] [PATCH] netdev-dpdk: fix tso bug

2023-09-19 Thread David Marchand
Hello,

Please don't top post.

On Mon, Sep 18, 2023 at 11:47 AM Dexia Li  wrote:
>
> Hi, David
>
> In dpdk 22.11 rte_mbuf_core.h file, there are notes as follows:
>
> /**
>  * TCP segmentation offload. To enable this offload feature for a
>  * packet to be transmitted on hardware supporting TSO:
>  *  - set the RTE_MBUF_F_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>  *RTE_MBUF_F_TX_TCP_CKSUM)
>  *  - set the flag RTE_MBUF_F_TX_IPV4 or RTE_MBUF_F_TX_IPV6
>  *  - if it's IPv4, set the RTE_MBUF_F_TX_IP_CKSUM flag
>  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
>  */
>
> Also, in the testpmd csum engine example file csumonly.c, we can see
> partial code of the process_inner_cksums() function as follows:
>
> if (tso_segsz)
>         ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
> else if (tx_offloads & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) {
>         ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
> } else {
>         if (info->is_tunnel)
>                 l4_off = info->outer_l2_len + info->outer_l3_len +
>                          info->l2_len + info->l3_len;
>         else
>                 l4_off = info->l2_len + info->l3_len;
>         tcp_hdr->cksum = 0;
>         tcp_hdr->cksum = get_udptcp_checksum(m, l3_hdr, l4_off,
>                                              info->ethertype);
> }
>
> We can see that when RTE_MBUF_F_TX_TCP_SEG is set, RTE_MBUF_F_TX_TCP_CKSUM
> is not set, even if RTE_ETH_TX_OFFLOAD_TCP_CKSUM is enabled.

I don't see the point of copying this code.

Let me repeat what I said in my previous mail: asking for TSO implicitly means
that the hardware will do the TCP checksum, and the DPDK API leaves some
leeway to applications about this fact.
And specifically for the iavf driver, I see no bug in this area of the code.

To be frank, I found a bug in the net/ice and net/iavf drivers, but
it is not related to setting both ol_flags, and nothing needs to be
changed on the OVS side to fix this different issue I identified.

Please provide the details of which driver and what the issue is, in
your case, when both flags are set.


-- 
David Marchand



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Robin Jarry
Adrian Moreno, Sep 19, 2023 at 09:18:
> > Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
> > need to re-invent anything.
>
> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
> the requirements: client decides update interval, flexible schema, etc
> > with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
> communication, no need to store data in both places) and probably
> involving less internal code change.
>
> Just to clarify: I'm referring to allowing JSON output of the (already
> JSON-RPC) appctl protocol.

I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and
a more fluid schema/data structure would be suitable.

What kind of API did you have in mind to structure the JSON output?



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Adrian Moreno



On 9/18/23 20:24, Ilya Maximets wrote:

On 9/12/23 15:47, Eelco Chaudron wrote:



On 12 Sep 2023, at 15:19, Robin Jarry wrote:


Eelco Chaudron, Sep 12, 2023 at 09:17:

I feel like if we do need another way of getting (real time)
statistics out of OVS, we should use the same communication channel as
the other ovs-xxx utilities are using. But rather than returning
text-based responses, we might be able to make it JSON (which is
already used by the dbase). I know that Adrian is already
investigating machine-readable output for some existing utilities,
maybe it can be extended for the (pmd) statistics use case.

Using something like the DPDK telemetry socket, might not work for
other use cases where DPDK is not in play.


Maybe the telemetry socket code could be reused even when DPDK is not in
play.


Many distributions like Debian and Ubuntu do not even build their main
OVS packages with DPDK.  They have a separate DPDK-enabled package.  So,
this telemetry will not be available.  Also, in order to use telemetry,
you need to initialize DPDK, which is a heavy and completely unnecessary
operation if DPDK is not going to be used.


It already has all the APIs to return structured data and
serialize it to JSON. It would be nice not to have to reinvent the
wheel.


Both OVSDB and appctl are literally JSON-RPC protocols.  There is no need
to re-invent anything.



Right. Isn't appctl simpler in this case? IIUC, it would still satisfy the 
requirements: client decides update interval, flexible schema, etc., with the 
benefits of incurring less cost (no ovs-vswitchd <-> ovsdb communication, no 
need to store data in both places) and probably involving less internal code change.


Just to clarify: I'm referring to allowing JSON output of the (already JSON-RPC) 
appctl protocol.



But this is a new way of connecting to OVS, and I feel like we should keep 
the existing infrastructure, and not add another connection type. This would 
make it easy for existing tools to also benefit from the new format over the 
existing connection methods.

Any input from others in the community?


I agree, addition of a new connection type doesn't look good to me either.


I had considered using ovsdb but it seemed to me
less suitable for a few reasons:

* I had understood that ovsdb is a configuration database, not a state
   reporting database.


OVSDB already reports port statistics and some system stats, so
it's fine to expose things like that.  They are usually ephemeral
columns that do not end up written on disk.


* To have reliable and up to date numbers, ovs would need to push them
   at a high rate to the database so that clients do not get outdated cpu
   usage. The DPDK telemetry socket is real-time; the current numbers are
   returned on every request.


There is a mechanism to wait for an ovs-vswitchd reply on request.
So, ovs-vswitchd may update stats in the database the moment they
are requested by the client.  Should not be an issue.  This is
working today for port status and some other things.


* I would need to define a custom schema / table to store structured
   information in the db. The DPDK telemetry socket already has a schema
   defined for this.


It's true that we'll need some new columns and maybe a table, but
that should not be hard to do.  And it's even better because we'll
be able to define columns that make sense for OVS.


* Accessing ovsdb requires a library making it more complex to use for
   telemetry scrapers. The DPDK telemetry socket can be accessed with
   a standalone python script with no external dependencies[1].


Accessing OVSDB doesn't require a library, it's just a JSON-RPC [1].
We do provide our own implementation of the protocol, but there
is no need to use it, especially for basic "list-all" type of requests.
Most languages like python have built-in JSON libraries.  Some have
JSON-RPC libraries.

[1] https://www.rfc-editor.org/rfc/rfc7047

Best regards, Ilya Maximets.




--
Adrián Moreno
