[ovs-dev] [PATCH ovn] Revert "ovn: add geneve PMTUD support"

2023-12-15 Thread numans
From: Numan Siddique 

This reverts commit 450e41e783bfa69e4f9d6c80f6bcb01147d5cfe1.

If a packet has to be tunnelled to another node and if the physical
interface used for tunnelling has lower MTU than the packet or
if there is a route exception with a lower MTU, then the geneve
kernel module generates an ICMP need frag packet.  This packet
was getting dropped since the metadata had to be swapped.
The commit [1] did exactly that and fixed the issue.
But it has 2 issues -
  1. It introduced a regression for the scenario when an ICMP need frag
 packet generated outside of OVN has to be tunnelled and delivered
 to the destination VM/pod.  These ICMP need frag packets are now
 dropped.
  2. If the logical switches has ACLs or load balancers configured then
 these icmp need frag packets are dropped as they are not sent to
 the correct zone.

Its better to revert until we find a proper solution for the original
issue.

[1] - 450e41e783bf("ovn: add geneve PMTUD support")

Signed-off-by: Numan Siddique 
---
 .../workflows/ovn-fake-multinode-tests.yml|  6 +-
 NEWS  |  1 -
 controller/physical.c | 45 
 northd/northd.c   | 24 ---
 northd/ovn-northd.8.xml   |  8 ---
 tests/multinode.at| 69 ---
 tests/ovn-northd.at   |  6 --
 7 files changed, 3 insertions(+), 156 deletions(-)

diff --git a/.github/workflows/ovn-fake-multinode-tests.yml 
b/.github/workflows/ovn-fake-multinode-tests.yml
index b3ba82a30b..25610df534 100644
--- a/.github/workflows/ovn-fake-multinode-tests.yml
+++ b/.github/workflows/ovn-fake-multinode-tests.yml
@@ -76,8 +76,8 @@ jobs:
   fail-fast: false
   matrix:
 cfg:
-- { branch: "main", testsuiteflags: ""}
-- { branch: "branch-22.03", testsuiteflags: "-k 'basic test'" }
+- { branch: "main" }
+- { branch: "branch-22.03" }
 name: multinode tests ${{ join(matrix.cfg.*, ' ') }}
 env:
   RUNC_CMD: podman
@@ -176,7 +176,7 @@ jobs:
 
 - name: Run fake-multinode system tests
   run: |
-if ! sudo make check-multinode TESTSUITEFLAGS="${{ 
matrix.cfg.testsuiteflags }}"; then
+if ! sudo make check-multinode; then
   sudo podman exec -it ovn-central ovn-nbctl show || :
   sudo podman exec -it ovn-central ovn-sbctl show || :
   sudo podman exec -it ovn-chassis-1 ovs-vsctl show || :
diff --git a/NEWS b/NEWS
index acb3b854fb..e10fb79dd8 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,6 @@ Post v23.09.0
 connection method and doesn't require additional probing.
 external_ids:ovn-openflow-probe-interval configuration option for
 ovn-controller no longer matters and is ignored.
-  - Enable PMTU discovery on geneve tunnels for E/W traffic.
 
 OVN v23.09.0 - 15 Sep 2023
 --
diff --git a/controller/physical.c b/controller/physical.c
index ce3b88d165..ba88e1d8b4 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -2446,51 +2446,6 @@ physical_run(struct physical_ctx *p_ctx,
 &ofpacts, hc_uuid);
 }
 
-/* Add specif flows for E/W ICMPv{4,6} packets if tunnelled packets do not
- * fit path MTU.
- */
-HMAP_FOR_EACH (tun, hmap_node, p_ctx->chassis_tunnels) {
-ofpbuf_clear(&ofpacts);
-if (tun->type == GENEVE) {
-put_move(MFF_TUN_ID, 0, MFF_LOG_DATAPATH, 0, 24, &ofpacts);
-put_move(p_ctx->mff_ovn_geneve, 16, MFF_LOG_OUTPORT, 0, 16,
- &ofpacts);
-put_move(p_ctx->mff_ovn_geneve, 0, MFF_LOG_INPORT, 0, 15,
- &ofpacts);
-} else if (tun->type == STT) {
-put_move(MFF_TUN_ID, 0, MFF_LOG_DATAPATH, 0, 24, &ofpacts);
-put_move(MFF_TUN_ID, 40, MFF_LOG_OUTPORT,  0, 16, &ofpacts);
-put_move(MFF_TUN_ID, 24, MFF_LOG_INPORT, 0, 15, &ofpacts);
-} else if (tun->type == VXLAN) {
-put_move(MFF_TUN_ID, 0, MFF_LOG_DATAPATH, 0, 12, &ofpacts);
-put_move(MFF_TUN_ID, 12, MFF_LOG_INPORT, 0, 12, &ofpacts);
-} else {
-OVS_NOT_REACHED();
-}
-put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, &ofpacts);
-
-/* IPv4 */
-struct match match = MATCH_CATCHALL_INITIALIZER;
-match_set_in_port(&match, tun->ofport);
-match_set_dl_type(&match, htons(ETH_TYPE_IP));
-match_set_nw_proto(&match, IPPROTO_ICMP);
-match_set_icmp_type(&match, 3);
-match_set_icmp_code(&match, 4);
-
-ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 120, 0, &match,
-&ofpacts, hc_uuid);
-/* IPv6 */
-match_init_catchall(&match);
-match_set_in_port(&match, tun->ofport);
-match_set_dl_type(&match, htons(ETH_TYPE_IPV6));
-match_set_nw_proto(&match, IPPROTO_ICMPV6);
-match_se

Re: [ovs-dev] [PATCH] dpdk: Update to use v23.11.

2023-12-15 Thread Ilya Maximets
On 12/13/23 14:06, David Marchand wrote:
> This commit adds support for DPDK v23.11.
> It updates the CI script and documentation and includes the following
> changes coming from the dpdk-latest branch:
> 
> - sparse: Add some compiler intrinsics for DPDK build.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=371129&state=*
> 
> - ci: Cache DPDK installed libraries only.
> - ci: Reduce optional libraries in DPDK.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=383367&state=*
> 
> - system-dpdk: Ignore net/ice error log about QinQ offloading.
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=385259&state=*
> 
> Signed-off-by: David Marchand 
> ---

Thanks, David!  I didn't test this much, but it looks
correct to me:

Acked-by: Ilya Maximets 


BTW, Just to not forget, there is one deferred change that
is waiting for 23.11 to be revisited:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20230630024630.6464-3-ivan.ma...@arknetworks.am/

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 0/1] Per pmd load based sleeping

2023-12-15 Thread Ilya Maximets
On 12/14/23 12:15, Kevin Traynor wrote:
> These patches allow specific sleep settings for PMD threads. It is
> backwards compatabile with previously only allowing a global value.
> 
> v6:
> - Reworked for Ilya v5 review
> - Added NEWS
> - Reworked list parsing to handle invalid key:value value correctly
> - Added UTs for invalid values
> - Other minor coding std changes
> 
> v5:
> - Reworked from previous series
> 
> GHA: https://github.com/kevintraynor/ovs/actions/runs/7207817272
> cirrus: https://cirrus-ci.com/build/6075630895235072
> 
> Kevin Traynor (1):
>   dpif-netdev: Add per pmd sleep config.
> 
>  Documentation/topics/dpdk/pmd.rst |  34 ++-
>  NEWS  |   4 +
>  lib/dpif-netdev-private-thread.h  |   3 +
>  lib/dpif-netdev.c | 270 ---
>  tests/pmd.at  | 350 --
>  vswitchd/vswitch.xml  |  31 ++-
>  6 files changed, 642 insertions(+), 50 deletions(-)
> 


Thanks, Kevin!  This version looks good to me and I also can't
reproduce the performance hit from v5, it probably been something
in my setup.

Applied.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] netdev-dummy: Add support and test for tso.

2023-12-15 Thread Ilya Maximets
On 12/11/23 16:39, Mike Pattrick wrote:
> Test that netdev-dummy is able to send and recieve segment offloaded
> packets.
> 
> Signed-off-by: Mike Pattrick 
> 
> ---
> v2: Fix clang build error: mutex needed to access netdev_dummy members
> 
> Signed-off-by: Mike Pattrick 
> ---
>  lib/netdev-dummy.c   | 32 +++-
>  tests/dpif-netdev.at | 50 
>  2 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index 8c6e6d448..9d9a28892 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -44,6 +44,7 @@
>  #include "unaligned.h"
>  #include "timeval.h"
>  #include "unixctl.h"
> +#include "userspace-tso.h"
>  #include "reconnect.h"
>  
>  VLOG_DEFINE_THIS_MODULE(netdev_dummy);
> @@ -152,6 +153,8 @@ struct netdev_dummy {
>  bool ol_ip_csum OVS_GUARDED;
>  /* Flag RX packet with good csum. */
>  bool ol_ip_csum_set_good OVS_GUARDED;
> +/* Set the segment size for netdev TSO support. */
> +int ol_tso_segsz OVS_GUARDED;
>  };
>  
>  /* Max 'recv_queue_len' in struct netdev_dummy. */
> @@ -806,6 +809,10 @@ netdev_dummy_get_config(const struct netdev *dev, struct 
> smap *args)
>  smap_add_format(args, "ol_ip_csum_set_good", "%s", "true");
>  }
>  
> +if (netdev->ol_tso_segsz && userspace_tso_enabled()) {
> +smap_add_format(args, "ol_tso_segsz", "%d", netdev->ol_tso_segsz);
> +}
> +
>  /* 'dummy-pmd' specific config. */
>  if (!netdev_is_pmd(dev)) {
>  goto exit;
> @@ -937,6 +944,14 @@ netdev_dummy_set_config(struct netdev *netdev_, const 
> struct smap *args,
>  netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM;
>  }
>  
> +if (userspace_tso_enabled()) {
> +netdev->ol_tso_segsz = smap_get_int(args, "ol_tso_segsz", 0);
> +if (netdev->ol_tso_segsz) {
> +netdev_->ol_flags |= (NETDEV_TX_OFFLOAD_TCP_TSO
> +  | NETDEV_TX_OFFLOAD_TCP_CKSUM);
> +}
> +}
> +
>  netdev_change_seq_changed(netdev_);
>  
>  /* 'dummy-pmd' specific config. */
> @@ -1119,6 +1134,15 @@ netdev_dummy_rxq_recv(struct netdev_rxq *rxq_, struct 
> dp_packet_batch *batch,
>  /* The netdev hardware sets the flag when the packet has good csum. 
> */
>  dp_packet_ol_set_ip_csum_good(packet);
>  }
> +
> +if (userspace_tso_enabled() && netdev->ol_tso_segsz) {
> +dp_packet_set_tso_segsz(packet, netdev->ol_tso_segsz);
> +dp_packet_hwol_set_tcp_seg(packet);
> +dp_packet_hwol_set_tx_ip_csum(packet);
> +dp_packet_hwol_set_tx_ipv4(packet);
> +dp_packet_hwol_set_csum_tcp(packet);
> +}
> +
>  ovs_mutex_unlock(&netdev->mutex);
>  
>  dp_packet_batch_init_packet(batch, packet);
> @@ -1174,6 +1198,12 @@ netdev_dummy_send(struct netdev *netdev, int qid,
>  DP_PACKET_BATCH_FOR_EACH(i, packet, batch) {
>  const void *buffer = dp_packet_data(packet);
>  size_t size = dp_packet_size(packet);
> +bool is_tso;
> +
> +ovs_mutex_lock(&dev->mutex);
> +is_tso = userspace_tso_enabled() && dev->ol_tso_segsz &&
> + dp_packet_hwol_is_tso(packet);
> +ovs_mutex_unlock(&dev->mutex);
>  
>  if (!dp_packet_is_eth(packet)) {
>  error = EPFNOSUPPORT;
> @@ -1194,7 +1224,7 @@ netdev_dummy_send(struct netdev *netdev, int qid,
>  if (eth->eth_type == htons(ETH_TYPE_VLAN)) {
>  max_size += VLAN_HEADER_LEN;
>  }
> -if (size > max_size) {
> +if (size > max_size && !is_tso) {
>  error = EMSGSIZE;
>  break;
>  }
> diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
> index d0359b5ea..2d9b8eb5f 100644
> --- a/tests/dpif-netdev.at
> +++ b/tests/dpif-netdev.at
> @@ -810,6 +810,56 @@ AT_CHECK_UNQUOTED([tail -n 1 p2.pcap.txt], [0], 
> [${good_expected}
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([userspace offload - tso])

Hi, Mike.  I didn't review the code, but see a few comments
for the test below.

> +OVS_VSWITCHD_START(
> +  [set Open_vSwitch . other_config:userspace-tso-enable=true -- \
> +   add-br br1 -- set bridge br1 datapath-type=dummy -- \
> +   add-port br1 p1 -- \
> +   set Interface p1 type=dummy -- \
> +   add-port br1 p2 -- \
> +   set Interface p2 type=dummy --])
> +
> +dnl Modify the ip_dst addr to force changing the IP csum.

This comment doesn't match the flow.

> +AT_CHECK([ovs-ofctl add-flow br1 in_port=p1,actions=output:p2])
> +
> +flow_s="\
> +  eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x0800,\
> +  nw_src=192.168.123.2,nw_dst=192.168.123.1,nw_proto=6,nw_ttl=64,nw_frag=no,\
> +  tp_src=54392,tp_dst=5201,tcp_flags=ack"
> +
> +payload=$(dd if=/dev/zero bs=2000 count=1 | hexdump -ve '1/1 "%02x"')

This seems unnecessary, you may use printf instead, since it's just
an all-zero payload, but also...

> 

Re: [ovs-dev] [PATCH v2 1/2] ofproto-dpif-upcall: Resolve checksums before controller upcall.

2023-12-15 Thread Ilya Maximets
On 12/11/23 16:39, Mike Pattrick wrote:
> Currently dp_netdev_upcall() resolves checksums on all packets, but this
> isn't strictly needed. The checksums will be resolved before
> transmission. However, we do have to resolve the checksums before
> sending a packet to the controller as offload flags aren't retained.

Hmm.  I'm not sure this is the only case.  For example, sFlow sampling
embeds packet header bytes into the sFlow header.  It should probably
contain correct checksums at this point.  There are maybe other cases
similar to this one as well.

Best regards, Ilya Maximets.

> 
> Signed-off-by: Mike Pattrick 
> ---
>  lib/dpif-netdev.c |  2 --
>  ofproto/ofproto-dpif-upcall.c | 13 ++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9a59a1b03..8f7aaffbb 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7980,8 +7980,6 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
> struct dp_packet *packet_,
>  ds_destroy(&ds);
>  }
>  
> -dp_packet_ol_send_prepare(packet_, 0);
> -
>  return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata,
>   actions, wc, put_actions, dp->upcall_aux);
>  }
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index cc10f57b5..e974d844a 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1574,15 +1574,20 @@ process_upcall(struct udpif *udpif, struct upcall 
> *upcall,
>  const struct frozen_state *state = &recirc_node->state;
>  
>  struct ofproto_async_msg *am = xmalloc(sizeof *am);
> +
> +/* Resolve offloaded checksums, if any. */
> +struct dp_packet *packet_clone = dp_packet_clone(packet);
> +dp_packet_ol_send_prepare(packet_clone, 0);
> +
>  *am = (struct ofproto_async_msg) {
>  .controller_id = cookie->controller.controller_id,
>  .oam = OAM_PACKET_IN,
>  .pin = {
>  .up = {
>  .base = {
> -.packet = xmemdup(dp_packet_data(packet),
> -  dp_packet_size(packet)),
> -.packet_len = dp_packet_size(packet),
> +.packet = xmemdup(dp_packet_data(packet_clone),
> +  dp_packet_size(packet_clone)),
> +.packet_len = dp_packet_size(packet_clone),
>  .reason = cookie->controller.reason,
>  .table_id = state->table_id,
>  .cookie = get_32aligned_be64(
> @@ -1598,6 +1603,8 @@ process_upcall(struct udpif *udpif, struct upcall 
> *upcall,
>  },
>  };
>  
> +dp_packet_delete(packet_clone);
> +
>  if (cookie->controller.continuation) {
>  am->pin.up.stack = (state->stack_size
>? xmemdup(state->stack, state->stack_size)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] reconnect: Set defaults from environment variables.

2023-12-15 Thread Ilya Maximets
On 12/4/23 12:23, Felix Huettner via dev wrote:
> This exposes the old constants regarding min backoff, max backoff and
> probe interval using environment variables. In case previously users
> wanted to tune the probe interval for all connections this required
> setting this setting in multiple locations. E.g. to configure the probe
> interval from a relay to the ovsdb server you need to call an appctl
> command after the relay has started up.
> The new environment variables make it easy to set them for all new
> connections. The existing configuration options for these settings stay
> in place and take precedence over the environment variables.
> In case the environment variables are not specified/invalid formatted we
> default to the previous defaults.
> 
> Signed-off-by: Felix Huettner 
> ---
> v3->v4: fix conflict in NEWS file
> v2->v3: add minimal values and defaults as defines
> v1->v2: fixed wrong function definitions
> 
>  NEWS|  7 +
>  lib/jsonrpc.c   |  4 +--
>  lib/reconnect.c | 70 +
>  lib/reconnect.h |  7 ++---
>  ovsdb/jsonrpc-server.c  |  4 +--
>  ovsdb/ovsdb-server.c|  2 +-
>  ovsdb/relay.h   |  2 --
>  python/ovs/reconnect.py | 45 --
>  8 files changed, 121 insertions(+), 20 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 490e275da..6b580edf3 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -19,6 +19,13 @@ Post-v3.2.0
>   * Added support for Generic Segmentation Offloading for the cases where
> TSO is enabled but not supported by an egress interface (except for
> tunnel interfaces).
> +   - The environment variable OVS_RECONNECT_MIN_BACKOFF,
> + OVS_RECONNECT_MAX_BACKOFF and OVS_RECONNECT_PROBE_INTERVAL, if set, 
> allow
> + users to tune the default reconnect configuration. E.g. adapting
> + OVS_RECONNECT_PROBE_INTERVAL modifies the default probe interval, for 
> all
> + servers and clients. These variables are valid for all ovs code as well 
> as
> + the python client. Values set for individual connections (e.g. using the
> + `inactivity_probe` column in the `Manager` tables) will take precedence.

Hi, Felix.  Thanks for the patch!  Though I'm not sure if introduction
of yet another way of setting the same thing is a good approach for
this problem.  If anything, we should probably try to reduce the number
of configuration methods.  I tried to come up with a way to solve the
configuration for ovsdb-server for a long time now, and the config file
approach that I've been working for a past few months seems promising:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20231214010431.1664005-21-i.maxim...@ovn.org/
At least it should reduce the configuration method variety while the
config file is in use.  And it should solve the problem of configuring
relays that you mentioned in the commit message.

Having a configurable global default may also be confusing for the end
users.  Primarily because it doesn't actually apply to everything.
For example, active-backup replication has its own default probe interval
that will not be affected, and RAFT is calculating probe interval based
on election timer.  So, it may be not clear why the variable is set,
nothing is explicitly configured by the user, but some connections are
not using the value.
Another potential source of confusion is that OpenFlow connections use
the same terminology (probe_interval), but a different implementation
that is not going to be affected by this change as well.

Is there a scenario that we still need to cover that will not be covered
by the configuration file?  If there are some, we could try to come
up with solutions for these.  I know of a few, but these are just bugs
that should be fixed and this patch will not help with them anyway.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] ovsdb: Abort schema conversion before election timer expires.

2023-12-15 Thread Ilya Maximets
On 12/13/23 23:19, Frode Nordahl wrote:
> Since the topic of clustered database schema conversion issues was
> first brought up [0], there has been several performance
> improvements which has greatly improved the situation.
> 
> However, we cannot get away from the fact that the process remains
> unpredictable, when faced with a cluster that has grown to a scale
> no longer supported by its configuration.
> 
> The OVSDB Server is mostly single threaded, and for clustered
> databases the schema conversion is done server side in the main
> thread.  While conversion is ongoing the server will not service
> critical cluster communication, such as Raft elections.
> 
> In an ideal world operators would regularly re-assess their
> deployment configurations with real data to ensure parameters such
> as the OVSDB Raft election-timer is appropriately configured for
> all cluster operations, including schema conversions.
> 
> The reality is that this does not happen, and operators are instead
> discovering the hard facts of managing a OVSDB cluster at critical
> points in time such as during database upgrades.
> 
> To alleviate this situation we introduce time keeping in the
> ovsdb_convert functions for clustered databases.
> 
> A timeout is deduced from the configured election timer using a
> divisor of 4 to compute buffer time.  Buffer time is the time
> required for writing the result of the conversion to the raft log
> and initiating command to replace data on peers before an election
> is held for the next term.
> 
> For conversions initiated by a client (such as ovsdb-client),
> the process will be aborted and an error will be returned if the
> schema conversion process overruns the timeout.  Conversions
> initiated from storage will not abort, a warning indicating the election
> timer being set too low will be logged instead.
> 
> This will allow a human or automatic operator to take necessary
> action, such as increasing the election timer, before attempting
> the schema conversion again.
> 
> 0: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
> Signed-off-by: Frode Nordahl 
> ---

Hi, Frode.  Thanks for the patch!

It reminds me of the previous unfinished work from Anton:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20210820091630.7334-1-anton.iva...@cambridgegreys.com/

Looks like the same general idea.  There are a few issues with this
kind of functionality though:

- The task is not finished, i.e. we have to give up and fail, which
  is not nice for the user.

- Leaky abstraction.  This patch requires ovsdb/file module to know
  about RAFT, which is largely irrelevant to that module.  Can be fixed
  by handing off the configuration from top to bottom (see the Anton's
  implementation), but in this case will require all the intermediate
  layers to pass the information along, piercing the whole hierarchy.

- The issue appears to not be fully addressed.  In my previous testing,
  IIRC, sending back all the initial monitor replies after conversion
  was a heavier operation than a conversion itself.  So, even if
  conversion doesn't time out, jsonrpc server may cause the timeout.
  So, we kind of need both parts of the puzzle.

- Also, only timing the conversion is a little arbitrary.  The timer/4
  is chosen, but there is no guarantee that 3/4 of the timer are enough
  for everything else to run.  Anton's solution was trying to keep track
  of time for every operation and subtract from the budget, but that
  becomes a trade off between how deep in the call stack we want to
  be able to yield (foreshadowing :) ) vs at how many levels we need to
  implement time budget tracking.


So, I got another idea to discuss (It might have been mentioned somewhere
in the past years, but I do not remember).

Can we make use of some cooperative multitasking here?
I'm not talking about full-fledged co-routines, but something simpler,
though similar in nature.

Let's say we create a library with 2 functions:

1. cooperative_multitasking_register(callback, arg, time_threshold);
2. cooperative_yield();

Different modules could register a function in this module specifying
a how much time should pass between calls.  Other modules that are
expected to have heavy processing will call the yield() function at
points where they feel comfortable to do so.  The yield() function
may look through all the registered callbacks and call ones that wasn't
called for the duration of their time threshold.  Maybe we'll need
a third function for a callback to call to update the last used time.
Might be useful if the callback is called directly at some point and
not from the yeild().

Some strict rules will be needed for such a library though.  i.e.
the module that registers itself with a callback must not use the
yield functionality inside nor it should be possible to do so via
calls to other modules.  Not sure if recursion detection is enough,
but surely needed.  And basically, the module that regi

Re: [ovs-dev] [PATCH v7 2/3] dpif: Probe support for OVS_ACTION_ATTR_DROP.

2023-12-15 Thread Eric Garver
On Fri, Dec 15, 2023 at 06:02:21PM +0100, Adrian Moreno wrote:
> 
> 
> On 9/28/23 13:50, Eelco Chaudron wrote:
> > 
> > 
> > On 25 Sep 2023, at 20:04, Eric Garver wrote:
> > 
> > > Kernel support has been added for this action. As such, we need to probe
> > > the datapath for support.
> > > 
> > > Signed-off-by: Eric Garver 
> > 
> > This patch looks fine to me, and I’ll ack it. However, we should not apply 
> > this series until we have a final version of Ilya’s RFC patch applied.
> > 
> > This is the patch I’m referring to:
> > 
> > https://patchwork.ozlabs.org/project/openvswitch/patch/20230908234009.892803-1-i.maxim...@ovn.org/
> > 
> 
> 
> Hi Eric,
> 
> We've further investigated Ilya's RFC and seem some blocking issues. So I'm
> afraid we have to ask you to respin your V6 of this patch addressing Ilya's
> thread-safety problems as well as another concern: making sure re-probing
> does not install a dp flow that affects ongoing traffic.
> 
> More details here:
> https://patchwork.ozlabs.org/project/openvswitch/patch/20230908234009.892803-1-i.maxim...@ovn.org/
> 
> Sorry for delaying this. If you want help, I volunteer.

ACK. I'll investigate in the new year.

Eric.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 2/2] userspace: Enable tunnel tests with tso.

2023-12-15 Thread Mike Pattrick
This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.

Signed-off-by: Mike Pattrick 
---

v10:
 - Software TCP checksums now support encapsulated TSO case
 - Redundant inner offset code was removed
---
 lib/dp-packet.h  | 49 +++-
 lib/dpif-netdev-extract-avx512.c |  8 ++---
 lib/flow.c   | 12 ++-
 lib/netdev-linux.c   | 45 +++--
 lib/netdev-native-tnl.c  | 27 ---
 lib/packets.c| 56 +---
 tests/system-traffic.at  | 39 ++
 7 files changed, 151 insertions(+), 85 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3b16b2a54..cf341f09c 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -433,6 +433,8 @@ dp_packet_reset_offsets(struct dp_packet *b)
 b->l2_5_ofs = UINT16_MAX;
 b->l3_ofs = UINT16_MAX;
 b->l4_ofs = UINT16_MAX;
+b->inner_l3_ofs = UINT16_MAX;
+b->inner_l4_ofs = UINT16_MAX;
 }
 
 static inline uint16_t
@@ -530,6 +532,16 @@ dp_packet_inner_l4(const struct dp_packet *b)
: NULL;
 }
 
+static inline size_t
+dp_packet_inner_l4_size(const struct dp_packet *b)
+{
+return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
+? (const char *) dp_packet_tail(b)
+- (const char *) dp_packet_inner_l4(b)
+- dp_packet_l2_pad_size(b)
+: 0;
+}
+
 static inline const void *
 dp_packet_get_tcp_payload(const struct dp_packet *b)
 {
@@ -865,14 +877,6 @@ dp_packet_set_data(struct dp_packet *b, void *data)
 }
 }
 
-static inline void
-dp_packet_reset_packet(struct dp_packet *b, int off)
-{
-dp_packet_set_size(b, dp_packet_size(b) - off);
-dp_packet_set_data(b, ((unsigned char *) dp_packet_data(b) + off));
-dp_packet_reset_offsets(b);
-}
-
 enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
 
 struct dp_packet_batch {
@@ -1411,21 +1415,36 @@ dp_packet_ol_reset_l4_csum_good(struct dp_packet *p)
 }
 }
 
-/* Marks packet 'p' with good integrity if the 'start' and 'offset'
- * matches with the 'csum_start' and 'csum_offset' in packet 'p'.
- * The 'start' is the offset from the begin of the packet headers.
- * The 'offset' is the offset from start to place the checksum.
+/* Marks packet 'p' with good integrity if checksum offload locations
+ * were provided. In the case of encapsulated packets, these values may
+ * be deeper into the packet than OVS might expect. But the packet
+ * should still be considered to have good integrity.
+ * The 'csum_start' is the offset from the begin of the packet headers.
+ * The 'csum_offset' is the offset from start to place the checksum.
  * The csum_start and csum_offset fields are set from the virtio_net_hdr
  * struct that may be provided by a netdev on packet ingress. */
 static inline void
-dp_packet_ol_l4_csum_check_partial(struct dp_packet *p, uint16_t start,
- uint16_t offset)
+dp_packet_ol_l4_csum_check_partial(struct dp_packet *p)
 {
-if (p->csum_start == start && p->csum_offset == offset) {
+if (p->csum_start && p->csum_offset) {
 dp_packet_ol_set_l4_csum_partial(p);
 }
 }
 
+static inline void
+dp_packet_reset_packet(struct dp_packet *b, int off)
+{
+dp_packet_set_size(b, dp_packet_size(b) - off);
+dp_packet_set_data(b, ((unsigned char *) dp_packet_data(b) + off));
+dp_packet_reset_offsets(b);
+
+if (b->csum_start >= off && b->csum_offset) {
+/* Adjust values for decapsulation. */
+b->csum_start -= off;
+dp_packet_ol_set_l4_csum_partial(b);
+}
+}
+
 static inline uint32_t ALWAYS_INLINE
 dp_packet_calc_hash_ipv4(const uint8_t *pkt, const uint16_t l3_ofs,
  uint32_t hash)
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index 1bc7e8d0e..57ca4c71b 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -776,9 +776,7 @@ mfex_ipv6_set_hwol(struct dp_packet *pkt)
 static void
 mfex_tcp_set_hwol(struct dp_packet *pkt)
 {
-dp_packet_ol_l4_csum_check_partial(pkt, pkt->l4_ofs,
- offsetof(struct tcp_header,
-  tcp_csum));
+dp_packet_ol_l4_csum_check_partial(pkt);
 if (dp_packet_l4_checksum_good(pkt)
 || dp_packet_ol_l4_csum_partial(pkt)) {
 dp_packet_hwol_set_csum_tcp(pkt);
@@ -788,9 +786,7 @@ mfex_tcp_set_hwol(struct dp_packet *pkt)
 static void
 mfex_udp_set_hwol(struct dp_packet *pkt)
 {
-dp_packet_ol_l4_csum_check_partial(pkt, pkt->l4_ofs,
- offsetof(struct udp_head

[ovs-dev] [PATCH v10 1/2] userspace: Support vxlan and geneve tso.

2023-12-15 Thread Mike Pattrick
From: Dexia Li 

For userspace datapath, this patch provides vxlan and geneve tunnel tso.
Only support userspace vxlan or geneve tunnel, meanwhile support
tunnel outter and inner csum offload. If netdev do not support offload
features, there is a software fallback.If netdev do not support vxlan
and geneve tso,packets will drop. Front-end devices can close offload
features by ethtool also.

Signed-off-by: Dexia Li 
Co-authored-by: Mike Pattrick 
Signed-off-by: Mike Pattrick 
---
v9: Rebased patch
---
 lib/dp-packet.c |  41 +++-
 lib/dp-packet.h | 216 
 lib/dpif-netdev.c   |   4 +-
 lib/flow.c  |   2 +-
 lib/netdev-dpdk.c   |  87 ++--
 lib/netdev-dummy.c  |   2 +-
 lib/netdev-native-tnl.c | 106 ++--
 lib/netdev-provider.h   |   4 +
 lib/netdev.c|  33 --
 lib/packets.c   |  12 +--
 lib/packets.h   |   6 +-
 tests/dpif-netdev.at|   4 +-
 12 files changed, 458 insertions(+), 59 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 920402369..cb20608d7 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -546,16 +546,47 @@ dp_packet_compare_offsets(struct dp_packet *b1, struct 
dp_packet *b2,
 return true;
 }
 
+void
+dp_packet_tnl_outer_ol_send_prepare(struct dp_packet *p,
+uint64_t flags)
+{
+if (dp_packet_hwol_is_outer_ipv4_cksum(p)) {
+if (!(flags & NETDEV_TX_OFFLOAD_OUTER_IP_CKSUM)) {
+dp_packet_ip_set_header_csum(p, false);
+dp_packet_ol_set_ip_csum_good(p);
+dp_packet_hwol_reset_outer_ipv4_csum(p);
+}
+}
+
+if (!dp_packet_hwol_is_outer_UDP_cksum(p)) {
+return;
+}
+
+if (!(flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {
+packet_udp_complete_csum(p, false);
+dp_packet_ol_set_l4_csum_good(p);
+dp_packet_hwol_reset_outer_udp_csum(p);
+}
+}
+
 /* Checks if the packet 'p' is compatible with netdev_ol_flags 'flags'
  * and if not, updates the packet with the software fall back. */
 void
 dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags)
 {
+bool tnl_inner = false;
+
+if (dp_packet_hwol_is_tunnel_geneve(p) ||
+dp_packet_hwol_is_tunnel_vxlan(p)) {
+dp_packet_tnl_outer_ol_send_prepare(p, flags);
+tnl_inner = true;
+}
+
 if (dp_packet_hwol_tx_ip_csum(p)) {
 if (dp_packet_ip_checksum_good(p)) {
 dp_packet_hwol_reset_tx_ip_csum(p);
 } else if (!(flags & NETDEV_TX_OFFLOAD_IPV4_CKSUM)) {
-dp_packet_ip_set_header_csum(p);
+dp_packet_ip_set_header_csum(p, tnl_inner);
 dp_packet_ol_set_ip_csum_good(p);
 dp_packet_hwol_reset_tx_ip_csum(p);
 }
@@ -565,24 +596,24 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t 
flags)
 return;
 }
 
-if (dp_packet_l4_checksum_good(p)) {
+if (dp_packet_l4_checksum_good(p) && (!tnl_inner)) {
 dp_packet_hwol_reset_tx_l4_csum(p);
 return;
 }
 
 if (dp_packet_hwol_l4_is_tcp(p)
 && !(flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) {
-packet_tcp_complete_csum(p);
+packet_tcp_complete_csum(p, tnl_inner);
 dp_packet_ol_set_l4_csum_good(p);
 dp_packet_hwol_reset_tx_l4_csum(p);
 } else if (dp_packet_hwol_l4_is_udp(p)
&& !(flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) {
-packet_udp_complete_csum(p);
+packet_udp_complete_csum(p, tnl_inner);
 dp_packet_ol_set_l4_csum_good(p);
 dp_packet_hwol_reset_tx_l4_csum(p);
 } else if (!(flags & NETDEV_TX_OFFLOAD_SCTP_CKSUM)
&& dp_packet_hwol_l4_is_sctp(p)) {
-packet_sctp_complete_csum(p);
+packet_sctp_complete_csum(p, tnl_inner);
 dp_packet_ol_set_l4_csum_good(p);
 dp_packet_hwol_reset_tx_l4_csum(p);
 }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 11aa00723..3b16b2a54 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -86,22 +86,47 @@ enum dp_packet_offload_mask {
 DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, RTE_MBUF_F_TX_SCTP_CKSUM, 0x800),
 /* Offload IP checksum. */
 DEF_OL_FLAG(DP_PACKET_OL_TX_IP_CKSUM, RTE_MBUF_F_TX_IP_CKSUM, 0x1000),
+/* Offload packet is tunnel GENEVE. */
+DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_GENEVE,
+RTE_MBUF_F_TX_TUNNEL_GENEVE, 0x2000),
+/* Offload packet is tunnel VXLAN. */
+DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN,
+RTE_MBUF_F_TX_TUNNEL_VXLAN, 0x4000),
+/* Offload tunnel packet, out is ipv4 */
+DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_IPV4,
+RTE_MBUF_F_TX_OUTER_IPV4, 0x8000),
+/* Offload TUNNEL out ipv4 checksum */
+DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_IP_CKSUM,
+RTE_MBUF_F_TX_OUTER_IP_CKSUM, 0x1),
+/* Offload TUNNEL out udp checksum */
+DEF_OL_FLAG(DP_PACKET_OL_TX_OUTER_UDP_CKSUM,
+RTE_MBUF_F_TX_O

Re: [ovs-dev] [PATCH 1/2] ovsdb: Fix use after free on schema conversion error.

2023-12-15 Thread Ilya Maximets
On 12/13/23 23:19, Frode Nordahl wrote:
> In the event a schema conversion aborts, the cleanup code in
> ovsdb_convert() prior to this patch will remove the in-memory
> copy of the new database prior to aborting any on-going
> transactions in that database, consequently leading to a use after
> free and potential crash.
> 
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
> databases.")
> Signed-off-by: Frode Nordahl 
> ---
>  ovsdb/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ovsdb/file.c b/ovsdb/file.c
> index 8bd1d4af3..778b4004b 100644
> --- a/ovsdb/file.c
> +++ b/ovsdb/file.c
> @@ -388,10 +388,10 @@ ovsdb_convert(const struct ovsdb *src, const struct 
> ovsdb_schema *new_schema,
>  return NULL;
>  
>  error:
> -ovsdb_destroy(dst);
>  if (txn) {
>  ovsdb_txn_abort(txn);
>  }
> +ovsdb_destroy(dst);
>  *dstp = NULL;
>  return error;
>  }


Thanks, Frode!  Good catch!

Can we have a test case for this issue though?
I don't think we can test this directly, as the chances for the
crash are not 100%, but we can write a test and let asan fail it.
We do run tests under asan in CI, so that should be fine.

The reproducer may look like this:
1. Create a schema with 2 tables with string columns.
2. Create database and add non-numerical string data to both tables.
3. Try to convert the column in the first table to integer.
4. The same for the other table.

Steps 3 and 4 should expect failure, but should ignore the
error message, or at least ignore the table/column names in
the output, because we don't know which table will be first
during conversion.  For the same reason we need both steps,
because we need transaction to be populated with the data
from one table before the conversion fails on the second one.
ASAN or valgrind should catch the UAF condition. 

What do you think?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] ofproto-dpif-upcall: Resolve checksums before controller upcall.

2023-12-15 Thread Mike Pattrick
On Fri, Dec 15, 2023 at 8:01 AM Eelco Chaudron  wrote:
>
>
>
> On 11 Dec 2023, at 16:39, Mike Pattrick wrote:
>
> > Currently dp_netdev_upcall() resolves checksums on all packets, but this
> > isn't strictly needed. The checksums will be resolved before
> > transmission. However, we do have to resolve the checksums before
> > sending a packet to the controller as offload flags aren't retained.
> >
> > Signed-off-by: Mike Pattrick 
> > ---
> >  lib/dpif-netdev.c |  2 --
> >  ofproto/ofproto-dpif-upcall.c | 13 ++---
> >  2 files changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 9a59a1b03..8f7aaffbb 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -7980,8 +7980,6 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
> > struct dp_packet *packet_,
> >  ds_destroy(&ds);
> >  }
> >
> > -dp_packet_ol_send_prepare(packet_, 0);
> > -
> >  return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata,
> >   actions, wc, put_actions, dp->upcall_aux);
> >  }
> > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> > index cc10f57b5..e974d844a 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -1574,15 +1574,20 @@ process_upcall(struct udpif *udpif, struct upcall 
> > *upcall,
> >  const struct frozen_state *state = &recirc_node->state;
> >
> >  struct ofproto_async_msg *am = xmalloc(sizeof *am);
> > +
> > +/* Resolve offloaded checksums, if any. */
> > +struct dp_packet *packet_clone = dp_packet_clone(packet);
> > +dp_packet_ol_send_prepare(packet_clone, 0);
>
> Doing a clone (malloc/copy), and doing another malloc and copy below sound 
> like a waste of resources.
>
> Can we maybe make this more smart in the dp_netdev_upcall()?
> Something like the below in dp_netdev_upcall():

The reason for moving this out of dp_netdev_upcall is resolving the
checksum will modify certain flags that interfere with the software
gso implementation. I could reset the flags for all packets in
dp-packet-gso, but this would not be needed in most packets and would
mean that dp_netdev_upcall packets don't take advantage of checksum
offload at all.

Another option would be to add a new function that sets the checksum
in a different buffer than the dp-packet. This would only be used
here, but shouldn't be too complex to implement.

Probably the first one would be far more straightforward, so unless
you have a preference I'll do that one.

-M

>
>   if (type == CONTROLLER_UPCALL)
> dp_packet_ol_send_prepare(packet_, 0);
>
>
> >  *am = (struct ofproto_async_msg) {
> >  .controller_id = cookie->controller.controller_id,
> >  .oam = OAM_PACKET_IN,
> >  .pin = {
> >  .up = {
> >  .base = {
> > -.packet = xmemdup(dp_packet_data(packet),
> > -  dp_packet_size(packet)),
> > -.packet_len = dp_packet_size(packet),
> > +.packet = xmemdup(dp_packet_data(packet_clone),
> > +  
> > dp_packet_size(packet_clone)),
> > +.packet_len = dp_packet_size(packet_clone),
> >  .reason = cookie->controller.reason,
> >  .table_id = state->table_id,
> >  .cookie = get_32aligned_be64(
> > @@ -1598,6 +1603,8 @@ process_upcall(struct udpif *udpif, struct upcall 
> > *upcall,
> >  },
> >  };
> >
> > +dp_packet_delete(packet_clone);
> > +
> >  if (cookie->controller.continuation) {
> >  am->pin.up.stack = (state->stack_size
> >? xmemdup(state->stack, state->stack_size)
> > --
> > 2.39.3
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [branch-21.12 v2 ovn] controller: make garp_max_timeout configurable

2023-12-15 Thread Lorenzo Bianconi
> On 12/14/23 18:42, Lorenzo Bianconi wrote:
> > When using VLAN backed networks and OVN routers leveraging the
> > 'ovn-chassis-mac-mappings' option for east-west traffic, the eth.src field 
> > is
> > replaced by the chassis mac address in order to not expose the router mac
> > address from different nodes and confuse the TOR switch. However doing so
> > the TOR switch is not able to learn the port/mac bindings for routed E/W
> > traffic and it is force to always flood it. Fix this issue adding the
> > capability to configure a given timeout for garp sent by ovn-controller
> > and not disable it after the exponential backoff in order to keep
> > refreshing the entries in TOR swtich fdb table.
> > More into about the issue can be found here [0].
> > 
> > [0] 
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2020-September/050678.html
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2087779
> > 
> > Signed-off-by: Lorenzo Bianconi 
> > Acked-by: Ales Musil 
> > Signed-off-by: Mark Michelson 
> > ---
> > Changes since v1:
> > - add missing selftest
> > ---
> 
> Hi Lorenzo,
> 
> Thanks for the backport.
> 
> [...]
> 
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 003bf45d5..987e043df 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -30762,3 +30762,108 @@ check test "$current_id" = "$prev_id"
> >  OVN_CLEANUP([hv1],[hv2])
> >  AT_CLEANUP
> >  ])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([send gratuitous arp for l3gateway only on selected chassis])
> 
> This test is identical to what 77ec310dda51 ("pinctrl.c: Send GARP only
> on chassis atached to l3gw") added to branch 22.03.  However, we don't
> have the changes that 77ec310dda51 added to pinctrl.c.
> 
> Does that mean this test is flaky and might fail occasionally?  Or is it
> just that the test doesn't test the change it was supposed to test?
> 
> In any case, why not add a new test, just for garp_max_timeout, for main
> branch?  We can backport that one alone to all branches, including 21.12.
> 
> Thanks,
> Dumitru
> 
> 

Hi Dumitru,

Regarding the self-test for 21.12 backport I posted a standalone test for
garp-max-timeout we can easily backport on 21.12.

https://patchwork.ozlabs.org/project/ovn/patch/3957de4e1a47075fd8251156c01ec06cad9ba60a.1702659678.git.lorenzo.bianc...@redhat.com/

Regards,
Lorenzo
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 2/3] dpif: Probe support for OVS_ACTION_ATTR_DROP.

2023-12-15 Thread Adrian Moreno



On 9/28/23 13:50, Eelco Chaudron wrote:



On 25 Sep 2023, at 20:04, Eric Garver wrote:


Kernel support has been added for this action. As such, we need to probe
the datapath for support.

Signed-off-by: Eric Garver 


This patch looks fine to me, and I’ll ack it. However, we should not apply this 
series until we have a final version of Ilya’s RFC patch applied.

This is the patch I’m referring to:

https://patchwork.ozlabs.org/project/openvswitch/patch/20230908234009.892803-1-i.maxim...@ovn.org/




Hi Eric,

We've further investigated Ilya's RFC and seem some blocking issues. So I'm 
afraid we have to ask you to respin your V6 of this patch addressing Ilya's 
thread-safety problems as well as another concern: making sure re-probing does 
not install a dp flow that affects ongoing traffic.


More details here:
https://patchwork.ozlabs.org/project/openvswitch/patch/20230908234009.892803-1-i.maxim...@ovn.org/

Sorry for delaying this. If you want help, I volunteer.

--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] test: add dedicated test for garp-max-timeout

2023-12-15 Thread Lorenzo Bianconi
Introduce a dedicated test for garp-max-timeout knob

Signed-off-by: Lorenzo Bianconi 
---
 tests/ovn.at | 82 
 1 file changed, 82 insertions(+)

diff --git a/tests/ovn.at b/tests/ovn.at
index 918f97a9e..25b101a7a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -37464,3 +37464,85 @@ wait_for_ports_up
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([gratuitous arp max timeout])
+AT_KEYWORDS([garp-timeout])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+ovn_start
+
+ovn-nbctl ls-add ls0
+ovn-nbctl lr-add lr0
+ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24
+ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \
+type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"'
+
+ovn-nbctl lsp-add ls0 ln_port
+ovn-nbctl lsp-set-addresses ln_port unknown
+ovn-nbctl lsp-set-type ln_port localnet
+ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1
+
+# Prepare packets
+touch empty_expected
+echo 
"f00108060001080006040001f001c0a80001c0a80001"
 > arp_expected
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl \
+-- add-br br-phys \
+-- add-br br-eth0
+
+ovn_attach n1 br-phys 192.168.0.10
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . 
external-ids:ovn-bridge-mappings=physnet1:br-eth0])
+AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif 
options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap])
+
+ovn-sbctl dump-flows > sbflows
+AT_CAPTURE_FILE([sbflows])
+
+# Wait until the patch ports are created in hv1 to connect br-int to br-eth0
+AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
+OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"])
+# Temporarily remove lr0 chassis
+# Wait for hv confirmation to make sure chassis is removed before we reset pcap
+# Otherwise a garp might be sent after pcap have been reset but before chassis 
is removed
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
+AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
+OVS_WAIT_UNTIL([
+ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0)
+test "$ls0_lr0" = $hv1_uuid
+])
+
+OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected])
+# Temporarily remove lr0 chassis
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+# set garp max timeout to 2s
+AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . 
external-ids:garp-max-timeout-sec=2])
+AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
+
+OVS_WAIT_UNTIL([
+n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l)
+test "$n_arp" = 10
+])
+
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+# set garp max timeout to 2s
+AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . 
external-ids:garp-max-timeout-sec=1])
+AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
+
+OVS_WAIT_UNTIL([
+n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l)
+test "$n_arp" = 20
+])
+
+OVN_CLEANUP([hv1])
+
+AT_CLEANUP
+])
-- 
2.43.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] bridge: Re-create bridges when Flow API is enabled.

2023-12-15 Thread Adrian Moreno



On 12/15/23 11:03, Eelco Chaudron wrote:



On 14 Dec 2023, at 19:15, Adrian Moreno wrote:


On 12/13/23 11:47, Adrian Moreno wrote:



On 9/15/23 17:25, Eelco Chaudron wrote:



On 9 Sep 2023, at 1:40, Ilya Maximets wrote:


Currently OVS requires restart of ovs-vswitchd after enabling hardware
offload.  This is necessary to make sure all the correct features are
probed and all the internal configurations are in the right state.

Making that process a bit less invasive by re-creating bridges instead
of a full process restart.  Documentation is not updated, because
disabling hardware offload still can take effect only after restart.



Hi Ilya,

I like the idea of re-creating the bridges, however not sure if it generated 
any side effects other than the ones you mention in the FIXME.



I have taken a look at the patch, tested it and added some rework of the 
iface_hints to avoid the datapath ports from being deleted when the bridge is 
re-created.

The recreation of the bridge seems to work but my concern is that, unlike a restart 
which uses ovs-save to save & restore the flows, groups and tunnel metadata, 
doing this will leave the bridge with no OpenFlow configuration.

During the time the controller takes to reconfigure the bridge, reval threads 
will wipe the datapath flows. If there is no controller ... well we'll need to 
wait for the user to program the flows back. This is quite a big behavior 
change so I'd say if we go for this option we better documment it pretty well.



I'm thinking that, considering this hassle (loosing all flows, meters, tunnel 
tlv info, etc), isn't it better to simply restart vswitchd?

A more more complex solution (probably big enough for this release which is 
coming fast) could be to recreate ovs-save internally and:
- Collect all the flow table, groups, tlv data and meters
- Delete the bridge and recreate it
- Reconfigure everything back

Flow stats will probably be lost in the process, which is a weird outcome of 
enabling hwol but it's better than loosing all the flows.


I guess the goal for this patch was to NOT touch any actual resources, only 
re-probe the capabilities. See comment below:

+    /* Destroy all the remaining bridges if Flow API status changed
+ * as we need to re-probe supported features.  Do not delete
+ * bridge resources to avoid datapath disruption. */

Ilya can you comment on this?


Summarizing the conversation we had with Ilya and adding a bit of context for 
the sake of completeness:


When reviewing Eric's series to add OVS_ACTION_ATTR_DROP support for the kernel 
datapath. The problem comes when hw-offload is enabled since the tc datapath 
does not support this, we need a way to re-evaluate the feature and stop adding 
that action.


An approach was proposed (suggested by Eelco) in version 6 [1] of that series: 
manually reprobing that feature bit. Ilya raised concerns about thread-safety 
and maintainability and proposed exploring this alternative idea.


Given the consequences of this approach (loosing all the OFP flows), it doesn't 
seem as a good way to go, and we don't have any other good idea to do this easily.


Therefore, the way to move forward would be to add atomicity to Eric's V6 
version. Note that maintainability is still a concern and re-probing independent 
features should not be the general case.


Also, we need to make sure reprobing does not add dp flows that affect ongoing 
traffic.



[1] 
https://patchwork.ozlabs.org/project/openvswitch/patch/20230901194234.3409915-3-e...@garver.life/



--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 09/11] ci: Fix dpdk build cache key generation.

2023-12-15 Thread David Marchand
On Tue, Dec 5, 2023 at 4:00 PM Eelco Chaudron  wrote:
>
> When new drivers are introduced, the cache key is not accurately computed.
>
> Before the commit 1a1b3106d90e ("ci: Separate DPDK from OVS build."), the
> DPDK build process was integrated in .ci/linux-{setup,build}.sh scripts,
> where specific lines were employed to generate the key. Since it is now
> separated in .ci/dpdk-{setup,build}.sh, this patch computes the key based
> on the content of those dedicated scripts.
>
> Fixes: 4e90baca89f0 ("system-dpdk: Run traffic tests.")
> Signed-off-by: Eelco Chaudron 

Thanks for the fix.

Reviewed-by: David Marchand 

-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/22] ovsdb-server: Configuration via config-file.

2023-12-15 Thread Dumitru Ceara
On 12/14/23 13:40, Ilya Maximets wrote:
> On 12/14/23 08:12, Frode Nordahl wrote:
>> On Thu, Dec 14, 2023 at 2:04 AM Ilya Maximets  wrote:
>>>
>>> The original problem was summarized on the OVS+OVN Conf'22 last year:
>>> https://www.openvswitch.org/support/ovscon2022/#t19
>>> Slides: 
>>> https://www.openvswitch.org/support/ovscon2022/slides/ovsdb-a-database.pdf
>>>
>>> In short, there are way to many ways to configure remotes and databases
>>> but not a single one of them can cover all the situations.  And there
>>> are cases like max-backoff on active-backup connection that are not
>>> configurable at all.
>>>
>>> The proposed solution for this problem was to unify configurations
>>> within a new Local_Config database that will be local to a server
>>> and will have support for all we need in the schema.
>>>
>>> However, there are few issues with this solution:
>>>
>>> 1. It doesn't reduce the number of ways things can be configured,
>>>but increases, because we'll need to add things that are covered
>>>by other methods to the database, while those other methods
>>>are still in use.  And it is actually required to use the
>>>command line or appctl in order to use the values stored in the
>>>database, i.e. --remote=db:db:table:row.
>>>
>>> 2. Not particularly user-friendly.  Database files are user-readable
>>>-ish, but not writable, i.e. require special tools in order to
>>>make changes and in most cases require a running ovsdb-server
>>>in order to make changes.
>>>
>>> 3. We need a way to restrict write access for external users,
>>>but allow local administrators to change the configuration.
>>>Hence, potentially complex access management and potential
>>>security issues.
>>>
>>> Taking all of that into account and spending another year on a
>>> solution :) , I believe, introduction of a configuration file can
>>> make things easier.  This patch set adds support for a new command
>>> line argument --config-file that is mutually exclusive with many
>>> other command line arguments and appctl commands that can configure
>>> the same thing.
>>
>> Thank you for your work on this! The existing configuration model of
>> the OVSDB server is indeed on the list of black magic arts for anyone
>> using it for the first time.

:)

>>
>> I like the way you think wrt. making the new way mutually exclusive
>> with the old ways to reduce the end user surface area on these knobs.
>>
>>> A few key points:
>>>
>>> 1. Configuration file is a plain text JSON file, so it can be edited
>>>with just a text editor without need for any extra tools.
>>>(JSON because we have a decent JOSN parser and there is no need
>>>for extra dependencies.  And no other format is actually superior,
>>>all of them have their own issues.)
>>>
>>> 2. Configuration file is a replacement for many command line options
>>>and appctls, because it is mutually exclusive with them.
>>>If --config-file is set, configuration that can be done via
>>>configuration file can only be done via configuration file.
>>>(It still allows to point to the database for remotes, but there
>>>is no actual need for this, except in very rare cases, and it is
>>>visible from the file that it points to the database.)
>>>
>>> 3. If configuration change is needed in runtime: change the file
>>>and call ovsdb-server/reload.  Ihar also suggested to use SIGHUP
>>>for this, can be added in the future.
>>
>> +1
>>
>>> SSL configuration is not moved to a config-file in this patch set,
>>> but can be done as a follow up.
>>>
>>>
>>> Structure of this patch set:
>>>
>>>  - Patches 1-5 are general fixes and enhancements that are not
>>>really related to the change, but needed/useful
>>>one way or another.
>>>
>>>  - There are only 4 actually large patches that are not tests.
>>>And they are mainly refactoring of the existing code without
>>>externally visible changes.  This is because we have to
>>>isolate all the databases from each other so they do not
>>>share configuration.  But at the same time we have to preserve
>>>all the current interfaces for backward compatibility and
>>>that takes a lot of code.
>>>
>>>  - Most other patches in the set a small simple changes.
>>>
>>> More description and examples are in individual patches and the
>>> documentation.
>>>
>>> TL;DR;  A small example of a config file from the patch #20:
>>>
>>>  {
>>> "remotes": {
>>> "punix:db.sock": {},
>>> "pssl:6641": {
>>> "inactivity-probe": 16000,
>>> "read-only": false,
>>> "role": "ovn-controller"
>>> }
>>> },
>>> "databases": {
>>> "conf.db": {},
>>> "sb.db": {
>>> "service-model": "clustered"
>>
>> I'm a bit worried about this item in the configuration file:
> 
> Hi.  Thanks for your input!
> 
> I thoug

Re: [ovs-dev] [PATCH v3 10/11] ci: Allow make check-dpdk to run the MFEX tests.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:41PM +0100, Eelco Chaudron wrote:
> The testcase change will make sure the 'MFEX Configuration' test
> will run without the need for scapy and it's auto generated tests.
> In addition we exclude the traffic related MFEX tests from running
> on GitHub actions due to limited resources.
> 
> Signed-off-by: Eelco Chaudron 

...

> diff --git a/python/test_requirements.txt b/python/test_requirements.txt
> index c85ce41ad..5043c71e2 100644
> --- a/python/test_requirements.txt
> +++ b/python/test_requirements.txt
> @@ -2,4 +2,5 @@ netaddr
>  pyftpdlib
>  pyparsing
>  pytest
> +scapy
>  tftpy

Hi Eelco,

I'm having a bit of trouble reconciling the change above
with the patch description.

...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 09/11] ci: Fix dpdk build cache key generation.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:27PM +0100, Eelco Chaudron wrote:
> When new drivers are introduced, the cache key is not accurately computed.
> 
> Before the commit 1a1b3106d90e ("ci: Separate DPDK from OVS build."), the
> DPDK build process was integrated in .ci/linux-{setup,build}.sh scripts,
> where specific lines were employed to generate the key. Since it is now
> separated in .ci/dpdk-{setup,build}.sh, this patch computes the key based
> on the content of those dedicated scripts.
> 
> Fixes: 4e90baca89f0 ("system-dpdk: Run traffic tests.")
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 09/11] ci: Fix dpdk build cache key generation.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:27PM +0100, Eelco Chaudron wrote:
> When new drivers are introduced, the cache key is not accurately computed.
> 
> Before the commit 1a1b3106d90e ("ci: Separate DPDK from OVS build."), the
> DPDK build process was integrated in .ci/linux-{setup,build}.sh scripts,
> where specific lines were employed to generate the key. Since it is now
> separated in .ci/dpdk-{setup,build}.sh, this patch computes the key based
> on the content of those dedicated scripts.
> 
> Fixes: 4e90baca89f0 ("system-dpdk: Run traffic tests.")
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 08/11] ci: Add make check-system-tso to GitHub actions ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:19PM +0100, Eelco Chaudron wrote:
> This patch adds 'make check-system-tso' to the GitHub actions ci.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 07/11] ci: Add make check-system-userspace to GitHub actions ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:09PM +0100, Eelco Chaudron wrote:
> This patch adds 'make check-system-userspace' to the GitHub actions ci.
> The tests are not split into two seperate test runs as they complete in
> around 10 minutes.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 06/11] ci: Add make check-offloads to GitHub actions ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:01PM +0100, Eelco Chaudron wrote:
> This patch also adds the 'CHECK_GITHUB_ACTION' macro to skip
> tests that won't execute successfully through GitHub actions.
> We could not use the -k !keyword option, as it can not be
> combined with a range of tests.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 11/11] ci: Add make check-afxdp to GitHub actions ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 04:00:49PM +0100, Eelco Chaudron wrote:
> This patch adds 'make check-afxdp' to the GitHub actions ci.
> The tests are not split into two seperate test runs as they
> complete in around 10 minutes.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 05/11] ci: Add make check-kernel to GitHub actions ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 03:59:54PM +0100, Eelco Chaudron wrote:
> This patch adds 'make check-kernel' to the GitHub actions ci.
> However, to do this, some additional changes were needed. First,
> some of the missing test and package dependencies had to be added.
> Finally, we added an option to the GitHub run matrix that allows
> the tests to be split up, to avoid lengthy single test runs.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 04/11] ci: Exclude tests that show random failures through GitHub actions.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 03:59:46PM +0100, Eelco Chaudron wrote:
> I ran 80 series of full tests, and the following tests showed failures:
> 
>  802.1ad - vlan_limit
>+2023-11-20T10:32:11.245Z|1|dpif_netdev(revalidator5)|ERR|internal
>  error parsing flow key recirc_id(0),dp_hash(0),skb_priority(0),
>  in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),
>  packet_type(ns=0,id=0),eth(src=42:7e:4b:46:68:1b,dst=33:33:ff:46:68:1b),
>  eth_type(0x88a8),vlan(vid=4094,pcp=0),encap(eth_type(0x8100),
>  vlan(vid=100,pcp=0),encap(eth_type(0x86dd),ipv6(
>  src=::,dst=ff02::1:ff46:681b,label=0,proto=58,tclass=0,hlimit=255,
>  frag=no),icmpv6(type=135,code=0),nd(target=fe80::407e:4bff:fe46:681b,
>  sll=00:00:00:00:00:00,tll=00:00:00:00:00:00)))
>+2023-11-20T10:32:11.245Z|2|dpif(revalidator5)|WARN|netdev@ovs-netdev:
>  failed to put[modify] (Invalid argument)
>  ufid:ef1ca90c-dbd0-4ca7-9869-411bdffd1ece recirc_id(0),dp_hash(0/0),
>  skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),
>  ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),
>  eth(src=42:7e:4b:46:68:1b,dst=33:33:ff:46:68:1b),eth_type(0x88a8),
>  vlan(vid=4094,pcp=0/0x0),encap(eth_type(0x8100),
>  vlan(vid=100/0x0,pcp=0/0x0),encap(eth_type(0x86dd),
>  ipv6(src=::/::,dst=ff02::1:ff46:681b/::,label=0/0,proto=58/0,
>  tclass=0/0,hlimit=255/0,frag=no),icmpv6(type=135/0,code=0/0),
>  nd(target=fe80::407e:4bff:fe46:681b/::,
>  sll=00:00:00:00:00:00/00:00:00:00:00:00,
>  tll=00:00:00:00:00:00/00:00:00:00:00:00))), actions:drop
> 
>   conntrack - zones from other field, more tests
> +2023-11-20T10:45:43.015Z|1|dpif(handler5)|WARN|system@ovs-system:
>   execute ct(commit),3 failed (Invalid argument) on packet tcp,
>   vlan_tci=0x,dl_src=42:7e:4b:46:68:1b,dl_dst=ba:72:4c:a5:31:6b,
>   nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,
>   nw_frag=no,tp_src=53738,tp_dst=80,tcp_flags=psh|ack tcp_csum:e4a
> 
>   conntrack - limit by zone
> ./system-traffic.at:5154: ovs-appctl dpctl/ct-get-limits zone=0,1,2,3,4,5
> --- - 2023-11-20 10:51:09.965375141 +
> +++ /home/runner/work/ovs/ovs/tests/system-kmod-testsuite.dir/at-groups/
>   114/stdout  2023-11-20 10:51:09.956723756 +
> @@ -1,5 +1,5 @@
>  default limit=10
> -zone=0,limit=5,count=5
> +zone=0,limit=5,count=6
> 
>   conntrack - Multiple ICMP traverse
> ./system-traffic.at:7571: ovs-appctl dpctl/dump-conntrack | ...
>   -e 's/state=[0-9_A-Z]*/state=/g' | sort | uniq
> --- - 2023-11-20 15:36:02.591051192 +
> +++ /home/runner/work/ovs/ovs/tests/system-kmod-testsuite.dir/...
> @@ -1,2 +1,9 @@
> +tcp,orig=(src=10.1.1.7,dst=13.107.43.16,sport=,
>   dport=),reply=(src=13.107.43.16,dst=10.1.1.7,sport=,
>   dport=),protoinfo=(state=)
> +tcp,orig=(src=10.1.1.7,dst=168.63.129.16,sport=,
>   dport=),reply=(src=168.63.129.16,dst=10.1.1.7,sport=,
>   dport=),protoinfo=(state=)
> ...
> +tcp,orig=(src=20.22.98.201,dst=10.1.1.7,sport=,dport=),
>   reply=(src=10.1.1.7,dst=20.22.98.201,sport=,dport=),
>   protoinfo=(state=)
> 
>   conntrack - ct flush
> +++ /home/runner/work/ovs/ovs/tests/system-kmod-testsuite.dir/...
> @@ -1,3 +1,5 @@
> +tcp,orig=(src=10.1.1.154,dst=13.107.42.16,sport=45300,dport=443),
>   reply=(src=13.107.42.16,dst=10.1.1.154,sport=443,dport=45300),
>   protoinfo=(state=ESTABLISHED)
> +tcp,orig=(src=10.1.1.154,dst=20.72.125.48,sport=45572,dport=443),
>   reply=(src=20.72.125.48,dst=10.1.1.154,sport=443,dport=45572),
>   protoinfo=(state=ESTABLISHED)
> 
> As I do not see those failures when running these stand alone on the
> same Ubuntu distribution, I've disabled first three tests for now.

Hi Eelco,

I guess the change described above, and below address different problems,
and thus could be separate patches. But I don't feel particularly strongly
about this.

Reviewed-by: Simon Horman 

> For the other tests we narrowed the result to not include any of the local
> IP addresses in the test results.
> 
> This patch also adds the 'CHECK_GITHUB_ACTION' macro to skip
> tests that won't execute successfully through GitHub actions.
> We could not use the -k !keyword option, as it can not be
> combined with a range of tests.
> 
> Signed-off-by: Eelco Chaudron 

...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 03/11] ci: Update the GitHub Ubuntu runner image to Ubuntu 22.04.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 03:59:38PM +0100, Eelco Chaudron wrote:
> Updating this image is a requirement for the kernel system-traffic
> tests to pass on Ubuntu. In addition, 20.04 might be replaced,
> as soon as 24.04 comes out. Or we need to do this when it becomes
> EOL in April 2025.
> 
> Signed-off-by: Eelco Chaudron 

Yes, we should do this :)

Acked-by: Simon Horman 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 01/11] ci: Add JOBS variable to replace all the '-j4' instances.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 03:59:23PM +0100, Eelco Chaudron wrote:
> Add a JOBS variable, which defaults to '-j4' but can be overwritten
> with the same environment variable. This can be useful if you use
> this linux-build.sh script outside of GitHub actions on a machine
> with many cores.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 02/11] ci: Add make check-ovsdb-cluster tests to GitHub action ci.

2023-12-15 Thread Simon Horman
On Tue, Dec 05, 2023 at 03:59:31PM +0100, Eelco Chaudron wrote:
> This patch adds 'make check-ovsdb-cluster' tests to GitHub action ci.
> In addition, this patch also makes sure this test and 'make check' do
> not run as root.
> 
> Signed-off-by: Eelco Chaudron 
> ---
>  .ci/linux-build.sh   |5 -
>  .github/workflows/build-and-test.yml |3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> index 67c01a644..bb540703e 100755
> --- a/.ci/linux-build.sh
> +++ b/.ci/linux-build.sh
> @@ -129,11 +129,14 @@ else
>  build_ovs
>  for testsuite in $TESTSUITE; do
>  run_as_root=
> +if [ "$testsuite" != "check" ] && \
> +   [ "$testsuite" != "check-ovsdb-cluster" ] ; then
> +run_as_root="sudo -E PATH=$PATH"
> +fi
>  if [ "${testsuite##*dpdk}" != "$testsuite" ]; then
>  sudo sh -c 'echo 1024 > /proc/sys/vm/nr_hugepages' || true
>  [ "$(cat /proc/sys/vm/nr_hugepages)" = '1024' ]
>  export DPDK_EAL_OPTIONS="--lcores 0@1,1@1,2@1"
> -run_as_root="sudo -E PATH=$PATH"
>  fi
>  $run_as_root make $testsuite TESTSUITEFLAGS=${JOBS} RECHECK=yes
>  done

Hi Eelco,

perhaps it is because it is Friday afternoon (although I did just have
a coffee), but I am a but confused by the change above.

My reading is that, before this change:

   run_as_root is set if the $testsuite includes the string "dpdk"

While after the change:

   run_as_root is set if $testsuite is neither "check"
   nor check-ovsdb-cluster".

In both cases:

   * Anything with "dpdk" results in run_as_root set; and
   * "check" and "check-ovsdb-cluster" do not result in run_as_root being set

Further, it seems to me that the other values of TESTSUITE are:
* "": which means the loop won't iterate
* "test": which is special cased and also means the loop won't iterated

So I am a little puzzled regarding the motivation for this change.


> diff --git a/.github/workflows/build-and-test.yml 
> b/.github/workflows/build-and-test.yml
> index 09654205e..5d441157c 100644
> --- a/.github/workflows/build-and-test.yml
> +++ b/.github/workflows/build-and-test.yml
> @@ -164,6 +164,9 @@ jobs:
>  m32:  m32
>  opts: --disable-ssl
>  
> +  - compiler: gcc
> +testsuite:check-ovsdb-cluster
> +
>  steps:
>  - name: checkout
>uses: actions/checkout@v3
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [branch-21.12 v2 ovn] controller: make garp_max_timeout configurable

2023-12-15 Thread Dumitru Ceara
On 12/14/23 18:42, Lorenzo Bianconi wrote:
> When using VLAN backed networks and OVN routers leveraging the
> 'ovn-chassis-mac-mappings' option for east-west traffic, the eth.src field is
> replaced by the chassis mac address in order to not expose the router mac
> address from different nodes and confuse the TOR switch. However doing so
> the TOR switch is not able to learn the port/mac bindings for routed E/W
> traffic and it is force to always flood it. Fix this issue adding the
> capability to configure a given timeout for garp sent by ovn-controller
> and not disable it after the exponential backoff in order to keep
> refreshing the entries in TOR swtich fdb table.
> More into about the issue can be found here [0].
> 
> [0] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-September/050678.html
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2087779
> 
> Signed-off-by: Lorenzo Bianconi 
> Acked-by: Ales Musil 
> Signed-off-by: Mark Michelson 
> ---
> Changes since v1:
> - add missing selftest
> ---

Hi Lorenzo,

Thanks for the backport.

[...]

> diff --git a/tests/ovn.at b/tests/ovn.at
> index 003bf45d5..987e043df 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -30762,3 +30762,108 @@ check test "$current_id" = "$prev_id"
>  OVN_CLEANUP([hv1],[hv2])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([send gratuitous arp for l3gateway only on selected chassis])

This test is identical to what 77ec310dda51 ("pinctrl.c: Send GARP only
on chassis atached to l3gw") added to branch 22.03.  However, we don't
have the changes that 77ec310dda51 added to pinctrl.c.

Does that mean this test is flaky and might fail occasionally?  Or is it
just that the test doesn't test the change it was supposed to test?

In any case, why not add a new test, just for garp_max_timeout, for main
branch?  We can backport that one alone to all branches, including 21.12.

Thanks,
Dumitru


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] ofproto-dpif-upcall: Resolve checksums before controller upcall.

2023-12-15 Thread Eelco Chaudron



On 11 Dec 2023, at 16:39, Mike Pattrick wrote:

> Currently dp_netdev_upcall() resolves checksums on all packets, but this
> isn't strictly needed. The checksums will be resolved before
> transmission. However, we do have to resolve the checksums before
> sending a packet to the controller as offload flags aren't retained.
>
> Signed-off-by: Mike Pattrick 
> ---
>  lib/dpif-netdev.c |  2 --
>  ofproto/ofproto-dpif-upcall.c | 13 ++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9a59a1b03..8f7aaffbb 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7980,8 +7980,6 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
> struct dp_packet *packet_,
>  ds_destroy(&ds);
>  }
>
> -dp_packet_ol_send_prepare(packet_, 0);
> -
>  return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata,
>   actions, wc, put_actions, dp->upcall_aux);
>  }
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index cc10f57b5..e974d844a 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1574,15 +1574,20 @@ process_upcall(struct udpif *udpif, struct upcall 
> *upcall,
>  const struct frozen_state *state = &recirc_node->state;
>
>  struct ofproto_async_msg *am = xmalloc(sizeof *am);
> +
> +/* Resolve offloaded checksums, if any. */
> +struct dp_packet *packet_clone = dp_packet_clone(packet);
> +dp_packet_ol_send_prepare(packet_clone, 0);

Doing a clone (malloc/copy), and doing another malloc and copy below sound like 
a waste of resources.

Can we maybe make this more smart in the dp_netdev_upcall()?
Something like the below in dp_netdev_upcall():

  if (type == CONTROLLER_UPCALL)
dp_packet_ol_send_prepare(packet_, 0);


>  *am = (struct ofproto_async_msg) {
>  .controller_id = cookie->controller.controller_id,
>  .oam = OAM_PACKET_IN,
>  .pin = {
>  .up = {
>  .base = {
> -.packet = xmemdup(dp_packet_data(packet),
> -  dp_packet_size(packet)),
> -.packet_len = dp_packet_size(packet),
> +.packet = xmemdup(dp_packet_data(packet_clone),
> +  dp_packet_size(packet_clone)),
> +.packet_len = dp_packet_size(packet_clone),
>  .reason = cookie->controller.reason,
>  .table_id = state->table_id,
>  .cookie = get_32aligned_be64(
> @@ -1598,6 +1603,8 @@ process_upcall(struct udpif *udpif, struct upcall 
> *upcall,
>  },
>  };
>
> +dp_packet_delete(packet_clone);
> +
>  if (cookie->controller.continuation) {
>  am->pin.up.stack = (state->stack_size
>? xmemdup(state->stack, state->stack_size)
> -- 
> 2.39.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 0/3] Extend the CT flush with mark and labels fields

2023-12-15 Thread Ilya Maximets
On 12/6/23 08:53, Ales Musil wrote:
> The CT flush is not capable of partial flush based
> on the touples and zone combinations. Extend it
> so it also accepts mark and labels. As part of this
> series unify parsing of arguments in dpctl and ovs-ofctl
> for the ct flush command.
> 
> Ales Musil (3):
>   ofp-prop: Add helper for parsing and storing of ovs_u128.
>   dpctl, ovs-ofctl: Unify parsing of ct-flush arguments.
>   openflow: Allow CT flush to match on mark and labels.
> 
>  NEWS   |   3 +
>  include/openflow/nicira-ext.h  |   4 +
>  include/openvswitch/ofp-ct.h   |  14 ++-
>  include/openvswitch/ofp-prop.h |   5 +
>  lib/ct-dpif.c  |  12 ++-
>  lib/dpctl.c|  46 ++---
>  lib/ofp-ct.c   | 182 -
>  lib/ofp-prop.c |  42 
>  tests/ofp-print.at |  84 +++
>  tests/ovs-ofctl.at |  36 +++
>  tests/system-traffic.at| 112 +---
>  utilities/ovs-ofctl.8.in   |  13 ++-
>  utilities/ovs-ofctl.c  |  49 +++--
>  13 files changed, 475 insertions(+), 127 deletions(-)
> 


Hi, Ales.  I updated the NEWS entry and fixed a couple of cosmetic issues.
With that, applied.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] bridge: Re-create bridges when Flow API is enabled.

2023-12-15 Thread Eelco Chaudron


On 14 Dec 2023, at 19:15, Adrian Moreno wrote:

> On 12/13/23 11:47, Adrian Moreno wrote:
>>
>>
>> On 9/15/23 17:25, Eelco Chaudron wrote:
>>>
>>>
>>> On 9 Sep 2023, at 1:40, Ilya Maximets wrote:
>>>
 Currently OVS requires restart of ovs-vswitchd after enabling hardware
 offload.  This is necessary to make sure all the correct features are
 probed and all the internal configurations are in the right state.

 Making that process a bit less invasive by re-creating bridges instead
 of a full process restart.  Documentation is not updated, because
 disabling hardware offload still can take effect only after restart.

>>>
>>> Hi Ilya,
>>>
>>> I like the idea of re-creating the bridges, however not sure if it 
>>> generated any side effects other than the ones you mention in the FIXME.
>>>
>>
>> I have taken a look at the patch, tested it and added some rework of the 
>> iface_hints to avoid the datapath ports from being deleted when the bridge 
>> is re-created.
>>
>> The recreation of the bridge seems to work but my concern is that, unlike a 
>> restart which uses ovs-save to save & restore the flows, groups and tunnel 
>> metadata, doing this will leave the bridge with no OpenFlow configuration.
>>
>> During the time the controller takes to reconfigure the bridge, reval 
>> threads will wipe the datapath flows. If there is no controller ... well 
>> we'll need to wait for the user to program the flows back. This is quite a 
>> big behavior change so I'd say if we go for this option we better documment 
>> it pretty well.
>>
>
> I'm thinking that, considering this hassle (loosing all flows, meters, tunnel 
> tlv info, etc), isn't it better to simply restart vswitchd?
>
> A more more complex solution (probably big enough for this release which is 
> coming fast) could be to recreate ovs-save internally and:
> - Collect all the flow table, groups, tlv data and meters
> - Delete the bridge and recreate it
> - Reconfigure everything back
>
> Flow stats will probably be lost in the process, which is a weird outcome of 
> enabling hwol but it's better than loosing all the flows.

I guess the goal for this patch was to NOT touch any actual resources, only 
re-probe the capabilities. See comment below:

+    /* Destroy all the remaining bridges if Flow API status changed
+ * as we need to re-probe supported features.  Do not delete
+ * bridge resources to avoid datapath disruption. */

Ilya can you comment on this?

> +Mike
> Any other ideas?
>
>
>> Apart from that I see an error in the log says:
>>
>> 2023-12-13T10:43:44.791Z|00064|ofproto_dpif_rid|ERR|recirc_id 1 left 
>> allocated when ofproto (ktest) is destructed
>>
>> But I think (will double check) the recirc node gets cleaned afterwards.
>>
>> Cheers,
>> Adrián
>>
>>
>>> //Eelco
>>>
>>>
 [
  This is not a full implementation, see the FIXME comment in the code.
  Posted in support for review of the OVS_ACTION_ATTR_DROP patch set.
  I can continue working on the proper implementation once I'm back
  from PTO.
 ]

 Signed-off-by: Ilya Maximets 
 ---
  tests/dpif-netdev.at |  6 +++---
  vswitchd/bridge.c    | 37 +
  2 files changed, 40 insertions(+), 3 deletions(-)

 diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
 index 85119fb81..a3e07895a 100644
 --- a/tests/dpif-netdev.at
 +++ b/tests/dpif-netdev.at
 @@ -410,10 +410,10 @@ m4_define([DPIF_NETDEV_FLOW_HW_OFFLOAD],
    set bridge br0 datapath-type=dummy \
   other-config:datapath-id=1234 fail-mode=secure], [], 
 [],
    [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
 -   AT_CHECK([ovs-appctl vlog/set dpif:file:dbg dpif_netdev:file:dbg 
 netdev_dummy:file:dbg])

     AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw-offload=true])
     OVS_WAIT_UNTIL([grep "netdev: Flow API Enabled" ovs-vswitchd.log])
 +   AT_CHECK([ovs-appctl vlog/set dpif:file:dbg dpif_netdev:file:dbg 
 netdev_dummy:file:dbg])

     AT_CHECK([ovs-ofctl del-flows br0])
     AT_CHECK([ovs-ofctl add-flow br0 in_port=1,actions=IN_PORT])
 @@ -473,10 +473,10 @@ m4_define([DPIF_NETDEV_FLOW_HW_OFFLOAD_OFFSETS],
    set bridge br0 datapath-type=dummy \
   other-config:datapath-id=1234 fail-mode=secure], [], 
 [],
    [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
 -   AT_CHECK([ovs-appctl vlog/set dpif:file:dbg dpif_netdev:file:dbg 
 netdev_dummy:file:dbg])

     AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw-offload=true])
     OVS_WAIT_UNTIL([grep "netdev: Flow API Enabled" ovs-vswitchd.log])
 +   AT_CHECK([ovs-appctl vlog/set dpif:file:dbg dpif_netdev:file:dbg 
 netdev_dummy:file:dbg])

     AT_CHECK([ovs-ofctl del-flows br0])

 @@ -550,10 +550,10 @@ 
 m4_

Re: [ovs-dev] [PATCH v3] ci: Add clang-analyze to GitHub actions.

2023-12-15 Thread Ilya Maximets
On 12/7/23 10:32, Eelco Chaudron wrote:
> This patch detects new static analyze issues, and report them.
> It does this by reporting on the delta for this branch, compared
> to the base branch.
> 
> For example the error might look like this:
> 
>   error level: +0 -0 no changes
>   warning level: +2 +0
> New issue "deadcode.DeadStores Value stored to 'remote' is never read" (1 
> occurrence)
>   file:///home/runner/work/ovs/ovs/vswitchd/ovs-vswitchd.c:86
> New issue "unix.Malloc Potential leak of memory pointed to by 'remote'" 
> (1 occurrence)
>   file:///home/runner/work/ovs/ovs/vswitchd/ovs-vswitchd.c:95
>   note level: +0 -0 no changes
>   all levels: +2 +0
> 
> Signed-off-by: Eelco Chaudron 
> ---
> changes in v2:
>   - When it's a new branch, it compares it to the HEAD of the default branch.
> 
> changes in v3:
>   - Include the clang version as part of the cache
>   - Change the way it looks for the 'default' branch so it will work
> for patch branches.
>   - Also compare to the base branch for forced commits.
> 
>  .ci/linux-build.sh   |   29 +
>  .github/workflows/build-and-test.yml |  106 
> ++
>  2 files changed, 135 insertions(+)
> 
> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> index aa2ecc505..fedf1398a 100755
> --- a/.ci/linux-build.sh
> +++ b/.ci/linux-build.sh
> @@ -49,6 +49,30 @@ function build_ovs()
>  make -j4
>  }
>  
> +function clang_analyze()
> +{
> +[ -d "./base-clang-analyzer-results" ] && cache_build=false \
> +   || cache_build=true
> +if [ "$cache_build" = true ]; then
> +# If this is a cache build, proceed to the base branch's directory.
> +cd base_ovs_main
> +fi;
> +
> +configure_ovs $OPTS
> +make clean
> +scan-build -o ./clang-analyzer-results -sarif --use-cc=clang make -j4
> +
> +if [ "$cache_build" = true ]; then
> +# Move results, so it will be picked up by the cache.
> +mv ./clang-analyzer-results ../base-clang-analyzer-results
> +cd ..
> +else
> +# Only do the compare on the none cache builds.
> +sarif --check note diff ./base-clang-analyzer-results \
> +./clang-analyzer-results
> +fi;
> +}
> +
>  if [ "$DEB_PACKAGE" ]; then
>  ./boot.sh && ./configure --with-dpdk=$DPDK && make debian
>  mk-build-deps --install --root-cmd sudo --remove debian/control
> @@ -116,6 +140,11 @@ fi
>  
>  OPTS="${EXTRA_OPTS} ${OPTS} $*"
>  
> +if [ "$CLANG_ANALYZE" ]; then
> +clang_analyze
> +exit 0
> +fi
> +
>  if [ "$TESTSUITE" = 'test' ]; then
>  # 'distcheck' will reconfigure with required options.
>  # Now we only need to prepare the Makefile without sparse-wrapped CC.
> diff --git a/.github/workflows/build-and-test.yml 
> b/.github/workflows/build-and-test.yml
> index 09654205e..f8a000490 100644
> --- a/.github/workflows/build-and-test.yml
> +++ b/.github/workflows/build-and-test.yml
> @@ -223,6 +223,112 @@ jobs:
>  name: logs-linux-${{ join(matrix.*, '-') }}
>  path: logs.tgz
>  
> +  build-clang-analyze:
> +needs: build-dpdk
> +env:
> +  dependencies: |
> +automake bc clang-tools libbpf-dev libnuma-dev libpcap-dev \
> +libunbound-dev libunwind-dev libssl-dev libtool llvm-dev \
> +python3-unbound
> +  CC:   clang
> +  DPDK: dpdk
> +  CLANG_ANALYZE: true
> +name: clang-analyze
> +runs-on: ubuntu-22.04
> +timeout-minutes: 30
> +
> +steps:
> +- name: checkout
> +  uses: actions/checkout@v3
> +  with:
> +fetch-depth: 0
> +
> +- name: get base branch sha
> +  id: base_branch
> +  env:
> +BASE_SHA: ${{ github.event.pull_request.base.sha }}
> +EVENT_BEFORE: ${{ github.event.before }}
> +DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
> +FORCED_PUSH: ${{ github.event.forced }}
> +  run: |
> +if [ "$GITHUB_EVENT_NAME" = "pull_request" ]; then
> +  echo "sha=$BASE_SHA" >> $GITHUB_OUTPUT
> +else
> +  if [ "$FORCED_PUSH" = true ] || [ "$EVENT_BEFORE" -eq 0 ]; then
> +set sha=$(git describe --all --no-abbrev --always\
> +   --match master --match main   \
> +   --match branch-[0-9].[0-9]* | \
> +  sed 's/.*\///')
> +[ -z "$sha" ] && echo "sha=$DEFAULT_BRANCH" >> $GITHUB_OUTPUT \
> +  || echo "sha=$sha" >> $GITHUB_OUTPUT

Hi, Eelco.  I was trying to test this patch and it breaks my workflow.

The issue is that commands above is looking for base branches in my
fork.  It finds the 'master' branch, but this branch in my fork is
3-4 years old and fails to build.  Even if it worked, comparing against
a version that old doesn't make sense.   Also, it might happen that
the feature branch is based on an ol