Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.

2024-07-08 Thread Roi Dayan via dev


On 08/07/2024 5:17, LIU Yulong wrote:
> Hi,
> 
> 
> Maybe this fix should be the root cause of:
> ovs-vswitchd core at revalidator_sweep__
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> 
> 
> Did you see such core at revalidator_sweep__ ?
> 
> 
> Regards,
> 
> 
> LIU Yulong
> 
> 

Hi,

It doesn't look like the same issue. from the dump there is  looks
like the state of the ukey did get update to be UKEY_DELETE.



>  
>  
> -- Original --
> From:  "Eelco Chaudron" Date:  Wed, Jul 3, 2024 06:46 PM
> To:  "Roi Dayan" Cc:  "dev" Dickman" Subject:  Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale 
> ukeys leaks.
> 
>  
> 
> 
> 
> On 3 Jul 2024, at 12:22, Roi Dayan wrote:
> 
> > On 03/07/2024 13:01, Eelco Chaudron wrote:
> >>
> >>
> >> On 3 Jul 2024, at 9:31, Roi Dayan wrote:
> >>
> >>> On 18/06/2024 11:53, Eelco Chaudron wrote:
> 
> 
>  On 18 Jun 2024, at 8:05, Roi Dayan wrote:
> 
> > On 03/06/2024 16:29, Eelco Chaudron wrote:
> >>
> >>
> >> On 3 Jun 2024, at 10:07, Roi Dayan wrote:
> >>
> >>> On 03/06/2024 10:18, Roi Dayan wrote:
> 
> 
>  On 30/05/2024 18:48, Eelco Chaudron wrote:
> >
> >
> > On 23 May 2024, at 12:46, Roi Dayan via 
> dev wrote:
> >
> >> It is observed in some environments 
> that there are much more ukeys than
> >> actual DP flows. For example:
> >>
> >> $ ovs-appctl upcall/show
> >> system@ovs-system:
> >> flows : (current 7) (avg 6) (max 
> 117) (limit 2125)
> >> offloaded flows : 525
> >> dump duration : 1063ms
> >> ufid enabled : true
> >>
> >> 23: (keys 3612)
> >> 24: (keys 3625)
> >> 25: (keys 3485)
> >>
> >> The revalidator threads are busy 
> revalidating the stale ukeys leading to
> >> high CPU and long dump duration.
> >>
> >> There are some possible situations 
> that may result in stale ukeys that
> >> have no corresponding DP flows.
> >>
> >> In revalidator, push_dp_ops() 
> doesn't check error if the op type is not
> >> DEL. It is possible that a 
> PUT(MODIFY) fails, especially for tc offload
> >> case, where the old flow is deleted 
> first and then the new one is
> >> created. If the creation fails, the 
> ukey will be stale (no corresponding
> >> DP flow). This patch adds a warning 
> in such case.
> >
> > Not sure I understand, this behavior 
> should be captured by the UKEY_INCONSISTENT state.
> 
>  Hi Eelco,
> 
>  thanks for reviewing.
>  We started the patch on older branch as we 
> didn't rebase for some time
>  and got a little later on sending it.
>  I see the addition now for UKEY_INCOSISTENT 
> in the following patch:
> 
>  cf11766cbcf1 ofproto-dpif-upcall: Fix 
> push_dp_ops to handle all errors.
> 
>  Looks like it handles the same situation we 
> tried to handle in this patch.
> 
> >
> >> Another possible scenario is in 
> handle_upcalls() if a PUT operation did
> >> not succeed and op->error 
> attribute was not set correctly it can lead to
> >> stale ukey in operational state.
> >
> >
> > Guess we need to figure out these cases, 
> as these are the root cause of your problem.
> 
>  right. maybe. but this could help keep the 
> system alive/clean while logging the
>  bad situation instead of having increase in 
> those stale/inconsistent ukeys.
>  I understand if it's not nice to handle it 
> like this.
> 
> >>>
> >>> Hi Eelco,
> >>>
> >>> I remember now one of the reproduction scenarios 
> we did is do some traffic
> >>> to get rules added using TC and then delete 
> those from tc command line
> >>> and it got to stale ukeys.
> >>> The revalidator dump thread not seeing the rules 
> so not updating anything
> >>> and sweep over the ukeys skipping them as well.
> >>> This is why we checked against the timing stats 
> of the ukey.
> >>>
> >>> I remember I tested on the upstream code and not 
> our development branch
> >>> and it reproduces. I didn't notice if the commit 
> adding UKEY_INCONSISTENT
> >>> existed but it handles errors from adding flows 
> so I dont think it matters.
> >>>
> >>> I'll try to check and verify again but I think 
> it's still here.
> >>> So while cases getting dop.error handled with 
> UKEY_INCONSISTENT,
> >>> this case I think is not.
> >>
> >> I think you are right this case is not covered by 
> the UKEY_INCONSISTENT test below. See my suggestion on fixing this in 
> revalidate_ukey().
> >>
> >> Maybe you could also add a test case for this in the 
> offload test suite.
> >>
> >>

[ovs-dev] [ovs-dev, v8] [PATCH]ipf: Fix ovs ipf crash.

2024-07-08 Thread 赖香武
hi, llya and Michael

Can you take the time to take a look at my patch? Thank you.

At 2024-05-28 23:16:46, "Ilya Maximets"  wrote:
>On 5/23/24 09:40, laixiangwu wrote:
>> Description:
>> 
>> when 1) The fragment timeout is between 15 seconds and 25 seconds; 2) 
>> The ipf_list currently has received more than 32 fragments, and there 
>> are other fragments of same big packet that have not been received.
>> 
>> When the above two scenario conditions are met, due to exceeding the 
>> capacity of the packet batch(here is 32), ipf_dp_packet_batch_add 
>> returns false, and ipf_list will not be cleared. However, the 32 
>> fragments packets added to the packet batch will be processed normally. 
>> When receiving the subsequent fragments of the ipf_list, because the 
>> first 32 fragments have been processed, when processing subsequent 
>> fragment packets, relevant information about the processed fragment 
>> packets will be read,therefore will occur carsh.
>> One solution is do not forward timeout fragment packets from the above 
>> scenarios, that is, do not add them to the packet batch, and handle 
>> other scenarios according to the original logic.
>> Signed-off-by: laixiangwu <15310488...@163.com>
>> ---
>>  lib/ipf.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>
>Hi, laixiangwu.  This version of the patch looks the same as the
>previous one here:
>  
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240522021957.2292-1-15310488...@163.com/
>
>And I see Mike asked a few questions for the approach there.
>Could you, please, answer those?
>
>For now, I'll mark this patch with 'Changes Requested'.
>
>If you plan to send a new version based on Mike's comments, please, add
>'v6' to the subject prefix, i.e. [PATCH v6], since it's technically a
>6th version of it.
>
>Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] controller: Add lsp option disable_garp_rarp.

2024-07-08 Thread Shibir Basak
If the lsp option 'disable_garp_rarp' is set to true,
GARP and RARP announcements are not sent by ovn when a
VIF port is created on a bridged logical switch.

Usecase

This option is useful during VM migration to let the
hypervisor/VM send the RARP/GARP once VM is ready to
process the packets post migration. This helps to reduce
downtime during VM migration.

Signed-off-by: Shibir Basak 
Acked-by: Naveen Yerramneni 
---
 controller/pinctrl.c |  4 +--
 ovn-nb.xml   |  7 +
 tests/ovn.at | 68 
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 9a2f3f5b3..800c85d21 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -6618,7 +6618,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn 
*ovnsb_idl_txn,
 SSET_FOR_EACH (iface_id, &localnet_vifs) {
 const struct sbrec_port_binding *pb = lport_lookup_by_name(
 sbrec_port_binding_by_name, iface_id);
-if (pb) {
+if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) {
 send_garp_rarp_update(ovnsb_idl_txn,
   sbrec_mac_binding_by_lport_ip,
   local_datapaths, pb, &nat_addresses,
@@ -6631,7 +6631,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn 
*ovnsb_idl_txn,
 SSET_FOR_EACH (gw_port, &local_l3gw_ports) {
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port);
-if (pb) {
+if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) {
 send_garp_rarp_update(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip,
   local_datapaths, pb, &nat_addresses,
   garp_max_timeout, garp_continuous);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 9552534f6..3a0f0e34d 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1290,6 +1290,13 @@
   The default value is false.
 
 
+
+  If set to true, GARP and RARP announcements are not
+  sent when a VIF port is created on a bridged logical switch.
+  The default value is false.
+
+
 
   If set to mc_unknown, packets going to this VIF get cloned to all
diff --git a/tests/ovn.at b/tests/ovn.at
index 87a64499f..a71a83394 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -25013,6 +25013,74 @@ OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([Disabling RARP/GARP announcements])
+ovn_start
+
+# In this test case we create 1 switch and bring up 4 VIFs on it.
+# Two VIFs will be assigned MAC addresses only (i.e. without ips)
+# and two VIFs will be assigned IP addresses along with MAC addresses.
+# VIFs with IPs are supposed to send GARPs and VIFs with only MAC
+# addresses are supposed to send RARP. However, we test the lsp
+# option disable_garp_rarp, which when set to true for an lsp does
+# not send the GARP/RARP announcements.
+
+ovn-nbctl ls-add ls1
+ovn-nbctl lsp-add ls1 ln1 "" 101
+ovn-nbctl lsp-set-addresses ln1 unknown
+ovn-nbctl lsp-set-type ln1 localnet
+ovn-nbctl lsp-set-options ln1 network_name=phys
+
+ovn-nbctl lsp-add ls1 lp11
+ovn-nbctl lsp-set-addresses lp11 "f0:00:00:00:00:11"
+
+ovn-nbctl lsp-add ls1 lp12
+ovn-nbctl lsp-set-addresses lp12 "f0:00:00:00:00:12"
+ovn-nbctl set logical_switch_port lp12 options:disable_garp_rarp=true
+
+ovn-nbctl lsp-add ls1 lp13
+ovn-nbctl lsp-set-addresses lp13 "f0:00:00:00:00:13 192.168.1.3"
+
+ovn-nbctl lsp-add ls1 lp14
+ovn-nbctl lsp-set-addresses lp14 "f0:00:00:00:00:14 192.168.1.4"
+ovn-nbctl set logical_switch_port lp14 options:disable_garp_rarp=true
+
+net_add n1
+
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+AT_CHECK([ovs-vsctl add-port br-phys snoopvif -- set Interface snoopvif 
options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap])
+
+ovs-vsctl add-port br-int vif11 -- \
+set Interface vif11 external-ids:iface-id=lp11
+
+ovs-vsctl add-port br-int vif12 -- \
+set Interface vif12 external-ids:iface-id=lp12
+
+ovs-vsctl add-port br-int vif13 -- \
+set Interface vif13 external-ids:iface-id=lp13
+
+ovs-vsctl add-port br-int vif14 -- \
+set Interface vif14 external-ids:iface-id=lp14
+
+wait_for_ports_up
+ovn-nbctl --wait=sb sync
+
+# RARP packet for lp11
+echo 
"f011816580350001080006040003f011f011"
 > expected
+# GARP packet for lp13
+echo 
"f013816508060001080006040001f013c0a80103c0a80103"
 >> expected
+OVN_CHECK_PACKETS_UNIQ([hv1/snoopvif-tx.pcap], [expected])
+
+OVN_CLEANUP([hv1])
+
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([Stateless Floating IP])
 ovn_start
-- 
2.22.3

___
dev mailing list
d...@op

Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Florian Westphal
Xin Long  wrote:
> > Xin Long  wrote:
> > > I can avoid this warning by not allocating ext for commit ct in ovs:
> > >
> > > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
> > > struct sw_flow_key *key,
> > > struct nf_conn_labels *cl;
> > > int err;
> > >
> > > -   cl = ovs_ct_get_conn_labels(ct);
> > > +   cl = nf_ct_labels_find(ct);
> > > if (!cl)
> > > return -ENOSPC;
> ovs_ct_get_conn_labels() must be replaced with nf_ct_labels_find() in here
> anyway, thinking that the confirmed ct without labels was created in other
> places (not by OVS conntrack), the warning may still be triggered when
> trying to set labels in OVS after.

Right.

> > Then ovs_ct_limit_init() and nf_connlabels_get() need to be called
> > once on the first conntrack operatation, regardless if labels are asked
> > for or not.
> >
> > Not nice but still better than current state.
> Right, not nice, it undermines the bits check against NF_CT_LABELS_MAX_SIZE.

Yes, but OTOH this bit check isn't used anymore, it comes from days when
label area was dynamically sized, today is hardcoded to 128 anyway.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-07-08 Thread David Marchand
On Mon, Jul 8, 2024 at 4:22 PM Mike Pattrick  wrote:
> On Mon, Jul 8, 2024 at 8:23 AM David Marchand  
> wrote:
> > A final note, I suspect all those checks negatively impact non tso
> > packets processing when OVS tso is enabled.
> >
> > Would it be feasible to mark that tso (or a tx offload) has been
> > requested at the "batch" level?
> > This is more an optimisation... maybe in the future?
>
> I've been thinking about changing the trunc member to a flags bit
> array lately in other contexts as well. It would be useful to mark if
> a batch has fragmented packets, if there is mixed memory between
> mbuf's and other types of packet source memory, there should be other
> uses as well. Having batch flags would remove a few instances of
> iterating over the whole batch.
>
> I think this is a seperate patch though.

Yes.
TSO is still marked experimental and we can wait for such optimisations.

Thanks Mike.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Xin Long
On Mon, Jul 8, 2024 at 6:38 PM Florian Westphal  wrote:
>
> Xin Long  wrote:
> > I can avoid this warning by not allocating ext for commit ct in ovs:
> >
> > @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
> > struct sw_flow_key *key,
> > struct nf_conn_labels *cl;
> > int err;
> >
> > -   cl = ovs_ct_get_conn_labels(ct);
> > +   cl = nf_ct_labels_find(ct);
> > if (!cl)
> > return -ENOSPC;
ovs_ct_get_conn_labels() must be replaced with nf_ct_labels_find() in here
anyway, thinking that the confirmed ct without labels was created in other
places (not by OVS conntrack), the warning may still be triggered when
trying to set labels in OVS after.

> >
> > However, the test case would fail, although the failure can be worked around
> > by setting ct_label in the 1st rule:
> >
> >   
> > table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1)
> >
> > So I'm worrying our change may break some existing OVS user cases.
>
> Then ovs_ct_limit_init() and nf_connlabels_get() need to be called
> once on the first conntrack operatation, regardless if labels are asked
> for or not.
>
> Not nice but still better than current state.
Right, not nice, it undermines the bits check against NF_CT_LABELS_MAX_SIZE.

>
> ovs_ct_execute() perhaps?
ovs_ct_execute() is in the hot path, and a better place would be
ovs_ct_copy_action() where the ct for an ovs flow is added.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] Maintainers: Fix the typo in the email id.

2024-07-08 Thread Numan Siddique
On Mon, Jul 8, 2024 at 5:49 PM Dumitru Ceara  wrote:
>
> On 7/8/24 23:22, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > Also added my ovn.org email id to the list.
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> Looks good to me, thanks!
>
> Acked-by: Dumitru Ceara 

Thanks.  Applied to main.

Numan

>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Florian Westphal
Xin Long  wrote:
> I can avoid this warning by not allocating ext for commit ct in ovs:
> 
> @@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
> struct sw_flow_key *key,
> struct nf_conn_labels *cl;
> int err;
> 
> -   cl = ovs_ct_get_conn_labels(ct);
> +   cl = nf_ct_labels_find(ct);
> if (!cl)
> return -ENOSPC;
> 
> However, the test case would fail, although the failure can be worked around
> by setting ct_label in the 1st rule:
> 
>   
> table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1)
> 
> So I'm worrying our change may break some existing OVS user cases.

Then ovs_ct_limit_init() and nf_connlabels_get() need to be called
once on the first conntrack operatation, regardless if labels are asked
for or not.

Not nice but still better than current state.

ovs_ct_execute() perhaps?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-07-08 Thread Xin Long
On Wed, Jun 19, 2024 at 6:10 PM Xin Long  wrote:
>
> On Wed, Jun 19, 2024 at 5:20 PM Florian Westphal  wrote:
> >
> > Xin Long  wrote:
> > > Got it, I will fix this in ovs.
> >
> > Thanks!
> >
> > I don't want to throw more work at you, but since you are
> > already working on ovs+conntrack:
> >
> > ovs_ct_init() is bad, as this runs unconditionally.
> >
> > modprobe openvswitch -> conntrack becomes active in all
> > existing and future namespaces.
> >
> > Conntrack is slow, we should avoid this behaviour.
> >
> > ovs_ct_limit_init() should be called only when the feature is
> > configured (the problematic call is nf_conncount_init, as that
> > turns on connection tracking, the kmalloc etc is fine).
> understand.
>
> >
> > Likewise, nf_connlabels_get() should only be called when
> > labels are added/configured, not at the start.
> >
> > I resolved this for tc in 70f06c115bcc but it seems i never
> > got around to fix it for ovs.
Hi, Florian,

I noticed the warning in nf_ct_ext_add() while I'm making this change:

   WARN_ON(nf_ct_is_confirmed(ct));

It can be triggered by these ovs flows from ovs selftests:

  
table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
  
table=1,ip,actions=ct(commit,nat(src=10.1.1.240),exec(set_field:0xac->ct_mark,set_field:0xac->ct_label),table=2)

The 1st flow creates the ct and commits/confirms it, then the 2nd flow is
setting ct_label on a confirmed ct. With this change, the connlabels ext
is not yet allocated at the time, and then the warning is triggered when
allocating it in nf_ct_ext_add().

tc act_ct action doesn't have this issue, as it returns an error if the
connlabels is not found in tcf_ct_act_set_labels(), instead of allocating it.

I can avoid this warning by not allocating ext for commit ct in ovs:

@@ -426,7 +426,7 @@ static int ovs_ct_set_labels(struct nf_conn *ct,
struct sw_flow_key *key,
struct nf_conn_labels *cl;
int err;

-   cl = ovs_ct_get_conn_labels(ct);
+   cl = nf_ct_labels_find(ct);
if (!cl)
return -ENOSPC;

However, the test case would fail, although the failure can be worked around
by setting ct_label in the 1st rule:

  
table=0,priority=30,in_port=1,ip,nw_dst=172.1.1.2,actions=ct(commit,nat(dst=10.1.1.2:80),exec(set_field:0x01->ct_label),table=1)

So I'm worrying our change may break some existing OVS user cases.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/8] northd: Assume all chassis support the "ct-no-masked-label" feature.

2024-07-08 Thread Dumitru Ceara
On 7/8/24 13:24, Dumitru Ceara wrote:
> This feature is supported in the last two LTS releases and the correct
> upgrade procedure mandates that we don't jump across LTS releases.  It's
> safe to remove the check in northd.
> 
> Also remove logical field definitions for ct_label.s that cannot
> appear anymore in the southbound database.
> 
> Signed-off-by: Dumitru Ceara 
> ---

ovsrobot caught the fact that I didn't update the "Load balancer CT
related backwards compatibility" test in this patch.  The test is
removed completely in the next patch of the series but I should've
made sure it passes in this patch.

https://github.com/ovsrobot/ovn/actions/runs/9838921044/job/2715653

I'll fold the incremental change below in v2 but I'll wait some
more for comments on the rest of the series.

Regards,
Dumitru

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 7230244325..c294d1a644 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -10007,42 +10007,42 @@ AT_CHECK([grep -e "lr_in_defrag" -e "lr_in_dnat" 
lflows1], [0], [dnl
   table=??(lr_in_defrag   ), priority=0, match=(1), action=(next;)
   table=??(lr_in_defrag   ), priority=100  , match=(ip && ip4.dst == 
192.168.0.1), action=(ct_dnat;)
   table=??(lr_in_dnat ), priority=0, match=(1), action=(next;)
-  table=??(lr_in_dnat ), priority=110  , match=(ct.new && !ct.rel && 
ip4 && ip4.dst == 192.168.0.1), action=(ct_lb(backends=192.168.1.10);)
-  table=??(lr_in_dnat ), priority=50   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted), action=(next;)
-  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted && ct_label.force_snat == 1), 
action=(flags.force_snat_for_lb = 1; next;)
-  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted && ct_label.skip_snat == 1), 
action=(flags.skip_snat_for_lb = 1; next;)
+  table=??(lr_in_dnat ), priority=110  , match=(ct.new && !ct.rel && 
ip4 && ip4.dst == 192.168.0.1), action=(ct_lb_mark(backends=192.168.1.10);)
+  table=??(lr_in_dnat ), priority=50   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted), action=(next;)
+  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted && ct_mark.force_snat == 1), 
action=(flags.force_snat_for_lb = 1; next;)
+  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted && ct_mark.skip_snat == 1), 
action=(flags.skip_snat_for_lb = 1; next;)
 ])
 
 check ovn-nbctl --wait=sb set logical_router lr 
options:lb_force_snat_ip="192.168.1.1"
 AT_CHECK([ovn-sbctl lflow-list | grep lr_in_dnat | ovn_strip_lflows], [0], [dnl
   table=??(lr_in_dnat ), priority=0, match=(1), action=(next;)
-  table=??(lr_in_dnat ), priority=110  , match=(ct.new && !ct.rel && 
ip4 && ip4.dst == 192.168.0.1), action=(flags.force_snat_for_lb = 1; 
ct_lb(backends=192.168.1.10; force_snat);)
-  table=??(lr_in_dnat ), priority=50   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted), action=(next;)
-  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted && ct_label.force_snat == 1), 
action=(flags.force_snat_for_lb = 1; next;)
-  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted && ct_label.skip_snat == 1), 
action=(flags.skip_snat_for_lb = 1; next;)
+  table=??(lr_in_dnat ), priority=110  , match=(ct.new && !ct.rel && 
ip4 && ip4.dst == 192.168.0.1), action=(flags.force_snat_for_lb = 1; 
ct_lb_mark(backends=192.168.1.10; force_snat);)
+  table=??(lr_in_dnat ), priority=50   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted), action=(next;)
+  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted && ct_mark.force_snat == 1), 
action=(flags.force_snat_for_lb = 1; next;)
+  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_mark.natted && ct_mark.skip_snat == 1), 
action=(flags.skip_snat_for_lb = 1; next;)
 ])
 check ovn-nbctl remove logical_router lr options lb_force_snat_ip
 
 check ovn-nbctl --wait=sb set load_balancer lb-test options:skip_snat="true"
 AT_CHECK([ovn-sbctl lflow-list | grep lr_in_dnat | ovn_strip_lflows], [0], [dnl
   table=??(lr_in_dnat ), priority=0, match=(1), action=(next;)
-  table=??(lr_in_dnat ), priority=110  , match=(ct.new && !ct.rel && 
ip4 && ip4.dst == 192.168.0.1), action=(flags.skip_snat_for_lb = 1; 
ct_lb(backends=192.168.1.10; skip_snat);)
-  table=??(lr_in_dnat ), priority=50   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted), action=(next;)
-  table=??(lr_in_dnat ), priority=70   , match=(ct.est && !ct.rel && 
!ct.new && ct_label.natted && ct_label.force_snat == 1), 
action=(flags.force_snat_for_lb = 1; next;)
-  table=??(

Re: [ovs-dev] [PATCH] ovsdb: raft: Don't forward more than one command to the leader.

2024-07-08 Thread Ilya Maximets
On 6/28/24 11:43, Ilya Maximets wrote:
> On 6/28/24 09:20, Dumitru Ceara wrote:
>> On 6/27/24 00:02, Ilya Maximets wrote:
>>> Every transaction has RAFT log prerequisites.  Even if transactions
>>> are not related (because RAFT doesn't actually know what data it is
>>> handling).  When leader writes a new record to a RAFT storage, it is
>>> getting appended to the log right away and changes current 'eid',
>>> i.e., changes prerequisites.  The leader will not try to write new
>>> records until the current one is committed, because until then the
>>> pre-check will be failing.
>>>
>>> However, that is different for the follower.  Followers do not add
>>> records to the RAFT log until the leader sends an append request back.
>>> So, if there are multiple transactions pending on a follower, it will
>>> create a command for each of them and prerequisites will be set to the
>>> same values.  All these commands will be sent to the leader, but only
>>> one can succeed at a time, because accepting one command immediately
>>> changes prerequisites and all other commands become non-applicable.
>>> So, out of N commands, 1 will succeed and N - 1 will fail.  The cluster
>>> failure is a transient failure, so the follower will re-process all the
>>> failed transactions and send them again.  1 will succeed and N - 2 will
>>> fail.  And so on, until there are no more transactions.  In the end,
>>> instead of processing N transactions, the follower is performing
>>> N * (N - 1) / 2 transaction processing iterations.  That is consuming
>>> a huge amount of CPU resources completely unnecessarily.
>>>
>>> Since there is no real chance for multiple transactions from the same
>>> follower to succeed, it's better to not send them in the first place.
>>> This also eliminates prerequisite mismatch messages on a leader in
>>> this particular case.
>>>
>>
>> Makes sense!
>>
>>> In a test with 30 parallel shell threads executing 12K transactions
>>> total with separate ovsdb-client calls through the same follower there
>>> is about 60% performance improvement.  The test takes ~100 seconds to
>>> complete without this change and ~40 seconds with this change applied.
>>> The new time is very close to what it takes to execute the same test
>>> through the cluster leader.
>>>
>>
>> Maybe a public link to the test (if possible) would be a nice reference
>> to have in the future?
> 
> I'm not sure where to post it, so I'll just leave it here, and I can
> add a link to this post into the commit message.
> 
> The script is adding 12K ports across 120 logical switches in a style
> of ovn-kubernetes (sets up port security, some external IDs and adds
> to an address set):
> 
> ---
> $ cat ../transaction-benchmark-parallel.sh
> #set -x
> set -o errexit
> 
> # Creating 120 logical switches
> for i in $(seq 120); do
> ovn-nbctl ls-add ip-10-10-177-$i.us-west-2.compute.internal
> done
> IFS=$'\r\n' GLOBIGNORE='*' \
> command eval  'uuids=($(ovn-nbctl --bare --columns _uuid list 
> logical_switch | sed "/^$/d"))'
> 
> for i in $(seq 120); do
> echo "uuid #${i}: ${uuids[$(($i - 1))]}"
> done
> 
> # Creating one addres set
> as_name="a0123456789012345678"
> as_uuid=$(ovn-nbctl create address_set name=${as_name})
> 
> # Adding ports
> function add_400() {
> rm -rf txn_results$1.txt
> for i in $(seq $1 $(($1 + 399)) ); do
> echo $i
> 
> port_name="node-density-----_node-density-$i"
> namespace="node-density-----"
> ls_name="ip-10-10-177-$((${i} % 120 + 1)).us-west-2.compute.internal"
> uuid_name="row____"
> mac=$(printf '0a:58:0a:9b:%02x:%02x' $((${i} >> 8)) $((${i} & 255)))
> ip=$(printf "10.11.%d.%d\n" $((${i} / 255)) $((${i} % 255 + 1)))
> 
> time ovsdb-client transact unix:$(pwd)/sandbox/nb2.ovsdb 
> "[\"OVN_Northbound\",
> {
>   \"uuid-name\":\"${uuid_name}\",
>   \"row\":{
>   \"name\":\"${port_name}\",
>   \"options\":[\"map\",[[\"requested-chassis\",\"${ls_name}\"]]],
>   \"addresses\":[\"set\",[\"${mac} ${ip}\"]],
>   
> \"external_ids\":[\"map\",[[\"pod\",\"true\"],[\"namespace\",\"${namespace}\"]]],
>   \"port_security\":[\"set\",[\"${mac} ${ip}\"]]
> },
>   \"op\":\"insert\", \"table\":\"Logical_Switch_Port\"
> },
> {
>   \"where\":[[\"_uuid\",\"==\",[\"uuid\",\"${uuids[$(($i % 
> 120))]}\"]]],
>   
> \"mutations\":[[\"ports\",\"insert\",[\"set\",[[\"named-uuid\",\"${uuid_name}\"],
>   \"op\":\"mutate\", \"table\":\"Logical_Switch\"
> },
> {
>   \"where\":[[\"_uuid\",\"==\",[\"uuid\",\"${as_uuid}\"]]],
>   \"mutations\":[[\"addresses\",\"insert\",[\"set\",[\"${ip}\",
>   \"op\":\"mutate\", \"table\":\"Address_Set\"
> }]" >> txn_results$1.txt
> done
> echo "done

Re: [ovs-dev] [PATCH ovn v2] Maintainers: Fix the typo in the email id.

2024-07-08 Thread Dumitru Ceara
On 7/8/24 23:22, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> Also added my ovn.org email id to the list.
> 
> Signed-off-by: Numan Siddique 
> ---

Looks good to me, thanks!

Acked-by: Dumitru Ceara 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread Adrián Moreno
On Mon, Jul 08, 2024 at 09:31:58PM GMT, Ilya Maximets wrote:
> On 7/8/24 15:44, Adrian Moreno wrote:
> > There are a couple of places where the test script "sleep"s to wait for
> > some external condition to be met.
> >
> > This is error prone, specially in slow systems (identified in CI by
> > "KSFT_MACHINE_SLOW=yes").
> >
> > To fix this, add a "ovs_wait" function that tries to execute a command
> > a few times until it succeeds. The timeout used is set to 5s for
> > "normal" systems and doubled if a slow CI machine is detected.
> >
> > This should make the following work:
> >
> > $ vng --build  \
> > --config tools/testing/selftests/net/config \
> > --config kernel/configs/debug.config
> >
> > $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> > KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
> >
> > Signed-off-by: Adrian Moreno 
> > ---
> >  .../selftests/net/openvswitch/openvswitch.sh  | 49 ---
> >  .../selftests/net/openvswitch/ovs-dpctl.py|  1 +
> >  2 files changed, 42 insertions(+), 8 deletions(-)
> >
>
> Hi, Adrian.  See a small pile of nitpicks below.
>
> None of them are blocking from my perspective, except for a typo.
> Just listed them since there is a typo anyway.
>
> > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
> > b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > index bc71dbc18b21..83407b42073a 100755
> > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > @@ -11,6 +11,7 @@ ksft_skip=4
> >  PAUSE_ON_FAIL=no
> >  VERBOSE=0
> >  TRACING=0
> > +WAIT_TIMEOUT=5
> >
> >  tests="
> > arp_pingeth-arp: Basic arp ping between 
> > two NS
> > @@ -29,6 +30,32 @@ info() {
> > [ $VERBOSE = 0 ] || echo $*
> >  }
> >
> > +ovs_wait() {
> > +   info "waiting $WAIT_TIMEOUT s for: $@"
> > +
> > +   "$@"
> > +   if [[ $? -eq 0 ]]; then
>
> Maybe just 'if "$@"; then' ?
>

In my head this is a bit less clean but I don't mind.

> > +   info "wait succeeded inmediately"
>
> * immediately

Thanks. Will fix the typo.

>
> > +   return 0
> > +   fi
> > +
> > +   # A quick re-check helps speed up small races in fast systems.
> > +   # However, fractional sleeps might not necessarily work.
> > +   local start=0
> > +   sleep 0.1 || { sleep 1; start=1; }
> > +
> > +   for (( i=start; i
> for i in $(seq ${start} ${WAIT_TIMEOUT}); do
>
> Will need to initialize start to 1 and 2.
>
> It works, but seems like an unnecessary use of non-POSIX constructs.

The reason why I chose this form is that I find it more robust on a
script that changes IFS. If this function is called within a block that
has changed IFS, "i" will take the entire sequence as the value for the
first iteration.

>
> > +   "$@"
> > +   if [[ $? -eq 0 ]]; then
>
> if "$@"; then
>
> > +   info "wait succeeded after $i seconds"
> > +   return 0
> > +   fi
> > +   sleep 1
> > +   done
> > +   info "wait failed after $i seconds"
> > +   return 1
> > +}
> > +
> >  ovs_base=`pwd`
> >  sbxs=
> >  sbx_add () {
> > @@ -278,20 +305,21 @@ test_psample() {
> >
> > # Record psample data.
> > ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py 
> > psample-events
> > +   ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
> >
> > # Send a single ping.
> > -   sleep 1
> > ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 
> > 1 || return 1
> > -   sleep 1
> >
> > # We should have received one userspace action upcall and 2 psample 
> > packets.
> > -   grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || 
> > return 1
> > +   ovs_wait grep -q "userspace action command" $ovs_dir/s0.out
> > +   [[ $? -eq 0 ]] || return 1
>
> Why checking separately and not one the same line with || return 1 ?

IMHO, passing complex commands to a function in bash can easily get very
problematic. That's why I try to remove all pipes, redirections or
logical operators like && and ||. At least for me it removes one extra
cycle that my brain has to spend looking at quotes and figuring out if
the operand will be interpreted inside the function or outside.

> Also double brackets seem unnecessary.

That's true.

>
> >
> > # client -> server samples should only contain the first 14 bytes of 
> > the packet.
> > -   grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
> > -$ovs_dir/stdout >/dev/null 2>&1 || return 1
> > -   grep -E "rate:4294967295,group:2,cookie:eeff0c" \
> > -$ovs_dir/stdout >/dev/null 2>&1 || return 1
> > +   ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee 
> > data:[0-9a-f]{28}$" $ovs_dir/stdout
> > +   [[ $? -eq 0 ]] || return 1
> > +
> > +   ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout
> > +   [[ $? -eq 0 ]] || ret

Re: [ovs-dev] [PATCH ovn v4 1/1] Add support for overlay provider networks.

2024-07-08 Thread Numan Siddique
On Mon, Jul 8, 2024 at 2:54 AM Han Zhou  wrote:
>
> On Thu, Jun 6, 2024 at 3:38 PM  wrote:
> >
> > From: Numan Siddique 
> >
> > It is expected that a provider network logical switch has a localnet logical
> > switch port in order to bridge the overlay traffic to the underlay traffic.
> > There can be some usecases where a overlay logical switch (without
> > a localnet port) can act as a provider network and presently NAT doesn't
> > work as expected.  This patch adds this support.  A new config option
> > "overlay_provider_network" is added to support this feature.
> > This feature gets enabled for a logical switch 'P' if:
> >   - The above option is set to true in the Logical_Switch.other_config
> > column.
> >   - The logical switch 'P' doesn't have any localnet ports.
> >   - The logical router port of a router 'R' connecting to 'P'
> > is a gateway router port.
> >   - And the logical router 'R' has only one gateway router port.
> >
> > If all the above conditions are met, ovn-northd creates a chassisredirect
> > port for the logical switch port (of type router) connecting to the
> > router 'R'.  For example, if the logical port is named as "P-R" and its
> > peer router port is "R-P", then chassisredirect port cr-P-R is created
> > along with cr-R-P.  Gateway chassis binding the cr-R-P also binds cr-P-R.
> > This ensures that the routing is centralized on this gateway chassis for
> > the traffic coming from switch "P" towards the router or vice versa.
> > This centralization is required in order to support NAT (both SNAT and
> > DNAT).  Distributed NAT (i.e if external_mac and router_port is set) is
> > not supported and instead the router port mac is used for such traffic.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-364
> > Signed-off-by: Numan Siddique 
> > ---
> >  NEWS  |   2 +
> >  controller/physical.c |   4 +
> >  northd/northd.c   | 239 ++
> >  northd/northd.h   |   1 +
> >  ovn-nb.xml|  27 ++
> >  tests/multinode-macros.at |   2 +-
> >  tests/multinode.at| 177 +
> >  tests/ovn-northd.at   | 520 +-
> >  tests/ovn.at  |   8 +-
> >  9 files changed, 928 insertions(+), 52 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 3bdc551728..51e6eeabc9 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -31,6 +31,8 @@ Post v24.03.0
> >  has been renamed to "options:ic-route-denylist" in order to comply with
> >  inclusive language guidelines. The previous name is still recognized to
> >  aid with backwards compatibility.
> > +  - Added Overlay provider network support to a logical switch if
> > +the config "overlay_provider_network" is set to true.
> >
> >  OVN v24.03.0 - 01 Mar 2024
> >  --
> > diff --git a/controller/physical.c b/controller/physical.c
> > index 25da789f0b..69a617f341 100644
> > --- a/controller/physical.c
> > +++ b/controller/physical.c
> > @@ -1607,6 +1607,10 @@ consider_port_binding(struct ovsdb_idl_index 
> > *sbrec_port_binding_by_name,
> >  ct_zones);
> >  put_zones_ofpacts(&zone_ids, ofpacts_p);
> >
> > +/* Clear the MFF_INPORT.  Its possible that the same packet may
> > + * go out from the same tunnel inport. */
> > +put_load(ofp_to_u16(OFPP_NONE), MFF_IN_PORT, 0, 16, ofpacts_p);
> > +
> >  /* Resubmit to table 41. */
> >  put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> >  }
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 9f81afccbb..2af9295a50 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -2098,6 +2098,53 @@ parse_lsp_addrs(struct ovn_port *op)
> >  }
> >  }
> >
> > +static struct ovn_port *
> > +create_cr_port(struct ovn_port *op, struct hmap *ports,
> > +   struct ovs_list *both_dbs, struct ovs_list *nb_only)
> > +{
> > +char *redirect_name = ovn_chassis_redirect_name(
> > +op->nbsp ? op->nbsp->name : op->nbrp->name);
> > +
> > +struct ovn_port *crp = ovn_port_find(ports, redirect_name);
> > +if (crp && crp->sb && crp->sb->datapath == op->od->sb) {
> > +ovn_port_set_nb(crp, NULL, op->nbrp);
> > +ovs_list_remove(&crp->list);
> > +ovs_list_push_back(both_dbs, &crp->list);
> > +} else {
> > +crp = ovn_port_create(ports, redirect_name,
> > +  op->nbsp, op->nbrp, NULL);
> > +ovs_list_push_back(nb_only, &crp->list);
> > +}
> > +
> > +crp->primary_port = op;
> > +op->cr_port = crp;
> > +crp->od = op->od;
> > +free(redirect_name);
> > +
> > +return crp;
> > +}
> > +
> > +/* Returns true if chassis resident port needs to be created for
> > + * op's peer logical switch.  False otherwise.
> > + *
> > + * Chassis resident port needs to be created if the op's logical router
> > + *   - Has only 

[ovs-dev] [PATCH ovn v2] Maintainers: Fix the typo in the email id.

2024-07-08 Thread numans
From: Numan Siddique 

Also added my ovn.org email id to the list.

Signed-off-by: Numan Siddique 
---
 MAINTAINERS.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index f72c8b5eda..a57f906ec6 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -54,7 +54,7 @@ This is the current list of active OVN committers:
* - Mark Michelson
  - mmich...@redhat.com
* - Numan Siddique
- - nusd...@redhat.com
+ - nusid...@redhat.com\, num...@ovn.org
 
 The project also maintains a list of Emeritus Committers (or Maintainers).
 More information about Emeritus Committers can be found here:
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] Maintainers: Fix the typo in the email id.

2024-07-08 Thread Numan Siddique
On Mon, Jul 8, 2024 at 5:05 PM Dumitru Ceara  wrote:
>
> On 7/8/24 22:38, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > Also added my ovn.org email id to the list.
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> Hi Numan,
>
> >  MAINTAINERS.rst | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
> > index f72c8b5eda..0d70048078 100644
> > --- a/MAINTAINERS.rst
> > +++ b/MAINTAINERS.rst
> > @@ -54,7 +54,8 @@ This is the current list of active OVN committers:
> > * - Mark Michelson
> >   - mmich...@redhat.com
> > * - Numan Siddique
> > - - nusd...@redhat.com
> > + - nusid...@redhat.com
> > + - num...@ovn.org
>
> Unfortunately, this breaks the format of the maintainers file because
> the table only has two columns and this adds 3 values for this row:
>
> https://github.com/ovsrobot/ovn/blob/series_414329/MAINTAINERS.rst
>
> Maybe you can change it to list both emails, separated by comma?  I

Thanks for pointing out this error.
Separated by comma seems fine to me. Let me test it out before submitting v2.

Thanks
Numan

> don't really know much .rst so maybe there are other options too.
>
> Regards,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 8/8] northd: Add ACL Sampling.

2024-07-08 Thread Dumitru Ceara
On 7/8/24 13:24, Dumitru Ceara wrote:
> From: Adrian Moreno 
> 
> Introduce a new table called Sample where per-flow IPFIX configuration
> can be specified.
> Also, reference rows from such table from the ACL table to enable the
> configuration of ACL sampling. If enabled, northd will add a sample
> action to each ACL related logical flow.
> 
> Packets that hit stateful ACLs are sampled in different ways depending
> whether they are initiating a new session or are just forwarded on an
> existing (already allowed) session.  Two new columns ("sample_new" and
> "sample_est") are added to the ACL table to allow for potentially
> different sampling rates for the two cases.
> 
> Note: If an ACL has both sampling enabled and a label associated to it
> then the label value overrides the observation point ID defined in the
> sample configuration.  This is a side effect of the implementation
> (observation point IDs are stored in conntrack in the same part of the
> ct_label where ACL labels are also stored).  The two features
> (sampling and ACL labels) serve however similar purposes so it's not
> expected that they're both enabled together.
> 
> When sampling is enabled on an ACL additional logical flows are created
> for that ACL (one for stateless ACLs and 3 for stateful ACLs) in the ACL
> action stage of the logical pipeline.  These additional flows match on a
> combination of conntrack state values and observation point id values
> (either against a logical register or against the stored ct_label state)
> in order to determine whether the packets hitting the ACLs must be
> sampled or not.  This comes with a slight increase in the number of
> logical flows and in the number of OpenFlow rules.  The number of
> additional flows _does not_ depend on the original ACL match or action.
> 
> New --sample-new and --sample-est optional arguments are added to the
> 'ovn-nbctl acl-add' command to allow configuring these new types of
> sampling for ACLs.
> 
> An example workflow of configuring ACL samples is:
>   # Create Sampling_App mappings for ACL traffic types:
>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
> id="42"
>   ovn-nbctl create sampling_app name="acl-est-traffic-sampling" \
>   id="43"
>   # Create two sample collectors, one that samples all packets (c1)
>   # and another one that samples with a probability of 10% (c2):
>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>probability=65535 set_id=1)
>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>probability=6553 set_id=2)
>   # Create two sample configurations (for new and for established
>   # traffic):
>   s1=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4301)
>   s2=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4302)
>   # Create an ingress ACL to allow IP traffic:
>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
> from-lport 1 "ip" allow-related
> 
> The config above will generate IPFIX samples with:
> - observation domain id set to 42 (Sampling_App
>   "acl-new-traffic-sampling" config) and observation point id
>   set to 4301 (Sample s1) for packets that create a new
>   connection
> - observation domain id set to 43 (Sampling_app
>   "acl-est-traffic-sampling" config) and observation point id
>   set to 4302 (Sample s2) for packets that are part of an already
>   existing connection
> 
> Reported-at: https://issues.redhat.com/browse/FDP-305
> Signed-off-by: Adrian Moreno 
> Co-authored-by: Dumitru Ceara 
> Signed-off-by: Dumitru Ceara 
> ---

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] Maintainers: Fix the typo in the email id.

2024-07-08 Thread Dumitru Ceara
On 7/8/24 22:38, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> Also added my ovn.org email id to the list.
> 
> Signed-off-by: Numan Siddique 
> ---

Hi Numan,

>  MAINTAINERS.rst | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
> index f72c8b5eda..0d70048078 100644
> --- a/MAINTAINERS.rst
> +++ b/MAINTAINERS.rst
> @@ -54,7 +54,8 @@ This is the current list of active OVN committers:
> * - Mark Michelson
>   - mmich...@redhat.com
> * - Numan Siddique
> - - nusd...@redhat.com
> + - nusid...@redhat.com
> + - num...@ovn.org

Unfortunately, this breaks the format of the maintainers file because
the table only has two columns and this adds 3 values for this row:

https://github.com/ovsrobot/ovn/blob/series_414329/MAINTAINERS.rst

Maybe you can change it to list both emails, separated by comma?  I
don't really know much .rst so maybe there are other options too.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] Maintainers: Fix the typo in the email id.

2024-07-08 Thread numans
From: Numan Siddique 

Also added my ovn.org email id to the list.

Signed-off-by: Numan Siddique 
---
 MAINTAINERS.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index f72c8b5eda..0d70048078 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -54,7 +54,8 @@ This is the current list of active OVN committers:
* - Mark Michelson
  - mmich...@redhat.com
* - Numan Siddique
- - nusd...@redhat.com
+ - nusid...@redhat.com
+ - num...@ovn.org
 
 The project also maintains a list of Emeritus Committers (or Maintainers).
 More information about Emeritus Committers can be found here:
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread Ilya Maximets
On 7/8/24 15:44, Adrian Moreno wrote:
> There are a couple of places where the test script "sleep"s to wait for
> some external condition to be met.
> 
> This is error prone, specially in slow systems (identified in CI by
> "KSFT_MACHINE_SLOW=yes").
> 
> To fix this, add a "ovs_wait" function that tries to execute a command
> a few times until it succeeds. The timeout used is set to 5s for
> "normal" systems and doubled if a slow CI machine is detected.
> 
> This should make the following work:
> 
> $ vng --build  \
> --config tools/testing/selftests/net/config \
> --config kernel/configs/debug.config
> 
> $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
> 
> Signed-off-by: Adrian Moreno 
> ---
>  .../selftests/net/openvswitch/openvswitch.sh  | 49 ---
>  .../selftests/net/openvswitch/ovs-dpctl.py|  1 +
>  2 files changed, 42 insertions(+), 8 deletions(-)
> 

Hi, Adrian.  See a small pile of nitpicks below.

None of them are blocking from my perspective, except for a typo.
Just listed them since there is a typo anyway.

> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> index bc71dbc18b21..83407b42073a 100755
> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> @@ -11,6 +11,7 @@ ksft_skip=4
>  PAUSE_ON_FAIL=no
>  VERBOSE=0
>  TRACING=0
> +WAIT_TIMEOUT=5
>  
>  tests="
>   arp_pingeth-arp: Basic arp ping between 
> two NS
> @@ -29,6 +30,32 @@ info() {
>   [ $VERBOSE = 0 ] || echo $*
>  }
>  
> +ovs_wait() {
> + info "waiting $WAIT_TIMEOUT s for: $@"
> +
> + "$@"
> + if [[ $? -eq 0 ]]; then

Maybe just 'if "$@"; then' ?

> + info "wait succeeded inmediately"

* immediately

> + return 0
> + fi
> +
> + # A quick re-check helps speed up small races in fast systems.
> + # However, fractional sleeps might not necessarily work.
> + local start=0
> + sleep 0.1 || { sleep 1; start=1; }
> +
> + for (( i=start; i + "$@"
> + if [[ $? -eq 0 ]]; then

if "$@"; then

> + info "wait succeeded after $i seconds"
> + return 0
> + fi
> + sleep 1
> + done
> + info "wait failed after $i seconds"
> + return 1
> +}
> +
>  ovs_base=`pwd`
>  sbxs=
>  sbx_add () {
> @@ -278,20 +305,21 @@ test_psample() {
>  
>   # Record psample data.
>   ovs_spawn_daemon "test_psample" python3 $ovs_base/ovs-dpctl.py 
> psample-events
> + ovs_wait grep -q "listening for psample events" ${ovs_dir}/stdout
>  
>   # Send a single ping.
> - sleep 1
>   ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 
> 1 || return 1
> - sleep 1
>  
>   # We should have received one userspace action upcall and 2 psample 
> packets.
> - grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || 
> return 1
> + ovs_wait grep -q "userspace action command" $ovs_dir/s0.out
> + [[ $? -eq 0 ]] || return 1

Why checking separately and not one the same line with || return 1 ?
Also double brackets seem unnecessary.

>  
>   # client -> server samples should only contain the first 14 bytes of 
> the packet.
> - grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
> -  $ovs_dir/stdout >/dev/null 2>&1 || return 1
> - grep -E "rate:4294967295,group:2,cookie:eeff0c" \
> -  $ovs_dir/stdout >/dev/null 2>&1 || return 1
> + ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee 
> data:[0-9a-f]{28}$" $ovs_dir/stdout
> + [[ $? -eq 0 ]] || return 1
> +
> + ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout
> + [[ $? -eq 0 ]] || return 1

Same for above two.

>  
>   return 0
>  }
> @@ -711,7 +739,8 @@ test_upcall_interfaces() {
>   ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \
>   172.31.110.1/24 -u || return 1
>  
> - sleep 1
> + ovs_wait grep -q "listening on upcall packet handler" 
> ${ovs_dir}/left0.out
> +
>   info "sending arping"
>   ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
>   >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
> @@ -811,6 +840,10 @@ shift $(($OPTIND-1))
>  IFS="
>  "
>  
> +if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then
> + WAIT_TIMEOUT=10
> +fi

Should this be done closer to the first initialization of WAIT_TIMEOUT ?

> +
>  for arg do
>   # Check first that all requested tests are available before running any
>   command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not 
> found"; usage; }
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 1e15b0818

Re: [ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread Adrián Moreno
On Mon, Jul 08, 2024 at 02:27:45PM GMT, Aaron Conole wrote:
> Adrian Moreno  writes:
>
> > There are a couple of places where the test script "sleep"s to wait for
> > some external condition to be met.
> >
> > This is error prone, specially in slow systems (identified in CI by
> > "KSFT_MACHINE_SLOW=yes").
> >
> > To fix this, add a "ovs_wait" function that tries to execute a command
> > a few times until it succeeds. The timeout used is set to 5s for
> > "normal" systems and doubled if a slow CI machine is detected.
> >
> > This should make the following work:
> >
> > $ vng --build  \
> > --config tools/testing/selftests/net/config \
> > --config kernel/configs/debug.config
> >
> > $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> > KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
> >
> > Signed-off-by: Adrian Moreno 
> > ---
>
> Looks like this does resolve the issue in question on the -dbg
> environment:
>
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh

Nice! I guess the 10s global timeout is enough for now.

>
> Thanks Adrian!  Also, thanks for including the fractional sleep.
>
> Reviewed-by: Aaron Conole 
>

Thanks.

I just realized that I've missed the target branch ("net-next") in the
subject. I'll wait a bit and respin to fix that.

Adrián

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 03/13] ofproto_dpif: Check for psample support.

2024-07-08 Thread Adrián Moreno
On Mon, Jul 08, 2024 at 12:10:15PM GMT, Eelco Chaudron wrote:
>
>
> On 8 Jul 2024, at 10:37, Adrián Moreno wrote:
>
> > On Sun, Jul 07, 2024 at 10:08:55PM GMT, Adrian Moreno wrote:
> >> Only kernel datapath supports this action so add a function in dpif.c
> >> that checks for that.
> >>
> >> Signed-off-by: Adrian Moreno 
> >> ---
> >>  lib/dpif.c |  7 +++
> >>  lib/dpif.h |  1 +
> >>  ofproto/ofproto-dpif.c | 43 ++
> >>  ofproto/ofproto-dpif.h |  7 ++-
> >>  4 files changed, 57 insertions(+), 1 deletion(-)
> >>
> >
> > After Dumitru's comment on the last patch, I think we also need a
> > capability for this exposed in the OVSDB. I'll add it in the next
> > version.
>
> ACK, can you wait with the v2 until I review the RFC/v1?

Sure, thanks.

>
> >> diff --git a/lib/dpif.c b/lib/dpif.c
> >> index 71728badc..0a964ba89 100644
> >> --- a/lib/dpif.c
> >> +++ b/lib/dpif.c
> >> @@ -1953,6 +1953,13 @@ dpif_supports_lb_output_action(const struct dpif 
> >> *dpif)
> >>  return dpif_is_netdev(dpif);
> >>  }
> >>
> >> +bool
> >> +dpif_may_support_psample(const struct dpif *dpif)
> >> +{
> >> +/* Userspace datapath does not support this action. */
> >> +return !dpif_is_netdev(dpif);
> >> +}
> >> +
> >>  /* Meters */
> >>  void
> >>  dpif_meter_get_features(const struct dpif *dpif,
> >> diff --git a/lib/dpif.h b/lib/dpif.h
> >> index a764e8a59..6bef7d5b3 100644
> >> --- a/lib/dpif.h
> >> +++ b/lib/dpif.h
> >> @@ -941,6 +941,7 @@ int dpif_get_pmds_for_port(const struct dpif * dpif, 
> >> odp_port_t port_no,
> >>  char *dpif_get_dp_version(const struct dpif *);
> >>  bool dpif_supports_tnl_push_pop(const struct dpif *);
> >>  bool dpif_may_support_explicit_drop_action(const struct dpif *);
> >> +bool dpif_may_support_psample(const struct dpif *);
> >>  bool dpif_synced_dp_layers(struct dpif *);
> >>
> >>  /* Log functions. */
> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> >> index 4712d10a8..94c84d697 100644
> >> --- a/ofproto/ofproto-dpif.c
> >> +++ b/ofproto/ofproto-dpif.c
> >> @@ -873,6 +873,12 @@ ovs_lb_output_action_supported(struct ofproto_dpif 
> >> *ofproto)
> >>  return ofproto->backer->rt_support.lb_output_action;
> >>  }
> >>
> >> +bool
> >> +ovs_psample_supported(struct ofproto_dpif *ofproto)
> >> +{
> >> +return ofproto->backer->rt_support.psample;
> >> +}
> >> +
> >>  /* Tests whether 'backer''s datapath supports recirculation.  Only newer
> >>   * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable 
> >> some
> >>   * features on older datapaths that don't support this feature.
> >> @@ -1609,6 +1615,42 @@ check_add_mpls(struct dpif_backer *backer)
> >>  return supported;
> >>  }
> >>
> >> +/* Tests whether 'backer''s datapath supports the OVS_ACTION_ATTR_PSAMPLE
> >> + * action. */
> >> +static bool
> >> +check_psample(struct dpif_backer *backer)
> >> +{
> >> +uint8_t cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE];
> >> +struct odputil_keybuf keybuf;
> >> +struct ofpbuf actions;
> >> +struct ofpbuf key;
> >> +struct flow flow;
> >> +bool supported;
> >> +
> >> +struct odp_flow_key_parms odp_parms = {
> >> +.flow = &flow,
> >> +.probe = true,
> >> +};
> >> +
> >> +memset(&flow, 0, sizeof flow);
> >> +ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
> >> +odp_flow_key_from_flow(&odp_parms, &key);
> >> +ofpbuf_init(&actions, 64);
> >> +
> >> +/* Generate a random max-size cookie. */
> >> +random_bytes(cookie, sizeof(cookie));
> >> +
> >> +odp_put_psample_action(&actions, 10, cookie, sizeof cookie);
> >> +
> >> +supported = dpif_may_support_psample(backer->dpif) &&
> >> +dpif_probe_feature(backer->dpif, "psample", &key, &actions, NULL);
> >> +
> >> +ofpbuf_uninit(&actions);
> >> +VLOG_INFO("%s: Datapath %s psample", dpif_name(backer->dpif),
> >> +  supported ? "supports" : "does not support");
> >> +return supported;
> >> +}
> >> +
> >>  #define CHECK_FEATURE__(NAME, SUPPORT, FIELD, VALUE, ETHTYPE) 
> >>   \
> >>  static bool   
> >>   \
> >>  check_##NAME(struct dpif_backer *backer)  
> >>   \
> >> @@ -1698,6 +1740,7 @@ check_support(struct dpif_backer *backer)
> >>  dpif_supports_lb_output_action(backer->dpif);
> >>  backer->rt_support.ct_zero_snat = dpif_supports_ct_zero_snat(backer);
> >>  backer->rt_support.add_mpls = check_add_mpls(backer);
> >> +backer->rt_support.psample = check_psample(backer);
> >>
> >>  /* Flow fields. */
> >>  backer->rt_support.odp.ct_state = check_ct_state(backer);
> >> diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
> >> index d33f73df8..bc7a19dab 100644
> >> --- a/ofproto/ofproto-dpif.h
> >> +++ b/ofproto/ofproto-dpif.h
> >> @@ -213,7 +213,10 @@ struct group_dpif *group_dpif_lookup(struct 
> >> ofproto_dpif

[ovs-dev] [PATCH v10 2/2] ofproto-dpif-mirror: Add support for pre-selection filter.

2024-07-08 Thread Mike Pattrick
Currently a bridge mirror will collect all packets and tools like
ovs-tcpdump can apply additional filters after they have already been
duplicated by vswitchd. This can result in inefficient collection.

This patch adds support to apply pre-selection to bridge mirrors, which
can limit which packets are mirrored based on flow metadata. This
significantly improves overall vswitchd performance during mirroring if
only a subset of traffic is required.

Signed-off-by: Mike Pattrick 
---
v8:
 - Corrected code from v7 related to sequence and in_port. Mirrors
   reject filters with an in_port set as this could cause confusion.
 - Combined ovsrcu pointers into a new struct, minimatch wasn't used
   because the minimatch_* functions didn't fit the usage here.
 - Added a test to check for modifying filters when partially
   overlapping flows already exist.
 - Corrected documentation.
v9:
 - Explicitly cleared mirror_config.filter* when not set
v10:
 - Changed rcu memory order
 - Updated documentation to refer to packets instead of flows
 - Updated comments
---
 Documentation/ref/ovs-tcpdump.8.rst |   8 +-
 NEWS|   6 +
 lib/flow.h  |   9 ++
 ofproto/ofproto-dpif-mirror.c   | 106 +-
 ofproto/ofproto-dpif-mirror.h   |   8 +-
 ofproto/ofproto-dpif-xlate.c|  15 ++-
 ofproto/ofproto-dpif.c  |  12 +-
 ofproto/ofproto.h   |   3 +
 tests/ofproto-dpif.at   | 167 
 utilities/ovs-tcpdump.in|  13 ++-
 vswitchd/bridge.c   |  13 ++-
 vswitchd/vswitch.ovsschema  |   7 +-
 vswitchd/vswitch.xml|  16 +++
 13 files changed, 368 insertions(+), 15 deletions(-)

diff --git a/Documentation/ref/ovs-tcpdump.8.rst 
b/Documentation/ref/ovs-tcpdump.8.rst
index b9f8cdf6f..e7bd5e9e4 100644
--- a/Documentation/ref/ovs-tcpdump.8.rst
+++ b/Documentation/ref/ovs-tcpdump.8.rst
@@ -61,8 +61,14 @@ Options
 
   If specified, mirror all ports (optional).
 
+* ``--filter ``
+
+  If specified, only mirror packets that match the provided OpenFlow filter.
+  The available fields are documented in ``ovs-fields(7)``.
+
 See Also
 
 
 ``ovs-appctl(8)``, ``ovs-vswitchd(8)``, ``ovs-pcap(1)``,
-``ovs-tcpundump(1)``, ``tcpdump(8)``, ``wireshark(8)``.
+``ovs-fields(7)``, ``ovs-tcpundump(1)``, ``tcpdump(8)``,
+``wireshark(8)``.
diff --git a/NEWS b/NEWS
index e0359b759..cbe777a71 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,12 @@ Post-v3.3.0
per interface 'options:dpdk-lsc-interrupt' to 'false'.
- Python:
  * Added custom transaction support to the Idl via add_op().
+   - ovs-vsctl:
+ * Added a new filter column in the Mirror table which can be used to
+   apply filters to mirror ports.
+   - ovs-tcpdump:
+ * Added command line parameter --filter to enable filtering the packets
+   that are captured by tcpdump.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/flow.h b/lib/flow.h
index 75a9be3c1..60ec4b0d7 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -939,6 +939,15 @@ flow_union_with_miniflow(struct flow *dst, const struct 
miniflow *src)
 flow_union_with_miniflow_subset(dst, src, src->map);
 }
 
+/* Perform a bitwise OR of minimask 'src' mask data with the equivalent
+ * fields in 'dst', storing the result in 'dst'. */
+static inline void
+flow_wildcards_union_with_minimask(struct flow_wildcards *dst,
+   const struct minimask *src)
+{
+flow_union_with_miniflow_subset(&dst->masks, &src->masks, src->masks.map);
+}
+
 static inline bool is_ct_valid(const struct flow *flow,
const struct flow_wildcards *mask,
struct flow_wildcards *wc)
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 4967ecc9a..a14f85843 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -21,6 +21,7 @@
 #include "cmap.h"
 #include "hmapx.h"
 #include "ofproto.h"
+#include "ofproto-dpif-trace.h"
 #include "vlan-bitmap.h"
 #include "openvswitch/vlog.h"
 
@@ -48,6 +49,11 @@ struct mbundle {
 mirror_mask_t mirror_out;   /* Mirrors that output to this mbundle. */
 };
 
+struct filtermask {
+struct miniflow *flow;
+struct minimask *mask;
+};
+
 struct mirror {
 struct mbridge *mbridge;/* Owning ofproto. */
 size_t idx; /* In ofproto's "mirrors" array. */
@@ -57,6 +63,10 @@ struct mirror {
 struct hmapx srcs;  /* Contains "struct mbundle*"s. */
 struct hmapx dsts;  /* Contains "struct mbundle*"s. */
 
+/* Filter criteria. */
+OVSRCU_TYPE(struct filtermask *) filter_mask;
+char *filter_str;
+
 /* This is accessed by handler threads assuming RCU protection (see
  * mirror_get()), but can be manipulated by mirror_set() without any
  * explicit synchronization. */
@@ -83,6 +93,25 @@ static void mbundle_lookup_multiple(const struct mbr

Re: [ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread Aaron Conole
Adrian Moreno  writes:

> There are a couple of places where the test script "sleep"s to wait for
> some external condition to be met.
>
> This is error prone, specially in slow systems (identified in CI by
> "KSFT_MACHINE_SLOW=yes").
>
> To fix this, add a "ovs_wait" function that tries to execute a command
> a few times until it succeeds. The timeout used is set to 5s for
> "normal" systems and doubled if a slow CI machine is detected.
>
> This should make the following work:
>
> $ vng --build  \
> --config tools/testing/selftests/net/config \
> --config kernel/configs/debug.config
>
> $ vng --run . --user root -- "make -C tools/testing/selftests/ \
> KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
>
> Signed-off-by: Adrian Moreno 
> ---

Looks like this does resolve the issue in question on the -dbg
environment:

https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh

Thanks Adrian!  Also, thanks for including the fractional sleep.

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v10 1/2] ofproto-dpif-mirror: Reduce number of function parameters.

2024-07-08 Thread Mike Pattrick
Previously the mirror_set() and mirror_get() functions took a large
number of parameters, which was inefficient and difficult to read and
extend. This patch moves most of the parameters into a struct.

Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-mirror.c | 60 ++-
 ofproto/ofproto-dpif-mirror.h | 40 ++-
 ofproto/ofproto-dpif-xlate.c  | 29 -
 ofproto/ofproto-dpif.c| 23 +++---
 4 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..4967ecc9a 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -207,19 +207,22 @@ mirror_bundle_dst(struct mbridge *mbridge, struct 
ofbundle *ofbundle)
 }
 
 int
-mirror_set(struct mbridge *mbridge, void *aux, const char *name,
-   struct ofbundle **srcs, size_t n_srcs,
-   struct ofbundle **dsts, size_t n_dsts,
-   unsigned long *src_vlans, struct ofbundle *out_bundle,
-   uint16_t snaplen,
-   uint16_t out_vlan)
+mirror_set(struct mbridge *mbridge, void *aux,
+   const struct ofproto_mirror_settings *ms,
+   const struct mirror_bundles *mb)
 {
 struct mbundle *mbundle, *out;
 mirror_mask_t mirror_bit;
 struct mirror *mirror;
 struct hmapx srcs_map;  /* Contains "struct ofbundle *"s. */
 struct hmapx dsts_map;  /* Contains "struct ofbundle *"s. */
+uint16_t out_vlan;
 
+if (!ms || !mbridge) {
+return EINVAL;
+}
+
+out_vlan = ms->out_vlan;
 mirror = mirror_lookup(mbridge, aux);
 if (!mirror) {
 int idx;
@@ -227,7 +230,7 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 idx = mirror_scan(mbridge);
 if (idx < 0) {
 VLOG_WARN("maximum of %d port mirrors reached, cannot create %s",
-  MAX_MIRRORS, name);
+  MAX_MIRRORS, ms->name);
 return EFBIG;
 }
 
@@ -242,8 +245,8 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 unsigned long *vlans = ovsrcu_get(unsigned long *, &mirror->vlans);
 
 /* Get the new configuration. */
-if (out_bundle) {
-out = mbundle_lookup(mbridge, out_bundle);
+if (mb->out_bundle) {
+out = mbundle_lookup(mbridge, mb->out_bundle);
 if (!out) {
 mirror_destroy(mbridge, mirror->aux);
 return EINVAL;
@@ -252,16 +255,16 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 } else {
 out = NULL;
 }
-mbundle_lookup_multiple(mbridge, srcs, n_srcs, &srcs_map);
-mbundle_lookup_multiple(mbridge, dsts, n_dsts, &dsts_map);
+mbundle_lookup_multiple(mbridge, mb->srcs, mb->n_srcs, &srcs_map);
+mbundle_lookup_multiple(mbridge, mb->dsts, mb->n_dsts, &dsts_map);
 
 /* If the configuration has not changed, do nothing. */
 if (hmapx_equals(&srcs_map, &mirror->srcs)
 && hmapx_equals(&dsts_map, &mirror->dsts)
-&& vlan_bitmap_equal(vlans, src_vlans)
+&& vlan_bitmap_equal(vlans, ms->src_vlans)
 && mirror->out == out
 && mirror->out_vlan == out_vlan
-&& mirror->snaplen == snaplen)
+&& mirror->snaplen == ms->snaplen)
 {
 hmapx_destroy(&srcs_map);
 hmapx_destroy(&dsts_map);
@@ -275,15 +278,15 @@ mirror_set(struct mbridge *mbridge, void *aux, const char 
*name,
 hmapx_swap(&dsts_map, &mirror->dsts);
 hmapx_destroy(&dsts_map);
 
-if (vlans || src_vlans) {
+if (vlans || ms->src_vlans) {
 ovsrcu_postpone(free, vlans);
-vlans = vlan_bitmap_clone(src_vlans);
+vlans = vlan_bitmap_clone(ms->src_vlans);
 ovsrcu_set(&mirror->vlans, vlans);
 }
 
 mirror->out = out;
 mirror->out_vlan = out_vlan;
-mirror->snaplen = snaplen;
+mirror->snaplen = ms->snaplen;
 
 /* Update mbundles. */
 mirror_bit = MIRROR_MASK_C(1) << mirror->idx;
@@ -406,23 +409,22 @@ mirror_update_stats(struct mbridge *mbridge, 
mirror_mask_t mirrors,
 /* Retrieves the mirror numbered 'index' in 'mbridge'.  Returns true if such a
  * mirror exists, false otherwise.
  *
- * If successful, '*vlans' receives the mirror's VLAN membership information,
+ * If successful 'mc->vlans' receives the mirror's VLAN membership information,
  * either a null pointer if the mirror includes all VLANs or a 4096-bit bitmap
  * in which a 1-bit indicates that the mirror includes a particular VLAN,
- * '*dup_mirrors' receives a bitmap of mirrors whose output duplicates mirror
- * 'index', '*out' receives the output ofbundle (if any), and '*out_vlan'
- * receives the output VLAN (if any).
+ * 'mc->dup_mirrors' receives a bitmap of mirrors whose output duplicates
+ * mirror 'index', 'mc->out' receives the output ofbundle (if any),
+ * and 'mc->out_vlan' receives the output VLAN (if 

Re: [ovs-dev] Intel CI not running?

2024-07-08 Thread Ilya Maximets
On 7/8/24 11:02, Phelan, Michael wrote:
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Thursday, July 4, 2024 9:14 PM
>> To: Phelan, Michael 
>> Cc: i.maxim...@ovn.org; ovs-dev ; Aaron Conole
>> ; Chaudron, Eelco 
>> Subject: Re: Intel CI not running?
>>
>> On 7/4/24 20:46, Ilya Maximets wrote:
>>> On 7/4/24 13:04, Phelan, Michael wrote:
 Hi Ilya,
 The CI got stuck running make check on a patch, I have solved the
 issue now so reports should be back to normal now.
>>>
>>> Thanks for checking!  Though I see only two reports were sent out in
>>> the past 7 hours with a few hour interval between them.
>>> Did it get stuck again?
>>
>> Got another report.  Looks like we're getting reports at rate of one per 3.5
>> hours.  That doesn't sound right.
> 
> We have added make check to the job parameters so that has increased the 
> duration of the testing.

'make check' in GitHub CI takes about 7 minutes.  And machines there
are not very fast.  It shouldn't take hours.

In GitHub actions we're running it with TESTSUITEFLAGS='-j4' RECHECK=yes

> 
> It seems like the test number 98: barrier module from make check is a bit 
> finicky and stalls occasionally also.

Hmm. I've never seen this test getting stuck.  Could you try to reproduce
this by running './tests/ovstest test-barrier -v' manually?  Would be
also great to know where exactly it is getting stuck, e.g. by running
under gdb.

Best regards, Ilya Maximets.

> 
>>
>>>
>>> Best regards, Ilya Maximets.
>>>

 Thanks,
 Michael.
> -Original Message-
> From: Ilya Maximets 
> Sent: Thursday, July 4, 2024 10:47 AM
> To: Phelan, Michael 
> Cc: i.maxim...@ovn.org; ovs-dev ; Aaron
> Conole ; Chaudron, Eelco
>> 
> Subject: Intel CI not running?
>
> Hi, Michael!  We seem to not get reports from Intel CI since some
> time on Monday.  The last report was:
>
>
> https://mail.openvswitch.org/pipermail/ovs-build/2024-July/039755.ht
> ml
>
> Could you, please, check?
>
> Best regards, Ilya Maximets.
>>>
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Fix issues for Forwarding_Group

2024-07-08 Thread Qiang Qiang45 Zhang via dev
When a logical switch has 2 FWGs and each FWG has 3 ports,
Logical-Flow only has one fwg_group() action.
Submitted-at: northd: Fix issues for Forwarding_Group by ZhangQiang3 * Pull 
Request #250 * ovn-org/ovn (github.com)

>From 02186da234426bc361615eb6b5142c76f296202f Mon Sep 17 00:00:00 2001
From: zhangqiang45 zhangqian...@lenovo.com
Date: Mon, 8 Jul 2024 14:25:04 +0800
Subject: [PATCH ovn] northd: Fix issues for Forwarding_Group The use of
variables from the outer loop in the inner loop causes the outer loop to
terminate prematurely. eg. LVS are three fwgs,
fwg1(p1,p2,p3,p4),fwg2(p11,p22),fwg3(p31,p32),only fwg1 in logical_flows

---
northd/northd.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 6898daa00..21ab0bb91 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -7929,6 +7929,9 @@ build_fwd_group_lflows(struct ovn_datapath *od, struct 
lflow_table *lflows,
 continue;
 }

+ds_clear(&match);
+ds_clear(&actions);
+
 /* ARP responder for the forwarding group's virtual IP */
 ds_put_format(&match, "arp.tpa == %s && arp.op == 1",
   fwd_group->vip);
@@ -7959,9 +7962,9 @@ build_fwd_group_lflows(struct ovn_datapath *od, struct 
lflow_table *lflows,
 ds_put_cstr(&group_ports, "liveness=\"true\",");
 }
 ds_put_cstr(&group_ports, "childports=");
-for (i = 0; i < (fwd_group->n_child_port - 1); ++i) {
+for (int j = 0; j < (fwd_group->n_child_port - 1); ++j) {
 ds_put_format(&group_ports, "\"%s\",",
- fwd_group->child_port[i]);
+ fwd_group->child_port[j]);
 }
 ds_put_format(&group_ports, "\"%s\"",
   fwd_group->child_port[fwd_group->n_child_port - 1]);
--
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-07-08 Thread Mike Pattrick
On Mon, Jul 8, 2024 at 8:23 AM David Marchand  wrote:
>
> On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick  wrote:
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index f2d921ed6..866dbf3b7 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -69,8 +69,6 @@ COVERAGE_DEFINE(netdev_received);
> >  COVERAGE_DEFINE(netdev_sent);
> >  COVERAGE_DEFINE(netdev_add_router);
> >  COVERAGE_DEFINE(netdev_get_stats);
> > -COVERAGE_DEFINE(netdev_vxlan_tso_drops);
> > -COVERAGE_DEFINE(netdev_geneve_tso_drops);
> >  COVERAGE_DEFINE(netdev_push_header_drops);
> >  COVERAGE_DEFINE(netdev_soft_seg_good);
> >  COVERAGE_DEFINE(netdev_soft_seg_drops);
> > @@ -910,28 +908,30 @@ netdev_send(struct netdev *netdev, int qid, struct 
> > dp_packet_batch *batch,
> >  struct dp_packet *packet;
> >  int error;
> >
> > -if (userspace_tso_enabled() &&
> > -!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
> > -DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > -if (dp_packet_hwol_is_tso(packet)) {
> > -if (dp_packet_hwol_is_tunnel_vxlan(packet)
> > -&& !(netdev_flags & NETDEV_TX_VXLAN_TNL_TSO)) {
> > -VLOG_WARN_RL(&rl, "%s: No VXLAN TSO support",
> > - netdev_get_name(netdev));
> > -COVERAGE_INC(netdev_vxlan_tso_drops);
> > -dp_packet_delete_batch(batch, true);
> > -return false;
> > +if (userspace_tso_enabled()) {
> > +if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
> > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +if (dp_packet_hwol_is_tso(packet)) {
> > +return netdev_send_tso(netdev, qid, batch, 
> > concurrent_txq);
> >  }
> > -
> > -if (dp_packet_hwol_is_tunnel_geneve(packet)
> > -&& !(netdev_flags & NETDEV_TX_GENEVE_TNL_TSO)) {
> > -VLOG_WARN_RL(&rl, "%s: No GENEVE TSO support",
> > - netdev_get_name(netdev));
> > -COVERAGE_INC(netdev_geneve_tso_drops);
> > -dp_packet_delete_batch(batch, true);
> > -return false;
> > +}
> > +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {
>
> Just checking this feature is not enough at this point.
>
> Imagine a driver that has TSO support, but neither outer udp checksum
> nor any tunnel tso support (the one I have in mind is net/ixgbe).
> IIUC, a batch of (tunneled) tso packets with no request for outer udp
> csum would not be segmented in sw (and it would probably be dropped by
> hw).
>
> I think this block should be moved after checking the presence of
> *tnl_tso flags.
>
>
> > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +if (dp_packet_hwol_is_tso(packet) &&
> > +(dp_packet_hwol_is_tunnel_vxlan(packet) ||
> > + dp_packet_hwol_is_tunnel_geneve(packet)) &&
> > +dp_packet_hwol_is_outer_udp_cksum(packet)) {
> > +return netdev_send_tso(netdev, qid, batch, 
> > concurrent_txq);
> > +}
> > +}
> > +} else if (!(netdev_flags & (NETDEV_TX_VXLAN_TNL_TSO |
> > + NETDEV_TX_GENEVE_TNL_TSO))) {
> > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +if (dp_packet_hwol_is_tso(packet) &&
> > +(dp_packet_hwol_is_tunnel_vxlan(packet) ||
> > + dp_packet_hwol_is_tunnel_geneve(packet))) {
>
> Note: you can imagine a netdev supporting geneve tso, but not vxlan tso.
> A batch containing only tso requests through a geneve tunnel would end
> up being segmented in sw.
>
> DPDK drivers either support both or none. And only those drivers
> expose these netdev features in OVS.
> So I don't think it is a real problem and I am ok with this hunk.
>
>
> > +return netdev_send_tso(netdev, qid, batch, 
> > concurrent_txq);
> >  }
> > -return netdev_send_tso(netdev, qid, batch, concurrent_txq);
> >  }
> >  }
> >  }
>
>
> A final note, I suspect all those checks negatively impact non tso
> packets processing when OVS tso is enabled.
>
> Would it be feasible to mark that tso (or a tx offload) has been
> requested at the "batch" level?
> This is more an optimisation... maybe in the future?

I've been thinking about changing the trunc member to a flags bit
array lately in other contexts as well. It would be useful to mark if
a batch has fragmented packets, if there is mixed memory between
mbuf's and other types of packet source memory, there should be other
uses as well. Having batch flags would remove a few instances of
iterating over the whole batch.

I think this is a seperate patch though.

>
>
>
> Overall the patch lg

Re: [ovs-dev] [PATCH] issue-251: ovn.at: FIX delete Mac binding test

2024-07-08 Thread Шагов Георгий via dev
Hello Dumitru

Yes, you are right
My apology, I attached a patch, it seems like it was purged
Repeating:


diff --git a/tests/ovn.at b/tests/ovn.at
index 87a64499f..928b5d286 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -9598,21 +9598,25 @@ as hv1
 ovs-vsctl -- add-br br-phys
 ovn_attach n1 br-phys 192.168.0.1
 # Create logical router lr0
-ovn-nbctl lr-add lr0
+check ovn-nbctl lr-add lr0
 # Create ports lrp0, lrp1 in lr0
-ovn-nbctl lrp-add lr0 lrp0 f0:00:00:00:00:01 192.168.0.1/24
-ovn-nbctl lrp-add lr0 lrp1 f0:00:00:00:00:02 192.168.1.1/24
+check ovn-nbctl lrp-add lr0 lrp0 f0:00:00:00:00:01 192.168.0.1/24
+check ovn-nbctl lrp-add lr0 lrp1 f0:00:00:00:00:02 192.168.1.1/24
 dp_uuid=$(fetch_column Datapath_Binding _uuid)
-ovn-sbctl create MAC_Binding ip=10.0.0.1 datapath=$dp_uuid logical_port=lrp0 
mac="00:00:00:01:00:01"
-ovn-sbctl create MAC_Binding ip=10.0.0.1 datapath=$dp_uuid logical_port=lrp1 
mac="00:00:00:01:00:02"
+ovn-sbctl create MAC_Binding ip=10.0.0.1 datapath=$dp_uuid logical_port=lrp0 
mac="00\:00\:00\:01\:00\:01"
+ovn-sbctl create MAC_Binding ip=10.0.0.1 datapath=$dp_uuid logical_port=lrp1 
mac="00\:00\:00\:01\:00\:02"
 ovn-sbctl find MAC_Binding
+# Checking mac binding records were added as expected
+wait_row_count MAC_Binding 1 logical_port=lrp0
+wait_row_count MAC_Binding 1 logical_port=lrp1
 # Delete port lrp0 and check that its MAC_Binding is deleted.
-ovn-nbctl lrp-del lrp0
+check ovn-nbctl lrp-del lrp0
 ovn-sbctl find MAC_Binding
+# Checking mac binding record was deleted
 wait_row_count MAC_Binding 0 logical_port=lrp0
 # Delete logical switch lr0 and check that its MAC_Binding is deleted.
-ovn-nbctl lr-del lr0
-ovn-sbctl find MAC_Binding
+check ovn-nbctl lr-del lr0
+check ovn-sbctl find MAC_Binding
 wait_row_count MAC_Binding 0

 # Ignore warning from mac_cache trying to remove mb while aging mechanism is 
disabled and lr0 is not local

On 08.07.2024, 17:03, "Dumitru Ceara" mailto:dce...@redhat.com>> wrote:


ВНИМАНИЕ! ВНЕШНИЙ ОТПРАВИТЕЛЬ
Если отправитель почты неизвестен, не переходите по ссылкам, не сообщайте 
пароль,
не запускайте вложения и сообщите коллегам из ЦКЗ на secur...@cloud.ru 
>


Hi George,


I assume this is about https://github.com/ovn-org/ovn/issues/251 
 right?


Regards,
Dumitru


On 7/8/24 14:08, Шагов Георгий via dev wrote:
>
> УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: Это электронное сообщение и любые 
> документы, приложенные к нему, содержат конфиденциальную информацию. 
> Настоящим уведомляем Вас о том, что если это сообщение не предназначено Вам, 
> использование, копирование, распространение информации, содержащейся в 
> настоящем сообщении, а также осуществление любых действий на основе этой 
> информации, строго запрещено. Если Вы получили это сообщение по ошибке, 
> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это 
> сообщение.
> CONFIDENTIALITY NOTICE: This email and any files attached to it are 
> confidential. If you are not the intended recipient you are notified that 
> using, copying, distributing or taking any action in reliance on the contents 
> of this information is strictly prohibited. If you have received this email 
> in error please notify the sender and delete this email.
> ___
> dev mailing list
> d...@openvswitch.org 
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> 






УВЕДОМЛЕНИЕОКОНФИДЕНЦИАЛЬНОСТИ:Этоэлектронноесообщениеилюбыедокументы,приложенныекнему,содержатконфиденциальнуюинформацию.НастоящимуведомляемВасотом,чтоеслиэтосообщениенепредназначеноВам,использование,копирование,распространениеинформации,содержащейсявнастоящемсообщении,атакжеосуществлениелюбыхдействийнаосновеэтойинформации,строгозапрещено.ЕслиВыполучилиэтосообщениепоошибке,пожалуйста,сообщитеобэтомотправителюпоэлектроннойпочтеиудалитеэтосообщение.
CONFIDENTIALITY NOTICE: This email and any files attached to it are 
confidential. If you are not the intended recipient you are notified that 
using, copying, distributing or taking any action in reliance on the contents 
of this information is strictly prohibited. If you have received this email in 
error please notify the sender and delete this email.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] issue-251: ovn.at: FIX delete Mac binding test

2024-07-08 Thread Dumitru Ceara
Hi George,

I assume this is about https://github.com/ovn-org/ovn/issues/251 right?

Regards,
Dumitru

On 7/8/24 14:08, Шагов Георгий via dev wrote:
> 
> УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: Это электронное сообщение и любые 
> документы, приложенные к нему, содержат конфиденциальную информацию. 
> Настоящим уведомляем Вас о том, что если это сообщение не предназначено Вам, 
> использование, копирование, распространение информации, содержащейся в 
> настоящем сообщении, а также осуществление любых действий на основе этой 
> информации, строго запрещено. Если Вы получили это сообщение по ошибке, 
> пожалуйста, сообщите об этом отправителю по электронной почте и удалите это 
> сообщение.
> CONFIDENTIALITY NOTICE: This email and any files attached to it are 
> confidential. If you are not the intended recipient you are notified that 
> using, copying, distributing or taking any action in reliance on the contents 
> of this information is strictly prohibited. If you have received this email 
> in error please notify the sender and delete this email.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread 0-day Robot
Bleep bloop.  Greetings Adrian Moreno, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless 
(tools/testing/selftests/net/openvswitch/openvswitch.sh).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 selftests: openvswitch: retry instead of sleep
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] issue-251: ovn.at: FIX delete Mac binding test

2024-07-08 Thread Шагов Георгий via dev

УВЕДОМЛЕНИЕ О КОНФИДЕНЦИАЛЬНОСТИ: Это электронное сообщение и любые документы, 
приложенные к нему, содержат конфиденциальную информацию. Настоящим уведомляем 
Вас о том, что если это сообщение не предназначено Вам, использование, 
копирование, распространение информации, содержащейся в настоящем сообщении, а 
также осуществление любых действий на основе этой информации, строго запрещено. 
Если Вы получили это сообщение по ошибке, пожалуйста, сообщите об этом 
отправителю по электронной почте и удалите это сообщение.
CONFIDENTIALITY NOTICE: This email and any files attached to it are 
confidential. If you are not the intended recipient you are notified that 
using, copying, distributing or taking any action in reliance on the contents 
of this information is strictly prohibited. If you have received this email in 
error please notify the sender and delete this email.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1] selftests: openvswitch: retry instead of sleep

2024-07-08 Thread Adrian Moreno
There are a couple of places where the test script "sleep"s to wait for
some external condition to be met.

This is error prone, specially in slow systems (identified in CI by
"KSFT_MACHINE_SLOW=yes").

To fix this, add a "ovs_wait" function that tries to execute a command
a few times until it succeeds. The timeout used is set to 5s for
"normal" systems and doubled if a slow CI machine is detected.

This should make the following work:

$ vng --build  \
--config tools/testing/selftests/net/config \
--config kernel/configs/debug.config

$ vng --run . --user root -- "make -C tools/testing/selftests/ \
KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/openvswitch.sh  | 49 ---
 .../selftests/net/openvswitch/ovs-dpctl.py|  1 +
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index bc71dbc18b21..83407b42073a 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -11,6 +11,7 @@ ksft_skip=4
 PAUSE_ON_FAIL=no
 VERBOSE=0
 TRACING=0
+WAIT_TIMEOUT=5
 
 tests="
arp_pingeth-arp: Basic arp ping between 
two NS
@@ -29,6 +30,32 @@ info() {
[ $VERBOSE = 0 ] || echo $*
 }
 
+ovs_wait() {
+   info "waiting $WAIT_TIMEOUT s for: $@"
+
+   "$@"
+   if [[ $? -eq 0 ]]; then
+   info "wait succeeded inmediately"
+   return 0
+   fi
+
+   # A quick re-check helps speed up small races in fast systems.
+   # However, fractional sleeps might not necessarily work.
+   local start=0
+   sleep 0.1 || { sleep 1; start=1; }
+
+   for (( i=start; i/dev/null 2>&1 || 
return 1
+   ovs_wait grep -q "userspace action command" $ovs_dir/s0.out
+   [[ $? -eq 0 ]] || return 1
 
# client -> server samples should only contain the first 14 bytes of 
the packet.
-   grep -E "rate:4294967295,group:1,cookie:c0ffee data:[0-9a-f]{28}$" \
-$ovs_dir/stdout >/dev/null 2>&1 || return 1
-   grep -E "rate:4294967295,group:2,cookie:eeff0c" \
-$ovs_dir/stdout >/dev/null 2>&1 || return 1
+   ovs_wait grep -qE "rate:4294967295,group:1,cookie:c0ffee 
data:[0-9a-f]{28}$" $ovs_dir/stdout
+   [[ $? -eq 0 ]] || return 1
+
+   ovs_wait grep -q "rate:4294967295,group:2,cookie:eeff0c" $ovs_dir/stdout
+   [[ $? -eq 0 ]] || return 1
 
return 0
 }
@@ -711,7 +739,8 @@ test_upcall_interfaces() {
ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \
172.31.110.1/24 -u || return 1
 
-   sleep 1
+   ovs_wait grep -q "listening on upcall packet handler" 
${ovs_dir}/left0.out
+
info "sending arping"
ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
>$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
@@ -811,6 +840,10 @@ shift $(($OPTIND-1))
 IFS="  
 "
 
+if test "X$KSFT_MACHINE_SLOW" == "Xyes"; then
+   WAIT_TIMEOUT=10
+fi
+
 for arg do
# Check first that all requested tests are available before running any
command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not 
found"; usage; }
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1e15b0818074..8a0396bfaf99 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -2520,6 +2520,7 @@ class PsampleEvent(EventSocket):
 marshal_class = psample_msg
 
 def read_samples(self):
+print("listening for psample events", flush=True)
 while True:
 try:
 for msg in self.get():
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v3 13/13] ofp-actions: Load data from fields in sample action.

2024-07-08 Thread Ilya Maximets
On 7/8/24 10:12, Adrián Moreno wrote:
> On Thu, Jul 04, 2024 at 11:25:34AM GMT, Dumitru Ceara wrote:
>> On 7/4/24 09:52, Adrian Moreno wrote:
>>> When sample action gets used as a way of sampling traffic with
>>> controller-generated metadata (i.e: obs_domain_id and obs_point_id),
>>> the controller will have to increase the number of flows to ensure each
>>> part of the pipeline contains the right metadata.
>>>
>>> As an example, if the controller decides to sample stateful traffic, it
>>> could store the computed metadata for each connection in the conntrack
>>> label. However, for established connections, a flow must be created for
>>> each different ct_label value with a sample action that contains a
>>> different hardcoded obs_domain and obs_point id.
>>>
>>> This patch adds a new version of the NXAST_RAW_SAMPLE* action (number 4)
>>> that supports specifying the observation point and domain using an
>>> OpenFlow field reference, so now the controller can express:
>>>
>>>  sample(...
>>> obs_domain_id=NXM_NX_CT_LABEL[0..31],
>>> obs_point_id=NXM_NX_CT_LABEL[32..63]
>>> ...
>>>)
>>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>
>> Hi Adrian,
>>
>> Thanks a lot for working on expanding the sampling support (in the
>> kernel too).
>>
>> I didn't review the patch (or the rest of the series) yet but one thing
>> we should definitely add to the non-RFC version is a way for controllers
>> to detect that this new action version is supported.
>>
>> In other occasions we've used the OVSDB.Datapath.Capabilities column to
>> report that new actions are supported (e.g., "ct_flush" for the
>> extension that allows flushing CT with a generic match).  It might make
>> sense to add another capability there for this new action version.
>>
> 
> Yep. I've sent v1 but it does not contain this since I wanted to do a
> quick respin removing the RFC now that the kernel part landed in
> net-next.
> 
> I'll include it in v2.


I didn't review the set yet, but while it can be argued that support
for the ct flush is partially a datapath feature, the new OF action
is definitely not, so it should not be exposed via datapath features.

Controller can either probe the action by starting and cancelling a
bundle and checking for particular error codes, or we should start
reporting match/action extensions via OFPTFPT_EXPERIMENTER property of
the table features.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.

2024-07-08 Thread David Marchand
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick  wrote:
>
> Previously the OVS support for checksum/TSO offloading didn't work well
> with some network cards that supported VXLAN/Geneve tunnel TSO but not
> outer UDP checksums. Now support for this configuration is improved and
> we no longer need to disable the VXLAN/Geneve TSO flags from intel
> hardware support flags.
>
> The modification to outer UDP offload is still required pending a future
> DPDK release.
>
> Suggested-by: David Marchand 
> Signed-off-by: Mike Pattrick 

Reviewed-by: David Marchand 
(for the record, I tested with E810 PF and VF)


Thanks Mike!

-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/3] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-07-08 Thread David Marchand
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick  wrote:
> diff --git a/lib/netdev.c b/lib/netdev.c
> index f2d921ed6..866dbf3b7 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -69,8 +69,6 @@ COVERAGE_DEFINE(netdev_received);
>  COVERAGE_DEFINE(netdev_sent);
>  COVERAGE_DEFINE(netdev_add_router);
>  COVERAGE_DEFINE(netdev_get_stats);
> -COVERAGE_DEFINE(netdev_vxlan_tso_drops);
> -COVERAGE_DEFINE(netdev_geneve_tso_drops);
>  COVERAGE_DEFINE(netdev_push_header_drops);
>  COVERAGE_DEFINE(netdev_soft_seg_good);
>  COVERAGE_DEFINE(netdev_soft_seg_drops);
> @@ -910,28 +908,30 @@ netdev_send(struct netdev *netdev, int qid, struct 
> dp_packet_batch *batch,
>  struct dp_packet *packet;
>  int error;
>
> -if (userspace_tso_enabled() &&
> -!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
> -DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> -if (dp_packet_hwol_is_tso(packet)) {
> -if (dp_packet_hwol_is_tunnel_vxlan(packet)
> -&& !(netdev_flags & NETDEV_TX_VXLAN_TNL_TSO)) {
> -VLOG_WARN_RL(&rl, "%s: No VXLAN TSO support",
> - netdev_get_name(netdev));
> -COVERAGE_INC(netdev_vxlan_tso_drops);
> -dp_packet_delete_batch(batch, true);
> -return false;
> +if (userspace_tso_enabled()) {
> +if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) {
> +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> +if (dp_packet_hwol_is_tso(packet)) {
> +return netdev_send_tso(netdev, qid, batch, 
> concurrent_txq);
>  }
> -
> -if (dp_packet_hwol_is_tunnel_geneve(packet)
> -&& !(netdev_flags & NETDEV_TX_GENEVE_TNL_TSO)) {
> -VLOG_WARN_RL(&rl, "%s: No GENEVE TSO support",
> - netdev_get_name(netdev));
> -COVERAGE_INC(netdev_geneve_tso_drops);
> -dp_packet_delete_batch(batch, true);
> -return false;
> +}
> +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) {

Just checking this feature is not enough at this point.

Imagine a driver that has TSO support, but neither outer udp checksum
nor any tunnel tso support (the one I have in mind is net/ixgbe).
IIUC, a batch of (tunneled) tso packets with no request for outer udp
csum would not be segmented in sw (and it would probably be dropped by
hw).

I think this block should be moved after checking the presence of
*tnl_tso flags.


> +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> +if (dp_packet_hwol_is_tso(packet) &&
> +(dp_packet_hwol_is_tunnel_vxlan(packet) ||
> + dp_packet_hwol_is_tunnel_geneve(packet)) &&
> +dp_packet_hwol_is_outer_udp_cksum(packet)) {
> +return netdev_send_tso(netdev, qid, batch, 
> concurrent_txq);
> +}
> +}
> +} else if (!(netdev_flags & (NETDEV_TX_VXLAN_TNL_TSO |
> + NETDEV_TX_GENEVE_TNL_TSO))) {
> +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> +if (dp_packet_hwol_is_tso(packet) &&
> +(dp_packet_hwol_is_tunnel_vxlan(packet) ||
> + dp_packet_hwol_is_tunnel_geneve(packet))) {

Note: you can imagine a netdev supporting geneve tso, but not vxlan tso.
A batch containing only tso requests through a geneve tunnel would end
up being segmented in sw.

DPDK drivers either support both or none. And only those drivers
expose these netdev features in OVS.
So I don't think it is a real problem and I am ok with this hunk.


> +return netdev_send_tso(netdev, qid, batch, 
> concurrent_txq);
>  }
> -return netdev_send_tso(netdev, qid, batch, concurrent_txq);
>  }
>  }
>  }


A final note, I suspect all those checks negatively impact non tso
packets processing when OVS tso is enabled.

Would it be feasible to mark that tso (or a tx offload) has been
requested at the "batch" level?
This is more an optimisation... maybe in the future?



Overall the patch lgtm.
I'll be off soon, so with the issue I mentionned for net/ixgbe fixed,
feel free to add my review tag.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 1/3] userspace: Adjust segment size on encapsulation.

2024-07-08 Thread David Marchand
On Fri, Jul 5, 2024 at 10:44 PM Mike Pattrick  wrote:
>
> When prepending a tunnel header to a packet marked for segmentation, we
> need to adjust the segment size. Failure to do so can result in packets
> that are larger than the intended MTU post segmentation.
>
> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> Signed-off-by: Mike Pattrick 

Reviewed-by: David Marchand 


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 8/8] northd: Add ACL Sampling.

2024-07-08 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 96 characters long (recommended limit is 79)
#177 FILE: northd/northd.c:215:
 * ++--+ X |   
LB_L2_AFF_BACKEND_IP6   |

WARNING: Line is 96 characters long (recommended limit is 79)
#178 FILE: northd/northd.c:216:
 * | R1 | ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |(>= 
IN_LB_AFF_CHECK && |

WARNING: Line is 96 characters long (recommended limit is 79)
#179 FILE: northd/northd.c:217:
 * ++--+ E | <= 
IN_LB_AFF_LEARN)   |

WARNING: Line is 96 characters long (recommended limit is 79)
#183 FILE: northd/northd.c:220:
 * | R3 | OBS_POINT_ID_NEW |   |
   |

WARNING: Line is 96 characters long (recommended limit is 79)
#184 FILE: northd/northd.c:221:
 * ||   (>= ACL_EVAL* && <= ACL_ACTION*)   |   |
   |

WARNING: Comment with 'xxx' marker
#577 FILE: northd/northd.c:6840:
/* XXX: ACLs with action "pass" do not support sampling. */

WARNING: Line is 560 characters long (recommended limit is 79)
#1734 FILE: utilities/ovn-nbctl.8.xml:402:
[--type={switch | 
port-group}] [--log] 
[--meter=meter] 
[--severity=severity] 
[--name=name] [--label=label] 
[--sample-new=sample] 
[--sample-est=sample] [--may-exist] 
[--apply-after-lb] [--tier] acl-add 
entity direction priority match 
verdict

Lines checked: 1836, Warnings: 7, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 7/8] northd: Override NB_Global drop sampling id with Sampling_App config.

2024-07-08 Thread Dumitru Ceara
Signed-off-by: Dumitru Ceara 
---
 NEWS  |  3 +++
 northd/debug.c| 12 +++-
 northd/debug.h|  3 ++-
 northd/en-global-config.c | 31 +++
 northd/inc-proc-northd.c  |  1 +
 tests/ovn-northd.at   | 27 ++-
 6 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 3e392ff08b..fcf182bc02 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,9 @@ Post v24.03.0
 ability to disable "VXLAN mode" to extend available tunnel IDs space for
 datapaths from 4095 to 16711680.  For more details see man ovn-nb(5) for
 mentioned option.
+  - The NB_Global.debug_drop_domain_id configured value is now overridden by
+the ID associated with the Sampling_App record created for drop sampling
+(Sampling_App.name configured to be "drop-sampling").
 
 OVN v24.03.0 - 01 Mar 2024
 --
diff --git a/northd/debug.c b/northd/debug.c
index 59da5d4f66..457993b7cf 100644
--- a/northd/debug.c
+++ b/northd/debug.c
@@ -3,6 +3,7 @@
 #include 
 
 #include "debug.h"
+#include "en-sampling-app.h"
 
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
@@ -26,16 +27,17 @@ debug_enabled(void)
 }
 
 void
-init_debug_config(const struct nbrec_nb_global *nb)
+init_debug_config(const struct nbrec_nb_global *nb,
+  uint8_t drop_domain_id_override)
 {
-
 const struct smap *options = &nb->options;
 uint32_t collector_set_id = smap_get_uint(options,
   "debug_drop_collector_set",
   0);
-uint32_t observation_domain_id = smap_get_uint(options,
-   "debug_drop_domain_id",
-   0);
+uint32_t observation_domain_id =
+drop_domain_id_override != SAMPLING_APP_ID_NONE
+? drop_domain_id_override
+: smap_get_uint(options, "debug_drop_domain_id", 0);
 
 if (collector_set_id != config.collector_set_id ||
 observation_domain_id != config.observation_domain_id ||
diff --git a/northd/debug.h b/northd/debug.h
index c1a5e5aadb..a0b535253a 100644
--- a/northd/debug.h
+++ b/northd/debug.h
@@ -21,7 +21,8 @@
 #include "lib/ovn-nb-idl.h"
 #include "openvswitch/dynamic-string.h"
 
-void init_debug_config(const struct nbrec_nb_global *nb);
+void init_debug_config(const struct nbrec_nb_global *nb,
+   uint8_t drop_domain_id_override);
 void destroy_debug_config(void);
 
 const char *debug_drop_action(void);
diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index 5b71ede1f2..c683c8fae5 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -24,6 +24,7 @@
 /* OVN includes */
 #include "debug.h"
 #include "en-global-config.h"
+#include "en-sampling-app.h"
 #include "include/ovn/features.h"
 #include "ipam.h"
 #include "lib/ovn-nb-idl.h"
@@ -42,8 +43,10 @@ static bool chassis_features_changed(const struct 
chassis_features *,
 static bool config_out_of_sync(const struct smap *config,
const struct smap *saved_config,
const char *key, bool must_be_present);
-static bool check_nb_options_out_of_sync(const struct nbrec_nb_global *,
- struct ed_type_global_config *);
+static bool check_nb_options_out_of_sync(
+const struct nbrec_nb_global *,
+struct ed_type_global_config *,
+const struct sampling_app_table *);
 static void update_sb_config_options_to_sbrec(struct ed_type_global_config *,
   const struct sbrec_sb_global *);
 
@@ -72,6 +75,9 @@ en_global_config_run(struct engine_node *node , void *data)
 EN_OVSDB_GET(engine_get_input("SB_sb_global", node));
 const struct sbrec_chassis_table *sbrec_chassis_table =
 EN_OVSDB_GET(engine_get_input("SB_chassis", node));
+const struct ed_type_sampling_app_data *sampling_app_data =
+engine_get_input_data("sampling_app", node);
+const struct sampling_app_table *sampling_apps = &sampling_app_data->apps;
 
 struct ed_type_global_config *config_data = data;
 
@@ -145,7 +151,8 @@ en_global_config_run(struct engine_node *node , void *data)
 build_chassis_features(sbrec_chassis_table, &config_data->features);
 }
 
-init_debug_config(nb);
+init_debug_config(nb, sampling_app_get_id(sampling_apps,
+  SAMPLING_APP_DROP_DEBUG));
 
 const struct sbrec_sb_global *sb =
 sbrec_sb_global_table_first(sb_global_table);
@@ -186,6 +193,9 @@ global_config_nb_global_handler(struct engine_node *node, 
void *data)
 EN_OVSDB_GET(engine_get_input("NB_nb_global", node));
 const struct sbrec_sb_global_table *sb_global_table =
 EN_OVSDB_GET(engine_get_input("SB_sb_global", node));
+const struct ed_type_sampl

[ovs-dev] [PATCH ovn 8/8] northd: Add ACL Sampling.

2024-07-08 Thread Dumitru Ceara
From: Adrian Moreno 

Introduce a new table called Sample where per-flow IPFIX configuration
can be specified.
Also, reference rows from such table from the ACL table to enable the
configuration of ACL sampling. If enabled, northd will add a sample
action to each ACL related logical flow.

Packets that hit stateful ACLs are sampled in different ways depending
whether they are initiating a new session or are just forwarded on an
existing (already allowed) session.  Two new columns ("sample_new" and
"sample_est") are added to the ACL table to allow for potentially
different sampling rates for the two cases.

Note: If an ACL has both sampling enabled and a label associated to it
then the label value overrides the observation point ID defined in the
sample configuration.  This is a side effect of the implementation
(observation point IDs are stored in conntrack in the same part of the
ct_label where ACL labels are also stored).  The two features
(sampling and ACL labels) serve however similar purposes so it's not
expected that they're both enabled together.

When sampling is enabled on an ACL additional logical flows are created
for that ACL (one for stateless ACLs and 3 for stateful ACLs) in the ACL
action stage of the logical pipeline.  These additional flows match on a
combination of conntrack state values and observation point id values
(either against a logical register or against the stored ct_label state)
in order to determine whether the packets hitting the ACLs must be
sampled or not.  This comes with a slight increase in the number of
logical flows and in the number of OpenFlow rules.  The number of
additional flows _does not_ depend on the original ACL match or action.

New --sample-new and --sample-est optional arguments are added to the
'ovn-nbctl acl-add' command to allow configuring these new types of
sampling for ACLs.

An example workflow of configuring ACL samples is:
  # Create Sampling_App mappings for ACL traffic types:
  ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
id="42"
  ovn-nbctl create sampling_app name="acl-est-traffic-sampling" \
id="43"
  # Create two sample collectors, one that samples all packets (c1)
  # and another one that samples with a probability of 10% (c2):
  c1=$(ovn-nbctl create Sample_Collector name=c1 \
   probability=65535 set_id=1)
  c2=$(ovn-nbctl create Sample_Collector name=c2 \
   probability=6553 set_id=2)
  # Create two sample configurations (for new and for established
  # traffic):
  s1=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4301)
  s2=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4302)
  # Create an ingress ACL to allow IP traffic:
  ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
from-lport 1 "ip" allow-related

The config above will generate IPFIX samples with:
- observation domain id set to 42 (Sampling_App
  "acl-new-traffic-sampling" config) and observation point id
  set to 4301 (Sample s1) for packets that create a new
  connection
- observation domain id set to 43 (Sampling_app
  "acl-est-traffic-sampling" config) and observation point id
  set to 4302 (Sample s2) for packets that are part of an already
  existing connection

Reported-at: https://issues.redhat.com/browse/FDP-305
Signed-off-by: Adrian Moreno 
Co-authored-by: Dumitru Ceara 
Signed-off-by: Dumitru Ceara 
---
 NEWS   |   3 +
 include/ovn/logical-fields.h   |   2 +
 lib/logical-fields.c   |   6 +
 northd/northd.c| 519 +++--
 northd/ovn-northd.8.xml|  26 ++
 ovn-nb.ovsschema   |  44 ++-
 ovn-nb.xml |  56 +++
 tests/atlocal.in   |   6 +
 tests/ovn-macros.at|   4 +
 tests/ovn-nbctl.at |  20 +
 tests/ovn-northd.at| 240 ++--
 tests/ovn.at   |   3 +
 tests/system-common-macros.at  |  11 +
 tests/system-ovn.at| 149 +++
 utilities/containers/fedora/Dockerfile |   1 +
 utilities/containers/ubuntu/Dockerfile |   1 +
 utilities/ovn-nbctl.8.xml  |   8 +-
 utilities/ovn-nbctl.c  |  43 +-
 18 files changed, 1079 insertions(+), 63 deletions(-)

diff --git a/NEWS b/NEWS
index fcf182bc02..7899c623f2 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,9 @@ Post v24.03.0
   - The NB_Global.debug_drop_domain_id configured value is now overridden by
 the ID associated with the Sampling_App record created for drop sampling
 (Sampling_App.name configured to be "drop-sampling").
+  - Add support for ACL sampling through the new Sample_Collector and Sample
+tables.  Sampling is supported for both traffic that creates new
+connections and for traffic that is part of an existing connection.
 
 OVN v24.03.0 - 01 Mar 2024
 -

[ovs-dev] [PATCH ovn 5/8] northd: Commit from-lport ACL label (and state) when LBs are used.

2024-07-08 Thread Dumitru Ceara
Quoting the ACL label section in the ovn.nb.5 man page:

  Associates an identifier with the ACL.
  The same value will be written to corresponding connection
  tracker entry. The value should be a valid 32-bit unsigned integer.
  This value can help in debugging from connection tracker side.
  For example, through this "label" we can backtrack to the ACL rule
  which is causing a "leaked" connection. Connection tracker entries are
  created only for allowed connections so the label is valid only
  for allow and allow-related actions.

The above states that the ACL label must always be stored in the
connection tracker entry label for allow-related ACLs (regardless of the
direction of the ACL).  However, since 74d82e296f80 ("northd: Support
the option to apply from-lport ACLs after load balancer."), the
connection is not re-committed in the ls_in_stateful stage (because it
already was committed as part of the load balancer DNAT).

Moreover, by not re-committing the connection after LB we also risk not
re-setting any potential ct_mark.blocked value the connection might
have.

This patch addresses the issue by always committing packets matched by
allow-related (or stateful in general) ACLs even if they were also
committed as part of the load balancing stage.

There's potentially a slight overhead when doing this (an additional
commit call into conntrack but _no_ recirculation).  This is however
acceptable as it is required for a correct packet processing pipeline
implementation.  Even without this fix, packets creating new
connections tha hit "--apply-after-lb" ACLs trigger a re-commit (for
storing the label and ct_mark.blocked).

A new test is added to ensure we don't break this functionality in the
future.

CC: Numan Siddique 
Fixes: 74d82e296f80 ("northd: Support the option to apply from-lport ACLs after 
load balancer.")
Signed-off-by: Dumitru Ceara 
---
 northd/northd.c |   8 +---
 tests/ovn-northd.at | 110 +++-
 tests/ovn.at|   4 +-
 3 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index fcf8f277ac..901b9e9cd1 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -7494,8 +7494,7 @@ build_lb_affinity_ls_flows(struct lflow_table *lflows,
 ipv6 ? REG_LB_L2_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
 
 /* Prepare common part of affinity LB and affinity learn action. */
-ds_put_format(&aff_action, REGBIT_CONNTRACK_COMMIT" = 0; %s = %s; ",
-  reg_vip, lb_vip->vip_str);
+ds_put_format(&aff_action, "%s = %s; ", reg_vip, lb_vip->vip_str);
 ds_put_cstr(&aff_action_learn, "commit_lb_aff(vip = \"");
 
 if (lb_vip->port_str) {
@@ -7635,11 +7634,6 @@ build_lb_rules(struct lflow_table *lflows, struct 
ovn_lb_datapaths *lb_dps,
 ds_clear(action);
 ds_clear(match);
 
-/* Make sure that we clear the REGBIT_CONNTRACK_COMMIT flag.  Otherwise
- * the load balanced packet will be committed again in
- * S_SWITCH_IN_STATEFUL. */
-ds_put_format(action, REGBIT_CONNTRACK_COMMIT" = 0; ");
-
 /* New connections in Ingress table. */
 const char *meter = NULL;
 bool reject = build_lb_vip_actions(lb, lb_vip, lb_vip_nb, action,
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 6802fac380..5acb13c519 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1413,7 +1413,7 @@ check ovn-nbctl --wait=sb ls-lb-add sw0 lb1
 AT_CAPTURE_FILE([sbflows])
 OVS_WAIT_FOR_OUTPUT(
   [ovn-sbctl dump-flows sw0 | tee sbflows | grep 'priority=120.*backends' | 
ovn_strip_lflows], 0, [dnl
-  table=??(ls_in_lb   ), priority=120  , match=(ct.new && ip4.dst == 
10.0.0.10 && tcp.dst == 80), action=(reg0[[1]] = 0; 
ct_lb_mark(backends=10.0.0.3:80,20.0.0.3:80);)
+  table=??(ls_in_lb   ), priority=120  , match=(ct.new && ip4.dst == 
10.0.0.10 && tcp.dst == 80), 
action=(ct_lb_mark(backends=10.0.0.3:80,20.0.0.3:80);)
 ])
 
 # disabled LSPs should not be a backend of Load Balancer
@@ -1422,7 +1422,7 @@ check ovn-nbctl lsp-set-enabled sw0-p1 disabled
 AT_CAPTURE_FILE([sbflows])
 OVS_WAIT_FOR_OUTPUT(
   [ovn-sbctl dump-flows sw0 | tee sbflows | grep 'priority=120.*backends' | 
ovn_strip_lflows], 0, [dnl
-  table=??(ls_in_lb   ), priority=120  , match=(ct.new && ip4.dst == 
10.0.0.10 && tcp.dst == 80), action=(reg0[[1]] = 0; 
ct_lb_mark(backends=20.0.0.3:80);)
+  table=??(ls_in_lb   ), priority=120  , match=(ct.new && ip4.dst == 
10.0.0.10 && tcp.dst == 80), action=(ct_lb_mark(backends=20.0.0.3:80);)
 ])
 wait_row_count Service_Monitor 1
 
@@ -1431,7 +1431,7 @@ check ovn-nbctl lsp-set-enabled sw0-p1 enabled
 AT_CAPTURE_FILE([sbflows])
 OVS_WAIT_FOR_OUTPUT(
   [ovn-sbctl dump-flows sw0 | tee sbflows | grep 'priority=120.*backends' | 
ovn_strip_lflows], 0, [dnl
-  table=??(ls_in_lb   ), priority=120  , match=(ct.new && ip4.dst == 
10.0.0.10 && tcp.dst == 80), action=(reg0[[1]] = 0; 
ct_lb_mark(backend

[ovs-dev] [PATCH ovn 3/8] northd: Assume all chassis support the "ovn-ct-lb-related" feature.

2024-07-08 Thread Dumitru Ceara
This feature is supported in the last two LTS releases and the correct
upgrade procedure mandates that we don't jump across LTS releases.  It's
safe to remove the check in northd.

Signed-off-by: Dumitru Ceara 
---
 northd/en-global-config.c |  14 
 northd/en-global-config.h |   1 -
 northd/inc-proc-northd.c  |   2 -
 northd/northd.c   |  38 ---
 tests/ovn-northd.at   | 130 --
 5 files changed, 14 insertions(+), 171 deletions(-)

diff --git a/northd/en-global-config.c b/northd/en-global-config.c
index 183b535dee..5b71ede1f2 100644
--- a/northd/en-global-config.c
+++ b/northd/en-global-config.c
@@ -367,7 +367,6 @@ northd_enable_all_features(struct ed_type_global_config 
*data)
 {
 data->features = (struct chassis_features) {
 .mac_binding_timestamp = true,
-.ct_lb_related = true,
 .fdb_timestamp = true,
 .ls_dpg_column = true,
 .ct_commit_nat_v2 = true,
@@ -398,15 +397,6 @@ build_chassis_features(const struct sbrec_chassis_table 
*sbrec_chassis_table,
 chassis_features->mac_binding_timestamp = false;
 }
 
-bool ct_lb_related =
-smap_get_bool(&chassis->other_config,
-  OVN_FEATURE_CT_LB_RELATED,
-  false);
-if (!ct_lb_related &&
-chassis_features->ct_lb_related) {
-chassis_features->ct_lb_related = false;
-}
-
 bool fdb_timestamp =
 smap_get_bool(&chassis->other_config,
   OVN_FEATURE_FDB_TIMESTAMP,
@@ -568,10 +558,6 @@ chassis_features_changed(const struct chassis_features 
*present,
 return true;
 }
 
-if (present->ct_lb_related != updated->ct_lb_related) {
-return true;
-}
-
 if (present->fdb_timestamp != updated->fdb_timestamp) {
 return true;
 }
diff --git a/northd/en-global-config.h b/northd/en-global-config.h
index c3cc881371..8a1c35fc8f 100644
--- a/northd/en-global-config.h
+++ b/northd/en-global-config.h
@@ -15,7 +15,6 @@ struct sbrec_sb_global;
 
 struct chassis_features {
 bool mac_binding_timestamp;
-bool ct_lb_related;
 bool fdb_timestamp;
 bool ls_dpg_column;
 bool ct_commit_nat_v2;
diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
index 180b2be3e9..522236ad2a 100644
--- a/northd/inc-proc-northd.c
+++ b/northd/inc-proc-northd.c
@@ -463,8 +463,6 @@ chassis_features_list(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 struct chassis_features *features = features_;
 struct ds ds = DS_EMPTY_INITIALIZER;
 
-ds_put_format(&ds, "ct_lb_related: %s\n",
-  features->ct_lb_related ? "true" : "false");
 ds_put_format(&ds, "mac_binding_timestamp: %s\n",
   features->mac_binding_timestamp ? "true" : "false");
 unixctl_command_reply(conn, ds_cstr(&ds));
diff --git a/northd/northd.c b/northd/northd.c
index 9cc6e6c14f..fcf8f277ac 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -6839,7 +6839,6 @@ build_acl_log_related_flows(const struct ovn_datapath *od,
 static void
 build_acls(const struct ls_stateful_record *ls_stateful_rec,
const struct ovn_datapath *od,
-   const struct chassis_features *features,
struct lflow_table *lflows,
const struct ls_port_group_table *ls_port_groups,
const struct shash *meter_groups,
@@ -6982,15 +6981,10 @@ build_acls(const struct ls_stateful_record 
*ls_stateful_rec,
  * related traffic such as an ICMP Port Unreachable through
  * that's generated from a non-listening UDP port.  */
 const char *ct_in_acl_action =
-features->ct_lb_related
-? REGBIT_ACL_HINT_ALLOW_REL" = 1; "
-  REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit_nat;"
-: REGBIT_ACL_HINT_ALLOW_REL" = 1; "
-  REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
+REGBIT_ACL_HINT_ALLOW_REL" = 1; "
+REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit_nat;";
 const char *ct_out_acl_action =
-features->ct_lb_related
-? REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit_nat;"
-: REGBIT_ACL_VERDICT_ALLOW" = 1; next;";
+REGBIT_ACL_VERDICT_ALLOW" = 1; ct_commit_nat;";
 ds_clear(&match);
 ds_put_format(&match, "!ct.est && ct.rel && !ct.new%s "
   "&& ct_mark.blocked == 0",
@@ -15177,7 +15171,7 @@ build_lrouter_nat_defrag_and_lb(
  * a dynamically negotiated FTP data channel), but will allow
  * related traffic such as an ICMP Port Unreachable through
  * that's generated from a non-listening UDP port.  */
-if (lr_stateful_rec->has_lb_vip && features->ct_lb_related) {
+if (lr_stateful_rec->has_lb_vip) {
 ds_clear(match);
 
 ds_put_cstr(match, "ct.rel && !ct.est && !ct.new");
@@ -15197,18 +15191,16 @@ build_lrouter_nat_defrag_and_lb(
 ds_truncate(match, matc

[ovs-dev] [PATCH ovn 6/8] northd: Add Sampling_App table.

2024-07-08 Thread Dumitru Ceara
This will represent a unified place to store IPFIX observation domain ID
configurations for sampling applications (currently only drop sampling
is supported as application but following commits will add more).

Signed-off-by: Dumitru Ceara 
---
 northd/automake.mk   |   2 +
 northd/en-lflow.c|   5 ++
 northd/en-sampling-app.c | 120 +++
 northd/en-sampling-app.h |  51 +
 northd/inc-proc-northd.c |  10 +++-
 northd/northd.h  |   1 +
 ovn-nb.ovsschema |  21 ++-
 ovn-nb.xml   |  17 ++
 tests/ovn-northd.at  |  17 ++
 9 files changed, 241 insertions(+), 3 deletions(-)
 create mode 100644 northd/en-sampling-app.c
 create mode 100644 northd/en-sampling-app.h

diff --git a/northd/automake.mk b/northd/automake.mk
index d491973a8b..6566ad2999 100644
--- a/northd/automake.mk
+++ b/northd/automake.mk
@@ -32,6 +32,8 @@ northd_ovn_northd_SOURCES = \
northd/en-lr-stateful.h \
northd/en-ls-stateful.c \
northd/en-ls-stateful.h \
+   northd/en-sampling-app.c \
+   northd/en-sampling-app.h \
northd/inc-proc-northd.c \
northd/inc-proc-northd.h \
northd/ipam.c \
diff --git a/northd/en-lflow.c b/northd/en-lflow.c
index c4b927fb8c..eb91f2a651 100644
--- a/northd/en-lflow.c
+++ b/northd/en-lflow.c
@@ -25,6 +25,7 @@
 #include "en-ls-stateful.h"
 #include "en-northd.h"
 #include "en-meters.h"
+#include "en-sampling-app.h"
 #include "lflow-mgr.h"
 
 #include "lib/inc-proc-eng.h"
@@ -86,6 +87,10 @@ lflow_get_input_data(struct engine_node *node,
 lflow_input->ovn_internal_version_changed =
 global_config->ovn_internal_version_changed;
 lflow_input->svc_monitor_mac = global_config->svc_monitor_mac;
+
+struct ed_type_sampling_app_data *sampling_app_data =
+engine_get_input_data("sampling_app", node);
+lflow_input->sampling_apps = &sampling_app_data->apps;
 }
 
 void en_lflow_run(struct engine_node *node, void *data)
diff --git a/northd/en-sampling-app.c b/northd/en-sampling-app.c
new file mode 100644
index 00..8d2d45a90f
--- /dev/null
+++ b/northd/en-sampling-app.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright (c) 2024, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "openvswitch/vlog.h"
+
+#include "en-sampling-app.h"
+
+VLOG_DEFINE_THIS_MODULE(en_sampling_app);
+
+/* Static function declarations. */
+static void sampling_app_table_add(struct sampling_app_table *,
+   const struct nbrec_sampling_app *);
+static uint8_t sampling_app_table_get_id(const struct sampling_app_table *,
+ enum sampling_app_type);
+static void sampling_app_table_reset(struct sampling_app_table *);
+static enum sampling_app_type sampling_app_get_by_name(const char *app_name);
+
+void *
+en_sampling_app_init(struct engine_node *node OVS_UNUSED,
+ struct engine_arg *arg OVS_UNUSED)
+{
+struct ed_type_sampling_app_data *data = xzalloc(sizeof *data);
+sampling_app_table_reset(&data->apps);
+return data;
+}
+
+void
+en_sampling_app_cleanup(void *data OVS_UNUSED)
+{
+}
+
+void
+en_sampling_app_run(struct engine_node *node, void *data_)
+{
+const struct nbrec_sampling_app_table *nb_sampling_app_table =
+EN_OVSDB_GET(engine_get_input("NB_sampling_app", node));
+struct ed_type_sampling_app_data *data = data_;
+
+sampling_app_table_reset(&data->apps);
+
+const struct nbrec_sampling_app *sa;
+NBREC_SAMPLING_APP_TABLE_FOR_EACH (sa, nb_sampling_app_table) {
+sampling_app_table_add(&data->apps, sa);
+}
+
+engine_set_node_state(node, EN_UPDATED);
+}
+
+uint8_t
+sampling_app_get_id(const struct sampling_app_table *app_table,
+enum sampling_app_type app_type)
+{
+return sampling_app_table_get_id(app_table, app_type);
+}
+
+/* Static functions. */
+static
+void sampling_app_table_add(struct sampling_app_table *table,
+const struct nbrec_sampling_app *sa)
+{
+enum sampling_app_type app_type = sampling_app_get_by_name(sa->name);
+
+if (app_type == SAMPLING_APP_MAX) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_WARN_RL(&rl, "Unexpected Sampling_App name: %s", sa->name);
+return;
+}
+table->app_ids[app_type] = sa->id;
+}
+
+static
+uint8_t sampling_app_table_get_id(const struct

[ovs-dev] [PATCH ovn 4/8] tests: Fix unreliable "ACL and committing to conntrack" system test.

2024-07-08 Thread Dumitru Ceara
For the following logical topology:
VIF1 -- LS1 (stateful ACL) -- LR (no NAT) -- LS2 (stateful ACL) -- VIF2

The test was trying to determine whether a commit happened in the egress
pipeline of LS1 when forwarding packets through the logical patch port
towards LR.  There is unfortunately no reliable way of doing that by
just checking the datapath flows.

Instead, add a test that uses ovn-trace to ensure that the commit
doesn't happen.

This change is required because a follow up fix will add a missing
(re-)commit to the ingress pipeline of LS1 which was causing the
original test to incorrectly fail.

CC: Xavier Simonart 
Fixes: d17ece7f3706 ("northd: prevents sending packet to conntrack for router 
ports")
Signed-off-by: Dumitru Ceara 
---
 tests/ovn-northd.at | 63 +
 tests/system-ovn.at |  5 
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index e345e6f591..6802fac380 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -10211,6 +10211,69 @@ acl_test to-lport "" pg ls_out_acl_eval
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([ACL - ct state clear on router ports])
+AT_KEYWORDS([acl])
+
+ovn_start
+
+dnl Topology: vm1 -- s1 -- r1  -- s2 -- vm2
+dnl - LBs applied on s1 and s2 (all ACLs become stateful)
+dnl - allow ACLs on s1 and s2
+
+check ovn-nbctl  \
+  -- lr-add r1   \
+  -- lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24 \
+  -- lrp-add r1 r1_s2 00:de:ad:fe:00:02 173.0.2.1/24 \
+ \
+  -- ls-add s1   \
+  -- lsp-add s1 s1_r1\
+  -- lsp-set-type s1_r1 router   \
+  -- lsp-set-addresses s1_r1 router  \
+  -- lsp-set-options s1_r1 router-port=r1_s1 \
+ \
+  -- ls-add s2   \
+  -- lsp-add s2 s2_r1\
+  -- lsp-set-type s2_r1 router   \
+  -- lsp-set-addresses s2_r1 router  \
+  -- lsp-set-options s2_r1 router-port=r1_s2 \
+ \
+  -- lsp-add s1 vm1  \
+  -- lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2" \
+ \
+  -- lsp-add s2 vm2  \
+  -- lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2" \
+ \
+  -- lb-add lb1 30.0.0.1 173.0.2.2   \
+  -- lb-add lb2 173.0.2.250 173.0.1.3\
+  -- ls-lb-add s1 lb1\
+  -- ls-lb-add s2 lb2\
+ \
+  -- acl-add s1 from-lport 1001 "ip" allow   \
+  -- acl-add s1 to-lport 1002 "ip" allow \
+  -- acl-add s2 from-lport 1003 "ip" allow   \
+  -- acl-add s2 to-lport 1004 "ip" allow
+
+check ovn-nbctl --wait=sb sync
+
+dnl Simulate a packet going from vm1 -> router -> vm2.
+dnl It should not commit anything in the egress pipeline of S1 or in the
+dnl ingress pipeline of S2.
+flow="inport == \"vm1\" && eth.src == 00:de:ad:01:00:01 && eth.dst == 
00:de:ad:fe:00:01 && ip4.src == 173.0.1.2 && ip4.dst == 30.0.0.1 && ip.ttl==64"
+
+dnl Check that we only commit once for ACLs, in the egress ACL pipeline
+dnl (in S2, towards vm2).  The original problem this test is trying to
+dnl cover was that ct_state wasn't cleared when traversing from s1 -> r1
+dnl which caused two additional commits to happen:
+dnl - in the egress pipeline of S1, when sending the packet out on s1_r1
+dnl - in the ingress pipeline of S2, when processing the packet on s2_r1
+AT_CHECK_UNQUOTED([ovn_trace --ct new --ct new --ct new s1 "$flow" | grep -e 
ls_in_stateful -e ls_out_stateful -A 2 | grep commit], [0], [dnl
+ct_commit { ct_mark.blocked = 0; };
+])
+
+AT_CLEANUP
+])
+
 AT_SETUP([Localnet ports on LS with LB])
 ovn_start
 # In the past, traffic arriving on localnet ports has skipped conntrack.
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index c24ede7c50..ddb3d14e92 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -10632,11 +10632,6 @@ 
icmp,orig=(src=173.0.1.2,dst=173.0.2.2,id=,type=8,code=0),reply=(src=17
 
icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=,type=8,code=0),reply=(src=173.0.2.2,dst=173.0.1.2,id=,type=0,code=0),zone=,mark=2
 ])
 
-# Now check for multiple ct_commits
-ovs-appctl dpctl/dump-flows > dp_flows
-zone_id=$(ovn-appctl -t ovn-controller ct-zone-list | grep vm1 | cut -d ' ' 
-f2)
-AT_CHECK([test 1 = `cat dp_flows | grep "commit,zone=$zone_id" | wc -l`])
-
 

[ovs-dev] [PATCH ovn 2/8] northd: Assume all chassis support the "ct-no-masked-label" feature.

2024-07-08 Thread Dumitru Ceara
This feature is supported in the last two LTS releases and the correct
upgrade procedure mandates that we don't jump across LTS releases.  It's
safe to remove the check in northd.

Also remove logical field definitions for ct_label.s that cannot
appear anymore in the southbound database.

Signed-off-by: Dumitru Ceara 
---
 controller/lflow.c  |  39 ++--
 controller/lflow.h  |   1 -
 controller/ovn-controller.c |  22 -
 lib/logical-fields.c|  22 -
 northd/en-global-config.c   |  23 -
 northd/en-global-config.h   |   1 -
 northd/inc-proc-northd.c|   2 -
 northd/northd.c | 190 
 ovn-sb.xml  |  19 
 tests/ovn-controller.at |   8 +-
 tests/ovn-northd.at | 179 +
 tests/ovn.at|   9 +-
 12 files changed, 77 insertions(+), 438 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index b4c379044f..9f6564787f 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -100,7 +100,6 @@ consider_logical_flow(const struct sbrec_logical_flow 
*lflow,
 static void
 consider_lb_hairpin_flows(const struct ovn_controller_lb *lb,
   const struct hmap *local_datapaths,
-  bool use_ct_mark,
   struct ovn_desired_flow_table *flow_table);
 
 static void add_port_sec_flows(const struct shash *binding_lports,
@@ -1631,7 +1630,6 @@ static void
 add_lb_vip_hairpin_flows(const struct ovn_controller_lb *lb,
  struct ovn_lb_vip *lb_vip,
  struct ovn_lb_backend *lb_backend,
- bool use_ct_mark,
  struct ovn_desired_flow_table *flow_table)
 {
 uint64_t stub[1024 / 8];
@@ -1722,29 +1720,13 @@ add_lb_vip_hairpin_flows(const struct ovn_controller_lb 
*lb,
  * - packets must have ip.src == ip.dst at this point.
  * - the destination protocol and port must be of a valid backend that
  *   has the same IP as ip.dst.
- *
- * During upgrades logical flows might still use the old way of storing
- * ct.natted in ct_label.  For backwards compatibility, only use ct_mark
- * if ovn-northd notified ovn-controller to do that.
  */
-if (use_ct_mark) {
-uint32_t lb_ct_mark = OVN_CT_NATTED;
-match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, lb_ct_mark);
+uint32_t lb_ct_mark = OVN_CT_NATTED;
+match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, lb_ct_mark);
 
-ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
-lb->slb->header_.uuid.parts[0], &hairpin_match,
-&ofpacts, &lb->slb->header_.uuid);
-} else {
-match_set_ct_mark_masked(&hairpin_match, 0, 0);
-ovs_u128 lb_ct_label = {
-.u64.lo = OVN_CT_NATTED,
-};
-match_set_ct_label_masked(&hairpin_match, lb_ct_label, lb_ct_label);
-
-ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
-lb->slb->header_.uuid.parts[0], &hairpin_match,
-&ofpacts, &lb->slb->header_.uuid);
-}
+ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
+lb->slb->header_.uuid.parts[0], &hairpin_match,
+&ofpacts, &lb->slb->header_.uuid);
 
 ofpbuf_uninit(&ofpacts);
 }
@@ -1967,7 +1949,6 @@ add_lb_ct_snat_hairpin_flows(const struct 
ovn_controller_lb *lb,
 static void
 consider_lb_hairpin_flows(const struct ovn_controller_lb *lb,
   const struct hmap *local_datapaths,
-  bool use_ct_mark,
   struct ovn_desired_flow_table *flow_table)
 {
 for (size_t i = 0; i < lb->n_vips; i++) {
@@ -1976,8 +1957,7 @@ consider_lb_hairpin_flows(const struct ovn_controller_lb 
*lb,
 for (size_t j = 0; j < lb_vip->n_backends; j++) {
 struct ovn_lb_backend *lb_backend = &lb_vip->backends[j];
 
-add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend,
- use_ct_mark, flow_table);
+add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend, flow_table);
 }
 }
 
@@ -1989,13 +1969,11 @@ consider_lb_hairpin_flows(const struct 
ovn_controller_lb *lb,
 static void
 add_lb_hairpin_flows(const struct hmap *local_lbs,
  const struct hmap *local_datapaths,
- bool use_ct_mark,
  struct ovn_desired_flow_table *flow_table)
 {
 const struct ovn_controller_lb *lb;
 HMAP_FOR_EACH (lb, hmap_node, local_lbs) {
-consider_lb_hairpin_flows(lb, local_datapaths,
-  use_ct_mark, flow_table);
+consider_lb_hairpin_flows(lb, local_datapaths, flow_table);
 }
 }
 
@@ -2155,7 +2133,6 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct 
lflow_ctx_out *l_ctx_out)
l_ctx_out->flow

[ovs-dev] [PATCH ovn 1/8] northd: Assume all chassis support the "port-up-notif" feature.

2024-07-08 Thread Dumitru Ceara
This feature is supported in the last two LTS releases and the correct
upgrade procedure mandates that we don't jump across LTS releases.  It's
safe to remove the check in northd.

Signed-off-by: Dumitru Ceara 
---
 northd/northd.c |  3 +--
 tests/ovn-northd.at | 24 
 2 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 6898daa00d..1b5a7480e4 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -17903,8 +17903,7 @@ handle_port_binding_changes(struct ovsdb_idl_txn 
*ovnsb_txn,
 if (lsp_is_router(op->nbsp)) {
 up = true;
 } else if (sb->chassis) {
-up = smap_get_bool(&sb->chassis->other_config,
-   OVN_FEATURE_PORT_UP_NOTIF, false)
+up = !smap_get_bool(&sb->chassis->other_config, "is-remote", false)
  ? sb->n_up && sb->up[0]
  : true;
 }
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index a389d19886..7dc94e1f56 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -4168,30 +4168,6 @@ AT_CHECK([grep -qE 'duplicate logical.*port p1' 
northd/ovn-northd.log], [0])
 AT_CLEANUP
 ])
 
-OVN_FOR_EACH_NORTHD_NO_HV([
-AT_SETUP([Port_Binding.up backwards compatibility])
-ovn_start
-
-ovn-nbctl ls-add ls1
-ovn-nbctl --wait=sb lsp-add ls1 lsp1
-
-# Simulate the fact that lsp1 had been previously bound on hv1 by an
-# ovn-controller running an older version.
-ovn-sbctl \
---id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="geneve" \
--- --id=@c create chassis name=hv1 encaps=@e \
--- set Port_Binding lsp1 chassis=@c
-
-wait_for_ports_up lsp1
-
-# Simulate the fact that hv1 is aware of Port_Binding.up, ovn-northd
-# should transition the port state to down.
-check ovn-sbctl set chassis hv1 other_config:port-up-notif=true
-wait_row_count nb:Logical_Switch_Port 1 up=false name=lsp1
-
-AT_CLEANUP
-])
-
 OVN_FOR_EACH_NORTHD_NO_HV_PARALLELIZATION([
 AT_SETUP([Load Balancers and lb_force_snat_ip for Gateway Routers])
 ovn_start
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/8] Add ACL Sampling using per-flow IPFIX

2024-07-08 Thread Dumitru Ceara
This series adds support for sampling packets processed by ACLs by using
per-flow IPFIX.  This new feature allows users to configure
(potentially) different sampling options for ACL matched traffic that
creates new connections or that is forwarded on existing connections.

This work is based on Adrian's original RFC:
https://patchwork.ozlabs.org/project/ovn/cover/20221018155936.1394396-1-amore...@redhat.com/

In order for the whole feature to work properly some pre-requisite work
is done:
- patches 1-3: simplify northd code assuming that all controllers are
  aware of features included in the previous LTS release (22.03) - the
  current LTS release is 24.03.
- patch 4: fixes an incorrect test that mistakenly fails when the bug
  fix in patch 5 is applied.
- patch 5: fixes a bug in the way ACLs with labels are processed when
  the switches also have load balancers configured

The feature itself is implemented by the last 3 patches:
- patch 6: adds support for users to configure different types of
  sampling applications (drop debug, acl-new-traffic,
  acl-established-traffic)
- patch 7: combines the already existing drop debug sampling
  configuration with the new sampling application configuration (giving
  priority to the latter)
- patch 8: adds sampling support to ACLs

Adrian Moreno (1):
  northd: Add ACL Sampling.

Dumitru Ceara (7):
  northd: Assume all chassis support the "port-up-notif" feature.
  northd: Assume all chassis support the "ct-no-masked-label" feature.
  northd: Assume all chassis support the "ovn-ct-lb-related" feature.
  tests: Fix unreliable "ACL and committing to conntrack" system test.
  northd: Commit from-lport ACL label (and state) when LBs are used.
  northd: Add Sampling_App table.
  northd: Override NB_Global drop sampling id with Sampling_App config.

 NEWS   |   6 +
 controller/lflow.c |  39 +-
 controller/lflow.h |   1 -
 controller/ovn-controller.c|  22 -
 include/ovn/logical-fields.h   |   2 +
 lib/logical-fields.c   |  28 +-
 northd/automake.mk |   2 +
 northd/debug.c |  12 +-
 northd/debug.h |   3 +-
 northd/en-global-config.c  |  68 +--
 northd/en-global-config.h  |   2 -
 northd/en-lflow.c  |   5 +
 northd/en-sampling-app.c   | 120 
 northd/en-sampling-app.h   |  51 ++
 northd/inc-proc-northd.c   |  15 +-
 northd/northd.c| 750 ++--
 northd/northd.h|   1 +
 northd/ovn-northd.8.xml|  26 +
 ovn-nb.ovsschema   |  63 +-
 ovn-nb.xml |  73 +++
 ovn-sb.xml |  19 -
 tests/atlocal.in   |   6 +
 tests/ovn-controller.at|   8 +-
 tests/ovn-macros.at|   4 +
 tests/ovn-nbctl.at |  20 +
 tests/ovn-northd.at| 774 +
 tests/ovn.at   |  16 +-
 tests/system-common-macros.at  |  11 +
 tests/system-ovn.at| 154 -
 utilities/containers/fedora/Dockerfile |   1 +
 utilities/containers/ubuntu/Dockerfile |   1 +
 utilities/ovn-nbctl.8.xml  |   8 +-
 utilities/ovn-nbctl.c  |  43 +-
 33 files changed, 1606 insertions(+), 748 deletions(-)
 create mode 100644 northd/en-sampling-app.c
 create mode 100644 northd/en-sampling-app.h

-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 6/6] ofproto: Add JSON output for 'dpif/show' command.

2024-07-08 Thread Ilya Maximets
On 7/4/24 16:09, jm...@redhat.com wrote:
> From: Jakob Meng 
> 
> The 'dpif/show' command now supports machine-readable JSON output in
> addition to the plain-text output for humans. An example would be:
> 
>   ovs-appctl --format json dpif/show
> 
> Reported-at: https://bugzilla.redhat.com/1824861
> Signed-off-by: Jakob Meng 
> ---
>  NEWS   |   2 +-
>  ofproto/ofproto-dpif.c | 127 -
>  tests/ofproto-dpif.at  |  40 +
>  3 files changed, 154 insertions(+), 15 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 8c59a4979..4803566f5 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,7 +6,7 @@ Post-v3.3.0
>   * Added new option [-f|--format] to choose the output format, e.g. 
> 'json'
> or 'text' (by default).
>   * Added new option [--pretty] to print JSON output in a readable 
> fashion.
> - * 'list-commands' now supports output in JSON format.
> + * 'dpif/show' and 'list-commands' now support output in JSON format.
> - Python:
>   * Added support for different output formats like 'json' to appctl.py 
> and
> Python's unixctl classes.
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index fcd7cd753..52d9b0c5b 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -28,6 +28,7 @@
>  #include "fail-open.h"
>  #include "guarded-list.h"
>  #include "hmapx.h"
> +#include "json.h"

openvswitch/json.h will suffice.

>  #include "lacp.h"
>  #include "learn.h"
>  #include "mac-learning.h"
> @@ -6519,19 +6520,109 @@ done:
>  return changed;
>  }
>  
> -static void
> -dpif_show_backer(const struct dpif_backer *backer, struct ds *ds)
> +static struct json *
> +dpif_show_backer_json(struct json *backers, const struct dpif_backer *backer)
>  {
> +struct json *json_backer = json_object_create();
> +
> +/* Add datapath as new JSON object using its name as key. */
> +json_object_put(backers, dpif_name(backer->dpif), json_backer);
> +
> +/* Add datapath's stats under "stats" key. */
> +struct json *json_dp_stats = json_object_create();
> +struct dpif_dp_stats dp_stats;
> +
> +dpif_get_dp_stats(backer->dpif, &dp_stats);
> +json_object_put_format(json_dp_stats, "hit", "%"PRIu64, dp_stats.n_hit);
> +json_object_put_format(json_dp_stats, "missed", "%"PRIu64,
> +   dp_stats.n_missed);
> +json_object_put(json_backer, "stats", json_dp_stats);
> +
> +/* Add datapath's bridges under "bridges" key. */
> +struct json *json_dp_bridges = json_object_create();
> +
> +struct shash ofproto_shash = SHASH_INITIALIZER(&ofproto_shash);
> +free(get_ofprotos(&ofproto_shash));
> +
> +struct shash_node *node;
> +SHASH_FOR_EACH (node, &ofproto_shash) {
> +struct ofproto_dpif *ofproto = node->data;
> +
> +if (ofproto->backer != backer) {
> +continue;
> +}
> +
> +/* Add bridge to "bridges" dictionary using its name as key. */
> +struct json *json_ofproto = json_object_create();
> +
> +/* Add bridge ports to the current bridge dictionary. */
> +const struct shash_node *port;
> +SHASH_FOR_EACH (port, &ofproto->up.port_by_name) {
> +/* Add bridge port to a bridge's dict using port name as key. */
> +struct ofport *ofport = port->data;
> +struct json *json_ofproto_port = json_object_create();

Reverse x-mass tree.

> +
> +/* Add OpenFlow port associated with a bridge port. */
> +json_object_put_format(json_ofproto_port, "ofport", "%"PRIu32,
> +   ofport->ofp_port);
> +
> +/* Add bridge port number. */
> +odp_port_t odp_port = ofp_port_to_odp_port(ofproto,
> +   ofport->ofp_port);
> +if (odp_port != ODPP_NONE) {
> +json_object_put_format(json_ofproto_port, "port_no",
> +   "%"PRIu32, odp_port);
> +} else {
> +json_object_put_string(json_ofproto_port, "port_no", "none");
> +}
> +
> +/* Add type of a bridge port. */
> +json_object_put_string(json_ofproto_port, "type",
> +   netdev_get_type(ofport->netdev));
> +
> +/* Add config entries for a bridge port. */
> +
> +struct smap config = SMAP_INITIALIZER(&config);

An empty line.

> +if (!netdev_get_config(ofport->netdev, &config)
> +&& smap_count(&config))
> +{

'{' should be on a previous line.

> +struct json *json_port_config = json_object_create();
> +struct smap_node *cfg_node;
> +
> +SMAP_FOR_EACH (cfg_node, &config) {
> +json_object_put_string(json_port_config, cfg_node->key,
> +   cfg_node->value);
> +}

Re: [ovs-dev] [PATCH v13 5/6] vswitchd: Add JSON output for 'list-commands' command.

2024-07-08 Thread Ilya Maximets
On 7/4/24 16:09, jm...@redhat.com wrote:
> From: Jakob Meng 
> 
> The 'list-commands' command now supports machine-readable JSON output
> in addition to the plain-text output for humans.
> 
> Reported-at: https://bugzilla.redhat.com/1824861
> Signed-off-by: Jakob Meng 
> ---
>  NEWS  |  4 
>  lib/unixctl.c | 44 +--
>  tests/ovs-vswitchd.at | 15 +++
>  3 files changed, 49 insertions(+), 14 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index d903d2f74..8c59a4979 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,6 +6,10 @@ Post-v3.3.0
>   * Added new option [-f|--format] to choose the output format, e.g. 
> 'json'
> or 'text' (by default).
>   * Added new option [--pretty] to print JSON output in a readable 
> fashion.
> + * 'list-commands' now supports output in JSON format.
> +   - Python:
> + * Added support for different output formats like 'json' to appctl.py 
> and
> +   Python's unixctl classes.

Artifact from rebasing.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 2/6] python: Add option for JSON output to unixctl classes and appctl.py.

2024-07-08 Thread Ilya Maximets
On 7/4/24 16:09, jm...@redhat.com wrote:
> From: Jakob Meng 
> 
> This patch introduces support for different output formats to Python
> Unixctl* classes and appctl.py, similar to what the previous commit did
> for ovs-appctl.
> In particular, tests/appctl.py gains a global option '-f,--format'
> which allows users to request JSON instead of plain-text for humans.
> 
> Reported-at: https://bugzilla.redhat.com/1824861
> Signed-off-by: Jakob Meng 
> ---
>  NEWS   |  2 ++
>  python/ovs/unixctl/__init__.py |  8 ++
>  python/ovs/unixctl/client.py   |  5 ++--
>  python/ovs/unixctl/server.py   | 52 +-
>  tests/appctl.py| 38 -
>  tests/unixctl-py.at|  4 +++
>  6 files changed, 93 insertions(+), 16 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index f182647c7..c750ebae2 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -19,6 +19,8 @@ Post-v3.3.0
> per interface 'options:dpdk-lsc-interrupt' to 'false'.
> - Python:
>   * Added custom transaction support to the Idl via add_op().
> + * Added support for different output formats like 'json' to Python's
> +   unixctl classes.
>  
>  
>  v3.3.0 - 16 Feb 2024
> diff --git a/python/ovs/unixctl/__init__.py b/python/ovs/unixctl/__init__.py
> index 8ee312943..b05f3df72 100644
> --- a/python/ovs/unixctl/__init__.py
> +++ b/python/ovs/unixctl/__init__.py
> @@ -12,6 +12,7 @@
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  
> +import enum
>  import sys
>  
>  import ovs.util
> @@ -19,6 +20,13 @@ import ovs.util
>  commands = {}
>  
>  
> +@enum.unique
> +# FIXME: Use @enum.verify(enum.NAMED_FLAGS) from Python 3.11 when available.
> +class UnixctlOutputFormat(enum.IntFlag):
> +TEXT = 1 << 0
> +JSON = 1 << 1
> +
> +
>  class _UnixctlCommand(object):
>  def __init__(self, usage, min_args, max_args, callback, aux):
>  self.usage = usage
> diff --git a/python/ovs/unixctl/client.py b/python/ovs/unixctl/client.py
> index 8283f99bb..8a6fcb1b9 100644
> --- a/python/ovs/unixctl/client.py
> +++ b/python/ovs/unixctl/client.py
> @@ -14,6 +14,7 @@
>  
>  import os
>  
> +import ovs.json
>  import ovs.jsonrpc
>  import ovs.stream
>  import ovs.util
> @@ -41,10 +42,10 @@ class UnixctlClient(object):
>  return error, None, None
>  
>  if reply.error is not None:
> -return 0, str(reply.error), None
> +return 0, reply.error, None
>  else:
>  assert reply.result is not None
> -return 0, None, str(reply.result)
> +return 0, None, reply.result
>  
>  def close(self):
>  self._conn.close()
> diff --git a/python/ovs/unixctl/server.py b/python/ovs/unixctl/server.py
> index d24a7092c..855f38d18 100644
> --- a/python/ovs/unixctl/server.py
> +++ b/python/ovs/unixctl/server.py
> @@ -12,6 +12,7 @@
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>  
> +import argparse
>  import copy
>  import errno
>  import os
> @@ -35,6 +36,7 @@ class UnixctlConnection(object):
>  assert isinstance(rpc, ovs.jsonrpc.Connection)
>  self._rpc = rpc
>  self._request_id = None
> +self._fmt = ovs.unixctl.UnixctlOutputFormat.TEXT
>  
>  def run(self):
>  self._rpc.run()
> @@ -63,10 +65,29 @@ class UnixctlConnection(object):
>  return error
>  
>  def reply(self, body):
> -self._reply_impl(True, body)
> +assert body is None or isinstance(body, str)
> +
> +if body is None:
> +body = ""
> +
> +if self._fmt == ovs.unixctl.UnixctlOutputFormat.JSON:
> +body = {
> +"reply-format": "plain",
> +"reply": body
> +}
> +
> +return self._reply_impl_json(True, body)
> +
> +def reply_json(self, body):
> +self._reply_impl_json(True, body)
>  
>  def reply_error(self, body):
> -self._reply_impl(False, body)
> +assert body is None or isinstance(body, str)
> +
> +if body is None:
> +body = ""
> +
> +return self._reply_impl_json(False, body)
>  
>  # Called only by unixctl classes.
>  def _close(self):
> @@ -78,15 +99,11 @@ class UnixctlConnection(object):
>  if not self._rpc.get_backlog():
>  self._rpc.recv_wait(poller)
>  
> -def _reply_impl(self, success, body):
> +def _reply_impl_json(self, success, body):
>  assert isinstance(success, bool)
> -assert body is None or isinstance(body, str)
>  
>  assert self._request_id is not None
>  
> -if body is None:
> -body = ""
> -
>  if success:
>  reply = Message.create_reply(body, self._request_id)
>  else:
> @@ -133,6 +150,24 @@ def _unixctl_version(conn, unused_argv, version):
>  conn.reply(version)
>  
>  
> +def 

Re: [ovs-dev] [PATCH v13 1/6] Add global option for JSON output to ovs-appctl.

2024-07-08 Thread Ilya Maximets
On 7/4/24 16:09, jm...@redhat.com wrote:
> From: Jakob Meng 
> 
> For monitoring systems such as Prometheus it would be beneficial if
> OVS would expose statistics in a machine-readable format.
> 
> This patch introduces support for different output formats to
> ovs-appctl. It gains a global option '-f,--format' which changes it to
> print a JSON document instead of plain-text for humans. For example, a
> later patch implements support for
> 'ovs-appctl --format json dpif/show'. By default, the output format
> is plain-text as before.
> 
> A new 'set-options' command has been added to lib/unixctl.c which
> allows to change the output format of the commands executed afterwards
> on the same socket connection. It is supposed to be run by ovs-appctl
> transparently for the user when a specific output format has been
> requested.
> For example, when a user calls 'ovs-appctl --format json dpif/show',
> then ovs-appctl will call 'set-options' to set the output format as
> requested by the user and afterwards it will call the actual command
> 'dpif/show'.
> This ovs-appctl behaviour has been implemented in a backward compatible
> way. One can use an updated client (ovs-appctl) with an old server
> (ovs-vswitchd) and vice versa. Of course, JSON output only works when
> both sides have been updated.
> 
> Two access functions unixctl_command_{get,set}_output_format() and a
> unixctl_command_reply_json function have been added to lib/unixctl.h:
> unixctl_command_get_output_format() is supposed to be used in commands
> like 'dpif/show' to query the requested output format. When JSON output
> has been selected, the unixctl_command_reply_json() function can be
> used to return JSON objects to the client (ovs-appctl) instead of
> plain-text with the unixctl_command_reply{,_error}() functions.
> 
> When JSON has been requested but a command has not implemented JSON
> output the plain-text output will be wrapped in a provisional JSON
> document with the following structure:
> 
>   {"reply":"$PLAIN_TEXT_HERE","reply-format":"plain"}
> 
> Thus commands which have been executed successfully will not fail when
> they try to render the output at a later stage.
> 
> A test for the 'version' command has been implemented which shows how
> the provisional JSON document looks like in practice. For a cleaner
> JSON document, the trailing newline has been moved from the program
> version string to function ovs_print_version(). This way, the
> plain-text output of the 'version' command has not changed.
> 
> Output formatting has been moved from unixctl_client_transact() in
> lib/unixctl.c to utilities/ovs-appctl.c. The former merely returns the
> JSON objects returned from the server and the latter is now responsible
> for printing it properly.
> 
> In popular tools like kubectl the option for output control is usually
> called '-o|--output' instead of '-f,--format'. But ovs-appctl already
> has an short option '-o' which prints the available ovs-appctl options
> ('--option'). The now choosen name also better alignes with

* chosen
* aligns

> ovsdb-client where '-f,--format' controls output formatting.
> 
> Reported-at: https://bugzilla.redhat.com/1824861
> Signed-off-by: Jakob Meng 
> ---
>  Documentation/ref/ovs-appctl.8.rst |  12 ++
>  NEWS   |   3 +
>  lib/unixctl.c  | 183 ++---
>  lib/unixctl.h  |  17 ++-
>  lib/util.c |   6 +-
>  python/ovs/unixctl/server.py   |   3 -
>  tests/appctl.py|   5 +
>  tests/ovs-vswitchd.at  |  12 ++
>  utilities/ovs-appctl.c | 135 ++---
>  9 files changed, 307 insertions(+), 69 deletions(-)
> 
> diff --git a/Documentation/ref/ovs-appctl.8.rst 
> b/Documentation/ref/ovs-appctl.8.rst
> index 3ce02e984..148cc7632 100644
> --- a/Documentation/ref/ovs-appctl.8.rst
> +++ b/Documentation/ref/ovs-appctl.8.rst
> @@ -8,6 +8,7 @@ Synopsis
>  ``ovs-appctl``
>  [``--target=`` | ``-t`` ]
>  [``--timeout=`` | ``-T`` ]
> +[``--format=`` | ``-f`` ]
>   [...]
>  
>  ``ovs-appctl --help``
> @@ -67,6 +68,17 @@ In normal use only a single option is accepted:
>runtime to approximately  seconds.  If the timeout expires,
>``ovs-appctl`` exits with a ``SIGALRM`` signal.
>  
> +* ``-f `` or ``--format=``
> +
> +  Tells ``ovs-appctl`` which output format to use.  By default, or with a
> +   of ``text``, ``ovs-appctl`` will print plain-text for humans.
> +  When  is ``json``, ``ovs-appctl`` will return a JSON document.
> +  When ``json`` is requested, but a command has not implemented JSON
> +  output, the plain-text output will be wrapped in a provisional JSON
> +  document with the following structure::
> +
> +{"reply-format":"plain","reply":"$PLAIN_TEXT_HERE"}
> +
>  Common Commands
>  ===
>  
> diff --git a/NEWS b/NEWS
> index e0359b759..f182647c7 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,9 @@ Post-v3.3.0
>  ---

Re: [ovs-dev] [PATCH v1 03/13] ofproto_dpif: Check for psample support.

2024-07-08 Thread Eelco Chaudron


On 8 Jul 2024, at 10:37, Adrián Moreno wrote:

> On Sun, Jul 07, 2024 at 10:08:55PM GMT, Adrian Moreno wrote:
>> Only kernel datapath supports this action so add a function in dpif.c
>> that checks for that.
>>
>> Signed-off-by: Adrian Moreno 
>> ---
>>  lib/dpif.c |  7 +++
>>  lib/dpif.h |  1 +
>>  ofproto/ofproto-dpif.c | 43 ++
>>  ofproto/ofproto-dpif.h |  7 ++-
>>  4 files changed, 57 insertions(+), 1 deletion(-)
>>
>
> After Dumitru's comment on the last patch, I think we also need a
> capability for this exposed in the OVSDB. I'll add it in the next
> version.

ACK, can you wait with the v2 until I review the RFC/v1?

>> diff --git a/lib/dpif.c b/lib/dpif.c
>> index 71728badc..0a964ba89 100644
>> --- a/lib/dpif.c
>> +++ b/lib/dpif.c
>> @@ -1953,6 +1953,13 @@ dpif_supports_lb_output_action(const struct dpif 
>> *dpif)
>>  return dpif_is_netdev(dpif);
>>  }
>>
>> +bool
>> +dpif_may_support_psample(const struct dpif *dpif)
>> +{
>> +/* Userspace datapath does not support this action. */
>> +return !dpif_is_netdev(dpif);
>> +}
>> +
>>  /* Meters */
>>  void
>>  dpif_meter_get_features(const struct dpif *dpif,
>> diff --git a/lib/dpif.h b/lib/dpif.h
>> index a764e8a59..6bef7d5b3 100644
>> --- a/lib/dpif.h
>> +++ b/lib/dpif.h
>> @@ -941,6 +941,7 @@ int dpif_get_pmds_for_port(const struct dpif * dpif, 
>> odp_port_t port_no,
>>  char *dpif_get_dp_version(const struct dpif *);
>>  bool dpif_supports_tnl_push_pop(const struct dpif *);
>>  bool dpif_may_support_explicit_drop_action(const struct dpif *);
>> +bool dpif_may_support_psample(const struct dpif *);
>>  bool dpif_synced_dp_layers(struct dpif *);
>>
>>  /* Log functions. */
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index 4712d10a8..94c84d697 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -873,6 +873,12 @@ ovs_lb_output_action_supported(struct ofproto_dpif 
>> *ofproto)
>>  return ofproto->backer->rt_support.lb_output_action;
>>  }
>>
>> +bool
>> +ovs_psample_supported(struct ofproto_dpif *ofproto)
>> +{
>> +return ofproto->backer->rt_support.psample;
>> +}
>> +
>>  /* Tests whether 'backer''s datapath supports recirculation.  Only newer
>>   * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable 
>> some
>>   * features on older datapaths that don't support this feature.
>> @@ -1609,6 +1615,42 @@ check_add_mpls(struct dpif_backer *backer)
>>  return supported;
>>  }
>>
>> +/* Tests whether 'backer''s datapath supports the OVS_ACTION_ATTR_PSAMPLE
>> + * action. */
>> +static bool
>> +check_psample(struct dpif_backer *backer)
>> +{
>> +uint8_t cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE];
>> +struct odputil_keybuf keybuf;
>> +struct ofpbuf actions;
>> +struct ofpbuf key;
>> +struct flow flow;
>> +bool supported;
>> +
>> +struct odp_flow_key_parms odp_parms = {
>> +.flow = &flow,
>> +.probe = true,
>> +};
>> +
>> +memset(&flow, 0, sizeof flow);
>> +ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
>> +odp_flow_key_from_flow(&odp_parms, &key);
>> +ofpbuf_init(&actions, 64);
>> +
>> +/* Generate a random max-size cookie. */
>> +random_bytes(cookie, sizeof(cookie));
>> +
>> +odp_put_psample_action(&actions, 10, cookie, sizeof cookie);
>> +
>> +supported = dpif_may_support_psample(backer->dpif) &&
>> +dpif_probe_feature(backer->dpif, "psample", &key, &actions, NULL);
>> +
>> +ofpbuf_uninit(&actions);
>> +VLOG_INFO("%s: Datapath %s psample", dpif_name(backer->dpif),
>> +  supported ? "supports" : "does not support");
>> +return supported;
>> +}
>> +
>>  #define CHECK_FEATURE__(NAME, SUPPORT, FIELD, VALUE, ETHTYPE)   
>> \
>>  static bool 
>> \
>>  check_##NAME(struct dpif_backer *backer)
>> \
>> @@ -1698,6 +1740,7 @@ check_support(struct dpif_backer *backer)
>>  dpif_supports_lb_output_action(backer->dpif);
>>  backer->rt_support.ct_zero_snat = dpif_supports_ct_zero_snat(backer);
>>  backer->rt_support.add_mpls = check_add_mpls(backer);
>> +backer->rt_support.psample = check_psample(backer);
>>
>>  /* Flow fields. */
>>  backer->rt_support.odp.ct_state = check_ct_state(backer);
>> diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
>> index d33f73df8..bc7a19dab 100644
>> --- a/ofproto/ofproto-dpif.h
>> +++ b/ofproto/ofproto-dpif.h
>> @@ -213,7 +213,10 @@ struct group_dpif *group_dpif_lookup(struct 
>> ofproto_dpif *,
>>  DPIF_SUPPORT_FIELD(bool, ct_zero_snat, "Conntrack all-zero IP SNAT")
>> \
>>  
>> \
>>  /* True if the datapath supports add_mpls action. */
>> \
>> -DPIF_SUPPORT_FIELD(bool, add_mpls, "MPLS Label add")
>> +DPIF_SUPPO

Re: [ovs-dev] [ovs-dev, v4, 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.

2024-07-08 Thread David Marchand
Hello,

On Mon, Jul 8, 2024 at 11:07 AM junwan...@cestc.cn  wrote:
>
> Hi Mike,
>
> I noticed that the DPDK main version and the v24.07-rc1 version already
>
> include the changes for net/ice's outer UDP checksum offload.
>
> Can we now enable net/ice instead of disabling it?

OVS main branch uses DPDK 23.11 LTS.
Re-enabling this offload for Intel drivers will have to wait until the
fixes are part of a released 23.11 LTS (which is expected in August,
after OVS 3.4 is released).


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev, v4, 3/3] netdev-dpdk: Re-enable VXLAN/Geneve offload for Intel cards.

2024-07-08 Thread junwan...@cestc.cn
Hi Mike, 
I noticed that the DPDK main version and the v24.07-rc1 version already
include the changes for net/ice's outer UDP checksum offload. 
Can we now enable net/ice instead of disabling it?



Jun Wang
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Intel CI not running?

2024-07-08 Thread Phelan, Michael
> -Original Message-
> From: Ilya Maximets 
> Sent: Thursday, July 4, 2024 9:14 PM
> To: Phelan, Michael 
> Cc: i.maxim...@ovn.org; ovs-dev ; Aaron Conole
> ; Chaudron, Eelco 
> Subject: Re: Intel CI not running?
> 
> On 7/4/24 20:46, Ilya Maximets wrote:
> > On 7/4/24 13:04, Phelan, Michael wrote:
> >> Hi Ilya,
> >> The CI got stuck running make check on a patch, I have solved the
> >> issue now so reports should be back to normal now.
> >
> > Thanks for checking!  Though I see only two reports were sent out in
> > the past 7 hours with a few hour interval between them.
> > Did it get stuck again?
> 
> Got another report.  Looks like we're getting reports at rate of one per 3.5
> hours.  That doesn't sound right.

We have added make check to the job parameters so that has increased the 
duration of the testing.

It seems like the test number 98: barrier module from make check is a bit 
finicky and stalls occasionally also.

> 
> >
> > Best regards, Ilya Maximets.
> >
> >>
> >> Thanks,
> >> Michael.
> >>> -Original Message-
> >>> From: Ilya Maximets 
> >>> Sent: Thursday, July 4, 2024 10:47 AM
> >>> To: Phelan, Michael 
> >>> Cc: i.maxim...@ovn.org; ovs-dev ; Aaron
> >>> Conole ; Chaudron, Eelco
> 
> >>> Subject: Intel CI not running?
> >>>
> >>> Hi, Michael!  We seem to not get reports from Intel CI since some
> >>> time on Monday.  The last report was:
> >>>
> >>>
> >>> https://mail.openvswitch.org/pipermail/ovs-build/2024-July/039755.ht
> >>> ml
> >>>
> >>> Could you, please, check?
> >>>
> >>> Best regards, Ilya Maximets.
> >

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 03/13] ofproto_dpif: Check for psample support.

2024-07-08 Thread Adrián Moreno
On Sun, Jul 07, 2024 at 10:08:55PM GMT, Adrian Moreno wrote:
> Only kernel datapath supports this action so add a function in dpif.c
> that checks for that.
>
> Signed-off-by: Adrian Moreno 
> ---
>  lib/dpif.c |  7 +++
>  lib/dpif.h |  1 +
>  ofproto/ofproto-dpif.c | 43 ++
>  ofproto/ofproto-dpif.h |  7 ++-
>  4 files changed, 57 insertions(+), 1 deletion(-)
>

After Dumitru's comment on the last patch, I think we also need a
capability for this exposed in the OVSDB. I'll add it in the next
version.

Thanks.
Adrián


> diff --git a/lib/dpif.c b/lib/dpif.c
> index 71728badc..0a964ba89 100644
> --- a/lib/dpif.c
> +++ b/lib/dpif.c
> @@ -1953,6 +1953,13 @@ dpif_supports_lb_output_action(const struct dpif *dpif)
>  return dpif_is_netdev(dpif);
>  }
>
> +bool
> +dpif_may_support_psample(const struct dpif *dpif)
> +{
> +/* Userspace datapath does not support this action. */
> +return !dpif_is_netdev(dpif);
> +}
> +
>  /* Meters */
>  void
>  dpif_meter_get_features(const struct dpif *dpif,
> diff --git a/lib/dpif.h b/lib/dpif.h
> index a764e8a59..6bef7d5b3 100644
> --- a/lib/dpif.h
> +++ b/lib/dpif.h
> @@ -941,6 +941,7 @@ int dpif_get_pmds_for_port(const struct dpif * dpif, 
> odp_port_t port_no,
>  char *dpif_get_dp_version(const struct dpif *);
>  bool dpif_supports_tnl_push_pop(const struct dpif *);
>  bool dpif_may_support_explicit_drop_action(const struct dpif *);
> +bool dpif_may_support_psample(const struct dpif *);
>  bool dpif_synced_dp_layers(struct dpif *);
>
>  /* Log functions. */
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 4712d10a8..94c84d697 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -873,6 +873,12 @@ ovs_lb_output_action_supported(struct ofproto_dpif 
> *ofproto)
>  return ofproto->backer->rt_support.lb_output_action;
>  }
>
> +bool
> +ovs_psample_supported(struct ofproto_dpif *ofproto)
> +{
> +return ofproto->backer->rt_support.psample;
> +}
> +
>  /* Tests whether 'backer''s datapath supports recirculation.  Only newer
>   * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable some
>   * features on older datapaths that don't support this feature.
> @@ -1609,6 +1615,42 @@ check_add_mpls(struct dpif_backer *backer)
>  return supported;
>  }
>
> +/* Tests whether 'backer''s datapath supports the OVS_ACTION_ATTR_PSAMPLE
> + * action. */
> +static bool
> +check_psample(struct dpif_backer *backer)
> +{
> +uint8_t cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE];
> +struct odputil_keybuf keybuf;
> +struct ofpbuf actions;
> +struct ofpbuf key;
> +struct flow flow;
> +bool supported;
> +
> +struct odp_flow_key_parms odp_parms = {
> +.flow = &flow,
> +.probe = true,
> +};
> +
> +memset(&flow, 0, sizeof flow);
> +ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
> +odp_flow_key_from_flow(&odp_parms, &key);
> +ofpbuf_init(&actions, 64);
> +
> +/* Generate a random max-size cookie. */
> +random_bytes(cookie, sizeof(cookie));
> +
> +odp_put_psample_action(&actions, 10, cookie, sizeof cookie);
> +
> +supported = dpif_may_support_psample(backer->dpif) &&
> +dpif_probe_feature(backer->dpif, "psample", &key, &actions, NULL);
> +
> +ofpbuf_uninit(&actions);
> +VLOG_INFO("%s: Datapath %s psample", dpif_name(backer->dpif),
> +  supported ? "supports" : "does not support");
> +return supported;
> +}
> +
>  #define CHECK_FEATURE__(NAME, SUPPORT, FIELD, VALUE, ETHTYPE)   \
>  static bool \
>  check_##NAME(struct dpif_backer *backer)\
> @@ -1698,6 +1740,7 @@ check_support(struct dpif_backer *backer)
>  dpif_supports_lb_output_action(backer->dpif);
>  backer->rt_support.ct_zero_snat = dpif_supports_ct_zero_snat(backer);
>  backer->rt_support.add_mpls = check_add_mpls(backer);
> +backer->rt_support.psample = check_psample(backer);
>
>  /* Flow fields. */
>  backer->rt_support.odp.ct_state = check_ct_state(backer);
> diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
> index d33f73df8..bc7a19dab 100644
> --- a/ofproto/ofproto-dpif.h
> +++ b/ofproto/ofproto-dpif.h
> @@ -213,7 +213,10 @@ struct group_dpif *group_dpif_lookup(struct ofproto_dpif 
> *,
>  DPIF_SUPPORT_FIELD(bool, ct_zero_snat, "Conntrack all-zero IP SNAT")\
>  \
>  /* True if the datapath supports add_mpls action. */\
> -DPIF_SUPPORT_FIELD(bool, add_mpls, "MPLS Label add")
> +DPIF_SUPPORT_FIELD(bool, add_mpls, "MPLS Label add")\
> +\
> +/* True if the datapath supports psample action. */ \
> +DPIF_SUPPORT_FIELD(b

Re: [ovs-dev] [RFC PATCH v3 13/13] ofp-actions: Load data from fields in sample action.

2024-07-08 Thread Adrián Moreno
On Thu, Jul 04, 2024 at 11:25:34AM GMT, Dumitru Ceara wrote:
> On 7/4/24 09:52, Adrian Moreno wrote:
> > When sample action gets used as a way of sampling traffic with
> > controller-generated metadata (i.e: obs_domain_id and obs_point_id),
> > the controller will have to increase the number of flows to ensure each
> > part of the pipeline contains the right metadata.
> >
> > As an example, if the controller decides to sample stateful traffic, it
> > could store the computed metadata for each connection in the conntrack
> > label. However, for established connections, a flow must be created for
> > each different ct_label value with a sample action that contains a
> > different hardcoded obs_domain and obs_point id.
> >
> > This patch adds a new version of the NXAST_RAW_SAMPLE* action (number 4)
> > that supports specifying the observation point and domain using an
> > OpenFlow field reference, so now the controller can express:
> >
> >  sample(...
> > obs_domain_id=NXM_NX_CT_LABEL[0..31],
> > obs_point_id=NXM_NX_CT_LABEL[32..63]
> > ...
> >)
> >
> > Signed-off-by: Adrian Moreno 
> > ---
>
> Hi Adrian,
>
> Thanks a lot for working on expanding the sampling support (in the
> kernel too).
>
> I didn't review the patch (or the rest of the series) yet but one thing
> we should definitely add to the non-RFC version is a way for controllers
> to detect that this new action version is supported.
>
> In other occasions we've used the OVSDB.Datapath.Capabilities column to
> report that new actions are supported (e.g., "ct_flush" for the
> extension that allows flushing CT with a generic match).  It might make
> sense to add another capability there for this new action version.
>

Yep. I've sent v1 but it does not contain this since I wanted to do a
quick respin removing the RFC now that the kernel part landed in
net-next.

I'll include it in v2.

Thanks.
Adrián

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] netdev-native-tnl: Fix inner L2 pad loss causing checksum errors.

2024-07-08 Thread junwan...@cestc.cn
>On Tue, Jul 2, 2024 at 10:56 PM Jun Wang  wrote:
>>
>> We encountered a scenario where, if the received packet contains
>> padding bytes, and we then add Geneve tunnel encapsulation without
>> carrying the padding bytes, it results in checksum errors when sending
>> out. Therefore, adding an inner_l2_pad is necessary.
>>
>> For example, this type of packet format:
>>    06 6c 6a 71 e1 d3 6c e2 d3 8b ea 24 81 00 03 e9
>> 0010   08 00 45 00 00 28 9d e5 40 00 3b 06 6e 64 0a 6f
>> 0020   05 14 0a fe 19 06 01 bb 22 ae 59 c0 8c 61 8e 26
>> 0030   14 e3 50 10 00 7f ce 3a 00 00 00 00 9e 38 cf 64
>
> Hello Jun,
>
> Thank you for this submission. This is an interesting case and I don't
> know that we have an appropriate test case for micrograms like this.
> Was this the

Yes, it does require the appropriate test case, but I am not very familiar with 
this area.

> One question I have is shouldn't we remove the padding while we
> encapsulate, not keep it? The encapsulation should always push the
> frame size past 64 bytes.

I have also considered this approach. Theoretically, padding bytes
should be invalid and need to be removed. Shouldn't this be handled
at the packet reception stage rather than during the subsequent
encapsulation phase?
Alternatively, from another perspective, shouldn't we avoid modifying
any lengths, including meaningless padding bytes?

>Thanks,
>M
>
>>
>> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
>>
>> Signed-off-by: Jun Wang 
>> ---
>>  lib/dp-packet.h | 21 -
>>  lib/netdev-native-tnl.c |  3 +++
>>  2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index a75b1c5..d583b28 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -176,6 +176,8 @@ struct dp_packet {
>>  ovs_be32 packet_type;  /* Packet type as defined in OpenFlow */
>>  uint16_t csum_start;   /* Position to start checksumming from. 
>> */
>>  uint16_t csum_offset;  /* Offset to place checksum. */
>> +uint16_t inner_l2_pad_size;/* Detected inner l2 padding size.
>> +* Padding is non-pullable. */
>>  union {
>>  struct pkt_metadata md;
>>  uint64_t data[DP_PACKET_CONTEXT_SIZE / 8];
>> @@ -209,7 +211,10 @@ static inline void *dp_packet_eth(const struct 
>> dp_packet *);
>>  static inline void dp_packet_reset_offsets(struct dp_packet *);
>>  static inline void dp_packet_reset_offload(struct dp_packet *);
>>  static inline uint16_t dp_packet_l2_pad_size(const struct dp_packet *);
>> +static inline uint16_t dp_packet_inner_l2_pad_size(const struct dp_packet 
>> *);
>>  static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint16_t);
>> +static inline void dp_packet_set_inner_l2_pad_size(struct dp_packet *,
>> +   uint16_t);
>>  static inline void *dp_packet_l2_5(const struct dp_packet *);
>>  static inline void dp_packet_set_l2_5(struct dp_packet *, void *);
>>  static inline void *dp_packet_l3(const struct dp_packet *);
>> @@ -435,6 +440,7 @@ static inline void
>>  dp_packet_reset_offsets(struct dp_packet *b)
>>  {
>>  b->l2_pad_size = 0;
>> +b->inner_l2_pad_size = 0;
>>  b->l2_5_ofs = UINT16_MAX;
>>  b->l3_ofs = UINT16_MAX;
>>  b->l4_ofs = UINT16_MAX;
>> @@ -448,6 +454,12 @@ dp_packet_l2_pad_size(const struct dp_packet *b)
>>  return b->l2_pad_size;
>>  }
>>
>> +static inline uint16_t
>> +dp_packet_inner_l2_pad_size(const struct dp_packet *b)
>> +{
>> +return b->inner_l2_pad_size;
>> +}
>> +
>>  static inline void
>>  dp_packet_set_l2_pad_size(struct dp_packet *b, uint16_t pad_size)
>>  {
>> @@ -455,6 +467,13 @@ dp_packet_set_l2_pad_size(struct dp_packet *b, uint16_t 
>> pad_size)
>>  b->l2_pad_size = pad_size;
>>  }
>>
>> +static inline void
>> +dp_packet_set_inner_l2_pad_size(struct dp_packet *b, uint16_t pad_size)
>> +{
>> +ovs_assert(pad_size <= dp_packet_size(b));
>> +b->inner_l2_pad_size = pad_size;
>> +}
>> +
>>  static inline void *
>>  dp_packet_l2_5(const struct dp_packet *b)
>>  {
>> @@ -543,7 +562,7 @@ dp_packet_inner_l4_size(const struct dp_packet *b)
>>  return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
>> ? (const char *) dp_packet_tail(b)
>> - (const char *) dp_packet_inner_l4(b)
>> -   - dp_packet_l2_pad_size(b)
>> +   - dp_packet_inner_l2_pad_size(b)
>> : 0;
>>  }
>>
>> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>> index 0f9f07f..96ffdc1 100644
>> --- a/lib/netdev-native-tnl.c
>> +++ b/lib/netdev-native-tnl.c
>> @@ -156,6 +156,7 @@ netdev_tnl_push_ip_header(struct dp_packet *packet, 
>> const void *header,
>>  struct eth_header *eth;
>>  struct ip_header *ip;
>>  struct ovs_16aligned_ip6_hdr *ip6;
>> +uint16_t l2_pad_size;
>>
>>  eth = dp_packet_push_uninit(packet, size);
>>  *ip_tot_size = dp_packet_size(pack