Re: [ovs-dev] [PATCH] lib: Add support for sets of UUIDs.

2022-09-26 Thread Ilya Maximets
On 9/20/22 09:12, Ales Musil wrote:
> 
> 
> On Fri, Sep 16, 2022 at 5:50 PM Dumitru Ceara  > wrote:
> 
> Part of the uuidset implementation is taken from the OVN codebase where
> it was added via commit 0e77b3bcbfe2 ("ovn-northd-ddlog: New
> implementation of ovn-northd based on ddlog.").
> 
> We now extend that, adding a few helpers and tests.
> 
> Co-authored-by: Leonid Ryzhyk  >
> Signed-off-by: Leonid Ryzhyk  >
> Co-authored-by: Justin Pettit mailto:jpet...@ovn.org>>
> Signed-off-by: Justin Pettit mailto:jpet...@ovn.org>>
> Co-authored-by: Ben Pfaff mailto:b...@ovn.org>>
> Signed-off-by: Ben Pfaff mailto:b...@ovn.org>>
> Signed-off-by: Dumitru Ceara  >
> ---
> Note: I wasn't sure if I should keep Leonid as main author so I added a
> "Co-authored-by" tag.  I can amend that and send a v2 if needed.
> ---
>  lib/automake.mk       |  2 +
>  lib/uuidset.c        | 99 
>  lib/uuidset.h        | 71 +++
>  tests/.gitignore     |  1 +
>  tests/automake.mk     |  1 +
>  tests/library.at      |  4 ++
>  tests/test-uuidset.c | 72 
>  7 files changed, 250 insertions(+)
>  create mode 100644 lib/uuidset.c
>  create mode 100644 lib/uuidset.h
>  create mode 100644 tests/test-uuidset.c
>
> Looks good to me, thanks.
> The robot ERROR seems to be a false positive.
> 
> Reviewed-by: Ales Musil mailto:amu...@redhat.com>>

Applied.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] m4: test avx512 for x86 only

2022-09-26 Thread Ilya Maximets
On 9/16/22 16:52, Van Haaren, Harry wrote:
>> -Original Message-
>> From: lic...@chinatelecom.cn 
>> Sent: Friday, September 16, 2022 10:56 AM
>> To: d...@openvswitch.org; Van Haaren, Harry 
>> Cc: Cheng Li 
>> Subject: [PATCH] m4: test avx512 for x86 only
>>
>> 'as' command of arm version may don't support option '--64', this
>> patch is to move the avx512 test into x86 branch to avoid this.
>>
>> Signed-off-by: Cheng Li 
> 
> That's a good/simple solution to the issue;
> Tested-by: Harry van Haaren 

Thanks!  Applied and backported down to 2.17.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovsdb-idl: Preserve references for rows deleted in same IDL as their insertion.

2022-09-26 Thread Ilya Maximets
On 9/16/22 10:40, Xavier Simonart wrote:
> Considering two DB rows, 'a' from table A and 'b' from table B (with
> column 'ref_a' a reference to table A):
> a = {A._uuid=}
> b = {B._uuid=, B.ref_a=}
> 
> Assuming both records are inserted in the IDL client's in-memory view of
> the database, if row 'b' is also deleted in the same transaction, it should
> generate the following tracked changes:
> 
> - for table A:
>   - inserted records: a = {A._uuid=}
> - for table B:
>   - inserted records: b = {B._uuid=, B.ref_a=}
>   - deleted records: b = {B._uuid=, B.ref_a=}
> 
> Before this patch, inserted and deleted records in table B
> would (in some cases [0]) be b = {B._uuid=, B.ref_a=[]}.
> Having B.ref_a=[] would violate the integrity of the database from client
> perspective.
> 
> test-ovsdb has also been updated to show that one row can be
> both inserted and deleted within one IDL.
> 
> [0] In ovn-controller the fact that the reference is NULL caused a crash
> in the following case, when both commands were handled by ovn-controller
> within the same loop:
> $ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses 
> sw0-port1 "50:54:00:00:00:01 192.168.0.2"
> $ ovn-nbctl lsp-del sw0-port1
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126450
> Fixes: 91e1ff5dde39 ("ovsdb-idl: Don't reparse orphaned rows.")
> Signed-off-by: Xavier Simonart 
> ---
>  lib/ovsdb-idl.c|  4 +++
>  tests/ovsdb-idl.at | 89 ++
>  tests/test-ovsdb.c | 13 ---
>  3 files changed, 101 insertions(+), 5 deletions(-)

Thanks!  Applied and backported down to 2.17.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] system-dpdk: Improve user configured mempool test.

2022-09-26 Thread Ilya Maximets
On 9/20/22 16:37, Phelan, Michael wrote:
> 
>> -Original Message-
>> From: dev  On Behalf Of Sunil Pai G
>> Sent: Wednesday 14 September 2022 09:37
>> To: d...@openvswitch.org
>> Cc: i.maxim...@ovn.org
>> Subject: [ovs-dev] [PATCH v2] system-dpdk: Improve user configured
>> mempool test.
>>
>> Improve the test by adding and varying the MTU of a DPDK null port to check
>> if relevant mempools are created/(re)used.
>>
>> Signed-off-by: Sunil Pai G 
>>
>> ---
>> v1 -> v2: Use DPDK null port instead of vhost-user port.
>> ---
>>  tests/system-dpdk.at | 39 ---
>>  1 file changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at index
>> 15f97097a..6b41d6622 100644
>> --- a/tests/system-dpdk.at
>> +++ b/tests/system-dpdk.at
>> @@ -1127,13 +1127,46 @@ OVS_DPDK_PRE_CHECK()
>>  OVS_DPDK_START_OVSDB()
>>  OVS_DPDK_START_VSWITCHD()
>>
>> -AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:shared-
>> mempool-config=9000,6000,1500])
>> +AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch .
>> +other_config:shared-mempool-config=8000,6000,1500])
>>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-
>> init=true])
>>
>> -CHECK_MEMPOOL_PARAM([9000], [ALL], [])
>> +CHECK_MEMPOOL_PARAM([8000], [ALL], [])
>>  CHECK_MEMPOOL_PARAM([6000], [ALL], [])
>>  CHECK_MEMPOOL_PARAM([1500], [ALL], [])
>>
>> -OVS_VSWITCHD_STOP("[SYSTEM_DPDK_ALLOWED_LOGS]")
>> +AT_CHECK(ovs-appctl vlog/set netdev_dpdk:dbg)
>> +
>> +dnl Add userspace bridge and a dpdk port AT_CHECK([ovs-vsctl add-br
>> +br10 -- set bridge br10 datapath_type=netdev]) AT_CHECK([ovs-vsctl
>> +add-port br10 p1 -- set Interface p1 type=dpdk
>> +options:dpdk-devargs=net_null0,no-rx=1], [], [stdout], [stderr])
>> +AT_CHECK([ovs-vsctl show], [], [stdout]) sleep 2
>> +
>> +dnl Check if the right user configured mempool is found for default MTU
>> +(1500) AT_CHECK([grep "Found user configured shared mempool .*
>> suitable
>> +for port with MTU 1500" ovs-vswitchd.log], [], [stdout]) AT_CHECK([grep
>> +"Port p1: Requesting a mempool" ovs-vswitchd.log], [], [stdout])
>> +
>> +dnl Change the MTU value to 7000 to trigger mempool change TMP=$(($(cat
>> +ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1)) AT_CHECK(ovs-vsctl
>> +set Interface p1 mtu_request=7000) OVS_WAIT_UNTIL([tail -n +$TMP
>> +ovs-vswitchd.log | grep "Found user configured shared mempool .*
>> +suitable for port with MTU 7000"]) OVS_WAIT_UNTIL([tail -n +$TMP
>> +ovs-vswitchd.log | grep "Port p1: Requesting a mempool"])
>> +
>> +dnl Change back the MTU value to 1500 to trigger mempool change
>> +(re-use) TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
>> +AT_CHECK(ovs-vsctl set Interface p1 mtu_request=1500)
>> +OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Found user
>> +configured shared mempool .* suitable for port with MTU 1500"])
>> +OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Reusing
>> +mempool"])
>> +
>> +dnl Change the MTU value beyond the max value in shared-mempool-
>> config
>> +list TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
>> +AT_CHECK(ovs-vsctl set Interface p1 mtu_request=9000)
>> +OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "No user
>> +configured shared mempool mbuf sizes found suitable for port with MTU
>> +9000"]) OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Port p1:
>> +Requesting a mempool"])
>> +
>> +dnl Clean up
>> +AT_CHECK([ovs-vsctl del-port br10 p1], [], [stdout], [stderr])
>> +OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
>> +])")
>>  AT_CLEANUP
>>  dnl 
>> --
>> --
>> 2.37.3
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> Hey Sunil,
> I tested your patch with different AVX-512 configurations and NICs and 
> everything looked good to me.
> 
> Tested-by: Michael Phelan 

Applied.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] ct-dpif: Do not show flag key if empty.

2022-09-26 Thread Ilya Maximets
On 9/9/22 13:29, Paolo Valerio wrote:
> Ilya Maximets  writes:
> 
>> On 8/4/22 18:07, Paolo Valerio wrote:
>>> This patch avoids to show flags_orig/flags_reply key if they have no value.
>>> E.g., the following:
>>>
>>> NEW tcp,orig=([...]),reply=([...]),id=1800618864,
>>> status=CONFIRMED|SRC_NAT_DONE|DST_NAT_DONE,timeout=120,
>>> protoinfo=(state_orig=SYN_SENT,state_reply=SYN_SENT,wscale_orig=7,
>>>wscale_reply=0,flags_orig=WINDOW_SCALE|SACK_PERM,flags_reply=)
>>>
>>> becomes:
>>>
>>> NEW tcp,orig=([...]),reply=([...]),id=1800618864,
>>> status=CONFIRMED|SRC_NAT_DONE|DST_NAT_DONE,timeout=120,
>>> protoinfo=(state_orig=SYN_SENT,state_reply=SYN_SENT,wscale_orig=7,
>>>wscale_reply=0,flags_orig=WINDOW_SCALE|SACK_PERM)
>>>
>>> Signed-off-by: Paolo Valerio 
>>> ---
>>>  lib/ct-dpif.c |   14 ++
>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
>>> index cfc2315e3..f1a375523 100644
>>> --- a/lib/ct-dpif.c
>>> +++ b/lib/ct-dpif.c
>>> @@ -512,10 +512,16 @@ ct_dpif_format_protoinfo_tcp_verbose(struct ds *ds,
>>>protoinfo->tcp.wscale_orig,
>>>protoinfo->tcp.wscale_reply);
>>>  }
>>> -ct_dpif_format_flags(ds, ",flags_orig=", protoinfo->tcp.flags_orig,
>>> - tcp_flags);
>>> -ct_dpif_format_flags(ds, ",flags_reply=", protoinfo->tcp.flags_reply,
>>> - tcp_flags);
>>> +
>>> +if (protoinfo->tcp.flags_orig) {
>>> +ct_dpif_format_flags(ds, ",flags_orig=", protoinfo->tcp.flags_orig,
>>> + tcp_flags);
>>> +}
>>> +
>>> +if (protoinfo->tcp.flags_reply) {
>>> +ct_dpif_format_flags(ds, ",flags_reply=", 
>>> protoinfo->tcp.flags_reply,
>>> + tcp_flags);
>>> +}
>>
>> Hmm.  I'm trying to understand why ct_dpif_format_flags() exists at all.
>> Shouldn't this be just:
>>
>>   format_flags_masked(ds, "flags_orig", packet_tcp_flag_to_string,
>>   protoinfo->tcp.flags_orig, TCP_FLAGS(OVS_BE16_MAX),
>>   TCP_FLAGS(OVS_BE16_MAX));
>>
>> ?
>>
>> This will change the appearance of the flags, so maybe tcp_flags[] array
>> should be replaced with a simple conversion function.
>>
> 
> Uhm, I guess you're right. It seems redundant and could be removed.
> What about something like this?

The code below looks OK at the first glance.

Best regards, Ilya Maximets.

> 
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index cfc2315e3..6f17a26b5 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -35,20 +35,11 @@ static void ct_dpif_format_counters(struct ds *,
>  const struct ct_dpif_counters *);
>  static void ct_dpif_format_timestamp(struct ds *,
>   const struct ct_dpif_timestamp *);
> -static void ct_dpif_format_flags(struct ds *, const char *title,
> - uint32_t flags, const struct flags *);
>  static void ct_dpif_format_protoinfo(struct ds *, const char *title,
>   const struct ct_dpif_protoinfo *,
>   bool verbose);
>  static void ct_dpif_format_helper(struct ds *, const char *title,
>const struct ct_dpif_helper *);
> -
> -static const struct flags ct_dpif_status_flags[] = {
> -#define CT_DPIF_STATUS_FLAG(FLAG) { CT_DPIF_STATUS_##FLAG, #FLAG },
> -CT_DPIF_STATUS_FLAGS
> -#undef CT_DPIF_STATUS_FLAG
> -{ 0, NULL } /* End marker. */
> -};
>  
>  /* Dumping */
>  
> @@ -275,6 +266,20 @@ ct_dpif_entry_uninit(struct ct_dpif_entry *entry)
>  }
>  }
>  
> +static const char *
> +ct_dpif_status_flags(uint32_t flags)
> +{
> +switch (flags) {
> +#define CT_DPIF_STATUS_FLAG(FLAG) \
> +case CT_DPIF_STATUS_##FLAG: \
> +return #FLAG;
> +CT_DPIF_STATUS_FLAGS
> +#undef CT_DPIF_TCP_FLAG
> +default:
> +return NULL;
> +}
> +}
> +
>  void
>  ct_dpif_format_entry(const struct ct_dpif_entry *entry, struct ds *ds,
>   bool verbose, bool print_stats)
> @@ -305,8 +310,9 @@ ct_dpif_format_entry(const struct ct_dpif_entry *entry, 
> struct ds *ds,
>  ds_put_format(ds, ",zone=%"PRIu16, entry->zone);
>  }
>  if (verbose) {
> -ct_dpif_format_flags(ds, ",status=", entry->status,
> - ct_dpif_status_flags);
> +format_flags_masked(ds, ",status", ct_dpif_status_flags,
> +entry->status, CT_DPIF_STATUS_MASK,
> +CT_DPIF_STATUS_MASK);
>  }
>  if (print_stats) {
>  ds_put_format(ds, ",timeout=%"PRIu32, entry->timeout);
> @@ -415,28 +421,6 @@ ct_dpif_format_tuple(struct ds *ds, const struct 
> ct_dpif_tuple *tuple)
>  }
>  }
>  
> -static void
> -ct_dpif_format_flags(struct ds *ds, const char *title, uint32_t flags,
> - const struct f

Re: [ovs-dev] [branch-2.16, v2] dpdk: Use DPDK 20.11.6 release.

2022-09-26 Thread Ilya Maximets
On 9/23/22 15:46, David Marchand wrote:
> On Fri, Sep 23, 2022 at 2:43 PM Kevin Traynor  wrote:
>>
>> On 22/09/2022 13:40, Michael Phelan wrote:
>>> Update OVS CLI and relevant documentation to use DPDK 20.11.6.
>>>
>>> A bug was introduced in DPDK 20.11.5 by the commit 33f2e3756186 ("vhost: 
>>> fix unsafe vring addresses modifications").
>>> This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation 
>>> of the virtqueues happen.
>>> A fix [1] has been posted and pushed to the DPDK 20.11 branch.
>>> If a user wishes to avoid the issue then it is recommended to use DPDK 
>>> 20.11.4 until the release of DPDK 20.11.7.
>>> It should be noted that DPDK 20.11.4 does not benefit from the numerous bug 
>>> fixes addressed since its release.
>>> If a user wishes to benefit from these fixes it is recommended to use DPDK 
>>> 20.11.6.
>>>
>>> [1] 
>>> https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/
>>> Signed-off-by: Michael Phelan 
>>>
>>
>> For branches 2.15 [0] and 2.16 [1] I ran github actions and it failed.
>> For 2.16 branch I removed the patch and it passed [2]. It seems like
>> that the meson used (0.47.1 - which is min version for 20.11) does not
>> like the 20.11.5/6 package, or there is some other github effect. It is
>> working fine with 20.11.4.
>>
>> Afterwards, checking the ovs-build mailing [4] list I also see failures
>> here and an additional failure for 2.17 branch. So all these failures
>> need to checked.
>>
>> [1] https://github.com/kevintraynor/ovs/actions/runs/3111862351
>> [2] https://github.com/kevintraynor/ovs/actions/runs/3111865180
>> [3] https://github.com/kevintraynor/ovs/actions/runs/3112089634
>> [4]
>> https://mail.openvswitch.org/pipermail/ovs-build/2022-September/date.html
> 
> This looks like a regression in 20.11 LTS with older meson.
> Adding 20.11 LTS maintainers to the thread.

I'm guessing that this regression will not be fixed until the next
series of DPDK stable releases.  And since we're testing OVS here,
not DPDK, we may just choose a better version of meson.  Would be
also nice to have one that works with python 3.10+, so we can actually
use more recent versions of python without capping it at 3.9.

Any suggestions for a version to use?

Best regards, Ilya Maximets.

P.S. I'll be out for a week, but it would be great if we can finally
 get some working solution for all branches and release a set of
 OVS stable versions after that.

> 
> Afaics, this is triggered by "build: fix warnings when running
> external commands".
> And reverting it is enough to fix the error with meson 0.47.1.
> https://github.com/david-marchand/dpdk/commits/20.11
> https://github.com/david-marchand/dpdk/actions/runs/3113099408
> 
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] daemon-unix: Fix file descriptor leak when monitor restarts child

2022-09-26 Thread Ilya Maximets
On 9/14/22 08:19, Fengqi Li wrote:
> When segmentation fault occured in ovn-northd, monitor will try to
> restart the ovn-northd daemon process every 10s.
> Assume the following scenarios: There is a segmentation fault and
> the ovn-northd daemon process doen not restart properly everytime.
> Nws fds are created each time the ovn-northd daemon process is
> restarted by the monitor process, but old fds(fd[1]) ownered by
> the monitor process was not closed properly. One pipe leak for
> each restart of the ovn-northd daemon process. After a long time
> the OS's pipe was exhausted.

Hi, Fengqi Li.  Thanks for the patch!

> My ovs version is 2.13.2 and I checked the master branch.It also
> have the same problem.

Please add the following Fixes tag instead of the line above:

Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.")

> 
> Signed-off-by: Fengqi Li 
> ---
>  lib/daemon-unix.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> index 52f3d4bc6..3e595687f 100644
> --- a/lib/daemon-unix.c
> +++ b/lib/daemon-unix.c
> @@ -412,6 +412,7 @@ monitor_daemon(pid_t daemon_pid)
>  }
>  last_restart = time(NULL);
>  
> +close(daemonize_fd);

I would move this higher as we don't need to hold this file descriptor
for 10 extra seconds.  We can close it right after log_received_backtrace.
And it's better to set it to -1 right after closing to make sure it will
not be used before it actually set.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC PATCH] netdev-dpdk: add mac based rxq steering option

2022-09-26 Thread thilakraj . sb
From: Thilak Raj Surendra Babu 

For SDN workloads that deal with Guest across NUMAs and Multiple Queues being
serviced by different PMD due to the high Capacity of the NIC,Packets
belonging to the same flow can land on different RXQs or an undesirable
RXQ(PMD in a different NUMA than the Guests NUMA)Both of these scenarios lead
to performance penalties.

Use the RTE FLOW API to steer packets based on the DST MAC of the VM towards a
desirable RXQ so that even if the guest VM has a single VIRT Queue two PMD
threads will not contend on the same spin lock and by steering the flow to the
right RXQ on the same NUMA as VM will also help in cutting down cross NUMA
traffic.

The feature can be enabled per queue per port by specifying the Queue number
to which the Flows matching mac addresses are re-directed.

Example:
ovs-vsctl set interface eno1
options:rxq-steer-params="0|aa:bb:cc:dd:ee:f7"

Default behavior is that while flows for which rules are put on are steered
towards right RXQ, Other flows which don't have rte_flow entry is steered can
potentially land on the same RXQ.

We can isolate the RXQ only to the flows configured on the RXQ with the below
command.

Example:
ovs-vsctl set interface eno1 options:rxq-isolate=true

I validated this with a setup where I have two PMDs on two different NUMA.
Launch a VM and use the MAC address of the VM to insert a flow to steer
packets towards the RXQ being serviced by the PMD on the same NUMA as the
guest VM and start a traffic generator and send multiple flows towards the
VM.

Without this programming, the flows can land on different PMDs on different
NUMA Leading to spin_lock contention on the virtqueue as well cross NUMA
data access for the PMD which is on a different NUMA than the guest.

notes:
This can be further expanded to offload flow matching any 5-tuple as
well to give isolation for important flows such as heart-beats which can
be destined to a VM or the host.

Signed-off-by: Thilak Raj Surendra Babu 
---
 Documentation/topics/dpdk/phy.rst |  41 
 lib/netdev-dpdk.c | 210 +-
 2 files changed, 250 insertions(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/phy.rst 
b/Documentation/topics/dpdk/phy.rst
index 937f4c4..e2629d1 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -467,3 +467,44 @@ Command to set interrupt mode for a specific interface::
 
 Command to set polling mode for a specific interface::
 $ ovs-vsctl set interface  options:dpdk-lsc-interrupt=false
+
+RXQ-Steering
+
+
+Typically one RXQ is not sufficient to handle workloads from a high-speed NIC,
+which leads to multiple RXQs.With more RXQ comes the challenge of Placing the
+flows on the right queue.
+
+Though RSS can provide some relief by placing different flows at different
+Queues,It does not provide the flexibility of placing the flow on the optimal
+Queue.
+
+Placing the flow on the right queue is important based on the observations in
+the below scenarios.
+
+1.VM memory is one NUMA and flows land on a RXQ which is handled by a different
+PMD,leading to memory access over NUMA.
+2.5tuple-based RSS placing two different flows towards the Same VM on two
+different PMD threads, leading to the two PMD threads contending on the same
+virtq spin lock.
+
+To gain finer control on the placement of guest flows, RXQ steering is
+introduced to place the right flow to the most optimal RXQ using flow
+director/rte_flow API's to mitigate some of the above points mentioned.
+
+To add an RXQ steering rule, Fill in the MAC address of the VM and Queue to
+which you want to re-direct the packet.Every time the below string value
+changes,all the existing rules are flushed and new rules are inserted::
+
+$ovs-vsctl set interface  \
+options:rxq-steer-params="|,|"
+
+To remove RXQ steering, set the value to be zero.All existing rules will be
+flushed::
+
+$ovs-vsctl set interface  \
+options:rxq-steer-params="0"
+
+To isolate the RXQ only to the configured flows, use rxq_isolate like below::
+
+$ovs-vsctl set interface eno1 options:rxq-isolate=true
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0dd6555..39dd800 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -156,6 +156,7 @@ typedef uint16_t dpdk_port_t;
| RTE_ETH_TX_OFFLOAD_UDP_CKSUM\
| RTE_ETH_TX_OFFLOAD_IPV4_CKSUM)
 
+#define RXQ_STEERING_FLOWS_MAX 10
 
 static const struct rte_eth_conf port_conf = {
 .rxmode = {
@@ -529,6 +530,14 @@ struct netdev_dpdk {
 
 /* VF configuration. */
 struct eth_addr requested_hwaddr;
+
+/* RXQ flow steering */
+char *requested_rxq_steer_params;
+char *rxq_steer_params;
+uint8_t flow_count;
+bool rxq_isolate;
+bool requested_rxq_isolate;
+struct rte_flow *steering_flows[RXQ_STEERING_FLOWS_MAX];
  

[ovs-dev] [PATCH v2] sparse: Add a guard for netinet/ip6.h header on FreeBSD.

2022-09-26 Thread Ilya Maximets
Same as arpa/inet.h, the netinet/ip6.h on FreeBSD requires
netinet/in.h to be included first.  So, adding a similar guard.

Also fixing one instance where this is not respected at the moment.

We do have FreeBSD CI these days, but it is still nice to have
a more clear error message.

Fixes: b2befd5bb2db ("sparse: Add guards to prevent FreeBSD-incompatible 
#include order.")
Signed-off-by: Ilya Maximets 
---

Version 2:
  - Switched the guard from sys/types.h to netinet/in.h since
struct ip6_addr is defined there.  Fixed one instance where
this is not respected at the moment.

 include/sparse/netinet/ip6.h | 4 
 lib/netdev-offload-dpdk.c| 1 +
 2 files changed, 5 insertions(+)

diff --git a/include/sparse/netinet/ip6.h b/include/sparse/netinet/ip6.h
index bfa637a46..b2b6f47d9 100644
--- a/include/sparse/netinet/ip6.h
+++ b/include/sparse/netinet/ip6.h
@@ -18,6 +18,10 @@
 #error "Use this header only with sparse.  It is not a correct implementation."
 #endif
 
+#ifndef NETINET_IN_H_INCLUDED
+#error "Must include  before  for FreeBSD support"
+#endif
+
 #ifndef __NETINET_IP6_SPARSE
 #define __NETINET_IP6_SPARSE 1
 
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index cceefbc50..80a64a6cc 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -17,6 +17,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.37.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] sparse: Add a guard for netinet/ip6.h header on FreeBSD.

2022-09-26 Thread Ilya Maximets
On 9/20/22 12:01, Finn, Emma wrote:
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Monday 19 September 2022 12:50
>> To: ovs-dev@openvswitch.org
>> Cc: Finn, Emma ; Ilya Maximets 
>> Subject: [PATCH] sparse: Add a guard for netinet/ip6.h header on FreeBSD.
>>
>> Same as netinet/in.h, the netinet/ip6.h on FreeBSD requires sys/types.h to be
>> included first.  So, adding a similar guard.
>>
>> We do have FreeBSD CI these days, but it is still nice to have a more clear 
>> error
>> message.
>>
>> Fixes: b2befd5bb2db ("sparse: Add guards to prevent FreeBSD-incompatible
>> #include order.")
>> Signed-off-by: Ilya Maximets 
>> ---
>>  include/sparse/netinet/ip6.h | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/include/sparse/netinet/ip6.h b/include/sparse/netinet/ip6.h 
>> index
>> bfa637a46..3beef3705 100644
>> --- a/include/sparse/netinet/ip6.h
>> +++ b/include/sparse/netinet/ip6.h
>> @@ -18,6 +18,10 @@
>>  #error "Use this header only with sparse.  It is not a correct 
>> implementation."
>>  #endif
>>
>> +#ifndef SYS_TYPES_H_INCLUDED
>> +#error "Must include  before  for FreeBSD
>> support"
>> +#endif
>> +
>>  #ifndef __NETINET_IP6_SPARSE
>>  #define __NETINET_IP6_SPARSE 1
>>
>> --
>> 2.37.3
> 
> The change looks good to me.
> 
> Acked-by: Emma Finn 
> 
> 

Hmm.  It looks like we actually need netinet/in.h here instead for
ip6 structures.   netinet/in.h by itself requires sys/types.h, so
that will cover basic types.

I'll send an updated patch.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 1/5] netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.

2022-09-26 Thread Ilya Maximets
On 9/22/22 14:56, Maxime Coquelin wrote:
> Hi Ilya,
> 
> On 1/17/22 19:01, Ilya Maximets wrote:
>> On 1/5/22 09:19, Maxime Coquelin wrote:
>>> Hash-based Tx steering feature will enable steering Tx
>>> packets on transmit queues based on their hashes. In order
>>> to test the feature, it is needed to be able to get the
>>> per-queue statistics for Vhost-user ports.
>>>
>>> This patch introduces "bytes", "packets" and "error"
>>> per-queue custom statistics for Vhost-user ports.
>>>
>>> Suggested-by David Marchand 
>>> Signed-off-by: Maxime Coquelin 
>>> Reviewed-by: David Marchand 
>>> ---
>>>   lib/netdev-dpdk.c | 147 +++---
>>>   1 file changed, 138 insertions(+), 9 deletions(-)
>>
>> Hi, Maxime.
>>
>> Thanks for the patch; and I really think that it's an important
>> feature for debugging performance issues in a real-world setups.
>>
>> However, it causes a performance drop by about 2-2.5% for me
>> with the VM-VM bidirectional traffic with 2 PMD threads.
>>
>> The reason is the existing stats_lock.  Unfortunately, in current
>> code, we're taking the same stats_lock on both rx and tx paths,
>> and since rx and tx are likely performed by different threads at
>> the same time, they are frequently locking each other.
>>
>> Under this circumstances even a slight increase of a critical
>> section causes a visible performance drop.
>>
>> One of the possible solutions might be to split the stats_lock
>> in two (one for rx stats and one for tx stats).  We also should
>> split or re-align to different cache lines rx and tx fields
>> of the generic struct netdev_stats, or count all the stats on the
>> per-queue basis.
>> Quick prototype of such a solution gives an extra 2-3% performance
>> boost over the current master and reduces the impact of extra
>> stats in this patch to a minimum.
>>
>> I'll polish and submit my prototype code sometime later.
>> For now, I think, we won't be able to accept this change for 2.17,
>> since some more development is needed to avoid regression.
> 
> I'm currently working on supporting the new Vhost per queue stats API in
> OVS. Have you posted the prototype you did? I cannot find it, and think
> it would be better to be applied before my series.

Hi.  I never actually posted it, but here is the commit:
  https://github.com/igsilya/ovs/commit/cc3b03a8d1eb613bc42c9dc7c491efc42206f824

It's fairly simple.  I'm not sure about modifying the
public 'netdev_stats' structure though.  It might be
better to keep 2 instances of that structure.  One for
rx and one for tx and keep them on separate cache lines
along with their locks.

Best regards, Ilya Maximets.

> 
> Thanks,
> Maxime
> 
>> There is also a memory leak in this code, but that can be easily
>> fixed:
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 0e1efefe3..f8680a058 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>   dev->vhost_id = NULL;
>>   rte_free(dev->vhost_rxq_enabled);
>>   +    free(dev->vhost_rxq_stats);
>> +    free(dev->vhost_txq_stats);
>> +
>>   common_destruct(dev);
>>     ovs_mutex_unlock(&dpdk_mutex);
>> ---
>>
>> Best regards, Ilya Maximets.
>>
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

2022-09-26 Thread Kees Cook
On Mon, Sep 26, 2022 at 03:50:43PM +0200, Vlastimil Babka wrote:
> On 9/23/22 22:28, Kees Cook wrote:
> > Round up allocations with kmalloc_size_roundup() so that mempool's use
> > of ksize() is always accurate and no special handling of the memory is
> > needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
> > 
> > Cc: Andrew Morton 
> > Cc: linux...@kvack.org
> > Signed-off-by: Kees Cook 
> > ---
> >   mm/mempool.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/mempool.c b/mm/mempool.c
> > index 96488b13a1ef..0f3107b28e6b 100644
> > --- a/mm/mempool.c
> > +++ b/mm/mempool.c
> > @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
> >*/
> >   void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
> >   {
> > -   size_t size = (size_t)pool_data;
> > +   size_t size = kmalloc_size_roundup((size_t)pool_data);
> 
> Hm it is kinda wasteful to call into kmalloc_size_roundup for every
> allocation that has the same input. We could do it just once in
> mempool_init_node() for adjusting pool->pool_data ?
> 
> But looking more closely, I wonder why poison_element() and
> kasan_unpoison_element() in mm/mempool.c even have to use ksize()/__ksize()
> and not just operate on the requested size (again, pool->pool_data). If no
> kmalloc mempool's users use ksize() to write beyond requested size, then we
> don't have to unpoison/poison that area either?

Yeah, I think that's a fair point. I will adjust this.

-- 
Kees Cook
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7] ovsdb idl: Add the support to specify the uuid for row insert.

2022-09-26 Thread 0-day Robot
Bleep bloop.  Greetings Numan Siddique, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 108 characters long (recommended limit is 79)
#119 FILE: lib/db-ctl-base.man:206:
.IP "[\fB\-\-id=(@\fIname\fR | \fIuuid\fR] \fBcreate\fR \fItable 
column\fR[\fB:\fIkey\fR]\fB=\fIvalue\fR..."

WARNING: Line is 174 characters long (recommended limit is 79)
#142 FILE: lib/db-ctl-base.xml:313:
[--id=(@name|uuid)] 
create table 
column[:key]=value...

Lines checked: 492, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v7] ovsdb idl: Add the support to specify the uuid for row insert.

2022-09-26 Thread numans
From: Numan Siddique 

ovsdb-server allows the OVSDB clients to specify the uuid for
the row inserts [1].  The C IDL client library is  missing this
feature.  This patch adds this support.

For each schema table, a new function is generated -
insert_persistent_uuid(txn, uuid) and the users
of IDL client library can make use of this function.

ovs-vsctl and other derivatives of ctl now supports the same
in the generic 'create' command with the option "--id=".

[1] - a529e3cd1f("ovsdb-server: Allow OVSDB clients to specify the UUID for 
inserted rows.:)

Signed-off-by: Numan Siddique 
Acked-by: Adrian Moreno 
Acked-by: Han Zhou 
---
v6 -> v7
---
  * Rebased to resolve conflicts.
 
v5 -> v6
---
  * Rebased to resolve conflicts.

v4 -> v5
---
  * Addressed review comments from Ilya.
 - Added NEWS item entry.

v3 -> v4
---
  * Added an entry in python/TODO.rst.

v2 -> v3

  * Addressed review comments from Han
  - Added test case for --id ctl option

v1 -> v2
-
  * Addressed review comments from Adrian Moreno
  * Added the support in generic 'create' command to specify the uuid in
--id option.


 NEWS |  2 +
 lib/db-ctl-base.c| 38 --
 lib/db-ctl-base.man  |  5 ++-
 lib/db-ctl-base.xml  |  6 ++-
 lib/ovsdb-idl-provider.h |  1 +
 lib/ovsdb-idl.c  | 85 +---
 lib/ovsdb-idl.h  |  3 ++
 ovsdb/ovsdb-idlc.in  | 15 +++
 python/TODO.rst  |  2 +
 tests/ovs-vsctl.at   | 25 
 tests/ovsdb-idl.at   | 27 +
 tests/test-ovsdb.c   | 59 
 12 files changed, 231 insertions(+), 37 deletions(-)

diff --git a/NEWS b/NEWS
index d5ec09813..d0c8e5f95 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,8 @@ Post-v3.0.0
- Windows:
  * Conntrack IPv6 fragment support.
 
+   - OVSDB-IDL:
+ * Add the support to specify the uuid for row insert.
 
 v3.0.0 - 15 Aug 2022
 
diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
index bc85e9921..856832a04 100644
--- a/lib/db-ctl-base.c
+++ b/lib/db-ctl-base.c
@@ -1731,29 +1731,43 @@ cmd_create(struct ctl_context *ctx)
 const struct ovsdb_idl_table_class *table;
 const struct ovsdb_idl_row *row;
 const struct uuid *uuid = NULL;
+bool persist_uuid = false;
+struct uuid uuid_;
 int i;
 
 ctx->error = get_table(table_name, &table);
 if (ctx->error) {
 return;
 }
+
 if (id) {
-struct ovsdb_symbol *symbol = NULL;
+if (uuid_from_string(&uuid_, id)) {
+uuid = &uuid_;
+persist_uuid = true;
+} else {
+struct ovsdb_symbol *symbol = NULL;
 
-ctx->error = create_symbol(ctx->symtab, id, &symbol, NULL);
-if (ctx->error) {
-return;
-}
-if (table->is_root) {
-/* This table is in the root set, meaning that rows created in it
- * won't disappear even if they are unreferenced, so disable
- * warnings about that by pretending that there is a reference. */
-symbol->strong_ref = true;
+ctx->error = create_symbol(ctx->symtab, id, &symbol, NULL);
+if (ctx->error) {
+return;
+}
+if (table->is_root) {
+/* This table is in the root set, meaning that rows created in
+ * it won't disappear even if they are unreferenced, so disable
+ * warnings about that by pretending that there is a
+ * reference. */
+symbol->strong_ref = true;
+}
+uuid = &symbol->uuid;
 }
-uuid = &symbol->uuid;
 }
 
-row = ovsdb_idl_txn_insert(ctx->txn, table, uuid);
+if (persist_uuid) {
+row = ovsdb_idl_txn_insert_persist_uuid(ctx->txn, table, uuid);
+} else {
+row = ovsdb_idl_txn_insert(ctx->txn, table, uuid);
+}
+
 for (i = 2; i < ctx->argc; i++) {
 ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab);
 if (ctx->error) {
diff --git a/lib/db-ctl-base.man b/lib/db-ctl-base.man
index a529d8b4d..c8111c9ef 100644
--- a/lib/db-ctl-base.man
+++ b/lib/db-ctl-base.man
@@ -203,7 +203,7 @@ Without \fB\-\-if-exists\fR, it is an error if \fIrecord\fR 
does not
 exist.  With \fB\-\-if-exists\fR, this command does nothing if
 \fIrecord\fR does not exist.
 .
-.IP "[\fB\-\-id=@\fIname\fR] \fBcreate\fR \fItable 
column\fR[\fB:\fIkey\fR]\fB=\fIvalue\fR..."
+.IP "[\fB\-\-id=(@\fIname\fR | \fIuuid\fR] \fBcreate\fR \fItable 
column\fR[\fB:\fIkey\fR]\fB=\fIvalue\fR..."
 Creates a new record in \fItable\fR and sets the initial values of
 each \fIcolumn\fR.  Columns not explicitly set will receive their
 default values.  Outputs the UUID of the new row.
@@ -212,6 +212,9 @@ If \fB@\fIname\fR is specified, then the UUID for the new 
row may be
 referred to by that name elsewhere in the same \fB\*(PN\fR
 invocation in contexts 

Re: [ovs-dev] [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

2022-09-26 Thread Kees Cook
On Mon, Sep 26, 2022 at 03:15:22PM +0200, Vlastimil Babka wrote:
> On 9/23/22 22:28, Kees Cook wrote:
> > In the effort to help the compiler reason about buffer sizes, the
> > __alloc_size attribute was added to allocators. This improves the scope
> > of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> > future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> > as the vast majority of callers are not expecting to use more memory
> > than what they asked for.
> > 
> > There is, however, one common exception to this: anticipatory resizing
> > of kmalloc allocations. These cases all use ksize() to determine the
> > actual bucket size of a given allocation (e.g. 128 when 126 was asked
> > for). This comes in two styles in the kernel:
> > 
> > 1) An allocation has been determined to be too small, and needs to be
> > resized. Instead of the caller choosing its own next best size, it
> > wants to minimize the number of calls to krealloc(), so it just uses
> > ksize() plus some additional bytes, forcing the realloc into the next
> > bucket size, from which it can learn how large it is now. For example:
> > 
> > data = krealloc(data, ksize(data) + 1, gfp);
> > data_len = ksize(data);
> > 
> > 2) The minimum size of an allocation is calculated, but since it may
> > grow in the future, just use all the space available in the chosen
> > bucket immediately, to avoid needing to reallocate later. A good
> > example of this is skbuff's allocators:
> > 
> > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> > ...
> > /* kmalloc(size) might give us more room than requested.
> >  * Put skb_shared_info exactly at the end of allocated zone,
> >  * to allow max possible filling before reallocation.
> >  */
> > osize = ksize(data);
> >  size = SKB_WITH_OVERHEAD(osize);
> > 
> > In both cases, the "how much was actually allocated?" question is answered
> > _after_ the allocation, where the compiler hinting is not in an easy place
> > to make the association any more. This mismatch between the compiler's
> > view of the buffer length and the code's intention about how much it is
> > going to actually use has already caused problems[1]. It is possible to
> > fix this by reordering the use of the "actual size" information.
> > 
> > We can serve the needs of users of ksize() and still have accurate buffer
> > length hinting for the compiler by doing the bucket size calculation
> > _before_ the allocation. Code can instead ask "how large an allocation
> > would I get for a given size?".
> > 
> > Introduce kmalloc_size_roundup(), to serve this function so we can start
> > replacing the "anticipatory resizing" uses of ksize().
> > 
> > [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> >  https://github.com/KSPP/linux/issues/183
> > 
> > Cc: Vlastimil Babka 
> > Cc: Christoph Lameter 
> > Cc: Pekka Enberg 
> > Cc: David Rientjes 
> > Cc: Joonsoo Kim 
> > Cc: Andrew Morton 
> > Cc: linux...@kvack.org
> > Signed-off-by: Kees Cook 
> 
> OK, added patch 1+2 to slab.git for-next branch.
> Had to adjust this one a bit, see below.
> 
> > ---
> >   include/linux/slab.h | 31 +++
> >   mm/slab.c|  9 ++---
> >   mm/slab_common.c | 20 
> >   3 files changed, 57 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 41bd036e7551..727640173568 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t 
> > new_size, gfp_t flags) __r
> >   void kfree(const void *objp);
> >   void kfree_sensitive(const void *objp);
> >   size_t __ksize(const void *objp);
> > +
> > +/**
> > + * ksize - Report actual allocation size of associated object
> > + *
> > + * @objp: Pointer returned from a prior kmalloc()-family allocation.
> > + *
> > + * This should not be used for writing beyond the originally requested
> > + * allocation size. Either use krealloc() or round up the allocation size
> > + * with kmalloc_size_roundup() prior to allocation. If this is used to
> > + * access beyond the originally requested allocation size, UBSAN_BOUNDS
> > + * and/or FORTIFY_SOURCE may trip, since they only know about the
> > + * originally allocated size via the __alloc_size attribute.
> > + */
> >   size_t ksize(const void *objp);
> > +
> >   #ifdef CONFIG_PRINTK
> >   bool kmem_valid_obj(void *object);
> >   void kmem_dump_obj(void *object);
> > @@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
> >   extern void kvfree_sensitive(const void *addr, size_t len);
> >   unsigned int kmem_cache_size(struct kmem_cache *s);
> > +
> > +/**
> > + * kmalloc_size_roundup - Report allocation bucket size for the given size
> > + *
> > + * @size: Number of bytes to round up from.
> > + *
> > + * This returns the number of bytes that would be availab

[ovs-dev] [PATCH ovn v2 3/3] northd: add drop sampling

2022-09-26 Thread Adrian Moreno
Two new options are added to NB_Global table that enable drop
sampling by specifying the collector_set_id and the obs_domain_id of
the sample actions added to all drop flows.

For drops coming from an lflow, the sample has the following fields:
- obs_domain_id (32-bit): obs_domain_id << 8 | tunnel_key
  - 8 most significant bits: the obs_domain_id specified in the
NB_Global options.
  - 24 least significant bits: the tunnel_key.
- obs_point_id: the cookie (first 32-bits of the lflow's UUID).

For drops that are inserted by ovn-controller without any associated
lflow, the sample will have the follwing fields:
- obs_domain_id (32-bit): obs_domain_id << 8
  - 8 most significant bits: the obs_domain_id specified in the
NB_Global options.
  - 24 least significant bits: 0.
- obs_point_id: The table number.

Signed-off-by: Adrian Moreno 
---
 NEWS|  2 +
 controller/ovn-controller.c | 31 ++---
 controller/physical.c   | 34 --
 controller/physical.h   |  8 +++-
 northd/debug.c  | 92 +++--
 northd/debug.h  | 10 
 northd/northd.c | 73 +++--
 ovn-nb.xml  | 24 ++
 tests/ovn.at| 65 --
 9 files changed, 263 insertions(+), 76 deletions(-)

diff --git a/NEWS b/NEWS
index 224a7b83e..c01774499 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,7 @@
 Post v22.09.0
 -
+  - ovn-northd: Add configuration knobs to make drops explicit and
+optionally sample them (using OVS's per-flow IPFIX sampling).
 
 OVN v22.09.0 - 16 Sep 2022
 --
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index cc3bea64b..149199912 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -3119,7 +3119,7 @@ struct ed_type_pflow_output {
 /* Desired physical flows. */
 struct ovn_desired_flow_table flow_table;
 /* Drop debugging options. */
-bool debug_drop;
+struct physical_debug debug;
 };
 
 static void init_physical_ctx(struct engine_node *node,
@@ -3196,8 +3196,15 @@ static void init_physical_ctx(struct engine_node *node,
 p_ctx->local_bindings = &rt_data->lbinding_data.bindings;
 p_ctx->patch_ofports = &non_vif_data->patch_ofports;
 p_ctx->chassis_tunnels = &non_vif_data->chassis_tunnels;
-p_ctx->debug_drop = smap_get_bool(&sb_global->options,
+p_ctx->debug.enabled = smap_get_bool(&sb_global->options,
   "debug_drop_mode", false);
+p_ctx->debug.collector_set_id = smap_get_uint(&sb_global->options,
+  "debug_drop_collector_set",
+  0);
+
+p_ctx->debug.obs_domain_id = smap_get_uint(&sb_global->options,
+   "debug_drop_domain_id",
+   0);
 }
 
 static void *
@@ -3411,12 +3418,22 @@ pflow_output_sb_sb_global_handler(struct engine_node 
*node, void *data)
 
 struct ed_type_pflow_output *pfo = data;
 
-bool debug_drop = smap_get_bool(&sb_global->options,
-"debug_drop_mode", false);
-
-if (pfo->debug_drop != debug_drop) {
+bool debug_enabled = smap_get_bool(&sb_global->options,
+   "debug_drop_mode", false);
+uint32_t collector_set_id = smap_get_uint(&sb_global->options,
+  "debug_drop_collector_set",
+  0);
+uint32_t obs_domain_id = smap_get_uint(&sb_global->options,
+   "debug_drop_domain_id",
+   0);
+
+if (pfo->debug.enabled != debug_enabled ||
+pfo->debug.collector_set_id != collector_set_id ||
+pfo->debug.obs_domain_id != obs_domain_id) {
 engine_set_node_state(node, EN_UPDATED);
-pfo->debug_drop = debug_drop;
+pfo->debug.enabled = debug_enabled;
+pfo->debug.collector_set_id = collector_set_id;
+pfo->debug.obs_domain_id = obs_domain_id;
 }
 return true;
 }
diff --git a/controller/physical.c b/controller/physical.c
index e86d0297c..ee8fd9200 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -825,25 +825,44 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct 
ofpbuf *ofpacts_p)
 }
 }
 
+static void
+put_drop(const struct physical_debug *debug, uint8_t table_id,
+ struct ofpbuf *ofpacts)
+{
+if (debug->collector_set_id) {
+struct ofpact_sample *os = ofpact_put_SAMPLE(ofpacts);
+os->probability = UINT16_MAX;
+os->collector_set_id = debug->collector_set_id;
+os->obs_domain_id = (debug->obs_domain_id << 24);
+os->obs_point_id = table_id;
+}
+}
+
 static void
 add_default_drop_flow(const struct physi

[ovs-dev] [PATCH ovn v2 2/3] northd: add drop-debug-mode to add explicit drops

2022-09-26 Thread Adrian Moreno
Add a new config flag called "drop-debug-mode" that makes northd add an
explicit default drop to all tables that currently do not have a default
(prio=0, match=1) lflow.

In the controller side, also add explicit default drop rules on physical
tables that need it.

When this mode is enabled the explicit drop actions  make it easier to
debug when OVN is dropping a packet.

Signed-off-by: Adrian Moreno 
---
 controller/ovn-controller.c |  33 
 controller/physical.c   |  48 +++
 controller/physical.h   |   1 +
 northd/automake.mk  |   2 +
 northd/debug.c  |  23 ++
 northd/debug.h  |  31 +++
 northd/northd.c |  44 +-
 ovn-nb.xml  |   8 ++
 tests/ovn-northd.at |  77 +-
 tests/ovn.at| 158 +++-
 10 files changed, 418 insertions(+), 7 deletions(-)
 create mode 100644 northd/debug.c
 create mode 100644 northd/debug.h

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 43fbf2ba3..cc3bea64b 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -3118,6 +3118,8 @@ lflow_output_sb_meter_handler(struct engine_node *node, 
void *data)
 struct ed_type_pflow_output {
 /* Desired physical flows. */
 struct ovn_desired_flow_table flow_table;
+/* Drop debugging options. */
+bool debug_drop;
 };
 
 static void init_physical_ctx(struct engine_node *node,
@@ -3167,6 +3169,12 @@ static void init_physical_ctx(struct engine_node *node,
 chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
 }
 
+struct sbrec_sb_global_table *sb_global_table =
+(struct sbrec_sb_global_table *)EN_OVSDB_GET(
+engine_get_input("SB_sb_global", node));
+const struct sbrec_sb_global *sb_global =
+sbrec_sb_global_table_first(sb_global_table);
+
 ovs_assert(br_int && chassis);
 
 struct ed_type_ct_zones *ct_zones_data =
@@ -3188,6 +3196,8 @@ static void init_physical_ctx(struct engine_node *node,
 p_ctx->local_bindings = &rt_data->lbinding_data.bindings;
 p_ctx->patch_ofports = &non_vif_data->patch_ofports;
 p_ctx->chassis_tunnels = &non_vif_data->chassis_tunnels;
+p_ctx->debug_drop = smap_get_bool(&sb_global->options,
+  "debug_drop_mode", false);
 }
 
 static void *
@@ -3390,6 +3400,27 @@ pflow_output_activated_ports_handler(struct engine_node 
*node, void *data)
 return true;
 }
 
+static bool
+pflow_output_sb_sb_global_handler(struct engine_node *node, void *data)
+{
+struct sbrec_sb_global_table *sb_global_table =
+(struct sbrec_sb_global_table *)EN_OVSDB_GET(
+engine_get_input("SB_sb_global", node));
+const struct sbrec_sb_global *sb_global =
+sbrec_sb_global_table_first(sb_global_table);
+
+struct ed_type_pflow_output *pfo = data;
+
+bool debug_drop = smap_get_bool(&sb_global->options,
+"debug_drop_mode", false);
+
+if (pfo->debug_drop != debug_drop) {
+engine_set_node_state(node, EN_UPDATED);
+pfo->debug_drop = debug_drop;
+}
+return true;
+}
+
 static void *
 en_flow_output_init(struct engine_node *node OVS_UNUSED,
 struct engine_arg *arg OVS_UNUSED)
@@ -3732,6 +3763,8 @@ main(int argc, char *argv[])
 engine_add_input(&en_pflow_output, &en_mff_ovn_geneve, NULL);
 engine_add_input(&en_pflow_output, &en_ovs_open_vswitch, NULL);
 engine_add_input(&en_pflow_output, &en_ovs_bridge, NULL);
+engine_add_input(&en_pflow_output, &en_sb_sb_global,
+ pflow_output_sb_sb_global_handler);
 
 engine_add_input(&en_northd_options, &en_sb_sb_global,
  en_northd_options_sb_sb_global_handler);
diff --git a/controller/physical.c b/controller/physical.c
index f3c8bddce..e86d0297c 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -825,6 +825,20 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct 
ofpbuf *ofpacts_p)
 }
 }
 
+static void
+add_default_drop_flow(const struct physical_ctx *p_ctx,
+  uint8_t table_id,
+  struct ovn_desired_flow_table *flow_table)
+{
+if (p_ctx->debug_drop) {
+struct match match = MATCH_CATCHALL_INITIALIZER;
+struct ofpbuf ofpacts;
+ofpbuf_init(&ofpacts, 0);
+ofctrl_add_flow(flow_table, table_id, 0, 0, &match,
+&ofpacts, hc_uuid);
+}
+}
+
 static void
 put_local_common_flows(uint32_t dp_key,
const struct sbrec_port_binding *pb,
@@ -2106,6 +2120,13 @@ physical_run(struct physical_ctx *p_ctx,
 }
 }
 
+/* Table 0, priority 0.
+ * ==
+ *
+ * Drop packets tha do not match any tunnel in_port.
+ */
+add_default_drop_flow(p_ctx, OFTABLE_PHY_TO_LOG, flow_table);
+
 /* Table 37, priority 150.
  * 

[ovs-dev] [PATCH ovn v2 1/3] actions: add sample action

2022-09-26 Thread Adrian Moreno
sample ovn action encodes into the OFPACT_SAMPLE ovs action.

OVN action allows the following parameters:

- obs_domain_id: 8-bit integer that identifies the sampling application.
  This value will be combined with the datapath's tunnel_id to form the
  final observation_domain_id that will be used in the OVS action.

- obs_point_id: a 32-bit integer or the $cookie macro that will be
  expanded into the first 32 bits of the lflow's UUID.

- probability: a 16-bit integer that specifies the sampling probability.
  Specifying 0 has no effect and 65535 means sampling all packets.

Signed-off-by: Adrian Moreno 
---
 controller/lflow.c|   1 +
 include/ovn/actions.h |  16 ++
 lib/actions.c | 120 ++
 tests/ovn.at  |  25 +
 tests/test-ovn.c  |   3 ++
 utilities/ovn-trace.c |   2 +
 6 files changed, 167 insertions(+)

diff --git a/controller/lflow.c b/controller/lflow.c
index eef44389f..cbda6cdfb 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1163,6 +1163,7 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 .group_table = l_ctx_out->group_table,
 .meter_table = l_ctx_out->meter_table,
 .lflow_uuid = lflow->header_.uuid,
+.tunnel_key = ldp->datapath->tunnel_key,
 
 .pipeline = ingress ? OVNACT_P_INGRESS : OVNACT_P_EGRESS,
 .ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE,
diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index d7ee84dac..c7f40cb7d 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -121,6 +121,7 @@ struct ovn_extend_table;
 OVNACT(COMMIT_ECMP_NH,ovnact_commit_ecmp_nh)  \
 OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)  \
 OVNACT(CHK_ECMP_NH,   ovnact_result)  \
+OVNACT(SAMPLE,ovnact_sample)  \
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -456,6 +457,18 @@ struct ovnact_lookup_fdb {
 struct expr_field dst; /* 1-bit destination field. */
 };
 
+/* OVNACT_SAMPLE */
+struct ovnact_sample {
+struct ovnact ovnact;
+uint16_t probability;   /* probability over UINT16_MAX. */
+uint8_t obs_domain_id;  /* most significant byte of the
+   observation domain id. The other 24 bits
+   will come from the datapath's tunnel key. */
+uint32_t collector_set_id;  /* colector_set_id. */
+uint32_t obs_point_id;  /* observation point id. */
+bool use_cookie;/* use cookie as obs_point_id */
+};
+
 /* OVNACT_COMMIT_ECMP_NH. */
 struct ovnact_commit_ecmp_nh {
 struct ovnact ovnact;
@@ -785,6 +798,9 @@ struct ovnact_encode_params {
 /* The logical flow uuid that drove this action. */
 struct uuid lflow_uuid;
 
+/* The tunnel key of the datapath. */
+uint32_t tunnel_key;
+
 /* OVN maps each logical flow table (ltable), one-to-one, onto a physical
  * OpenFlow flow table (ptable).  A number of parameters describe this
  * mapping and data related to flow tables:
diff --git a/lib/actions.c b/lib/actions.c
index adbb42db4..95772d9a8 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -4279,6 +4279,124 @@ encode_CHECK_OUT_PORT_SEC(const struct ovnact_result 
*dl,
MLF_CHECK_PORT_SEC_BIT, ofpacts);
 }
 
+static void
+format_SAMPLE(const struct ovnact_sample *sample, struct ds *s)
+{
+ds_put_format(s, "sample(probability=%"PRId16, sample->probability);
+
+ds_put_format(s, ",collector_set=%"PRId32, sample->collector_set_id);
+ds_put_format(s, ",obs_domain=%"PRId8, sample->obs_domain_id);
+if (sample->use_cookie) {
+ds_put_cstr(s, ",obs_point=$cookie");
+} else {
+ds_put_format(s, ",obs_point=%"PRId32, sample->obs_point_id);
+}
+ds_put_format(s, ");");
+}
+
+static void
+encode_SAMPLE(const struct ovnact_sample *sample,
+  const struct ovnact_encode_params *ep,
+  struct ofpbuf *ofpacts)
+{
+struct ofpact_sample *os = ofpact_put_SAMPLE(ofpacts);
+os->probability = sample->probability;
+os->collector_set_id = sample->collector_set_id;
+os->obs_domain_id =
+(sample->obs_domain_id << 24) | (ep->tunnel_key & 0xFF);
+
+if (sample->use_cookie) {
+os->obs_point_id = ep->lflow_uuid.parts[0];
+} else {
+os->obs_point_id = sample->obs_point_id;
+}
+os->sampling_port = OFPP_NONE;
+}
+
+static void
+parse_sample_arg(struct action_context *ctx, struct ovnact_sample *sample)
+{
+if (lexer_match_id(ctx->lexer, "probability")) {
+if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) {
+return;
+}
+if (ctx->lexer->token.type == LEX_T_INTEGER
+&& ctx->lexer->token.format == LEX_F_DECIMAL) {
+if (!action_parse_uint16(ctx, &sample->probability,
+ "probability")) {
+ 

[ovs-dev] [PATCH ovn v2 0/3] Add ovn drop debugging

2022-09-26 Thread Adrian Moreno
Very often when troubleshooting networking issues in an OVN cluster one
would like to know if any packet (or a specific one) is being dropped by
OVN.

Currently, this cannot be known because of two main reasons:

1 - Implicit drops: Some tables do not have a default action
(priority=0, match=1). In this case, a packet that does not match any
rule will be silently dropped.

2 - Even on explicit drops, we only know a packet was dropped. We lack
information about that packet.

In order to improve this, this series introduces a two-fold solution:

- First, create a debug-mode option. When enabled, it makes:
   - northd add a default (match = "1") "drop;" action to those tables
   that currently lack one.
   - ovn-controller add an explicit drop action on those tables are not
   associated with logical flows (i.e: physical-to-logical mappings).

- Secondly, allow sampling of all drops. By introducing a new OVN
  action: "sample" (equivalent to OVS's), OVN can make OVS sample the
  packets as they are dropped. In order to be able to correlate those
  samples back to what exact rule generated them, the user specifies the
  a 8-bit observation_domain_id. Based on that, the samples contain
  the following fields:
  - obs_domain_id:
 - 8 most significant bits = the provided observation_domain_id.
 - 24 least significant bits = the datapath's tunnely key if the
   drop comes from a lflow or zero otherwise.
  - obs_point_id: the first 32-bits of the lflow's UUID (i.e: the
cookie) if the drop comes from an lflow or the table number
otherwise.

Based on the above changes in the flows, all of which are optional,
users can collect IPFIX samples of the packets that are dropped by OVN
which contain header information useful for debugging.

* Note on observation_domain_ids:
By allowing the user to specify only the 8 most significant bits of the
obs_domain_id and having OVN combine it with the datapath's tunnel key,
OVN could be extended to support more than one "sampling" application.
For instance, ACL sampling could be developed in the future and, by
specifying a different observation_domain_id, it could co-exist with the
drop sampling mode implemented in the current series while still
allowing to uniquely identify the flow that created the sample.

* Notes on testing and usage:
Any IPFIX collector that parses ObservationPointID and
ObservationDomainID fields can be used. For instance, nfdump supports
these fields in its unicorn branch [1] (future nfdump 1.7). Example of
how to capture and analyze drops:
# Enable debug sampling:
$ ovn-nbctl set NB_Global . options:debug_drop_mode=true
options:debug_drop_collector_set=1 options:debug_drop_domain_id=1
# Start nfcapd:
nfcapd -p 2055 -l nfcap &
# Configue sampling on the OVS you want to inspect:
$ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX
targets=\"172.18.0.1:2055\" --  create Flow_Sample_Collector_Set
bridge=@br id=1
# Inspect samples and figure out what LogicalFlow caused them:
$ nfdump -r nfcap -o fmt:'%line %odid %opid'
Date first seen Duration Proto  Src IP Addr:Port
Dst IP Addr:Port   PacketsBytes Flows obsDomainID   obsPointID
1970-01-01 01:09:36.000 00:00:00.000 UDP 172.18.0.1:49230 ->
239.255.255.250:190012 6356 1 0x00109 0x00d8dd23c7
1970-01-01 01:01:34.000 00:00:00.000 UDP 172.18.0.1:5353  ->
224.0.0.251:5353   16589257 1 0x00109 0x00d8dd23c7
[...]
$ ovn-sb vn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
_uuid   : d8dd23c7-1451-4ea3-add7-8d68b4be4691
actions :
"sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie);
/* drop */"
controller_meter: []
external_ids: {source="northd.c:12504",
stage-name=lr_in_ip_input}
logical_datapath: []
logical_dp_group: 0dc1b195-c647-4277-aea0-0bad5e896f51
match   : "ip4.mcast || ip6.mcast"
pipeline: ingress
priority: 82
table_id: 3
tags: {}
hash: 0


[1] https://github.com/phaag/nfdump/tree/unicorn

V2 -> V1
- Rebased and Addressed Mark's comments.
- Added NEWS section.


Adrian Moreno (3):
  actions: add sample action
  northd: add drop-debug-mode to add explicit drops
  northd: add drop sampling

 NEWS|   2 +
 controller/lflow.c  |   1 +
 controller/ovn-controller.c |  50 +
 controller/physical.c   |  80 ++-
 controller/physical.h   |   7 ++
 include/ovn/actions.h   |  16 +++
 lib/actions.c   | 120 ++
 northd/automake.mk  |   2 +
 northd/debug.c  | 107 +++
 northd/debug.h  |  41 
 northd/northd.c | 115 ++---
 ovn-nb.xml  |  32 ++
 tests/ovn-northd.at |  77 +-
 tests/ovn.at| 200 +++-
 tests/test-ovn.c

Re: [ovs-dev] [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size

2022-09-26 Thread Ruhl, Michael J
>-Original Message-
>From: Kees Cook 
>Sent: Friday, September 23, 2022 4:28 PM
>To: Vlastimil Babka 
>Cc: Kees Cook ; Brandeburg, Jesse
>; Nguyen, Anthony L
>; David S. Miller ;
>Eric Dumazet ; Jakub Kicinski ;
>Paolo Abeni ; intel-wired-...@lists.osuosl.org;
>net...@vger.kernel.org; Ruhl, Michael J ;
>Hyeonggon Yoo <42.hye...@gmail.com>; Christoph Lameter
>; Pekka Enberg ; David Rientjes
>; Joonsoo Kim ; Andrew
>Morton ; Greg Kroah-Hartman
>; Nick Desaulniers
>; Alex Elder ; Josef Bacik
>; David Sterba ; Sumit Semwal
>; Christian König ;
>Daniel Micay ; Yonghong Song ;
>Marco Elver ; Miguel Ojeda ; linux-
>ker...@vger.kernel.org; linux...@kvack.org; linux-bt...@vger.kernel.org;
>linux-me...@vger.kernel.org; dri-de...@lists.freedesktop.org; linaro-mm-
>s...@lists.linaro.org; linux-fsde...@vger.kernel.org; d...@openvswitch.org;
>x...@kernel.org; l...@lists.linux.dev; linux-harden...@vger.kernel.org
>Subject: [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size
>
>In preparation for removing the "silently change allocation size"
>users of ksize(), explicitly round up all q_vector allocations so that
>allocations can be correctly compared to ksize().
>
>Additionally fix potential use-after-free in the case of new allocation
>failure: only free memory if the replacement allocation succeeds.
>
>Cc: Jesse Brandeburg 
>Cc: Tony Nguyen 
>Cc: "David S. Miller" 
>Cc: Eric Dumazet 
>Cc: Jakub Kicinski 
>Cc: Paolo Abeni 
>Cc: intel-wired-...@lists.osuosl.org
>Cc: net...@vger.kernel.org
>Signed-off-by: Kees Cook 
>---
> drivers/net/ethernet/intel/igb/igb_main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
>b/drivers/net/ethernet/intel/igb/igb_main.c
>index 2796e81d2726..eb51e531c096 100644
>--- a/drivers/net/ethernet/intel/igb/igb_main.c
>+++ b/drivers/net/ethernet/intel/igb/igb_main.c
>@@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter
>*adapter,
>   return -ENOMEM;
>
>   ring_count = txr_count + rxr_count;
>-  size = struct_size(q_vector, ring, ring_count);
>+  size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));

This looks good to me...

>   /* allocate q_vector and rings */
>   q_vector = adapter->q_vector[v_idx];
>   if (!q_vector) {
>   q_vector = kzalloc(size, GFP_KERNEL);
>   } else if (size > ksize(q_vector)) {
>-  kfree_rcu(q_vector, rcu);
>   q_vector = kzalloc(size, GFP_KERNEL);
>+  if (q_vector)
>+  kfree_rcu(q_vector, rcu);

Even though this is in the ksize part, this seems like an unrelated change?
 Should this be in a different patch?

Also, the kfree_rcu will free q_vector after the RCU grace period?

Is that what you want to do?

How does rcu distinguish between the original q_vector, and the newly kzalloced 
one?

Thanks,

Mike



>   } else {
>   memset(q_vector, 0, size);
>   }
>--
>2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH dpdk-latest] netdev-dpdk: Report device bus specific information.

2022-09-26 Thread Pai G, Sunil
> -Original Message-
> From: dev  On Behalf Of David Marchand
> Sent: Monday, September 26, 2022 2:37 PM
> To: d...@openvswitch.org
> Cc: maxime.coque...@redhat.com
> Subject: [ovs-dev] [PATCH dpdk-latest] netdev-dpdk: Report device bus
> specific information.
> 
> 22.11 dropped direct access to bus specific structures.
> Instead, a new API reports bus specific information.
> 
> Report bus name and device bus specific information.
> 
> The difference looks like:
> -driver_name=mlx5_pci, if_descr="DPDK 21.11.0 mlx5_pci"
> +driver_name=mlx5_pci, if_descr="DPDK 22.11.0-rc0 mlx5_pci"
> 
> -pci-device_id="0x1019"
> -pci-vendor_id="0x15b3"
> +bus_info="bus_name=pci, vendor_id=15b3, device_id=1019"
> 
> Signed-off-by: David Marchand 
> ---
>  lib/netdev-dpdk.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 

Thanks for the patch David.

Everything LGTM,
Acked-by: Sunil Pai G 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

2022-09-26 Thread Vlastimil Babka

On 9/23/22 22:28, Kees Cook wrote:

Round up allocations with kmalloc_size_roundup() so that mempool's use
of ksize() is always accurate and no special handling of the memory is
needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
  mm/mempool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 96488b13a1ef..0f3107b28e6b 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
   */
  void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
  {
-   size_t size = (size_t)pool_data;
+   size_t size = kmalloc_size_roundup((size_t)pool_data);


Hm it is kinda wasteful to call into kmalloc_size_roundup for every 
allocation that has the same input. We could do it just once in 
mempool_init_node() for adjusting pool->pool_data ?


But looking more closely, I wonder why poison_element() and 
kasan_unpoison_element() in mm/mempool.c even have to use 
ksize()/__ksize() and not just operate on the requested size (again, 
pool->pool_data). If no kmalloc mempool's users use ksize() to write 
beyond requested size, then we don't have to unpoison/poison that area 
either?



return kmalloc(size, gfp_mask);
  }
  EXPORT_SYMBOL(mempool_kmalloc);


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [v3] odp-execute: Add ISA implementation of set_masked IPv6 action

2022-09-26 Thread Emma Finn
This commit adds support for the AVX512 implementation of the
ipv6_set_addrs action as well as an AVX512 implementation of
updating the L4 checksums.

Signed-off-by: Emma Finn 

---
v3:
  - Added a runtime check for AVX512 vbmi.
v2:
  - Added check for availbility of s6_addr32 field of struct in6_addr.
  - Fixed network headers for freebsd builds.
---
---
 lib/odp-execute-avx512.c  | 176 ++
 lib/odp-execute-private.c |  17 
 lib/odp-execute-private.h |   1 +
 3 files changed, 194 insertions(+)

diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 6c7713251..f97b3c2f7 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -20,6 +20,9 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "csum.h"
 #include "dp-packet.h"
@@ -483,6 +486,172 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch 
*batch,
 }
 }
 
+#if HAVE_AVX512VBMI
+static inline uint16_t ALWAYS_INLINE
+__attribute__((__target__("avx512vbmi")))
+avx512_ipv6_get_delta(__m512i ip6_header)
+{
+__m256i v_zeros = _mm256_setzero_si256();
+__m512i v_shuf_src_dst = _mm512_setr_epi64(0x01, 0x02, 0x03, 0x04,
+   0xFF, 0xFF, 0xFF, 0xFF);
+
+__m512i v_header = _mm512_permutexvar_epi64(v_shuf_src_dst, ip6_header);
+__m256i v_ip6_src_dst =  _mm512_extracti64x4_epi64(v_header, 0);
+/* These two shuffle masks, v_swap16a and v_swap16b, are to shuffle the
+ * src and dst fields and add padding after each 16-bit value for the
+ * following carry over addition. */
+__m256i v_swap16a = _mm256_setr_epi16(0x0100, 0x, 0x0302, 0x,
+  0x0504, 0x, 0x0706, 0x,
+  0x0100, 0x, 0x0302, 0x,
+  0x0504, 0x, 0x0706, 0x);
+__m256i v_swap16b = _mm256_setr_epi16(0x0908, 0x, 0x0B0A, 0x,
+  0x0D0C, 0x, 0x0F0E, 0x,
+  0x0908, 0x, 0x0B0A, 0x,
+  0x0D0C, 0x, 0x0F0E, 0x);
+__m256i v_shuf_old1 = _mm256_shuffle_epi8(v_ip6_src_dst, v_swap16a);
+__m256i v_shuf_old2 = _mm256_shuffle_epi8(v_ip6_src_dst, v_swap16b);
+
+/* Add each part of the old and new headers together. */
+__m256i v_delta = _mm256_add_epi32(v_shuf_old1, v_shuf_old2);
+
+/* Perform horizontal add to go from 8x32-bits to 2x32-bits. */
+v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+
+/* Shuffle 32-bit value from 3rd lane into first lane for final
+ * horizontal add. */
+__m256i v_swap32a = _mm256_setr_epi32(0x0, 0x4, 0xF, 0xF,
+  0xF, 0xF, 0xF, 0xF);
+v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
+
+v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+v_delta = _mm256_hadd_epi16(v_delta, v_zeros);
+
+/* Extract delta value. */
+return _mm256_extract_epi16(v_delta, 0);
+}
+
+static inline uint16_t ALWAYS_INLINE
+__attribute__((__target__("avx512vbmi")))
+avx512_ipv6_addr_csum_delta(__m512i old_header, __m512i new_header)
+{
+uint16_t delta;
+uint16_t old_delta = avx512_ipv6_get_delta(old_header);
+uint16_t new_delta = avx512_ipv6_get_delta(new_header);
+old_delta = ~old_delta;
+uint32_t csum_delta = old_delta + new_delta;
+delta = csum_finish(csum_delta);
+
+return ~delta;
+}
+
+/* This function performs the same operation on each packet in the batch as
+ * the scalar odp_set_ipv6() function. */
+static void
+__attribute__((__target__("avx512vbmi")))
+action_avx512_ipv6_set_addrs(struct dp_packet_batch *batch,
+ const struct nlattr *a)
+{
+const struct ovs_key_ipv6 *key, *mask;
+struct dp_packet *packet;
+a = nl_attr_get(a);
+key = nl_attr_get(a);
+mask = odp_get_key_mask(a, struct ovs_key_ipv6);
+
+/* Read the content of the key and mask in the respective registers. We
+ * only load the size of the actual structure, which is only 40 bytes. */
+__m512i v_key = _mm512_maskz_loadu_epi64(0x1F, (void *) key);
+__m512i v_mask = _mm512_maskz_loadu_epi64(0x1F, (void *) mask);
+
+/* This shuffle mask v_shuffle, is to shuffle key and mask to match the
+ * ip6_hdr structure layout. */
+static const uint8_t ip_shuffle_mask[64] = {
+0x20, 0x21, 0x22, 0x23, 0xFF, 0xFF, 0x24, 0x26,
+0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
+0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
+0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF, 0xFF,
+0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF
+   

Re: [ovs-dev] [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

2022-09-26 Thread Vlastimil Babka

On 9/23/22 22:28, Kees Cook wrote:

In the effort to help the compiler reason about buffer sizes, the
__alloc_size attribute was added to allocators. This improves the scope
of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
as the vast majority of callers are not expecting to use more memory
than what they asked for.

There is, however, one common exception to this: anticipatory resizing
of kmalloc allocations. These cases all use ksize() to determine the
actual bucket size of a given allocation (e.g. 128 when 126 was asked
for). This comes in two styles in the kernel:

1) An allocation has been determined to be too small, and needs to be
resized. Instead of the caller choosing its own next best size, it
wants to minimize the number of calls to krealloc(), so it just uses
ksize() plus some additional bytes, forcing the realloc into the next
bucket size, from which it can learn how large it is now. For example:

data = krealloc(data, ksize(data) + 1, gfp);
data_len = ksize(data);

2) The minimum size of an allocation is calculated, but since it may
grow in the future, just use all the space available in the chosen
bucket immediately, to avoid needing to reallocate later. A good
example of this is skbuff's allocators:

data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
...
/* kmalloc(size) might give us more room than requested.
 * Put skb_shared_info exactly at the end of allocated zone,
 * to allow max possible filling before reallocation.
 */
osize = ksize(data);
 size = SKB_WITH_OVERHEAD(osize);

In both cases, the "how much was actually allocated?" question is answered
_after_ the allocation, where the compiler hinting is not in an easy place
to make the association any more. This mismatch between the compiler's
view of the buffer length and the code's intention about how much it is
going to actually use has already caused problems[1]. It is possible to
fix this by reordering the use of the "actual size" information.

We can serve the needs of users of ksize() and still have accurate buffer
length hinting for the compiler by doing the bucket size calculation
_before_ the allocation. Code can instead ask "how large an allocation
would I get for a given size?".

Introduce kmalloc_size_roundup(), to serve this function so we can start
replacing the "anticipatory resizing" uses of ksize().

[1] https://github.com/ClangBuiltLinux/linux/issues/1599
 https://github.com/KSPP/linux/issues/183

Cc: Vlastimil Babka 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 


OK, added patch 1+2 to slab.git for-next branch.
Had to adjust this one a bit, see below.


---
  include/linux/slab.h | 31 +++
  mm/slab.c|  9 ++---
  mm/slab_common.c | 20 
  3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41bd036e7551..727640173568 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t 
new_size, gfp_t flags) __r
  void kfree(const void *objp);
  void kfree_sensitive(const void *objp);
  size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
  size_t ksize(const void *objp);
+
  #ifdef CONFIG_PRINTK
  bool kmem_valid_obj(void *object);
  void kmem_dump_obj(void *object);
@@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
  extern void kvfree_sensitive(const void *addr, size_t len);
  
  unsigned int kmem_cache_size(struct kmem_cache *s);

+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmal

Re: [ovs-dev] [Linaro-mm-sig] [PATCH v2 08/16] dma-buf: Proactively round up to kmalloc bucket size

2022-09-26 Thread Christian König

Am 23.09.22 um 22:28 schrieb Kees Cook:

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Kees Cook 


Reviewed-by: Christian König 


---
  drivers/dma-buf/dma-resv.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 205acb2c744d..5b0a4b8830ff 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list,
  static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences)
  {
struct dma_resv_list *list;
+   size_t size;
  
-	list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);

+   /* Round up to the next kmalloc bucket size. */
+   size = kmalloc_size_roundup(struct_size(list, table, max_fences));
+
+   list = kmalloc(size, GFP_KERNEL);
if (!list)
return NULL;
  
-	list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /

+   /* Given the resulting bucket size, recalculated max_fences. */
+   list->max_fences = (size - offsetof(typeof(*list), table)) /
sizeof(*list->table);
  
  	return list;


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-22.03] github: ovn-kubernetes: Update go, kube and libovsdb versions.

2022-09-26 Thread Dumitru Ceara
With this they'll match the current upstream ovn-kubernetes code.

(cherry picked from commit 4a5e20ee58cd012eb52a94ee1c97fe225e4e91f2)
Signed-off-by: Dumitru Ceara 
---
Backporting this patch to the LTS too.  Otherwise we can't run ovnkube
tests there.
---
 .ci/ovn-kubernetes/Dockerfile| 2 +-
 .github/workflows/ovn-kubernetes.yml | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.ci/ovn-kubernetes/Dockerfile b/.ci/ovn-kubernetes/Dockerfile
index 19662889e..e74b620be 100644
--- a/.ci/ovn-kubernetes/Dockerfile
+++ b/.ci/ovn-kubernetes/Dockerfile
@@ -37,7 +37,7 @@ RUN rm rpm/rpmbuild/RPMS/x86_64/*debug*
 RUN rm rpm/rpmbuild/RPMS/x86_64/*docker*
 
 # Build ovn-kubernetes
-FROM golang:1.17 as ovnkubebuilder
+FROM golang:1.18 as ovnkubebuilder
 ARG OVNKUBE_COMMIT
 ARG LIBOVSDB_COMMIT
 
diff --git a/.github/workflows/ovn-kubernetes.yml 
b/.github/workflows/ovn-kubernetes.yml
index c05bbd3f9..7de392e50 100644
--- a/.github/workflows/ovn-kubernetes.yml
+++ b/.github/workflows/ovn-kubernetes.yml
@@ -9,10 +9,10 @@ on:
   - cron: '0 0 * * 0'
 
 env:
-  GO_VERSION: "1.17.6"
-  K8S_VERSION: v1.23.3
+  GO_VERSION: "1.18.4"
+  K8S_VERSION: v1.24.0
   OVNKUBE_COMMIT: "master"
-  LIBOVSDB_COMMIT: "8081fe24e48f"
+  LIBOVSDB_COMMIT: "98c0bad3cff1"
   KIND_CLUSTER_NAME: ovn
   KIND_INSTALL_INGRESS: true
   KIND_ALLOW_SYSTEM_WRITES: true
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.03] northd: Do not report WARN for empty requested-chassis

2022-09-26 Thread Dumitru Ceara
On 9/20/22 22:08, Mark Michelson wrote:
> Thanks Ales.
> 
> Acked-by: Mark Michelson 
> 

Applied to branch-22.03, thanks!

> On 9/16/22 05:21, Ales Musil wrote:
>> Reported-at: https://bugzilla.redhat.com/2126400
>> Signed-off-by: Ales Musil 
>> ---
>> This patch should be applied only to 22.03 and below,
>> because it was fixed as side effect of RFE on 22.06, 22.09
>> and main [0].
>>
>> [0]
>> https://github.com/ovn-org/ovn/commit/eaf9832be248daca6d96202d504cc789d6dfec6d
>>
>> ---
>>   northd/northd.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 294a59bd7..aeb2da436 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -3318,7 +3318,7 @@ ovn_port_update_sbrec(struct northd_input
>> *input_data,
>>   bool reset_requested_chassis = false;
>>   requested_chassis = smap_get(&op->nbsp->options,
>>    "requested-chassis");
>> -    if (requested_chassis) {
>> +    if (requested_chassis && requested_chassis[0]) {
>>   const struct sbrec_chassis *chassis; /* May be NULL. */
>>   chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>>    requested_chassis);
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-22.03] Allow for setting the Next server IP in the DHCP header

2022-09-26 Thread Dumitru Ceara
On 9/16/22 18:06, Numan Siddique wrote:
> On Mon, Aug 22, 2022 at 3:46 PM Mark Michelson  wrote:
>>
>> From: Lucas Alvares Gomes 
>>
>> In order to PXE boot a baremetal server using the OVN DHCP server we
>> need to allow users to set the "next-server" (siaddr) [0] field in the
>> DHCP header.
>>
>> While investigating this issue by comparing the DHCPOFFER and DHCPACK
>> packets sent my dnsmasq and OVN we saw that the "next-server" field
>> was the problem for OVN, without it PXE booting was timing out while
>> fetching the iPXE image from the TFTP server (see the bugzilla ticket
>> below for reference).
>>
>> To confirm this problem we created a bogus patch hardcoding the TFTP
>> address in the siaddr of the DHCP header (see the discussion in the
>> maillist below) and with this in place we were able to deploy a
>> baremetal node using the OVN DHCP end-to-end.
>>
>> This patch is a proper implementation that creates a new DHCP
>> configuration option called "next_server" to allow users to set this
>> field dynamically. This patch uses the DHCP code 253 which is a unsed
>> code for DHCP specification as this is not a normal DHCP option but a
>> special use case in OVN.
>>
>> [0]
>> https://github.com/openvswitch/ovs/blob/9dd3031d2e0e9597449e95428320ccaaff7d8b3d/lib/dhcp.h#L42
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2083629
>> Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051821.html
>> Signed-off-by: Lucas Alvares Gomes 
>> Signed-off-by: Numan Siddique 
>> (cherry picked from commit 0057cde2a64749bd2dbbaff525f7a1edccbd9c8a)
>> Signed-off-by: Mark Michelson 
> 
> Is this patch still not merged to OVN 22.03 ?  Since it already has
> all the Acks and merged in the main branch. I think you can go ahead
> and apply it.
> 
> +1 from me.

Applied to branch-22.03.

> 
> Thanks
> Numan
> 

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2] controller: Avoid building dhcp/nd_ra/controller_event opt maps every time.

2022-09-26 Thread Dumitru Ceara
The nd_ra_opts and controller_event_ops are actually static they never
change at runtime.  DHCP records can instead be computed when populating
the lflow "input context" to be used during incremental processing.  This
is likely more efficient than building the DHCP opts maps for every logical
flow recomputed due to changes in resources they reference (e.g., port
bindings, multicast groups).

Signed-off-by: Dumitru Ceara 
---
V2:
- Patch 1/3 was merged.
- Patch 3/3 was dropped.
- Addressed Han's comments on Patch 2/3:
  - added new I-P node for dhcp
  - re-added missing dhcp/nd/controller_event refs when processing AS
updates.
  - re-worded the commit message.
---
 controller/lflow.c  | 211 +++-
 controller/lflow.h  |   8 +-
 controller/ovn-controller.c |  86 +--
 lib/ovn-l7.h|  16 ++-
 4 files changed, 111 insertions(+), 210 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index eef44389f..32664b080 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -90,9 +90,6 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *,
   struct lflow_ctx_out *);
 static void
 consider_logical_flow(const struct sbrec_logical_flow *lflow,
-  struct hmap *dhcp_opts, struct hmap *dhcpv6_opts,
-  struct hmap *nd_ra_opts,
-  struct controller_event_options *controller_event_opts,
   bool is_recompute,
   struct lflow_ctx_in *l_ctx_in,
   struct lflow_ctx_out *l_ctx_out);
@@ -371,40 +368,9 @@ add_logical_flows(struct lflow_ctx_in *l_ctx_in,
   struct lflow_ctx_out *l_ctx_out)
 {
 const struct sbrec_logical_flow *lflow;
-
-struct hmap dhcp_opts = HMAP_INITIALIZER(&dhcp_opts);
-struct hmap dhcpv6_opts = HMAP_INITIALIZER(&dhcpv6_opts);
-const struct sbrec_dhcp_options *dhcp_opt_row;
-SBREC_DHCP_OPTIONS_TABLE_FOR_EACH (dhcp_opt_row,
-   l_ctx_in->dhcp_options_table) {
-dhcp_opt_add(&dhcp_opts, dhcp_opt_row->name, dhcp_opt_row->code,
- dhcp_opt_row->type);
-}
-
-
-const struct sbrec_dhcpv6_options *dhcpv6_opt_row;
-SBREC_DHCPV6_OPTIONS_TABLE_FOR_EACH (dhcpv6_opt_row,
- l_ctx_in->dhcpv6_options_table) {
-   dhcp_opt_add(&dhcpv6_opts, dhcpv6_opt_row->name, dhcpv6_opt_row->code,
-dhcpv6_opt_row->type);
-}
-
-struct hmap nd_ra_opts = HMAP_INITIALIZER(&nd_ra_opts);
-nd_ra_opts_init(&nd_ra_opts);
-
-struct controller_event_options controller_event_opts;
-controller_event_opts_init(&controller_event_opts);
-
 SBREC_LOGICAL_FLOW_TABLE_FOR_EACH (lflow, l_ctx_in->logical_flow_table) {
-consider_logical_flow(lflow, &dhcp_opts, &dhcpv6_opts,
-  &nd_ra_opts, &controller_event_opts, true,
-  l_ctx_in, l_ctx_out);
+consider_logical_flow(lflow, true, l_ctx_in, l_ctx_out);
 }
-
-dhcp_opts_destroy(&dhcp_opts);
-dhcp_opts_destroy(&dhcpv6_opts);
-nd_ra_opts_destroy(&nd_ra_opts);
-controller_event_opts_destroy(&controller_event_opts);
 }
 
 bool
@@ -414,29 +380,6 @@ lflow_handle_changed_flows(struct lflow_ctx_in *l_ctx_in,
 bool ret = true;
 const struct sbrec_logical_flow *lflow;
 
-struct hmap dhcp_opts = HMAP_INITIALIZER(&dhcp_opts);
-struct hmap dhcpv6_opts = HMAP_INITIALIZER(&dhcpv6_opts);
-const struct sbrec_dhcp_options *dhcp_opt_row;
-SBREC_DHCP_OPTIONS_TABLE_FOR_EACH (dhcp_opt_row,
-   l_ctx_in->dhcp_options_table) {
-dhcp_opt_add(&dhcp_opts, dhcp_opt_row->name, dhcp_opt_row->code,
- dhcp_opt_row->type);
-}
-
-
-const struct sbrec_dhcpv6_options *dhcpv6_opt_row;
-SBREC_DHCPV6_OPTIONS_TABLE_FOR_EACH (dhcpv6_opt_row,
- l_ctx_in->dhcpv6_options_table) {
-   dhcp_opt_add(&dhcpv6_opts, dhcpv6_opt_row->name, dhcpv6_opt_row->code,
-dhcpv6_opt_row->type);
-}
-
-struct hmap nd_ra_opts = HMAP_INITIALIZER(&nd_ra_opts);
-nd_ra_opts_init(&nd_ra_opts);
-
-struct controller_event_options controller_event_opts;
-controller_event_opts_init(&controller_event_opts);
-
 /* Flood remove the flows for all the tracked lflows.  Its possible that
  * lflow_add_flows_for_datapath() may have been called before calling
  * this function. */
@@ -486,9 +429,7 @@ lflow_handle_changed_flows(struct lflow_ctx_in *l_ctx_in,
 lflows_processed_remove(l_ctx_out->lflows_processed, lfp_node);
 }
 
-consider_logical_flow(lflow, &dhcp_opts, &dhcpv6_opts,
-  &nd_ra_opts, &controller_event_opts, false,
-  l_ctx_in, l_ctx_out);
+consider_logical_

[ovs-dev] [PATCH dpdk-latest] netdev-dpdk: Report device bus specific information.

2022-09-26 Thread David Marchand
22.11 dropped direct access to bus specific structures.
Instead, a new API reports bus specific information.

Report bus name and device bus specific information.

The difference looks like:
-driver_name=mlx5_pci, if_descr="DPDK 21.11.0 mlx5_pci"
+driver_name=mlx5_pci, if_descr="DPDK 22.11.0-rc0 mlx5_pci"

-pci-device_id="0x1019"
-pci-vendor_id="0x15b3"
+bus_info="bus_name=pci, vendor_id=15b3, device_id=1019"

Signed-off-by: David Marchand 
---
 lib/netdev-dpdk.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0dd655507b..fb4b3282dc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -26,9 +26,10 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3639,6 +3640,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, 
struct smap *args)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 struct rte_eth_dev_info dev_info;
+const char *bus_info;
 uint32_t link_speed;
 uint32_t dev_flags;
 
@@ -3651,19 +3653,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, 
struct smap *args)
 rte_eth_dev_info_get(dev->port_id, &dev_info);
 link_speed = dev->link.link_speed;
 dev_flags = *dev_info.dev_flags;
+bus_info = rte_dev_bus_info(dev_info.device);
 ovs_mutex_unlock(&dev->mutex);
-const struct rte_bus *bus;
-const struct rte_pci_device *pci_dev;
-uint16_t vendor_id = RTE_PCI_ANY_ID;
-uint16_t device_id = RTE_PCI_ANY_ID;
-bus = rte_bus_find_by_device(dev_info.device);
-if (bus && !strcmp(bus->name, "pci")) {
-pci_dev = RTE_DEV_TO_PCI(dev_info.device);
-if (pci_dev) {
-vendor_id = pci_dev->id.vendor_id;
-device_id = pci_dev->id.device_id;
-}
-}
 ovs_mutex_unlock(&dpdk_mutex);
 
 smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
@@ -3687,8 +3678,10 @@ netdev_dpdk_get_status(const struct netdev *netdev, 
struct smap *args)
 smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD);
 smap_add_format(args, "if_descr", "%s %s", rte_version(),
dev_info.driver_name);
-smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id);
-smap_add_format(args, "pci-device_id", "0x%x", device_id);
+smap_add_format(args, "bus_info", "bus_name=%s%s%s",
+rte_bus_name(rte_dev_bus(dev_info.device)),
+bus_info != NULL ? ", " : "",
+bus_info != NULL ? bus_info : "");
 
 /* Not all link speeds are defined in the OpenFlow specs e.g. 25 Gbps.
  * In that case the speed will not be reported as part of the usual
-- 
2.37.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/3] inc-proc-eng: Rename the 'clear_tracked_data' callback to 'init_run'.

2022-09-26 Thread Dumitru Ceara
On 9/23/22 17:47, Han Zhou wrote:
> On Fri, Sep 23, 2022 at 1:42 AM Dumitru Ceara  wrote:
>>
>> On 9/23/22 01:07, Han Zhou wrote:
>>> On Wed, Sep 14, 2022 at 6:10 AM Dumitru Ceara  wrote:

 This is actually more in line with how the callback is used.  It's
> called
 every the incremental engine preparese for the next engine run.

 Signed-off-by: Dumitru Ceara 
>>>
>>> Thanks Dumtru. The name looks good to me, but why does the new function
>>> require both the node and node->data as parameters?
>>>
>>
>> Thanks for the review!  Considering that this is an initialization
>> function that runs before every engine run for every node, users might
>> find it interesting to do other things too.  For example, getting some
>> OVSDB indexes from input nodes.
>>
>> This is an example from the not yet submitted components template code:
>>
>> static void
>> en_template_vars_init_run(struct engine_node *node, void *data)
>> {
>> struct ed_type_template_vars *tv_data = data;
>>
>> tv_data->sbrec_template_var_table =
>> EN_OVSDB_GET(engine_get_input("SB_template_var", node));
>> tv_data->ovsrec_ovs_table =
>> EN_OVSDB_GET(engine_get_input("OVS_open_vswitch", node));
>> tv_data->sbrec_port_binding_by_name =
>> engine_ovsdb_node_get_index(engine_get_input("SB_port_binding",
> node),
>> "name");
>> tv_data->sbrec_chassis_by_name =
>> engine_ovsdb_node_get_index(engine_get_input("SB_chassis", node),
>> "name");
>>
>> sset_clear(&tv_data->new);
>> sset_clear(&tv_data->deleted);
>> sset_clear(&tv_data->updated);
>> tv_data->change_tracked = false;
>> }
>>
> 
> I don't quite understand this example. It seems ed_type_template_vars
> stores some of its input to its own data, but could you explain why? These

It was just to avoid looking up the indexes every time.  But you're
right there's no huge benefit to doing that.

> members should belong to the input nodes, and they can always be accessed
> in the run() or handler functions.  If it requires more code to explain,
> I'd suggest including this as part of your *template* series so that it is
> easier to review together.
> 

Sounds good.

 ---
  controller/ovn-controller.c |   41
>>> -
  lib/inc-proc-eng.c  |   19 +++
  lib/inc-proc-eng.h  |   19 ++-
  3 files changed, 41 insertions(+), 38 deletions(-)

 diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
 index 18a01bbab..f26d6a9e0 100644
 --- a/controller/ovn-controller.c
 +++ b/controller/ovn-controller.c
 @@ -1058,7 +1058,7 @@ en_ofctrl_is_connected_run(struct engine_node
>>> *node, void *data)
   *processing to OVS_interface changes but simply mark the node
>>> status as
   *UPDATED (and so the run() and the change handler is the same).
   * 2. The iface_table_external_ids_old is computed/updated in the
> member
 - *clear_tracked_data(), because that is when the last round of
>>> processing
 + *init_run(), because that is when the last round of processing
   *has completed but the new IDL data is yet to refresh, so we
>>> replace the
   *old data with the current data. */
  struct ed_type_ovs_interface_shadow {
 @@ -1096,7 +1096,8 @@ en_ovs_interface_shadow_cleanup(void *data_)
  }

  static void
 -en_ovs_interface_shadow_clear_tracked_data(void *data_)
 +en_ovs_interface_shadow_init_run(struct engine_node *node OVS_UNUSED,
 + void *data_)
  {
  struct ed_type_ovs_interface_shadow *data = data_;

>>>
>  iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
 @@ -1163,7 +1164,7 @@ en_activated_ports_cleanup(void *data_)
  }

  static void
 -en_activated_ports_clear_tracked_data(void *data)
 +en_activated_ports_init_run(struct engine_node *node OVS_UNUSED, void
>>> *data)
  {
  en_activated_ports_cleanup(data);
  }
 @@ -1311,7 +1312,7 @@ struct ed_type_runtime_data {
   */

  static void
 -en_runtime_data_clear_tracked_data(void *data_)
 +en_runtime_data_init_run(struct engine_node *node OVS_UNUSED, void
>>> *data_)
  {
  struct ed_type_runtime_data *data = data_;

 @@ -1669,14 +1670,14 @@ en_addr_sets_init(struct engine_node *node
>>> OVS_UNUSED,
  }

  static void
 -en_addr_sets_clear_tracked_data(void *data)
 +en_addr_sets_init_run(struct engine_node *node OVS_UNUSED, void *data)
  {
  struct ed_type_addr_sets *as = data;
  sset_clear(&as->new);
  sset_clear(&as->deleted);
 -struct shash_node *node;
 -SHASH_FOR_EACH_SAFE (node, &as->updated) {
 -struct addr_set_diff *asd = node->data;
 +struct shash_node