Re: [ovs-dev] [PATCH v3] ovs-vsctl: Exit with error if postdb checks report errors.

2023-07-10 Thread Eelco Chaudron



On 5 Jul 2023, at 1:29, Flavio Leitner wrote:

> Today the exit code refers to the execution of the change
> in the database. However, when not using parameter --no-wait
> (default), the ovs-vsctl also checks if OVSDB transactions
> are successfully recorded and reload by ovs-vswitchd. In this
> case, an error message is printed if there is a problem during
> the reload, like for example the one below:
>
>  # ovs-vsctl add-port br0 gre0 -- \
> set interface gre0 type=ip6gre options:key=100 \
> options:remote_ip=2001::2
> ovs-vsctl: Error detected while setting up 'gre0': could not \
> add network device gre0 to ofproto (Address family not supported\
> by protocol).  See ovs-vswitchd log for details.
> ovs-vsctl: The default log directory is "/var/log/openvswitch".
>  # echo $?
> 0
>
> This patch changes to exit with specific error code 160
> (ERROR_BAD_ARGUMENTS) on Windows and 65 (EX_DATAERR) on
> Linux or BSD if errors were reported during the reload.
>
> This change may break existing scripts because ovs-vsctl will
> start to fail when before it was succeeding. However, if an
> error is printed, then it is likely that the change was not
> functional anyway.
>
> Reported-at: https://bugzilla.redhat.com/1731553
> Signed-off-by: Flavio Leitner 

Thanks Flavio for the patch. I verified the Windows part trough AppVeyor.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ci: Remove '--recheck' in CI.

2023-07-10 Thread Ales Musil
On Mon, Jul 10, 2023 at 5:26 PM Dumitru Ceara  wrote:

> If we want to catch new failures faster we have a better chance if CI
> doesn't auto-retry (once).
>
> There are some tests that are still "unstable" and fail every now and
> then.  In order to reduce the number of false negatives keep the
> --recheck for them.  To achieve that we use a new macro, TAG_UNSTABLE,
> to tag all these tests.  The list of "unstable" tests is compiled based
> on the following discussion:
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405465.html
>
> In order to avoid new GitHub actions jobs, we re-purpose the last job of
> each target type to also run the unstable tests.  These jobs were
> already running less tests than others so the additional run time should
> not be an issue.
>
> Signed-off-by: Dumitru Ceara 
> ---
> V2:
> - Addressed Ales' comments:
>   - always run stable and unstable tests before declaring pass/fail
> Changes in v1 (since RFC):
> - kept recheck for unstable tests
> - introduced TAG_UNSTABLE
> - changed test.yml to run unstable tests in the last batch of every
>   test target type.
> ---
>  .ci/ci.sh  |  2 +-
>  .ci/linux-build.sh | 76 ++
>  .github/workflows/test.yml | 15 
>  tests/ovn-ic.at|  1 +
>  tests/ovn-ipsec.at |  1 +
>  tests/ovn-macros.at|  5 +++
>  tests/ovn-northd.at|  1 +
>  tests/ovn-performance.at   |  1 +
>  tests/ovn.at   | 13 +++
>  9 files changed, 92 insertions(+), 23 deletions(-)
>
> diff --git a/.ci/ci.sh b/.ci/ci.sh
> index 10f11939c5..a500aba764 100755
> --- a/.ci/ci.sh
> +++ b/.ci/ci.sh
> @@ -101,7 +101,7 @@ function run_tests() {
>  && \
>  ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
>  TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
> -./.ci/linux-build.sh
> +UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
>  "
>  }
>
> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> index 5a79a52daf..4c5361f3ca 100755
> --- a/.ci/linux-build.sh
> +++ b/.ci/linux-build.sh
> @@ -9,6 +9,7 @@ COMMON_CFLAGS=""
>  OVN_CFLAGS=""
>  OPTS="$OPTS --enable-Werror"
>  JOBS=${JOBS:-"-j4"}
> +RECHECK=${RECHECK:-"no"}
>
>  function install_dpdk()
>  {
> @@ -99,6 +100,17 @@ function configure_clang()
>  COMMON_CFLAGS="${COMMON_CFLAGS}
> -Wno-error=unused-command-line-argument"
>  }
>
> +function run_tests()
> +{
> +if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> +TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK
> +then
> +# testsuite.log is necessary for debugging.
> +cat */_build/sub/tests/testsuite.log
> +return 1
> +fi
> +}
> +
>  function execute_tests()
>  {
>  # 'distcheck' will reconfigure with required options.
> @@ -106,27 +118,61 @@ function execute_tests()
>  configure_ovn
>
>  export DISTCHECK_CONFIGURE_FLAGS="$OPTS"
> -if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> -TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes
> -then
> -# testsuite.log is necessary for debugging.
> -cat */_build/sub/tests/testsuite.log
> +
> +local stable_rc=0
> +local unstable_rc=0
> +
> +if ! SKIP_UNSTABLE=yes run_tests; then
> +stable_rc=1
> +fi
> +
> +if [ "$UNSTABLE" ]; then
> +if ! SKIP_UNSTABLE=no TEST_RANGE="-k unstable" RECHECK=yes \
> +run_tests; then
> +unstable_rc=1
> +fi
> +fi
> +
> +if [[ $stable_rc -ne 0 ]] || [[ $unstable_rc -ne 0 ]]; then
>  exit 1
>  fi
>  }
>
> +function run_system_tests()
> +{
> +local type=$1
> +local log_file=$2
> +
> +if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \
> +RECHECK=$RECHECK; then
> +# $log_file is necessary for debugging.
> +cat tests/$log_file
> +return 1
> +fi
> +}
> +
>  function execute_system_tests()
>  {
> -  type=$1
> -  log_file=$2
> -
> -  configure_ovn $OPTS
> -  make $JOBS || { cat config.log; exit 1; }
> -  if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"
> RECHECK=yes; then
> -  # $log_file is necessary for debugging.
> -  cat tests/$log_file
> -  exit 1
> -  fi
> +configure_ovn $OPTS
> +make $JOBS || { cat config.log; exit 1; }
> +
> +local stable_rc=0
> +local unstable_rc=0
> +
> +if ! SKIP_UNSTABLE=yes run_system_tests $@; then
> +stable_rc=1
> +fi
> +
> +if [ "$UNSTABLE" ]; then
> +if ! SKIP_UNSTABLE=no TEST_RANGE="-k unstable" RECHECK=yes \
> +run_system_tests $@; then
> +unstable_rc=1
> +fi
> +fi
> +
> +if [[ $stable_rc -ne 0 ]] || [[ $unstable_rc -ne 0 ]]; then
> +exit 1
> +fi
>  }
>
>  configure_$CC
> diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
> index fe2a14c401..7d40251003 100644
> --

[ovs-dev] [PATCH v2] ovs-tcpdump: Bugfix-of-ovs-tcpdump

2023-07-10 Thread Simon Jones
From: simon 

Fix bug of ovs-tcpdump, which will cause megaflow action wrong.

As use ovs-tcpdump will add mipxxx NIC, and this NIC has IPv6 address by
default.
For vxlan topology, mipxxx will be treated as tunnel port, and will got
error actions.

For detail discuss, refer:
https://github.com/batmancn/github-work/blob/main/ovs-bugfix-ovs-tcpdump.md

Signed-off-by: simon 
---
utilities/ovs-tcpdump.in | 4 
1 file changed, 4 insertions(+)
diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
index 420c11eb8..4cbd9a5d3 100755
--- a/utilities/ovs-tcpdump.in
+++ b/utilities/ovs-tcpdump.in
@@ -96,6 +96,10 @@ def _install_dst_if_linux(tap_name, mtu_value=None):
 *(['ip', 'link', 'set', 'dev', str(tap_name), 'up']))
 pipe.wait()

+pipe = _doexec(
+*(['ip', '-6', 'addr', 'flush', 'dev', str(tap_name)]))
+pipe.wait()
+

 def _remove_dst_if_linux(tap_name):
 _doexec(
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Vladislav Odintsov
Hi Dumitru,

thanks for digging into this! I highly appreciate your help!

Please, see my answers inline.

> On 10 Jul 2023, at 15:28, Dumitru Ceara  wrote:
> 
> On 7/10/23 12:57, Dumitru Ceara wrote:
>> On 7/6/23 13:00, Vladislav Odintsov wrote:
>>> 
>>> 
 On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
 
 Hi Dumitru,
 
 thanks for the quick response!
 
> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
> 
> On 7/5/23 17:14, Vladislav Odintsov wrote:
>> Hi,
>> 
> 
> Hi Vladislav,
> 
>> we’ve noticed there is a huge ovn-controller memory consumption 
>> introduced with [0] comparing to version without its changes in 
>> ovn-controller.c part (just OVS submodule bump without ovn-controller 
>> changes doesn’t trigger such behaviour).
>> 
> 
> Thanks for reporting this!
> 
>> On an empty host connected to working cluster ovn-controller normally 
>> consumes ~175 MiB RSS, and the same host updated to a version with 
>> commit [0] consumes 3.3 GiB RSS. The start time and CPU consumption 
>> during process start of an affected version is higher that for the 
>> normal version.
>> 
>> Before upgrade (normal functioning):
>> 
>> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
>> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
>> ofctrl_sb_flow_ref_usage-KB:18
>> 174.2 MiB + 327.5 KiB = 174.5 MiBovn-controller
>> 
>> After upgrade to an affected version I’ve checked memory report while 
>> ovn-controller was starting and at some time there was a bigger amount 
>> of OVN_Southbound cells comparing to "after start" time:
>> 
>> during start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>> 3.3 GiB + 327.0 KiB =   3.3 GiB  ovn-controller
>> 
>> after start:
>> 
>> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
>> ovn-controller)| grep ovn
>> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
>> idl-outstanding-txns-Open_vSwitch:1
>> 3.3 GiB + 327.0 KiB =   3.3 GiB  ovn-controller
>> 
>> 
>> cells during start:
>> 11388742
>> 
>> cells after start:
>> 343896
>> 
> 
> Are you running with ovn-monitor-all=true on this host?
 
 No, it has default false.
 
> 
> I guess it's unlikely but I'll try just in case: would it be possible to
> share the SB database?
 
 Unfortunately, no. But I can say it’s about 450 M in size after 
 compaction. And there are about 1M mac_bindings if it’s important :).
>>> 
>> 
>> I tried in a sandbox, before and after the commit in question, with a
>> script that adds 1M mac bindings on top of the sample topology built by
>> tutorial/ovn-setup.sh.
>> 
>> I see ovn-controller memory usage going to >3GB in before the commit you
>> blamed and to >1.9GB after the same commit.  So it looks different than
>> what you reported but still worrying that we use so much memory for mac
>> bindings.
>> 
>> I'm assuming however that quite a few of the 1M mac bindings in your
>> setup are stale so would it be possible to enable mac_binding aging?  It
>> needs to be enabled per router with something like:
>> 
>> $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
>> 
>> The threshold is in seconds and is a hard limit (for now) after which a
>> mac binding entry is removed.  There's work in progress to only clear
>> arp cache entries that are really stale [1].

Yeah, I know about mac_binding ageing, but we currently use our own mechanism 
to delete excess records and waiting for patchset you’ve mentioned.

>> 
>>> But if you are interested in any specific information, let me know it, I 
>>> can check.
>>> 
>> 
>> How many "local" datapaths do we have on the hosts that exhibit high
>> memory usage?
>> 
>> The quickest way I can think of to get this info is to run this on the
>> hypervisor:
>> $ ovn-appctl ct-zone-list | grep snat -c
>> 
>> Additional question: how many mac_bindings are "local", i.e., associated
>> to local datapaths?

For both questions: 0.

>> 
 
> 
>> I guess it could be connected with this problem. Can anyone look at this 
>> and comment please?
>> 
> 
> Does the memory usage persist after SB is upgraded too?  I see the
> number of SB idl-cells goes down eventually which means that eventually
> the periodic malloc_trim() call would free up memory.  We trim on idle
> since
> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
> 
 
 In this upgrade DB schemas not upgraded, so they’re u

Re: [ovs-dev] [PATCH v14] netdev-dpdk: Add custom rx-steering configuration.

2023-07-10 Thread Ilya Maximets
On 7/10/23 15:51, Robin Jarry wrote:
> Ilya Maximets, Jul 10, 2023 at 15:48:
>> In order to speed up the process a bit, proposing a following incremental
>> change:
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 27b2fa5e0..aa87ee546 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -2074,23 +2074,23 @@ dpdk_set_rx_steer_config(struct netdev *netdev, 
>> struct netdev_dpdk *dev,
>>  const char *arg = smap_get_def(args, "rx-steering", "rss");
>>  uint64_t flags = 0;
>>  
>> -if (nullable_string_is_equal(arg, "rss+lacp")) {
>> +if (!strcmp(arg, "rss+lacp")) {
>>  flags = DPDK_RX_STEER_LACP;
>> -} else if (!nullable_string_is_equal(arg, "rss")) {
>> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
>> +} else if (strcmp(arg, "rss")) {
>> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>>"unsupported parameter value '%s'",
>>netdev_get_name(netdev), arg);
>>  }
>>  
>>  if (flags && dev->type != DPDK_DEV_ETH) {
>> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
>> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>>"is only supported on ethernet ports",
>>netdev_get_name(netdev));
>>  flags = 0;
>>  }
>>  
>>  if (flags && netdev_is_flow_api_enabled()) {
>> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
>> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>>"is incompatible with hw-offload",
>>netdev_get_name(netdev));
>>  flags = 0;
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 2f756b1b7..01408e90a 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3539,9 +3539,9 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>
>>  
>>  
>> -  If the user has already configured multiple
>> -  options:n_rxq on the port, an additional one will be
>> -  allocated for the specified protocols. Even if the hardware cannot
>> +  If the user has already configured multiple > +  column="options" key="n_rxq" /> on the port, an additional one 
>> will
>> +  be allocated for the specified protocols. Even if the hardware 
>> cannot
>>satisfy the requested number of requested Rx queues, the last Rx
>>queue will be used. If only one Rx queue is available or if the
>>hardware does not support the rte_flow matchers/actions required 
>> to
>> @@ -3551,10 +3551,14 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>  
>>This feature is mutually exclusive with
>>> />
>> -  as it may conflict with the offloaded RTE flows. If both are 
>> enabled,
>> +  as it may conflict with the offloaded flows. If both are enabled,
>>rx-steering will fall back to default 
>> rss
>>mode.
>>  
>> +
>> +  This option is only applicable to interfaces with type
>> +  dpdk.
>> +
>>
>>  
>>> ---
>>
>> If that looks good to you, I could fold it in while applying the patch.
>> What do you think?
>>
>> Best regards, Ilya Maximets.
> 
> Yes, that'll be faster. Thanks!
> 

Thanks, Robin, Kevin and Aaron!  Applied.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Jakub Kicinski
On Mon, 10 Jul 2023 20:39:11 +0200 Ilya Maximets wrote:
> > As far as I understand what you're proposing, yes :)  
> 
> OK.  Just to spell it all out:
> 
> Userspace will install a flow with an OVS_FLOW_CMD_NEW:
> 
>   match:ip,tcp,... actions:something,something,drop(0)
>   match:ip,udp,... actions:something,something,drop(42)
> 
> drop() here represents the OVS_ACTION_ATTR_DROP.
> 
> Then, in net/openvswitch/actions.c:do_execute_actions(), while executing
> these actions:
> 
>   case OVS_ACTION_ATTR_DROP:
>   kfree_skb_reason(skb, nla_get_u32(a) ? OVS_DROP_ACTION_WITH_ERROR
>: OVS_DROP_ACTION);
> 
> Users can enable traces and catch the OVS_DROP_ACTION_WITH_ERROR.
> Later they can dump flows with OVS_FLOW_CMD_GET and see that the
> error value was 42.

nod

> >> Eric, Adrian, Aaron, do you see any problems with such implementation?
> >>
> >> P.S. There is a plan to add more drop reasons for other places in 
> >> openvswitch
> >>  module to catch more regular types of drops like memory issues or 
> >> upcall
> >>  failures.  So, the drop reason subsystem can be extended later.
> >>  The explicit drop action is a bit of an odd case here.  
> > 
> > If you have more than ~4 OvS specific reasons, I wonder if it still
> > makes sense to create a reason group/subsystem for OvS (a'la WiFi)?  
> 
> I believe, we will easily have more than 4 OVS-specific reasons.  A few
> from the top of my head:
>   - upcall failure (failed to send a packet to userspace)
>   - reached the limit for deferred actions
>   - reached the recursion limit
> 
> So, creation of a reason group/subsystem seems reasonable to me.

SG.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Ilya Maximets
On 7/10/23 19:01, Jakub Kicinski wrote:
> On Mon, 10 Jul 2023 18:51:19 +0200 Ilya Maximets wrote:
>> Makes sense.  I wasn't sure that's a good solution from a kernel perspective
>> either.  It's better than defining all these reasons, IMO, but it's not good
>> enough to be considered acceptable, I agree.
>>
>> How about we define just 2 reasons, e.g. OVS_DROP_REASON_EXPLICIT_ACTION and
>> OVS_DROP_REASON_EXPLICIT_ACTION_WITH_ERROR (exact names can be different) ?
>> One for an explicit drop action with a zero argument and one for an explicit
>> drop with non-zero argument.
>>
>> The exact reason for the error can be retrieved by other means, i.e by 
>> looking
>> at the datapath flow dump or OVS logs/traces.
>>
>> This way we can give a user who is catching packet drop traces a signal that
>> there was something wrong with an OVS flow and they can look up exact details
>> from the userspace / flow dump.
>>
>> The point being, most of the flows will have a zero as a drop action 
>> argument,
>> i.e. a regular explicit packet drop.  It will be hard to figure out which 
>> flow
>> exactly we're hitting without looking at the full flow dump.  And if the 
>> value
>> is non-zero, then it should be immediately obvious which flow is to blame 
>> from
>> the dump, as we should not have a lot of such flows.
>>
>> This would still allow us to avoid a maintenance burden of defining every 
>> case,
>> which are fairly meaningless for the kernel itself, while having 99% of the
>> information we may need.
>>
>> Jakub, do you think this will be acceptable?
> 
> As far as I understand what you're proposing, yes :)

OK.  Just to spell it all out:

Userspace will install a flow with an OVS_FLOW_CMD_NEW:

  match:ip,tcp,... actions:something,something,drop(0)
  match:ip,udp,... actions:something,something,drop(42)

drop() here represents the OVS_ACTION_ATTR_DROP.

Then, in net/openvswitch/actions.c:do_execute_actions(), while executing
these actions:

  case OVS_ACTION_ATTR_DROP:
  kfree_skb_reason(skb, nla_get_u32(a) ? OVS_DROP_ACTION_WITH_ERROR
   : OVS_DROP_ACTION);

Users can enable traces and catch the OVS_DROP_ACTION_WITH_ERROR.
Later they can dump flows with OVS_FLOW_CMD_GET and see that the
error value was 42.

> 
>> Eric, Adrian, Aaron, do you see any problems with such implementation?
>>
>> P.S. There is a plan to add more drop reasons for other places in openvswitch
>>  module to catch more regular types of drops like memory issues or upcall
>>  failures.  So, the drop reason subsystem can be extended later.
>>  The explicit drop action is a bit of an odd case here.
> 
> If you have more than ~4 OvS specific reasons, I wonder if it still
> makes sense to create a reason group/subsystem for OvS (a'la WiFi)?

I believe, we will easily have more than 4 OVS-specific reasons.  A few
from the top of my head:
  - upcall failure (failed to send a packet to userspace)
  - reached the limit for deferred actions
  - reached the recursion limit

So, creation of a reason group/subsystem seems reasonable to me.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Eric Garver
On Mon, Jul 10, 2023 at 06:51:19PM +0200, Ilya Maximets wrote:
> On 7/8/23 00:06, Jakub Kicinski wrote:
> > On Fri, 7 Jul 2023 18:04:36 +0200 Ilya Maximets wrote:
>  That already exists, right? Johannes added it in the last release for 
>  WiFi.  
> >>>
> >>> I'm not sure.  The SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE behaves 
> >>> similarly
> >>> to that on a surface.  However, looking closer, any value that can be 
> >>> passed
> >>> into ieee80211_rx_handlers_result() and ends up in the kfree_skb_reason() 
> >>> is
> >>> kind of defined in net/mac80211/drop.h, unless I'm missing something (very
> >>> possible, because I don't really know wifi code).
> >>>
> >>> The difference, I guess, is that for openvswitch values will be provided 
> >>> by
> >>> the userpsace application via netlink interface.  It'll be just a number 
> >>> not
> >>> defined anywhere in the kernel.  Only the subsystem itself will be defined
> >>> in order to occupy the range.  Garbage in, same garbage out, from the 
> >>> kernel's
> >>> perspective.  
> >>
> >> To be clear, I think, not defining them in this particular case is better.
> >> Definition of every reason that userspace can come up with will add extra
> >> uAPI maintenance cost/issues with no practical benefits.  Values are not
> >> going to be used for anything outside reporting a drop reason and subsystem
> >> offset is not part of uAPI anyway.
> > 
> > Ah, I see. No, please don't stuff user space defined values into 
> > the drop reason. The reasons are for debugging the kernel stack 
> > itself. IOW it'd be abuse not reuse.
> 
> Makes sense.  I wasn't sure that's a good solution from a kernel perspective
> either.  It's better than defining all these reasons, IMO, but it's not good
> enough to be considered acceptable, I agree.
> 
> How about we define just 2 reasons, e.g. OVS_DROP_REASON_EXPLICIT_ACTION and
> OVS_DROP_REASON_EXPLICIT_ACTION_WITH_ERROR (exact names can be different) ?
> One for an explicit drop action with a zero argument and one for an explicit
> drop with non-zero argument.
> 
> The exact reason for the error can be retrieved by other means, i.e by looking
> at the datapath flow dump or OVS logs/traces.
> 
> This way we can give a user who is catching packet drop traces a signal that
> there was something wrong with an OVS flow and they can look up exact details
> from the userspace / flow dump.
> 
> The point being, most of the flows will have a zero as a drop action argument,
> i.e. a regular explicit packet drop.  It will be hard to figure out which flow
> exactly we're hitting without looking at the full flow dump.  And if the value
> is non-zero, then it should be immediately obvious which flow is to blame from
> the dump, as we should not have a lot of such flows.
> 
> This would still allow us to avoid a maintenance burden of defining every 
> case,
> which are fairly meaningless for the kernel itself, while having 99% of the
> information we may need.
> 
> Jakub, do you think this will be acceptable?
> 
> Eric, Adrian, Aaron, do you see any problems with such implementation?

I see no problems. I'm content with this approach.

> P.S. There is a plan to add more drop reasons for other places in openvswitch
>  module to catch more regular types of drops like memory issues or upcall
>  failures.  So, the drop reason subsystem can be extended later.
>  The explicit drop action is a bit of an odd case here.
> 
> Best regards, Ilya Maximets.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Jakub Kicinski
On Mon, 10 Jul 2023 18:51:19 +0200 Ilya Maximets wrote:
> Makes sense.  I wasn't sure that's a good solution from a kernel perspective
> either.  It's better than defining all these reasons, IMO, but it's not good
> enough to be considered acceptable, I agree.
> 
> How about we define just 2 reasons, e.g. OVS_DROP_REASON_EXPLICIT_ACTION and
> OVS_DROP_REASON_EXPLICIT_ACTION_WITH_ERROR (exact names can be different) ?
> One for an explicit drop action with a zero argument and one for an explicit
> drop with non-zero argument.
> 
> The exact reason for the error can be retrieved by other means, i.e by looking
> at the datapath flow dump or OVS logs/traces.
> 
> This way we can give a user who is catching packet drop traces a signal that
> there was something wrong with an OVS flow and they can look up exact details
> from the userspace / flow dump.
> 
> The point being, most of the flows will have a zero as a drop action argument,
> i.e. a regular explicit packet drop.  It will be hard to figure out which flow
> exactly we're hitting without looking at the full flow dump.  And if the value
> is non-zero, then it should be immediately obvious which flow is to blame from
> the dump, as we should not have a lot of such flows.
> 
> This would still allow us to avoid a maintenance burden of defining every 
> case,
> which are fairly meaningless for the kernel itself, while having 99% of the
> information we may need.
> 
> Jakub, do you think this will be acceptable?

As far as I understand what you're proposing, yes :)

> Eric, Adrian, Aaron, do you see any problems with such implementation?
> 
> P.S. There is a plan to add more drop reasons for other places in openvswitch
>  module to catch more regular types of drops like memory issues or upcall
>  failures.  So, the drop reason subsystem can be extended later.
>  The explicit drop action is a bit of an odd case here.

If you have more than ~4 OvS specific reasons, I wonder if it still
makes sense to create a reason group/subsystem for OvS (a'la WiFi)?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Ilya Maximets
On 7/8/23 00:06, Jakub Kicinski wrote:
> On Fri, 7 Jul 2023 18:04:36 +0200 Ilya Maximets wrote:
 That already exists, right? Johannes added it in the last release for 
 WiFi.  
>>>
>>> I'm not sure.  The SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE behaves 
>>> similarly
>>> to that on a surface.  However, looking closer, any value that can be passed
>>> into ieee80211_rx_handlers_result() and ends up in the kfree_skb_reason() is
>>> kind of defined in net/mac80211/drop.h, unless I'm missing something (very
>>> possible, because I don't really know wifi code).
>>>
>>> The difference, I guess, is that for openvswitch values will be provided by
>>> the userpsace application via netlink interface.  It'll be just a number not
>>> defined anywhere in the kernel.  Only the subsystem itself will be defined
>>> in order to occupy the range.  Garbage in, same garbage out, from the 
>>> kernel's
>>> perspective.  
>>
>> To be clear, I think, not defining them in this particular case is better.
>> Definition of every reason that userspace can come up with will add extra
>> uAPI maintenance cost/issues with no practical benefits.  Values are not
>> going to be used for anything outside reporting a drop reason and subsystem
>> offset is not part of uAPI anyway.
> 
> Ah, I see. No, please don't stuff user space defined values into 
> the drop reason. The reasons are for debugging the kernel stack 
> itself. IOW it'd be abuse not reuse.

Makes sense.  I wasn't sure that's a good solution from a kernel perspective
either.  It's better than defining all these reasons, IMO, but it's not good
enough to be considered acceptable, I agree.

How about we define just 2 reasons, e.g. OVS_DROP_REASON_EXPLICIT_ACTION and
OVS_DROP_REASON_EXPLICIT_ACTION_WITH_ERROR (exact names can be different) ?
One for an explicit drop action with a zero argument and one for an explicit
drop with non-zero argument.

The exact reason for the error can be retrieved by other means, i.e by looking
at the datapath flow dump or OVS logs/traces.

This way we can give a user who is catching packet drop traces a signal that
there was something wrong with an OVS flow and they can look up exact details
from the userspace / flow dump.

The point being, most of the flows will have a zero as a drop action argument,
i.e. a regular explicit packet drop.  It will be hard to figure out which flow
exactly we're hitting without looking at the full flow dump.  And if the value
is non-zero, then it should be immediately obvious which flow is to blame from
the dump, as we should not have a lot of such flows.

This would still allow us to avoid a maintenance burden of defining every case,
which are fairly meaningless for the kernel itself, while having 99% of the
information we may need.

Jakub, do you think this will be acceptable?

Eric, Adrian, Aaron, do you see any problems with such implementation?

P.S. There is a plan to add more drop reasons for other places in openvswitch
 module to catch more regular types of drops like memory issues or upcall
 failures.  So, the drop reason subsystem can be extended later.
 The explicit drop action is a bit of an odd case here.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 4/4] selftests: openvswitch: add ct-nat test case with ipv4

2023-07-10 Thread Aaron Conole
Adrian Moreno  writes:

> On 6/28/23 18:27, Aaron Conole wrote:
>> Building on the previous work, add a very simplistic NAT case
>> using ipv4.  This just tests dnat transformation
>> Signed-off-by: Aaron Conole 
>
> Hi Aaron,
>
> I know that the goal is not to support the full syntax, and that nat
> is a specially convoluted action, so I'm just commenting on the
> low-hanging fruits (see below).

Thanks, Adrian!

>> ---
>>   .../selftests/net/openvswitch/openvswitch.sh  | 64 +++
>>   .../selftests/net/openvswitch/ovs-dpctl.py| 60 +
>>   2 files changed, 124 insertions(+)
>> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> index 40a66c72af0f0..dced4f612a78c 100755
>> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> @@ -14,6 +14,7 @@ tests="
>>  arp_pingeth-arp: Basic arp ping between 
>> two NS
>>  ct_connect_v4   ip4-ct-xon: Basic ipv4 tcp 
>> connection using ct
>>  connect_v4  ip4-xon: Basic ipv4 ping 
>> between two NS
>> +nat_connect_v4  ip4-nat-xon: Basic ipv4 tcp 
>> connection via NAT
>>  netlink_checks  ovsnl: validate netlink attrs 
>> and settings
>>  upcall_interfaces   ovs: test the upcall interfaces"
>>   @@ -300,6 +301,69 @@ test_connect_v4 () {
>>  return 0
>>   }
>>   +# nat_connect_v4 test
>> +#  - client has 1500 byte MTU
>> +#  - server has 1500 byte MTU
>> +#  - use ICMP to ping in each direction
>> +#  - only allow CT state stuff to pass through new in c -> s
>> +test_nat_connect_v4 () {
>> +which nc >/dev/null 2>/dev/null || return $ksft_skip
>> +
>> +sbx_add "test_nat_connect_v4" || return $?
>> +
>> +ovs_add_dp "test_nat_connect_v4" nat4 || return 1
>> +info "create namespaces"
>> +for ns in client server; do
>> +ovs_add_netns_and_veths "test_nat_connect_v4" "nat4" "$ns" \
>> +"${ns:0:1}0" "${ns:0:1}1" || return 1
>> +done
>> +
>> +ip netns exec client ip addr add 172.31.110.10/24 dev c1
>> +ip netns exec client ip link set c1 up
>> +ip netns exec server ip addr add 172.31.110.20/24 dev s1
>> +ip netns exec server ip link set s1 up
>> +
>> +ip netns exec client ip route add default via 172.31.110.20
>> +
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +
>> "ct_state(-trk),in_port(1),eth(),eth_type(0x0800),ipv4(dst=192.168.0.20)" \
>> +"ct(commit,nat(dst=172.31.110.20)),recirc(0x1)"
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +"ct_state(-trk),in_port(2),eth(),eth_type(0x0800),ipv4()" \
>> +"ct(commit,nat),recirc(0x2)"
>> +
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +
>> "recirc_id(0x1),ct_state(+trk-inv),in_port(1),eth(),eth_type(0x0800),ipv4()" 
>> "2"
>> +ovs_add_flow "test_nat_connect_v4" nat4 \
>> +
>> "recirc_id(0x2),ct_state(+trk-inv),in_port(2),eth(),eth_type(0x0800),ipv4()" 
>> "1"
>> +
>> +# do a ping
>> +ovs_sbx "test_nat_connect_v4" ip netns exec client ping 192.168.0.20 -c 
>> 3 || return 1
>> +
>> +# create an echo server in 'server'
>> +echo "server" | \
>> +ovs_netns_spawn_daemon "test_nat_connect_v4" "server" \
>> +nc -lvnp 4443
>> +ovs_sbx "test_nat_connect_v4" ip netns exec client nc -i 1 -zv 
>> 192.168.0.20 4443 || return 1
>> +
>> +# Now test in the other direction (should fail)
>> +echo "client" | \
>> +ovs_netns_spawn_daemon "test_nat_connect_v4" "client" \
>> +nc -lvnp 4443
>> +ovs_sbx "test_nat_connect_v4" ip netns exec client nc -i 1 -zv 
>> 172.31.110.10 4443
>> +if [ $? == 0 ]; then
>> +   info "connect to client was successful"
>> +   return 1
>> +fi
>> +
>> +info "done..."
>> +return 0
>> +}
>> +
>>   # netlink_validation
>>   # - Create a dp
>>   # - check no warning with "old version" simulation
>> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
>> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> index 704cb4adf79a9..12ba5265b88fb 100644
>> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> @@ -511,6 +511,66 @@ class ovsactions(nla):
>>   else:
>>   ctact["attrs"].append([scan[1], None])
>>   actstr = actstr[strspn(actstr, ", ") :]
>> +# it se

Re: [ovs-dev] [PATCH net-next 3/4] selftests: openvswitch: add basic ct test case parsing

2023-07-10 Thread Aaron Conole
Adrian Moreno  writes:

> On 6/28/23 18:27, Aaron Conole wrote:
>> Forwarding via ct() action is an important use case for openvswitch, but
>> generally would require using a full ovs-vswitchd to get working. Add a
>> ct action parser for basic ct test case.
>> Signed-off-by: Aaron Conole 
>> ---
>> NOTE: 3 lines flag the line-length checkpatch warning, but there didnt
>>seem to be a really good way of breaking the lines smaller.
>>   .../selftests/net/openvswitch/openvswitch.sh  | 68
>> +++
>>   .../selftests/net/openvswitch/ovs-dpctl.py| 39 +++
>>   2 files changed, 107 insertions(+)
>> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> index 5d60a9466dab3..40a66c72af0f0 100755
>> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> @@ -12,6 +12,7 @@ TRACING=0
>> tests="
>>  arp_pingeth-arp: Basic arp ping between 
>> two NS
>> +ct_connect_v4   ip4-ct-xon: Basic ipv4 tcp 
>> connection using ct
>>  connect_v4  ip4-xon: Basic ipv4 ping 
>> between two NS
>>  netlink_checks  ovsnl: validate netlink attrs 
>> and settings
>>  upcall_interfaces   ovs: test the upcall interfaces"
>> @@ -193,6 +194,73 @@ test_arp_ping () {
>>  return 0
>>   }
>>   +# ct_connect_v4 test
>> +#  - client has 1500 byte MTU
>> +#  - server has 1500 byte MTU
>> +#  - use ICMP to ping in each direction
>> +#  - only allow CT state stuff to pass through new in c -> s
>> +test_ct_connect_v4 () {
>> +
>> +which nc >/dev/null 2>/dev/null || return $ksft_skip
>> +
>> +sbx_add "test_ct_connect_v4" || return $?
>> +
>> +ovs_add_dp "test_ct_connect_v4" ct4 || return 1
>> +info "create namespaces"
>> +for ns in client server; do
>> +ovs_add_netns_and_veths "test_ct_connect_v4" "ct4" "$ns" \
>> +"${ns:0:1}0" "${ns:0:1}1" || return 1
>> +done
>> +
>> +ip netns exec client ip addr add 172.31.110.10/24 dev c1
>> +ip netns exec client ip link set c1 up
>> +ip netns exec server ip addr add 172.31.110.20/24 dev s1
>> +ip netns exec server ip link set s1 up
>> +
>> +# Add forwarding for ARP and ip packets - completely wildcarded
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> +'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> +'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> + 'ct_state(-trk),eth(),eth_type(0x0800),ipv4()' \
>> + 'ct(commit),recirc(0x1)' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> + 
>> 'recirc_id(0x1),ct_state(+trk+new),in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10)'
>>  \
>> + '2' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> + 
>> 'recirc_id(0x1),ct_state(+trk+est),in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10)'
>>  \
>> + '2' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> + 
>> 'recirc_id(0x1),ct_state(+trk+est),in_port(2),eth(),eth_type(0x0800),ipv4(dst=172.31.110.10)'
>>  \
>> + '1' || return 1
>> +ovs_add_flow "test_ct_connect_v4" ct4 \
>> + 
>> 'recirc_id(0x1),ct_state(+trk+inv),eth(),eth_type(0x0800),ipv4()' 'drop' || \
>> + return 1
>> +
>> +# do a ping
>> +ovs_sbx "test_ct_connect_v4" ip netns exec client ping 172.31.110.20 -c 
>> 3 || return 1
>> +
>> +# create an echo server in 'server'
>> +echo "server" | \
>> +ovs_netns_spawn_daemon "test_ct_connect_v4" "server" \
>> +nc -lvnp 4443
>> +ovs_sbx "test_ct_connect_v4" ip netns exec client nc -i 1 -zv 
>> 172.31.110.20 4443 || return 1
>> +
>> +# Now test in the other direction (should fail)
>> +echo "client" | \
>> +ovs_netns_spawn_daemon "test_ct_connect_v4" "client" \
>> +nc -lvnp 4443
>> +ovs_sbx "test_ct_connect_v4" ip netns exec client nc -i 1 -zv 
>> 172.31.110.10 4443
>> +if [ $? == 0 ]; then
>> +   info "ct connect to client was successful"
>> +   return 1
>> +fi
>> +
>> +info "done..."
>> +return 0
>> +}
>> +
>>   # connect_v4 test
>>   #  - client has 1500 byte MTU
>>   #  - server has 1500 byte MTU
>> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
>> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> index 799bfb3064b90..704cb4adf79a9 100644
>> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> @@ -62,6 +62,15 @@ def macstr(mac):
>>   retu

Re: [ovs-dev] [PATCH v1 7/7] netdev-linux: support 64-bit rates in tc policing

2023-07-10 Thread Adrian Moreno



On 7/7/23 17:04, Ilya Maximets wrote:

On 7/6/23 18:22, Eelco Chaudron wrote:



On 2 Jun 2023, at 16:13, Adrian Moreno wrote:


Use TCA_POLICE_RATE64 if the rate cannot be expressed using 32bits.

This breaks the 32Gbps barrier. The new barrier is ~4Tbps caused by
netdev's API expressing kbps rates using 32-bit integers.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137643
Signed-off-by: Adrian Moreno 


This patch looks good to me, however, I have one additional general issue with 
this series.

For example, if the running kernel does not support TCA_POLICE_RATE64, 
TCA_HTB_RATE64, or TCA_HTB_CEIL64 but the kernel it was compiled on does. This 
will break the implementation. Maybe we need some probe in 
netdev_tc_init_flow_api()?


netdev_tc_init_flow_api() will not be called if hw-offload is not enabled.
But anyway, we only add 64bit arguments if we have to.  i.e. if the value
fits into 32bit range, 64bit argument will not be used.  That should provide
required backward compatibility.  If we need 64bit, but kernel doesn't
support, then we need to fail anyway.  Current code will configure incorrect
values, there is no need preserving that.



However looking at the current code we already use TCA_POLICE_PKTRATE64 which 
is later than TCA_POLICE_RATE64, so we should be good?! If this is true we also 
do not need all the ‘#ifdef HAVE_TCA_POLICE_PKTRATE64’ and related code in the 
this patch.


kpkts policing in a relatively new feature in OVS, so we do not expect it
to be used on old kernels.  So, use of TCA_POLICE_PKTRATE64 should be fine.



The HTB ones were added in 2013, so maybe we are good? If we are we can remove 
all the ‘#ifdef HAVE_TCA_HTB_RATE64’ and related code in patch 4.

Simon/Ilya any opinions on this?

Cheers,

Eelco



---
  acinclude.m4| 10 ++
  lib/netdev-linux.c  | 19 ---
  lib/netdev-linux.h  |  2 +-
  lib/tc.c|  2 ++
  tests/atlocal.in|  1 +
  tests/system-traffic.at | 21 +
  6 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 1e5a9872d..3ac7072b5 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,16 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
 [Define to 1 if TCA_HTB_RATE64 is available.])],
  [AC_SUBST(HAVE_TCA_HTB_RATE64,no)]
  )
+
+  AC_COMPILE_IFELSE([
+AC_LANG_PROGRAM([#include ], [
+int x = TCA_POLICE_PKTRATE64;
+])],
+[AC_SUBST(HAVE_TCA_POLICE_PKTRATE64,yes)
+ AC_DEFINE([HAVE_TCA_POLICE_PKTRATE64], [1],
+   [Define to 1 if TCA_POLICE_PKTRATE64 is available.])],
+[AC_SUBST(HAVE_TCA_POLICE_PKTRATE64,no)]
+)
  ])

  dnl OVS_CHECK_LINUX_SCTP_CT
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 3526fbfc3..6a9f36f1e 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -494,7 +494,7 @@ static struct tcmsg *netdev_linux_tc_make_request(const 
struct netdev *,
unsigned int flags,
struct ofpbuf *);

-static int tc_add_policer(struct netdev *, uint32_t kbits_rate,
+static int tc_add_policer(struct netdev *, uint64_t kbits_rate,
uint32_t kbits_burst, uint32_t kpkts_rate,
uint32_t kpkts_burst);

@@ -2660,6 +2660,7 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
uint64_t pkts_rate, uint64_t pkts_burst,
uint32_t notexceed_act, bool single_action)
  {
+uint64_t bytes_rate = kbits_rate / 8 * 1000;
  size_t offset, act_offset;
  struct tc_police police;
  uint32_t prio = 0;
@@ -2674,8 +2675,13 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
  nl_msg_act_police_start_nest(request, ++prio, &offset, &act_offset,
   single_action);
  if (police.rate.rate) {
-tc_put_rtab(request, TCA_POLICE_RATE, &police.rate, 0);
+tc_put_rtab(request, TCA_POLICE_RATE, &police.rate, bytes_rate);
  }
+#ifdef HAVE_TCA_POLICE_PKTRATE64
+if (bytes_rate > UINT32_MAX) {
+nl_msg_put_u64(request, TCA_POLICE_RATE64, bytes_rate);
+}
+#endif
  if (pkts_rate) {
  uint64_t pkt_burst_ticks;
  /* Here tc_bytes_to_ticks is used to convert packets rather than bytes
@@ -2689,7 +2695,7 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
  }

  static int
-tc_add_matchall_policer(struct netdev *netdev, uint32_t kbits_rate,
+tc_add_matchall_policer(struct netdev *netdev, uint64_t kbits_rate,
  uint32_t kbits_burst, uint32_t kpkts_rate,
  uint32_t kpkts_burst)
  {
@@ -5703,9 +5709,8 @@ tc_policer_init(struct tc_police *tc_police, uint64_t 
kbits_rate,
   * Returns 0 if successful, otherwise a positive errno value.
   */
  static int
-tc_add_policer(struct netdev *netdev, uint32_t kbits_rate,
- 

Re: [ovs-dev] [PATCH v3] python: Add async DNS support

2023-07-10 Thread Ilya Maximets
On 7/10/23 17:36, Terry Wilson wrote:
> On Mon, Jul 10, 2023 at 10:32 AM Terry Wilson  wrote:
>>
>> I accidentally forgot to click reply-to-all.
>>
>> On Fri, Jun 30, 2023 at 10:27 AM Ilya Maximets  wrote:
>>>
>>> On 6/30/23 16:54, Adrian Moreno wrote:


 On 6/30/23 14:35, Ilya Maximets wrote:
> On 6/30/23 14:23, Adrian Moreno wrote:
>>
>>
>> On 6/30/23 13:40, Ilya Maximets wrote:
>>> On 6/30/23 12:39, Adrian Moreno wrote:


 On 6/14/23 23:07, Terry Wilson wrote:
> This adds a Python version of the async DNS support added in:
>
> 771680d96 DNS: Add basic support for asynchronous DNS resolving
>
> The above version uses the unbound C library, and this
> implimentation uses the SWIG-wrapped Python version of that.
>
> In the event that the Python unbound library is not available,
> a warning will be logged and the resolve() method will just
> return None. For the case where inet_parse_active() is passed
> an IP address, it will not try to resolve it, so existing
> behavior should be preserved in the case that the unbound
> library is unavailable.
>
> Intentional differences from the C version are as follows:
>
>  OVS_HOSTS_FILE environment variable can bet set to override
>  the system 'hosts' file. This is primarily to allow testing to
>  be done without requiring network connectivity.
>
>  Since resolution can still be done via hosts file lookup, DNS
>  lookups are not disabled when resolv.conf cannot be loaded.
>
>  The Python socket_util module has fallen behind its C equivalent.
>  The bare minimum change was done to inet_parse_active() to 
> support
>  sync/async dns, as there is no equivalent to
>  parse_sockaddr_components(), inet_parse_passive(), etc. A TODO
>  was added to bring socket_util.py up to equivalency to the C
>  version.
>
> Signed-off-by: Terry Wilson 
> ---
> .github/workflows/build-and-test.yml|   4 +-
> Documentation/intro/install/general.rst |   4 +-
> Documentation/intro/install/rhel.rst|   2 +-
> Documentation/intro/install/windows.rst |   2 +-
> NEWS|   4 +-
> debian/control.in   |   1 +
> m4/openvswitch.m4   |   8 +-
> python/TODO.rst |   7 +
> python/automake.mk  |   2 +
> python/ovs/dns_resolve.py   | 272 
> +++
> python/ovs/socket_util.py   |  21 +-
> python/ovs/stream.py|   2 +-
> python/ovs/tests/test_dns_resolve.py| 280 
> 
> python/setup.py |   6 +-
> rhel/openvswitch-fedora.spec.in |   2 +-
> tests/vlog.at   |   2 +
> 16 files changed, 601 insertions(+), 18 deletions(-)
> create mode 100644 python/ovs/dns_resolve.py
> create mode 100644 python/ovs/tests/test_dns_resolve.py
>
> diff --git a/.github/workflows/build-and-test.yml 
> b/.github/workflows/build-and-test.yml
> index f66ab43b0..47d239f10 100644
> --- a/.github/workflows/build-and-test.yml
> +++ b/.github/workflows/build-and-test.yml
> @@ -183,10 +183,10 @@ jobs:
>   run:  sudo apt update || true
> - name: install common dependencies
>   run:  sudo apt install -y ${{ env.dependencies }}
> -- name: install libunbound libunwind
> +- name: install libunbound libunwind python3-unbound
>   # GitHub Actions doesn't have 32-bit versions of these 
> libraries.
>   if:   matrix.m32 == ''
> -  run:  sudo apt install -y libunbound-dev libunwind-dev
> +  run:  sudo apt install -y libunbound-dev libunwind-dev 
> python3-unbound
> - name: install 32-bit libraries
>   if:   matrix.m32 != ''
>   run:  sudo apt install -y gcc-multilib
> diff --git a/Documentation/intro/install/general.rst 
> b/Documentation/intro/install/general.rst
> index 42b5682fd..19e360d47 100644
> --- a/Documentation/intro/install/general.rst
> +++ b/Documentation/intro/install/general.rst
> @@ -90,7 +90,7 @@ need the following software:
>   If libcap-ng is installed, then Open vSwitch will automatically 
> build with
>   support for it.
>
> -- Python 3.

Re: [ovs-dev] [PATCH ovn v2] ci: Remove '--recheck' in CI.

2023-07-10 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 85 characters long (recommended limit is 79)
#171 FILE: .github/workflows/test.yml:108:
- { compiler: gcc, testsuite: test, test_range: "1001-", unstable: 
unstable }

WARNING: Line is 111 characters long (recommended limit is 79)
#177 FILE: .github/workflows/test.yml:113:
- { compiler: clang, testsuite: test, sanitizers: sanitizers, 
test_range: "1201-", unstable: unstable }

WARNING: Line is 103 characters long (recommended limit is 79)
#181 FILE: .github/workflows/test.yml:116:
- { compiler: gcc, testsuite: test, libs: -ljemalloc, test_range: 
"1001-", unstable: unstable }

WARNING: Line is 108 characters long (recommended limit is 79)
#185 FILE: .github/workflows/test.yml:119:
- { compiler: gcc, testsuite: system-test-dpdk, dpdk: dpdk, test_range: 
"201-", unstable: unstable }

WARNING: Line is 101 characters long (recommended limit is 79)
#189 FILE: .github/workflows/test.yml:122:
- { compiler: gcc, testsuite: system-test-userspace, test_range: 
"201-", unstable: unstable }

WARNING: Line is 91 characters long (recommended limit is 79)
#193 FILE: .github/workflows/test.yml:125:
- { compiler: gcc, testsuite: system-test, test_range: "201-", 
unstable: unstable }

WARNING: Line is 117 characters long (recommended limit is 79)
#197 FILE: .github/workflows/test.yml:128:
- { compiler: clang, testsuite: system-test, sanitizers: sanitizers, 
test_range: "201-", unstable: unstable }

Lines checked: 373, Warnings: 7, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] python: Add async DNS support

2023-07-10 Thread Terry Wilson
On Mon, Jul 10, 2023 at 10:32 AM Terry Wilson  wrote:
>
> I accidentally forgot to click reply-to-all.
>
> On Fri, Jun 30, 2023 at 10:27 AM Ilya Maximets  wrote:
> >
> > On 6/30/23 16:54, Adrian Moreno wrote:
> > >
> > >
> > > On 6/30/23 14:35, Ilya Maximets wrote:
> > >> On 6/30/23 14:23, Adrian Moreno wrote:
> > >>>
> > >>>
> > >>> On 6/30/23 13:40, Ilya Maximets wrote:
> >  On 6/30/23 12:39, Adrian Moreno wrote:
> > >
> > >
> > > On 6/14/23 23:07, Terry Wilson wrote:
> > >> This adds a Python version of the async DNS support added in:
> > >>
> > >> 771680d96 DNS: Add basic support for asynchronous DNS resolving
> > >>
> > >> The above version uses the unbound C library, and this
> > >> implimentation uses the SWIG-wrapped Python version of that.
> > >>
> > >> In the event that the Python unbound library is not available,
> > >> a warning will be logged and the resolve() method will just
> > >> return None. For the case where inet_parse_active() is passed
> > >> an IP address, it will not try to resolve it, so existing
> > >> behavior should be preserved in the case that the unbound
> > >> library is unavailable.
> > >>
> > >> Intentional differences from the C version are as follows:
> > >>
> > >>  OVS_HOSTS_FILE environment variable can bet set to override
> > >>  the system 'hosts' file. This is primarily to allow testing to
> > >>  be done without requiring network connectivity.
> > >>
> > >>  Since resolution can still be done via hosts file lookup, DNS
> > >>  lookups are not disabled when resolv.conf cannot be loaded.
> > >>
> > >>  The Python socket_util module has fallen behind its C 
> > >> equivalent.
> > >>  The bare minimum change was done to inet_parse_active() to 
> > >> support
> > >>  sync/async dns, as there is no equivalent to
> > >>  parse_sockaddr_components(), inet_parse_passive(), etc. A TODO
> > >>  was added to bring socket_util.py up to equivalency to the C
> > >>  version.
> > >>
> > >> Signed-off-by: Terry Wilson 
> > >> ---
> > >> .github/workflows/build-and-test.yml|   4 +-
> > >> Documentation/intro/install/general.rst |   4 +-
> > >> Documentation/intro/install/rhel.rst|   2 +-
> > >> Documentation/intro/install/windows.rst |   2 +-
> > >> NEWS|   4 +-
> > >> debian/control.in   |   1 +
> > >> m4/openvswitch.m4   |   8 +-
> > >> python/TODO.rst |   7 +
> > >> python/automake.mk  |   2 +
> > >> python/ovs/dns_resolve.py   | 272 
> > >> +++
> > >> python/ovs/socket_util.py   |  21 +-
> > >> python/ovs/stream.py|   2 +-
> > >> python/ovs/tests/test_dns_resolve.py| 280 
> > >> 
> > >> python/setup.py |   6 +-
> > >> rhel/openvswitch-fedora.spec.in |   2 +-
> > >> tests/vlog.at   |   2 +
> > >> 16 files changed, 601 insertions(+), 18 deletions(-)
> > >> create mode 100644 python/ovs/dns_resolve.py
> > >> create mode 100644 python/ovs/tests/test_dns_resolve.py
> > >>
> > >> diff --git a/.github/workflows/build-and-test.yml 
> > >> b/.github/workflows/build-and-test.yml
> > >> index f66ab43b0..47d239f10 100644
> > >> --- a/.github/workflows/build-and-test.yml
> > >> +++ b/.github/workflows/build-and-test.yml
> > >> @@ -183,10 +183,10 @@ jobs:
> > >>   run:  sudo apt update || true
> > >> - name: install common dependencies
> > >>   run:  sudo apt install -y ${{ env.dependencies }}
> > >> -- name: install libunbound libunwind
> > >> +- name: install libunbound libunwind python3-unbound
> > >>   # GitHub Actions doesn't have 32-bit versions of these 
> > >> libraries.
> > >>   if:   matrix.m32 == ''
> > >> -  run:  sudo apt install -y libunbound-dev libunwind-dev
> > >> +  run:  sudo apt install -y libunbound-dev libunwind-dev 
> > >> python3-unbound
> > >> - name: install 32-bit libraries
> > >>   if:   matrix.m32 != ''
> > >>   run:  sudo apt install -y gcc-multilib
> > >> diff --git a/Documentation/intro/install/general.rst 
> > >> b/Documentation/intro/install/general.rst
> > >> index 42b5682fd..19e360d47 100644
> > >> --- a/Documentation/intro/install/general.rst
> > >> +++ b/Documentation/intro/install/general.rst
> > >> @@ -90,7 +90,7 @@ need the following software:
> > >>   If libcap-ng is installed, then Open vSwitch will 
> > >> automatically build with
> 

[ovs-dev] [PATCH v4] ofproto-dpif-upcall: Mirror packets that are modified

2023-07-10 Thread Mike Pattrick
Currently OVS keeps track of which mirrors that each packet has been
sent to for the purpose of deduplication. However, this doesn't consider
that openflow rules can make significant changes to packets after
ingress.

For example, OVN can create OpenFlow rules that turn an echo request
into an echo response by flipping source/destination addresses and
setting the ICMP type to Reply. When a mirror is configured, only the
request gets mirrored even though a response is recieved.

This can cause a false impression of the actual traffic on wire if
someone inspects the mirror and doesn't see an echo reply even though
one has been sent.

This patch resets the mirrors every time a packet is modified, so
mirrors will recieve every copy of a packet that is sent for output.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2155579
Signed-off-by: Mike Pattrick 
---
 v2: Refactored all code into a single function
 v3: Cleaned up a code change that was incorrectly retained in v2 but
 not needed
 v4: Removed the default case from reset_mirror_ctx()
---
 ofproto/ofproto-dpif-xlate.c | 86 
 tests/ofproto-dpif.at|  6 +--
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 29f4daa63..aba0cdb6e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -6989,6 +6989,90 @@ xlate_ofpact_unroll_xlate(struct xlate_ctx *ctx,
  "cookie=%#"PRIx64, a->rule_table_id, a->rule_cookie);
 }
 
+/* Reset the mirror context if we modify the packet and would like to mirror
+ * the new copy. */
+static void
+reset_mirror_ctx(struct xlate_ctx *ctx, const struct flow *flow, uint8_t type)
+{
+switch (type) {
+case OFPACT_STRIP_VLAN:
+case OFPACT_PUSH_VLAN:
+case OFPACT_SET_ETH_SRC:
+case OFPACT_SET_ETH_DST:
+case OFPACT_PUSH_MPLS:
+case OFPACT_POP_MPLS:
+case OFPACT_SET_MPLS_LABEL:
+case OFPACT_SET_MPLS_TC:
+case OFPACT_SET_MPLS_TTL:
+case OFPACT_DEC_MPLS_TTL:
+case OFPACT_DEC_NSH_TTL:
+case OFPACT_DEC_TTL:
+case OFPACT_SET_FIELD:
+case OFPACT_SET_VLAN_VID:
+case OFPACT_SET_VLAN_PCP:
+case OFPACT_ENCAP:
+case OFPACT_DECAP:
+case OFPACT_NAT:
+ctx->mirrors = 0;
+return;
+case OFPACT_SET_IPV4_SRC:
+case OFPACT_SET_IPV4_DST:
+if (flow->dl_type == htons(ETH_TYPE_IP)) {
+ctx->mirrors = 0;
+}
+return;
+case OFPACT_SET_IP_DSCP:
+case OFPACT_SET_IP_ECN:
+case OFPACT_SET_IP_TTL:
+if (is_ip_any(flow)) {
+ctx->mirrors = 0;
+}
+return;
+case OFPACT_SET_L4_SRC_PORT:
+case OFPACT_SET_L4_DST_PORT:
+if (is_ip_any(flow) && !(flow->nw_frag & FLOW_NW_FRAG_LATER)) {
+ctx->mirrors = 0;
+}
+return;
+case OFPACT_OUTPUT_REG:
+case OFPACT_OUTPUT_TRUNC:
+case OFPACT_GROUP:
+case OFPACT_OUTPUT:
+case OFPACT_CONTROLLER:
+case OFPACT_RESUBMIT:
+case OFPACT_GOTO_TABLE:
+case OFPACT_WRITE_METADATA:
+case OFPACT_SET_TUNNEL:
+case OFPACT_REG_MOVE:
+case OFPACT_STACK_PUSH:
+case OFPACT_STACK_POP:
+case OFPACT_LEARN:
+case OFPACT_ENQUEUE:
+case OFPACT_SET_QUEUE:
+case OFPACT_POP_QUEUE:
+case OFPACT_MULTIPATH:
+case OFPACT_BUNDLE:
+case OFPACT_EXIT:
+case OFPACT_UNROLL_XLATE:
+case OFPACT_FIN_TIMEOUT:
+case OFPACT_CLEAR_ACTIONS:
+case OFPACT_WRITE_ACTIONS:
+case OFPACT_METER:
+case OFPACT_SAMPLE:
+case OFPACT_CLONE:
+case OFPACT_DEBUG_RECIRC:
+case OFPACT_DEBUG_SLOW:
+case OFPACT_CT:
+case OFPACT_CT_CLEAR:
+case OFPACT_CHECK_PKT_LARGER:
+case OFPACT_DELETE_FIELD:
+case OFPACT_NOTE:
+case OFPACT_CONJUNCTION:
+return;
+}
+OVS_NOT_REACHED();
+}
+
 static void
 do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
  struct xlate_ctx *ctx, bool is_last_action,
@@ -7030,6 +7114,8 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t 
ofpacts_len,
 break;
 }
 
+reset_mirror_ctx(ctx, flow, a->type);
+
 if (OVS_UNLIKELY(ctx->xin->trace)) {
 struct ofputil_port_map map = OFPUTIL_PORT_MAP_INITIALIZER(&map);
 
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 6824ce0bb..f242f77f3 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5349,7 +5349,7 @@ AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
 
flow="in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
 AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "$flow"], [0], [stdout])
 AT_CHECK_UNQUOTED([tail -1 stdout], [0],
-  [Datapath actions: 3,push_vlan(vid=17,pcp=0),2
+  [Datapath actions: 3,push_vlan(vid=17,pcp=0),2,3
 ])
 
 
flow="in_port(2),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),

Re: [ovs-dev] [PATCH v3] python: Add async DNS support

2023-07-10 Thread Terry Wilson
I accidentally forgot to click reply-to-all.

On Fri, Jun 30, 2023 at 10:27 AM Ilya Maximets  wrote:
>
> On 6/30/23 16:54, Adrian Moreno wrote:
> >
> >
> > On 6/30/23 14:35, Ilya Maximets wrote:
> >> On 6/30/23 14:23, Adrian Moreno wrote:
> >>>
> >>>
> >>> On 6/30/23 13:40, Ilya Maximets wrote:
>  On 6/30/23 12:39, Adrian Moreno wrote:
> >
> >
> > On 6/14/23 23:07, Terry Wilson wrote:
> >> This adds a Python version of the async DNS support added in:
> >>
> >> 771680d96 DNS: Add basic support for asynchronous DNS resolving
> >>
> >> The above version uses the unbound C library, and this
> >> implimentation uses the SWIG-wrapped Python version of that.
> >>
> >> In the event that the Python unbound library is not available,
> >> a warning will be logged and the resolve() method will just
> >> return None. For the case where inet_parse_active() is passed
> >> an IP address, it will not try to resolve it, so existing
> >> behavior should be preserved in the case that the unbound
> >> library is unavailable.
> >>
> >> Intentional differences from the C version are as follows:
> >>
> >>  OVS_HOSTS_FILE environment variable can bet set to override
> >>  the system 'hosts' file. This is primarily to allow testing to
> >>  be done without requiring network connectivity.
> >>
> >>  Since resolution can still be done via hosts file lookup, DNS
> >>  lookups are not disabled when resolv.conf cannot be loaded.
> >>
> >>  The Python socket_util module has fallen behind its C equivalent.
> >>  The bare minimum change was done to inet_parse_active() to support
> >>  sync/async dns, as there is no equivalent to
> >>  parse_sockaddr_components(), inet_parse_passive(), etc. A TODO
> >>  was added to bring socket_util.py up to equivalency to the C
> >>  version.
> >>
> >> Signed-off-by: Terry Wilson 
> >> ---
> >> .github/workflows/build-and-test.yml|   4 +-
> >> Documentation/intro/install/general.rst |   4 +-
> >> Documentation/intro/install/rhel.rst|   2 +-
> >> Documentation/intro/install/windows.rst |   2 +-
> >> NEWS|   4 +-
> >> debian/control.in   |   1 +
> >> m4/openvswitch.m4   |   8 +-
> >> python/TODO.rst |   7 +
> >> python/automake.mk  |   2 +
> >> python/ovs/dns_resolve.py   | 272 
> >> +++
> >> python/ovs/socket_util.py   |  21 +-
> >> python/ovs/stream.py|   2 +-
> >> python/ovs/tests/test_dns_resolve.py| 280 
> >> 
> >> python/setup.py |   6 +-
> >> rhel/openvswitch-fedora.spec.in |   2 +-
> >> tests/vlog.at   |   2 +
> >> 16 files changed, 601 insertions(+), 18 deletions(-)
> >> create mode 100644 python/ovs/dns_resolve.py
> >> create mode 100644 python/ovs/tests/test_dns_resolve.py
> >>
> >> diff --git a/.github/workflows/build-and-test.yml 
> >> b/.github/workflows/build-and-test.yml
> >> index f66ab43b0..47d239f10 100644
> >> --- a/.github/workflows/build-and-test.yml
> >> +++ b/.github/workflows/build-and-test.yml
> >> @@ -183,10 +183,10 @@ jobs:
> >>   run:  sudo apt update || true
> >> - name: install common dependencies
> >>   run:  sudo apt install -y ${{ env.dependencies }}
> >> -- name: install libunbound libunwind
> >> +- name: install libunbound libunwind python3-unbound
> >>   # GitHub Actions doesn't have 32-bit versions of these 
> >> libraries.
> >>   if:   matrix.m32 == ''
> >> -  run:  sudo apt install -y libunbound-dev libunwind-dev
> >> +  run:  sudo apt install -y libunbound-dev libunwind-dev 
> >> python3-unbound
> >> - name: install 32-bit libraries
> >>   if:   matrix.m32 != ''
> >>   run:  sudo apt install -y gcc-multilib
> >> diff --git a/Documentation/intro/install/general.rst 
> >> b/Documentation/intro/install/general.rst
> >> index 42b5682fd..19e360d47 100644
> >> --- a/Documentation/intro/install/general.rst
> >> +++ b/Documentation/intro/install/general.rst
> >> @@ -90,7 +90,7 @@ need the following software:
> >>   If libcap-ng is installed, then Open vSwitch will automatically 
> >> build with
> >>   support for it.
> >>
> >> -- Python 3.4 or later.
> >> +- Python 3.6 or later.
> >>
> >> - Unbound library, from http://www.unbound.net, is optional but 
> >> recommended if
> >>   you want to enable ovs-vswitchd and other uti

Re: [ovs-dev] [PATCH v3 5/6] dpif-netdev: Add per pmd sleep config.

2023-07-10 Thread 0-day Robot
Bleep bloop.  Greetings Kevin Traynor, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 80 characters long (recommended limit is 79)
#50 FILE: Documentation/topics/dpdk/pmd.rst:384:
sleep and all others PMD threads to be able to request a max sleep of 50 usecs::

Lines checked: 497, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ci: Remove '--recheck' in CI.

2023-07-10 Thread Dumitru Ceara
On 7/7/23 14:27, Ales Musil wrote:
> On Wed, Jul 5, 2023 at 7:08 PM Dumitru Ceara  wrote:
> 
>> If we want to catch new failures faster we have a better chance if CI
>> doesn't auto-retry (once).
>>
>> There are some tests that are still "unstable" and fail every now and
>> then.  In order to reduce the number of false negatives keep the
>> --recheck for them.  To achieve that we use a new macro, TAG_UNSTABLE,
>> to tag all these tests.  The list of "unstable" tests is compiled based
>> on the following discussion:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405465.html
>>
>> In order to avoid new GitHub actions jobs, we re-purpose the last job of
>> each target type to also run the unstable tests.  These jobs were
>> already running less tests than others so the additional run time should
>> not be an issue.
>>
>> Signed-off-by: Dumitru Ceara 
>>
> 
> Hi,
> 
> I have one question below.
> 

Hi Ales,

Thanks for the review!

> 
>> ---
>> Changes in v1 (since RFC):
>> - kept recheck for unstable tests
>> - introduced TAG_UNSTABLE
>> - changed test.yml to run unstable tests in the last batch of every
>>   test target type.
>> ---
>>  .ci/ci.sh  |  2 +-
>>  .ci/linux-build.sh | 54 +++---
>>  .github/workflows/test.yml | 15 ++-
>>  tests/ovn-ic.at|  1 +
>>  tests/ovn-ipsec.at |  1 +
>>  tests/ovn-macros.at|  5 
>>  tests/ovn-northd.at|  1 +
>>  tests/ovn-performance.at   |  1 +
>>  tests/ovn.at   | 13 +
>>  9 files changed, 70 insertions(+), 23 deletions(-)
>>
>> diff --git a/.ci/ci.sh b/.ci/ci.sh
>> index 10f11939c5..a500aba764 100755
>> --- a/.ci/ci.sh
>> +++ b/.ci/ci.sh
>> @@ -101,7 +101,7 @@ function run_tests() {
>>  && \
>>  ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
>>  TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
>> -./.ci/linux-build.sh
>> +UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
>>  "
>>  }
>>
>> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
>> index 5a79a52daf..58e41a7b68 100755
>> --- a/.ci/linux-build.sh
>> +++ b/.ci/linux-build.sh
>> @@ -9,6 +9,7 @@ COMMON_CFLAGS=""
>>  OVN_CFLAGS=""
>>  OPTS="$OPTS --enable-Werror"
>>  JOBS=${JOBS:-"-j4"}
>> +RECHECK=${RECHECK:-"no"}
>>
>>  function install_dpdk()
>>  {
>> @@ -99,6 +100,17 @@ function configure_clang()
>>  COMMON_CFLAGS="${COMMON_CFLAGS}
>> -Wno-error=unused-command-line-argument"
>>  }
>>
>> +function run_tests()
>> +{
>> +if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
>> +TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK
>> +then
>> +# testsuite.log is necessary for debugging.
>> +cat */_build/sub/tests/testsuite.log
>> +exit 1
>>
> 
> This essentially mean that if we fail in the stable part we won't run the
> unstable,
> which might be ok. But IMO it would be better to run both and fail
> depending on results
> from both WDYT? The same applies for system tests.
> 

Makes sense, I did that and posted v2 here:

https://patchwork.ozlabs.org/project/ovn/patch/20230710152543.95100-1-dce...@redhat.com/

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2] ci: Remove '--recheck' in CI.

2023-07-10 Thread Dumitru Ceara
If we want to catch new failures faster we have a better chance if CI
doesn't auto-retry (once).

There are some tests that are still "unstable" and fail every now and
then.  In order to reduce the number of false negatives keep the
--recheck for them.  To achieve that we use a new macro, TAG_UNSTABLE,
to tag all these tests.  The list of "unstable" tests is compiled based
on the following discussion:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405465.html

In order to avoid new GitHub actions jobs, we re-purpose the last job of
each target type to also run the unstable tests.  These jobs were
already running less tests than others so the additional run time should
not be an issue.

Signed-off-by: Dumitru Ceara 
---
V2:
- Addressed Ales' comments:
  - always run stable and unstable tests before declaring pass/fail
Changes in v1 (since RFC):
- kept recheck for unstable tests
- introduced TAG_UNSTABLE
- changed test.yml to run unstable tests in the last batch of every
  test target type.
---
 .ci/ci.sh  |  2 +-
 .ci/linux-build.sh | 76 ++
 .github/workflows/test.yml | 15 
 tests/ovn-ic.at|  1 +
 tests/ovn-ipsec.at |  1 +
 tests/ovn-macros.at|  5 +++
 tests/ovn-northd.at|  1 +
 tests/ovn-performance.at   |  1 +
 tests/ovn.at   | 13 +++
 9 files changed, 92 insertions(+), 23 deletions(-)

diff --git a/.ci/ci.sh b/.ci/ci.sh
index 10f11939c5..a500aba764 100755
--- a/.ci/ci.sh
+++ b/.ci/ci.sh
@@ -101,7 +101,7 @@ function run_tests() {
 && \
 ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
 TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
-./.ci/linux-build.sh
+UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
 "
 }
 
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 5a79a52daf..4c5361f3ca 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -9,6 +9,7 @@ COMMON_CFLAGS=""
 OVN_CFLAGS=""
 OPTS="$OPTS --enable-Werror"
 JOBS=${JOBS:-"-j4"}
+RECHECK=${RECHECK:-"no"}
 
 function install_dpdk()
 {
@@ -99,6 +100,17 @@ function configure_clang()
 COMMON_CFLAGS="${COMMON_CFLAGS} -Wno-error=unused-command-line-argument"
 }
 
+function run_tests()
+{
+if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
+TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK
+then
+# testsuite.log is necessary for debugging.
+cat */_build/sub/tests/testsuite.log
+return 1
+fi
+}
+
 function execute_tests()
 {
 # 'distcheck' will reconfigure with required options.
@@ -106,27 +118,61 @@ function execute_tests()
 configure_ovn
 
 export DISTCHECK_CONFIGURE_FLAGS="$OPTS"
-if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
-TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes
-then
-# testsuite.log is necessary for debugging.
-cat */_build/sub/tests/testsuite.log
+
+local stable_rc=0
+local unstable_rc=0
+
+if ! SKIP_UNSTABLE=yes run_tests; then
+stable_rc=1
+fi
+
+if [ "$UNSTABLE" ]; then
+if ! SKIP_UNSTABLE=no TEST_RANGE="-k unstable" RECHECK=yes \
+run_tests; then
+unstable_rc=1
+fi
+fi
+
+if [[ $stable_rc -ne 0 ]] || [[ $unstable_rc -ne 0 ]]; then
 exit 1
 fi
 }
 
+function run_system_tests()
+{
+local type=$1
+local log_file=$2
+
+if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \
+RECHECK=$RECHECK; then
+# $log_file is necessary for debugging.
+cat tests/$log_file
+return 1
+fi
+}
+
 function execute_system_tests()
 {
-  type=$1
-  log_file=$2
-
-  configure_ovn $OPTS
-  make $JOBS || { cat config.log; exit 1; }
-  if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" RECHECK=yes; then
-  # $log_file is necessary for debugging.
-  cat tests/$log_file
-  exit 1
-  fi
+configure_ovn $OPTS
+make $JOBS || { cat config.log; exit 1; }
+
+local stable_rc=0
+local unstable_rc=0
+
+if ! SKIP_UNSTABLE=yes run_system_tests $@; then
+stable_rc=1
+fi
+
+if [ "$UNSTABLE" ]; then
+if ! SKIP_UNSTABLE=no TEST_RANGE="-k unstable" RECHECK=yes \
+run_system_tests $@; then
+unstable_rc=1
+fi
+fi
+
+if [[ $stable_rc -ne 0 ]] || [[ $unstable_rc -ne 0 ]]; then
+exit 1
+fi
 }
 
 configure_$CC
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index fe2a14c401..7d40251003 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -92,6 +92,7 @@ jobs:
   TESTSUITE:   ${{ matrix.cfg.testsuite }}
   TEST_RANGE:  ${{ matrix.cfg.test_range }}
   SANITIZERS:  ${{ matrix.cfg.sanitizers }}
+  UNSTABLE:${{ matrix.cfg.unstable }}
 
 name: linux ${{ join(matrix.cfg.*, ' ') }}
 runs-on: ubuntu-latest
@@ -

[ovs-dev] [PATCH v5] ofproto-dpif-xlate: Reduce stack usage in recursive xlate functions

2023-07-10 Thread Mike Pattrick
Several xlate actions used in recursive translation currently store a
large amount of information on the stack. This can result in handler
threads quickly running out of stack space despite before
xlate_resubmit_resource_check() is able to terminate translation. This
patch reduces stack usage by over 3kb from several translation actions.

This patch also moves some trace function from do_xlate_actions into its
own function.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2104779
Signed-off-by: Mike Pattrick 
Reviewed-by: Simon Horman 

---
Since v1:
 - Refactored code into specialized functions and renamed variables for
 improved readability.
Since v2:
 - Removed inline keywords
Since v3:
 - Improved formatting.
Since v4:
 - v4 patch was an incorrect revision
Since v5:
 - Moved trace portmap to heap
 - Moved additional flow_tnl mf_subvalue structs to heap

Signed-off-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-xlate.c | 259 ++-
 1 file changed, 164 insertions(+), 95 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 29f4daa63..2ecab08fb 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -501,6 +501,84 @@ ctx_cancel_freeze(struct xlate_ctx *ctx)
 
 static void finish_freezing(struct xlate_ctx *ctx);
 
+/* These functions and structure are used to save stack space in actions that
+ * need to retain a large amount of xlate_ctx state. */
+struct xretained_state {
+union mf_subvalue new_stack[1024 / sizeof(union mf_subvalue)];
+uint64_t actset_stub[1024 / 8];
+struct ofpbuf old_stack;
+struct ofpbuf old_action_set;
+struct flow old_flow;
+struct flow old_base;
+struct flow_tnl flow_tnl_mask;
+};
+
+/* The return of this function must be freed by
+ * xretain_state_restore_and_free(). */
+static struct xretained_state *
+xretain_state_save(struct xlate_ctx *ctx)
+{
+struct xretained_state *retained = xmalloc(sizeof *retained);
+
+retained->old_flow = ctx->xin->flow;
+retained->old_stack = ctx->stack;
+retained->old_action_set = ctx->action_set;
+ofpbuf_use_stub(&ctx->stack, retained->new_stack,
+sizeof retained->new_stack);
+ofpbuf_use_stub(&ctx->action_set, retained->actset_stub,
+sizeof retained->actset_stub);
+
+return retained;
+}
+
+static void
+xretain_tunnel_mask_save(const struct xlate_ctx *ctx,
+ struct xretained_state *retained)
+{
+retained->flow_tnl_mask = ctx->wc->masks.tunnel;
+}
+
+static void
+xretain_base_flow_save(const struct xlate_ctx *ctx,
+   struct xretained_state *retained)
+{
+retained->old_base = ctx->base_flow;
+}
+
+static void
+xretain_base_flow_restore(struct xlate_ctx *ctx,
+  const struct xretained_state *retained)
+{
+ctx->base_flow = retained->old_base;
+}
+
+static void
+xretain_flow_restore(struct xlate_ctx *ctx,
+ const struct xretained_state *retained)
+{
+ctx->xin->flow = retained->old_flow;
+}
+
+static void
+xretain_tunnel_mask_restore(struct xlate_ctx *ctx,
+const struct xretained_state *retained)
+{
+ctx->wc->masks.tunnel = retained->flow_tnl_mask;
+}
+
+static void
+xretain_state_restore_and_free(struct xlate_ctx *ctx,
+   struct xretained_state *retained)
+{
+ctx->xin->flow = retained->old_flow;
+ofpbuf_uninit(&ctx->action_set);
+ctx->action_set = retained->old_action_set;
+ofpbuf_uninit(&ctx->stack);
+ctx->stack = retained->old_stack;
+
+free(retained);
+}
+
 /* A controller may use OFPP_NONE as the ingress port to indicate that
  * it did not arrive on a "real" port.  'ofpp_none_bundle' exists for
  * when an input bundle is needed for validation (e.g., mirroring or
@@ -3915,20 +3993,17 @@ static void
 patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
   struct xport *out_dev, bool is_last_action)
 {
+bool old_was_mpls = ctx->was_mpls;
 struct flow *flow = &ctx->xin->flow;
-struct flow old_flow = ctx->xin->flow;
-struct flow_tnl old_flow_tnl_wc = ctx->wc->masks.tunnel;
 bool old_conntrack = ctx->conntracked;
-bool old_was_mpls = ctx->was_mpls;
-ovs_version_t old_version = ctx->xin->tables_version;
-struct ofpbuf old_stack = ctx->stack;
-uint8_t new_stack[1024];
-struct ofpbuf old_action_set = ctx->action_set;
+struct xretained_state *retained_state;
 struct ovs_list *old_trace = ctx->xin->trace;
-uint64_t actset_stub[1024 / 8];
+ovs_version_t old_version = ctx->xin->tables_version;
+
+retained_state = xretain_state_save(ctx);
+
+xretain_tunnel_mask_save(ctx, retained_state);
 
-ofpbuf_use_stub(&ctx->stack, new_stack, sizeof new_stack);
-ofpbuf_use_stub(&ctx->action_set, actset_stub, sizeof actset_stub);
 flow->in_port.ofp_port = out_dev->ofp_port;
 flow->metadata = ht

[ovs-dev] [PATCH v3 6/6] pmd.at: Add per pmd max sleep unit tests.

2023-07-10 Thread Kevin Traynor
Add unit tests for new per pmd options of pmd-sleep-max.

Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 tests/pmd.at | 152 +++
 1 file changed, 152 insertions(+)

diff --git a/tests/pmd.at b/tests/pmd.at
index 6040d558c..31b832e3e 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -77,4 +77,18 @@ m4_define([CHECK_DP_SLEEP_MAX], [
 ])
 
+dnl CHECK_PMD_SLEEP_MAX([core_id], [numa_id], [max_sleep], [+line])
+dnl
+dnl Checks max sleep time of each pmd with core_id.
+dnl Checking starts from line number 'line' in ovs-vswithd.log .
+m4_define([CHECK_PMD_SLEEP_MAX], [
+PATTERN="PMD thread core   *[$1] NUMA  *[$2]: Max sleep request set to  
*[$3] usecs."
+line_st=$4
+if [[ -z "$line_st" ]]
+then
+line_st="+0"
+fi
+OVS_WAIT_UNTIL([tail -n $line_st ovs-vswitchd.log | grep "$PATTERN"])
+])
+
 m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/"])
 m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
@@ -1339,2 +1353,140 @@ PMD load based sleeps are enabled by default.
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - per pmd sleep])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd 
options:n_rxq=8 options:numa_id=1], [], [], [--dummy-numa 0,0,0,1,1,8,8])
+
+dnl Check system default
+CHECK_DP_SLEEP_MAX([0], [disabled], [])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 0 usecs by default.
+PMD load based sleeps are disabled by default.
+PMD thread core   0 NUMA  0: Max sleep request set to0 usecs.
+PMD thread core   3 NUMA  1: Max sleep request set to0 usecs.
+PMD thread core   5 NUMA  8: Max sleep request set to0 usecs.
+])
+
+dnl only a dp default
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=200])
+CHECK_DP_SLEEP_MAX([200], [enabled], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([0], [0], [200], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([3], [1], [200], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([5], [8], [200], [+$LINENUM])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 200 usecs by default.
+PMD load based sleeps are enabled by default.
+PMD thread core   0 NUMA  0: Max sleep request set to  200 usecs.
+PMD thread core   3 NUMA  1: Max sleep request set to  200 usecs.
+PMD thread core   5 NUMA  8: Max sleep request set to  200 usecs.
+])
+
+dnl only per pmd
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . 
other_config:pmd-sleep-max=3:300,0:100,5:400])
+CHECK_DP_SLEEP_MAX([0], [disabled], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([0], [0], [100], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([3], [1], [300], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([5], [8], [400], [+$LINENUM])
+
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 0 usecs by default.
+PMD load based sleeps are disabled by default.
+PMD thread core   0 NUMA  0: Max sleep request set to  100 usecs.
+PMD thread core   3 NUMA  1: Max sleep request set to  300 usecs.
+PMD thread core   5 NUMA  8: Max sleep request set to  400 usecs.
+])
+
+dnl mix of not used default and per-pmd
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . 
other_config:pmd-sleep-max=50,3:300,0:100,5:200])
+CHECK_DP_SLEEP_MAX([50], [enabled], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([0], [0], [100], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([3], [1], [300], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([5], [8], [200], [+$LINENUM])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 50 usecs by default.
+PMD load based sleeps are enabled by default.
+PMD thread core   0 NUMA  0: Max sleep request set to  100 usecs.
+PMD thread core   3 NUMA  1: Max sleep request set to  300 usecs.
+PMD thread core   5 NUMA  8: Max sleep request set to  200 usecs.
+])
+
+dnl remove a per-pmd entry and use default
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=50,3:300])
+CHECK_DP_SLEEP_MAX([50], [enabled], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([0], [0], [50], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([3], [1], [300], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([5], [8], [50], [+$LINENUM])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 50 usecs by default.
+PMD load based sleeps are enabled by default.
+PMD thread core   0 NUMA  0: Max sleep request set to   50 usecs.
+PMD thread core   3 NUMA  1: Max sleep request set to  300 usecs.
+PMD thread core   5 NUMA  8: Max sleep request set to   50 usecs.
+])
+
+dnl mix and change values
+
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=3:400,200])
+CHECK_DP_SLEEP_MAX([200], [enabled], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([0], [0], [200], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([3], [1], [400], [+$LINENUM])
+CHECK_PMD_SLEEP_MAX([5], [8], [200], [+$LINENUM])
+
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-sleep-show], [0], [dnl
+PMD max sleep request is 200 use

[ovs-dev] [PATCH v3 5/6] dpif-netdev: Add per pmd sleep config.

2023-07-10 Thread Kevin Traynor
Extend 'pmd-sleep-max' so that individual PMD thread cores
may have a specified max sleep request value.

Any PMD thread core without a value will use the datapath default
(no sleep request) or datapath global value set by the user.

To set PMD thread cores 8 and 9 to never request a load based sleep
and all other PMD thread cores to be able to request a max sleep of
50 usecs:

$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50,8:0,9:0

To set PMD thread cores 10 and 11 to request a max sleep of 100 usecs
and all other PMD thread cores to never request a sleep:

$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=10:100,11:100

'pmd-sleep-show' can be used to dump the global and individual PMD thread
core max sleep request values.

Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst |  23 +++
 lib/dpif-netdev-private-thread.h  |   3 +
 lib/dpif-netdev.c | 226 +++---
 tests/pmd.at  |  32 ++---
 4 files changed, 252 insertions(+), 32 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 40e6b7843..7eb4eb7b3 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -375,4 +375,27 @@ system configuration (e.g. enabling processor C-states) 
and workloads.
 rate.
 
+Max sleep request values can be set for individual PMDs using key:value pairs.
+Any PMD that has been assigned a specified value will use that. Any PMD that
+does not have a specified value will use the current global default.
+
+Specified values for individual PMDs can be added or removed at any time.
+
+For example, to set PMD threads on cores 8 and 9 to never request a load based
+sleep and all others PMD threads to be able to request a max sleep of 50 
usecs::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50,8:0,9:0
+
+The max sleep request for each PMD can be checked in the logs or with::
+
+$ ovs-appctl dpif-netdev/pmd-sleep-show
+PMD max sleep request is 50 usecs by default.
+PMD load based sleeps are enabled by default.
+PMD thread core   8 NUMA  0: Max sleep request set to0 usecs.
+PMD thread core   9 NUMA  1: Max sleep request set to0 usecs.
+PMD thread core  10 NUMA  0: Max sleep request set to   50 usecs.
+PMD thread core  11 NUMA  1: Max sleep request set to   50 usecs.
+PMD thread core  12 NUMA  0: Max sleep request set to   50 usecs.
+PMD thread core  13 NUMA  1: Max sleep request set to   50 usecs.
+
 .. _ovs-vswitchd(8):
 http://openvswitch.org/support/dist-docs/ovs-vswitchd.8.html
diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index 1ec3cd794..5c72ce5d9 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -181,4 +181,7 @@ struct dp_netdev_pmd_thread {
 bool isolated;
 
+/* Max sleep request. UINT64_MAX indicates dp default should be used.*/
+atomic_uint64_t max_sleep;
+
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and less
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 522253c82..0e79705d8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -180,4 +180,9 @@ static struct odp_support dp_netdev_support = {
 #define PMD_SLEEP_INC_US 1
 
+struct pmd_sleep {
+unsigned core_id;
+uint64_t max_sleep;
+};
+
 struct dpcls {
 struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers */
@@ -290,4 +295,6 @@ struct dp_netdev {
 /* Max load based sleep request. */
 atomic_uint64_t pmd_max_sleep;
+/* Max load based sleep request user string. */
+char *max_sleep_list;
 /* Enable the SMC cache from ovsdb config */
 atomic_bool smc_enable_db;
@@ -1013,9 +1020,15 @@ pmd_max_sleep_show(struct ds *reply, struct 
dp_netdev_pmd_thread *pmd,
 {
 if (pmd->core_id != NON_PMD_CORE_ID) {
+uint64_t pmd_max_sleep;
+
+atomic_read_relaxed(&pmd->max_sleep, &pmd_max_sleep);
 ds_put_format(reply,
   "PMD thread core %3u NUMA %2d: "
   "Max sleep request set to",
   pmd->core_id, pmd->numa_id);
-ds_put_format(reply, " %4"PRIu64" usecs.", default_max_sleep);
+ds_put_format(reply, " %4"PRIu64" usecs.",
+  pmd_max_sleep == UINT64_MAX
+  ? default_max_sleep
+  : pmd_max_sleep);
 ds_put_cstr(reply, "\n");
 }
@@ -1528,7 +1541,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, 
const char *argv[],
 atomic_read_relaxed(&dp->pmd_max_sleep, &default_max_sleep);
 ds_put_format(&reply, "PMD max sleep request is %"PRIu64" "
-  "usecs.", default_max_sleep);
+  "usecs by default.", default_max_sleep);
 

[ovs-dev] [PATCH v3 4/6] dpif-netdev: Remove pmd-sleep-max experimental tag.

2023-07-10 Thread Kevin Traynor
Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst | 4 ++--
 NEWS  | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index bedd42194..40e6b7843 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -325,6 +325,6 @@ reassignment due to PMD Auto Load Balance. For example, 
this could be set
 (in min) such that a reassignment is triggered at most every few hours.
 
-PMD load based sleeping (Experimental)
---
+PMD load based sleeping
+---
 
 PMD threads constantly poll Rx queues which are assigned to them. In order to
diff --git a/NEWS b/NEWS
index dd3d3aa91..ee8006563 100644
--- a/NEWS
+++ b/NEWS
@@ -48,4 +48,5 @@ Post-v3.1.0
  * Add 'ovs-appctl dpif-netdev/pmd-sleep-show' command to get the
max sleep request setting of PMD thread cores.
+ * Remove experimental tag from PMD load based sleeping.
- Linux TC offload:
  * Add support for offloading VXLAN tunnels with the GBP extensions.
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 3/6] dpif-netdev: Add pmd-sleep-show command.

2023-07-10 Thread Kevin Traynor
Max requested sleep time and status for a PMD thread
is logged at start up or when changed, but it can be
convenient to have a command to dump this information
explicitly.

It is envisaged that this will be expanded when future
additions are added.

Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst |  4 
 NEWS  |  2 ++
 lib/dpif-netdev.c | 40 +++
 tests/pmd.at  | 29 ++
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index b261e9254..bedd42194 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -349,4 +349,8 @@ time not processing packets will be determined by the sleep 
and processor
 wake-up times and should be tested with each system configuration.
 
+The max sleep time that a PMD thread may request can be shown with::
+
+$ ovs-appctl dpif-netdev/pmd-sleep-show
+
 Sleep time statistics for 10 secs can be seen with::
 
diff --git a/NEWS b/NEWS
index 6c0b09e0a..dd3d3aa91 100644
--- a/NEWS
+++ b/NEWS
@@ -46,4 +46,6 @@ Post-v3.1.0
table to check the status.
  * Rename 'pmd-maxsleep' other_config to 'pmd-sleep-max'.
+ * Add 'ovs-appctl dpif-netdev/pmd-sleep-show' command to get the
+   max sleep request setting of PMD thread cores.
- Linux TC offload:
  * Add support for offloading VXLAN tunnels with the GBP extensions.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9df481dd6..522253c82 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -702,4 +702,5 @@ enum pmd_info_type {
 PMD_INFO_SHOW_RXQ,/* Show poll lists of pmd threads. */
 PMD_INFO_PERF_SHOW,   /* Show pmd performance details. */
+PMD_INFO_MAX_SLEEP_SHOW,   /* Show max sleep performance details. */
 };
 
@@ -1007,4 +1008,18 @@ sorted_poll_thread_list(struct dp_netdev *dp,
 }
 
+static void
+pmd_max_sleep_show(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
+   uint64_t default_max_sleep)
+{
+if (pmd->core_id != NON_PMD_CORE_ID) {
+ds_put_format(reply,
+  "PMD thread core %3u NUMA %2d: "
+  "Max sleep request set to",
+  pmd->core_id, pmd->numa_id);
+ds_put_format(reply, " %4"PRIu64" usecs.", default_max_sleep);
+ds_put_cstr(reply, "\n");
+}
+}
+
 static void
 dpif_netdev_subtable_lookup_get(struct unixctl_conn *conn, int argc OVS_UNUSED,
@@ -1442,5 +1457,6 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, 
const char *argv[],
 unsigned long long max_secs = (PMD_INTERVAL_LEN * PMD_INTERVAL_MAX)
   / INTERVAL_USEC_TO_SEC;
-bool first_show_rxq = true;
+bool first_pmd = true;
+uint64_t default_max_sleep = 0;
 
 ovs_mutex_lock(&dp_netdev_mutex);
@@ -1490,5 +1506,5 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, 
const char *argv[],
 }
 if (type == PMD_INFO_SHOW_RXQ) {
-if (first_show_rxq) {
+if (first_pmd) {
 if (!secs || secs > max_secs) {
 secs = max_secs;
@@ -1499,5 +1515,5 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, 
const char *argv[],
 ds_put_format(&reply, "Displaying last %u seconds "
   "pmd usage %%\n", secs);
-first_show_rxq = false;
+first_pmd = false;
 }
 pmd_info_show_rxq(&reply, pmd, secs);
@@ -1508,4 +1524,16 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int 
argc, const char *argv[],
 } else if (type == PMD_INFO_PERF_SHOW) {
 pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux);
+} else if (type == PMD_INFO_MAX_SLEEP_SHOW) {
+if (first_pmd) {
+atomic_read_relaxed(&dp->pmd_max_sleep, &default_max_sleep);
+ds_put_format(&reply, "PMD max sleep request is %"PRIu64" "
+  "usecs.", default_max_sleep);
+ds_put_cstr(&reply, "\n");
+ds_put_format(&reply, "PMD load based sleeps are %s.",
+  default_max_sleep ? "enabled" : "disabled");
+ds_put_cstr(&reply, "\n");
+first_pmd = false;
+}
+pmd_max_sleep_show(&reply, pmd, default_max_sleep);
 }
 }
@@ -1608,5 +1636,6 @@ dpif_netdev_init(void)
 static enum pmd_info_type show_aux = PMD_INFO_SHOW_STATS,
   clear_aux = PMD_INFO_CLEAR_STATS,
-  poll_aux = PMD_INFO_SHOW_RXQ;
+  poll_aux = PMD_INFO_SHOW_RXQ,
+  sleep_aux = PMD_INFO_MAX_SLEEP_SHOW;
 
 unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
@@ -16

[ovs-dev] [PATCH v3 1/6] dpif-netdev: Rename pmd-maxsleep config option.

2023-07-10 Thread Kevin Traynor
other_config:pmd-maxsleep is a config option to allow
PMD thread cores to sleep under low or no load conditions.

Rename it to 'pmd-sleep-max' to allow a more structured
name and so that additional options or command can follow
the 'pmd-sleep-xyz' pattern.

Use of other_config:pmd-maxsleep will result in a warning
and it's value will be ignored.

Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst |  2 +-
 NEWS  |  1 +
 lib/dpif-netdev.c |  7 ++-
 tests/pmd.at  | 12 ++--
 vswitchd/vswitch.xml  |  2 +-
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index e70986d16..b261e9254 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -335,5 +335,5 @@ This can be enabled by setting the max requested sleep time 
(in microseconds)
 for a PMD thread::
 
-$ ovs-vsctl set open_vswitch . other_config:pmd-maxsleep=50
+$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50
 
 With a non-zero max value a PMD may request to sleep by an incrementing amount
diff --git a/NEWS b/NEWS
index 6a990c921..6c0b09e0a 100644
--- a/NEWS
+++ b/NEWS
@@ -45,4 +45,5 @@ Post-v3.1.0
interfaces that support it.  See the 'status' column in the 'interface'
table to check the status.
+ * Rename 'pmd-maxsleep' other_config to 'pmd-sleep-max'.
- Linux TC offload:
  * Add support for offloading VXLAN tunnels with the GBP extensions.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index ab493f9d4..9df481dd6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4983,5 +4983,10 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 set_pmd_auto_lb(dp, autolb_state, log_autolb);
 
-pmd_max_sleep = smap_get_ullong(other_config, "pmd-maxsleep", 0);
+if (smap_get_ullong(other_config, "pmd-maxsleep", 0)) {
+VLOG_WARN("pmd-maxsleep is not supported. "
+  "Please use pmd-sleep-max instead.");
+}
+
+pmd_max_sleep = smap_get_ullong(other_config, "pmd-sleep-max", 0);
 pmd_max_sleep = MIN(PMD_RCU_QUIESCE_INTERVAL, pmd_max_sleep);
 atomic_read_relaxed(&dp->pmd_max_sleep, &cur_pmd_max_sleep);
diff --git a/tests/pmd.at b/tests/pmd.at
index 48f3d432d..374ad7217 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -1266,5 +1266,5 @@ OVS_WAIT_UNTIL([tail ovs-vswitchd.log | grep "PMD load 
based sleeps are disabled
 dnl Check low value max sleep
 get_log_next_line_num
-AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="1"])
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="1"])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
@@ -1272,5 +1272,5 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep 
"PMD load based sleeps
 dnl Check high value max sleep
 get_log_next_line_num
-AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="1"])
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="1"])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
@@ -1278,5 +1278,5 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep 
"PMD load based sleeps
 dnl Check setting max sleep to zero
 get_log_next_line_num
-AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="0"])
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="0"])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 0 usecs."])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are disabled."])
@@ -1284,5 +1284,5 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep 
"PMD load based sleeps
 dnl Check above high value max sleep
 get_log_next_line_num
-AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="10001"])
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="10001"])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
@@ -1290,10 +1290,10 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | 
grep "PMD load based sleeps
 dnl Check rounding
 get_log_next_line_num
-AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-maxsleep="490"])
+AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="490"])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 490 usecs."])
 OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled.

[ovs-dev] [PATCH v3 2/6] pmd.at: Add macro for checking pmd sleep max time and state.

2023-07-10 Thread Kevin Traynor
This is just cosmetic. There is no change to the tests.

Signed-off-by: Kevin Traynor 
Reviewed-by: David Marchand 
---
 tests/pmd.at | 39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/tests/pmd.at b/tests/pmd.at
index 374ad7217..4dd775bd3 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -61,4 +61,20 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
 ])
 
+dnl CHECK_DP_SLEEP_MAX([max_sleep], [enabled], [+line])
+dnl
+dnl Checks correct pmd load based sleep is set for the datapath.
+dnl Checking starts from line number 'line' in ovs-vswithd.log .
+m4_define([CHECK_DP_SLEEP_MAX], [
+SLEEP_TIME="PMD max sleep request is $1 usecs."
+SLEEP_STATE="PMD load based sleeps are $2."
+line_st=$3
+if [[ -z "$line_st" ]]
+then
+line_st="+0"
+fi
+OVS_WAIT_UNTIL([tail -n $line_st ovs-vswitchd.log | grep "$SLEEP_TIME"])
+OVS_WAIT_UNTIL([tail -n $line_st ovs-vswitchd.log | grep "$SLEEP_STATE"])
+])
+
 m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/"])
 m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
@@ -1256,46 +1272,39 @@ OVS_VSWITCHD_STOP
 AT_CLEANUP
 
-dnl Check default state
 AT_SETUP([PMD - pmd sleep])
 OVS_VSWITCHD_START
 
 dnl Check default
-OVS_WAIT_UNTIL([tail ovs-vswitchd.log | grep "PMD max sleep request is 0 
usecs."])
-OVS_WAIT_UNTIL([tail ovs-vswitchd.log | grep "PMD load based sleeps are 
disabled."])
+CHECK_DP_SLEEP_MAX([0], [disabled], [])
 
 dnl Check low value max sleep
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="1"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
+CHECK_DP_SLEEP_MAX([1], [enabled], [+$LINENUM])
 
 dnl Check high value max sleep
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="1"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
+CHECK_DP_SLEEP_MAX([1], [enabled], [+$LINENUM])
 
 dnl Check setting max sleep to zero
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="0"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 0 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are disabled."])
+CHECK_DP_SLEEP_MAX([0], [disabled], [+$LINENUM])
 
 dnl Check above high value max sleep
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="10001"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 1 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
+CHECK_DP_SLEEP_MAX([1], [enabled], [+$LINENUM])
 
 dnl Check rounding
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="490"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 490 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
+CHECK_DP_SLEEP_MAX([490], [enabled], [+$LINENUM])
+
 dnl Check rounding
 get_log_next_line_num
 AT_CHECK([ovs-vsctl set open_vswitch . other_config:pmd-sleep-max="499"])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD max sleep 
request is 499 usecs."])
-OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based 
sleeps are enabled."])
+CHECK_DP_SLEEP_MAX([499], [enabled], [+$LINENUM])
 
 OVS_VSWITCHD_STOP
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 0/6] PMD load based sleep updates and per-pmd config.

2023-07-10 Thread Kevin Traynor
Patches 1-4 are about polishing the existing functionality
and preparing for new functionality later in the series.

Patch 1 renames 'pmd-maxsleep' to 'pmd-sleep-max'. I think
this is allowed as it is still experimental. A warning will
be displayed if pmd-maxsleep is attempted to be used.

Patches 5-6 adds a new per pmd config option functionality and tests.
These can be considered separate from patches 1-4 but as there
are dependencies, it made sense to add all the patches to this
patch set.

v3
- Rebased NEWS
- 1/6 Added warning for use of pmd-maxsleep (Simon)
- 5/6 Minor changes (David)

v2
- 2/6 fixed UT macro
- 3/6 rebased NEWS confict
- 3/6 moved pmd_max_sleep_show() location
- 6/6 removed incorrect check

GHA: https://github.com/kevintraynor/ovs/actions/runs/5507254457

Kevin Traynor (6):
  dpif-netdev: Rename pmd-maxsleep config option.
  pmd.at: Add macro for checking pmd sleep max time and state.
  dpif-netdev: Add pmd-sleep-show command.
  dpif-netdev: Remove pmd-sleep-max experimental tag.
  dpif-netdev: Add per pmd sleep config.
  pmd.at: Add per pmd max sleep unit tests.

 Documentation/topics/dpdk/pmd.rst |  33 +++-
 NEWS  |   4 +
 lib/dpif-netdev-private-thread.h  |   3 +
 lib/dpif-netdev.c | 261 --
 tests/pmd.at  | 232 +++---
 vswitchd/vswitch.xml  |   2 +-
 6 files changed, 495 insertions(+), 40 deletions(-)

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/7] netdev-linux: use speed as max rate in tc classes

2023-07-10 Thread Eelco Chaudron


On 10 Jul 2023, at 16:26, Adrian Moreno wrote:

> On 7/10/23 16:19, Adrian Moreno wrote:
>>
>>
>> On 7/6/23 14:50, Eelco Chaudron wrote:
>>>
>>>
>>> On 2 Jun 2023, at 16:13, Adrian Moreno wrote:
>>>
 Instead of relying on feature bits, use the speed value directly as
 maximum rate for htb and hfsc classes.

 There is still a limitation with the maximum rate that we can express
 with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link 
 speed
 instead of the feature bits, we can at least use an accurate maximum for
 some link speeds (such as 25Gbps) which are not supported by netdev's 
 feature
 bits.
>>>
>>> Hi Adrian,
>>>
>>> See some comments below.
>>>
>>> //Eelco
>>>
>>>
 Signed-off-by: Adrian Moreno 
 ---
   lib/netdev-linux.c | 14 ++
   1 file changed, 6 insertions(+), 8 deletions(-)

 diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
 index 3055f88d2..56b487eea 100644
 --- a/lib/netdev-linux.c
 +++ b/lib/netdev-linux.c
 @@ -4746,11 +4746,10 @@ htb_parse_qdisc_details__(struct netdev *netdev_,

   hc->max_rate = smap_get_ullong(details, "max-rate", 0) / 8;
   if (!hc->max_rate) {
 -    enum netdev_features current;
 -
   netdev_linux_read_features(netdev);
 -    current = !netdev->get_features_error ? netdev->current : 0;
 -    hc->max_rate = netdev_features_to_bps(current, 
 NETDEV_DEFAULT_BPS) / 8;
>>>
>>> Re-reading my previous comments on fallback for to netdev_get_features() I 
>>> guess the new API should replace this one.
>>> So to make a statement I would say try to remove the 
>>> netdev_features_to_bps() API, or move it to netdev_bsd_features_to_bps() 
>>> for netdev_bsd_get_speed only (with a comment) so it’s clear that people 
>>> should NOT use this API from now on.
>>>
>>
>> We are still not deprecating the old API (netdev_get_features() and "enum 
>> netdev_features") since it is in line with OFP1.0 that does not have numeric 
>> speeds. e.g: lib/ofp-port uses netdev_get_features() to convert the maximum 
>> speed to kbps when decoding OFP1.0 port descriptions.
>>
>> What I do think is a good idea is to add a comment pointing new users to the 
>> new API if what they are looking for is a more accurate speed. WDYT?
>>
>>
>
> Or maybe we could even rename it and add "legacy" in the name so that it's 
> more clear that it should not be used (for those who don't read the comments 
> :-)).

What about a check patch warning when the API is used?

> -- 
> Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 4/7] netdev-linux: use 64-bit rates in htb tc classes

2023-07-10 Thread Ilya Maximets
On 6/2/23 16:13, Adrian Moreno wrote:
> Currently, htb rates are capped at ~34Gbps because they are internally
> expressed as 32-bit fields.
> 
> Move min and max rates to 64-bit fields and use TCA_HTB_RATE64 and
> TCA_HTB_CEIL64 to configure HTC classes to break this barrier.
> 
> In order to test this, create a dummy tuntap device and set it's
> speed to a very high value so we can try adding a QoS queue with big
> rates.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137619
> Signed-off-by: Adrian Moreno 
> ---
>  acinclude.m4| 10 +++
>  lib/netdev-linux.c  | 59 -
>  tests/atlocal.in|  1 +
>  tests/system-traffic.at | 28 +++
>  4 files changed, 80 insertions(+), 18 deletions(-)
> 



> index 4c378e1d0..6200ec52e 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2354,6 +2354,34 @@ AT_CHECK([tc class show dev ovs-p1 | grep -q 'class 
> htb .* HTB_CONF'])
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([QoS - 64bit])
> +AT_SKIP_IF([test $HAVE_TC = no])
> +AT_SKIP_IF([test $HAVE_TCA_HTB_RATE64 = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Configure the QoS with rates that require 64bits, i.e: > 34Gbps.
> +AT_CHECK([ovs-vsctl set port ovs-p0 qos=@qos -- set port ovs-p1 qos=@qos dnl
> +-- --id=@qos create qos dnl
> +   type=linux-htb other-config:max-rate=500 
> queues:0=@queue dnl
> +-- --id=@queue create queue dnl
> +   other_config:min-rate=400 
> other_config:max-rate=500 dnl
> +   other_config:burst=500],
> + [ignore], [ignore])
> +
> +OVS_WAIT_UNTIL([tc qdisc show dev ovs-p0 | grep -q htb])
> +OVS_WAIT_UNTIL([tc qdisc show dev ovs-p1 | grep -q htb])
> +
> +m4_define([HTB_CONF], [rate 40Gbit ceil 50Gbit burst 195503b cburst 
> 1955031250b])

Same comment as for 7/7.
The burst seems to be overflowing here.  Should it be documented that
limits for the burst are lower?  (I didn't read the whole set.  Sorry
if I missed these docs.)

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/7] netdev-linux: use speed as max rate in tc classes

2023-07-10 Thread Adrian Moreno



On 7/10/23 16:19, Adrian Moreno wrote:



On 7/6/23 14:50, Eelco Chaudron wrote:



On 2 Jun 2023, at 16:13, Adrian Moreno wrote:


Instead of relying on feature bits, use the speed value directly as
maximum rate for htb and hfsc classes.

There is still a limitation with the maximum rate that we can express
with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link speed
instead of the feature bits, we can at least use an accurate maximum for
some link speeds (such as 25Gbps) which are not supported by netdev's feature
bits.


Hi Adrian,

See some comments below.

//Eelco



Signed-off-by: Adrian Moreno 
---
  lib/netdev-linux.c | 14 ++
  1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 3055f88d2..56b487eea 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -4746,11 +4746,10 @@ htb_parse_qdisc_details__(struct netdev *netdev_,

  hc->max_rate = smap_get_ullong(details, "max-rate", 0) / 8;
  if (!hc->max_rate) {
-    enum netdev_features current;
-
  netdev_linux_read_features(netdev);
-    current = !netdev->get_features_error ? netdev->current : 0;
-    hc->max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;


Re-reading my previous comments on fallback for to netdev_get_features() I 
guess the new API should replace this one.
So to make a statement I would say try to remove the netdev_features_to_bps() 
API, or move it to netdev_bsd_features_to_bps() for netdev_bsd_get_speed only 
(with a comment) so it’s clear that people should NOT use this API from now on.




We are still not deprecating the old API (netdev_get_features() and "enum 
netdev_features") since it is in line with OFP1.0 that does not have numeric 
speeds. e.g: lib/ofp-port uses netdev_get_features() to convert the maximum 
speed to kbps when decoding OFP1.0 port descriptions.


What I do think is a good idea is to add a comment pointing new users to the new 
API if what they are looking for is a more accurate speed. WDYT?





Or maybe we could even rename it and add "legacy" in the name so that it's more 
clear that it should not be used (for those who don't read the comments :-)).



--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 4/7] netdev-linux: use 64-bit rates in htb tc classes

2023-07-10 Thread Adrian Moreno



On 7/6/23 15:54, Eelco Chaudron wrote:



On 2 Jun 2023, at 16:13, Adrian Moreno wrote:


Currently, htb rates are capped at ~34Gbps because they are internally
expressed as 32-bit fields.

Move min and max rates to 64-bit fields and use TCA_HTB_RATE64 and
TCA_HTB_CEIL64 to configure HTC classes to break this barrier.

In order to test this, create a dummy tuntap device and set it's
speed to a very high value so we can try adding a QoS queue with big
rates.


Hi Adrian,

This patch looks good, one suggestion to move some of the stuff to the previous 
patch.

//Eelco


Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137619
Signed-off-by: Adrian Moreno 
---
  acinclude.m4| 10 +++
  lib/netdev-linux.c  | 59 -
  tests/atlocal.in|  1 +
  tests/system-traffic.at | 28 +++
  4 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index ac1eab790..1e5a9872d 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -211,6 +211,16 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
  ])],
  [AC_DEFINE([HAVE_TCA_STATS_PKT64], [1],
 [Define to 1 if TCA_STATS_PKT64 is available.])])
+
+  AC_COMPILE_IFELSE([
+AC_LANG_PROGRAM([#include ], [
+int x = TCA_HTB_RATE64;
+])],
+[AC_SUBST(HAVE_TCA_HTB_RATE64,yes)
+ AC_DEFINE([HAVE_TCA_HTB_RATE64], [1],
+   [Define to 1 if TCA_HTB_RATE64 is available.])],
+[AC_SUBST(HAVE_TCA_HTB_RATE64,no)]
+)
  ])

  dnl OVS_CHECK_LINUX_SCTP_CT
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index bb81e7022..dff9d627f 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -484,9 +484,9 @@ static const struct tc_ops *const tcs[] = {
  NULL
  };

-static unsigned int tc_ticks_to_bytes(unsigned int rate, unsigned int ticks);
-static unsigned int tc_bytes_to_ticks(unsigned int rate, unsigned int size);
-static unsigned int tc_buffer_per_jiffy(unsigned int rate);
+static unsigned int tc_ticks_to_bytes(uint64_t rate, unsigned int ticks);
+static unsigned int tc_bytes_to_ticks(uint64_t rate, unsigned int size);
+static unsigned int tc_buffer_per_jiffy(uint64_t rate);
  static uint32_t tc_time_to_ticks(uint32_t time);

  static struct tcmsg *netdev_linux_tc_make_request(const struct netdev *,
@@ -516,7 +516,7 @@ tc_put_rtab(struct ofpbuf *msg, uint16_t type, const struct 
tc_ratespec *rate,
  uint64_t rate64);
  static int tc_calc_cell_log(unsigned int mtu);
  static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int mtu);
-static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t burst_bytes);
+static int tc_calc_buffer(uint64_t Bps, int mtu, uint64_t burst_bytes);


These calculation changes (these 4 function types) I would move these to the 
previous patch as they are general and not htb specific.



Ack. Will do in next version.


  

  /* This is set pretty low because we probably won't learn anything from the
@@ -4560,13 +4560,13 @@ static const struct tc_ops tc_ops_netem = {

  struct htb {
  struct tc tc;
-unsigned int max_rate;  /* In bytes/s. */
+uint64_t max_rate;  /* In bytes/s. */
  };

  struct htb_class {
  struct tc_queue tc_queue;
-unsigned int min_rate;  /* In bytes/s. */
-unsigned int max_rate;  /* In bytes/s. */
+uint64_t min_rate;  /* In bytes/s. */
+uint64_t max_rate;  /* In bytes/s. */
  unsigned int burst; /* In bytes. */
  unsigned int priority;  /* Lower values are higher priorities. */
  };
@@ -4654,8 +4654,8 @@ htb_setup_class__(struct netdev *netdev, unsigned int 
handle,
  if ((class->min_rate / HTB_RATE2QUANTUM) < mtu) {
  opt.quantum = mtu;
  }
-opt.buffer = tc_calc_buffer(opt.rate.rate, mtu, class->burst);
-opt.cbuffer = tc_calc_buffer(opt.ceil.rate, mtu, class->burst);
+opt.buffer = tc_calc_buffer(class->min_rate, mtu, class->burst);
+opt.cbuffer = tc_calc_buffer(class->max_rate, mtu, class->burst);
  opt.prio = class->priority;

  tcmsg = netdev_linux_tc_make_request(netdev, RTM_NEWTCLASS, NLM_F_CREATE,
@@ -4668,15 +4668,26 @@ htb_setup_class__(struct netdev *netdev, unsigned int 
handle,

  nl_msg_put_string(&request, TCA_KIND, "htb");
  opt_offset = nl_msg_start_nested(&request, TCA_OPTIONS);
+
+#ifdef HAVE_TCA_HTB_RATE64
+if (class->min_rate > UINT32_MAX) {
+nl_msg_put_u64(&request, TCA_HTB_RATE64, class->min_rate);
+}
+if (class->max_rate > UINT32_MAX) {
+nl_msg_put_u64(&request, TCA_HTB_CEIL64, class->max_rate);
+}
+#endif
  nl_msg_put_unspec(&request, TCA_HTB_PARMS, &opt, sizeof opt);
-tc_put_rtab(&request, TCA_HTB_RTAB, &opt.rate, 0);
-tc_put_rtab(&request, TCA_HTB_CTAB, &opt.ceil, 0);
+
+tc_put_rtab(&request, TCA_HTB_RTAB, &opt.rate, class->min_rate);
+tc_put_rtab(&request, TCA_HTB_CTAB, &opt.ceil, class->max_rate);
  nl_msg_end_nested(&request, opt_offset);

Re: [ovs-dev] [PATCH v1 2/7] netdev-linux: use speed as max rate in tc classes

2023-07-10 Thread Adrian Moreno



On 7/6/23 14:50, Eelco Chaudron wrote:



On 2 Jun 2023, at 16:13, Adrian Moreno wrote:


Instead of relying on feature bits, use the speed value directly as
maximum rate for htb and hfsc classes.

There is still a limitation with the maximum rate that we can express
with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link speed
instead of the feature bits, we can at least use an accurate maximum for
some link speeds (such as 25Gbps) which are not supported by netdev's feature
bits.


Hi Adrian,

See some comments below.

//Eelco



Signed-off-by: Adrian Moreno 
---
  lib/netdev-linux.c | 14 ++
  1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 3055f88d2..56b487eea 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -4746,11 +4746,10 @@ htb_parse_qdisc_details__(struct netdev *netdev_,

  hc->max_rate = smap_get_ullong(details, "max-rate", 0) / 8;
  if (!hc->max_rate) {
-enum netdev_features current;
-
  netdev_linux_read_features(netdev);
-current = !netdev->get_features_error ? netdev->current : 0;
-hc->max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;


Re-reading my previous comments on fallback for to netdev_get_features() I 
guess the new API should replace this one.
So to make a statement I would say try to remove the netdev_features_to_bps() 
API, or move it to netdev_bsd_features_to_bps() for netdev_bsd_get_speed only 
(with a comment) so it’s clear that people should NOT use this API from now on.



We are still not deprecating the old API (netdev_get_features() and "enum 
netdev_features") since it is in line with OFP1.0 that does not have numeric 
speeds. e.g: lib/ofp-port uses netdev_get_features() to convert the maximum 
speed to kbps when decoding OFP1.0 port descriptions.


What I do think is a good idea is to add a comment pointing new users to the new 
API if what they are looking for is a more accurate speed. WDYT?




+hc->max_rate = !netdev->get_features_error
+   ? netdev->current_speed / 8 * 100ULL
+   : NETDEV_DEFAULT_BPS / 8;


What if for some reason netdev->current_speed equals zero, it no longer reports 
NETDEV_DEFAULT_BPS / 8.



Although it's likely this never happens with the current implementation, this 
could change and your proposal is more robust. I'll update it.



  }
  hc->min_rate = hc->max_rate;
  hc->burst = 0;
@@ -5218,11 +5217,10 @@ hfsc_parse_qdisc_details__(struct netdev *netdev_, 
const struct smap *details,

  uint32_t max_rate = smap_get_ullong(details, "max-rate", 0) / 8;
  if (!max_rate) {
-enum netdev_features current;
-
  netdev_linux_read_features(netdev);
-current = !netdev->get_features_error ? netdev->current : 0;
-max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;
+max_rate = !netdev->get_features_error
+   ? netdev->current_speed / 8 * 100ULL
+   : NETDEV_DEFAULT_BPS / 8;


Same as above when netdev->current_speed == 0.


  }

  class->min_rate = max_rate;
--
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev




--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/6] dpif-netdev: Rename pmd-maxsleep config option.

2023-07-10 Thread Kevin Traynor

On 29/06/2023 15:30, Simon Horman wrote:

On Wed, Jun 14, 2023 at 02:36:39PM +0100, Kevin Traynor wrote:

other_config:pmd-maxsleep is a config option to allow
PMD thread cores to sleep under low or no load conditions.

Rename it to 'pmd-sleep-max' to allow a more structured
name and so that additional options or command can follow
the 'pmd-sleep-xyz' pattern.

Signed-off-by: Kevin Traynor 


Hi Kevin,

are there any backwards compatibility considerations for this change?



Hi Simon,

It does mean the previous config option would not work anymore, yes.
However, that is an experimental option for a new feature in OVS 3.1, so 
I don't think in practice it will be widely used. Also, with it being 
experimental it carries the health risk that it may be changed.


I considered leaving it as a type of alias, but the later patches 
extend the functionality to using a string and per core, so having both 
might confusion for the user too.


For example, a user might set both strings with competing values etc. or 
think the new one is only for per core etc. e.g. pmd-maxsleep=50 
(general default sleep is 50) and pmd-sleep-max=12:200 (general default 
sleep is 0).


A clearer way might be to add a warning if the user sets pmd-maxsleep 
just for the chance that anyone starts using with 3.1 and hops to 3.2+ 
and doesn't notice the change.


I've added the below to v3. We could remove it sometime in the future 
when we think OVS 3.1 has become a bit old for users to start trying 
experimental features with it.


What do you think?

+if (smap_get_ullong(other_config, "pmd-maxsleep", 0)) {
+VLOG_WARN("pmd-maxsleep is not supported. "
+  "Please use pmd-sleep-max instead.");
+}

ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=200
2023-07-10T10:55:21Z|00108|dpif_netdev|WARN|pmd-maxsleep is not 
supported. Please use pmd-sleep-max instead.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14] netdev-dpdk: Add custom rx-steering configuration.

2023-07-10 Thread Robin Jarry
Ilya Maximets, Jul 10, 2023 at 15:48:
> In order to speed up the process a bit, proposing a following incremental
> change:
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 27b2fa5e0..aa87ee546 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2074,23 +2074,23 @@ dpdk_set_rx_steer_config(struct netdev *netdev, 
> struct netdev_dpdk *dev,
>  const char *arg = smap_get_def(args, "rx-steering", "rss");
>  uint64_t flags = 0;
>  
> -if (nullable_string_is_equal(arg, "rss+lacp")) {
> +if (!strcmp(arg, "rss+lacp")) {
>  flags = DPDK_RX_STEER_LACP;
> -} else if (!nullable_string_is_equal(arg, "rss")) {
> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
> +} else if (strcmp(arg, "rss")) {
> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>"unsupported parameter value '%s'",
>netdev_get_name(netdev), arg);
>  }
>  
>  if (flags && dev->type != DPDK_DEV_ETH) {
> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>"is only supported on ethernet ports",
>netdev_get_name(netdev));
>  flags = 0;
>  }
>  
>  if (flags && netdev_is_flow_api_enabled()) {
> -VLOG_WARN_BUF(errp, "%s options:rx-steering "
> +VLOG_WARN_BUF(errp, "%s: options:rx-steering "
>"is incompatible with hw-offload",
>netdev_get_name(netdev));
>  flags = 0;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 2f756b1b7..01408e90a 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3539,9 +3539,9 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>
>  
>  
> -  If the user has already configured multiple
> -  options:n_rxq on the port, an additional one will be
> -  allocated for the specified protocols. Even if the hardware cannot
> +  If the user has already configured multiple  +  column="options" key="n_rxq" /> on the port, an additional one will
> +  be allocated for the specified protocols. Even if the hardware 
> cannot
>satisfy the requested number of requested Rx queues, the last Rx
>queue will be used. If only one Rx queue is available or if the
>hardware does not support the rte_flow matchers/actions required to
> @@ -3551,10 +3551,14 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>  
>This feature is mutually exclusive with
>
> -  as it may conflict with the offloaded RTE flows. If both are 
> enabled,
> +  as it may conflict with the offloaded flows. If both are enabled,
>rx-steering will fall back to default rss
>mode.
>  
> +
> +  This option is only applicable to interfaces with type
> +  dpdk.
> +
>
>  
> ---
>
> If that looks good to you, I could fold it in while applying the patch.
> What do you think?
>
> Best regards, Ilya Maximets.

Yes, that'll be faster. Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14] netdev-dpdk: Add custom rx-steering configuration.

2023-07-10 Thread Ilya Maximets
On 7/4/23 21:59, Robin Jarry wrote:
> Some control protocols are used to maintain link status between
> forwarding engines (e.g. LACP). When the system is not sized properly,
> the PMD threads may not be able to process all incoming traffic from the
> configured Rx queues. When a signaling packet of such protocols is
> dropped, it can cause link flapping, worsening the situation.
> 
> Use the rte_flow API to redirect these protocols into a dedicated Rx
> queue. The assumption is made that the ratio between control protocol
> traffic and user data traffic is very low and thus this dedicated Rx
> queue will never get full. Re-program the RSS redirection table to only
> use the other Rx queues.
> 
> The additional Rx queue will be assigned a PMD core like any other Rx
> queue. Polling that extra queue may introduce increased latency and
> a slight performance penalty at the benefit of preventing link flapping.
> 
> This feature must be enabled per port on specific protocols via the
> rx-steering option. This option takes "rss" followed by a "+" separated
> list of protocol names. It is only supported on ethernet ports. This
> feature is experimental.
> 
> If the user has already configured multiple Rx queues on the port, an
> additional one will be allocated for control packets. If the hardware
> cannot satisfy the number of requested Rx queues, the last Rx queue will
> be assigned for control plane. If only one Rx queue is available, the
> rx-steering feature will be disabled. If the hardware does not support
> the rte_flow matchers/actions, the rx-steering feature will be
> completely disabled on the port and regular rss will be performed
> instead.
> 
> It cannot be enabled when other-config:hw-offload=true as it may
> conflict with the offloaded flows. Similarly, if hw-offload is enabled,
> custom rx-steering will be forcibly disabled on all ports and replaced
> by regular rss.
> 
> Example use:
> 
>  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
>set interface phy0 type=dpdk options:dpdk-devargs=:ca:00.0 -- \
>set interface phy0 options:rx-steering=rss+lacp -- \
>set interface phy1 type=dpdk options:dpdk-devargs=:ca:00.1 -- \
>set interface phy1 options:rx-steering=rss+lacp
> 
> As a starting point, only one protocol is supported: LACP. Other
> protocols can be added in the future. NIC compatibility should be
> checked.
> 
> To validate that this works as intended, I used a traffic generator to
> generate random traffic slightly above the machine capacity at line rate
> on a two ports bond interface. OVS is configured to receive traffic on
> two VLANs and pop/push them in a br-int bridge based on tags set on
> patch ports.
> 
>+--+
>| DUT  |
>|++|
>||   br-int   || 
> in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
>|||| 
> in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
>|| patch10patch11 ||
>|+---|---|+|
>||   | |
>|+---|---|+|
>|| patch00patch01 ||
>||  tag:10tag:20  ||
>||||
>||   br-phy   || default flow, action=NORMAL
>||||
>||   bond0|| balance-slb, lacp=passive, lacp-time=fast
>||phy0   phy1 ||
>|+--|-|---+|
>+---|-|+
>| |
>+---|-|+
>| port0  port1 | balance L3/L4, lacp=active, lacp-time=fast
>| lag  | mode trunk VLANs 10, 20
>|  |
>|switch|
>|  |
>|  vlan 10vlan 20  |  mode access
>|   port2  port3   |
>+-|--|-+
>  |  |
>+-|--|-+
>|   tgen0  tgen1   |  Random traffic that is properly balanced
>|  |  across the bond ports in both directions.
>|  traffic generator   |
>+--+
> 
> Without rx-steering, the bond0 links are randomly switching to
> "defaulted" when one of the LACP packets sent by the switch is dropped
> because the RX queues are full and the PMD threads did not process them
> fast enough. When that happens, all traffic must go through a single
> link which causes above line rate traffic to be dropped.
> 
>  ~# ovs-appctl lacp/show-stats bond0
>   bond0 statistics 
>  member: phy0:
>TX PDUs: 347246
>RX PDUs: 14865
>RX Bad PDUs: 0
>RX Marker Request PDUs: 0
>Link Expired: 168
>Link Defaulted: 0
>Carrier Status Changed: 0
>  member: phy1:
>TX PDUs: 347245
>RX PDUs: 14919
>RX Bad PDUs: 0
>RX Marker Request PDUs: 0
>Link Expired: 147
>Link Defaulted: 1
>Carrier Status Changed: 0
> 
> When rx-steering is enabled, no LACP packet is dropped and the bond
> links remain enabled at all times,

Re: [ovs-dev] [PATCH v2 5/6] dpif-netdev: Add per pmd sleep config.

2023-07-10 Thread Kevin Traynor

On 07/07/2023 14:22, David Marchand wrote:

On Wed, Jun 21, 2023 at 11:26 AM Kevin Traynor  wrote:


Extend 'pmd-sleep-max' so that individual PMD thread cores
may have a specified max sleep request value.

Any PMD thread core without a value will use the datapath default
(no sleep request) or datapath global value set by the user.

To set PMD thread cores 8 and 9 to never request a load based sleep
and all other PMD thread cores to be able to request a max sleep of
50 usecs:

$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50,8:0,9:0

To set PMD thread cores 10 and 11 to request a max sleep of 100 usecs
and all other PMD thread cores to never request a sleep:

$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=10:100,11:100

'pmd-sleep-show' can be used to dump the global and individual PMD thread
core max sleep request values.

Signed-off-by: Kevin Traynor 


Reviewed-by: David Marchand 

I have some nits below which can be fixed when applying.
But I am ok too if we go with the current patch.




diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 40e6b7843..eafcbc504 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -375,4 +375,27 @@ system configuration (e.g. enabling processor C-states) 
and workloads.
  rate.

+Max sleep request values can be set for individual PMDs using key:value pairs.
+Any PMD that has been assigned a specified value will use that. Any PMD that
+does not have a specified value will use the current global default.
+
+Specified values for individual PMDs can be added or removed at any time.
+
+For example, to set PMD thread cores 8 and 9 to never request a load based


Nit: in the dpif-netdev/pmd-sleep-show command output, "PMD thread
core X" is short, and understandable.

But for descriptions in the documentation, "PMD thread cores" is strange.
We didn't use such denomination so far in the doc.

I would go with "PMD on cores 8 and 9" (or PMD threads on cores 8 and 9).


+sleep and all others PMD cores to be able to request a max sleep of 50 usecs::


Nit: Idem, PMD (or PMD threads)




That makes sense - I agree, this is documentation so there is no need to 
abbreviate. I changed to your "PMD threads" style suggestions.





+
+$ ovs-vsctl set open_vswitch . other_config:pmd-sleep-max=50,8:0,9:0
+
+The max sleep request for each PMD can be checked in the logs or with::
+
+$ ovs-appctl dpif-netdev/pmd-sleep-show
+PMD max sleep request is 50 usecs by default.
+PMD load based sleeps are enabled by default.
+PMD thread core   8 NUMA  0: Max sleep request set to0 usecs.
+PMD thread core   9 NUMA  1: Max sleep request set to0 usecs.
+PMD thread core  10 NUMA  0: Max sleep request set to   50 usecs.
+PMD thread core  11 NUMA  1: Max sleep request set to   50 usecs.
+PMD thread core  12 NUMA  0: Max sleep request set to   50 usecs.
+PMD thread core  13 NUMA  1: Max sleep request set to   50 usecs.
+
  .. _ovs-vswitchd(8):
  http://openvswitch.org/support/dist-docs/ovs-vswitchd.8.html


[snip]


diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 88d25f1da..d9ee53ff5 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c


[snip]


@@ -5065,4 +5077,182 @@ parse_affinity_list(const char *affinity_list, unsigned 
*core_ids, int n_rxq)
  }

+static int
+parse_pmd_sleep_list(const char *max_sleep_list,
+ struct pmd_sleep **pmd_sleeps)
+{
+char *list, *copy, *key, *value;
+int num_vals = 0;
+
+if (!max_sleep_list) {
+return num_vals;
+}
+
+list = copy = xstrdup(max_sleep_list);
+
+while (ofputil_parse_key_value(&list, &key, &value)) {
+char *error = NULL;
+unsigned core;
+uint64_t temp, pmd_max_sleep;
+int i;
+
+error = str_to_u64(key, &temp);
+if (error) {
+free(error);
+continue;
+}
+
+error = str_to_u64(value, &pmd_max_sleep);
+if (error) {
+/* No value specified. key is dp default. */
+core = UINT_MAX;
+pmd_max_sleep = temp;
+free(error);
+} else {
+/* Value specified. key is  pmd core id.*/


Nit: extra space between "is pmd".



Fixed in v3.

Thanks for your review and comments on both versions.




+if (temp >= UINT_MAX) {
+continue;
+}
+core = (unsigned) temp;
+}
+


[snip]




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Dumitru Ceara
On 7/10/23 12:57, Dumitru Ceara wrote:
> On 7/6/23 13:00, Vladislav Odintsov wrote:
>>
>>
>>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>>>
>>> Hi Dumitru,
>>>
>>> thanks for the quick response!
>>>
 On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:

 On 7/5/23 17:14, Vladislav Odintsov wrote:
> Hi,
>

 Hi Vladislav,

> we’ve noticed there is a huge ovn-controller memory consumption 
> introduced with [0] comparing to version without its changes in 
> ovn-controller.c part (just OVS submodule bump without ovn-controller 
> changes doesn’t trigger such behaviour).
>

 Thanks for reporting this!

> On an empty host connected to working cluster ovn-controller normally 
> consumes ~175 MiB RSS, and the same host updated to a version with commit 
> [0] consumes 3.3 GiB RSS. The start time and CPU consumption during 
> process start of an affected version is higher that for the normal 
> version.
>
> Before upgrade (normal functioning):
>
> #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
> ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
> ofctrl_sb_flow_ref_usage-KB:18
> 174.2 MiB + 327.5 KiB = 174.5 MiB ovn-controller
>
> After upgrade to an affected version I’ve checked memory report while 
> ovn-controller was starting and at some time there was a bigger amount of 
> OVN_Southbound cells comparing to "after start" time:
>
> during start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
> after start:
>
> # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
> ovn-controller)| grep ovn
> idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
> idl-outstanding-txns-Open_vSwitch:1
> 3.3 GiB + 327.0 KiB =   3.3 GiB   ovn-controller
>
>
> cells during start:
> 11388742
>
> cells after start:
> 343896
>

 Are you running with ovn-monitor-all=true on this host?
>>>
>>> No, it has default false.
>>>

 I guess it's unlikely but I'll try just in case: would it be possible to
 share the SB database?
>>>
>>> Unfortunately, no. But I can say it’s about 450 M in size after compaction. 
>>> And there are about 1M mac_bindings if it’s important :).
>>
> 
> I tried in a sandbox, before and after the commit in question, with a
> script that adds 1M mac bindings on top of the sample topology built by
> tutorial/ovn-setup.sh.
> 
> I see ovn-controller memory usage going to >3GB in before the commit you
> blamed and to >1.9GB after the same commit.  So it looks different than
> what you reported but still worrying that we use so much memory for mac
> bindings.
> 
> I'm assuming however that quite a few of the 1M mac bindings in your
> setup are stale so would it be possible to enable mac_binding aging?  It
> needs to be enabled per router with something like:
> 
>  $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60
> 
> The threshold is in seconds and is a hard limit (for now) after which a
> mac binding entry is removed.  There's work in progress to only clear
> arp cache entries that are really stale [1].
> 
>> But if you are interested in any specific information, let me know it, I can 
>> check.
>>
> 
> How many "local" datapaths do we have on the hosts that exhibit high
> memory usage?
> 
> The quickest way I can think of to get this info is to run this on the
> hypervisor:
>  $ ovn-appctl ct-zone-list | grep snat -c
> 
> Additional question: how many mac_bindings are "local", i.e., associated
> to local datapaths?
> 
>>>

> I guess it could be connected with this problem. Can anyone look at this 
> and comment please?
>

 Does the memory usage persist after SB is upgraded too?  I see the
 number of SB idl-cells goes down eventually which means that eventually
 the periodic malloc_trim() call would free up memory.  We trim on idle
 since
 https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded

>>>
>>> In this upgrade DB schemas not upgraded, so they’re up to date.
>>>
 Are you using a different allocator?  E.g., jemalloc.
>>>
>>> No, this issue reproduces with gcc.
>>>
> 
> Can we run a test and see if malloc_trim() was actually called?  I'd
> first disable lflow-cache log rate limiting:
> 
>  $ ovn-appctl vlog/disable-rate-limit lflow_cache
> 
> Then check if malloc_trim() was called after the lflow-cache detected
> inactivity.  You'd see logs like:
> 
> "lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago):
> trimming cache

[ovs-dev] [PATCH ovn 5/5] controller: Update MAC binding timestamp

2023-07-10 Thread Ales Musil
To achieve that add thread that will handle
statistics requests and delegate the processing
to defined functions. This allows the thread to
be flexible enough, so it could be extended in future
if needed.

At the same time connected the thread with the MAC
cache I-P node to have the timestamp updates. The
updates should happen once per dump_period
(3/4 of the aging threshold) per chassis only if
the MAC binding is actively used.

Signed-off-by: Ales Musil 
---
 controller/automake.mk  |   4 +-
 controller/mac_cache.c  |  84 +++
 controller/mac_cache.h  |   7 +
 controller/ovn-controller.c |  11 +
 controller/statctrl.c   | 434 
 controller/statctrl.h   |  28 +++
 tests/ovn.at|  23 +-
 7 files changed, 585 insertions(+), 6 deletions(-)
 create mode 100644 controller/statctrl.c
 create mode 100644 controller/statctrl.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 562290359..0dbbd5d26 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -45,7 +45,9 @@ controller_ovn_controller_SOURCES = \
controller/mirror.h \
controller/mirror.c \
controller/mac_cache.h \
-   controller/mac_cache.c
+   controller/mac_cache.c \
+   controller/statctrl.h \
+   controller/statctrl.c
 
 controller_ovn_controller_LDADD = lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la
 man_MANS += controller/ovn-controller.8
diff --git a/controller/mac_cache.c b/controller/mac_cache.c
index 4663499a1..6f1d661d4 100644
--- a/controller/mac_cache.c
+++ b/controller/mac_cache.c
@@ -252,3 +252,87 @@ mac_cache_threshold_remove(struct hmap *thresholds,
 hmap_remove(thresholds, &threshold->hmap_node);
 free(threshold);
 }
+
+struct mac_cache_mb_stats {
+struct ovs_list list_node;
+
+int64_t idle_age_ms;
+uint32_t cookie;
+/* Common data to identify MAC binding. */
+struct mac_cache_mb_data data;
+};
+
+void
+mac_cache_mb_stats_process_flow_stats(struct ovs_list *stats_list,
+  struct ofputil_flow_stats *ofp_stats)
+{
+struct mac_cache_mb_stats *stats = xmalloc(sizeof *stats);
+
+stats->idle_age_ms = ofp_stats->idle_age * 1000;
+stats->cookie = ntohll(ofp_stats->cookie);
+stats->data.port_key =
+ofp_stats->match.flow.regs[MFF_LOG_INPORT - MFF_REG0];
+stats->data.dp_key = ntohll(ofp_stats->match.flow.metadata);
+
+if (ofp_stats->match.flow.dl_type == htons(ETH_TYPE_IP)) {
+stats->data.ip = in6_addr_mapped_ipv4(ofp_stats->match.flow.nw_src);
+} else {
+stats->data.ip = ofp_stats->match.flow.ipv6_src;
+}
+
+stats->data.mac = ofp_stats->match.flow.dl_src;
+
+ovs_list_push_back(stats_list, &stats->list_node);
+}
+
+void
+mac_cache_mb_stats_destroy(struct ovs_list *stats_list)
+{
+struct mac_cache_mb_stats *stats;
+LIST_FOR_EACH_POP (stats, list_node, stats_list) {
+free(stats);
+}
+}
+
+void
+mac_cache_mb_stats_run(struct ovs_list *stats_list, uint64_t *req_delay,
+   void *data)
+{
+struct mac_cache_data *cache_data = data;
+long long timewall_now = time_wall_msec();
+
+struct mac_cache_threshold *threshold;
+struct mac_cache_mb_stats *stats;
+struct mac_cache_mac_binding *mc_mb;
+LIST_FOR_EACH_POP (stats, list_node, stats_list) {
+mc_mb = mac_cache_mac_binding_find_by_mb_data(cache_data,
+  &stats->data);
+
+if (!mc_mb) {
+free(stats);
+continue;
+}
+
+struct uuid *dp_uuid = &mc_mb->sbrec_mb->datapath->header_.uuid;
+threshold = mac_cache_threshold_find(&cache_data->mb_thresholds,
+ dp_uuid);
+
+uint64_t dump_period = (3 * threshold->value) / 4;
+/* If "idle_age" is under threshold it means that the mac binding is
+ * used on this chassis. Also make sure that we don't update the
+ * timestamp more than once during the dump period. */
+if (stats->idle_age_ms < threshold->value &&
+(timewall_now - mc_mb->sbrec_mb->timestamp) >= dump_period) {
+sbrec_mac_binding_set_timestamp(mc_mb->sbrec_mb, timewall_now);
+}
+
+free(stats);
+}
+
+uint64_t dump_period = UINT64_MAX;
+HMAP_FOR_EACH (threshold, hmap_node, &cache_data->mb_thresholds) {
+dump_period = MIN(dump_period, (3 * threshold->value) / 4);
+}
+
+*req_delay = dump_period < UINT64_MAX ? dump_period : 0;
+}
diff --git a/controller/mac_cache.h b/controller/mac_cache.h
index f1f1772c8..a29713908 100644
--- a/controller/mac_cache.h
+++ b/controller/mac_cache.h
@@ -71,4 +71,11 @@ void mac_cache_mac_binding_remove(struct mac_cache_data 
*data,
 void mac_cache_mac_bindings_destroy(struct mac_cache_data *data);
 bool mac_cache_sb_mac_binding_updated(const struct sbrec_mac_binding *mb);
 
+void
+mac_cache_mb_stats

[ovs-dev] [PATCH ovn 4/5] controller: Add MAC cache I-P node

2023-07-10 Thread Ales Musil
The node currently keeps track of MAC bindings
that have aging enabled. The purpose of this node
is to have structures that can be used by the
thread to update the timestamp when the MAC
binding is used. The I-P is generic enough
to be used by FDB timestamp refresh in very similar
way to the MAC binding.

This is a preparation for the MAC binding refresh mechanism.

Signed-off-by: Ales Musil 
---
 controller/automake.mk  |   4 +-
 controller/mac_cache.c  | 254 
 controller/mac_cache.h  |  74 +++
 controller/ovn-controller.c | 220 +++
 4 files changed, 551 insertions(+), 1 deletion(-)
 create mode 100644 controller/mac_cache.c
 create mode 100644 controller/mac_cache.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 334672b4d..562290359 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -43,7 +43,9 @@ controller_ovn_controller_SOURCES = \
controller/vif-plug.h \
controller/vif-plug.c \
controller/mirror.h \
-   controller/mirror.c
+   controller/mirror.c \
+   controller/mac_cache.h \
+   controller/mac_cache.c
 
 controller_ovn_controller_LDADD = lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la
 man_MANS += controller/ovn-controller.8
diff --git a/controller/mac_cache.c b/controller/mac_cache.c
new file mode 100644
index 0..4663499a1
--- /dev/null
+++ b/controller/mac_cache.c
@@ -0,0 +1,254 @@
+/* Copyright (c) 2023, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+
+#include "lport.h"
+#include "mac_cache.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/vlog.h"
+#include "ovn/logical-fields.h"
+#include "ovn-sb-idl.h"
+
+VLOG_DEFINE_THIS_MODULE(mac_cache);
+
+static uint32_t
+mac_cache_mb_data_hash(const struct mac_cache_mb_data *mb_data);
+static inline bool
+mac_cache_mb_data_equals(const struct mac_cache_mb_data *a,
+  const struct mac_cache_mb_data *b);
+static struct mac_cache_mac_binding *
+mac_cache_mac_binding_find_by_mb_data(struct mac_cache_data *data,
+  const struct mac_cache_mb_data *mb_data);
+static bool
+mac_cache_mb_data_from_sbrec(struct mac_cache_mb_data *data,
+  const struct sbrec_mac_binding *mb,
+  struct ovsdb_idl_index *sbrec_pb_by_name);
+static struct mac_cache_threshold *
+mac_cache_threshold_find(struct hmap *thresholds, const struct uuid *uuid);
+static void
+mac_cache_threshold_remove(struct hmap *thresholds,
+   struct mac_cache_threshold *threshold);
+
+bool
+mac_cache_threshold_add(struct mac_cache_data *data,
+const struct sbrec_datapath_binding *dp)
+{
+struct mac_cache_threshold *threshold =
+mac_cache_threshold_find(&data->mb_thresholds, &dp->header_.uuid);
+if (threshold) {
+return true;
+}
+
+uint64_t mb_threshold = smap_get_uint(&dp->external_ids,
+  "mac_binding_age_threshold", 0);
+if (!mb_threshold) {
+return false;
+}
+
+threshold = xmalloc(sizeof *threshold);
+threshold->uuid = dp->header_.uuid;
+threshold->value = mb_threshold * 1000;
+
+hmap_insert(&data->mb_thresholds, &threshold->hmap_node,
+uuid_hash(&dp->header_.uuid));
+
+return true;
+}
+
+bool
+mac_cache_threshold_replace(struct mac_cache_data *data,
+const struct sbrec_datapath_binding *dp)
+{
+struct mac_cache_threshold *threshold =
+mac_cache_threshold_find(&data->mb_thresholds, &dp->header_.uuid);
+if (threshold) {
+mac_cache_threshold_remove(&data->mb_thresholds, threshold);
+}
+
+return mac_cache_threshold_add(data, dp);
+}
+
+void
+mac_cache_thresholds_destroy(struct mac_cache_data *data)
+{
+struct mac_cache_threshold *threshold;
+HMAP_FOR_EACH_POP (threshold, hmap_node, &data->mb_thresholds) {
+free(threshold);
+}
+}
+
+void
+mac_cache_mac_binding_add(struct mac_cache_data *data,
+   const struct sbrec_mac_binding *mb,
+   struct ovsdb_idl_index *sbrec_pb_by_name)
+{
+struct mac_cache_mb_data mb_data;
+if (!mac_cache_mb_data_from_sbrec(&mb_data, mb, sbrec_pb_by_name)) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+ 

[ovs-dev] [PATCH ovn 3/5] northd: Synchronize the MAC binding age threshold

2023-07-10 Thread Ales Musil
Synchrinoize the MAC binding age threashold to
SB datapath.

This is a preparation for the MAC binding refresh mechanism.

Signed-off-by: Ales Musil 
---
 northd/northd.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/northd/northd.c b/northd/northd.c
index 4e3c5d02a..7958304cf 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1150,6 +1150,13 @@ ovn_datapath_update_external_ids(struct ovn_datapath *od)
 if (!learn_from_arp_request) {
 smap_add(&ids, "always_learn_from_arp_request", "false");
 }
+
+uint32_t age_threshold = smap_get_uint(&od->nbr->options,
+   "mac_binding_age_threshold", 0);
+if (age_threshold) {
+smap_add_format(&ids, "mac_binding_age_threshold",
+"%u", age_threshold);
+}
 }
 
 sbrec_datapath_binding_set_external_ids(od->sb, &ids);
-- 
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/5] northd, controller: Use the MAC cache table

2023-07-10 Thread Ales Musil
Populate and use the MAC cache table. The MAC cache has
the following entries:
ip,reg14=,metadata=,dl_src=,nw_src= 
actions=drop
ipv6,reg14=,metadata=,dl_src=,ipv6_src= 
actions=drop

The "mac_cache_use" action will resubmit packets from
"lr_in_learn_neighbor" table for MAC bindings that we already
know.

This is a preparation for the MAC binding refresh mechanism.

Signed-off-by: Ales Musil 
---
 controller/lflow.c  | 22 ++
 controller/lflow.h  |  1 +
 northd/northd.c |  2 +-
 tests/ovn-northd.at |  2 +-
 tests/ovn.at| 23 ---
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 22faaf013..bc5f73279 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -887,6 +887,7 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 .fdb_lookup_ptable = OFTABLE_LOOKUP_FDB,
 .in_port_sec_ptable = OFTABLE_CHK_IN_PORT_SEC,
 .out_port_sec_ptable = OFTABLE_CHK_OUT_PORT_SEC,
+.mac_cache_use_table = OFTABLE_MAC_CACHE_USE,
 .ctrl_meter_id = ctrl_meter_id,
 .common_nat_ct_zone = get_common_nat_zone(ldp),
 };
@@ -1337,6 +1338,7 @@ consider_neighbor_flow(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 
 struct match get_arp_match = MATCH_CATCHALL_INITIALIZER;
 struct match lookup_arp_match = MATCH_CATCHALL_INITIALIZER;
+struct match mb_cache_use_match = MATCH_CATCHALL_INITIALIZER;
 
 if (strchr(ip, '.')) {
 ovs_be32 ip_addr;
@@ -1345,9 +1347,14 @@ consider_neighbor_flow(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 VLOG_WARN_RL(&rl, "bad 'ip' %s", ip);
 return;
 }
+
 match_set_reg(&get_arp_match, 0, ntohl(ip_addr));
+
 match_set_reg(&lookup_arp_match, 0, ntohl(ip_addr));
 match_set_dl_type(&lookup_arp_match, htons(ETH_TYPE_ARP));
+
+match_set_dl_type(&mb_cache_use_match, htons(ETH_TYPE_IP));
+match_set_nw_src(&mb_cache_use_match, ip_addr);
 } else {
 struct in6_addr ip6;
 if (!ipv6_parse(ip, &ip6)) {
@@ -1363,6 +1370,9 @@ consider_neighbor_flow(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 match_set_dl_type(&lookup_arp_match, htons(ETH_TYPE_IPV6));
 match_set_nw_proto(&lookup_arp_match, 58);
 match_set_icmp_code(&lookup_arp_match, 0);
+
+match_set_dl_type(&mb_cache_use_match, htons(ETH_TYPE_IPV6));
+match_set_ipv6_src(&mb_cache_use_match, &ip6);
 }
 
 match_set_metadata(&get_arp_match, htonll(pb->datapath->tunnel_key));
@@ -1372,6 +1382,11 @@ consider_neighbor_flow(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 match_set_reg(&lookup_arp_match, MFF_LOG_INPORT - MFF_REG0,
   pb->tunnel_key);
 
+match_set_dl_src(&mb_cache_use_match, mac_addr);
+match_set_reg(&mb_cache_use_match, MFF_LOG_INPORT - MFF_REG0,
+  pb->tunnel_key);
+match_set_metadata(&mb_cache_use_match, htonll(pb->datapath->tunnel_key));
+
 uint64_t stub[1024 / 8];
 struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
 uint8_t value = 1;
@@ -1392,6 +1407,13 @@ consider_neighbor_flow(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 &lookup_arp_match, &ofpacts,
 b ? &b->header_.uuid : &smb->header_.uuid);
 
+if (b) {
+ofpbuf_clear(&ofpacts);
+ofctrl_add_flow(flow_table, OFTABLE_MAC_CACHE_USE, priority,
+b->header_.uuid.parts[0], &mb_cache_use_match,
+&ofpacts, &b->header_.uuid);
+}
+
 ofpbuf_uninit(&ofpacts);
 }
 
diff --git a/controller/lflow.h b/controller/lflow.h
index 2472dec29..5da4385e4 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -93,6 +93,7 @@ struct uuid;
 #define OFTABLE_ECMP_NH_MAC  76
 #define OFTABLE_ECMP_NH  77
 #define OFTABLE_CHK_LB_AFFINITY  78
+#define OFTABLE_MAC_CACHE_USE79
 
 struct lflow_ctx_in {
 struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
diff --git a/northd/northd.c b/northd/northd.c
index b9605862e..4e3c5d02a 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -12615,7 +12615,7 @@ build_neigh_learning_flows_for_lrouter(
   learn_from_arp_request ? "" :
   " || "REGBIT_LOOKUP_NEIGHBOR_IP_RESULT" == 0");
 ovn_lflow_add(lflows, od, S_ROUTER_IN_LEARN_NEIGHBOR, 100,
-  ds_cstr(match), "next;");
+  ds_cstr(match), "mac_cache_use; next;");
 
 ovn_lflow_metered(lflows, od, S_ROUTER_IN_LEARN_NEIGHBOR, 90,
   "arp", "put_arp(inport, arp.spa, arp.sha); next;",
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 3e06f14c9..6298d4cec 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -7432,7 +7432,7 @@ AT_CHECK([cat lrflows | grep -e lr_in_lookup_neighbor -e 
lr_in_learn_neighbor |
   table=1 (lr_in_lookup_neighbo

[ovs-dev] [PATCH ovn 1/5] actions: Add mac_cache_use action

2023-07-10 Thread Ales Musil
The mac_cache_use is just resubmit to MAC use cache table,
which will be later on used by the timestamp refresh
mechanism.

This is a preparation for the MAC binding refresh mechanism.

Signed-off-by: Ales Musil 
---
 include/ovn/actions.h |  3 +++
 lib/actions.c | 17 +
 ovn-sb.xml|  8 
 tests/ovn.at  |  4 
 tests/test-ovn.c  |  1 +
 utilities/ovn-trace.c |  2 ++
 6 files changed, 35 insertions(+)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 23a919049..04bb6ffd0 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -126,6 +126,7 @@ struct collector_set_ids;
 OVNACT(COMMIT_LB_AFF, ovnact_commit_lb_aff)   \
 OVNACT(CHK_LB_AFF,ovnact_result)  \
 OVNACT(SAMPLE,ovnact_sample)  \
+OVNACT(MAC_CACHE_USE, ovnact_null)\
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -879,6 +880,8 @@ struct ovnact_encode_params {
sends packets to controller. */
 uint32_t common_nat_ct_zone; /* When performing NAT in a common CT zone,
 this determines which CT zone to use */
+uint32_t mac_cache_use_table; /* OpenFlow table for 'mac_cache_use'
+   * to resubmit. */
 };
 
 void ovnacts_encode(const struct ovnact[], size_t ovnacts_len,
diff --git a/lib/actions.c b/lib/actions.c
index 037172e60..4f67df13f 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -5224,6 +5224,20 @@ encode_CHK_LB_AFF(const struct ovnact_result *res,
MLF_USE_LB_AFF_SESSION_BIT, ofpacts);
 }
 
+static void
+format_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED, struct ds *s)
+{
+ds_put_cstr(s, "mac_cache_use;");
+}
+
+static void
+encode_MAC_CACHE_USE(const struct ovnact_null *null OVS_UNUSED,
+ const struct ovnact_encode_params *ep,
+ struct ofpbuf *ofpacts)
+{
+emit_resubmit(ofpacts, ep->mac_cache_use_table);
+}
+
 /* Parses an assignment or exchange or put_dhcp_opts action. */
 static void
 parse_set_action(struct action_context *ctx)
@@ -5429,9 +5443,12 @@ parse_action(struct action_context *ctx)
 parse_commit_lb_aff(ctx, ovnact_put_COMMIT_LB_AFF(ctx->ovnacts));
 } else if (lexer_match_id(ctx->lexer, "sample")) {
 parse_sample(ctx);
+} else if (lexer_match_id(ctx->lexer, "mac_cache_use")) {
+ovnact_put_MAC_CACHE_USE(ctx->ovnacts);
 } else {
 lexer_syntax_error(ctx->lexer, "expecting action");
 }
+
 lexer_force_match(ctx->lexer, LEX_T_SEMICOLON);
 return !ctx->lexer->error;
 }
diff --git a/ovn-sb.xml b/ovn-sb.xml
index f9dd13602..46aedf973 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -2790,6 +2790,14 @@ tcp.flags = RST;
 
   
 
+
+mac_cache_use;
+
+  
+This action resubmits to corresponding table which updates the
+use statistics of MAC cache.
+  
+
   
 
 
diff --git a/tests/ovn.at b/tests/ovn.at
index cd6d4b9ff..7fee42acf 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -2253,6 +2253,10 @@ 
sample(probability=0,collector_set=200,obs_domain=300,obs_point=foo);
 sample(probability=10,foo=bar,obs_domain=0,obs_point=1000);
 Syntax error at `foo' unknown argument.
 
+# mac_cache_use
+mac_cache_use;
+encodes as resubmit(,79)
+
 # Miscellaneous negative tests.
 ;
 Syntax error at `;'.
diff --git a/tests/test-ovn.c b/tests/test-ovn.c
index 6c7754eac..1f1e27b51 100644
--- a/tests/test-ovn.c
+++ b/tests/test-ovn.c
@@ -1374,6 +1374,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx 
OVS_UNUSED)
 .common_nat_ct_zone = MFF_LOG_DNAT_ZONE,
 .in_port_sec_ptable = OFTABLE_CHK_IN_PORT_SEC,
 .out_port_sec_ptable = OFTABLE_CHK_OUT_PORT_SEC,
+.mac_cache_use_table = OFTABLE_MAC_CACHE_USE,
 .lflow_uuid.parts =
 { 0x, 0x, 0x, 0x},
 .dp_key = 0xabcdef,
diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index fb54ed060..e6b161850 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -3353,6 +3353,8 @@ trace_actions(const struct ovnact *ovnacts, size_t 
ovnacts_len,
 break;
 case OVNACT_SAMPLE:
 break;
+case OVNACT_MAC_CACHE_USE:
+break;
 }
 }
 ofpbuf_uninit(&stack);
-- 
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/5] Add MAC binding aging timestamp refresh mechanism

2023-07-10 Thread Ales Musil
The timestamp refresh mechanism is based on
separate table. The new thread will collect
flow statistics from this table and based on
"idle_age" data we can decide whether we should
bump the timestamp (MAC binding still in use)
or keep it as is.

The timestamp refresh mechanism works only if
the MAC binding aging is activated, so ovn-controller
doesn't do anything otherwise. To prevent a lot of updates
of timestamp happening concurrently the flow dump
period is set to 3/4 of aging thresholdi, and we do
update only rows which have the timestamp older than
the dump period.

One thing to note, the current implementation keeps in
mind possible enhancement for FDB timestamp refresh in
very similar manner. Once the design is settled I'll
append the FDB to the series.

Another thing to note is that the thread is purposely
made somewhat generic, there is an idea to reuse it for
packet statistics on logical ports. Which should be fairly
easy to do in its current form.

There isn't any test case yet, it will be in the official
patch iteration. It was tested manually for now.

Ales Musil (5):
  actions: Add mac_cache_use action
  northd, controller: Use the MAC cache table
  northd: Synchronize the MAC binding age threshold
  controller: Add MAC cache I-P node
  controller: Update MAC binding timestamp

 controller/automake.mk  |   6 +-
 controller/lflow.c  |  22 ++
 controller/lflow.h  |   1 +
 controller/mac_cache.c  | 338 
 controller/mac_cache.h  |  81 +++
 controller/ovn-controller.c | 231 +++
 controller/statctrl.c   | 434 
 controller/statctrl.h   |  28 +++
 include/ovn/actions.h   |   3 +
 lib/actions.c   |  17 ++
 northd/northd.c |   9 +-
 ovn-sb.xml  |   8 +
 tests/ovn-northd.at |   2 +-
 tests/ovn.at|  50 -
 tests/test-ovn.c|   1 +
 utilities/ovn-trace.c   |   2 +
 16 files changed, 1222 insertions(+), 11 deletions(-)
 create mode 100644 controller/mac_cache.c
 create mode 100644 controller/mac_cache.h
 create mode 100644 controller/statctrl.c
 create mode 100644 controller/statctrl.h

-- 
2.40.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] ovn-controller high memory consumption with sbrec_server_has_##table##_table - enabled codepath

2023-07-10 Thread Dumitru Ceara
On 7/6/23 13:00, Vladislav Odintsov wrote:
> 
> 
>> On 5 Jul 2023, at 20:07, Vladislav Odintsov  wrote:
>>
>> Hi Dumitru,
>>
>> thanks for the quick response!
>>
>>> On 5 Jul 2023, at 19:53, Dumitru Ceara  wrote:
>>>
>>> On 7/5/23 17:14, Vladislav Odintsov wrote:
 Hi,

>>>
>>> Hi Vladislav,
>>>
 we’ve noticed there is a huge ovn-controller memory consumption introduced 
 with [0] comparing to version without its changes in ovn-controller.c part 
 (just OVS submodule bump without ovn-controller changes doesn’t trigger 
 such behaviour).

>>>
>>> Thanks for reporting this!
>>>
 On an empty host connected to working cluster ovn-controller normally 
 consumes ~175 MiB RSS, and the same host updated to a version with commit 
 [0] consumes 3.3 GiB RSS. The start time and CPU consumption during 
 process start of an affected version is higher that for the normal version.

 Before upgrade (normal functioning):

 #ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343855 idl-cells-Open_vSwitch:14131 
 ofctrl_desired_flow_usage-KB:76 ofctrl_installed_flow_usage-KB:60 
 ofctrl_sb_flow_ref_usage-KB:18
 174.2 MiB + 327.5 KiB = 174.5 MiB  ovn-controller

 After upgrade to an affected version I’ve checked memory report while 
 ovn-controller was starting and at some time there was a bigger amount of 
 OVN_Southbound cells comparing to "after start" time:

 during start:

 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:11388742 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller

 after start:

 # ovn-appctl -t ovn-controller memory/show ;ps-mem -p $(pidof 
 ovn-controller)| grep ovn
 idl-cells-OVN_Southbound:343896 idl-cells-Open_vSwitch:14131 
 idl-outstanding-txns-Open_vSwitch:1
 3.3 GiB + 327.0 KiB =   3.3 GiBovn-controller


 cells during start:
 11388742

 cells after start:
 343896

>>>
>>> Are you running with ovn-monitor-all=true on this host?
>>
>> No, it has default false.
>>
>>>
>>> I guess it's unlikely but I'll try just in case: would it be possible to
>>> share the SB database?
>>
>> Unfortunately, no. But I can say it’s about 450 M in size after compaction. 
>> And there are about 1M mac_bindings if it’s important :).
> 

I tried in a sandbox, before and after the commit in question, with a
script that adds 1M mac bindings on top of the sample topology built by
tutorial/ovn-setup.sh.

I see ovn-controller memory usage going to >3GB in before the commit you
blamed and to >1.9GB after the same commit.  So it looks different than
what you reported but still worrying that we use so much memory for mac
bindings.

I'm assuming however that quite a few of the 1M mac bindings in your
setup are stale so would it be possible to enable mac_binding aging?  It
needs to be enabled per router with something like:

 $ ovn-nbctl set logical_router RTR options:mac_binding_age_threshold=60

The threshold is in seconds and is a hard limit (for now) after which a
mac binding entry is removed.  There's work in progress to only clear
arp cache entries that are really stale [1].

> But if you are interested in any specific information, let me know it, I can 
> check.
> 

How many "local" datapaths do we have on the hosts that exhibit high
memory usage?

The quickest way I can think of to get this info is to run this on the
hypervisor:
 $ ovn-appctl ct-zone-list | grep snat -c

Additional question: how many mac_bindings are "local", i.e., associated
to local datapaths?

>>
>>>
 I guess it could be connected with this problem. Can anyone look at this 
 and comment please?

>>>
>>> Does the memory usage persist after SB is upgraded too?  I see the
>>> number of SB idl-cells goes down eventually which means that eventually
>>> the periodic malloc_trim() call would free up memory.  We trim on idle
>>> since
>>> https://github.com/ovn-org/ovn/commit/b4c593d23bd959e98fcc9ada4a973ac933579ded
>>>
>>
>> In this upgrade DB schemas not upgraded, so they’re up to date.
>>
>>> Are you using a different allocator?  E.g., jemalloc.
>>
>> No, this issue reproduces with gcc.
>>

Can we run a test and see if malloc_trim() was actually called?  I'd
first disable lflow-cache log rate limiting:

 $ ovn-appctl vlog/disable-rate-limit lflow_cache

Then check if malloc_trim() was called after the lflow-cache detected
inactivity.  You'd see logs like:

"lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago):
trimming cache"

The fact that the number of SB idl-cells goes down and memory doesn't
seems to indicate we might have a bug in the auto cache trimming mechanism.

In any case, a malloc_trim() can be manually triggered by flushing the
lfl

[ovs-dev] [PATCH ovn] northd: Lookup route policy before ip routing

2023-07-10 Thread wangchuanlei
If there is no route in table ip_routing, the route policy
item in table policy is useless.
Because route policy has a higher priority than ip routing,
so moddify the table id of IP_ROUTING and POLICY is a little
better.By this way, there is no need reroute any more, this
should be more effienct.

Signed-off-by: wangchuanlei 
---
 northd/northd.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index a45c8b53a..35187abf8 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -174,10 +174,10 @@ enum ovn_stage {
 PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,   10, "lr_in_nd_ra_options") \
 PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE,  11, "lr_in_nd_ra_response") \
 PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_PRE,  12, "lr_in_ip_routing_pre")  \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  13, "lr_in_ip_routing")  \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 14, "lr_in_ip_routing_ecmp") \
-PIPELINE_STAGE(ROUTER, IN,  POLICY,  15, "lr_in_policy")  \
-PIPELINE_STAGE(ROUTER, IN,  POLICY_ECMP, 16, "lr_in_policy_ecmp") \
+PIPELINE_STAGE(ROUTER, IN,  POLICY,  13, "lr_in_policy")  \
+PIPELINE_STAGE(ROUTER, IN,  POLICY_ECMP, 14, "lr_in_policy_ecmp") \
+PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  15, "lr_in_ip_routing")  \
+PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 16, "lr_in_ip_routing_ecmp") \
 PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 17, "lr_in_arp_resolve") \
 PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN, 18, "lr_in_chk_pkt_len") \
 PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS, 19, "lr_in_larger_pkts") \
@@ -278,6 +278,7 @@ enum ovn_stage {
 #define REG_SRC_IPV4 "reg1"
 #define REG_SRC_IPV6 "xxreg1"
 #define REG_ROUTE_TABLE_ID "reg7"
+#define REG_ROUTE_POLICY "reg2"
 
 /* Register used to store backend ipv6 address
  * for load balancer affinity. */
@@ -10240,6 +10241,7 @@ build_routing_policy_flow(struct hmap *lflows, struct 
ovn_datapath *od,
 bool is_ipv4 = strchr(nexthop, '.') ? true : false;
 ds_put_format(&actions, "%s = %s; "
   "%s = %s; "
+  "%s = 1; "
   "eth.src = %s; "
   "outport = %s; "
   "flags.loopback = 1; "
@@ -10249,6 +10251,7 @@ build_routing_policy_flow(struct hmap *lflows, struct 
ovn_datapath *od,
   nexthop,
   is_ipv4 ? REG_SRC_IPV4 : REG_SRC_IPV6,
   lrp_addr_s,
+  REG_ROUTE_POLICY,
   out_port->lrp_networks.ea_s,
   out_port->json_key);
 
@@ -10340,6 +10343,7 @@ build_ecmp_routing_policy_flows(struct hmap *lflows, 
struct ovn_datapath *od,
   out_port->json_key);
 
 ds_clear(&match);
+ds_put_cstr(&actions, REG_ROUTE_POLICY" = 1; ");
 ds_put_format(&match, REG_ECMP_GROUP_ID" == %"PRIu16" && "
   REG_ECMP_MEMBER_ID" == %"PRIuSIZE,
   ecmp_group_id, i + 1);
@@ -12911,6 +12915,8 @@ build_mcast_lookup_flows_for_lrouter(
  */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, 10550,
   "nd_rs || nd_ra", debug_drop_action());
+ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, 1,
+  REG_ROUTE_POLICY" == 1", "reg8[0..15] = 0; next;");
 if (!od->mcast_info.rtr.relay) {
 return;
 }
-- 
2.27.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Scale testing OVN with ovn-heater for OpenStack use cases

2023-07-10 Thread Frode Nordahl
Have now sent out an invite to everyone participating in this thread.

For the benefit of anyone else wanting to attend I'm also sharing the
video link and other resources here:
Date/Time: Tuesday July 11th 13:30 UTC
Video link: https://meet.google.com/jno-ouvk-rfs
Meeting notes: 
https://docs.google.com/document/d/1IHeWiLuspPiUIzgKZMmV_9i8_0VujUZfw58LFeau8eE

Agenda:
Introductions / Code of Conduct 10m
approach to development 5m
ovn-heater approach – simulation 10m
simulation of desired state vs. load inducing CMS behavior 10m
first steps 10m
next meeting 5m
AOB

(Apologies for the top post.)

-- 
Frode Nordahl

On Wed, Jul 5, 2023 at 10:18 AM Haresh Khandelwal  wrote:
>
> Hi Frode, Please add me (hakha...@redhat.com) as well to the invite list.
>
> Thanks
> -Haresh
>
> On Tue, Jul 4, 2023 at 10:30 PM Roberto Bartzen Acosta via dev <
> ovs-dev@openvswitch.org> wrote:
>
> > I'm interested in attending this meet, Frode. Please include me in the
> > invite list.
> >
> > Thanks
> >
> > Em ter., 4 de jul. de 2023 às 13:06, Numan Siddique 
> > escreveu:
> >
> > > On Tue, Jul 4, 2023, 8:00 PM Frode Nordahl 
> > > wrote:
> > >
> > > > On Tue, Jul 4, 2023 at 4:16 PM Dumitru Ceara 
> > wrote:
> > > > >
> > > > > On 6/30/23 23:07, Terry Wilson wrote:
> > > > > > On Fri, Jun 30, 2023 at 2:26 AM Frode Nordahl
> > > > > >  wrote:
> > > > > >>
> > > > > >> Hello all,
> > > > > >>
> > > > > >> On Tue, May 30, 2023 at 5:16 PM Felix Huettner
> > > > > >>  wrote:
> > > > > >>>
> > > > > >>> Hi Dumitru,
> > > > > >>>
> > > > > >>> On Fri, May 26, 2023 at 01:30:54PM +0200, Dumitru Ceara wrote:
> > > > >  On 5/24/23 09:37, Felix Huettner wrote:
> > > > > > Hi everyone,
> > > > > 
> > > > >  Hi Felix,
> > > > > 
> > > > > >
> > > > > > Ilya mentioned to me that you will want to bring openstack
> > > > examples to
> > > > > > ovn-heater.
> > > > > >
> > > > > 
> > > > >  Yes, we're discussing that.
> > > > > 
> > > > > > I wanted to ask how to best join this effort. It would be great
> > > > for us
> > > > > 
> > > > >  Everyone is welcome to contribute! :)
> > > > > 
> > > > > >>>
> > > > > >>> Great, we will try that out.
> > > > > >>
> > > > > >> Thank you for your interest in this effort, as promised I'm in the
> > > > > >> process of organizing a community meeting to get this going.
> > > > > >>
> > > > > >> Below are some proposals for dates and times, please respond with
> > a
> > > > > >> prioritized list of what works best for you and we'll try to land
> > > on a
> > > > > >> single date/time for a first meeting:
> > > > > >> Wednesday July 5th 13:00 UTC
> > > > > >> Tuesday July 11th 13:30 UTC
> > > > > >> Wednesday July 12th 8:30 UTC
> > > > > >
> > > > > > I'd be interested in attending. Of these, Tuesday July 11th @ 13:30
> > > > > > works best for me. 08:30 would be 03:30 for me, which is a little
> > > > > > early. I could do Wed July 5th as well. It's the day after the the
> > > > > > July 4 holiday here, and a lot of people are making it a long
> > > weekend.
> > > > > > So there will be the inevitable "catch up" associated with that,
> > but
> > > > > > it is doable.
> > > > > >
> > > > > > Terry (otherwiseguy)
> > > > > >
> > > > >
> > > > > Given that July 5th is tomorrow I'm assuming there won't be a
> > community
> > > > > meeting as it seems quite short notice.
> > > > >
> > > > > Frode, is Tuesday July 11th 13:30 UTC still an option for you?  If
> > I'm
> > > > > not wrong this was the slot that worked for everyone that replied.
> > > >
> > > > Yes, Tuesday July 11th 13:30 UTC is indeed sailing up as the consensus,
> > > > so let's conclude that's the date/time of our first meeting.
> > > >
> > > > Thanks for swift replies everyone! I'll send an meeting invite with
> > > > proposed
> > > > agenda to those who responded to this thread soon, and I'll also pop a
> > > > video link into this thread as we get closer for the benefit of anyone
> > > else
> > > > interested.
> > > >
> > >
> > > Please include me in the invite list.
> > > (nusid...@redhat.com)
> > >
> > > Numan
> > >
> > >
> > > > --
> > > > Frode Nordahl
> > > >
> > > > >
> > > > > Thanks,
> > > > > Dumitru
> > > > >
> > > > >
> > > > ___
> > > > dev mailing list
> > > > d...@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > > ___
> > > dev mailing list
> > > d...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> >
> > --
> >
> >
> >
> >
> > _‘Esta mensagem é direcionada apenas para os endereços constantes no
> > cabeçalho inicial. Se você não está listado nos endereços constantes no
> > cabeçalho, pedimos-lhe que desconsidere completamente o conteúdo dessa
> > mensagem e cuja cópia, encaminhamento e/ou execução das ações citadas
> > estão
> > imediatamente anuladas e proibidas’._
> >
> >
> > * **‘Apesar do Magazine Luiza to