Re: [PATCH v2 1/1] app/test: resolve mbuf_test application failure

2023-10-17 Thread Olivier Matz
Hi Rakesh,

Sorry for the delay.

On Tue, Oct 03, 2023 at 04:20:57AM +, Rakesh Kudurumalla wrote:
> Hi Olivier,
> 
> Let me know if you have any comments on this patch.
> 
> Regards,
> Rakesh
> 
> > -Original Message-
> > From: Rakesh Kudurumalla 
> > Sent: Wednesday, July 26, 2023 11:25 AM
> > To: Olivier Matz 
> > Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran ; Nithin
> > Kumar Dabilpuram ; Rakesh Kudurumalla
> > 
> > Subject: [PATCH v2 1/1] app/test: resolve mbuf_test application failure

app/test: fix external mbuf test when assertions enabled

> > 
> > When RTE_ENABLE_ASSERT is defined, the test_mbuf application fails
> > because we are trying to attach an extbuf to a cloned buffer to which
> > an external buffer is already attached. To make test_mbuf pass CI, we
> > have updated ol_flags. This patch fixes the same.
> > 

Fixes: 7b295dceea07 ("test/mbuf: add unit test cases")

> > Signed-off-by: Rakesh Kudurumalla 

Acked-by: Olivier Matz 

> > ---
> > v2 : Addressed comments by removing extbuf call
> >  as mbuf is already attached
> > 
> >  app/test/test_mbuf.c | 5 +
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > index efac01806b..722e1ef624 100644
> > --- a/app/test/test_mbuf.c
> > +++ b/app/test/test_mbuf.c
> > @@ -2345,16 +2345,13 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool *pktmbuf_pool)
> > GOTO_FAIL("%s: External buffer is not attached to mbuf\n",
> > __func__);
> > 
> > -   /* allocate one more mbuf */
> > +   /* allocate one more mbuf, it is attached to the same external buffer */
> > clone = rte_pktmbuf_clone(m, pktmbuf_pool);
> > if (clone == NULL)
> > GOTO_FAIL("%s: mbuf clone allocation failed!\n", __func__);
> > if (rte_pktmbuf_pkt_len(clone) != 0)
> > GOTO_FAIL("%s: Bad packet length\n", __func__);
> > 
> > -   /* attach the same external buffer to the cloned mbuf */
> > -   rte_pktmbuf_attach_extbuf(clone, ext_buf_addr, buf_iova, buf_len,
> > -   ret_shinfo);
> > if (clone->ol_flags != RTE_MBUF_F_EXTERNAL)
> > GOTO_FAIL("%s: External buffer is not attached to mbuf\n",
> > __func__);
> > --
> > 2.25.1
> 
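For context, the failing sequence can be sketched as follows (names as in
test_mbuf.c above; the exact assertion that fires is an assumption):

    struct rte_mbuf *m = rte_pktmbuf_alloc(pktmbuf_pool);
    rte_pktmbuf_attach_extbuf(m, ext_buf_addr, buf_iova, buf_len, ret_shinfo);

    /* the clone references the same external buffer as m... */
    struct rte_mbuf *clone = rte_pktmbuf_clone(m, pktmbuf_pool);

    /* ...so clone->ol_flags already carries RTE_MBUF_F_EXTERNAL, and a
     * second rte_pktmbuf_attach_extbuf(clone, ...) trips an assertion in
     * the mbuf library (the clone is not a direct mbuf anymore) when
     * RTE_ENABLE_ASSERT is defined -- hence the removed call above. */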


[PATCH] maintainers: remove olivier.m...@6wind.com

2023-10-17 Thread Olivier Matz
Unfortunately I don't have enough time to undertake my maintainer role at
the expected level. It is probably not going to get better anytime
soon, so remove myself from maintainers.

Signed-off-by: Olivier Matz 
---
 MAINTAINERS | 6 --
 1 file changed, 6 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 80e071d13e..2b7dd0afb9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -378,7 +378,6 @@ Core Libraries
 T: git://dpdk.org/dpdk
 
 Memory pool
-M: Olivier Matz 
 M: Andrew Rybchenko 
 F: lib/mempool/
 F: drivers/mempool/ring/
@@ -395,14 +394,12 @@ F: app/test/test_ring*
 F: app/test/test_func_reentrancy.c
 
 Stack
-M: Olivier Matz 
 F: lib/stack/
 F: drivers/mempool/stack/
 F: app/test/test_stack*
 F: doc/guides/prog_guide/stack_lib.rst
 
 Packet buffer
-M: Olivier Matz 
 F: lib/mbuf/
 F: doc/guides/prog_guide/mbuf_lib.rst
 F: app/test/test_mbuf.c
@@ -1468,7 +1465,6 @@ Packet processing
 -
 
 Network headers
-M: Olivier Matz 
 F: lib/net/
 F: app/test/test_cksum.c
 F: app/test/test_cksum_perf.c
@@ -1649,7 +1645,6 @@ F: app/test/test_cfgfile.c
 F: app/test/test_cfgfiles/
 
 Interactive command line
-M: Olivier Matz 
 F: lib/cmdline/
 F: app/test-cmdline/
 F: app/test/test_cmdline*
@@ -1657,7 +1652,6 @@ F: examples/cmdline/
 F: doc/guides/sample_app_ug/cmd_line.rst
 
 Key/Value parsing
-M: Olivier Matz 
 F: lib/kvargs/
 F: app/test/test_kvargs.c
 
-- 
2.30.2



Re: [PATCH] maintainers: remove olivier.m...@6wind.com

2023-10-17 Thread Olivier Matz
Hello,

On Tue, Oct 17, 2023 at 04:35:50PM +, Konstantin Ananyev wrote:
> 
> > >
> > > > On Tue, Oct 17, 2023 at 04:27:37PM +0200, Olivier Matz wrote:
> > > > > Unfortunately I don't have enough time to undertake my maintainer role
> > > > > at the expected level. It is probably not going to get better anytime
> > > > > soon, so remove myself from maintainers.
> > > > >
> > > > > Signed-off-by: Olivier Matz 
> > > > > ---
> > > > Sorry to see your name dropped from the file after so many years!
> > > >
> > > > Sadly,
> > > > Acked-by: Bruce Richardson 
> > >
> > > Sorry to see you go.
> > 
> > Echo the same here.
> > 
> > > Should start a CREDITS file to remember past maintainers?
> > 
> > +1
> 
> +1 for both.

Thank you very much for your kind reactions.

About the CREDITS file, many people would deserve to have their name in
such a file. We already have the git history and the mail history, which
are forever engraved in the marble of the Internet :)

And... I'm not leaving the DPDK community, I'll definitely find some
time for a few reviews. I just want to have the MAINTAINERS file
reflecting the reality.

Olivier


Re: [PATCH] net/ixgbevf: fix promiscuous and allmulti

2022-10-13 Thread Olivier Matz
Hi Wenjun,

On Mon, Oct 10, 2022 at 01:30:54AM +, Wu, Wenjun1 wrote:
> Hi Olivier,
> 
> > -Original Message-
> > From: Olivier Matz 
> > Sent: Thursday, September 29, 2022 8:22 PM
> > To: dev@dpdk.org
> > Cc: Yang, Qiming ; Wu, Wenjun1
> > ; Zhao1, Wei 
> > Subject: [PATCH] net/ixgbevf: fix promiscuous and allmulti
> > 
> > The allmulti and promiscuous mode configurations conflict with each
> > other. For instance, if we enable promiscuous mode, then enable and
> > disable allmulti, the promiscuous mode is wrongly disabled.
> > 
> > Fix this behavior by:
> > - doing nothing when we set/unset allmulti if promiscuous mode is on
> > - restoring the proper mode (none or allmulti) when we disable
> >   promiscuous mode
> > 
> > Fixes: 1f4564ed7696 ("net/ixgbevf: enable promiscuous mode")
> > 
> > Signed-off-by: Olivier Matz 
> > ---
> > 
> > Hi,
> > 
> > For reference, this was tested with this plan:
> > 
> > echo 8 > "/sys/bus/pci/devices/:01:00.1/sriov_numvfs"
> > ip link set dev eno2 up
> > ip link set dev eno2 promisc on
> > bridge link set dev eno2 hwmode veb
> > ip link set dev eno2 mtu 9000
> > 
> > ip link set dev eno2 vf 0 mac ac:1f:6b:fe:ba:b0
> > ip link set dev eno2 vf 0 spoofchk off
> > ip link set dev eno2 vf 0 trust on
> > 
> > ip link set dev eno2 vf 1 mac ac:1f:6b:fe:ba:b1
> > ip link set dev eno2 vf 1 spoofchk off
> > ip link set dev eno2 vf 1 trust on
> > 
> > python3 usertools/dpdk-devbind.py -s
> > python3 usertools/dpdk-devbind.py -b vfio-pci :01:10.1   # vf 0
> > python3 usertools/dpdk-devbind.py -b ixgbevf :01:10.3# vf 1
> > 
> > 
> > # in another terminal
> > scapy
> > while True:
> >   sendp(Ether(dst='ac:1f:6b:00:00:00'), iface='eno2v1')  # wrong mac
> >   sendp(Ether(dst='ac:1f:6b:fe:ba:b0'), iface='eno2v1')  # correct mac
> >   time.sleep(1)
> > 
> > 
> > ./build/app/dpdk-testpmd -l 1,2 -a :01:10.1 -- -i --total-num-mbufs=32768
> > show port info all
> > set fwd rxonly
> > set verbose 1
> > set promisc all off
> > set allmulti all off
> > start
> > 
> > # ok, only packets to dst='ac:1f:6b:fe:ba:b0' are received
> > 
> > 
> > set promisc all on
> > # ok, both packets are received
> > 
> > 
> > set allmulti all on
> > set allmulti all off
> > # nok, only packets to dst='ac:1f:6b:fe:ba:b0' are received
> > 
> > 
> >  drivers/net/ixgbe/ixgbe_ethdev.c | 12 +++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 8cec951d94..cc8383c5a9 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -7785,9 +7785,13 @@ static int
> >  ixgbevf_dev_promiscuous_disable(struct rte_eth_dev *dev)
> >  {
> > struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   int mode = IXGBEVF_XCAST_MODE_NONE;
> > int ret;
> > 
> > -   switch (hw->mac.ops.update_xcast_mode(hw, IXGBEVF_XCAST_MODE_NONE)) {
> > +   if (dev->data->all_multicast)
> > +   mode = IXGBEVF_XCAST_MODE_ALLMULTI;
> > +
> > +   switch (hw->mac.ops.update_xcast_mode(hw, mode)) {
> > case IXGBE_SUCCESS:
> > ret = 0;
> > break;
> > @@ -7809,6 +7813,9 @@ ixgbevf_dev_allmulticast_enable(struct rte_eth_dev *dev)
> > int ret;
> > int mode = IXGBEVF_XCAST_MODE_ALLMULTI;
> > 
> > +   if (dev->data->promiscuous)
> > +   return 0;
> > +
> > switch (hw->mac.ops.update_xcast_mode(hw, mode)) {
> > case IXGBE_SUCCESS:
> > ret = 0;
> > @@ -7830,6 +7837,9 @@ ixgbevf_dev_allmulticast_disable(struct rte_eth_dev *dev)
> > struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > int ret;
> > 
> > +   if (dev->data->promiscuous)
> > +   return 0;
> > +
> 
> It seems that we cannot actually turn off allmulticast mode when promiscuous
> mode is enabled, so can we return error and add a log message here as a
> reminder?

I think we should not return an error here: when we disable all_multi, there are
2 cases:

1/ promiscuous is off: no issue here, we do as before, we ask the PF to disable
   the all_multi mode

2/ promiscuous is on: we have nothing to ask the PF, because we still want
   to stay in promisc mode. We just need to remember that all_multi is disabled,
   and this is already done by the ethdev layer.

In the PF DPDK driver, the behavior is the same: we can enable or disable
promisc and all_multi independently, and changing the all_multi value when
promisc is enabled is allowed and does not impact the rx traffic.
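To illustrate, the mode requested from the PF boils down to this precedence
(a sketch only, not the literal driver code; the hypothetical helper name is
mine, the IXGBEVF_XCAST_MODE_* constants come from the base driver):

    static int
    effective_xcast_mode(struct rte_eth_dev *dev)
    {
            /* promiscuous wins over allmulti, allmulti wins over none */
            if (dev->data->promiscuous)
                    return IXGBEVF_XCAST_MODE_PROMISC;
            if (dev->data->all_multicast)
                    return IXGBEVF_XCAST_MODE_ALLMULTI;
            return IXGBEVF_XCAST_MODE_NONE;
    }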


Thanks,
Olivier

> Thanks,
> Wenjun
> 
> > switch (hw->mac.ops.update_xcast_mode(hw, IXGBEVF_XCAST_MODE_MULTI)) {
> > case IXGBE_SUCCESS:
> > ret = 0;
> > --
> > 2.30.2
> 


Re: [PATCH v6 3/4] mempool: fix cache flushing algorithm

2022-10-14 Thread Olivier Matz
Hi Morten, Andrew,

On Sun, Oct 09, 2022 at 05:08:39PM +0200, Morten Brørup wrote:
> > From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> > Sent: Sunday, 9 October 2022 16.52
> > 
> > On 10/9/22 17:31, Morten Brørup wrote:
> > >> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> > >> Sent: Sunday, 9 October 2022 15.38
> > >>
> > >> From: Morten Brørup 
> > >>
> 
> [...]

I finally took a couple of hours to carefully review the mempool-related
series (including the ones that have already been pushed).

The new behavior looks better to me in all situations I can think about.

> 
> > >> --- a/lib/mempool/rte_mempool.h
> > >> +++ b/lib/mempool/rte_mempool.h
> > >> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
> > >>   * Cache is allocated to this size to allow it to overflow in
> > >> certain
> > >>   * cases to avoid needless emptying of cache.
> > >>   */
> > >> -void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects 
> > >> */
> > >> +void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects 
> > >> */
> > >>   } __rte_cache_aligned;
> > >
> > > How much are we allowed to break the ABI here?
> > >
> > > This patch reduces the size of the structure by removing a now unused
> > part at the end, which should be harmless.

It is an ABI breakage: an existing application will use the new 22.11
function to create the mempool (with a smaller cache), but its old
inlined get/put code, which can exceed MAX_SIZE x 2, will remain.

But this is a nice memory consumption improvement, in my opinion we
should accept it for 22.11 with an entry in the release note.


> > >
> > > If we may also move the position of the objs array, I would add
> > __rte_cache_aligned to the objs array. It makes no difference in the
> > general case, but if get/put operations are always 32 objects, it will
> > reduce the number of memory (or last level cache) accesses from five to
> > four 64 B cache lines for every get/put operation.

Will it really be the case? Since cache->len has to be accessed too,
I don't think it would make a difference.


> > >
> > >   uint32_t len; /**< Current cache count */
> > > - /*
> > > -  * Cache is allocated to this size to allow it to overflow in
> > certain
> > > -  * cases to avoid needless emptying of cache.
> > > -  */
> > > - void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> > > + /**
> > > +  * Cache objects
> > > +  *
> > > +  * Cache is allocated to this size to allow it to overflow in
> > certain
> > > +  * cases to avoid needless emptying of cache.
> > > +  */
> > > + void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
> > > } __rte_cache_aligned;
> > 
> > I think aligning objs on cacheline should be a separate patch.
> 
> Good point. I'll let you do it. :-)
> 
> PS: Thank you for following up on this patch series, Andrew!

Many thanks for this rework.

Acked-by: Olivier Matz 


Re: [PATCH v6 4/4] mempool: flush cache completely on overflow

2022-10-14 Thread Olivier Matz
On Sun, Oct 09, 2022 at 04:44:08PM +0200, Morten Brørup wrote:
> > From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> > Sent: Sunday, 9 October 2022 15.38
> > To: Olivier Matz
> > Cc: dev@dpdk.org; Morten Brørup; Bruce Richardson
> > Subject: [PATCH v6 4/4] mempool: flush cache completely on overflow
> > 
> > The cache was still full after flushing. In the opposite direction,
> > i.e. when getting objects from the cache, the cache is refilled to full
> > level when it crosses the low watermark (which happens to be zero).
> > Similarly, the cache should be flushed to empty level when it crosses
> > the high watermark (which happens to be 1.5 x the size of the cache).
> > The existing flushing behaviour was suboptimal for real applications,
> > because crossing the low or high watermark typically happens when the
> > application is in a state where the number of put/get events are out of
> > balance, e.g. when absorbing a burst of packets into a QoS queue
> > (getting more mbufs from the mempool), or when a burst of packets is
> > trickling out from the QoS queue (putting the mbufs back into the
> > mempool).
> > Now, the mempool cache is completely flushed when crossing the flush
> > threshold, so only the newly put (hot) objects remain in the mempool
> > cache afterwards.
> > 
> > This bug degraded performance by causing too frequent flushing.
> > 
> > Consider this application scenario:
> > 
> > Either, an lcore thread in the application is in a state of balance,
> > where it uses the mempool cache within its flush/refill boundaries; in
> > this situation, the flush method is less important, and this fix is
> > irrelevant.
> > 
> > Or, an lcore thread in the application is out of balance (either
> > permanently or temporarily), and mostly gets or puts objects from/to
> > the
> > mempool. If it mostly puts objects, not flushing all of the objects
> > will
> > cause more frequent flushing. This is the scenario addressed by this
> > fix. E.g.:
> > 
> > Cache size=256, flushthresh=384 (1.5x size), initial len=256;
> > application burst len=32.
> > 
> > If there are "size" objects in the cache after flushing, the cache is
> > flushed at every 4th burst.
> > 
> > If the cache is flushed completely, the cache is only flushed at every
> > 16th burst.
> > 
> > As you can see, this bug caused the cache to be flushed 4x too
> > frequently in this example.
> > 
> > And when/if the application thread breaks its pattern of continuously
> > putting objects, and suddenly starts to get objects instead, it will
> > either get objects already in the cache, or the get() function will
> > refill the cache.
> > 
> > The concept of not flushing the cache completely was probably based on
> > an assumption that it is more likely for an application's lcore thread
> > to get() after flushing than to put() after flushing.
> > I strongly disagree with this assumption! If an application thread is
> > continuously putting so much that it overflows the cache, it is much
> > more likely to keep putting than it is to start getting. If in doubt,
> > consider how CPU branch predictors work: When the application has done
> > something many times consecutively, the branch predictor will expect
> > the
> > application to do the same again, rather than suddenly do something
> > else.
> > 
> > Signed-off-by: Morten Brørup 
> > Signed-off-by: Andrew Rybchenko 
> > ---
> 
> Reviewed-by: Morten Brørup 
> 

Acked-by: Olivier Matz 
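For readers skimming the series, the put side after this patch behaves
roughly like this (a simplified sketch, not the exact library code):

    /* on crossing the threshold, hand the whole cache back to the
     * backend, then keep only the newly put (hot) objects */
    if (cache->len + n > cache->flushthresh) {
            rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
            cache->len = 0;
    }
    memcpy(&cache->objs[cache->len], obj_table, sizeof(void *) * n);
    cache->len += n;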


Re: [PATCH v4] mempool: fix get objects from mempool with cache

2022-10-14 Thread Olivier Matz
On Sat, Oct 08, 2022 at 10:56:06PM +0200, Thomas Monjalon wrote:
> 07/10/2022 12:44, Andrew Rybchenko:
> > From: Morten Brørup 
> > 
> > A flush threshold for the mempool cache was introduced in DPDK version
> > 1.3, but rte_mempool_do_generic_get() was not completely updated back
> > then, and some inefficiencies were introduced.
> > 
> > Fix the following in rte_mempool_do_generic_get():
> > 
> > 1. The code that initially screens the cache request was not updated
> > with the change in DPDK version 1.3.
> > The initial screening compared the request length to the cache size,
> > which was correct before, but became irrelevant with the introduction of
> > the flush threshold. E.g. the cache can hold up to flushthresh objects,
> > which is more than its size, so some requests were not served from the
> > cache, even though they could be.
> > The initial screening has now been corrected to match the initial
> > screening in rte_mempool_do_generic_put(), which verifies that a cache
> > is present, and that the length of the request does not overflow the
> > memory allocated for the cache.
> > 
> > This bug caused a major performance degradation in scenarios where the
> > application burst length is the same as the cache size. In such cases,
> > the objects were not ever fetched from the mempool cache, regardless if
> > they could have been.
> > This scenario occurs e.g. if an application has configured a mempool
> > with a size matching the application's burst size.
> > 
> > 2. The function is a helper for rte_mempool_generic_get(), so it must
> > behave according to the description of that function.
> > Specifically, objects must first be returned from the cache,
> > subsequently from the backend.
> > After the change in DPDK version 1.3, this was not the behavior when
> > the request was partially satisfied from the cache; instead, the objects
> > from the backend were returned ahead of the objects from the cache.
> > This bug degraded application performance on CPUs with a small L1 cache,
> > which benefit from having the hot objects first in the returned array.
> > (This is probably also the reason why the function returns the objects
> > in reverse order, which it still does.)
> > Now, all code paths first return objects from the cache, subsequently
> > from the backend.
> > 
> > The function was not behaving as described (by the function using it)
> > and expected by applications using it. This in itself is also a bug.
> > 
> > 3. If the cache could not be backfilled, the function would attempt
> > to get all the requested objects from the backend (instead of only the
> > number of requested objects minus the objects available in the backend),
> > and the function would fail if that failed.
> > Now, the first part of the request is always satisfied from the cache,
> > and if the subsequent backfilling of the cache from the backend fails,
> > only the remaining requested objects are retrieved from the backend.
> > 
> > The function would fail despite there being enough objects in the cache
> > plus the common pool.
> > 
> > 4. The code flow for satisfying the request from the cache was slightly
> > inefficient:
> > The likely code path where the objects are simply served from the cache
> > was treated as unlikely. Now it is treated as likely.
> > 
> > Signed-off-by: Morten Brørup 
> > Signed-off-by: Andrew Rybchenko 
> > Reviewed-by: Morten Brørup 
> 
> Applied, thanks.

Better late than never: I reviewed this patch after it has been pushed,
and it looks good to me.

Thanks,
Olivier
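In short, the fixed flow serves the cache first and only then touches the
backend; a minimal sketch (the helper name is hypothetical, and the cache
refill step of the real code is omitted):

    static int
    generic_get_sketch(struct rte_mempool *mp, void **obj_table,
                       unsigned int n, struct rte_mempool_cache *cache)
    {
            unsigned int fromcache = RTE_MIN(n, cache->len);

            /* 1. hot objects from the cache first */
            memcpy(obj_table, &cache->objs[cache->len - fromcache],
                   sizeof(void *) * fromcache);
            cache->len -= fromcache;
            if (fromcache == n)
                    return 0;

            /* 2. only the remainder is requested from the backend, so a
             * failed cache backfill no longer fails the whole request */
            return rte_mempool_ops_dequeue_bulk(mp, obj_table + fromcache,
                                                n - fromcache);
    }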



Re: [PATCH] usertools/pmdinfo: remove dependency to ldd

2022-10-14 Thread Olivier Matz
Hi Robin,

On Thu, Oct 13, 2022 at 03:41:25PM +0200, Robin Jarry wrote:
> Some environments (buildroot) do not have the ldd utility installed by
> default. However, ldd is often only a wrapper shell script that actually
> checks that the arguments are valid ELF files and executes them with
> the LD_TRACE_LOADED_OBJECTS=1 variable set in the environment.
> 
> Since ld.so is the actual ELF interpreter which is loaded first when
> executing a program, executing any dynamic ELF program/library with that
> variable set will cause all dependent dynamic libraries to be printed
> and ld.so will exit before even running main.
> 
> Excerpt from ld.so(7) man page:
> 
>   LD_TRACE_LOADED_OBJECTS
> If set (to any value), causes the program to list its dynamic
> dependencies, as if run by ldd(1), instead of running normally.
> 
> Change dpdk-pmdinfo.py to actually "execute" the files provided on the
> command line with LD_TRACE_LOADED_OBJECTS=1 set. Ensure that the files
> are valid dynamically executable ELF programs to avoid obscure and
> confusing errors.
> 
> Reported-by: Olivier Matz 
> Signed-off-by: Robin Jarry 

Tested on buildroot without ldd.

Reviewed-by: Olivier Matz 
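The mechanism is easy to reproduce from C, too (an illustration assuming
glibc ld.so semantics, unrelated to the patch itself):

    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            if (argc < 2)
                    return 1;
            /* the traced program prints its dynamic dependencies and
             * exits before its own main() ever runs */
            setenv("LD_TRACE_LOADED_OBJECTS", "1", 1);
            execv(argv[1], &argv[1]);
            return 1; /* reached only if execv() failed */
    }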


Re: [PATCH v6 3/4] mempool: fix cache flushing algorithm

2022-10-14 Thread Olivier Matz
On Fri, Oct 14, 2022 at 05:57:39PM +0200, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Friday, 14 October 2022 16.01
> > 
> > Hi Morten, Andrew,
> > 
> > On Sun, Oct 09, 2022 at 05:08:39PM +0200, Morten Brørup wrote:
> > > > From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> > > > Sent: Sunday, 9 October 2022 16.52
> > > >
> > > > On 10/9/22 17:31, Morten Brørup wrote:
> > > > >> From: Andrew Rybchenko [mailto:andrew.rybche...@oktetlabs.ru]
> > > > >> Sent: Sunday, 9 October 2022 15.38
> > > > >>
> > > > >> From: Morten Brørup 
> > > > >>
> > >
> > > [...]
> > 
> > I finally took a couple of hours to carefully review the mempool-
> > related
> > series (including the ones that have already been pushed).
> > 
> > The new behavior looks better to me in all situations I can think
> > about.
> 
> Extreme care is required when touching a core library like the mempool.
> 
> Thank you, Olivier.
> 
> > 
> > >
> > > > >> --- a/lib/mempool/rte_mempool.h
> > > > >> +++ b/lib/mempool/rte_mempool.h
> > > > >> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
> > > > >>   * Cache is allocated to this size to allow it to overflow
> > in
> > > > >> certain
> > > > >>   * cases to avoid needless emptying of cache.
> > > > >>   */
> > > > >> -void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache
> > objects */
> > > > >> +void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache
> > objects */
> > > > >>   } __rte_cache_aligned;
> > > > >
> > > > > How much are we allowed to break the ABI here?
> > > > >
> > > > > This patch reduces the size of the structure by removing a now
> > unused
> > > > part at the end, which should be harmless.
> > 
> > It is an ABI breakage: an existing application will use the new 22.11
> > function to create the mempool (with a smaller cache), but its old
> > inlined get/put code, which can exceed MAX_SIZE x 2, will remain.
> > 
> > But this is a nice memory consumption improvement, in my opinion we
> > should accept it for 22.11 with an entry in the release note.
> > 
> > 
> > > > >
> > > > > If we may also move the position of the objs array, I would add
> > > > __rte_cache_aligned to the objs array. It makes no difference in
> > the
> > > > general case, but if get/put operations are always 32 objects, it
> > will
> > > > reduce the number of memory (or last level cache) accesses from
> > five to
> > > > four 64 B cache lines for every get/put operation.
> > 
> > Will it really be the case? Since cache->len has to be accessed too,
> > I don't think it would make a difference.
> 
> Yes, the first cache line, containing cache->len, will be accessed always. I 
> forgot to count that; so the improvement by aligning cache->objs will be five 
> cache line accesses instead of six.
> 
> Let me try to explain the scenario in other words:
> 
> In an application where a mempool cache is only accessed in bursts of 32 
> objects (256 B), it matters if those 256 B accesses in the mempool cache 
> start at a cache line aligned address or not. If cache line aligned, 
> accessing those 256 B in the mempool cache will only touch 4 cache lines; if 
> not, 5 cache lines will be touched. (For architectures with 128 B cache line, 
> it will be 2 instead of 3 touched cache lines per mempool cache get/put 
> operation in applications using only bursts of 32 objects.)
> 
> If we cache line align cache->objs, those bursts of 32 objects (256 B) will 
> be cache line aligned: Any address at cache->objs[N * 32 objects] is cache 
> line aligned if cache->objs[0] is cache line aligned.
> 
> Currently, the cache->objs directly follows cache->len, which makes 
> cache->objs[0] cache line unaligned.
> 
> If we decide to break the mempool cache ABI, we might as well include my 
> suggested cache line alignment performance improvement. It doesn't degrade 
> performance for mempool caches not only accessed in bursts of 32 objects.

I don't follow you. Currently, with 16 objects (128B), we access 3
cache lines:

      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘

[PATCH 1/2] event/sw: fix missing flow ID init in selftest

2022-10-14 Thread Olivier Matz
The issue is seen in the unit tests:

> root@dpdk-VF-dut247:~/dpdk# MALLOC_PERTURB_=204 \
>   DPDK_TEST=eventdev_selftest_sw \
>   /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test -c 0xff
> (...)
> *** Running XStats ID Reset test...
> 1761: qid_0_port_2_pinned_flows value incorrect, expected 1 got 7
> 1778: qid_0_port_2_pinned_flows value incorrect, expected 1 got 7
> ERROR - XStats ID Reset test FAILED.
> SW Eventdev Selftest Failed.
> Test Failed

The flow id is not set in the event, which results in an undefined
flow, whose value depends on what was previously on the stack. Having
different flows for the packets makes the test fail, since only one
flow is expected.

This only happens at -O3, where the same stack area is shared by the
event object and the address of the mbuf allocated in rte_gen_arp().

Fix this by properly initializing the flow id.

Bugzilla ID: 1101
Fixes: e21df4b062b5 ("test/eventdev: add SW xstats tests")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/event/sw/sw_evdev_selftest.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/event/sw/sw_evdev_selftest.c b/drivers/event/sw/sw_evdev_selftest.c
index ed7ae6a685..4f18d66f36 100644
--- a/drivers/event/sw/sw_evdev_selftest.c
+++ b/drivers/event/sw/sw_evdev_selftest.c
@@ -1489,6 +1489,7 @@ xstats_id_reset_tests(struct test *t)
goto fail;
}
ev.queue_id = t->qid[i];
+   ev.flow_id = 0;
ev.op = RTE_EVENT_OP_NEW;
ev.mbuf = arp;
*rte_event_pmd_selftest_seqn(arp) = i;
-- 
2.30.2



[PATCH 2/2] event/sw: fix invalid log in selftest

2022-10-14 Thread Olivier Matz
The log should display the value, not the id.

Fixes: e21df4b062b5 ("test/eventdev: add SW xstats tests")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/event/sw/sw_evdev_selftest.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/event/sw/sw_evdev_selftest.c b/drivers/event/sw/sw_evdev_selftest.c
index 4f18d66f36..834681dbec 100644
--- a/drivers/event/sw/sw_evdev_selftest.c
+++ b/drivers/event/sw/sw_evdev_selftest.c
@@ -1644,8 +1644,8 @@ xstats_id_reset_tests(struct test *t)
}
if (val != port_expected[i]) {
printf("%d: %s value incorrect, expected %"PRIu64
-   " got %d\n", __LINE__, port_names[i],
-   port_expected[i], id);
+   " got %"PRIu64"\n", __LINE__, port_names[i],
+   port_expected[i], val);
failed = 1;
}
/* reset to zero */
-- 
2.30.2



Re: [PATCH] net/virtio: add queue and port ID in some logs

2022-10-17 Thread Olivier Matz
Hi Chenbo,

On Mon, Oct 17, 2022 at 06:58:59AM +, Xia, Chenbo wrote:
> Hi Olivier,
> 
> > -Original Message-
> > From: Olivier Matz 
> > Sent: Thursday, September 29, 2022 8:22 PM
> > To: dev@dpdk.org
> > Cc: Maxime Coquelin ; Xia, Chenbo
> > 
> > Subject: [PATCH] net/virtio: add queue and port ID in some logs
> > 
> > Add the queue id and/or the port id in some logs, so it is easier to
> > understand what happens.
> > 
> > Signed-off-by: Olivier Matz 
> > ---
> >  drivers/net/virtio/virtio_ethdev.c | 6 --
> >  drivers/net/virtio/virtio_rxtx.c   | 3 ++-
> >  2 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> > index 7e07270a8b..44811c299b 100644
> > --- a/drivers/net/virtio/virtio_ethdev.c
> > +++ b/drivers/net/virtio/virtio_ethdev.c
> > @@ -2807,7 +2807,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
> > return -EINVAL;
> > }
> > 
> > -   PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
> > +   PMD_INIT_LOG(DEBUG, "nb_queues=%d (port=%d)", nb_queues,
> > +dev->data->port_id);
> 
> Better to use %u for all port_id since it's uint16_t

Yes, will do.
I'll update "nb_queues=%d" too by the way.

> 
> Thanks,
> Chenbo
> 
> > 
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
> > @@ -2821,7 +2822,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
> > virtqueue_notify(vq);
> > }
> > 
> > -   PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
> > +   PMD_INIT_LOG(DEBUG, "Notified backend at initialization (port=%d)",
> > +dev->data->port_id);
> > 
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
> > diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> > index 4795893ec7..f8a9ee5cdb 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -793,7 +793,8 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, uint16_t queue_idx)
> > vq_update_avail_idx(vq);
> > }
> > 
> > -   PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
> > +   PMD_INIT_LOG(DEBUG, "Allocated %d bufs (port=%d queue=%d)", nbufs,
> > +dev->data->port_id, queue_idx);
> > 
> > VIRTQUEUE_DUMP(vq);
> > 
> > --
> > 2.30.2
> 


[PATCH v2] net/virtio: add queue and port ID in some logs

2022-10-17 Thread Olivier Matz
Add the queue id and/or the port id in some logs, so it is easier to
understand what happens.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---

v2
* use %u instead of %d for unsigned types

 drivers/net/virtio/virtio_ethdev.c | 6 --
 drivers/net/virtio/virtio_rxtx.c   | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 574f671158..760ba4e368 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -2814,7 +2814,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
return -EINVAL;
}
 
-   PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
+   PMD_INIT_LOG(DEBUG, "nb_queues=%u (port=%u)", nb_queues,
+dev->data->port_id);
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
@@ -2828,7 +2829,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
virtqueue_notify(vq);
}
 
-   PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
+   PMD_INIT_LOG(DEBUG, "Notified backend at initialization (port=%u)",
+dev->data->port_id);
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 4795893ec7..d9d40832e0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -793,7 +793,8 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, uint16_t queue_idx)
vq_update_avail_idx(vq);
}
 
-   PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
+   PMD_INIT_LOG(DEBUG, "Allocated %d bufs (port=%u queue=%u)", nbufs,
+dev->data->port_id, queue_idx);
 
VIRTQUEUE_DUMP(vq);
 
-- 
2.30.2



Re: [PATCH] usertools/pmdinfo: remove dependency to ldd

2022-10-18 Thread Olivier Matz
On Fri, Oct 14, 2022 at 04:02:02PM +0200, Olivier Matz wrote:
> Hi Robin,
> 
> On Thu, Oct 13, 2022 at 03:41:25PM +0200, Robin Jarry wrote:
> > Some environments (buildroot) do not have the ldd utility installed by
> > default. However, ldd is often only a wrapper shell script that actually
> > checks that the arguments are valid ELF files and executes them with
> > the LD_TRACE_LOADED_OBJECTS=1 variable set in the environment.
> > 
> > Since ld.so is the actual ELF interpreter which is loaded first when
> > executing a program, executing any dynamic ELF program/library with that
> > variable set will cause all dependent dynamic libraries to be printed
> > and ld.so will exit before even running main.
> > 
> > Excerpt from ld.so(7) man page:
> > 
> >   LD_TRACE_LOADED_OBJECTS
> > If set (to any value), causes the program to list its dynamic
> > dependencies, as if run by ldd(1), instead of running normally.
> > 
> > Change dpdk-pmdinfo.py to actually "execute" the files provided on the
> > command line with LD_TRACE_LOADED_OBJECTS=1 set. Ensure that the files
> > are valid dynamically executable ELF programs to avoid obscure and
> > confusing errors.
> > 
> > Reported-by: Olivier Matz 
> > Signed-off-by: Robin Jarry 
> 
> Tested on buildroot without ldd.
> 
> Reviewed-by: Olivier Matz 

After some more tests, it appears that it only works for an executable
binary, but it does not work for .so PMDs:

# LD_TRACE_LOADED_OBJECTS=1 /usr/lib/x86_64-linux-gnu/dpdk/pmds-22.0/librte_net_ixgbe.so
Segmentation fault (core dumped)

So it is maybe better to keep the original code using ldd.

Thank you Robin anyway.


[PATCH] app/test: fix PMD perf test on devices with no socket ID

2022-10-22 Thread Olivier Matz
If the socket ID of a device is unknown, rte_eth_dev_socket_id(portid)
now returns -1 instead of 0 since commit 7dcd73e37965 ("drivers/bus: set
device NUMA node to unknown by default").

This change breaks the pmd_perf test in environments where the device
socket ID is unknown. The test fails with the following error, because
it does not find a lcore on socket -1:

> No avail lcore to run test

Take the new behavior into account in the pmd_perf test: in this
environment, the test can now run on any lcore, and not only those from
socket 0 (this was the old behavior).

Bugzilla ID: 1105
Fixes: 7dcd73e37965 ("drivers/bus: set device NUMA node to unknown by default")

Signed-off-by: Olivier Matz 
---
 app/test/test_pmd_perf.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index fe765c4173..300c3a8a6f 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -265,13 +265,14 @@ init_mbufpool(unsigned nb_mbuf)
 }
 
 static uint16_t
-alloc_lcore(uint16_t socketid)
+alloc_lcore(int socketid)
 {
unsigned lcore_id;
 
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
if (LCORE_AVAIL != lcore_conf[lcore_id].status ||
-   lcore_conf[lcore_id].socketid != socketid ||
+   (socketid != -SOCKET_ID_ANY &&
+lcore_conf[lcore_id].socketid != socketid) ||
lcore_id == rte_get_main_lcore())
continue;
lcore_conf[lcore_id].status = LCORE_USED;
@@ -711,17 +712,18 @@ test_pmd_perf(void)
num = 0;
RTE_ETH_FOREACH_DEV(portid) {
if (socketid == -1) {
-   socketid = rte_eth_dev_socket_id(portid);
-   worker_id = alloc_lcore(socketid);
+   worker_id = alloc_lcore(rte_eth_dev_socket_id(portid));
if (worker_id == (uint16_t)-1) {
printf("No avail lcore to run test\n");
return -1;
}
+   socketid = rte_lcore_to_socket_id(worker_id);
printf("Performance test runs on lcore %u socket %u\n",
   worker_id, socketid);
}
 
-   if (socketid != rte_eth_dev_socket_id(portid)) {
+   if (socketid != rte_eth_dev_socket_id(portid) &&
+   rte_eth_dev_socket_id(portid) != SOCKET_ID_ANY) {
printf("Skip port %d\n", portid);
continue;
}
-- 
2.30.2
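The contract being adapted to can be sketched as follows (SOCKET_ID_ANY
is -1; this is an illustration, not part of the patch):

    int sid = rte_eth_dev_socket_id(portid);

    if (sid == SOCKET_ID_ANY) {
            /* since commit 7dcd73e37965 this means "NUMA node unknown",
             * so any lcore is acceptable for this port */
    } else {
            /* pick an lcore on socket sid, as before */
    }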



[PATCH v2] app/test: fix PMD perf test on devices with no socket ID

2022-10-24 Thread Olivier Matz
If the socket ID of a device is unknown, rte_eth_dev_socket_id(portid)
now returns -1 instead of 0 since commit 7dcd73e37965 ("drivers/bus: set
device NUMA node to unknown by default").

This change breaks the pmd_perf test in environments where the device
socket ID is unknown. The test fails with the following error, because
it does not find a lcore on socket -1:

> No avail lcore to run test

Take the new behavior into account in the pmd_perf test: in this
environment, the test can now run on any lcore, and not only those from
socket 0 (this was the old behavior).

Bugzilla ID: 1105
Fixes: 7dcd73e37965 ("drivers/bus: set device NUMA node to unknown by default")

Signed-off-by: Olivier Matz 
---

v2:
* fix typo (SOCKET_ID_ANY instead of -SOCKET_ID_ANY)

 app/test/test_pmd_perf.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index fe765c4173..ff84d251ff 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -265,13 +265,14 @@ init_mbufpool(unsigned nb_mbuf)
 }
 
 static uint16_t
-alloc_lcore(uint16_t socketid)
+alloc_lcore(int socketid)
 {
unsigned lcore_id;
 
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
if (LCORE_AVAIL != lcore_conf[lcore_id].status ||
-   lcore_conf[lcore_id].socketid != socketid ||
+   (socketid != SOCKET_ID_ANY &&
+lcore_conf[lcore_id].socketid != socketid) ||
lcore_id == rte_get_main_lcore())
continue;
lcore_conf[lcore_id].status = LCORE_USED;
@@ -711,17 +712,18 @@ test_pmd_perf(void)
num = 0;
RTE_ETH_FOREACH_DEV(portid) {
if (socketid == -1) {
-   socketid = rte_eth_dev_socket_id(portid);
-   worker_id = alloc_lcore(socketid);
+   worker_id = alloc_lcore(rte_eth_dev_socket_id(portid));
if (worker_id == (uint16_t)-1) {
printf("No avail lcore to run test\n");
return -1;
}
+   socketid = rte_lcore_to_socket_id(worker_id);
printf("Performance test runs on lcore %u socket %u\n",
   worker_id, socketid);
}
 
-   if (socketid != rte_eth_dev_socket_id(portid)) {
+   if (socketid != rte_eth_dev_socket_id(portid) &&
+   rte_eth_dev_socket_id(portid) != SOCKET_ID_ANY) {
printf("Skip port %d\n", portid);
continue;
}
-- 
2.30.2



Re: [PATCH] mempool: cache align mempool cache objects

2022-10-27 Thread Olivier Matz
Hi Morten,

On Wed, Oct 26, 2022 at 04:44:36PM +0200, Morten Brørup wrote:
> Add __rte_cache_aligned to the objs array.
> 
> It makes no difference in the general case, but if get/put operations are
> always 32 objects, it will reduce the number of memory (or last level
> cache) accesses from five to four 64 B cache lines for every get/put
> operation.
> 
> For readability reasons, an example using 16 objects follows:
> 
> Currently, with 16 objects (128B), we access 3 cache lines:
> 
>       ┌────────┐
>       │len     │
> cache │********│---
> line0 │********│ ^
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line1 │********│ |
>       │********│ |
>       ├────────┤ |
>       │********│_v_
> cache │        │
> line2 │        │
>       │        │
>       └────────┘
> 
> With the alignment, it is also 3 cache lines:
> 
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤---
>       │********│ ^
> cache │********│ |
> line1 │********│ |
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line2 │********│ |
>       │********│ v
>       └────────┘---
> 
> However, accessing the objects at the bottom of the mempool cache is a
> special case, where cache line0 is also used for objects.
> 
> Consider the next burst (and any following bursts):
> 
> Current:
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line1 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │********│---
> line2 │********│ ^
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line3 │********│ |
>       │********│ |
>       ├────────┤ |
>       │********│_v_
> cache │        │
> line4 │        │
>       │        │
>       └────────┘
> 4 cache lines touched, incl. line0 for len.
> 
> With the proposed alignment:
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line1 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line2 │        │
>       │        │
>       ├────────┤
>       │********│---
> cache │********│ ^
> line3 │********│ |
>       │********│ | 16 objects
>       ├────────┤ | 128B
>       │********│ |
> cache │********│ |
> line4 │********│ |
>       │********│_v_
>       └────────┘
> Only 3 cache lines touched, incl. line0 for len.

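(For reference, the arithmetic behind the five-versus-four claim above,
assuming 64 B cache lines and 8 B pointers:)

    /* 32 objects * 8 B = 256 B. Aligned, the span covers 256 / 64 = 4
     * cache lines; unaligned, it straddles 5. Counting the extra line
     * that holds cache->len gives the corrected totals of 5 vs 6
     * touched lines per get/put operation. */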
I understand your logic, but are we sure that having an application that
works with bulks of 32 means that the cache will stay aligned to 32
elements for the whole life of the application?

In an application, the alignment of the cache can change if you have
any of:
- software queues (reassembly for instance)
- packet duplication (bridge, multicast)
- locally generated packets (keepalive, control protocol)
- pipeline to other cores

Even with testpmd, which works by bulks of 32, I can see that the size
of the cache filling is not aligned to 32. Right after starting the
application, we already have this:

  internal cache infos:
cache_size=250
cache_count[0]=231

This is probably related to the hw rx rings size, number of queues,
number of ports.

The "250" default value for cache size in testpmd is questionable, but
with --mbcache=256, the behavior is similar.

Also, when we transmit to a NIC, the mbufs are not returned immediately
to the pool, they may stay in the hw tx ring during some time, which is
a driver decision.

After processing traffic on cores 8 and 24 with this testpmd, I get:
cache_count[0]=231
cache_count[8]=123
cache_count[24]=122

In my opinion, it is not realistic to think that the mempool cache will
remain aligned to cachelines. In these conditions, it looks better to
keep the structure packed to avoid wasting memory.

Olivier
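For reference, the proposal under discussion boils down to this layout (a
sketch reconstructed from the description and the truncated diff below):

    struct rte_mempool_cache {
            uint32_t size;        /**< Size of the cache */
            uint32_t flushthresh; /**< Threshold before we flush excess elements */
            uint32_t len;         /**< Current cache count */
            /* proposed: start objs[] on its own cache line, trading
             * padding after len for aligned burst accesses */
            void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
    } __rte_cache_aligned;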


> 
> Credits go to Olivier Matz for the nice ASCII graphics.
> 
> Signed-off-by: Morten Brørup 
> ---
>  lib/mempool/rte_mempool.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1f5707f46a..3725a72951 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,11 +86,13 @@ struct rte_mempool_cache {
>   uint32_t size;/**< Size of the cache */
>   uint32_t flushthresh; /**< Threshold before we flush excess elements */
>   uint32_t len; /**< Current cache count */
> [...]

Re: [PATCH] mempool: cache align mempool cache objects

2022-10-27 Thread Olivier Matz
On Thu, Oct 27, 2022 at 11:22:07AM +0200, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Thursday, 27 October 2022 10.35
> > 
> > Hi Morten,
> > 
> > On Wed, Oct 26, 2022 at 04:44:36PM +0200, Morten Brørup wrote:
> > > Add __rte_cache_aligned to the objs array.
> > >
> > > It makes no difference in the general case, but if get/put operations
> > are
> > > always 32 objects, it will reduce the number of memory (or last level
> > > cache) accesses from five to four 64 B cache lines for every get/put
> > > operation.
> > >
> > > For readability reasons, an example using 16 objects follows:
> > >
> > > Currently, with 16 objects (128B), we access 3 cache lines:
> > >
> > > [...]
> > >
> > > Only 3 cache lines touched, incl. line0 for len.
> > 
> > I understand your logic, but are we sure that having an application
> > that
> > works with bulks of 32 means that the cache will stay aligned to 32
> > elements for the whole life of the application?
> > 
> > In an application, the alignment of the cache can change if you have
> > any of:
> > - software queues (reassembly for instance)
> > - packet duplication (bridge, multicast)
> > - locally generated packets (keepalive, control protocol)
> > - pipeline to other cores
> > 
> > Even with testpmd, which works by bulks of 32, I can see that the size
> > of the cache filling is not aligned to 32. Right after starting the
> > application, we already have this:
> > [...]

Re: [PATCH] mempool: cache align mempool cache objects

2022-10-27 Thread Olivier Matz
On Thu, Oct 27, 2022 at 02:11:29PM +0200, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Thursday, 27 October 2022 13.43
> > 
> > On Thu, Oct 27, 2022 at 11:22:07AM +0200, Morten Brørup wrote:
> > > > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > > > Sent: Thursday, 27 October 2022 10.35
> > > >
> > > > Hi Morten,
> > > >
> > > > On Wed, Oct 26, 2022 at 04:44:36PM +0200, Morten Brørup wrote:
> > > > > Add __rte_cache_aligned to the objs array.
> > > > >
> > > > > It makes no difference in the general case, but if get/put
> > operations
> > > > are
> > > > > always 32 objects, it will reduce the number of memory (or last
> > level
> > > > > cache) accesses from five to four 64 B cache lines for every
> > get/put
> > > > > operation.
> > > > >
> > > > > For readability reasons, an example using 16 objects follows:
> > > > >
> > > > > Currently, with 16 objects (128B), we access 3 cache lines:
> > > > >
> > > > > [...]

Re: [dpdk-dev] [dpdk-stable] [v2] test/mempool: fix heap buffer overflow

2021-04-14 Thread Olivier Matz
On Tue, Apr 13, 2021 at 01:52:26PM +0200, Thomas Monjalon wrote:
> 13/04/2021 22:05, Wenwu Ma:
> > Amount of allocated memory was not enough for mempool
> > which cause buffer overflow when access fields of mempool
> > private structure in the rte_pktmbuf_priv_size function.
>
> Was it causing the test to fail?
> How do you reproduce the overflow?

In the test, right after the rte_mempool_create(), the function
rte_mempool_obj_iter() is called to initialize the mempool objects with
the rte_pktmbuf_init() callback function. This callback expects that the
mempool is a packet pool, i.e. its private area is a struct
rte_pktmbuf_pool_private structure.

In the current test, the size of the private area is 0, which probably
causes the function rte_pktmbuf_priv_size() to return an unpredictable
value, and this value is used as a size in a memset.

This part of the test was added in commit 923ceaeac140 ("test/mempool:
add unit test cases").

Instead of changing the size of the private area as done in the patch,
I suggest using a callback other than rte_pktmbuf_init(). After all,
this is a mempool test, so we should not rely on mbuf features. The
function my_obj_init() could be used as in other places of the test,
like this:

  @@ -552,7 +552,7 @@ test_mempool(void)
 GOTO_ERR(ret, err);
  
 /* test to initialize mempool objects and memory */
  -nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, rte_pktmbuf_init,
  +nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, my_obj_init,
 NULL);
 if (nb_objs == 0)
 GOTO_ERR(ret, err);


Wenwu, does it solve your issue?


Regards,
Olivier
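For reference, my_obj_init() is the trivial initializer already used
elsewhere in test_mempool.c; its shape is roughly this (quoted from memory,
details may differ):

    static void
    my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
                void *obj, unsigned int i)
    {
            uint32_t *objnum = obj;

            memset(obj, 0, mp->elt_size);
            *objnum = i;
    }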


Re: [dpdk-dev] [dpdk-stable] [v2] test/mempool: fix heap buffer overflow

2021-04-16 Thread Olivier Matz
On Thu, Apr 15, 2021 at 08:51:27AM -0400, Aaron Conole wrote:
> Olivier Matz  writes:
> 
> > On Tue, Apr 13, 2021 at 01:52:26PM +0200, Thomas Monjalon wrote:
> >> 13/04/2021 22:05, Wenwu Ma:
> >> > Amount of allocated memory was not enough for mempool
> >> > which cause buffer overflow when access fields of mempool
> >> > private structure in the rte_pktmbuf_priv_size function.
> >>
> >> Was it causing the test to fail?
> >> How do you reproduce the overflow?
> >
> > In the test, right after the rte_mempool_create(), the function
> > rte_mempool_obj_iter() is called too initialize the mempool objects with
> > the rte_pktmbuf_init() callback function. This callback expects that the
> > mempool is a packet pool, i.e. its private area is a struct
> > rte_pktmbuf_pool_private structure.
> >
> > In the current test, the size of the private area is 0, which probably
> > causes the function rte_pktmbuf_priv_size() to return an unpredictable
> > value, and this value is used as a size in a memset.
> 
> Is it possible to have rte_mempool_get_priv() detect that the private
> area isn't valid and return a ref to a const static member for this that
> will have the correct mbuf_priv_size?  There isn't really documentation
> that I can find that describes this corner case with the mempool private
> data section.  Actually, it doesn't really say what happens if private
> data size is 0, so maybe a documentation update should go with this test
> case fix, too?

Good point, we can indeed add something in the API documentation. To
detect that the private area is not big enough in rte_pktmbuf_init(),
unfortunately the function has no return code, but for now we can add at
least an RTE_ASSERT() (only active when -DRTE_ENABLE_ASSERT is passed),
as it's already done for other checks.

I can do a new version of the patch. Wenwu, is it ok for you?

In a second step, we can think about changing the API of all mempool
callbacks and their wrappers to add a return code.
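Concretely, the suggestion amounts to something like this at the top of
rte_pktmbuf_init() (a sketch, not a committed change):

    /* compiled in only when -DRTE_ENABLE_ASSERT is passed */
    RTE_ASSERT(mp->private_data_size >=
               sizeof(struct rte_pktmbuf_pool_private));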


> > This part of the test was added in commit 923ceaeac140 ("test/mempool:
> > add unit test cases").
> >
> > Instead of changing the size of the private area like done in the patch,
> > I suggest to use another callback than rte_pktmbuf_init(). After all,
> > this is a mempool test, so we should not rely on mbuf features. The
> > function my_obj_init() could be used like in other places of the test,
> > like this:
> >
> >   @@ -552,7 +552,7 @@ test_mempool(void)
> >  GOTO_ERR(ret, err);
> >   
> >  /* test to initialize mempool objects and memory */
> >   -nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, rte_pktmbuf_init,
> >   +nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, my_obj_init,
> >  NULL);
> >  if (nb_objs == 0)
> >  GOTO_ERR(ret, err);
> >
> >
> > Wenwu, does it solve your issue?
> >
> >
> > Regards,
> > Olivier
> 


Re: [dpdk-dev] [PATCH v2] mbuf: support eCPRI hardware packet type

2021-04-19 Thread Olivier Matz
Hi Lingyu,

On Sat, Apr 17, 2021 at 09:25:31AM +, Lingyu Liu wrote:
> Add L2_ETHER_ECPRI and L4_UDP_TUNNEL_ECPRI in RTE_PTYPE.
> 
> Signed-off-by: Lingyu Liu 
> Acked-by: Hemant Agrawal 

The number of available packet types for tunnels is quite low (already
mentioned in this thread [1]).

[1] https://patches.dpdk.org/project/dpdk/patch/20210408081720.23314-3-ktejas...@marvell.com

Can you give some details about how it will be used? For instance, which
driver will set it, which kind of application will use it.

Thanks,
Olivier
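For context, an application would typically consume the new values through
the existing name helpers, e.g. (a sketch using the constants this patch
adds):

    uint32_t ptype = mb->packet_type;

    if ((ptype & RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_ETHER_ECPRI)
            printf("l2: %s\n", rte_get_ptype_l2_name(ptype));
    if ((ptype & RTE_PTYPE_TUNNEL_MASK) == RTE_PTYPE_TUNNEL_ECPRI)
            printf("tunnel: %s\n", rte_get_ptype_tunnel_name(ptype));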

> ---
> V2 change:
>  - refine commit log
> 
>  app/test-pmd/util.c  | 25 -
>  lib/librte_mbuf/rte_mbuf_ptype.c |  2 ++
>  lib/librte_mbuf/rte_mbuf_ptype.h | 22 ++
>  3 files changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
> index a9e431a8b2..494ebbf909 100644
> --- a/app/test-pmd/util.c
> +++ b/app/test-pmd/util.c
> @@ -258,16 +258,23 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
>   udp_hdr = rte_pktmbuf_mtod_offset(mb,
>   struct rte_udp_hdr *,
>   l2_len + l3_len);
> - l4_len = sizeof(struct rte_udp_hdr);
> - vxlan_hdr = rte_pktmbuf_mtod_offset(mb,
> - struct rte_vxlan_hdr *,
> - l2_len + l3_len + l4_len);
>   udp_port = RTE_BE_TO_CPU_16(udp_hdr->dst_port);
> - vx_vni = rte_be_to_cpu_32(vxlan_hdr->vx_vni);
> - MKDUMPSTR(print_buf, buf_size, cur_len,
> -   " - VXLAN packet: packet type =%d, "
> -   "Destination UDP port =%d, VNI = %d",
> -   packet_type, udp_port, vx_vni >> 8);
> + l4_len = sizeof(struct rte_udp_hdr);
> + if (RTE_ETH_IS_ECPRI_HDR(packet_type)) {
> + MKDUMPSTR(print_buf, buf_size, cur_len,
> +   " - eCPRI packet: packet type =%d, "
> +   "Destination UDP port =%d",
> +   packet_type, udp_port);
> + } else {
> + vxlan_hdr = rte_pktmbuf_mtod_offset(mb,
> + struct rte_vxlan_hdr *,
> + l2_len + l3_len + l4_len);
> + vx_vni = rte_be_to_cpu_32(vxlan_hdr->vx_vni);
> + MKDUMPSTR(print_buf, buf_size, cur_len,
> +   " - VXLAN packet: packet type =%d, "
> +   "Destination UDP port =%d, VNI = %d",
> +   packet_type, udp_port, vx_vni >> 8);
> + }
> + }
>   }
>   }
>   MKDUMPSTR(print_buf, buf_size, cur_len,
> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.c b/lib/librte_mbuf/rte_mbuf_ptype.c
> index d6f906b06c..2bf97c89c6 100644
> --- a/lib/librte_mbuf/rte_mbuf_ptype.c
> +++ b/lib/librte_mbuf/rte_mbuf_ptype.c
> @@ -21,6 +21,7 @@ const char *rte_get_ptype_l2_name(uint32_t ptype)
>   case RTE_PTYPE_L2_ETHER_PPPOE: return "L2_ETHER_PPPOE";
>   case RTE_PTYPE_L2_ETHER_FCOE: return "L2_ETHER_FCOE";
>   case RTE_PTYPE_L2_ETHER_MPLS: return "L2_ETHER_MPLS";
> + case RTE_PTYPE_L2_ETHER_ECPRI: return "L2_ETHER_ECPRI";
>   default: return "L2_UNKNOWN";
>   }
>  }
> @@ -71,6 +72,7 @@ const char *rte_get_ptype_tunnel_name(uint32_t ptype)
>   case RTE_PTYPE_TUNNEL_VXLAN_GPE: return "TUNNEL_VXLAN_GPE";
>   case RTE_PTYPE_TUNNEL_MPLS_IN_UDP: return "TUNNEL_MPLS_IN_UDP";
>   case RTE_PTYPE_TUNNEL_MPLS_IN_GRE: return "TUNNEL_MPLS_IN_GRE";
> + case RTE_PTYPE_TUNNEL_ECPRI: return "TUNNEL_ECPRI";
>   default: return "TUNNEL_UNKNOWN";
>   }
>  }
> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
> index 17a2dd3576..5fdf369ac0 100644
> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> @@ -144,6 +144,13 @@ extern "C" {
>   * <'ether type'=[0x8847|0x8848]>
>   */
>  #define RTE_PTYPE_L2_ETHER_MPLS 0x000a
> +/**
> + * eCPRI (extend Common Public Radio Interface) packet type.
> + *
> + * Packet format:
> + * <'ether type'=[0xAEFE]>
> + */
> +#define RTE_PTYPE_L2_ETHER_ECPRI0x000b
>  /**
>   * Mask of layer 2 packet types.
>   * It is used for outer packet for tunneling cases.
> @@ -491,6 +498,19 @@ extern "C" {
>   * | 'destination port'=6635>
>   */
>  #define RTE_PTYPE_TUNNEL_MPLS_IN_UDP  0x

Re: [dpdk-dev] [PATCH v3 1/2] lib/mempool: make stats macro generic

2021-04-21 Thread Olivier Matz
On Mon, Apr 19, 2021 at 07:07:59PM -0500, Dharmik Thakkar wrote:
> Make __MEMPOOL_STAT_ADD macro more generic and delete
> __MEMPOOL_CONTIG_BLOCKS_STAT_ADD macro
> 
> Suggested-by: Olivier Matz 
> Signed-off-by: Dharmik Thakkar 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Honnappa Nagarahalli 

Acked-by: Olivier Matz 
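For readers without the patch at hand, the generalized macro works by
token-pasting the counter name, roughly like this (a sketch close to the
patched code):

    #define __MEMPOOL_STAT_ADD(mp, name, n) do {                    \
                    unsigned int __lcore_id = rte_lcore_id();       \
                    if (__lcore_id < RTE_MAX_LCORE) {               \
                            mp->stats[__lcore_id].name##_bulk += 1; \
                            mp->stats[__lcore_id].name##_objs += n; \
                    }                                               \
            } while (0)

    /* e.g. __MEMPOOL_STAT_ADD(mp, get_success, n) updates both
     * get_success_bulk and get_success_objs */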


Re: [dpdk-dev] [PATCH v3 2/2] lib/mempool: distinguish debug counters from cache and pool

2021-04-21 Thread Olivier Matz
Hi Dharmik,

Please see some comments below.

On Mon, Apr 19, 2021 at 07:08:00PM -0500, Dharmik Thakkar wrote:
> From: Joyce Kong 
> 
> If cache is enabled, objects will be retrieved/put from/to cache,
> subsequently from/to the common pool. Now the debug stats calculate
> the objects retrieved/put from/to cache and pool together, it is
> better to distinguish them.
> 
> Signed-off-by: Joyce Kong 
> Signed-off-by: Dharmik Thakkar 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Honnappa Nagarahalli 
> ---
>  lib/librte_mempool/rte_mempool.c | 24 
>  lib/librte_mempool/rte_mempool.h | 47 ++--
>  2 files changed, 57 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c 
> b/lib/librte_mempool/rte_mempool.c
> index afb1239c8d48..339f14455624 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -1244,6 +1244,18 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
>   sum.put_bulk += mp->stats[lcore_id].put_bulk;
>   sum.put_objs += mp->stats[lcore_id].put_objs;
> + sum.put_common_pool_bulk +=
> + mp->stats[lcore_id].put_common_pool_bulk;
> + sum.put_common_pool_objs +=
> + mp->stats[lcore_id].put_common_pool_objs;
> + sum.put_cache_bulk += mp->stats[lcore_id].put_cache_bulk;
> + sum.put_cache_objs += mp->stats[lcore_id].put_cache_objs;
> + sum.get_common_pool_bulk +=
> + mp->stats[lcore_id].get_common_pool_bulk;
> + sum.get_common_pool_objs +=
> + mp->stats[lcore_id].get_common_pool_objs;
> + sum.get_cache_bulk += mp->stats[lcore_id].get_cache_bulk;
> + sum.get_cache_objs += mp->stats[lcore_id].get_cache_objs;
>   sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
>   sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
>   sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
> @@ -1254,6 +1266,18 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   fprintf(f, "  stats:\n");
>   fprintf(f, "put_bulk=%"PRIu64"\n", sum.put_bulk);
>   fprintf(f, "put_objs=%"PRIu64"\n", sum.put_objs);
> + fprintf(f, "put_common_pool_bulk=%"PRIu64"\n",
> + sum.put_common_pool_bulk);
> + fprintf(f, "put_common_pool_objs=%"PRIu64"\n",
> + sum.put_common_pool_objs);
> + fprintf(f, "put_cache_bulk=%"PRIu64"\n", sum.put_cache_bulk);
> + fprintf(f, "put_cache_objs=%"PRIu64"\n", sum.put_cache_objs);
> + fprintf(f, "get_common_pool_bulk=%"PRIu64"\n",
> + sum.get_common_pool_bulk);
> + fprintf(f, "get_common_pool_objs=%"PRIu64"\n",
> + sum.get_common_pool_objs);
> + fprintf(f, "get_cache_bulk=%"PRIu64"\n", sum.get_cache_bulk);
> + fprintf(f, "get_cache_objs=%"PRIu64"\n", sum.get_cache_objs);
>   fprintf(f, "get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
>   fprintf(f, "get_success_objs=%"PRIu64"\n", sum.get_success_objs);
>   fprintf(f, "get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
> diff --git a/lib/librte_mempool/rte_mempool.h 
> b/lib/librte_mempool/rte_mempool.h
> index 848a19226149..0959f8a3f367 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -66,12 +66,20 @@ extern "C" {
>   * A structure that stores the mempool statistics (per-lcore).
>   */
>  struct rte_mempool_debug_stats {
> - uint64_t put_bulk; /**< Number of puts. */
> - uint64_t put_objs; /**< Number of objects successfully put. */
> - uint64_t get_success_bulk; /**< Successful allocation number. */
> - uint64_t get_success_objs; /**< Objects successfully allocated. */
> - uint64_t get_fail_bulk;/**< Failed allocation number. */
> - uint64_t get_fail_objs;/**< Objects that failed to be allocated. */
> + uint64_t put_bulk;/**< Number of puts. */
> + uint64_t put_objs;/**< Number of objects successfully 
> put. */
> + uint64_t put_common_pool_bulk;/**< Number of bulks enqueued in 
> common pool. */
> + uint64_t put_common_pool_objs;/**< Number of objects enqueued in 
> common pool. */
> + uint64_t put_cache_bulk;  /**< Number of bulks enqueued in 
> cache. */
> + uint64_t put_cache_objs;  /**< Number of objects enqueued in 
> cache. */
> + uint64_t get_common_pool_bulk;/**< Number of bulks dequeued from 
> common pool. */
> + uint64_t get_common_pool_objs;/**< Number of objects dequeued from 
> common pool. */
> + uint64_t get_cache_bulk;  /**< Number of bulks d

Re: [dpdk-dev] [PATCH v4 2/2] lib/mempool: distinguish debug counters from cache and pool

2021-04-27 Thread Olivier Matz
Hi Dharmik,

Few comments below.

On Thu, Apr 22, 2021 at 08:29:38PM -0500, Dharmik Thakkar wrote:
> From: Joyce Kong 
> 
> If cache is enabled, objects will be retrieved/put from/to cache,
> subsequently from/to the common pool. Now the debug stats calculate
> the objects retrieved/put from/to cache and pool together, it is
> better to distinguish them.
> 
> Signed-off-by: Joyce Kong 
> Signed-off-by: Dharmik Thakkar 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Honnappa Nagarahalli 
> ---
>  lib/mempool/rte_mempool.c | 16 +++
>  lib/mempool/rte_mempool.h | 43 ++-
>  2 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index afb1239c8d48..e9343c2a7f6b 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -1244,6 +1244,14 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
>   sum.put_bulk += mp->stats[lcore_id].put_bulk;
>   sum.put_objs += mp->stats[lcore_id].put_objs;
> + sum.put_common_pool_bulk +=
> + mp->stats[lcore_id].put_common_pool_bulk;
> + sum.put_common_pool_objs +=
> + mp->stats[lcore_id].put_common_pool_objs;
> + sum.get_common_pool_bulk +=
> + mp->stats[lcore_id].get_common_pool_bulk;
> + sum.get_common_pool_objs +=
> + mp->stats[lcore_id].get_common_pool_objs;
>   sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
>   sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
>   sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
> @@ -1254,6 +1262,14 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   fprintf(f, "  stats:\n");
>   fprintf(f, "put_bulk=%"PRIu64"\n", sum.put_bulk);
>   fprintf(f, "put_objs=%"PRIu64"\n", sum.put_objs);
> + fprintf(f, "put_common_pool_bulk=%"PRIu64"\n",
> + sum.put_common_pool_bulk);
> + fprintf(f, "put_common_pool_objs=%"PRIu64"\n",
> + sum.put_common_pool_objs);
> + fprintf(f, "get_common_pool_bulk=%"PRIu64"\n",
> + sum.get_common_pool_bulk);
> + fprintf(f, "get_common_pool_objs=%"PRIu64"\n",
> + sum.get_common_pool_objs);
>   fprintf(f, "get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
>   fprintf(f, "get_success_objs=%"PRIu64"\n", sum.get_success_objs);
>   fprintf(f, "get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 848a19226149..4343b287dc4e 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -64,14 +64,21 @@ extern "C" {
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>  /**
>   * A structure that stores the mempool statistics (per-lcore).
> + * Note: Cache stats (put_cache_bulk/objs, get_cache_bulk/objs) are not
> + * captured since they can be calculated from other stats.
> + * For example: put_cache_objs = put_objs - put_common_pool_objs.
>   */
>  struct rte_mempool_debug_stats {
> - uint64_t put_bulk; /**< Number of puts. */
> - uint64_t put_objs; /**< Number of objects successfully put. */
> - uint64_t get_success_bulk; /**< Successful allocation number. */
> - uint64_t get_success_objs; /**< Objects successfully allocated. */
> - uint64_t get_fail_bulk;/**< Failed allocation number. */
> - uint64_t get_fail_objs;/**< Objects that failed to be allocated. */
> + uint64_t put_bulk;/**< Number of puts. */
> + uint64_t put_objs;/**< Number of objects successfully 
> put. */
> + uint64_t put_common_pool_bulk;/**< Number of bulks enqueued in 
> common pool. */
> + uint64_t put_common_pool_objs;/**< Number of objects enqueued in 
> common pool. */
> + uint64_t get_common_pool_bulk;/**< Number of bulks dequeued from 
> common pool. */
> + uint64_t get_common_pool_objs;/**< Number of objects dequeued from 
> common pool. */
> + uint64_t get_success_bulk;/**< Successful allocation number. */
> + uint64_t get_success_objs;/**< Objects successfully allocated. 
> */
> + uint64_t get_fail_bulk;   /**< Failed allocation number. */
> + uint64_t get_fail_objs;   /**< Objects that failed to be 
> allocated. */
>   /** Successful allocation number of contiguous blocks. */
>   uint64_t get_success_blks;
>   /** Failed allocation number of contiguous blocks. */
> @@ -699,10 +706,18 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
>   void **obj_table, unsigned n)
>  {
>   struct rte_mempool_ops *ops;
> + i
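
A short sketch of the derivation mentioned in the commit log, assuming
RTE_LIBRTE_MEMPOOL_DEBUG is enabled and sum holds the aggregated
counters as in rte_mempool_dump(): objects that were put (or got) but
never reached the common pool must have been served by the per-lcore
cache.

    uint64_t put_cache_objs = sum.put_objs - sum.put_common_pool_objs;
    uint64_t get_cache_objs = sum.get_success_objs -
            sum.get_common_pool_objs;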

Re: [dpdk-dev] [PATCH v4 0/2] lib/mempool: add debug stats

2021-04-27 Thread Olivier Matz
On Thu, Apr 22, 2021 at 08:29:36PM -0500, Dharmik Thakkar wrote:
> Subject: [dpdk-dev] [PATCH v4 0/2] lib/mempool: add debug stats

I missed that one: please use "mempool:" instead of "lib/mempool:"
prefix in the titles.

Thanks,
Olivier


>
> - Add debug counters for objects put/get to/from the common pool.
> - Make __MEMPOOL_STAT_ADD() more generic
> 
> ---
> v4:
>  - Remove cache stats
> 
> v3:
>  - Add a patch to make stat add macro generic
>  - Remove other stat add/subtract macros
>  - Rename counters for better understanding
>  - Add put/get cache bulk counters
> 
> v2:
>  - Fix typo in the commit message
> ---
> 
> Dharmik Thakkar (1):
>   lib/mempool: make stats macro generic
> 
> Joyce Kong (1):
>   lib/mempool: distinguish debug counters from cache and pool
> 
>  lib/mempool/rte_mempool.c | 16 ++
>  lib/mempool/rte_mempool.h | 67 +++
>  2 files changed, 56 insertions(+), 27 deletions(-)
> 
> -- 
> 2.17.1
> 


Re: [dpdk-dev] [PATCH 1/3] stack: update lock-free supported archs

2021-04-27 Thread Olivier Matz
On Mon, Apr 12, 2021 at 10:28:59AM +0200, Stanislaw Kardach wrote:
> Since 7911ba047 lock-free stack is supported on arm64 but this
> description was missing from the doxygen for the flag.
> 
> Signed-off-by: Stanislaw Kardach 
> Fixes: 7911ba0473e0 ("stack: enable lock-free implementation for aarch64")
> Cc: phil.y...@arm.com
> Cc: sta...@dpdk.org

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH 3/3] test: run lock-free stack tests when supported

2021-04-27 Thread Olivier Matz
On Mon, Apr 12, 2021 at 10:29:01AM +0200, Stanislaw Kardach wrote:
> Use the recently added RTE_STACK_LF_SUPPORTED flag to disable the
> lock-free stack tests at the compile time.
> Perf test doesn't fail because rte_stack_create() succeeds, however
> marking this test as skipped gives a better indication of what actually
> was tested.
> 
> Signed-off-by: Stanislaw Kardach 
> Cc: sta...@dpdk.org

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH 2/3] stack: add lock-free support indication

2021-04-27 Thread Olivier Matz
On Mon, Apr 12, 2021 at 10:29:00AM +0200, Stanislaw Kardach wrote:
> Currently it is impossible to detect programmatically whether the lock-free
> implementation of rte_stack is supported. One could check whether the
> header guard for lock-free stubs is defined (_RTE_STACK_LF_STUBS_H_) but
> that's an unstable implementation detail. Because of that currently all
> lock-free stack creations silently succeed (as long as the stack header
> is 16B long) which later leads to push and pop operations being NOPs.
> The observable effect is that stack_lf_autotest fails on platforms not
> supporting the lock-free. Instead it should just skip the lock-free test
> altogether.
> 
> This commit adds a new errno value (ENOTSUP) that may be returned by
> rte_stack_create() to indicate that a given combination of flags is not
> supported on a current platform.
> This is detected by checking a compile-time flag in the include logic in
> rte_stack_lf.h which may be used by applications to check the lock-free
> support at compile time.
> 
> Signed-off-by: Stanislaw Kardach 
> Fixes: 7911ba0473e0 ("stack: enable lock-free implementation for aarch64")
> Cc: phil.y...@arm.com
> Cc: sta...@dpdk.org

Acked-by: Olivier Matz 
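
A sketch of the opt-in pattern this series enables, combining the
compile-time flag with the new ENOTSUP runtime error (the fallback
policy below is only an example):

    #include <errno.h>

    #include <rte_errno.h>
    #include <rte_lcore.h>
    #include <rte_stack.h>

    static struct rte_stack *
    stack_create_best_effort(const char *name, unsigned int count)
    {
    #ifdef RTE_STACK_LF_SUPPORTED
            struct rte_stack *s;

            /* try the lock-free variant first */
            s = rte_stack_create(name, count, rte_socket_id(),
                            RTE_STACK_F_LF);
            if (s != NULL || rte_errno != ENOTSUP)
                    return s;
            /* lock-free rejected at runtime: fall back below */
    #endif
            return rte_stack_create(name, count, rte_socket_id(), 0);
    }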


[dpdk-dev] [PATCH v3 2/2] mbuf: better document usage of packet pool initializers

2021-04-27 Thread Olivier Matz
Clarify that the mempool private initializer and object initializer used
for packet pools require that the mempool private size is large enough.

Also add an assert (only enabled when -DRTE_ENABLE_ASSERT is passed) to
check this constraint.

Signed-off-by: Olivier Matz 
---
 lib/mbuf/rte_mbuf.c | 5 +
 lib/mbuf/rte_mbuf.h | 8 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 3ff0a69187..f7e3c1a187 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -43,6 +43,8 @@ rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
*opaque_arg)
struct rte_pktmbuf_pool_private default_mbp_priv;
uint16_t roomsz;
 
+   RTE_ASSERT(mp->private_data_size >=
+  sizeof(struct rte_pktmbuf_pool_private));
RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));
 
/* if no structure is provided, assume no mbuf private area */
@@ -83,6 +85,9 @@ rte_pktmbuf_init(struct rte_mempool *mp,
struct rte_mbuf *m = _m;
uint32_t mbuf_size, buf_len, priv_size;
 
+   RTE_ASSERT(mp->private_data_size >=
+  sizeof(struct rte_pktmbuf_pool_private));
+
priv_size = rte_pktmbuf_priv_size(mp);
mbuf_size = sizeof(struct rte_mbuf) + priv_size;
buf_len = rte_pktmbuf_data_room_size(mp);
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index c4c9ebfaa0..a555f216ae 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -624,6 +624,9 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
  * address, and so on). This function is given as a callback function to
  * rte_mempool_obj_iter() or rte_mempool_create() at pool creation time.
  *
+ * This function expects that the mempool private area was previously
+ * initialized with rte_pktmbuf_pool_init().
+ *
  * @param mp
  *   The mempool from which mbufs originate.
  * @param opaque_arg
@@ -639,7 +642,7 @@ void rte_pktmbuf_init(struct rte_mempool *mp, void 
*opaque_arg,
  void *m, unsigned i);
 
 /**
- * A  packet mbuf pool constructor.
+ * A packet mbuf pool constructor.
  *
  * This function initializes the mempool private data in the case of a
  * pktmbuf pool. This private data is needed by the driver. The
@@ -648,6 +651,9 @@ void rte_pktmbuf_init(struct rte_mempool *mp, void 
*opaque_arg,
  * pool creation. It can be extended by the user, for example, to
  * provide another packet size.
  *
+ * The mempool private area size must be at least equal to
+ * sizeof(struct rte_pktmbuf_pool_private).
+ *
  * @param mp
  *   The mempool from which mbufs originate.
  * @param opaque_arg
-- 
2.29.2
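
For reference, a hand-rolled packet pool that satisfies this constraint
could look like the sketch below; rte_pktmbuf_pool_create() already
does all of this for you, and the sizes here are arbitrary:

    struct rte_mempool *mp;

    mp = rte_mempool_create("pkt_pool", 1024,       /* name, mbuf count */
            sizeof(struct rte_mbuf) + 2048,         /* elt: mbuf + data room */
            32,                                     /* per-lcore cache */
            sizeof(struct rte_pktmbuf_pool_private),/* the documented minimum */
            rte_pktmbuf_pool_init, NULL,            /* pool constructor */
            rte_pktmbuf_init, NULL,                 /* per-object initializer */
            SOCKET_ID_ANY, 0);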



[dpdk-dev] [PATCH v3 1/2] test/mempool: fix heap buffer overflow

2021-04-27 Thread Olivier Matz
The function rte_pktmbuf_init() expects that the mempool private area is
large enough and was previously initialized by rte_pktmbuf_pool_init(),
which is not the case.

This causes the function rte_pktmbuf_priv_size() to return an
unpredictable value, and this value is used as a size in a memset.

Replace the mempool object initializer by my_obj_init(), which does not
have this constraint, and fits the needs for this test.

Fixes: 923ceaeac140 ("test/mempool: add unit test cases")
Cc: sta...@dpdk.org

Signed-off-by: Wenwu Ma 
---
 app/test/test_mempool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 084842fdaa..3adadd6731 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -552,7 +552,7 @@ test_mempool(void)
GOTO_ERR(ret, err);
 
/* test to initialize mempool objects and memory */
-   nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, rte_pktmbuf_init,
+   nb_objs = rte_mempool_obj_iter(mp_stack_mempool_iter, my_obj_init,
NULL);
if (nb_objs == 0)
GOTO_ERR(ret, err);
-- 
2.29.2



[dpdk-dev] [PATCH 0/4] net/tap: fix Rx cksum

2021-04-27 Thread Olivier Matz
This patchset fixes the Rx checksum flags in the net/tap
driver. The first two patches are the effective fixes.

The last 2 patches introduce a new checksum API to
verify an L4 checksum and its unit test, in order to
simplify the net/tap code, or any other code that has
the same needs.

The last 2 patches may be postponed to 20.08 if required.

Olivier Matz (4):
  net/tap: fix Rx cksum flags on IP options packets
  net/tap: fix Rx cksum flags on TCP packets
  net: introduce functions to verify L4 checksums
  test/cksum: new test for L3/L4 checksum API

 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 drivers/net/tap/rte_eth_tap.c |  17 ++-
 lib/net/rte_ip.h  | 124 +---
 6 files changed, 390 insertions(+), 31 deletions(-)
 create mode 100644 app/test/test_cksum.c

-- 
2.29.2



[dpdk-dev] [PATCH 1/4] net/tap: fix Rx cksum flags on IP options packets

2021-04-27 Thread Olivier Matz
When packet type is IPV4_EXT, the checksum is always marked as good in
the mbuf offload flags.

Since we know the header lengths, we can easily call
rte_ipv4_udptcp_cksum() in this case too.

Fixes: 8ae3023387e9 ("net/tap: add Rx/Tx checksum offload support")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/net/tap/rte_eth_tap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 68baa18523..e7b185a4b5 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -350,7 +350,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
return;
-   if (l3 == RTE_PTYPE_L3_IPV4) {
+   if (l3 == RTE_PTYPE_L3_IPV4 || l3 == RTE_PTYPE_L3_IPV4_EXT) {
if (l4 == RTE_PTYPE_L4_UDP) {
udp_hdr = (struct rte_udp_hdr *)l4_hdr;
if (udp_hdr->dgram_cksum == 0) {
@@ -364,7 +364,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
}
}
cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
-   } else if (l3 == RTE_PTYPE_L3_IPV6) {
+   } else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
mbuf->ol_flags |= cksum ?
-- 
2.29.2



[dpdk-dev] [PATCH 2/4] net/tap: fix Rx cksum flags on TCP packets

2021-04-27 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() or
rte_ipv6_udptcp_cksum() can return either 0x or 0x when used to
verify a packet containing a valid checksum.

This new behavior broke the checksum verification in tap driver for TCP
packets: these packets are marked with PKT_RX_L4_CKSUM_BAD.

Fix this by checking the 2 possible values. A next commit will introduce
a checksum verification helper to simplify this a bit.

Fixes: d5df2ae0428a ("net: fix unneeded replacement of TCP checksum 0")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/net/tap/rte_eth_tap.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index e7b185a4b5..71282e8065 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -346,6 +346,8 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
if (l4 == RTE_PTYPE_L4_UDP || l4 == RTE_PTYPE_L4_TCP) {
+   int cksum_ok;
+
l4_hdr = rte_pktmbuf_mtod_offset(mbuf, void *, l2_len + l3_len);
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
@@ -363,13 +365,13 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
-   mbuf->ol_flags |= cksum ?
-   PKT_RX_L4_CKSUM_BAD :
-   PKT_RX_L4_CKSUM_GOOD;
+   cksum_ok = (cksum == 0) || (cksum == 0x);
+   mbuf->ol_flags |= cksum_ok ?
+   PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
 }
 
-- 
2.29.2
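
For completeness, an application typically consumes the flag set here
inside its Rx burst loop, roughly as below (PKT_RX_L4_CKSUM_MASK also
matches the UNKNOWN and NONE states, which this sketch lets through):

    if ((m->ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD) {
            rte_pktmbuf_free(m);    /* drop bad L4 checksums */
            continue;
    }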



[dpdk-dev] [PATCH 3/4] net: introduce functions to verify L4 checksums

2021-04-27 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x or 0x when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set in
a packet, introduce 2 new helpers for checksum verification. They return
0 if the checksum is valid in the packet.

Use this new helper in net/tap driver.

Signed-off-by: Olivier Matz 
---
 drivers/net/tap/rte_eth_tap.c |   7 +-
 lib/net/rte_ip.h  | 124 +++---
 2 files changed, 104 insertions(+), 27 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 71282e8065..b14d5a1d55 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -365,11 +365,12 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv4_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv6_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
}
-   cksum_ok = (cksum == 0) || (cksum == 0x);
mbuf->ol_flags |= cksum_ok ?
PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 8c189009b0..ef84bcc5bf 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -344,20 +344,10 @@ rte_ipv4_phdr_cksum(const struct rte_ipv4_hdr *ipv4_hdr, 
uint64_t ol_flags)
 }
 
 /**
- * Process the IPv4 UDP or TCP checksum.
- *
- * The IP and layer 4 checksum must be set to 0 in the packet by
- * the caller.
- *
- * @param ipv4_hdr
- *   The pointer to the contiguous IPv4 header.
- * @param l4_hdr
- *   The pointer to the beginning of the L4 header.
- * @return
- *   The complemented checksum to set in the IP packet.
+ * @internal Calculate the non-complemented IPv4 L4 checksum
  */
 static inline uint16_t
-rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+__rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void 
*l4_hdr)
 {
uint32_t cksum;
uint32_t l3_len, l4_len;
@@ -374,16 +364,62 @@ rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr 
*ipv4_hdr, const void *l4_hdr)
cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
 
cksum = ((cksum & 0x) >> 16) + (cksum & 0x);
-   cksum = (~cksum) & 0x;
+
+   return (uint16_t)cksum;
+}
+
+/**
+ * Process the IPv4 UDP or TCP checksum.
+ *
+ * The IP and layer 4 checksum must be set to 0 in the packet by
+ * the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   cksum = ~cksum;
+
/*
-* Per RFC 768:If the computed checksum is zero for UDP,
+* Per RFC 768: If the computed checksum is zero for UDP,
 * it is transmitted as all ones
 * (the equivalent in one's complement arithmetic).
 */
if (cksum == 0 && ipv4_hdr->next_proto_id == IPPROTO_UDP)
cksum = 0x;
 
-   return (uint16_t)cksum;
+   return cksum;
+}
+
+/**
+ * Validate the IPv4 UDP or TCP checksum.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   Return 0 if the checksum is correct, else -1.
+ */
+__rte_experimental
+static inline int
+rte_ipv4_udptcp_cksum_verify(const struct rte_ipv4_hdr *ipv4_hdr,
+const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   if (cksum != 0x)
+   return -1;
+
+   return 0;
 }
 
 /**
@@ -448,6 +484,25 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, 
uint64_t ol_flags)
return __rte_raw_cksum_reduce(sum);
 }
 
+/**
+ * @internal Calculate the non-complemented IPv6 L4 checksum
+ */
+static inline uint16_t
+__rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void 
*l4_hdr)
+{
+   uint32_t cksum;
+   uint32_t l4_len;
+
+   l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
+
+   cksum = rte_raw_cksum(l4_hdr, l4_len)

[dpdk-dev] [PATCH 4/4] test/cksum: new test for L3/L4 checksum API

2021-04-27 Thread Olivier Matz
Add a simple unit test for the checksum API.

Signed-off-by: Olivier Matz 
---
 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 4 files changed, 280 insertions(+)
 create mode 100644 app/test/test_cksum.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 44f3d322ed..9fe7c92eac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1309,6 +1309,7 @@ Packet processing
 Network headers
 M: Olivier Matz 
 F: lib/net/
+F: app/test/test_cksum.c
 
 Packet CRC
 M: Jasvinder Singh 
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 097638941f..2871ed8994 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -585,6 +585,12 @@
 "Func":default_autotest,
 "Report":  None,
 },
+{
+"Name":"Checksum autotest",
+"Command": "cksum_autotest",
+"Func":default_autotest,
+"Report":  None,
+},
 #
 #Please always keep all dump tests at the end and together!
 #
diff --git a/app/test/meson.build b/app/test/meson.build
index 08c82d3d23..28d8a9a111 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -17,6 +17,7 @@ test_sources = files(
 'test_bitmap.c',
 'test_bpf.c',
 'test_byteorder.c',
+'test_cksum.c',
 'test_cmdline.c',
 'test_cmdline_cirbuf.c',
 'test_cmdline_etheraddr.c',
@@ -189,6 +190,7 @@ fast_tests = [
 ['atomic_autotest', false],
 ['bitops_autotest', true],
 ['byteorder_autotest', true],
+['cksum_autotest', true],
 ['cmdline_autotest', true],
 ['common_autotest', true],
 ['cpuflags_autotest', true],
diff --git a/app/test/test_cksum.c b/app/test/test_cksum.c
new file mode 100644
index 00..cd983d7c01
--- /dev/null
+++ b/app/test/test_cksum.c
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 6WIND S.A.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MEMPOOL_CACHE_SIZE  0
+#define MBUF_DATA_SIZE  256
+#define NB_MBUF 128
+
+/*
+ * Test L3/L4 checksum API.
+ */
+
+#define GOTO_FAIL(str, ...) do {   \
+   printf("cksum test FAILED (l.%d): <" str ">\n", \
+  __LINE__,  ##__VA_ARGS__);   \
+   goto fail;  \
+   } while (0)
+
+/* generated in scapy with Ether()/IP()/TCP())) */
+static const char test_cksum_ipv4_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x28, 0x00, 0x01, 0x00, 0x00, 0x40, 0x06,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x14, 0x00, 0x50, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x02,
+   0x20, 0x00, 0x91, 0x7c, 0x00, 0x00,
+
+};
+
+/* generated in scapy with Ether()/IPv6()/TCP()) */
+static const char test_cksum_ipv6_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x14, 0x06, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x14,
+   0x00, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x50, 0x02, 0x20, 0x00, 0x8f, 0x7d,
+   0x00, 0x00,
+};
+
+/* generated in scapy with Ether()/IP()/UDP()/Raw('x')) */
+static const char test_cksum_ipv4_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x1d, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x35, 0x00, 0x35, 0x00, 0x09,
+   0x89, 0x6f, 0x78,
+};
+
+/* generated in scapy with Ether()/IPv6()/UDP()/Raw('x')) */
+static const char test_cksum_ipv6_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x86, 0xdd, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x09, 0x11, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x35,
+   0x00, 0x35, 0x00, 0x09, 0x87, 0x70, 0x78,
+};
+
+/* generated in scapy with Ether()/IP(options

Re: [dpdk-dev] [PATCH 3/4] net: introduce functions to verify L4 checksums

2021-04-28 Thread Olivier Matz
Hi Morten,

Thank you for the review.

<...>

On Tue, Apr 27, 2021 at 05:07:04PM +0200, Morten Brørup wrote:
> > +static inline uint16_t
> > +rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void
> > *l4_hdr)
> > +{
> > +   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
> > +
> > +   cksum = ~cksum;
> > +
> > /*
> > -* Per RFC 768:If the computed checksum is zero for UDP,
> > +* Per RFC 768: If the computed checksum is zero for UDP,
> >  * it is transmitted as all ones
> >  * (the equivalent in one's complement arithmetic).
> >  */
> > if (cksum == 0 && ipv4_hdr->next_proto_id == IPPROTO_UDP)
> > cksum = 0x;
> > 
> > -   return (uint16_t)cksum;
> > +   return cksum;
> > +}
> 
> The GCC static branch predictor treats the above comparison as likely. 
> Playing around with Godbolt, I came up with this alternative:
> 
>   if (likely(cksum != 0)) return cksum;
>   if (ipv4_hdr->next_proto_id == IPPROTO_UDP) return 0x;
>   return 0;

Good idea, this is indeed an unlikely branch.
However this code was already present before this patch,
so I suggest adding it as a separate optimization patch.

> > +
> > +/**
> > + * Validate the IPv4 UDP or TCP checksum.
> > + *
> > + * @param ipv4_hdr
> > + *   The pointer to the contiguous IPv4 header.
> > + * @param l4_hdr
> > + *   The pointer to the beginning of the L4 header.
> > + * @return
> > + *   Return 0 if the checksum is correct, else -1.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_ipv4_udptcp_cksum_verify(const struct rte_ipv4_hdr *ipv4_hdr,
> > +const void *l4_hdr)
> > +{
> > +   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
> > +
> > +   if (cksum != 0x)
> > +   return -1;
> 
> The GCC static branch predictor treats the above comparison as likely, so I 
> would prefer unlikely() around it.

For this one, I'm less convinced: should we decide here whether
the good or the bad checksum is more likely than the other?

Given it's a static inline function, wouldn't it be better to let
the application call it this way:
  if (likely(rte_ipv4_udptcp_cksum_verify(...) == 0))  ?


Regards,
Olivier


Re: [dpdk-dev] [PATCH] mbuf: check mbuf dyn shared memory validity

2021-04-28 Thread Olivier Matz
Hi Chengwen,

On Fri, Apr 23, 2021 at 04:11:04PM +0800, Min Hu (Connor) wrote:
> From: Chengwen Feng 
> 
> Because the mbuf dyn shared memory is allocated at runtime, it is
> necessary to check its validity when dumping the mbuf dyn info.
> 
> Also, this patch adds error logging when initializing the shared
> memory fails.
> 
> Fixes: 4958ca3a443a ("mbuf: support dynamic fields and flags")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Chengwen Feng 
> Signed-off-by: Min Hu (Connor) 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH 0/3] add lock-free stack support discovery

2021-05-03 Thread Olivier Matz
On Mon, May 03, 2021 at 04:21:25PM +0200, David Marchand wrote:
> On Mon, Apr 12, 2021 at 10:29 AM Stanislaw Kardach  wrote:
> >
> > The lock-free stack implementation (RTE_STACK_F_LF) is supported only on a
> > subset of platforms, namely x86_64 and arm64. Platforms supporting 128b 
> > atomics
> > have to opt-in to a generic or C11 implementations. All other platforms use 
> > a
> > stubbed implementation for push/pop operations which are basically NOPs.
> > However rte_stack_create() will not fail and application can proceed 
> > assuming
> > it has a working lock-free stack.
> >
> > This means that among other things the stack_lf fast and perf tests will 
> > fail
> > as if implementation is wrong (which one can argue is). Therefore this 
> > patchset
> > tries to give user a way to check whether a lock_free is supported or not 
> > both
> > at compile time (build flag) and at runtime (ENOTSUP errno in 
> > rte_stack_create).
> >
> > I have added cc to sta...@dpdk.org because check-git-log.sh suggested it. 
> > I'm
> > not sure if adding a binary compatible change to API is worth 
> > sta...@dpdk.org.
> >
> > Cc: sta...@dpdk.org
> 
> The issue was hit while porting to a new architecture.
> The feature is broken in existing stable releases and it won't get
> fixed by this change.
> 
> I'd rather not backport it.
> 
> Opinions?

Agreed.


Re: [dpdk-dev] [PATCH v5 2/2] mempool: distinguish debug counters from cache and pool

2021-05-03 Thread Olivier Matz
On Tue, Apr 27, 2021 at 11:01:40AM -0500, Dharmik Thakkar wrote:
> From: Joyce Kong 
> 
> If cache is enabled, objects will be retrieved/put from/to cache,
> subsequently from/to the common pool. Now the debug stats calculate
> the objects retrieved/put from/to cache and pool together, it is
> better to distinguish them.
> 
> Signed-off-by: Joyce Kong 
> Signed-off-by: Dharmik Thakkar 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Honnappa Nagarahalli 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH] tests/cmdline: fix memory leaks

2021-06-23 Thread Olivier Matz
Hi Owen,

Thanks for fixing this test.
Some comments below.

On Wed, Jun 16, 2021 at 02:07:24PM -0400, ohily...@iol.unh.edu wrote:
> From: Owen Hilyard 
> 
> Fixes for a few memory leaks in the cmdline_autotest unit test.
> 
> All of the leaks were related to not freeing the commandline struct
> after testing had completed.
> 
> Fixes: dbb860e03e ("cmdline: tests")
> 
> Signed-off-by: Owen Hilyard 
> Reviewed-by: David Marchand 

Please use "-v $version" when sending a new version of
the patch. You can find examples in doc/guides/contributing/patches.rst

> ---
>  app/test/test_cmdline_lib.c | 30 ++
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_cmdline_lib.c b/app/test/test_cmdline_lib.c
> index bd72df0da..19228c9a5 100644
> --- a/app/test/test_cmdline_lib.c
> +++ b/app/test/test_cmdline_lib.c
> @@ -71,10 +71,12 @@ test_cmdline_parse_fns(void)
>   if (cmdline_complete(cl, "buffer", &i, NULL, sizeof(dst)) >= 0)
>   goto error;
>  
> + cmdline_free(cl);
>   return 0;
>  
>  error:
>   printf("Error: function accepted null parameter!\n");
> + cmdline_free(cl);
>   return -1;
>  }
>  
> @@ -140,32 +142,43 @@ static int
>  test_cmdline_socket_fns(void)
>  {
>   cmdline_parse_ctx_t ctx;
> + struct cmdline *cl;
>  
> - if (cmdline_stdin_new(NULL, "prompt") != NULL)
> + cl = cmdline_stdin_new(NULL, "prompt");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_stdin_new(&ctx, NULL) != NULL)
> + cl = cmdline_stdin_new(&ctx, NULL);
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(NULL, "prompt", "/dev/null") != NULL)
> + cl = cmdline_file_new(NULL, "prompt", "/dev/null");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, NULL, "/dev/null") != NULL)
> + cl = cmdline_file_new(&ctx, NULL, "/dev/null");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, "prompt", NULL) != NULL)
> + cl = cmdline_file_new(&ctx, "prompt", NULL);
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, "prompt", "-/invalid/~/path") != NULL) {
> + cl = cmdline_file_new(&ctx, "prompt", "-/invalid/~/path");
> + if (cl != NULL) {
>   printf("Error: succeeded in opening invalid file for reading!");
> + cmdline_free(cl);
>   return -1;
>   }
> - if (cmdline_file_new(&ctx, "prompt", "/dev/null") == NULL) {
> + cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
> + if (cl == NULL) {
>   printf("Error: failed to open /dev/null for reading!");
> + cmdline_free(cl);
>   return -1;
>   }

This last cmdline_free() is not needed, because the test is (cl == NULL)

>  
>   /* void functions */
>   cmdline_stdin_exit(NULL);
> -
> + cmdline_free(cl);

I would keep an empty line here, to highlight that the comment
only refers to cmdline_stdin_exit().


>   return 0;
>  error:
>   printf("Error: function accepted null parameter!\n");
> + cmdline_free(cl);
>   return -1;
>  }
>  
> @@ -198,6 +211,7 @@ test_cmdline_fns(void)
>   cmdline_interact(NULL);
>   cmdline_quit(NULL);
>  

In this function test_cmdline_fns(), there is the same scheme as in
test_cmdline_socket_fns(), so I think you should apply the same kind
of modifications (even if a non-NULL return value should not happen):


   -if (cmdline_new(NULL, prompt, 0, 0) != NULL)
   +cl = cmdline_new(NULL, prompt, 0, 0);
   +if (cl != NULL)

Maybe the 2 tested error cases that are expected to return NULL
should go before the correct one, so that it's not needed to have
another cl pointer variable.

> + cmdline_free(cl);
>   return 0;
>  
>  error:
> -- 
> 2.30.2
> 


Re: [dpdk-dev] [PATCH v3] tests/cmdline: fix memory leaks

2021-06-24 Thread Olivier Matz
Hi Owen,

One small issue remains, please see below.

On Wed, Jun 23, 2021 at 02:06:45PM -0400, ohily...@iol.unh.edu wrote:
> From: Owen Hilyard 
> 
> Fixes for a few memory leaks in the cmdline_autotest unit test.
> 
> All of the leaks were related to not freeing the commandline struct
> after testing had completed.
> 
> Fixes: dbb860e03e ("cmdline: tests")
> 
> Signed-off-by: Owen Hilyard 
> Reviewed-by: David Marchand 
> ---
>  app/test/test_cmdline_lib.c | 40 ++---
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/app/test/test_cmdline_lib.c b/app/test/test_cmdline_lib.c
> index bd72df0da..b476b2594 100644
> --- a/app/test/test_cmdline_lib.c
> +++ b/app/test/test_cmdline_lib.c
> @@ -71,10 +71,12 @@ test_cmdline_parse_fns(void)
>   if (cmdline_complete(cl, "buffer", &i, NULL, sizeof(dst)) >= 0)
>   goto error;
>  
> + cmdline_free(cl);
>   return 0;
>  
>  error:
>   printf("Error: function accepted null parameter!\n");
> + cmdline_free(cl);
>   return -1;
>  }
>  
> @@ -140,32 +142,44 @@ static int
>  test_cmdline_socket_fns(void)
>  {
>   cmdline_parse_ctx_t ctx;
> + struct cmdline *cl;
>  
> - if (cmdline_stdin_new(NULL, "prompt") != NULL)
> + cl = cmdline_stdin_new(NULL, "prompt");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_stdin_new(&ctx, NULL) != NULL)
> + cl = cmdline_stdin_new(&ctx, NULL);
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(NULL, "prompt", "/dev/null") != NULL)
> + cl = cmdline_file_new(NULL, "prompt", "/dev/null");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, NULL, "/dev/null") != NULL)
> + cl = cmdline_file_new(&ctx, NULL, "/dev/null");
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, "prompt", NULL) != NULL)
> + cl = cmdline_file_new(&ctx, "prompt", NULL);
> + if (cl != NULL)
>   goto error;
> - if (cmdline_file_new(&ctx, "prompt", "-/invalid/~/path") != NULL) {
> + cl = cmdline_file_new(&ctx, "prompt", "-/invalid/~/path");
> + if (cl != NULL) {
>   printf("Error: succeeded in opening invalid file for reading!");
> + cmdline_free(cl);
>   return -1;
>   }
> - if (cmdline_file_new(&ctx, "prompt", "/dev/null") == NULL) {
> + cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
> + if (cl == NULL) {
>   printf("Error: failed to open /dev/null for reading!");
> + cmdline_free(cl);
>   return -1;
>   }

The cmdline_free(cl) after an if (cl == NULL) check is not needed.

After that change, you can add my ack in v4.

Thanks,
Olivier

>  
>   /* void functions */
>   cmdline_stdin_exit(NULL);
>  
> + cmdline_free(cl);
>   return 0;
>  error:
>   printf("Error: function accepted null parameter!\n");
> + cmdline_free(cl);
>   return -1;
>  }
>  
> @@ -176,13 +190,14 @@ test_cmdline_fns(void)
>   struct cmdline *cl;
>  
>   memset(&ctx, 0, sizeof(ctx));
> - cl = cmdline_new(&ctx, "test", -1, -1);
> - if (cl == NULL)
> + cl = cmdline_new(NULL, "prompt", 0, 0);
> + if (cl != NULL)
>   goto error;
> -
> - if (cmdline_new(NULL, "prompt", 0, 0) != NULL)
> + cl = cmdline_new(&ctx, NULL, 0, 0);
> + if (cl != NULL)
>   goto error;
> - if (cmdline_new(&ctx, NULL, 0, 0) != NULL)
> + cl = cmdline_new(&ctx, "test", -1, -1);
> + if (cl == NULL)
>   goto error;
>   if (cmdline_in(NULL, "buffer", CMDLINE_TEST_BUFSIZE) >= 0)
>   goto error;
> @@ -198,6 +213,7 @@ test_cmdline_fns(void)
>   cmdline_interact(NULL);
>   cmdline_quit(NULL);
>  
> + cmdline_free(cl);
>   return 0;
>  
>  error:
> -- 
> 2.30.2
> 
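
Summing up the pattern this review converges on (given a populated
cmdline_parse_ctx_t ctx), a sketch:

    struct cmdline *cl;

    cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
    if (cl == NULL)
            return -1;      /* nothing was allocated, nothing to free */
    /* ... exercise cl ... */
    cmdline_free(cl);
    return 0;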


[dpdk-dev] [PATCH v2 0/4] net/tap: fix Rx cksum

2021-06-30 Thread Olivier Matz
This patchset fixes the Rx checksum flags in the net/tap
driver. The first two patches are the effective fixes.

The last 2 patches introduce a new checksum API to
verify an L4 checksum and its unit test, in order to
simplify the net/tap code, or any other code that has
the same needs.

v2:

* clarify why RTE_PTYPE_L3_IPV4_EXT_UNKNOWN cannot happen in
  tap_verify_csum() (patch 1)
* align style of rte_ipv6_udptcp_cksum_verify() to
  rte_ipv4_udptcp_cksum_verify() (patch 3)
* clarify comment above rte_ipv4_udptcp_cksum_verify() and
  rte_ipv6_udptcp_cksum_verify() (patch 3)


Olivier Matz (4):
  net/tap: fix Rx cksum flags on IP options packets
  net/tap: fix Rx cksum flags on TCP packets
  net: introduce functions to verify L4 checksums
  test/cksum: new test for L3/L4 checksum API

 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 drivers/net/tap/rte_eth_tap.c |  23 ++-
 lib/net/rte_ip.h  | 127 +---
 6 files changed, 398 insertions(+), 32 deletions(-)
 create mode 100644 app/test/test_cksum.c

-- 
2.29.2



[dpdk-dev] [PATCH v2 1/4] net/tap: fix Rx cksum flags on IP options packets

2021-06-30 Thread Olivier Matz
When packet type is IPV4_EXT, the checksum is always marked as good in
the mbuf offload flags.

Since we know the header lengths, we can easily call
rte_ipv4_udptcp_cksum() in this case too.

Fixes: 8ae3023387e9 ("net/tap: add Rx/Tx checksum offload support")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
---
 drivers/net/tap/rte_eth_tap.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5735988e7c..5513cfd2d7 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -342,7 +342,11 @@ tap_verify_csum(struct rte_mbuf *mbuf)
rte_pktmbuf_data_len(mbuf))
return;
} else {
-   /* IPv6 extensions are not supported */
+   /* - RTE_PTYPE_L3_IPV4_EXT_UNKNOWN cannot happen because
+*   mbuf->packet_type is filled by rte_net_get_ptype() which
+*   never returns this value.
+* - IPv6 extensions are not supported.
+*/
return;
}
if (l4 == RTE_PTYPE_L4_UDP || l4 == RTE_PTYPE_L4_TCP) {
@@ -350,7 +354,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
return;
-   if (l3 == RTE_PTYPE_L3_IPV4) {
+   if (l3 == RTE_PTYPE_L3_IPV4 || l3 == RTE_PTYPE_L3_IPV4_EXT) {
if (l4 == RTE_PTYPE_L4_UDP) {
udp_hdr = (struct rte_udp_hdr *)l4_hdr;
if (udp_hdr->dgram_cksum == 0) {
@@ -364,7 +368,7 @@ tap_verify_csum(struct rte_mbuf *mbuf)
}
}
cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
-   } else if (l3 == RTE_PTYPE_L3_IPV6) {
+   } else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
mbuf->ol_flags |= cksum ?
-- 
2.29.2



[dpdk-dev] [PATCH v2 2/4] net/tap: fix Rx cksum flags on TCP packets

2021-06-30 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() or
rte_ipv6_udptcp_cksum() can return either 0x or 0x when used to
verify a packet containing a valid checksum.

This new behavior broke the checksum verification in tap driver for TCP
packets: these packets are marked with PKT_RX_L4_CKSUM_BAD.

Fix this by checking the 2 possible values. A next commit will introduce
a checksum verification helper to simplify this a bit.

Fixes: d5df2ae0428a ("net: fix unneeded replacement of TCP checksum 0")
Cc: sta...@dpdk.org

Signed-off-by: Olivier Matz 
Acked-by: Andrew Rybchenko 
---
 drivers/net/tap/rte_eth_tap.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5513cfd2d7..5429f611c1 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -350,6 +350,8 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
if (l4 == RTE_PTYPE_L4_UDP || l4 == RTE_PTYPE_L4_TCP) {
+   int cksum_ok;
+
l4_hdr = rte_pktmbuf_mtod_offset(mbuf, void *, l2_len + l3_len);
/* Don't verify checksum for multi-segment packets. */
if (mbuf->nb_segs > 1)
@@ -367,13 +369,13 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = ~rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = ~rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
}
-   mbuf->ol_flags |= cksum ?
-   PKT_RX_L4_CKSUM_BAD :
-   PKT_RX_L4_CKSUM_GOOD;
+   cksum_ok = (cksum == 0) || (cksum == 0x);
+   mbuf->ol_flags |= cksum_ok ?
+   PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
 }
 
-- 
2.29.2



[dpdk-dev] [PATCH v2 3/4] net: introduce functions to verify L4 checksums

2021-06-30 Thread Olivier Matz
Since commit d5df2ae0428a ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x or 0x when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set in
a packet, introduce 2 new helpers for checksum verification. They return
0 if the checksum is valid in the packet.

Use this new helper in net/tap driver.

Signed-off-by: Olivier Matz 
Acked-by: Morten Brørup 
---
 drivers/net/tap/rte_eth_tap.c |   7 +-
 lib/net/rte_ip.h  | 127 +++---
 2 files changed, 107 insertions(+), 27 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5429f611c1..2229eef059 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -369,11 +369,12 @@ tap_verify_csum(struct rte_mbuf *mbuf)
return;
}
}
-   cksum = rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv4_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
} else { /* l3 == RTE_PTYPE_L3_IPV6, checked above */
-   cksum = rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
+   cksum_ok = !rte_ipv6_udptcp_cksum_verify(l3_hdr,
+l4_hdr);
}
-   cksum_ok = (cksum == 0) || (cksum == 0x);
mbuf->ol_flags |= cksum_ok ?
PKT_RX_L4_CKSUM_GOOD : PKT_RX_L4_CKSUM_BAD;
}
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 4b728969c1..05948b69b7 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -344,20 +344,10 @@ rte_ipv4_phdr_cksum(const struct rte_ipv4_hdr *ipv4_hdr, 
uint64_t ol_flags)
 }
 
 /**
- * Process the IPv4 UDP or TCP checksum.
- *
- * The IP and layer 4 checksum must be set to 0 in the packet by
- * the caller.
- *
- * @param ipv4_hdr
- *   The pointer to the contiguous IPv4 header.
- * @param l4_hdr
- *   The pointer to the beginning of the L4 header.
- * @return
- *   The complemented checksum to set in the IP packet.
+ * @internal Calculate the non-complemented IPv4 L4 checksum
  */
 static inline uint16_t
-rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+__rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void 
*l4_hdr)
 {
uint32_t cksum;
uint32_t l3_len, l4_len;
@@ -374,16 +364,65 @@ rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr 
*ipv4_hdr, const void *l4_hdr)
cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
 
cksum = ((cksum & 0x) >> 16) + (cksum & 0x);
-   cksum = (~cksum) & 0x;
+
+   return (uint16_t)cksum;
+}
+
+/**
+ * Process the IPv4 UDP or TCP checksum.
+ *
+ * The IP and layer 4 checksum must be set to 0 in the packet by
+ * the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_udptcp_cksum(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   cksum = ~cksum;
+
/*
-* Per RFC 768:If the computed checksum is zero for UDP,
+* Per RFC 768: If the computed checksum is zero for UDP,
 * it is transmitted as all ones
 * (the equivalent in one's complement arithmetic).
 */
if (cksum == 0 && ipv4_hdr->next_proto_id == IPPROTO_UDP)
cksum = 0x;
 
-   return (uint16_t)cksum;
+   return cksum;
+}
+
+/**
+ * Validate the IPv4 UDP or TCP checksum.
+ *
+ * In case of UDP, the caller must first check if udp_hdr->dgram_cksum is 0
+ * (i.e. no checksum).
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   Return 0 if the checksum is correct, else -1.
+ */
+__rte_experimental
+static inline int
+rte_ipv4_udptcp_cksum_verify(const struct rte_ipv4_hdr *ipv4_hdr,
+const void *l4_hdr)
+{
+   uint16_t cksum = __rte_ipv4_udptcp_cksum(ipv4_hdr, l4_hdr);
+
+   if (cksum != 0x)
+   return -1;
+
+   return 0;
 }
 
 /**
@@ -448,6 +487,25 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, 
uint64_t ol_flags)
return __rte_raw_cksum_reduce(sum);
 }
 
+/**
+ * @internal Calculate the non-complemented IPv6 L4 checksum
+ */
+static inline uint16_t
+__rte_ipv6_udptcp_cksum(const struct rte_ipv6_hdr *ipv6_hdr, const void 
*l4_hdr)
+{
+   uint32_

[dpdk-dev] [PATCH v2 4/4] test/cksum: new test for L3/L4 checksum API

2021-06-30 Thread Olivier Matz
Add a simple unit test for the checksum API.

Signed-off-by: Olivier Matz 
---
 MAINTAINERS   |   1 +
 app/test/autotest_data.py |   6 +
 app/test/meson.build  |   2 +
 app/test/test_cksum.c | 271 ++
 4 files changed, 280 insertions(+)
 create mode 100644 app/test/test_cksum.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..4347555ebc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1314,6 +1314,7 @@ Packet processing
 Network headers
 M: Olivier Matz 
 F: lib/net/
+F: app/test/test_cksum.c
 
 Packet CRC
 M: Jasvinder Singh 
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 11f9c8640c..302d6374c1 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -567,6 +567,12 @@
 "Func":default_autotest,
 "Report":  None,
 },
+{
+"Name":"Checksum autotest",
+"Command": "cksum_autotest",
+"Func":default_autotest,
+"Report":  None,
+},
 #
 #Please always keep all dump tests at the end and together!
 #
diff --git a/app/test/meson.build b/app/test/meson.build
index 0a5f425578..ef90b16f16 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -17,6 +17,7 @@ test_sources = files(
 'test_bitmap.c',
 'test_bpf.c',
 'test_byteorder.c',
+'test_cksum.c',
 'test_cmdline.c',
 'test_cmdline_cirbuf.c',
 'test_cmdline_etheraddr.c',
@@ -188,6 +189,7 @@ fast_tests = [
 ['atomic_autotest', false],
 ['bitops_autotest', true],
 ['byteorder_autotest', true],
+['cksum_autotest', true],
 ['cmdline_autotest', true],
 ['common_autotest', true],
 ['cpuflags_autotest', true],
diff --git a/app/test/test_cksum.c b/app/test/test_cksum.c
new file mode 100644
index 00..cd983d7c01
--- /dev/null
+++ b/app/test/test_cksum.c
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 6WIND S.A.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MEMPOOL_CACHE_SIZE  0
+#define MBUF_DATA_SIZE  256
+#define NB_MBUF 128
+
+/*
+ * Test L3/L4 checksum API.
+ */
+
+#define GOTO_FAIL(str, ...) do {   \
+   printf("cksum test FAILED (l.%d): <" str ">\n", \
+  __LINE__,  ##__VA_ARGS__);   \
+   goto fail;  \
+   } while (0)
+
+/* generated in scapy with Ether()/IP()/TCP())) */
+static const char test_cksum_ipv4_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x28, 0x00, 0x01, 0x00, 0x00, 0x40, 0x06,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x14, 0x00, 0x50, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x02,
+   0x20, 0x00, 0x91, 0x7c, 0x00, 0x00,
+
+};
+
+/* generated in scapy with Ether()/IPv6()/TCP()) */
+static const char test_cksum_ipv6_tcp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x14, 0x06, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x14,
+   0x00, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x50, 0x02, 0x20, 0x00, 0x8f, 0x7d,
+   0x00, 0x00,
+};
+
+/* generated in scapy with Ether()/IP()/UDP()/Raw('x')) */
+static const char test_cksum_ipv4_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x45, 0x00,
+   0x00, 0x1d, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+   0x7c, 0xcd, 0x7f, 0x00, 0x00, 0x01, 0x7f, 0x00,
+   0x00, 0x01, 0x00, 0x35, 0x00, 0x35, 0x00, 0x09,
+   0x89, 0x6f, 0x78,
+};
+
+/* generated in scapy with Ether()/IPv6()/UDP()/Raw('x')) */
+static const char test_cksum_ipv6_udp[] = {
+   0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x86, 0xdd, 0x60, 0x00,
+   0x00, 0x00, 0x00, 0x09, 0x11, 0x40, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x35,
+   0x00, 0x35, 0x00, 0x09, 0x87, 0x70, 0x78,
+};
+
+/* generated in scapy with Ether()/IP(options

Re: [dpdk-dev] [PATCH v2] net: prepare the outer ipv4 hdr for checksum

2021-06-30 Thread Olivier Matz
Hi Mohsin,

Hope you are fine!
Please see my comments below.

On Wed, Jun 30, 2021 at 01:04:04PM +0200, Mohsin Kazmi wrote:
> Re: [PATCH v2] net: prepare the outer ipv4 hdr for checksum

I suggest highlighting that this is the Intel-specific tx-prepare
function in the commit title. What about:

  net: fix Intel-specific Tx preparation for outer checksums

> Preparing the headers for hardware offload
> misses the outer IPv4 checksum offload.
> This results in a bad checksum computed by the hardware NIC.
> 
> This patch fixes the issue by setting the outer IPv4
> checksum field to 0.
> 
> Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Mohsin Kazmi 
> Acked-by: Qi Zhang 
> ---
> 
> v2:
> * Update the commit message with Fixes.
> ---
>  lib/net/rte_net.h | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
> index 434435ffa2..e47365099e 100644
> --- a/lib/net/rte_net.h
> +++ b/lib/net/rte_net.h
> @@ -128,8 +128,18 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, 
> uint64_t ol_flags)
>   if (!(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK | PKT_TX_TCP_SEG)))
>   return 0;

I think this test should be updated too with PKT_TX_OUTER_IP_CKSUM.

>  
> - if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6))
> + if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6)) {
>   inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> + /*
> +  * prepare outer ipv4 header checksum by setting it to 0,
> +  * in order to be computed by hardware NICs.
> +  */
> + if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
> + ipv4_hdr = rte_pktmbuf_mtod_offset(m,
> + struct rte_ipv4_hdr *, m->outer_l2_len);
> + ipv4_hdr->hdr_checksum = 0;
> + }
> + }

What about outer L4 checksum? Does it requires the same than inner?

>  
>   /*
>* Check if headers are fragmented.
> -- 
> 2.17.1
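
For context, the offload request this code path serves looks roughly
like the sketch below, assuming a VXLAN-encapsulated IPv4 packet and
using the pre-21.11 PKT_TX_* flag spellings from this thread:

    uint16_t nb;

    m->outer_l2_len = sizeof(struct rte_ether_hdr);
    m->outer_l3_len = sizeof(struct rte_ipv4_hdr);
    m->l2_len = sizeof(struct rte_udp_hdr) + sizeof(struct rte_vxlan_hdr) +
            sizeof(struct rte_ether_hdr);   /* encap + inner L2 */
    m->l3_len = sizeof(struct rte_ipv4_hdr);
    m->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM |
            PKT_TX_IPV4 | PKT_TX_IP_CKSUM;
    nb = rte_eth_tx_prepare(port_id, queue_id, &m, 1);
    if (nb != 1) {
            /* inspect rte_errno and drop or fix the packet */
    }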
> 


[dpdk-dev] [PATCH] test/mbuf: fix virtual address conversion

2021-07-05 Thread Olivier Matz
Seen with address sanitizer.

rte_mempool_virt2iova() can only be used on mempool elements. In this case,
it is incorrect, and rte_mem_virt2iova() has to be used.

Bugzilla ID: 737
Fixes: 7b295dceea07 ("test/mbuf: add unit test cases")
Cc: sta...@dpdk.org

Reported-by: Zhihong Peng 
Signed-off-by: Olivier Matz 
---
 app/test/test_mbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 8e0561eabb..9a248dfaea 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -2363,7 +2363,7 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool 
*pktmbuf_pool)
if (rte_mbuf_refcnt_read(m) != 1)
GOTO_FAIL("%s: Invalid refcnt in mbuf\n", __func__);
 
-   buf_iova = rte_mempool_virt2iova(ext_buf_addr);
+   buf_iova = rte_mem_virt2iova(ext_buf_addr);
rte_pktmbuf_attach_extbuf(m, ext_buf_addr, buf_iova, buf_len,
ret_shinfo);
if (m->ol_flags != EXT_ATTACHED_MBUF)
-- 
2.29.2
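
As a rule of thumb (illustrative sketch, not part of the patch):
rte_mempool_virt2iova() is only valid for an object that belongs to a
mempool, while rte_mem_virt2iova() works for any registered memory such
as an rte_malloc() buffer:

	/* obj was obtained with rte_mempool_get(): element lookup is valid */
	rte_iova_t obj_iova = rte_mempool_virt2iova(obj);

	/* ext_buf_addr comes from rte_malloc(): generic translation needed */
	rte_iova_t buf_iova = rte_mem_virt2iova(ext_buf_addr);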



Re: [PATCH v2 1/2] cmdline: add function to verify valid commands

2022-06-07 Thread Olivier Matz
Hi Bruce,

Just few minor comments below.

On Fri, May 20, 2022 at 04:12:39PM +0100, Bruce Richardson wrote:
> The cmdline library cmdline_parse() function parses a command and
> executes the action automatically too. The cmdline_valid_buffer function
> also uses this function to validate commands, meaning that there is no
> function to validate a command as ok without executing it.
> 
> To fix this omission, we extract the body of cmdline_parse into a new
> static inline function with an extra parameter to indicate whether the
> action should be performed or not. Then we create two wrappers around
> that - a replacement for the existing cmdline_parse function where the
> extra parameter is "true" to execute the command, and a new function
> "cmdline_parse_check" which passes the parameter as "false" to perform
> cmdline validation only.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  lib/cmdline/cmdline_parse.c | 20 +---
>  lib/cmdline/cmdline_parse.h | 17 +++--
>  lib/cmdline/version.map |  3 +++
>  3 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/cmdline/cmdline_parse.c b/lib/cmdline/cmdline_parse.c
> index 349ec87bd7..b7fdc67ae5 100644
> --- a/lib/cmdline/cmdline_parse.c
> +++ b/lib/cmdline/cmdline_parse.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -182,8 +183,8 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
>  }
>  
>  
> -int
> -cmdline_parse(struct cmdline *cl, const char * buf)
> +static inline int
> +__cmdline_parse(struct cmdline *cl, const char *buf, bool call_fn)
>  {
>   unsigned int inst_num=0;
>   cmdline_parse_inst_t *inst;
> @@ -284,7 +285,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
>  
>   /* call func */
>   if (f) {
> - f(result.buf, cl, data);
> + if (call_fn)
> + f(result.buf, cl, data);

Maybe nicer to test in one line:

if (f && call_fn)


>   }
>  
>   /* no match */
> @@ -296,6 +298,18 @@ cmdline_parse(struct cmdline *cl, const char * buf)
>   return linelen;
>  }
>  
> +int
> +cmdline_parse(struct cmdline *cl, const char *buf)
> +{
> + return __cmdline_parse(cl, buf, true);
> +}
> +
> +int
> +cmdline_parse_check(struct cmdline *cl, const char *buf)
> +{
> + return __cmdline_parse(cl, buf, false);
> +}
> +
>  int
>  cmdline_complete(struct cmdline *cl, const char *buf, int *state,
>char *dst, unsigned int size)
> diff --git a/lib/cmdline/cmdline_parse.h b/lib/cmdline/cmdline_parse.h
> index e4d802fff7..6dd210d843 100644
> --- a/lib/cmdline/cmdline_parse.h
> +++ b/lib/cmdline/cmdline_parse.h
> @@ -7,6 +7,8 @@
>  #ifndef _CMDLINE_PARSE_H_
>  #define _CMDLINE_PARSE_H_
>  
> +#include 
> +
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> @@ -149,11 +151,22 @@ typedef cmdline_parse_inst_t *cmdline_parse_ctx_t;
>   * argument buf must ends with "\n\0". The function returns
>   * CMDLINE_PARSE_AMBIGUOUS, CMDLINE_PARSE_NOMATCH or
>   * CMDLINE_PARSE_BAD_ARGS on error. Else it calls the associated
> - * function (defined in the context) and returns 0
> - * (CMDLINE_PARSE_SUCCESS).
> + * function (defined in the context) and returns the parsed line length (>= 
> 0)

Can we add a dot at the end?

>   */
>  int cmdline_parse(struct cmdline *cl, const char *buf);
>  
> +/**
> + * Try to parse a buffer according to the specified context, but do not
> + * perform any function calls if parse is successful.
> + *
> + * The argument buf must ends with "\n\0".
> + * The function returns CMDLINE_PARSE_AMBIGUOUS, CMDLINE_PARSE_NOMATCH or
> + * CMDLINE_PARSE_BAD_ARGS on error and returns the parsed line length (>=0).
> + * on successful parse

Same here.

> + */
> +__rte_experimental
> +int cmdline_parse_check(struct cmdline *cl, const char *buf);
> +
>  /**
>   * complete() must be called with *state==0 (try to complete) or
>   * with *state==-1 (just display choices), then called without
> diff --git a/lib/cmdline/version.map b/lib/cmdline/version.map
> index b9bbb87510..fc7fdd6ea4 100644
> --- a/lib/cmdline/version.map
> +++ b/lib/cmdline/version.map
> @@ -81,5 +81,8 @@ EXPERIMENTAL {
>   rdline_get_history_buffer_size;
>   rdline_get_opaque;
>  
> + # added in 22.07
> + cmdline_parse_check;
> +
>   local: *;
>  };
> -- 
> 2.34.1
> 

With these changes:
Acked-by: Olivier Matz 
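
For reference, the new function would be used like this (sketch, assuming
a cmdline context that is already populated):

	/* validate user input without running the matching action */
	if (cmdline_parse_check(cl, "dump_ring\n") < 0)
		printf("invalid command\n");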



Re: [PATCH v2 2/2] test: use cmdline library to validate args

2022-06-07 Thread Olivier Matz
On Fri, May 20, 2022 at 04:12:40PM +0100, Bruce Richardson wrote:
> When passing in test names to run via either the DPDK_TEST environment
> variable or via extra argv parameters, the checks run on those commands
> can miss valid commands that are registered with the cmdline library in
> the initial context used to set it up. This is seen in the fact that the
> "dump_*" set of commands are not callable via argv parameters, but can
> be called manually.
> 
> To fix this, just use the commandline library to validate each command
> before executing it, stopping execution when an error is encountered.
> This also has the benefit of not having the test binary drop to
> interactive mode if all commandline parameters given are invalid.
> 
> Fixes: 9b848774a5dc ("test: use env variable to run tests")
> Fixes: ace2f054ed43 ("test: take test names from command line")
> Bugzilla ID: 1002
> 
> Signed-off-by: Bruce Richardson 

Acked-by: Olivier Matz 


Minutes of Technical Board Meeting, 2022-06-01

2022-06-07 Thread Olivier Matz
Members Attending
=

9/11
- Aaron
- Bruce
- Hemant
- Jerin
- Kevin
- Maxime
- Olivier (chair)
- Stephen
- Thomas

NOTE: The technical board meetings are on every second Wednesday at
https://meet.jit.si/DPDK at 3 pm UTC. Meetings are public, and DPDK
community members are welcome to attend.

NOTE: Next meeting will be on Wednesday 2022-June-15 @3pm UTC, and will
be chaired by Stephen.

Agenda items


1) Update on the tech writer hire
-

We are in the process of recruiting a tech writer to enhance DPDK
documentation.

The work group is composed of Nathan, Bruce, Stephen, Thomas.

- 5 reasonable candidates among 17 applicants
- the list of tasks is defined, it has been estimated to ~250h
- the work should be spread over ~6 months to give enough time
  to the community for feedback
- after some time, if the community is satisfied, the writer can
  suggest new enhancements, reworks, or estimation updates
- in case the community is not satisfied, the contract could end
  before the end of the tasks

2) Discussions about alternatives to bug bounty to find bugs


These three ideas were mentioned:

- static analysis tools
- fuzz testing
- adding more tests to CI

Projects from Google Project Zero were also mentioned:
https://github.com/orgs/googleprojectzero/repositories

3) Reminder about API/ABI stability
---

Recently, the vector keyword was removed from rte_altivec.h:
http://git.dpdk.org/dpdk/commit/?id=64fcadeac0f

Since it is a minor (accepted) API breakage, it is a good opportunity
to give a reminder about the ABI/API process:

- API breakages are announced and can happen in minor versions
- ABI breakages are announced and can only happen in LTS releases

4) Removal of KNI
-

There is no more maintainer for KNI.

A progressive removal proposal was made:
- add a message at runtime and/or compilation to announce deprecation
- remove KNI example after 22.11
- remove lib + kmod from main repo for 23.11

Bruce recently submitted a doc patchset to explain how to replace
it by virtio-user:
https://patchwork.dpdk.org/project/dpdk/list/?series=23218

The status of pending patches is not obvious. Until now, it was not
announced that new patches won't be integrated. Thomas will open the
discussion.


Re: [PATCH v2 1/2] cmdline: add function to verify valid commands

2022-06-10 Thread Olivier Matz
On Fri, Jun 10, 2022 at 03:08:49PM +0100, Bruce Richardson wrote:
> On Tue, Jun 07, 2022 at 10:08:30AM +0200, Olivier Matz wrote:
> > Hi Bruce,
> > 
> > Just few minor comments below.
> > 
> > On Fri, May 20, 2022 at 04:12:39PM +0100, Bruce Richardson wrote:
> > > The cmdline library cmdline_parse() function parses a command and
> > > executes the action automatically too. The cmdline_valid_buffer function
> > > also uses this function to validate commands, meaning that there is no
> > > function to validate a command as ok without executing it.
> > > 
> > > To fix this omission, we extract the body of cmdline_parse into a new
> > > static inline function with an extra parameter to indicate whether the
> > > action should be performed or not. Then we create two wrappers around
> > > that - a replacement for the existing cmdline_parse function where the
> > > extra parameter is "true" to execute the command, and a new function
> > > "cmdline_parse_check" which passes the parameter as "false" to perform
> > > cmdline validation only.
> > > 
> > > Signed-off-by: Bruce Richardson 
> > > ---
> > >  lib/cmdline/cmdline_parse.c | 20 +---
> > >  lib/cmdline/cmdline_parse.h | 17 +++--
> > >  lib/cmdline/version.map |  3 +++
> > >  3 files changed, 35 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/lib/cmdline/cmdline_parse.c b/lib/cmdline/cmdline_parse.c
> > > index 349ec87bd7..b7fdc67ae5 100644
> > > --- a/lib/cmdline/cmdline_parse.c
> > > +++ b/lib/cmdline/cmdline_parse.c
> > > @@ -7,6 +7,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include 
> > >  
> > > @@ -182,8 +183,8 @@ match_inst(cmdline_parse_inst_t *inst, const char 
> > > *buf,
> > >  }
> > >  
> > >  
> > > -int
> > > -cmdline_parse(struct cmdline *cl, const char * buf)
> > > +static inline int
> > > +__cmdline_parse(struct cmdline *cl, const char *buf, bool call_fn)
> > >  {
> > >   unsigned int inst_num=0;
> > >   cmdline_parse_inst_t *inst;
> > > @@ -284,7 +285,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
> > >  
> > >   /* call func */
> > >   if (f) {
> > > - f(result.buf, cl, data);
> > > + if (call_fn)
> > > + f(result.buf, cl, data);
> > 
> > Maybe nicer to test in one line:
> > 
> > if (f && call_fn)
> > 
> 
> If we do so, then we need to also change the "else" leg to "else if

Oh yes I missed the else part!


> (!call_fn)" because we don't want to have the debug_printf being output in
> the case that we have call_fn == false. A better alternative is to slightly
> restructure the whole block, to have the error leg first, which removes the
> need for two condition checks before calling the function:
> 
> /* no match */
> if (f == NULL) {
> debug_printf("No match err=%d\n", err);
> return err;
> }
> 
> /* call func if requested*/
> if (call_fn)
> f(result.buf, cl, data);
> 
> return linelen;
> 
> I think this latter option is better, so will implement in v3.

Yes, it looks good to me, thanks!


> 
> > 
> > >   }
> > >  
> > >   /* no match */
> > > @@ -296,6 +298,18 @@ cmdline_parse(struct cmdline *cl, const char * buf)
> > >   return linelen;
> > >  }
> > >  
> > > +int
> > > +cmdline_parse(struct cmdline *cl, const char *buf)
> > > +{
> > > + return __cmdline_parse(cl, buf, true);
> > > +}
> > > +
> > > +int
> > > +cmdline_parse_check(struct cmdline *cl, const char *buf)
> > > +{
> > > + return __cmdline_parse(cl, buf, false);
> > > +}
> > > +
> > >  int
> > >  cmdline_complete(struct cmdline *cl, const char *buf, int *state,
> > >char *dst, unsigned int size)
> > > diff --git a/lib/cmdline/cmdline_parse.h b/lib/cmdline/cmdline_parse.h
> > > index e4d802fff7..6dd210d843 100644
> > > --- a/lib/cmdline/cmdline_parse.h
> > > +++ b/lib/cmdline/cmdline_parse.h
> > > @@ -7,6 +7,8 @@
> > >  #ifndef _CMDLINE_PARSE_H_
> > >  #define _CMDLINE_PARSE_H_
> > >  
> > > +#include 
> > > +

Re: [PATCH] mbuf: add mbuf physical address field to dynamic field

2022-07-01 Thread Olivier Matz
Hi,

On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
> On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> > If all devices are configured to run in IOVA mode as VA, physical
> > address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
> > space is free to use as a dynamic field. So a new dynamic field member
> > (dynfield2) is added in mbuf structure to make use of that space.
> > 
> > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
> > mbuf that can use dynfield2.
> > 
> > Signed-off-by: Shijith Thotton 
> > ---
> I disagree with this patch. The mbuf should always record the iova of the
> buffer directly, rather than forcing the drivers to query the EAL mode.
> This will likely also break all vector drivers right now, as they are
> sensitive to the mbuf layout and the position of the IOVA address in the
> buffer.

I have the same opinion as Stephen and Bruce. This field is widely used
in DPDK, I don't think it is a good idea to disable it if some conditions
are met.


Re: [PATCH v2 1/2] app/test: add cksum performance test

2022-07-11 Thread Olivier Matz
Hi Mattias,

Please see few comments below.

On Fri, Jul 08, 2022 at 02:56:07PM +0200, Mattias Rönnblom wrote:
> Add performance test for the rte_raw_cksum() function, which delegates
> the actual work to __rte_raw_cksum(), which in turn is used by other
> functions in need of Internet checksum calculation.
> 
> Signed-off-by: Mattias Rönnblom 
> 
> ---
> 
> v2:
>   * Added __rte_unused to unused volatile variable, to keep the Intel
> compiler happy.
> ---
>  MAINTAINERS|   1 +
>  app/test/meson.build   |   1 +
>  app/test/test_cksum_perf.c | 118 +
>  3 files changed, 120 insertions(+)
>  create mode 100644 app/test/test_cksum_perf.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c923712946..2a4c99e05a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1414,6 +1414,7 @@ Network headers
>  M: Olivier Matz 
>  F: lib/net/
>  F: app/test/test_cksum.c
> +F: app/test/test_cksum_perf.c
>  
>  Packet CRC
>  M: Jasvinder Singh 
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 431c5bd318..191db03d1d 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -18,6 +18,7 @@ test_sources = files(
>  'test_bpf.c',
>  'test_byteorder.c',
>  'test_cksum.c',
> +'test_cksum_perf.c',
>  'test_cmdline.c',
>  'test_cmdline_cirbuf.c',
>  'test_cmdline_etheraddr.c',
> diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c
> new file mode 100644
> index 00..bff73cb3bb
> --- /dev/null
> +++ b/app/test/test_cksum_perf.c
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test.h"
> +
> +#define NUM_BLOCKS (10)
> +#define ITERATIONS (100)

Parentheses can be safely removed

> +
> +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 };
> +
> +static __rte_noinline uint16_t
> +do_rte_raw_cksum(const void *buf, size_t len)
> +{
> + return rte_raw_cksum(buf, len);
> +}

I don't understand the need to have this wrapper, especially marked
__rte_noinline. What is the objective?

Note that when I remove the __rte_noinline, the performance is better
for size 20 and 21.

> +
> +static void
> +init_block(void *buf, size_t len)

Can buf be a (char *) instead?
It would avoid a cast below.

> +{
> + size_t i;
> +
> + for (i = 0; i < len; i++)
> + ((char *)buf)[i] = (uint8_t)rte_rand();
> +}
> +
> +static int
> +test_cksum_perf_size_alignment(size_t block_size, bool aligned)
> +{
> + char *data[NUM_BLOCKS];
> + char *blocks[NUM_BLOCKS];
> + unsigned int i;
> + uint64_t start;
> + uint64_t end;
> + /* Floating point to handle low (pseudo-)TSC frequencies */
> + double block_latency;
> + double byte_latency;
> + volatile __rte_unused uint64_t sum = 0;
> +
> + for (i = 0; i < NUM_BLOCKS; i++) {
> + data[i] = rte_malloc(NULL, block_size + 1, 0);
> +
> + if (data[i] == NULL) {
> + printf("Failed to allocate memory for block\n");
> + return TEST_FAILED;
> + }
> +
> + init_block(data[i], block_size + 1);
> +
> + blocks[i] = aligned ? data[i] : data[i] + 1;
> + }
> +
> + start = rte_rdtsc();
> +
> + for (i = 0; i < ITERATIONS; i++) {
> + unsigned int j;
> + for (j = 0; j < NUM_BLOCKS; j++)
> + sum += do_rte_raw_cksum(blocks[j], block_size);
> + }
> +
> + end = rte_rdtsc();
> +
> + block_latency = (end - start) / (double)(ITERATIONS * NUM_BLOCKS);
> + byte_latency = block_latency / block_size;
> +
> + printf("%-9s %10zd %19.1f %16.2f\n", aligned ? "Aligned" : "Unaligned",
> +block_size, block_latency, byte_latency);

When I run the test on my dev machine, I get the following results,
which are quite reproducible:

Aligned   20   10.4  0.52 (range is 0.48 - 0.52)
Unaligned 207.9  0.39 (range is 0.39 - 0.40)
...

If I increase the number of iterations, the first results
change significantly:

Aligned   208.2  0.42 (range is 0.41 - 0.42)
Unaligned 208.0  0.40 (always this value)

To have more precise tests with small size, would it make sense to
target a test time instead of a

Re: [PATCH v2 2/2] net: have checksum routines accept unaligned data

2022-07-11 Thread Olivier Matz
Hi,

On Fri, Jul 08, 2022 at 02:56:08PM +0200, Mattias Rönnblom wrote:
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
> 
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
> 
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
> 
> Performance impact seems to range from none to a very slight
> regression.
> 
> Bugzilla ID: 1035
> Cc: sta...@dpdk.org
> 
> ---

Using memcpy() looks to be a good solution to fix the issue, while avoiding a
branch and the __may_alias__.

I just have one minor comment below.

> 
> v2:
>   * Simplified the odd-length conditional (Morten Brørup).
> 
> Reviewed-by: Morten Brørup 
> 
> Signed-off-by: Mattias Rönnblom 
> ---
>  lib/net/rte_ip.h | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..a0334d931e 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -160,18 +160,21 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
>  static inline uint32_t
>  __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>  {
> - /* extend strict-aliasing rules */
> - typedef uint16_t __attribute__((__may_alias__)) u16_p;
> - const u16_p *u16_buf = (const u16_p *)buf;
> - const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> + const void *end;
>  
> - for (; u16_buf != end; ++u16_buf)
> - sum += *u16_buf;
> + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t));

What do you think about this form:

for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));

This also has the good property to solve the debate about the
spaces around the '/' :)
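
For example, RTE_ALIGN_FLOOR(1501, sizeof(uint16_t)) evaluates to 1500,
so the loop stops at the last complete 16-bit word and the odd trailing
byte is handled by the test below.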


> +  buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> + uint16_t v;
> +
> + memcpy(&v, buf, sizeof(uint16_t));
> + sum += v;
> + }
>  
>   /* if length is odd, keeping it byte order independent */
>   if (unlikely(len % 2)) {
>   uint16_t left = 0;
> - *(unsigned char *)&left = *(const unsigned char *)end;
> +
> + memcpy(&left, end, 1);
>   sum += left;
>   }
>  
> -- 
> 2.25.1
> 


Re: [PATCH v2 1/2] app/test: add cksum performance test

2022-07-11 Thread Olivier Matz
On Mon, Jul 11, 2022 at 10:42:37AM +, Mattias Rönnblom wrote:
> On 2022-07-11 11:47, Olivier Matz wrote:
> > Hi Mattias,
> > 
> > Please see few comments below.
> > 
> > On Fri, Jul 08, 2022 at 02:56:07PM +0200, Mattias Rönnblom wrote:
> >> Add performance test for the rte_raw_cksum() function, which delegates
> >> the actual work to __rte_raw_cksum(), which in turn is used by other
> >> functions in need of Internet checksum calculation.
> >>
> >> Signed-off-by: Mattias Rönnblom 
> >>
> >> ---
> >>
> >> v2:
> >>* Added __rte_unused to unused volatile variable, to keep the Intel
> >>  compiler happy.
> >> ---
> >>   MAINTAINERS|   1 +
> >>   app/test/meson.build   |   1 +
> >>   app/test/test_cksum_perf.c | 118 +
> >>   3 files changed, 120 insertions(+)
> >>   create mode 100644 app/test/test_cksum_perf.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index c923712946..2a4c99e05a 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -1414,6 +1414,7 @@ Network headers
> >>   M: Olivier Matz 
> >>   F: lib/net/
> >>   F: app/test/test_cksum.c
> >> +F: app/test/test_cksum_perf.c
> >>   
> >>   Packet CRC
> >>   M: Jasvinder Singh 
> >> diff --git a/app/test/meson.build b/app/test/meson.build
> >> index 431c5bd318..191db03d1d 100644
> >> --- a/app/test/meson.build
> >> +++ b/app/test/meson.build
> >> @@ -18,6 +18,7 @@ test_sources = files(
> >>   'test_bpf.c',
> >>   'test_byteorder.c',
> >>   'test_cksum.c',
> >> +'test_cksum_perf.c',
> >>   'test_cmdline.c',
> >>   'test_cmdline_cirbuf.c',
> >>   'test_cmdline_etheraddr.c',
> >> diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c
> >> new file mode 100644
> >> index 00..bff73cb3bb
> >> --- /dev/null
> >> +++ b/app/test/test_cksum_perf.c
> >> @@ -0,0 +1,118 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2022 Ericsson AB
> >> + */
> >> +
> >> +#include 
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "test.h"
> >> +
> >> +#define NUM_BLOCKS (10)
> >> +#define ITERATIONS (100)
> > 
> > Parentheses can be safely removed
> > 
> >> +
> >> +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 };
> >> +
> >> +static __rte_noinline uint16_t
> >> +do_rte_raw_cksum(const void *buf, size_t len)
> >> +{
> >> +  return rte_raw_cksum(buf, len);
> >> +}
> > 
> > I don't understand the need to have this wrapper, especially marked
> > __rte_noinline. What is the objective?
> > 
> 
> The intention is to prevent the compiler from performing unrolling and
> integrating/interleaving one cksum operation with the next buffer's in a
> way that wouldn't be feasible in a real application.
> 
> It will result in an overestimation of the cost for small cksums, so 
> it's still misleading, but in another direction. :)

OK, got it. I think it's fine like you did then.

> 
> > Note that when I remove the __rte_noinline, the performance is better
> > for size 20 and 21.
> > 
> >> +
> >> +static void
> >> +init_block(void *buf, size_t len)
> > 
> > Can buf be a (char *) instead?
> > It would avoid a cast below.
> > 
> 
> Yes.
> 
> >> +{
> >> +  size_t i;
> >> +
> >> +  for (i = 0; i < len; i++)
> >> +  ((char *)buf)[i] = (uint8_t)rte_rand();
> >> +}
> >> +
> >> +static int
> >> +test_cksum_perf_size_alignment(size_t block_size, bool aligned)
> >> +{
> >> +  char *data[NUM_BLOCKS];
> >> +  char *blocks[NUM_BLOCKS];
> >> +  unsigned int i;
> >> +  uint64_t start;
> >> +  uint64_t end;
> >> +  /* Floating point to handle low (pseudo-)TSC frequencies */
> >> +  double block_latency;
> >> +  double byte_latency;
> >> +  volatile __rte_unused uint64_t sum = 0;
> >> +
> >> +  for (i = 0; i < NUM_BLOCKS; i++) {
> 

Re: [PATCH v3 1/2] app/test: add cksum performance test

2022-07-11 Thread Olivier Matz
On Mon, Jul 11, 2022 at 02:11:31PM +0200, Mattias Rönnblom wrote:
> Add performance test for the rte_raw_cksum() function, which delegates
> the actual work to __rte_raw_cksum(), which in turn is used by other
> functions in need of Internet checksum calculation.
> 
> Signed-off-by: Mattias Rönnblom 

Acked-by: Olivier Matz 

Thank you!


Re: [PATCH v3 2/2] net: have checksum routines accept unaligned data

2022-07-11 Thread Olivier Matz
On Mon, Jul 11, 2022 at 02:11:32PM +0200, Mattias Rönnblom wrote:
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
> 
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
> 
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
> 
> Performance impact seems to range from none to a very slight
> regression.
> 
> Bugzilla ID: 1035
> Cc: sta...@dpdk.org

Fixes: 6006818cfb26 ("net: new checksum functions")

> ---
> 
> v3:
>   * Use RTE_ALIGN_FLOOR() in the pointer arithmetic (Olivier Matz).
> v2:
>   * Simplified the odd-length conditional (Morten Brørup).
> 
> Reviewed-by: Morten Brørup 
> 
> Signed-off-by: Mattias Rönnblom 

Acked-by: Olivier Matz 

Thank you!


Re: [dpdk-dev] [PATCH v5] eal: fix race in ctrl thread creation

2021-04-07 Thread Olivier Matz
Hi Luc,

On Wed, Apr 07, 2021 at 08:53:23AM -0400, Luc Pelletier wrote:
> The creation of control threads uses a pthread barrier for
> synchronization. This patch fixes a race condition where the pthread
> barrier could get destroyed while one of the threads has not yet
> returned from the pthread_barrier_wait function, which could result in
> undefined behaviour.
> 
> Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
> Cc: jianfeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Luc Pelletier 
> ---
> 
> Same as v4 except that I fixed 2 minor style issues flagged by patchwork.
> 
>  lib/librte_eal/common/eal_common_thread.c | 52 +++
>  1 file changed, 25 insertions(+), 27 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c 
> b/lib/librte_eal/common/eal_common_thread.c
> index 73a055902..c1044e795 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -170,11 +170,19 @@ struct rte_thread_ctrl_params {
>   void *(*start_routine)(void *);
>   void *arg;
>   pthread_barrier_t configured;
> + unsigned int refcnt;
>  };
>  
> +static void ctrl_params_free(struct rte_thread_ctrl_params *params)
> +{
> + if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) == 0) {
> + pthread_barrier_destroy(&params->configured);
> + free(params);
> + }
> +}
> +
>  static void *ctrl_thread_init(void *arg)
>  {
> - int ret;
>   struct internal_config *internal_conf =
>   eal_get_internal_configuration();
>   rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
> @@ -184,11 +192,8 @@ static void *ctrl_thread_init(void *arg)
>  
>   __rte_thread_init(rte_lcore_id(), cpuset);
>  
> - ret = pthread_barrier_wait(&params->configured);
> - if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> - pthread_barrier_destroy(&params->configured);
> - free(params);
> - }
> + pthread_barrier_wait(&params->configured);
> + ctrl_params_free(params);
>  
>   return start_routine(routine_arg);
>  }
> @@ -210,14 +215,15 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
> *name,
>  
>   params->start_routine = start_routine;
>   params->arg = arg;
> + params->refcnt = 2;
>  
> - pthread_barrier_init(&params->configured, NULL, 2);
> + ret = pthread_barrier_init(&params->configured, NULL, 2);
> + if (ret != 0)
> + goto fail_no_barrier;
>  
>   ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
> - if (ret != 0) {
> - free(params);
> - return -ret;
> - }
> + if (ret != 0)
> + goto fail_with_barrier;
>  
>   if (name != NULL) {
>   ret = rte_thread_setname(*thread, name);
> @@ -227,25 +233,17 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
> *name,
>   }
>  
>   ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
> - if (ret)
> - goto fail;
> + pthread_barrier_wait(&params->configured);
> + ctrl_params_free(params);
>  
> - ret = pthread_barrier_wait(&params->configured);
> - if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> - pthread_barrier_destroy(&params->configured);
> - free(params);
> - }
> + return -ret;

I think not killing the thread when pthread_setaffinity_np() returns an
error is not very understandable from the API user point of view.

What about doing this on top of your patch? The idea is to set
start_routine to NULL before the barrier if pthread_setaffinity_np()
failed. So there is no need to cancel the thread, it will exit by
itself.

  @@ -187,14 +187,18 @@ static void *ctrl_thread_init(void *arg)
  eal_get_internal_configuration();
  rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
  struct rte_thread_ctrl_params *params = arg;
  -   void *(*start_routine)(void *) = params->start_routine;
  +   void *(*start_routine)(void *);
  void *routine_arg = params->arg;
   
  __rte_thread_init(rte_lcore_id(), cpuset);
   
  pthread_barrier_wait(&params->configured);
  +   start_routine = params->start_routine;
  ctrl_params_free(params);
   
  +   if (start_routine == NULL)
  +   return NULL;
  +
  return start_routine(routine_arg);
   }
   
  @@ -233,10 +237,18 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
*name,
  }
   
  ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
  +   if (ret != 0)
  +   params->start_routine = NULL;
  +
  pthread_barrier_wait(&params->configured);
  ctrl_params_free(params);
   
  -   return -ret;
  +   if (ret != 0) {
  +   pthread_join(*thread, NULL);
  +   return -ret;
  +   }
  +
  +   return 0;
   
   fail_with_barrier:
  pthread_barrier_destroy(&params->configured);


Regards,
Olivier

Re: [dpdk-dev] [PATCH v2] lib/mempool: distinguish debug counters from cache and pool

2021-04-07 Thread Olivier Matz
Hi Joyce,

On Thu, Mar 18, 2021 at 07:20:22PM +0800, Joyce Kong wrote:
> If cache is enabled, objects will be retrieved/put from/to cache,
> subsequently from/to the common pool. Now the debug stats calculate
> the objects retrieved/put from/to cache and pool together, it is
> better to distinguish the data number from local cache and common
> pool.

This is indeed very useful information, thanks for proposing this.

Please see some comments below.

> Signed-off-by: Joyce Kong 
> ---
>  lib/librte_mempool/rte_mempool.c | 12 ++
>  lib/librte_mempool/rte_mempool.h | 64 ++--
>  2 files changed, 57 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c 
> b/lib/librte_mempool/rte_mempool.c
> index afb1239c8..9cb69367a 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -1244,8 +1244,14 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
>   sum.put_bulk += mp->stats[lcore_id].put_bulk;
>   sum.put_objs += mp->stats[lcore_id].put_objs;
> + sum.put_objs_cache += mp->stats[lcore_id].put_objs_cache;
> + sum.put_objs_pool += mp->stats[lcore_id].put_objs_pool;
> + sum.put_objs_flush += mp->stats[lcore_id].put_objs_flush;
>   sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
>   sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
> + sum.get_success_objs_cache += 
> mp->stats[lcore_id].get_success_objs_cache;
> + sum.get_success_objs_pool += 
> mp->stats[lcore_id].get_success_objs_pool;
> + sum.get_success_objs_refill += 
> mp->stats[lcore_id].get_success_objs_refill;
>   sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
>   sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs;
>   sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
> @@ -1254,8 +1260,14 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   fprintf(f, "  stats:\n");
>   fprintf(f, "put_bulk=%"PRIu64"\n", sum.put_bulk);
>   fprintf(f, "put_objs=%"PRIu64"\n", sum.put_objs);
> + fprintf(f, "put_objs_cache=%"PRIu64"\n", sum.put_objs_cache);
> + fprintf(f, "put_objs_pool=%"PRIu64"\n", sum.put_objs_pool);
> + fprintf(f, "put_objs_flush=%"PRIu64"\n", sum.put_objs_flush);
>   fprintf(f, "get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
>   fprintf(f, "get_success_objs=%"PRIu64"\n", sum.get_success_objs);
> + fprintf(f, "get_success_objs_cache=%"PRIu64"\n", 
> sum.get_success_objs_cache);
> + fprintf(f, "get_success_objs_pool=%"PRIu64"\n", 
> sum.get_success_objs_pool);
> + fprintf(f, "get_success_objs_refill=%"PRIu64"\n", 
> sum.get_success_objs_refill);
>   fprintf(f, "get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
>   fprintf(f, "get_fail_objs=%"PRIu64"\n", sum.get_fail_objs);
>   if (info.contig_block_size > 0) {
> diff --git a/lib/librte_mempool/rte_mempool.h 
> b/lib/librte_mempool/rte_mempool.h
> index c551cf733..29d80d97e 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -66,12 +66,18 @@ extern "C" {
>   * A structure that stores the mempool statistics (per-lcore).
>   */
>  struct rte_mempool_debug_stats {
> - uint64_t put_bulk; /**< Number of puts. */
> - uint64_t put_objs; /**< Number of objects successfully put. */
> - uint64_t get_success_bulk; /**< Successful allocation number. */
> - uint64_t get_success_objs; /**< Objects successfully allocated. */
> - uint64_t get_fail_bulk;/**< Failed allocation number. */
> - uint64_t get_fail_objs;/**< Objects that failed to be allocated. */
> + uint64_t put_bulk;/**< Number of puts. */
> + uint64_t put_objs;/**< Number of objects successfully 
> put. */
> + uint64_t put_objs_cache;  /**< Number of objects successfully 
> put to cache. */
> + uint64_t put_objs_pool;   /**< Number of objects successfully 
> put to pool. */
> + uint64_t put_objs_flush;  /**< Number of flushing objects from 
> cache to pool. */
> + uint64_t get_success_bulk;/**< Successful allocation number. */
> + uint64_t get_success_objs;/**< Objects successfully allocated. 
> */
> + uint64_t get_success_objs_cache;  /**< Objects successfully allocated 
> from cache. */
> + uint64_t get_success_objs_pool;   /**< Objects successfully allocated 
> from pool. */
> + uint64_t get_success_objs_refill; /**< Number of refilling objects from 
> pool to cache. */
> + uint64_t get_fail_bulk;   /**< Failed allocation number. */
> + uint64_t get_fail_objs;   /**< Objects that failed to be 
> allocated. */

What about having instead the following new stats:

Re: [dpdk-dev] [PATCH v6] eal: fix race in ctrl thread creation

2021-04-07 Thread Olivier Matz
Hi Luc,

On Wed, Apr 07, 2021 at 10:42:37AM -0400, Luc Pelletier wrote:
> The creation of control threads uses a pthread barrier for
> synchronization. This patch fixes a race condition where the pthread
> barrier could get destroyed while one of the threads has not yet
> returned from the pthread_barrier_wait function, which could result in
> undefined behaviour.
> 
> Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
> Cc: jianfeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Luc Pelletier 
> ---
> 
> Hi Olivier,
> 
> I've made the changes as you requested. However, I'm using the atomic
> built-ins for reading and writing start_routine; I think they're
> required to prevent any re-reordering.
> 
> Please let me know what you think.

From [1], it seems that pthread_barrier_wait() is a full memory barrier.
So while not wrong, I think using the atomic built-ins is not needed.

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11



> 
>  lib/librte_eal/common/eal_common_thread.c | 63 +--
>  1 file changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c 
> b/lib/librte_eal/common/eal_common_thread.c
> index 73a055902..fcb386f77 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -170,25 +170,34 @@ struct rte_thread_ctrl_params {
>   void *(*start_routine)(void *);
>   void *arg;
>   pthread_barrier_t configured;
> + unsigned int refcnt;
>  };
>  
> +static void ctrl_params_free(struct rte_thread_ctrl_params *params)
> +{
> + if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) == 0) {
> + pthread_barrier_destroy(&params->configured);
> + free(params);
> + }
> +}
> +
>  static void *ctrl_thread_init(void *arg)
>  {
> - int ret;
>   struct internal_config *internal_conf =
>   eal_get_internal_configuration();
>   rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
>   struct rte_thread_ctrl_params *params = arg;
> - void *(*start_routine)(void *) = params->start_routine;
> + void *(*start_routine)(void *);
>   void *routine_arg = params->arg;
>  
>   __rte_thread_init(rte_lcore_id(), cpuset);
>  
> - ret = pthread_barrier_wait(&params->configured);
> - if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> - pthread_barrier_destroy(&params->configured);
> - free(params);
> - }
> + pthread_barrier_wait(&params->configured);
> + start_routine = __atomic_load_n(&params->start_routine, 
> __ATOMIC_ACQUIRE);
> + ctrl_params_free(params);
> +
> + if (start_routine == NULL)
> + return NULL;
>  
>   return start_routine(routine_arg);
>  }
> @@ -210,14 +219,15 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
> *name,
>  
>   params->start_routine = start_routine;
>   params->arg = arg;
> + params->refcnt = 2;
>  
> - pthread_barrier_init(&params->configured, NULL, 2);
> + ret = pthread_barrier_init(&params->configured, NULL, 2);
> + if (ret != 0)
> + goto fail_no_barrier;
>  
>   ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
> - if (ret != 0) {
> - free(params);
> - return -ret;
> - }
> + if (ret != 0)
> + goto fail_with_barrier;
>  
>   if (name != NULL) {
>   ret = rte_thread_setname(*thread, name);
> @@ -227,25 +237,22 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
> *name,
>   }
>  
>   ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
> - if (ret)
> - goto fail;
> + if (ret != 0)
> + __atomic_store_n(&params->start_routine, NULL, 
> __ATOMIC_RELEASE);
> + pthread_barrier_wait(&params->configured);
> + ctrl_params_free(params);
>  
> - ret = pthread_barrier_wait(&params->configured);
> - if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> - pthread_barrier_destroy(&params->configured);
> - free(params);
> - }
> + if (ret != 0)
> + pthread_join(*thread, NULL);
>  
> - return 0;
> + return -ret;
> +
> +fail_with_barrier:
> + pthread_barrier_destroy(¶ms->configured);
> +
> +fail_no_barrier:
> + free(params);
>  
> -fail:
> - if (PTHREAD_BARRIER_SERIAL_THREAD ==
> - pthread_barrier_wait(¶ms->configured)) {
> - pthread_barrier_destroy(¶ms->configured);
> - free(params);
> - }
> - pthread_cancel(*thread);
> - pthread_join(*thread, NULL);
>   return -ret;
>  }
>  
> -- 
> 2.25.1
> 


Re: [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated

2021-04-08 Thread Olivier Matz
On Thu, Apr 01, 2021 at 11:52:39AM +0200, David Marchand wrote:
> PKT_RX_EIP_CKSUM_BAD has been declared deprecated quite some time ago,

It's not that old, it was done by Lance in commit e8a419d6de4b ("mbuf:
rename outer IP checksum macro") 1 month ago.

> but there was no warning to applications still using it.
> Fix this by marking as deprecated with the newly introduced
> RTE_DEPRECATED.
> 
> Signed-off-by: David Marchand 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags

2021-04-08 Thread Olivier Matz
On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > Tx offload flags are of the application responsibility.
> > Leave the mbuf alone and check for TSO where needed.
> > 
> > Signed-off-by: David Marchand 
> > ---
> 
> The patch looks good, but maybe a better approach would be
> to change the documentation to require the TCP_CKSUM flag
> when TCP_SEG is used, otherwise this flag adjusting needs
> to be replicated every time TCP_SEG is used.
> 
> The above could break existing applications, so perhaps doing
> something like below would be better and backwards compatible?
> Then we can remove those places tweaking the flags completely.

As a first step, I suggest to document that:
- applications must set TCP_CKSUM when setting TCP_SEG
- pmds must suppose that TCP_CKSUM is set when TCP_SEG is set

This is clearer than what we have today, and I think it does not break
anything. This will guide apps in the correct direction, facilitating
an eventual future PMD change.
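
In other words, an application requesting TSO would set (sketch, for the
IPv4 case; 'mss' is assumed to be computed by the application):

	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
		       PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG;
	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->l4_len = sizeof(struct rte_tcp_hdr); /* plus TCP options, if any */
	m->tso_segsz = mss;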

> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index c17dc95c5..6a0c2cdd9 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -298,7 +298,7 @@ extern "C" {
>   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
>   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
>   */
> -#define PKT_TX_TCP_SEG   (1ULL << 50)
> +#define PKT_TX_TCP_SEG   (1ULL << 50) | PKT_TX_TCP_CKSUM
>  
>  /** TX IEEE1588 packet to timestamp. */
>  #define PKT_TX_IEEE1588_TMST (1ULL << 51)

I'm afraid some applications or drivers use extended bit manipulations
to do the conversion from/to another domain (like hardware descriptors
or application-specific flags). They may expect this constant to be a
unique flag.


Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags

2021-04-08 Thread Olivier Matz
On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> Tx offload flags are of the application responsibility.
> Leave the mbuf alone and check for TSO where needed.
> 
> Signed-off-by: David Marchand 

Maybe the problem being solved should be better described in the commit
log. Is it a problem (other than cosmetic) to touch a mbuf in the Tx
function of a driver, where we could expect that the mbuf is owned by
the driver?

The only problem I can think about is in case we transmit a direct mbuf
whose refcnt is increased, but I wonder how much this is really
supported: for instance, several drivers add vlans using
rte_vlan_insert() in their Tx path.


> ---
>  drivers/net/tap/rte_eth_tap.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index c36d4bf76e..285fe395c5 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -562,6 +562,7 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned 
> int l2_len,
>   uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum)
>  {
>   void *l3_hdr = packet + l2_len;
> + uint64_t csum_l4;
>  
>   if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) {
>   struct rte_ipv4_hdr *iph = l3_hdr;
> @@ -571,13 +572,17 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, 
> unsigned int l2_len,
>   cksum = rte_raw_cksum(iph, l3_len);
>   iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum;
>   }
> - if (ol_flags & PKT_TX_L4_MASK) {
> +
> + csum_l4 = ol_flags & PKT_TX_L4_MASK;
> + if (ol_flags & PKT_TX_TCP_SEG)
> + csum_l4 |= PKT_TX_TCP_CKSUM;
> + if (csum_l4) {
>   void *l4_hdr;
>  
>   l4_hdr = packet + l2_len + l3_len;
> - if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM)
> + if (csum_l4 == PKT_TX_UDP_CKSUM)
>   *l4_cksum = &((struct rte_udp_hdr 
> *)l4_hdr)->dgram_cksum;
> - else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM)
> + else if (csum_l4 == PKT_TX_TCP_CKSUM)
>   *l4_cksum = &((struct rte_tcp_hdr *)l4_hdr)->cksum;
>   else
>   return;
> @@ -648,7 +653,8 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
>   if (txq->csum &&
>   ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) ||
>(mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM ||
> -  (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) {
> +  (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) ||
> +  (mbuf->ol_flags & PKT_TX_TCP_SEG))) {
>   is_cksum = 1;
>  
>   /* Support only packets with at least layer 4
> @@ -742,9 +748,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   if (tso) {
>   struct rte_gso_ctx *gso_ctx = &txq->gso_ctx;
>  
> - /* TCP segmentation implies TCP checksum offload */
> - mbuf_in->ol_flags |= PKT_TX_TCP_CKSUM;
> -
>   /* gso size is calculated without RTE_ETHER_CRC_LEN */
>   hdrs_len = mbuf_in->l2_len + mbuf_in->l3_len +
>   mbuf_in->l4_len;
> -- 
> 2.23.0
> 


Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path

2021-04-08 Thread Olivier Matz
Hi David,

On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> The vhost library current configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags.
> 
> The vhost example needs a reworking as it was built with the assumption
> that mbuf TSO configuration is set up by the vhost library.
> This is not done in this patch for now so TSO activation is forcibly
> refused.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Signed-off-by: David Marchand 
> ---

Reviewed-by: Olivier Matz 

LGTM, just one little comment below.

<...>

> + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> + m->packet_type = ptype;
> + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> + l4_supported = 1;
> +
> + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> + if (hdr->csum_start <= hdrlen && l4_supported) {
> + m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> + } else {
> + /* Unknown proto or tunnel, do sw cksum. We can assume
> +  * the cksum field is in the first segment since the
> +  * buffers we provided to the host are large enough.
> +  * In case of SCTP, this will be wrong since it's a CRC
> +  * but there's nothing we can do.
> +  */
> + uint16_t csum = 0, off;
> +
> + if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> + rte_pktmbuf_pkt_len(m) - hdr->csum_start,
> + &csum) < 0)
> + return -EINVAL;
> + if (likely(csum != 0xffff))
> + csum = ~csum;

I was trying to remember the reason for this last test (which is also
present in net/virtio).

If this is a UDP checksum (on top of an unrecognized tunnel), it's
indeed needed to do that, because we don't want to set the checksum to 0
in the packet (which means "no checksum" for UDPv4, or is forbidden for
UDPv6).

If this is something other than UDP, it shouldn't hurt to have a 0xffff
in the packet instead of 0.

Maybe it deserves a comment here, like:

  /* avoid 0 checksum for UDP, shouldn't hurt for other protocols */

What do you think?


Re: [dpdk-dev] [PATCH v3 2/4] mbuf: add packet type for UDP-ESP tunnel packets

2021-04-08 Thread Olivier Matz
On Thu, Apr 08, 2021 at 01:47:18PM +0530, Tejasree Kondoj wrote:
> Adding new mbuf packet type for UDP encapsulated
> ESP packets.
> 
> Signed-off-by: Tejasree Kondoj 
> ---
>  doc/guides/rel_notes/release_21_05.rst |  5 +
>  lib/librte_mbuf/rte_mbuf_ptype.h   | 21 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_05.rst 
> b/doc/guides/rel_notes/release_21_05.rst
> index 5565c7637c..c9e9e2ec22 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -55,6 +55,11 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
>  
> +* **Added new packet type for UDP-ESP packets in mbuf.**
> +
> +  Added new packet type ``RTE_PTYPE_TUNNEL_ESP_IN_UDP`` which can be
> +  used to identify UDP encapsulated ESP packets.
> +
>  * **Enhanced ethdev representor syntax.**
>  
>* Introduced representor type of VF, SF and PF.
> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h 
> b/lib/librte_mbuf/rte_mbuf_ptype.h
> index 17a2dd3576..bf92ce0c1a 100644
> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> @@ -491,6 +491,27 @@ extern "C" {
>   * | 'destination port'=6635>
>   */
>  #define RTE_PTYPE_TUNNEL_MPLS_IN_UDP  0xd000
> +/**
> + * ESP-in-UDP tunneling packet type (RFC 3948).
> + *
> + * Packet format:
> + * <'ether type'=0x0800
> + * | 'version'=4, 'protocol'=17
> + * | 'destination port'=4500>
> + * or,
> + * <'ether type'=0x86DD
> + * | 'version'=6, 'next header'=17
> + * | 'destination port'=4500>
> + * or,
> + * <'ether type'=0x0800
> + * | 'version'=4, 'protocol'=17
> + * | 'source port'=4500>
> + * or,
> + * <'ether type'=0x86DD
> + * | 'version'=6, 'next header'=17
> + * | 'source port'=4500>
> + */
> +#define RTE_PTYPE_TUNNEL_ESP_IN_UDP   0xe000
>  /**
>   * Mask of tunneling packet types.
>   */

We are reaching the end of the available values for tunnel packet types,
and there is another pending patch that needs another tunnel type.

As there is already a RTE_PTYPE_TUNNEL_ESP, what would you think about
trying to reuse it, and differentiate IP/ESP from IP/UDP/ESP by using
the L4 layer type (unknown vs udp)? Or maybe add RTE_PTYPE_L4_NONE.
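
With that scheme, an application could classify packets like this (hedged
sketch; the helpers are hypothetical and the L4 field is assumed to be the
discriminator):

	if ((ptype & RTE_PTYPE_TUNNEL_MASK) == RTE_PTYPE_TUNNEL_ESP) {
		if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP)
			handle_udp_encap_esp(m);	/* IP/UDP/ESP */
		else
			handle_esp(m);			/* IP/ESP */
	}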

It is a sensitive change, because it can be considered an API change for
current users of RTE_PTYPE_TUNNEL_ESP. I don't really know how this
type is used by applications.

I think it is time to start thinking about how the packet_type
mbuf API can evolve to solve this issue.

By the way, the update of rte_get_ptype_tunnel_name() is missing.


Re: [dpdk-dev] [PATCH v8 3/4] net: work around s_addr macro on Windows

2021-04-08 Thread Olivier Matz
On Thu, Apr 08, 2021 at 01:22:48AM +0300, Dmitry Kozlyuk wrote:
> Windows Sockets headers contain `#define s_addr S_un.S_addr`, which
> conflicts with definition of `s_addr` field of `struct rte_ether_hdr`.
> Prieviously `s_addr` was undefined in , which had been
> breaking access to `s_addr` field of `struct in_addr`, so some DPDK
> and Windows headers could not be included in one file.
> 
> Renaming of `struct rte_ether_hdr` is planned:
> https://mails.dpdk.org/archives/dev/2021-March/201444.html
> 
> Temporarily disable `s_addr` macro around `struct rte_ether_hdr`
> definition to avoid conflict. Place source MAC address in both `s_addr`
> and `S_un.S_addr` fields, so that access works either directly or
> through the macro as defined in Windows headers.
> 
> Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Ranjit Menon 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH v8 4/4] net: provide IP-related API on any OS

2021-04-08 Thread Olivier Matz
Hi Dmitry,

On Thu, Apr 08, 2021 at 01:22:49AM +0300, Dmitry Kozlyuk wrote:
> Users of  relied on it to provide IP-related defines,
> like IPPROTO_* constants, but still had to include POSIX headers
> for inet_pton() and other standard IP-related facilities.
> 
> Extend  so that it is a single header to gain access
> to IP-related facilities on any OS. Use it to replace POSIX includes
> in components enabled on Windows. Move missing constants from Windows
> networking shim to OS shim header and include it where needed.
> 
> Remove Windows networking shim that is no longer needed.
> 
> Signed-off-by: Dmitry Kozlyuk 
> ---
>  drivers/net/i40e/i40e_fdir.c |  1 +
>  drivers/net/mlx5/mlx5.h  |  1 -
>  drivers/net/mlx5/mlx5_flow.c |  4 +--
>  drivers/net/mlx5/mlx5_flow.h |  3 +-
>  drivers/net/mlx5/mlx5_mac.c  |  1 -
>  examples/cmdline/commands.c  |  5 ---
>  examples/cmdline/parse_obj_list.c|  2 --
>  lib/librte_cmdline/cmdline.c |  1 -
>  lib/librte_cmdline/cmdline_parse.c   |  2 --
>  lib/librte_cmdline/cmdline_parse_etheraddr.c |  6 
>  lib/librte_cmdline/cmdline_parse_ipaddr.c|  6 
>  lib/librte_cmdline/cmdline_parse_ipaddr.h|  2 +-
>  lib/librte_eal/windows/include/arpa/inet.h   | 30 
>  lib/librte_eal/windows/include/netinet/in.h  | 38 
>  lib/librte_eal/windows/include/netinet/ip.h  | 10 --
>  lib/librte_eal/windows/include/rte_os_shim.h |  8 +
>  lib/librte_eal/windows/include/sys/socket.h  | 24 -
>  lib/librte_ethdev/rte_ethdev.c   | 12 +++
>  lib/librte_ethdev/rte_ethdev_core.h  |  1 -
>  lib/librte_net/rte_ip.h  |  7 
>  lib/librte_net/rte_net.c |  1 +
>  21 files changed, 24 insertions(+), 141 deletions(-)
>  delete mode 100644 lib/librte_eal/windows/include/arpa/inet.h
>  delete mode 100644 lib/librte_eal/windows/include/netinet/in.h
>  delete mode 100644 lib/librte_eal/windows/include/netinet/ip.h
>  delete mode 100644 lib/librte_eal/windows/include/sys/socket.h

I see it has already been discussed for posix functions like close() or
strdup(), so I won't reopen the door too long ;)

Since DPDK is a network-oriented project, it provides network defines and
structures, prefixed with rte_. This API is in some respects more complete
than the one provided by the libc (for instance, more protocol headers are
available). So, to me, it would make sense to define RTE_IPPROTO_* and
replace usages of IPPROTO_*, and avoid inclusions of network libc
headers in DPDK code.

This can be done later, if there is a consensus.
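
For illustration only (hypothetical names, these constants do not exist
today):

	#define RTE_IPPROTO_TCP 6	/* would mirror the libc IPPROTO_TCP */
	#define RTE_IPPROTO_UDP 17	/* would mirror the libc IPPROTO_UDP */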

> diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
> index c572d003cb..e7361bf520 100644
> --- a/drivers/net/i40e/i40e_fdir.c
> +++ b/drivers/net/i40e/i40e_fdir.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  

If I understand the logic, rte_ip.h provides OS-specific IP-related
stuff (like IPPROTO_*), and rte_os_shim.h provides the POSIX definitions
that are missing after including rte_ip.h.

Would it make sense to include rte_os_shim.h from rte_ip.h, so that
including rte_ip.h is always sufficient? Or is it because we want to
avoid implicit inclusion of rte_os_shim.h?

Thanks,
Olivier


Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags

2021-04-08 Thread Olivier Matz
On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > Tx offload flags are of the application responsibility.
> > > > Leave the mbuf alone and check for TSO where needed.
> > > > 
> > > > Signed-off-by: David Marchand 
> > > > ---
> > > 
> > > The patch looks good, but maybe a better approach would be
> > > to change the documentation to require the TCP_CKSUM flag
> > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > to be replicated every time TCP_SEG is used.
> > > 
> > > The above could break existing applications, so perhaps doing
> > > something like below would be better and backwards compatible?
> > > Then we can remove those places tweaking the flags completely.
> > 
> > As a first step, I suggest to document that:
> > - applications must set TCP_CKSUM when setting TCP_SEG
> 
> That's what I suggest above.
> 
> > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> 
> But that keeps the problem of implying the TCP_CKSUM flag in
> various places.

Yes. What I propose is just a first step: better document what is the
current expected behavior, before doing something else.

> > This is clearer than what we have today, and I think it does not break
> > anything. This will guide apps in the correct direction, facilitating
> > an eventual future PMD change.
> > 
> > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h 
> > > b/lib/librte_mbuf/rte_mbuf_core.h
> > > index c17dc95c5..6a0c2cdd9 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -298,7 +298,7 @@ extern "C" {
> > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, 
> > > tso_segsz
> > >   */
> > > -#define PKT_TX_TCP_SEG   (1ULL << 50)
> > > +#define PKT_TX_TCP_SEG   (1ULL << 50) | PKT_TX_TCP_CKSUM
> > >  
> > >  /** TX IEEE1588 packet to timestamp. */
> > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > 
> > I'm afraid some applications or drivers use extended bit manipulations
> > to do the conversion from/to another domain (like hardware descriptors
> > or application-specific flags). They may expect this constant to be a
> > uniq flag.
> 
> Interesting, do you have an example? Because each flag still has a
> separate meaning.

Honestly no, I don't have any good example, just a (maybe unfounded) doubt.

I have in mind operations that are done with tables or vector
instructions inside the drivers, but this is mainly done for Rx, not Tx.
You can look at Tx functions like mlx5_set_cksum_table() or
nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
about.
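
As a side note for readers of this thread, the contract discussed above
looks like this from the application side (a minimal sketch using the
PKT_TX_* flag names of this period, assuming a plain IPv4/TCP packet with
headers at fixed offsets):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_tcp.h>

static void
prepare_tso(struct rte_mbuf *m, uint16_t mss)
{
	/* TCP_CKSUM is set together with TCP_SEG, so PMDs do not have
	 * to patch ol_flags behind the application's back. */
	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
		PKT_TX_TCP_SEG | PKT_TX_TCP_CKSUM;
	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->l4_len = sizeof(struct rte_tcp_hdr);
	m->tso_segsz = mss;
}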


Re: [dpdk-dev] [PATCH v4 1/2] ethdev: add new ext hdr for gtp psc

2021-04-08 Thread Olivier Matz
Hi Raslan,

On Sun, Apr 04, 2021 at 10:45:51AM +0300, Raslan Darawsheh wrote:
> Define new rte header for gtp PDU session container
> based on RFC 38415-g30

Do you have a link to this RFC?

> Signed-off-by: Raslan Darawsheh 
> ---
>  lib/librte_net/rte_gtp.h | 34 ++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/lib/librte_net/rte_gtp.h b/lib/librte_net/rte_gtp.h
> index 6a6f9b238d..088b0b5a53 100644
> --- a/lib/librte_net/rte_gtp.h
> +++ b/lib/librte_net/rte_gtp.h
> @@ -61,6 +61,40 @@ struct rte_gtp_hdr_ext_word {
>   uint8_t next_ext; /**< Next Extension Header Type. */
>  }  __rte_packed;
>  
> +/**
> + * Optional extension for GTP with next_ext set to 0x85
> + * defined based on RFC 38415-g30.
> + */
> +__extension__
> +struct rte_gtp_psc_hdr {
> + uint8_t ext_hdr_len; /**< PDU ext hdr len in multiples of 4 bytes */
> + uint8_t type:4; /**< PDU type */
> + uint8_t qmp:1; /**< Qos Monitoring Packet */
> + union {
> + struct {
> + uint8_t snp:1; /**< Sequence number presence */
> + uint8_t spare_dl1:2; /**< spare down link bits */
> + };
> + struct {
> + uint8_t dl_delay_ind:1; /**< dl delay result presence */
> + uint8_t ul_delay_ind:1; /**< ul delay result presence */
> + uint8_t snp_ul1:1; /**< Sequence number presence ul */
> + };
> + };
> + union {
> + struct {
> + uint8_t ppp:1; /**< Paging policy presence */
> + uint8_t rqi:1; /**< Reflective Qos Indicator */
> + };
> + struct {
> + uint8_t n_delay_ind:1; /**< N3/N9 delay result presence 
> */
> + uint8_t spare_ul2:1; /**< spare up link bits */
> + };
> + };
> + uint8_t qfi:6; /**< Qos Flow Identifier */
> + uint8_t data[0]; /**< data feilds */
> +} __rte_packed;

With this header, sizeof(rte_gtp_psc_hdr) = 5, is it really expected?

It would help to see the specification to have a better idea of how to
split, but a possible solution is to do something like this:

struct rte_gtp_psc_generic_hdr {
	uint8_t ext_hdr_len;
	uint8_t type:4;
	uint8_t qmp:1;
	uint8_t pad:3;
};

struct rte_gtp_psc_<type>_hdr {
	uint8_t ext_hdr_len;
	uint8_t type:4;
	uint8_t qmp:1;
	uint8_t snp:1;
	uint8_t spare_dl1:2;
	...
};

...

struct rte_gtp_psc_hdr {
	union {
		struct rte_gtp_psc_generic_hdr generic;
		struct rte_gtp_psc_<type0>_hdr <type0>;
		struct rte_gtp_psc_<type1>_hdr <type1>;
	};
};

Also, you need to take care of endianness.
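
For reference, the endianness-aware bitfield pattern used in other DPDK
network headers looks like this (a sketch only; field names are taken
from the discussion above):

#include <stdint.h>
#include <rte_byteorder.h>

struct example_hdr {
#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
	uint8_t type:4;
	uint8_t qmp:1;
	uint8_t spare:3;
#else	/* field order is reversed on little-endian hosts */
	uint8_t spare:3;
	uint8_t qmp:1;
	uint8_t type:4;
#endif
};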


Regards,
Olivier


Re: [dpdk-dev] [PATCH v4 1/2] ethdev: add new ext hdr for gtp psc

2021-04-08 Thread Olivier Matz
On Thu, Apr 08, 2021 at 12:37:27PM +, Raslan Darawsheh wrote:
> Hi Olivier,
> 
> > -Original Message-
> > From: Olivier Matz 
> > Sent: Thursday, April 8, 2021 3:30 PM
> > To: Raslan Darawsheh 
> > Cc: dev@dpdk.org; ferruh.yi...@intel.com; Ori Kam ;
> > andrew.rybche...@oktetlabs.ru; ivan.ma...@oktetlabs.ru;
> > ying.a.w...@intel.com; Slava Ovsiienko ; Shiri
> > Kuzin 
> > Subject: Re: [PATCH v4 1/2] ethdev: add new ext hdr for gtp psc
> > 
> > Hi Raslan,
> > 
> > On Sun, Apr 04, 2021 at 10:45:51AM +0300, Raslan Darawsheh wrote:
> > > Define new rte header for gtp PDU session container
> > > based on RFC 38415-g30
> > 
> > Do you have a link to this RFC?
> Yes sure,
> https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip
> 
> > 
> > > Signed-off-by: Raslan Darawsheh 
> > > ---
> > >  lib/librte_net/rte_gtp.h | 34 ++
> > >  1 file changed, 34 insertions(+)
> > >
> > > diff --git a/lib/librte_net/rte_gtp.h b/lib/librte_net/rte_gtp.h
> > > index 6a6f9b238d..088b0b5a53 100644
> > > --- a/lib/librte_net/rte_gtp.h
> > > +++ b/lib/librte_net/rte_gtp.h
> > > @@ -61,6 +61,40 @@ struct rte_gtp_hdr_ext_word {
> > >   uint8_t next_ext; /**< Next Extension Header Type. */
> > >  }  __rte_packed;
> > >
> > > +/**
> > > + * Optional extension for GTP with next_ext set to 0x85
> > > + * defined based on RFC 38415-g30.
> > > + */
> > > +__extension__
> > > +struct rte_gtp_psc_hdr {
> > > + uint8_t ext_hdr_len; /**< PDU ext hdr len in multiples of 4 bytes */
> > > + uint8_t type:4; /**< PDU type */
> > > + uint8_t qmp:1; /**< Qos Monitoring Packet */
> > > + union {
> > > + struct {
> > > + uint8_t snp:1; /**< Sequence number presence */
> > > + uint8_t spare_dl1:2; /**< spare down link bits */
> > > + };
> > > + struct {
> > > + uint8_t dl_delay_ind:1; /**< dl delay result presence
> > */
> > > + uint8_t ul_delay_ind:1; /**< ul delay result presence
> > */
> > > + uint8_t snp_ul1:1; /**< Sequence number presence
> > ul */
> > > + };
> > > + };
> > > + union {
> > > + struct {
> > > + uint8_t ppp:1; /**< Paging policy presence */
> > > + uint8_t rqi:1; /**< Reflective Qos Indicator */
> > > + };
> > > + struct {
> > > + uint8_t n_delay_ind:1; /**< N3/N9 delay result
> > presence */
> > > + uint8_t spare_ul2:1; /**< spare up link bits */
> > > + };
> > > + };
> > > + uint8_t qfi:6; /**< Qos Flow Identifier */
> > > + uint8_t data[0]; /**< data feilds */
> > > +} __rte_packed;
> > 
> > With this header, sizeof(rte_gtp_psc_hdr) = 5, is it really expected?
> The data[0] is variable length data, I guess I should send another version to 
> mention that in the comment maybe.
> The header size according to the spec should be 4 octets aligned in general.

What I wanted to highlight is that using unions of structs containing
bitfields does not work as you expect: each union is at least 1 byte.
This results in a structure that does not match the expected header.
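
To make this concrete, here is a small standalone program (a reduced
mirror of the proposed header, not DPDK code) showing the padding that
the anonymous unions introduce:

#include <stdint.h>
#include <stdio.h>

struct psc_like {
	uint8_t ext_hdr_len;	/* byte 0 */
	uint8_t type:4;		/* byte 1, 3 bits left unused */
	uint8_t qmp:1;
	union {			/* byte 2: a union member starts on a
				 * new byte, even in a packed struct */
		struct { uint8_t snp:1; uint8_t spare_dl1:2; };
		struct { uint8_t dl_delay_ind:1; uint8_t ul_delay_ind:1; };
	};
	union {			/* byte 3 */
		struct { uint8_t ppp:1; uint8_t rqi:1; };
		struct { uint8_t n_delay_ind:1; uint8_t spare_ul2:1; };
	};
	uint8_t qfi:6;		/* byte 4 */
} __attribute__((packed));

int main(void)
{
	printf("%zu\n", sizeof(struct psc_like));	/* prints 5 */
	return 0;
}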

> > 
> > It would help to see the specification to have a better idea of how to
> Sure, I've just posted the link above, please let me know of any suggestion 
> that you have, and I'll be glad to do accordingly.
> 
> > split, but a possible solution is to do something like this:
> > 
> > struct rte_gtp_psc_generic_hdr {
> > 	uint8_t ext_hdr_len;
> > 	uint8_t type:4;
> > 	uint8_t qmp:1;
> > 	uint8_t pad:3;
> > };
> > 
> > struct rte_gtp_psc_<type>_hdr {
> > 	uint8_t ext_hdr_len;
> > 	uint8_t type:4;
> > 	uint8_t qmp:1;
> > 	uint8_t snp:1;
> > 	uint8_t spare_dl1:2;
> > 	...
> > };
> > 
> > ...
> > 
> > struct rte_gtp_psc_hdr {
> > 	union {
> > 		struct rte_gtp_psc_generic_hdr generic;
> > 		struct rte_gtp_psc_<type0>_hdr <type0>;
> > 		struct rte_gtp_psc_<type1>_hdr <type1>;
> > 	};
> > };

From what I see in the documentation, I think this approach should
work. From afar, I suggest:

struct rte_gtp_psc_generic_hdr {
#if big endian
	uint8_t type:4;
	uint8_t qmp:1;
	uint8_t pad:3;
#else
	uint8_t pad:3;
	uint8_t qmp:1;
	uint8_t type:4;
#endif
};

struct rte_gtp_psc_type0_hdr {
#if big endian
	uint8_t type:4;
	uint8_t qmp:1;
	uint8_t snp:1;
	uint8_t spare:2;

	uint8_t ppp:1;
	...
#else
	uint8_t spare:2;
	uint8_t snp:1;
	uint8_t qmp:1;
	uint8_t type:4;

	...
#endif
	uint8_t data[0]; /* for variable fields */
};

struct rte_gtp_psc_type1_hdr {
... same for fixed fields of type1


uint8_t data[0]; /* for variable fields */
};

I don't see where the spec references ext_hdr_len.

Regards,
Olivier


Re: [dpdk-dev] [PATCH 1/2] eal: fix race in ctrl thread creation

2021-04-08 Thread Olivier Matz
On Wed, Apr 07, 2021 at 04:16:04PM -0400, Luc Pelletier wrote:
> The creation of control threads uses a pthread barrier for
> synchronization. This patch fixes a race condition where the pthread
> barrier could get destroyed while one of the threads has not yet
> returned from the pthread_barrier_wait function, which could result in
> undefined behaviour.
> 
> Fixes: 3a0d465 ("eal: fix use-after-free on control thread creation")
> Cc: jianfeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Luc Pelletier 

Acked-by: Olivier Matz 
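
For context, the ordering rule at stake can be illustrated with plain
pthreads (a standalone sketch of the hazard class only, not the EAL code
itself, which cannot simply join since the control thread keeps running):

#include <pthread.h>

static pthread_barrier_t barrier;

static void *
worker(void *arg)
{
	(void)arg;
	pthread_barrier_wait(&barrier);	/* sync point with the creator */
	return NULL;
}

int
main(void)
{
	pthread_t t;

	pthread_barrier_init(&barrier, NULL, 2);
	pthread_create(&t, NULL, worker, NULL);
	pthread_barrier_wait(&barrier);
	/* Join first: this guarantees the worker has fully returned from
	 * pthread_barrier_wait() before the barrier is destroyed.
	 * Destroying right after our own wait returns is the race. */
	pthread_join(&t, NULL);
	pthread_barrier_destroy(&barrier);
	return 0;
}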


Re: [dpdk-dev] [PATCH 2/2] eal: fix hang in ctrl thread creation error logic

2021-04-08 Thread Olivier Matz
Hi Luc,

On Wed, Apr 07, 2021 at 04:16:06PM -0400, Luc Pelletier wrote:
> The affinity of a control thread is set after it has been launched. If
> setting the affinity fails, pthread_cancel is called followed by a call
> to pthread_join, which can hang forever if the thread's start routine
> doesn't call a pthread cancellation point.
> 
> This patch modifies the logic so that the control thread exits
> gracefully if the affinity cannot be set successfully and removes the
> call to pthread_cancel.
> 
> Fixes: 6383d26 ("eal: set name when creating a control thread")
> Cc: olivier.m...@6wind.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Luc Pelletier 

Thank you for these 2 fixes. Note that the titles of your patches do not
contain the version (should have been v8?). I don't know how critical it
is for committers.

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags

2021-04-09 Thread Olivier Matz
On Thu, Apr 08, 2021 at 09:58:35AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> > On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > > Tx offload flags are of the application responsibility.
> > > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > > 
> > > > > > Signed-off-by: David Marchand 
> > > > > > ---
> > > > > 
> > > > > The patch looks good, but maybe a better approach would be
> > > > > to change the documentation to require the TCP_CKSUM flag
> > > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > > to be replicated every time TCP_SEG is used.
> > > > > 
> > > > > The above could break existing applications, so perhaps doing
> > > > > something like below would be better and backwards compatible?
> > > > > Then we can remove those places tweaking the flags completely.
> > > > 
> > > > As a first step, I suggest to document that:
> > > > - applications must set TCP_CKSUM when setting TCP_SEG
> > > 
> > > That's what I suggest above.
> > > 
> > > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > > 
> > > But that keeps the problem of implying the TCP_CKSUM flag in
> > > various places.
> > 
> > Yes. What I propose is just a first step: better document what is the
> > current expected behavior, before doing something else.
> > 
> > > > This is clearer than what we have today, and I think it does not break
> > > > anything. This will guide apps in the correct direction, facilitating
> > > > an eventual future PMD change.
> > > > 
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h 
> > > > > b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > index c17dc95c5..6a0c2cdd9 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > @@ -298,7 +298,7 @@ extern "C" {
> > > > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > > > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, 
> > > > > tso_segsz
> > > > >   */
> > > > > -#define PKT_TX_TCP_SEG   (1ULL << 50)
> > > > > +#define PKT_TX_TCP_SEG   (1ULL << 50) | PKT_TX_TCP_CKSUM
> > > > >  
> > > > >  /** TX IEEE1588 packet to timestamp. */
> > > > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > > > 
> > > > I'm afraid some applications or drivers use extended bit manipulations
> > > > to do the conversion from/to another domain (like hardware descriptors
> > > > or application-specific flags). They may expect this constant to be a
> > > > unique flag.
> > > 
> > > Interesting, do you have an example? Because each flag still has an
> > > separate meaning.
> > 
> > Honestly no, I don't have any good example, just a (maybe unfounded) doubt.
> > 
> > I have in mind operations that are done with tables or vector
> > instructions inside the drivers, but this is mainly done for Rx, not Tx.
> > You can look at Tx functions like mlx5_set_cksum_table() or
> > nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
> > enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
> > about.
> 
> I see your point. Going back to improving the documentation as a
> first step, what would be the next steps? Are we going to wait a few
> releases and then remove the flag tweaking code assuming that PMDs
> and apps are ok?

After this documentation step, in a few releases, we could relax the
constraint on PMDs: applications will be expected to set TCP_CKSUM when
TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
TCP_SEG is set. The documentation will be updated again.

This plan can be described in the deprecation notice, and later in the
release note.

How does it sound?


Re: [dpdk-dev] [PATCH v8 4/4] net: provide IP-related API on any OS

2021-04-09 Thread Olivier Matz
On Thu, Apr 08, 2021 at 10:51:34PM +0300, Dmitry Kozlyuk wrote:
> 2021-04-08 13:45 (UTC+0200), Olivier Matz:
> [...]
> > > diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
> > > index c572d003cb..e7361bf520 100644
> > > --- a/drivers/net/i40e/i40e_fdir.c
> > > +++ b/drivers/net/i40e/i40e_fdir.c
> > > @@ -22,6 +22,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > 
> > If I understand the logic, rte_ip.h provides OS-specific IP-related
> > stuff (like IPPROTO_*), and rte_os_shim.h provides the POSIX definitions
> > that are missing after including rte_ip.h.
> > 
> > Would it make sense to include rte_os_shim.h from rte_ip.h, so that
> > including rte_ip.h is always sufficient? Or is it because we want to
> > avoid implicit inclusion of rte_os_shim.h?
> 
> Yes, currently rte_os_shim.h is not exposed at all.
> If it ever is, this reason will still apply.

Ok, thank you for the clarification.

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH v4 3/5] kvargs: add get by key function

2021-04-11 Thread Olivier Matz
Hi Xueming,

On Sat, Apr 10, 2021 at 02:23:55PM +, Xueming Li wrote:
> Adds a new function to get the value of a specific key from a kvargs list.
> 
> Signed-off-by: Xueming Li 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/librte_kvargs/rte_kvargs.c | 20 
>  lib/librte_kvargs/rte_kvargs.h | 21 +
>  lib/librte_kvargs/version.map  |  3 +++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/lib/librte_kvargs/rte_kvargs.c b/lib/librte_kvargs/rte_kvargs.c
> index ffae8914cf..40e7670ab3 100644
> --- a/lib/librte_kvargs/rte_kvargs.c
> +++ b/lib/librte_kvargs/rte_kvargs.c
> @@ -203,6 +203,26 @@ rte_kvargs_free(struct rte_kvargs *kvlist)
>   free(kvlist);
>  }
>  
> +/* Lookup a value in an rte_kvargs list by its key. */
> +const char *
> +rte_kvargs_get(const struct rte_kvargs *kvlist, const char *key)
> +{
> + unsigned int i;
> +
> + if (!kvlist)
> + return NULL;
> + for (i = 0; i < kvlist->count; ++i) {
> + /* Allows key to be NULL. */
> + if (!key && !kvlist->pairs[i].key)
> + return kvlist->pairs[i].value;

Is it possible that kvlist->pairs[i].key == NULL? In which case can that happen?


Thanks,
Olivier


Re: [PATCH] mempool: test performance with constant n

2022-01-24 Thread Olivier Matz
Hi Morten,

Thank you for enhancing the mempool test. Please see some comments
below.

On Wed, Jan 19, 2022 at 12:37:32PM +0100, Morten Brørup wrote:
> "What gets measured gets done."
> 
> This patch adds mempool performance tests where the number of objects to
> put and get is constant at compile time, which may significantly improve
> the performance of these functions. [*]
> 
> Also, it is ensured that the array holding the object used for testing
> is cache line aligned, for maximum performance.
> 
> And finally, the following entries are added to the list of tests:
> - Number of kept objects: 512
> - Number of objects to get and to put: The number of pointers fitting
>   into a cache line, i.e. 8 or 16
> 
> [*] Some example performance test (with cache) results:
> 
> get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462
> 
> get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643
> 
> get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
> 
> Signed-off-by: Morten Brørup 
> ---
>  app/test/test_mempool_perf.c | 120 +--
>  1 file changed, 74 insertions(+), 46 deletions(-)
> 
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 87ad251367..ffefe934d5 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>   */
>  
>  #include 
> @@ -55,19 +56,24 @@
>   *
>   *  - Bulk get from 1 to 32
>   *  - Bulk put from 1 to 32
> + *  - Bulk get and put from 1 to 32, compile time constant
>   *
>   *- Number of kept objects (*n_keep*)
>   *
>   *  - 32
>   *  - 128
> + *  - 512
>   */
>  
>  #define N 65536
>  #define TIME_S 5
>  #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 128
> +#define MAX_KEEP 512
>  #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
>  
> +/* Number of pointers fitting into one cache line. */
> +#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE/sizeof(uintptr_t))
> +

nit: I think it's better to follow the coding rules and add spaces around
the '/', even if I can see that the line right above does not follow this
convention.

>  #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
>  #define RET_ERR() do { \
>   LOG_ERR();  \
> @@ -80,16 +86,16 @@
>   } while (0)
>  
>  static int use_external_cache;
> -static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
> +static unsigned int external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
>  
>  static uint32_t synchro;
>  
>  /* number of objects in one bulk operation (get or put) */
> -static unsigned n_get_bulk;
> -static unsigned n_put_bulk;
> +static int n_get_bulk;
> +static int n_put_bulk;
>  
>  /* number of objects retrieved from mempool before putting them back */
> -static unsigned n_keep;
> +static int n_keep;
>  
>  /* number of enqueues / dequeues */
>  struct mempool_test_stats {
> @@ -104,20 +110,43 @@ static struct mempool_test_stats stats[RTE_MAX_LCORE];
>   */
>  static void
>  my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
> - void *obj, unsigned i)
> + void *obj, unsigned int i)
>  {
>   uint32_t *objnum = obj;
>   memset(obj, 0, mp->elt_size);
>   *objnum = i;
>  }
>  
> +#define test_loop(x_keep, x_get_bulk, x_put_bulk)   \
> + for (i = 0; likely(i < (N/x_keep)); i++) {\
> + /* get x_keep objects by bulk of x_get_bulk */  \
> + for (idx = 0; idx < x_keep; idx += x_get_bulk) {\
> + ret = rte_mempool_generic_get(mp,   \
> + &obj_table[idx],\
> + x_get_bulk,  \
> + cache);  \
> + if (unlikely(ret < 0)) {\
> + rte_mempool_dump(stdout, mp);   \
> + GOTO_ERR(ret, out);  \
> + }  \
> + }  \
> + \
> + /* put the objects back by bulk of x_put_bulk */\
> + for (idx = 0; idx < x_keep; idx += x_put_bulk) {\
> + rte_mempool_generic_put(mp,  \
> + 

[PATCH] mempool: test performance with constant n

2022-01-24 Thread Olivier Matz
From: Morten Brørup 

"What gets measured gets done."

This patch adds mempool performance tests where the number of objects to
put and get is constant at compile time, which may significantly improve
the performance of these functions. [*]

Also, it is ensured that the array holding the object used for testing
is cache line aligned, for maximum performance.

And finally, the following entries are added to the list of tests:
- Number of kept objects: 512
- Number of objects to get and to put: The number of pointers fitting
  into a cache line, i.e. 8 or 16

[*] Some example performance test (with cache) results:

get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462

get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643

get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836

Signed-off-by: Morten Brørup 
Signed-off-by: Olivier Matz 
---

Hi Morten,

Here is the updated patch.

I launched the mempool_perf test on my desktop machine, but I don't
reproduce the numbers: constant or non-constant give almost the same rate
on my machine (it's even worse with constants). I tested with your initial
patch and with this one. Can you please try this patch, and/or give some
details about your test environment? Here is what I get:

with your patch:
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=152620236
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=144716595
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=306996838
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=287375359
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=977626723
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=963103944

with this patch:
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=156460646
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=142173798
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=312410111
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=281699942
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=983315247
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=950350638


v2:
- use a flag instead of a negative value to enable tests with
  compile-time constant
- use a static inline function instead of a macro (see the sketch below)
- remove some "noise" (do not change variable type when not required)
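
For archive readers, here is a minimal standalone illustration of why the
inline-function approach helps (not the patch code itself): when an
always-inline helper is called with literal burst sizes, the compiler can
specialize and fully unroll the copy loops, which is what the constant_n
variants of the test measure.

static inline __attribute__((always_inline)) void
copy_burst(void **dst, void * const *src, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)	/* unrolled when n is a literal */
		dst[i] = src[i];
}

void
copy_variable(void **dst, void * const *src, unsigned int n)
{
	copy_burst(dst, src, n);	/* n known only at run time */
}

void
copy_constant8(void **dst, void * const *src)
{
	copy_burst(dst, src, 8);	/* n is a compile-time constant */
}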


Thanks,
Olivier

 app/test/test_mempool_perf.c | 110 ---
 1 file changed, 77 insertions(+), 33 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 87ad251367..ce7c6241ab 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */

 #include 
@@ -55,19 +56,24 @@
  *
  *  - Bulk get from 1 to 32
  *  - Bulk put from 1 to 32
+ *  - Bulk get and put from 1 to 32, compile time constant
  *
  *- Number of kept objects (*n_keep*)
  *
  *  - 32
  *  - 128
+ *  - 512
  */

 #define N 65536
 #define TIME_S 5
 #define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 128
+#define MAX_KEEP 512
 #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)

+/* Number of pointers fitting into one cache line. */
+#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
+
 #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
 #define RET_ERR() do { \
LOG_ERR();  \
@@ -91,6 +97,9 @@ static unsigned n_put_bulk;
 /* number of objects retrieved from mempool before putting them back */
 static unsigned n_keep;

+/* true if we want to test with constant n_get_bulk and n_put_bulk */
+static int use_constant_values;
+
 /* number of enqueues / dequeues */
 struct mempool_test_stats {
uint64_t enq_count;
@@ -111,11 +120,43 @@ my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
*objnum = i;
 }

+static __rte_always_inline int
+test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
+ unsigned int x_keep, unsigned int x_get_bulk, 

Re: [PATCH] mempool: test performance with constant n

2022-01-24 Thread Olivier Matz
On Mon, Jan 24, 2022 at 03:53:09PM +0100, Olivier Matz wrote:
> From: Morten Brørup 
> 
> "What gets measured gets done."
> 
> This patch adds mempool performance tests where the number of objects to
> put and get is constant at compile time, which may significantly improve
> the performance of these functions. [*]
> 
> Also, it is ensured that the array holding the object used for testing
> is cache line aligned, for maximum performance.
> 
> And finally, the following entries are added to the list of tests:
> - Number of kept objects: 512
> - Number of objects to get and to put: The number of pointers fitting
>   into a cache line, i.e. 8 or 16
> 
> [*] Some example performance test (with cache) results:
> 
> get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462
> 
> get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643
> 
> get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
> 
> Signed-off-by: Morten Brørup 
> Signed-off-by: Olivier Matz 

Sorry, wrong title, "v2" is missing. Please ignore, I'm resending it.


[PATCH v2] mempool: test performance with constant n

2022-01-24 Thread Olivier Matz
From: Morten Brørup 

"What gets measured gets done."

This patch adds mempool performance tests where the number of objects to
put and get is constant at compile time, which may significantly improve
the performance of these functions. [*]

Also, it is ensured that the array holding the object used for testing
is cache line aligned, for maximum performance.

And finally, the following entries are added to the list of tests:
- Number of kept objects: 512
- Number of objects to get and to put: The number of pointers fitting
  into a cache line, i.e. 8 or 16

[*] Some example performance test (with cache) results:

get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462

get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643

get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836

Signed-off-by: Morten Brørup 
Signed-off-by: Olivier Matz 
---

Hi Morten,

Here is the updated patch.

I launched the mempool_perf test on my desktop machine, but I don't
reproduce the numbers: constant or non-constant give almost the same rate
on my machine (it's even worse with constants). I tested with your initial
patch and with this one. Can you please try this patch, and/or give some
details about your test environment? Here is what I get:

with your patch:
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=152620236
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=144716595
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=306996838
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=287375359
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=977626723
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=963103944

with this patch:
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=156460646
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=142173798
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=312410111
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=281699942
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=983315247
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=950350638


v2:
- use a flag instead of a negative value to enable tests with
  compile-time constant
- use a static inline function instead of a macro
- remove some "noise" (do not change variable type when not required)


Thanks,
Olivier


 app/test/test_mempool_perf.c | 110 ---
 1 file changed, 77 insertions(+), 33 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 87ad251367..ce7c6241ab 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */

 #include 
@@ -55,19 +56,24 @@
  *
  *  - Bulk get from 1 to 32
  *  - Bulk put from 1 to 32
+ *  - Bulk get and put from 1 to 32, compile time constant
  *
  *- Number of kept objects (*n_keep*)
  *
  *  - 32
  *  - 128
+ *  - 512
  */

 #define N 65536
 #define TIME_S 5
 #define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 128
+#define MAX_KEEP 512
 #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)

+/* Number of pointers fitting into one cache line. */
+#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
+
 #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
 #define RET_ERR() do { \
LOG_ERR();  \
@@ -91,6 +97,9 @@ static unsigned n_put_bulk;
 /* number of objects retrieved from mempool before putting them back */
 static unsigned n_keep;

+/* true if we want to test with constant n_get_bulk and n_put_bulk */
+static int use_constant_values;
+
 /* number of enqueues / dequeues */
 struct mempool_test_stats {
uint64_t enq_count;
@@ -111,11 +120,43 @@ my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
*objnum = i;
 }

+static __rte_always_inline int
+test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
+ unsigned int x_keep, unsigned int x_get_bulk, 

Re: [PATCH] mempool: fix get objects from mempool with cache

2022-01-24 Thread Olivier Matz
Hi Morten,

Few comments below.

On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> A flush threshold for the mempool cache was introduced in DPDK version
> 1.3, but rte_mempool_do_generic_get() was not completely updated back
> then, and some inefficiencies were introduced.
> 
> This patch fixes the following in rte_mempool_do_generic_get():
> 
> 1. The code that initially screens the cache request was not updated
> with the change in DPDK version 1.3.
> The initial screening compared the request length to the cache size,
> which was correct before, but became irrelevant with the introduction of
> the flush threshold. E.g. the cache can hold up to flushthresh objects,
> which is more than its size, so some requests were not served from the
> cache, even though they could be.
> The initial screening has now been corrected to match the initial
> screening in rte_mempool_do_generic_put(), which verifies that a cache
> is present, and that the length of the request does not overflow the
> memory allocated for the cache.
> 
> 2. The function is a helper for rte_mempool_generic_get(), so it must
> behave according to the description of that function.
> Specifically, objects must first be returned from the cache,
> subsequently from the ring.
> After the change in DPDK version 1.3, this was not the behavior when
> the request was partially satisfied from the cache; instead, the objects
> from the ring were returned ahead of the objects from the cache. This is
> bad for CPUs with a small L1 cache, which benefit from having the hot
> objects first in the returned array. (This is also the reason why
> the function returns the objects in reverse order.)
> Now, all code paths first return objects from the cache, subsequently
> from the ring.
> 
> 3. If the cache could not be backfilled, the function would attempt
> to get all the requested objects from the ring (instead of only the
> number of requested objects minus the objects available in the ring),
> and the function would fail if that failed.
> Now, the first part of the request is always satisfied from the cache,
> and if the subsequent backfilling of the cache from the ring fails, only
> the remaining requested objects are retrieved from the ring.

This is the only point I'd consider to be a fix. The problem, from the
user perspective, is that a get() can fail even though there are enough
objects in cache + common pool.

To be honest, I feel a bit uncomfortable having such a list of
problems solved in one commit, even if I understand that they are part
of the same code rework.

Ideally, this fix should be a separate commit. What do you think of
having this simple patch for this fix, and then do the
optimizations/rework in another commit?

  --- a/lib/mempool/rte_mempool.h
  +++ b/lib/mempool/rte_mempool.h
  @@ -1484,7 +1484,22 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
   		 * the ring directly. If that fails, we are truly out of
   		 * buffers.
   		 */
  -		goto ring_dequeue;
  +		req = n - cache->len;
  +		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, req);
  +		if (ret < 0) {
  +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
  +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
  +			return ret;
  +		}
  +		obj_table += req;
  +		len = cache->len;
  +		while (len > 0)
  +			*obj_table++ = cache_objs[--len];
  +		cache->len = 0;
  +		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
  +		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
  +
  +		return 0;
   	}
  
   	cache->len += req;

The title of this commit could then describe the solved issue more
precisely.

> 4. The code flow for satisfying the request from the cache was slightly
> inefficient:
> The likely code path where the objects are simply served from the cache
> was treated as unlikely. Now it is treated as likely.
> And in the code path where the cache was backfilled first, numbers were
> added and subtracted from the cache length; now this code path simply
> sets the cache length to its final value.
> 
> 5. Some comments were not correct anymore.
> The comments have been updated.
> Most importantly, the description of the successful return value was
> inaccurate. Success only returns 0, not >= 0.
> 
> Signed-off-by: Morten Brørup 
> ---
>  lib/mempool/rte_mempool.h | 81 ---
>  1 file changed, 59 insertions(+), 22 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..88f1b8b7ab 100644
> --- a/lib/mempool/rte_mempool.h

Re: [PATCH v3] mempool: fix put objects to mempool with cache

2022-01-24 Thread Olivier Matz
Hi Morten,

On Wed, Jan 19, 2022 at 04:03:01PM +0100, Morten Brørup wrote:
> mempool: fix put objects to mempool with cache
> 
> This patch optimizes the rte_mempool_do_generic_put() caching algorithm,
> and fixes a bug in it.

I think we should avoid grouping fixes and optimizations in one
patch. The main reason is that fixes aim to be backported, which
is not the case for optimizations.

> The existing algorithm was:
>  1. Add the objects to the cache
>  2. Anything greater than the cache size (if it crosses the cache flush
> threshold) is flushed to the ring.
> 
> Please note that the description in the source code said that it kept
> "cache min value" objects after flushing, but the function actually kept
> "size" objects, which is reflected in the above description.
> 
> Now, the algorithm is:
>  1. If the objects cannot be added to the cache without crossing the
> flush threshold, flush the cache to the ring.
>  2. Add the objects to the cache.
> 
> This patch changes these details:
> 
> 1. Bug: The cache was still full after flushing.
> In the opposite direction, i.e. when getting objects from the cache, the
> cache is refilled to full level when it crosses the low watermark (which
> happens to be zero).
> Similarly, the cache should be flushed to empty level when it crosses
> the high watermark (which happens to be 1.5 x the size of the cache).
> The existing flushing behaviour was suboptimal for real applications,
> because crossing the low or high watermark typically happens when the
> application is in a state where the number of put/get events are out of
> balance, e.g. when absorbing a burst of packets into a QoS queue
> (getting more mbufs from the mempool), or when a burst of packets is
> trickling out from the QoS queue (putting the mbufs back into the
> mempool).
> NB: When the application is in a state where put/get events are in
> balance, the cache should remain within its low and high watermarks, and
> the algorithms for refilling/flushing the cache should not come into
> play.
> Now, the mempool cache is completely flushed when crossing the flush
> threshold, so only the newly put (hot) objects remain in the mempool
> cache afterwards.

I'm not sure we should call this behavior a bug. What is the impact
on applications, from a user perspective? Can it break a use case, or
have a significant performance impact?


> 2. Minor bug: The flush threshold comparison has been corrected; it must
> be "len > flushthresh", not "len >= flushthresh".
> Reasoning: Consider a flush multiplier of 1 instead of 1.5; the cache
> would be flushed already when reaching size elements, not when exceeding
> size elements.
> Now, flushing is triggered when the flush threshold is exceeded, not
> when reached.

Same here, we should ask ourselves what the impact is before calling
it a bug.


> 3. Optimization: The most recent (hot) objects are flushed, leaving the
> oldest (cold) objects in the mempool cache.
> This is bad for CPUs with a small L1 cache, because when they get
> objects from the mempool after the mempool cache has been flushed, they
> get cold objects instead of hot objects.
> Now, the existing (cold) objects in the mempool cache are flushed before
> the new (hot) objects are added the to the mempool cache.
> 
> 4. Optimization: Using the x86 variant of rte_memcpy() is inefficient
> here, where n is relatively small and unknown at compile time.
> Now, it has been replaced by an alternative copying method, optimized
> for the fact that most Ethernet PMDs operate in bursts of 4 or 8 mbufs
> or multiples thereof.

For these optimizations, do you have an idea of what the performance
gain is? Ideally (I understand it is not always possible), each optimization
is done separately, and its impact is measured.


> v2 changes:
> 
> - Not adding the new objects to the mempool cache before flushing it
> also allows the memory allocated for the mempool cache to be reduced
> from 3 x to 2 x RTE_MEMPOOL_CACHE_MAX_SIZE.
> However, such a change would break the ABI, so it was removed in v2.
> 
> - The mempool cache should be cache line aligned for the benefit of the
> copying method, which on some CPU architectures performs worse on data
> crossing a cache boundary.
> However, such a change would break the ABI, so it was removed in v2;
> and yet another alternative copying method replaced the rte_memcpy().

OK, we may want to keep this in mind for the next ABI breakage.


> 
> v3 changes:
> 
> - Actually remove my modifications of the rte_mempool_cache structure.
> 
> Signed-off-by: Morten Brørup 
> ---
>  lib/mempool/rte_mempool.h | 51 +--
>  1 file changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..7b364cfc74 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1334,6 +1334,7 @@ static __rte_always_inline void
>  rte_mempool_do_generic_put(struct rte_mem

Re: [RFC] mempool: modify flush threshold

2022-01-24 Thread Olivier Matz
On Mon, Jan 10, 2022 at 09:40:48AM +, Bruce Richardson wrote:
> On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > > Sent: Friday, 7 January 2022 16.12
> > > 
> > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > Hi mempool maintainers and DPDK team.
> > > >
> > > > Does anyone know the reason or history why
> > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> > > counterintuitive.
> > > >
> > > > The mempool cache flush threshold was introduced in DPDK version 1.3;
> > > it was not in DPDK version 1.2. The copyright notice for rte_mempool.c
> > > says year 2012.
> > > >
> > > >
> > > > Here is my analysis:
> > > >
> > > > With the multiplier of 1.5, a mempool cache is allowed to be filled
> > > up to 50 % above than its target size before its excess entries are
> > > flushed to the mempool (thereby reducing the cache length to the target
> > > size).
> > > >
> > > > In the opposite direction, a mempool cache is allowed to be drained
> > > completely, i.e. up to 100 % below its target size.
> > > >
> > > > My instinct tells me that it would be more natural to let a mempool
> > > cache go the same amount above and below its target size, i.e. using a
> > > flush multiplier of 2 instead of 1.5.
> > > >
> > > > Also, the cache should be allowed to fill up to and including the
> > > flush threshold, so it is flushed when the threshold is exceeded,
> > > instead of when it is reached.
> > > >
> > > > Here is a simplified example:
> > > >
> > > > Imagine a cache target size of 32, corresponding to a typical packet
> > > burst. With a flush threshold of 2 (and len > threshold instead of len
> > > >= threshold), the cache could hold 1 +/-1 packet bursts. With the
> > > current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> > > really providing a lot of elasticity.
> > > >
> > > Hi Morten,
> > > 
> > > Interesting to see this being looked at again. The original idea of
> > > adding in some extra room above the requested value was to avoid the
> > > worst-case scenario of a pool oscillating between full and empty
> > > repeatedly due to the addition/removal of perhaps a single packet. As
> > > for why 1.5 was chosen as the value, I don't recall any particular
> > > reason for it myself. The main objective was to have separate flush
> > > and size values so that we could go a bit above full, and when
> > > flushing, not empty the entire cache down to zero.
> > 
> > Thanks for providing the historical background for this feature, Bruce.
> > 
> > > 
> > > In terms of the behavioural points you make above, I wonder if
> > > symmetry is actually necessary or desirable in this case. After all,
> > > the ideal case is probably to keep the mempool neither full nor empty,
> > > so that both allocations and frees can be done without having to go to
> > > the underlying shared data structure. To accommodate this, the mempool
> > > will only flush when the number of elements is greater than size * 1.5,
> > > and then it only flushes elements down to size, ensuring that
> > > allocations can still take place. On allocation, new buffers are taken
> > > when we don't have enough in the cache to fulfil the request, and then
> > > the cache is filled up to size, not to the flush threshold.
> > 
> > I agree with the ideal case.
> > 
> > However, it looks like the addition of the flush threshold also changed the 
> > "size" parameter to effectively become "desired length" instead. This 
> > interpretation is also supported by the flush algorithm, which doesn't 
> > flush below the "size", but to the "size". So based on interpretation, I 
> > was wondering why it is not symmetrical around the "desired length", but 
> > allowed to go 100 % below and only 50 % above.
> > 
> > > 
> > > Now, for the scenario you describe - where the mempool cache size is
> > > set to be the same as the burst size - this scheme probably does break
> > > down, in that we don't really have any burst elasticity. However, I
> > > would query if that is a configuration that is used, since to the user
> > > it should appear correctly to provide no elasticity. Looking at testpmd
> > > and our example apps, the standard there is a burst size of 32 and a
> > > mempool cache of ~256. In OVS code, netdev-dpdk.c seems to initialize
> > > the mempool with cache size of RTE_MEMPOOL_CACHE_MAX_SIZE (through
> > > define MP_CACHE_SZ). In all these cases, I think the 1.5 threshold
> > > should work just fine for us.
> > 
> > My example was for demonstration only, and hopefully not being used by any 
> > applications.
> > 
> > The simplified example was intended to demonstrate the theoretical effect 
> > of the unbalance in using the 1.5 threshold. It will be similar with a 
> > cache size of 256 objects: You will be allowed to go 8 b

Re: [PATCH] mempool: fix get objects from mempool with cache

2022-01-24 Thread Olivier Matz
On Mon, Jan 24, 2022 at 04:38:58PM +0100, Olivier Matz wrote:
> On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -1443,6 +1443,10 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
> >  
> >  /**
> >   * @internal Get several objects from the mempool; used internally.
> > + *
> > + * If cache is enabled, objects are returned from the cache in Last In 
> > First
> > + * Out (LIFO) order for the benefit of CPUs with small L1 cache.
> > + *
> >   * @param mp
> >   *   A pointer to the mempool structure.
> >   * @param obj_table
> > @@ -1452,7 +1456,7 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
> >   * @param cache
> >   *   A pointer to a mempool cache structure. May be NULL if not needed.
> >   * @return
> > - *   - >=0: Success; number of objects supplied.
> > + *   - 0: Success; got n objects.
> >   *   - <0: Error; code of ring dequeue function.
> >   */
> >  static __rte_always_inline int
> 
> I think that part should be in a separate commit too. This is a
> documentation fix, which is easily backportable (and should be
> backported) (Fixes: af75078fece3 ("first public release")).

I see that the same change is also part of this commit:
https://patches.dpdk.org/project/dpdk/patch/20211223100741.21292-1-chenzhiheng0...@gmail.com/

I think it is better to have a doc fix commit, and remove this chunk
from this patch.


Re: [PATCH v3] mempool: fix the description of some function return values

2022-01-24 Thread Olivier Matz
Hi Zhiheng,

Thank you for your patch proposal.

On Thu, Dec 23, 2021 at 10:07:41AM +, Zhiheng Chen wrote:
> In rte_mempool_ring.c, the committer uses the symbol ENOBUFS to
> describe the return value of function common_ring_sc_dequeue,
> but in rte_mempool.h, the symbol ENOENT is used to describe
> the return value of function rte_mempool_get. If the user of
> dpdk uses the symbol ENOENT as the judgment condition of
> the return value, it may cause some abnormal phenomena
> in their own programs, such as when the mempool space is exhausted.

The issue I see with this approach is that currently, there is no
standard error code across the mempool drivers' dequeue callbacks:

  bucket: -ENOBUFS
  cn10k: -ENOENT
  cn9k: -ENOENT
  dpaa: -1, -ENOBUFS
  dpaa2: -1, -ENOENT, -ENOBUFS
  octeontx: -ENOMEM
  ring: -ENOBUFS
  stack: -ENOBUFS

After your patch, the drivers do not match the documentation.

I agree it would be better to return the same code for the same error,
whatever driver is used. But I think we should keep the possibility
for a driver to return another code. For instance, it could be an
hardware error in case of hardware mempool driver.

I see 2 possibilities:

1/ simplest one: relax documentation and do not talk about -ENOENT or
   -ENOBUFS, just say a negative value is an error

2/ fix driver and doc

   Mempool drivers should be modified first, knowing that changing
   them is an ABI modification (which I think is acceptable, because the
   error code varies depending on the driver). Then, this patch could be applied.

For reference, note that the documentation was probably right initially,
but the behavior changed in commit cfa7c9e6fc1f ("ring: make bulk and
burst return values consistent"), returning -ENOBUFS instead of -ENOENT
on dequeue error.


> v2:
> * Update the descriptions of underlying functions.
> 
> v3:
> * Correct the description that the return value cannot be greater than 0
> * Update the description of the dequeue function prototype
> 
> Signed-off-by: Zhiheng Chen 
> ---
>  lib/mempool/rte_mempool.h | 34 ++
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..cae81d8a32 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -447,6 +447,16 @@ typedef int (*rte_mempool_enqueue_t)(struct rte_mempool *mp,
>  
>  /**
>   * Dequeue an object from the external pool.
> + *
> + * @param mp
> + *   Pointer to the memory pool.
> + * @param obj_table
> + *   Pointer to a table of void * pointers (objects).
> + * @param n
> + *   Number of objects to get.
> + * @return
> + *   - 0: Success; got n objects.
> + *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.

Also, we should have, in addition to -ENOBUFS:

 - <0: Another driver-specific error code (-errno)

This comment applies to the other functions below.
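
Put together, the suggested block for each of these functions would read
(a sketch combining the patch text with the extra line above):

 * @return
 *   - 0: Success; got n objects.
 *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.
 *   - <0: Another driver-specific error code (-errno).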

>   */
>  typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
>   void **obj_table, unsigned int n);
> @@ -738,7 +748,7 @@ rte_mempool_ops_alloc(struct rte_mempool *mp);
>   *   Number of objects to get.
>   * @return
>   *   - 0: Success; got n objects.
> - *   - <0: Error; code of dequeue function.
> + *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.
>   */
>  static inline int
>  rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
> @@ -1452,8 +1462,8 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
>   * @param cache
>   *   A pointer to a mempool cache structure. May be NULL if not needed.
>   * @return
> - *   - >=0: Success; number of objects supplied.
> - *   - <0: Error; code of ring dequeue function.
> + *   - 0: Success; got n objects.
> + *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.
>   */
>  static __rte_always_inline int
>  rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
> @@ -1521,7 +1531,7 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   * Get several objects from the mempool.
>   *
>   * If cache is enabled, objects will be retrieved first from cache,
> - * subsequently from the common pool. Note that it can return -ENOENT when
> + * subsequently from the common pool. Note that it can return -ENOBUFS when
>   * the local cache and common pool are empty, even if cache from other
>   * lcores are full.
>   *
> @@ -1534,8 +1544,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   * @param cache
>   *   A pointer to a mempool cache structure. May be NULL if not needed.
>   * @return
> - *   - 0: Success; objects taken.
> - *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
> + *   - 0: Success; got n objects.
> + *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.
>   */
>  static __rte_always_inline int
>  rte_mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> @@ -1557,7 +1567,7 @@ rte_mempool_gener

Re: [PATCH v2] mempool: test performance with constant n

2022-01-25 Thread Olivier Matz
Hi Morten,

On Mon, Jan 24, 2022 at 06:20:49PM +0100, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Monday, 24 January 2022 16.00
> > 
> > From: Morten Brørup 
> > 
> > "What gets measured gets done."
> > 
> > This patch adds mempool performance tests where the number of objects
> > to
> > put and get is constant at compile time, which may significantly
> > improve
> > the performance of these functions. [*]
> > 
> > Also, it is ensured that the array holding the object used for testing
> > is cache line aligned, for maximum performance.
> > 
> > And finally, the following entries are added to the list of tests:
> > - Number of kept objects: 512
> > - Number of objects to get and to put: The number of pointers fitting
> >   into a cache line, i.e. 8 or 16
> > 
> > [*] Some example performance test (with cache) results:
> > 
> > get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> > get_bulk=4 put_bulk=4 keep=128 constant_n=true  rate_persec=622159462
> > 
> > get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> > get_bulk=8 put_bulk=8 keep=128 constant_n=true  rate_persec=917582643
> > 
> > get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> > get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
> > 
> > Signed-off-by: Morten Brørup 
> > Signed-off-by: Olivier Matz 
> > ---
> > 
> > Hi Morten,
> > 
> > Here is the updated patch.
> > 
> > I launched the mempool_perf on my desktop machine, but I don't
> > reproduce the numbers: constant or
> > non-constant give almost the same rate on my machine (it's even worst
> > with constants). I tested with
> > your initial patch and with this one. Can you please try this patch,
> > and/or give some details about
> > your test environment?
> 
> Test environment:
> VMware virtual machine running Ubuntu 20.04.3 LTS.
> 4 CPUs and 8 GB RAM assigned.
> The physical CPU is a Xeon E5-2620 v4 with plenty of RAM.
> Although other VMs are running on the same server, it is not very 
> oversubscribed.
> 
> Hugepages established with:
> usertools/dpdk-hugepages.py -p 2M --setup 2G
> 
> Build steps:
> meson -Dplatform=generic work
> cd work
> ninja
> 
> > Here is what I get:
> > 
> > with your patch:
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=152620236
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=144716595
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=306996838
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=287375359
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=977626723
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=963103944
> 
> My test results were with an experimental, optimized version of the mempool 
> library, which showed a larger difference. (This was the reason for updating 
> the perf test - to measure the effects of optimizing the mempool library.)
> 
> However, testing the patch (version 1) with a brand new git checkout still 
> shows a huge difference, e.g.:
> 
> mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=501009612
> mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=799014912
> 
> You should also see a significant difference when testing.
> 
> My rate_persec without constant n is 3 x yours (501 M vs. 156 M ops/s), so 
> the baseline seems wrong! I don't think our server rig is so much faster than 
> your desktop machine. Perhaps mempool debug, telemetry or other background 
> noise is polluting your test.

Sorry, I just realized that I was indeed using a "debugoptimized" build.
It's much better in release mode.
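
(For anyone reproducing this: the build type is selected at meson setup
time, e.g. with the commands below; the directory names are illustrative.)

meson -Dbuildtype=debugoptimized build-debugopt
meson -Dbuildtype=release build-release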

mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=1425473536
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=2159660236
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=2796342476
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=4351577292
mempool_autotest cac

Re: [PATCH] mbuf: delete dynamic fields copy in hdr copy

2022-01-26 Thread Olivier Matz
Hi,

On Tue, Jan 11, 2022 at 05:45:49PM +0100, Thomas Monjalon wrote:
> 14/12/2021 08:56, Gaoxiang Liu:
> > Because dynamic fields are registered by the DPDK application,
> > so it is up to the application to decide whether to copy the value of
> > dynamic fields.
> > So delete dynamic fields copy in __rte_pktmbuf_copy_hdr.
> > It's more flexible for the DPDK application,
> > and is useful for improving performance.
> 
> Yes, removing operations will improve the performance,
> but it looks wrong.
> This is copying all dynamic fields, no matter which ones are registered.
> We cannot ask the application to manage dynamic fields copy,
> especially if the copy is done inside a library.

+1

Dynamic fields/flags can be registered by applications, libraries,
drivers, ...

There is no entity that is aware of which field/flag has to be copied,
so the only possibility is to copy all of them.
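
For context, a minimal sketch of the copy being discussed, assuming the
dynfield1[] scratch area of struct rte_mbuf from that era's headers --
since no entity knows which fields are in use, the whole area is copied
verbatim:

    #include <string.h>
    #include <rte_mbuf.h>

    /* Copy the entire dynamic-field area: registrations are not
     * tracked per-field, so everything is copied at once. */
    static inline void
    copy_all_dynfields(struct rte_mbuf *mdst, const struct rte_mbuf *msrc)
    {
    	memcpy(&mdst->dynfield1, &msrc->dynfield1,
    		sizeof(mdst->dynfield1));
    }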


Re: [PATCH] mempool: fix rte primary program coredump

2022-01-27 Thread Olivier Matz
Hi Tianli,

On Wed, Nov 10, 2021 at 11:57:19PM +0800, Tianli Lai wrote:
> The primary program (such as an OFP app) runs first; when the secondary
> program (such as dpdk-pdump) is then run, the primary program receives
> signal SIGSEGV. The function stack is as follows:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffee60e700 (LWP 112613)]
> 0x75f2cc0b in bucket_stack_pop (stack=0x0001) at
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
> 95  if (stack->top == 0)
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-196.el7.x86_64 libatomic-4.8.5-16.el7.x86_64
> libconfig-1.4.9-5.el7.x86_64 libgcc-4.8.5-16.el7.x86_64
> libpcap-1.5.3-12.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64
> openssl-libs-1.0.2k-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
>  #0  0x75f2cc0b in bucket_stack_pop (stack=0x0001) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
>  #1  0x75f2e5dc in bucket_dequeue_orphans 
> (bd=0x2209e5fac0,obj_table=0x220b083710, n_orphans=251) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:190
>  #2  0x75f30192 in bucket_dequeue 
> (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at 
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:288
>  #3  0x75f47e18 in rte_mempool_ops_dequeue_bulk 
> (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:739
>  #4  0x75f4819d in __mempool_generic_get (cache=0x220b083700, n=1, 
> obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1443
>  #5  rte_mempool_generic_get (cache=0x220b083700, n=1, 
> obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1506
>  #6  rte_mempool_get_bulk (n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1539
>  #7  rte_mempool_get (obj_p=0x7fffee5deb18, mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1565
>  #8  rte_mbuf_raw_alloc (mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:551
>  #9  0x75f483a4 in rte_pktmbuf_alloc (mp=0x220b07d5c0) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:804
>  #10 0x75f4c9d9 in pdump_pktmbuf_copy (m=0x220746ad80, 
> mp=0x220b07d5c0) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:99
>  #11 0x75f4e42e in pdump_copy (pkts=0x7fffee5dfdf0, nb_pkts=1, 
> user_params=0x776d7cc0 ) at 
> /ofp/dpdk/lib/librte_pdump/rte_pdump.c:151
>  #12 0x75f4eadd in pdump_rx (port=0, qidx=0, pkts=0x7fffee5dfdf0, 
> nb_pkts=1, max_pkts=16, user_params=0x776d7cc0 ) at 
> /ofp/dpdk/lib/librte_pdump/rte_pdump.c:172
>  #13 0x75d0e9e8 in rte_eth_rx_burst (port_id=0, queue_id=0, 
> rx_pkts=0x7fffee5dfdf0, nb_pkts=16) at 
> /ofp/dpdk/x86_64-native-linuxapp-gcc/usr/local/include/dpdk/rte_ethdev.h:4396
>  #14 0x75d114c3 in recv_pkt_dpdk (pktio_entry=0x22005436c0, index=0, 
> pkt_table=0x7fffee5dfdf0, num=16) at odp_packet_dpdk.c:1081
>  #15 0x75d2f931 in odp_pktin_recv (queue=...,packets=0x7fffee5dfdf0, 
> num=16) at ../linux-generic/odp_packet_io.c:1896
>  #16 0x0040a344 in rx_burst (pktin=...) at app_main.c:223
>  #17 0x0040aca4 in run_server_single (arg=0x7fffe2b0) at 
> app_main.c:417
>  #18 0x77bd6883 in run_thread (arg=0x7fffe3b8) at threads.c:67
>  #19 0x753c8e25 in start_thread () from /lib64/libpthread.so.0
>  #20 0x7433e34d in clone () from /lib64/libc.so.6
> 
> The program crash down reason is:
> 
> In primary program and secondary program, the global array 
> rte_mempool_ops.ops[]:
>   primary name          secondary name
>  [0]:   "bucket"         "ring_mp_mc"
>  [1]:   "dpaa"           "ring_sp_sc"
>  [2]:   "dpaa2"          "ring_mp_sc"
>  [3]:   "octeontx_fpavf" "ring_sp_mc"
>  [4]:   "octeontx2_npa"  "octeontx2_npa"
>  [5]:   "ring_mp_mc"     "bucket"
>  [6]:   "ring_sp_sc"     "stack"
>  [7]:   "ring_mp_sc"     "if_stack"
>  [8]:   "ring_sp_mc"     "dpaa"
>  [9]:   "stack"          "dpaa2"
>  [10]:  "if_stack"       "octeontx_fpavf"
>  [11]:  NULL             NULL
> 
>  This array in the primary program is different from the one in the
>  secondary program. So the secondary program calls
>  rte_pktmbuf_pool_create_by_ops() with mempool name "ring_mp_mc", while
>  the primary program uses the "bucket" type to allocate rte_mbufs.
> 
>  So sort this array in both the primary and the secondary program when
>  the memzone is initialized.
> 
> Signed-off-by: Tianli Lai 

I think it is the same problem as the one described here:
http://inbox.dpdk.org/dev/1583114253-15345-1-git-send-email-xiangxia.m@gmail.com/#r

To summarize what is said in the thread, sorting the ops looks dangerous
because it changes the index during the lifetime of the application.
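
To illustrate the hazard, a simplified sketch (not the actual DPDK code):
the shared mempool stores only an integer index, which each process
resolves against its own, process-local ops table, so reordering either
table breaks existing mempools.

    #include <stdint.h>

    /* The index lives in shared memory and is set by whichever
     * process created the mempool. */
    struct shared_mempool {
    	uint32_t ops_index;
    };

    static const char *
    resolve_ops_name(const char *local_ops_names[],
    		const struct shared_mempool *mp)
    {
    	/* With the tables shown above, index 0 resolves to "bucket"
    	 * in the primary but to "ring_mp_mc" in the secondary. */
    	return local_ops_names[mp->ops_index];
    }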

Re: [PATCH] test/mbuf: fix mbuf data content check

2022-02-03 Thread Olivier Matz
On Thu, Feb 03, 2022 at 10:39:12AM +0100, David Marchand wrote:
> When allocating a mbuf, its data content is most of the time zero'd but
> nothing ensures this. This is especially wrong when building with
> RTE_MALLOC_DEBUG, where data is poisoned to 0x6b on free.
> 
> This test reserves MBUF_TEST_DATA_LEN2 bytes in the mbuf data segment,
> and sets this data to 0xcc.
> Calling strlen(), the test may try to read more than MBUF_TEST_DATA_LEN2
> which has been noticed when memory had been poisoned.
> 
> The mbuf data content is checked right after, so we can simply remove
> strlen().
> 
> Fixes: 7b295dceea07 ("test/mbuf: add unit test cases")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 

Acked-by: Olivier Matz 
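
To illustrate the failure mode, a hedged sketch of the length-bounded
check that remains after removing strlen() (illustrative, not the exact
test code):

    #include <stddef.h>

    /* data holds len bytes of the 0xcc pattern and no terminating NUL:
     * strlen(data) would keep scanning past len, which is undefined
     * behaviour on 0x6b-poisoned freed memory. A bounded comparison
     * never reads beyond len. */
    static int
    check_pattern(const char *data, size_t len)
    {
    	size_t i;

    	for (i = 0; i < len; i++)
    		if ((unsigned char)data[i] != 0xcc)
    			return -1;
    	return 0;
    }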


Re: [dpdk-dev] [PATCH] net: add support for UDP segmentation case

2021-10-14 Thread Olivier Matz
Hi Radu,

On Fri, Sep 03, 2021 at 11:59:42AM +0100, Radu Nicolau wrote:
> [PATCH] net: add support for UDP segmentation case

What about this title instead?

net: exclude IP len from phdr cksum if offloading UDP frag

> Add support to the ipv4/ipv6 pseudo-header function when TSO is enabled
> in the UDP case, eg  PKT_TX_UDP_SEG is set in the mbuf ol_flags

I think it would be clearer to say "UDP fragmentation" instead of
"TSO is enabled in the UDP case".

> Signed-off-by: Declan Doherty 
> Signed-off-by: Radu Nicolau 
> ---
>  lib/net/rte_ip.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index 05948b69b7..c916ec1b09 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -333,7 +333,7 @@ rte_ipv4_phdr_cksum(const struct rte_ipv4_hdr *ipv4_hdr, 
> uint64_t ol_flags)
>   psd_hdr.dst_addr = ipv4_hdr->dst_addr;
>   psd_hdr.zero = 0;
>   psd_hdr.proto = ipv4_hdr->next_proto_id;
> - if (ol_flags & PKT_TX_TCP_SEG) {
> + if (ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) {
>   psd_hdr.len = 0;
>   } else {
>   l3_len = rte_be_to_cpu_16(ipv4_hdr->total_length);

Can you also update the API comment?

> @@ -474,7 +474,7 @@ rte_ipv6_phdr_cksum(const struct rte_ipv6_hdr *ipv6_hdr, 
> uint64_t ol_flags)
>   } psd_hdr;
>  
>   psd_hdr.proto = (uint32_t)(ipv6_hdr->proto << 24);
> - if (ol_flags & PKT_TX_TCP_SEG) {
> + if (ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) {
>   psd_hdr.len = 0;
>   } else {
>   psd_hdr.len = ipv6_hdr->payload_len;
> -- 
> 2.25.1
> 

No objection for this patch, but I think we should consider removing
this ol_flags parameter from the pseudo header checksum calculation
functions in the future, because it is a bit confusing.

Historically, this was done in commit 4199fdea60c3 ("mbuf: generic
support for TCP segmentation offload") because we were expecting that
this pseudo-header checksum (required by Intel hw when doing checksum or
TSO) would be done in the same way for many drivers (i.e. without the IP
length for TSO). I don't know if that is the case.

Or maybe a 'use_0_length' parameter would make more sense than
'ol_flags'.

Thanks,
Olivier
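
To make the suggestion concrete, a hypothetical wrapper with an explicit
use_0_length parameter (sketch only: ipv4_phdr_cksum2 is not a DPDK API,
and the pre-21.11 PKT_TX_TCP_SEG flag name is assumed):

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>

    /* Let the caller state directly whether the pseudo-header checksum
     * must be computed over a zero length (segmentation offload case),
     * instead of encoding that choice in ol_flags. */
    static uint16_t
    ipv4_phdr_cksum2(const struct rte_ipv4_hdr *hdr, bool use_0_length)
    {
    	return rte_ipv4_phdr_cksum(hdr, use_0_length ? PKT_TX_TCP_SEG : 0);
    }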


Re: [dpdk-dev] [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel packets

2021-10-15 Thread Olivier Matz
On Thu, Oct 14, 2021 at 07:12:29AM +, Xia, Chenbo wrote:
> > -Original Message-
> > From: Ivan Malov 
> > Sent: Friday, September 17, 2021 2:50 AM
> > To: dev@dpdk.org
> > Cc: Maxime Coquelin ; sta...@dpdk.org; Andrew
> > Rybchenko ; Xia, Chenbo 
> > ;
> > Yuanhan Liu ; Olivier Matz
> > 
> > Subject: [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel
> > packets
> > 
> > Tx prepare method calls rte_net_intel_cksum_prepare(), which
> > handles tunnel packets correctly, but Tx burst path does not
> > take tunnel presence into account when computing the offsets.
> > 
> > Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Ivan Malov 
> > Reviewed-by: Andrew Rybchenko 
> > ---
> >  drivers/net/virtio/virtqueue.h | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> > index 03957b2bd0..b83ff32efb 100644
> > --- a/drivers/net/virtio/virtqueue.h
> > +++ b/drivers/net/virtio/virtqueue.h
> > @@ -620,19 +620,21 @@ static inline void
> >  virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
> >  {
> > uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> > +   uint16_t o_l23_len = (cookie->ol_flags & PKT_TX_TUNNEL_MASK) ?
> > +cookie->outer_l2_len + cookie->outer_l3_len : 0;
> > 
> > if (cookie->ol_flags & PKT_TX_TCP_SEG)
> > csum_l4 |= PKT_TX_TCP_CKSUM;
> > 
> > switch (csum_l4) {
> > case PKT_TX_UDP_CKSUM:
> > -   hdr->csum_start = cookie->l2_len + cookie->l3_len;
> > +   hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
> > hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
> > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > break;
> > 
> > case PKT_TX_TCP_CKSUM:
> > -   hdr->csum_start = cookie->l2_len + cookie->l3_len;
> > +   hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
> > hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
> > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > break;
> > @@ -650,7 +652,8 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr, 
> > struct
> > rte_mbuf *cookie)
> > VIRTIO_NET_HDR_GSO_TCPV6 :
> > VIRTIO_NET_HDR_GSO_TCPV4;
> > hdr->gso_size = cookie->tso_segsz;
> > -   hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
> > +   hdr->hdr_len = o_l23_len + cookie->l2_len + cookie->l3_len +
> > +  cookie->l4_len;
> > } else {
> > ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> > ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> > --
> > 2.20.1
> 
> Reviewed-by: Chenbo Xia 
> 

One comment to mention: from the application's perspective, it has to
take into account that the driver does not support outer tunnel offload
(this matches the advertised capabilities). For instance, in the case of
a VXLAN tunnel, if the outer checksum needs to be calculated, it has to
be done by the application. In short, the application can ask to offload
the inner part if no offload is required on the outer part.

Also, since grep "PKT_TX_TUNNEL" in drivers/net/ixgbe gives nothing, it
seems the ixgbe driver does not support the same offload request as
described in this patch:
  (m->ol_flags & PKT_TX_TUNNEL_MASK) == PKT_TX_TUNNEL_X
  m->outer_l2_len = outer l2 length
  m->outer_l3_len = outer l3 length
  m->l2_len = outer l4 length + tunnel len + inner l2 len
  m->l3_len = inner l3 len
  m->l4_len = inner l4 len

An alternative for doing the same (that would work with ixgbe and
current virtio) is to give:
  (m->ol_flags & PKT_TX_TUNNEL_MASK) == 0
  m->l2_len = outer lengths + tunnel len + inner l2 len
  m->l3_len = inner l3 len
  m->l4_len = inner l4 len

I think a capability may be missing to differentiate which drivers
support which mode. Or, all drivers could be fixed to support both modes
(and this would make this patch valid).

Thanks,
Olivier
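
To make the two modes above concrete, a sketch of how an application
could describe the same VXLAN-encapsulated TCP packet either way (flag
and field names as used in this pre-21.11 thread; header lengths are
illustrative):

    #include <rte_mbuf.h>

    /* Mode 1: tunnel-aware, as enabled by this patch. */
    static void
    fill_tunnel_aware(struct rte_mbuf *m)
    {
    	m->ol_flags |= PKT_TX_TUNNEL_VXLAN | PKT_TX_TCP_CKSUM;
    	m->outer_l2_len = 14;   /* outer Ethernet */
    	m->outer_l3_len = 20;   /* outer IPv4 */
    	m->l2_len = 8 + 8 + 14; /* outer UDP + VXLAN + inner Ethernet */
    	m->l3_len = 20;         /* inner IPv4 */
    	m->l4_len = 20;         /* inner TCP */
    }

    /* Mode 2: tunnel-agnostic, folding all outer headers into l2_len;
     * this also works with drivers that ignore PKT_TX_TUNNEL_*. */
    static void
    fill_tunnel_agnostic(struct rte_mbuf *m)
    {
    	m->ol_flags |= PKT_TX_TCP_CKSUM;
    	m->l2_len = 14 + 20 + 8 + 8 + 14;
    	m->l3_len = 20;         /* inner IPv4 */
    	m->l4_len = 20;         /* inner TCP */
    }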


Re: [dpdk-dev] [PATCH v3 4/5] lib/kvargs: remove unneeded header includes

2021-10-15 Thread Olivier Matz
Hi Sean,

On Thu, Oct 07, 2021 at 10:25:56AM +, Sean Morrissey wrote:
> These header includes have been flagged by the iwyu_tool
> and removed.
> 
> Signed-off-by: Sean Morrissey 
> ---
>  lib/kvargs/rte_kvargs.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/kvargs/rte_kvargs.c b/lib/kvargs/rte_kvargs.c
> index 38e9d5c1ca..4cce8e953b 100644
> --- a/lib/kvargs/rte_kvargs.c
> +++ b/lib/kvargs/rte_kvargs.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>  
> -#include 
>  #include 
>  
>  #include "rte_kvargs.h"
> -- 
> 2.25.1
> 

Did you check that it still compiles for the Windows platform
after this change?

+CC Dmitry


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 12:33:31PM +0300, Andrew Rybchenko wrote:
> On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
> >> -Original Message-
> >> From: Andrew Rybchenko 
> >> [...]
> >>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> >>> index 51c0ba2931..2204f140b3 100644
> >>> --- a/lib/mempool/rte_mempool.c
> >>> +++ b/lib/mempool/rte_mempool.c
> >>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool *mp,
> >>> char *vaddr,
> >>>
> >>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> >>>   mp->nb_mem_chunks++;
> >>> + if (iova == RTE_BAD_IOVA)
> >>> + mp->flags |= MEMPOOL_F_NON_IO;
> >>
> >> As I understand rte_mempool_populate_iova() may be called few times for
> >> one mempool. The flag must be set if all invocations are done with
> >> RTE_BAD_IOVA. So, it should be set by default and just removed when iova
> >> != RTE_BAD_IOVA happens.
> > 
> > I don't agree at all. If any object of the pool is unsuitable for IO,
> > the pool cannot be considered suitable for IO. So if there's a single
> > invocation with RTE_BAD_IOVA, the flag must be set forever.
> 
> If so, some objects may be used for IO, some cannot be used.
> What should happen if an application allocates an object
> which is suitable for IO and try to use it this way?

If the application can predict if the allocated object is usable for IO
before allocating it, I would be surprised to have it used for IO. I agree
with Dmitry here.


Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 09:58:49AM +, Dmitry Kozlyuk wrote:
> 
> 
> > -Original Message-
> > From: Olivier Matz 
> > Sent: 15 October 2021 12:43
> > To: Andrew Rybchenko 
> > Cc: Dmitry Kozlyuk ; dev@dpdk.org; Matan Azrad
> > 
> > Subject: Re: [PATCH v4 2/4] mempool: add non-IO flag
> > 
> > On Fri, Oct 15, 2021 at 12:33:31PM +0300, Andrew Rybchenko wrote:
> > > On 10/15/21 12:18 PM, Dmitry Kozlyuk wrote:
> > > >> -Original Message-
> > > >> From: Andrew Rybchenko  [...]
> > > >>> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> > > >>> index 51c0ba2931..2204f140b3 100644
> > > >>> --- a/lib/mempool/rte_mempool.c
> > > >>> +++ b/lib/mempool/rte_mempool.c
> > > >>> @@ -371,6 +371,8 @@ rte_mempool_populate_iova(struct rte_mempool
> > > >>> *mp, char *vaddr,
> > > >>>
> > > >>>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
> > > >>>   mp->nb_mem_chunks++;
> > > >>> + if (iova == RTE_BAD_IOVA)
> > > >>> + mp->flags |= MEMPOOL_F_NON_IO;
> > > >>
> > > >> As I understand rte_mempool_populate_iova() may be called few times
> > > >> for one mempool. The flag must be set if all invocations are done
> > > >> with RTE_BAD_IOVA. So, it should be set by default and just removed
> > > >> when iova != RTE_BAD_IOVA happens.
> > > >
> > > > I don't agree at all. If any object of the pool is unsuitable for
> > > > IO, the pool cannot be considered suitable for IO. So if there's a
> > > > single invocation with RTE_BAD_IOVA, the flag must be set forever.
> > >
> > > If so, some objects may be used for IO, some cannot be used.
> > > What should happen if an application allocates an object which is
> > > suitable for IO and try to use it this way?
> > 
> > If the application can predict if the allocated object is usable for IO
> > before allocating it, I would be surprised to have it used for IO. I agree
> > with Dmitry here.
> 
> The flag hints to components, PMDs before all,
> that objects from this mempool will never be used for IO,
> so that the component can save some memory mapping or DMA configuration.
> If the flag is set when even a single object may be used for IO,
> the consumer of the flag will not be ready for that.
> Whatever a corner case it is, Andrew is correct.
> There is a subtle difference between "pool is not usable"
> (as described now) and "objects from this mempool will never be used"
> (as stated above), I'll highlight it in the flag description.

OK, agreed, thanks.
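
A sketch of the semantics agreed on above (MEMPOOL_F_NON_IO is the flag
this patch introduces; illustrative, not the merged code): the flag
starts set, and the first chunk populated with a real IOVA clears it for
good.

    #include <rte_mempool.h>

    /* At creation: assume the pool is not usable for IO until an
     * IOVA-backed chunk is added. */
    static void
    on_create(struct rte_mempool *mp)
    {
    	mp->flags |= MEMPOOL_F_NON_IO;
    }

    /* At populate time: one chunk with a valid IOVA means some objects
     * may be used for IO, so the "never used for IO" hint is dropped. */
    static void
    on_populate(struct rte_mempool *mp, rte_iova_t iova)
    {
    	if (iova != RTE_BAD_IOVA)
    		mp->flags &= ~MEMPOOL_F_NON_IO;
    }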


Re: [dpdk-dev] [PATCH v4 1/4] mempool: add event callbacks

2021-10-15 Thread Olivier Matz
Hi Dmitry,

On Wed, Oct 13, 2021 at 02:01:28PM +0300, Dmitry Kozlyuk wrote:
> Data path performance can benefit if the PMD knows which memory it will
> need to handle in advance, before the first mbuf is sent to the PMD.
> It is impractical, however, to consider all allocated memory for this
> purpose. Most often mbuf memory comes from mempools that can come and
> go. PMD can enumerate existing mempools on device start, but it also
> needs to track creation and destruction of mempools after the forwarding
> starts but before an mbuf from the new mempool is sent to the device.
> 
> Add an API to register callback for mempool life cycle events:
> * rte_mempool_event_callback_register()
> * rte_mempool_event_callback_unregister()
> Currently tracked events are:
> * RTE_MEMPOOL_EVENT_READY (after populating a mempool)
> * RTE_MEMPOOL_EVENT_DESTROY (before freeing a mempool)
> Provide a unit test for the new API.
> The new API is internal, because it is primarily demanded by PMDs that
> may need to deal with any mempools and do not control their creation,
> while an application, on the other hand, knows which mempools it creates
> and doesn't care about internal mempools PMDs might create.
> 
> Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Matan Azrad 
> ---
>  app/test/test_mempool.c   | 209 ++
>  lib/mempool/rte_mempool.c | 137 +
>  lib/mempool/rte_mempool.h |  61 +++
>  lib/mempool/version.map   |   8 ++
>  4 files changed, 415 insertions(+)

(...)

> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -42,6 +42,18 @@ static struct rte_tailq_elem rte_mempool_tailq = {
>  };
>  EAL_REGISTER_TAILQ(rte_mempool_tailq)
>  
> +TAILQ_HEAD(mempool_callback_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem callback_tailq = {
> + .name = "RTE_MEMPOOL_CALLBACK",
> +};
> +EAL_REGISTER_TAILQ(callback_tailq)
> +
> +/* Invoke all registered mempool event callbacks. */
> +static void
> +mempool_event_callback_invoke(enum rte_mempool_event event,
> +   struct rte_mempool *mp);
> +
>  #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5
>  #define CALC_CACHE_FLUSHTHRESH(c)\
>   ((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))
> @@ -360,6 +372,10 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char 
> *vaddr,
>   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
>   mp->nb_mem_chunks++;
>  
> + /* Report the mempool as ready only when fully populated. */
> + if (mp->populated_size >= mp->size)
> + mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_READY, mp);
> +

One small comment here. I think it does not happen today, but in the
future, something that could happen is:
  - create empty mempool
  - populate mempool
  - use mempool
  - populate mempool with more objects
  - use mempool

I've seen one usage there: https://www.youtube.com/watch?v=SzQFn9tm4Sw

In that case, it would require a POPULATE event instead of a
MEMPOOL_CREATE event.

Enhancing the documentation to better explain when the callback is
invoked looks enough to me for the moment.

>   rte_mempool_trace_populate_iova(mp, vaddr, iova, len, free_cb, opaque);
>   return i;
>  
> @@ -722,6 +738,7 @@ rte_mempool_free(struct rte_mempool *mp)
>   }
>   rte_mcfg_tailq_write_unlock();
>  
> + mempool_event_callback_invoke(RTE_MEMPOOL_EVENT_DESTROY, mp);
>   rte_mempool_trace_free(mp);
>   rte_mempool_free_memchunks(mp);
>   rte_mempool_ops_free(mp);
> @@ -1343,3 +1360,123 @@ void rte_mempool_walk(void (*func)(struct rte_mempool 
> *, void *),
>  
>   rte_mcfg_mempool_read_unlock();
>  }
> +
> +struct mempool_callback {
> + rte_mempool_event_callback *func;
> + void *user_data;
> +};
> +
> +static void
> +mempool_event_callback_invoke(enum rte_mempool_event event,
> +   struct rte_mempool *mp)
> +{
> + struct mempool_callback_list *list;
> + struct rte_tailq_entry *te;
> + void *tmp_te;
> +
> + rte_mcfg_tailq_read_lock();
> + list = RTE_TAILQ_CAST(callback_tailq.head, mempool_callback_list);
> + RTE_TAILQ_FOREACH_SAFE(te, list, next, tmp_te) {
> + struct mempool_callback *cb = te->data;
> + rte_mcfg_tailq_read_unlock();
> + cb->func(event, mp, cb->user_data);
> + rte_mcfg_tailq_read_lock();

I think it is dangerous to unlock the list before invoking the callback.
During that time, another thread can remove the next mempool callback, and
the next iteration will access a freed element, causing undefined
behavior.

Is it a problem to keep the lock held during the callback invocation?

I see that you have a test for this, and that you wrote a comment in the
documentation:

 * rte_mempool_event_callback_register() may be called from within the callback,
 * but the callbacks registered this way will not be invoked for the same event.
 * rte_mempool_event_callback_unregister() may only be
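
For reference, a minimal usage sketch of the API added by this patch
(callback signature as in the diff above; the API is internal, so a real
consumer would be a PMD):

    #include <stdio.h>
    #include <rte_mempool.h>

    static void
    my_mp_event(enum rte_mempool_event event, struct rte_mempool *mp,
    		void *user_data)
    {
    	(void)user_data;
    	if (event == RTE_MEMPOOL_EVENT_READY)
    		printf("mempool %s is fully populated\n", mp->name);
    	else if (event == RTE_MEMPOOL_EVENT_DESTROY)
    		printf("mempool %s is about to be freed\n", mp->name);
    }

    static int
    register_mp_events(void)
    {
    	/* invoked for mempools populated/freed after this point */
    	return rte_mempool_event_callback_register(my_mp_event, NULL);
    }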

Re: [dpdk-dev] [PATCH v4 2/4] mempool: add non-IO flag

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 01:41:40PM +0200, David Marchand wrote:
> On Fri, Oct 15, 2021 at 12:42 PM Dmitry Kozlyuk  wrote:
> > > a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c index
> > > 8d5f99f7e7..27d197fe86 100644
> > > --- a/lib/mempool/rte_mempool.c
> > > +++ b/lib/mempool/rte_mempool.c
> > > @@ -802,6 +802,7 @@ rte_mempool_cache_free(struct rte_mempool_cache
> > > *cache)
> > > | MEMPOOL_F_SC_GET \
> > > | MEMPOOL_F_POOL_CREATED \
> > > | MEMPOOL_F_NO_IOVA_CONTIG \
> > > +   | MEMPOOL_F_NON_IO \
> >
> > I wonder why CREATED and NON_IO should be listed here:
> > they are not supposed to be passed by the user,
> > which is what MEMPOOL_KNOWN_FLAGS is used for.
> > The same question stands for the test code.
> > Could you confirm your suggestion?
> 
> There was no distinction in the API for valid flags so far, and indeed
> I did not pay attention to MEMPOOL_F_POOL_CREATED and its internal
> aspect.
> (That's the problem when mixing stuff together)
> 
> We could separate internal and exposed flags in different fields, but
> it seems overkill.
> It would be seen as an API change too, if applications were checking
> for this flag.
> So let's keep this as is.
> 
> As you suggest, we should exclude those internal flags from
> KNOWN_FLAGS (probably rename it too), and we will have to export this

I suggest RTE_MEMPOOL_VALID_USER_FLAGS for the name

> define for the unit test since the check had been written with
> contiguous valid flags in mind.
> If your new flag is internal only, I agree we must skip it.
> 
> I'll prepare a patch for mempool.
> 
> -- 
> David Marchand
> 
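
To make this concrete, a sketch of the separation being discussed (macro
name as suggested above; the flag list is illustrative):

    #include <errno.h>
    #include <rte_errno.h>
    #include <rte_mempool.h>

    /* Only the flags a user may pass at creation time; internal flags
     * such as MEMPOOL_F_POOL_CREATED (and the new MEMPOOL_F_NON_IO)
     * are deliberately excluded. */
    #define RTE_MEMPOOL_VALID_USER_FLAGS \
    	(MEMPOOL_F_NO_SPREAD | MEMPOOL_F_NO_CACHE_ALIGN | \
    	 MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET | MEMPOOL_F_NO_IOVA_CONTIG)

    static int
    check_user_flags(unsigned int flags)
    {
    	if (flags & ~RTE_MEMPOOL_VALID_USER_FLAGS) {
    		rte_errno = EINVAL;
    		return -1;
    	}
    	return 0;
    }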


Re: [dpdk-dev] [PATCH v3 4/5] lib/kvargs: remove unneeded header includes

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 10:20:06AM +0100, Morrissey, Sean wrote:
> 
> On 15/10/2021 10:00, Olivier Matz wrote:
> > Hi Sean,
> > 
> > On Thu, Oct 07, 2021 at 10:25:56AM +, Sean Morrissey wrote:
> > > These header includes have been flagged by the iwyu_tool
> > > and removed.
> > > 
> > > Signed-off-by: Sean Morrissey 
> > > ---
> > >   lib/kvargs/rte_kvargs.c | 1 -
> > >   1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/lib/kvargs/rte_kvargs.c b/lib/kvargs/rte_kvargs.c
> > > index 38e9d5c1ca..4cce8e953b 100644
> > > --- a/lib/kvargs/rte_kvargs.c
> > > +++ b/lib/kvargs/rte_kvargs.c
> > > @@ -7,7 +7,6 @@
> > >   #include 
> > >   #include 
> > > -#include 
> > >   #include 
> > >   #include "rte_kvargs.h"
> > > -- 
> > > 2.25.1
> > > 
> > Did you check that it still compiles for the Windows platform
> > after this change?
> > 
> > +CC Dmitry
> 
> Hi Olivier,
> 
> I cross-compiled with MinGW-64 after this change and it still compiled.

Thanks.

However I see that strdup() is used in rte_kvargs.c, and it is defined in
lib/eal/windows/include/rte_os_shim.h. So at first glance, it seems a better
option to keep the include as it is.

I don't know if strdup() is defined somewhere else on Windows, or if
rte_os_shim.h is included by another header. Better to have an opinion
from a Windows maintainer before we remove this include.
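
For context, the shim in question maps the POSIX name onto the Windows
CRT one; a simplified sketch of the rte_os_shim.h approach (exact
contents may differ):

    /* lib/eal/windows/include/rte_os_shim.h (simplified sketch) */
    #include <string.h>

    #ifndef strdup
    /* the POSIX name is deprecated in the Windows CRT */
    #define strdup(str) _strdup(str)
    #endif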


Re: [dpdk-dev] [PATCH] doc: fix default mempool option

2021-10-15 Thread Olivier Matz
On Fri, Oct 15, 2021 at 10:39:41AM +0200, David Marchand wrote:
> This option should be prefixed with -- for consistency with others.
> 
> Fixes: a103a97e7191 ("eal: allow user to override default mempool driver")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 

Reviewed-by: Olivier Matz 


Re: [dpdk-dev] [PATCH v3 09/18] net: fix spelling error in gtp comment

2021-10-15 Thread Olivier Matz
On Thu, Oct 14, 2021 at 02:56:22PM -0700, Stephen Hemminger wrote:
> More codespell finds.
> 
> Signed-off-by: Stephen Hemminger 

Acked-by: Olivier Matz 

Thanks!

