On 3/11/26 6:32 PM, David Marchand wrote:
> By default, DPDK based dp-packets points to data buffers that can't be
> expanded dynamically.
> Their layout is as follows:
> - a minimum 128 bytes headroom chosen at DPDK build time
>   (RTE_PKTMBUF_HEADROOM),
> - a maximum size chosen at mempool creation,
> 
> In some usecases though (like encapsulating with multiple tunnels),
> a 128 bytes headroom is too short.
> 
> Keep on using mono segment packets but dynamically allocate buffers
> in DPDK memory and make use of DPDK external buffers API
> (previously used for userspace TSO).
> 
> Signed-off-by: David Marchand <[email protected]>
> ---
> Changes since v4:
> - fixed tailroom,
> - added a check on configured DPDK headroom,
> - added more description and renamed ifaces in the unit test,
> 
> Changes since v3:
> - split buffer length calculation in a helper,
> - handled running test without qdisc (net/tap does not require
>   those qdiscs, but spews ERR level logs if absent),
> - added check on firewall,
> 
> Changes since v2:
> - moved check on uint16_t overflow in netdev_dpdk_extbuf_allocate(),
> 
> Changes since v1:
> - fixed new segment length (reset by extbuf attach helper),
> - added a system-dpdk unit test,
> 
> ---
>  acinclude.m4         |   7 +++
>  lib/dp-packet.c      |  21 ++++++++-
>  lib/netdev-dpdk.c    |  47 +++++++++++++++++---
>  lib/netdev-dpdk.h    |   4 ++
>  tests/atlocal.in     |   1 +
>  tests/system-dpdk.at | 100 +++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 174 insertions(+), 6 deletions(-)
> 
> diff --git a/acinclude.m4 b/acinclude.m4
> index e4e48cb531..060c416f8a 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -431,6 +431,13 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>        AC_MSG_ERROR([unable to find rte_config.h in $with_dpdk])
>      ], [AC_INCLUDES_DEFAULT])
>  
> +    AC_COMPUTE_INT([dpdk_mbuf_headroom], [RTE_PKTMBUF_HEADROOM],
> +                   [AC_INCLUDES_DEFAULT],
> +                   [AC_MSG_ERROR([unable to determine RTE_PKTMBUF_HEADROOM])])
> +    AC_DEFINE_UNQUOTED([DPDK_MBUF_HEADROOM], [$dpdk_mbuf_headroom],
> +                       [Value of RTE_PKTMBUF_HEADROOM from DPDK])
> +    AC_SUBST([DPDK_MBUF_HEADROOM], [$dpdk_mbuf_headroom])
> +
>      AC_CHECK_DECLS([RTE_LIBRTE_VHOST_NUMA, RTE_EAL_NUMA_AWARE_HUGEPAGES], [
>        OVS_FIND_DEPENDENCY([get_mempolicy], [numa], [libnuma])
>      ], [], [[#include <rte_config.h>]])
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index c04d608be6..30fd013c29 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -255,8 +255,27 @@ dp_packet_resize(struct dp_packet *b, size_t new_headroom, size_t new_tailroom)
>      new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>  
>      switch (b->source) {
> -    case DPBUF_DPDK:
> +    case DPBUF_DPDK: {
> +#ifdef DPDK_NETDEV
> +        uint32_t extbuf_len;
> +
> +        extbuf_len = netdev_dpdk_extbuf_size(new_allocated);
> +        ovs_assert(extbuf_len <= UINT16_MAX);
> +        new_base = netdev_dpdk_extbuf_allocate(extbuf_len);
> +        if (!new_base) {
> +            out_of_memory();
> +        }
> +        dp_packet_copy__(b, new_base, new_headroom, new_tailroom);
> +        netdev_dpdk_extbuf_replace(b, new_base, extbuf_len);
> +        /* Because of alignment, we may have gained a bit more tailroom than
> +         * expected. Rely on this mbuf buf_len which got adjusted by
> +         * rte_pktmbuf_attach_extbuf(). */
> +        new_allocated = b->mbuf.buf_len;

nit: We're not accessing mbuf directly anywhere in this file, so I wonder
if we should use the access function here instead, e.g.:

        /* Because of alignment, we may have gained a bit more tailroom than
         * expected.  Update from the currently allocated length which got
         * adjusted by rte_pktmbuf_attach_extbuf(). */
        new_allocated = dp_packet_get_allocated(b);

WDYT?
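
To illustrate the point outside of OVS, here is a minimal standalone
sketch.  Everything in it (toy_mbuf, toy_packet, toy_packet_attach(),
toy_packet_get_allocated()) is a hypothetical stand-in, not OVS or DPDK
code, and the rounding done in the attach step is an assumption made only
to show why the caller re-reads the length afterwards:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver structure embedded in the packet,
 * playing the role of 'mbuf.buf_len' in the patch. */
struct toy_mbuf {
    uint16_t buf_len;
};

struct toy_packet {
    struct toy_mbuf mbuf;
};

/* Accessor: the only place that knows where the length is stored. */
static inline uint32_t
toy_packet_get_allocated(const struct toy_packet *p)
{
    return p->mbuf.buf_len;
}

/* Toy attach step.  Rounds 'len' up to a multiple of 'align' (which must
 * be a power of two), so the stored length may exceed the request.  This
 * rounding is an illustrative assumption, not a claim about DPDK. */
static inline void
toy_packet_attach(struct toy_packet *p, uint32_t len, uint32_t align)
{
    uint32_t rounded = (len + align - 1) & ~(align - 1);

    assert(rounded <= UINT16_MAX);
    p->mbuf.buf_len = (uint16_t) rounded;
}
```

A caller that requested 1000 bytes with a 64-byte alignment would read the
length back through toy_packet_get_allocated() rather than through
'p->mbuf.buf_len', so it never needs to know about the embedded structure.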

Also, the comment should use double spaces between sentences.

All of these could probably be adjusted on commit.  The rest of the code
and the updated test look good to me:

Acked-by: Ilya Maximets <[email protected]>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
