On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
> > 
> > 
> > On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
> > 
> >> This patch adds support for specific PMD thread initialization,
> >> deinitialization, and a callback execution to perform work as
> >> part of the PMD thread loop. This allows hardware offload
> >> providers to handle any specific asynchronous or batching work.
> >>
> >> This patch also adds cycle statistics for the provider-specific
> >> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
> > 
> > Bringing back the discussion on the earlier patch between Ilya and Gaetan 
> > to this revision :)
> > 
> > Ilya:
> >   Hi, Eelco.  As we talked before, this infrastructure resembles the async
> >   work infra that was proposed in the past for the use case of async vhost
> >   processing.  And I don't see any real use case proposed for it here nor
> >   in the RFC, where the question was asked, but not replied.
> > 
> > Gaetan:
> > 
>
> Hi Gaetan,
>
> A few questions below. I'm not so clear on the DOCA threading
> requirements, so questions may be broad.
>
> >   Hi Ilya, Eelco,
> > 
> >   Thanks for the patch and for the review.
> > 
> >   The use-case on our side is distributed data-structures in DOCA that
> >   require each participating thread to do maintenance work periodically.
> > 
> >   Specifically, offload threads will insert offload objects.
> >   Those will reserve entries in a map that can be resized. The DOCA
> >   implementation requires any thread that owns an entry to perform the
> >   work of moving it to the new bucket / space after resize is initiated.
> > 
> >   This is a pervasive design choice in DOCA: most of its APIs are written
> >   assuming participating threads are periodically calling into these
> >   maintenance functions.
> > 
>
> What is a "participating thread"? IIUC, the pmd thread passes down the
> flow pattern/action and the offload thread inserts the offload into the NIC.
>
> In that case, is it the offload thread that owns the entry ?
>

Participating threads are any threads that registered with DOCA-flow as
offloading threads. In our case, it means:

  * The main thread
      When a port is probed, starting it requires installing DOCA
      offloads (to execute RSS in particular) and a few other
      'admin' offloads (optional rate-limiting on VFs to avoid
      noisy neighbors, etc.).

  * The offload thread(s) (in the OVS sense)
      A thread in OVS managing dp-flow offloads asynchronously.

  * The polling thread(s)
      CT-offload is much simpler and faster than dp-flow offload.
      Executing offload insertion synchronously from the fastpath
      is beneficial.

In our case, 'participating threads' are any threads owning an offload
queue in DOCA-flow.

The main thread is a partial exception: we force all of its offload
operations to be fully synchronous. We do not want to publish a new
netdev before its 'admin' offloads have been received and successfully
acknowledged by the hardware, so the main thread waits for each
operation to complete. As a result, it does not need to do regular
upkeep.

> >   Some of such work is also time-sensitive, for example the current
> >   implementation requires a CT offload thread to receive completions after
> >   some hardware initialization. Until this completion is done, the CT
> >   offload entry is not fully usable (cannot be queried for activity /
> >   counters). We cannot leave batches of CT offload entry waiting for
> >   completion, assuming that at some later point, we will eventually
> >   re-execute something in our offload provider: it leaves a few stranded
> >   connection objects incomplete.
> > 
> >   This has the result of having hardware execution of a flow with CT
> >   actions, but no activity counters: the software datapath then deletes
> >   the connection and/or flow due to inactivity.
> > 
>
> Can this periodic work be done by the offload thread ? If it is fast
> enough for inserting the offload, then maybe it is fast enough for this.
>

The PMD thread owns the offload queue. If another thread has to execute
its upkeep work, it means sharing the queue between threads.

> Some DPDK PMDs use alarms for periodic maintenance work, could they be
> used inside DOCA for this?
>

Those upkeep functions are exposed by DOCA and are part of the DOCA-flow
API. DOCA does not expose an event framework to schedule this kind of
work; it requires DOCA applications to explicitly call those functions.

> If it needs to be on the PMD thread, is the work significant (i.e. more
> than a few % cpu) and how variable is it ? Could it be added inside the
> call to rte_eth_rx_burst polling ?
>

It can be significant.
The work is anything requiring the use of the offload queue owned by
this thread. The principle is that the owning thread must execute it.

Currently, with CT offloads we have:

  * Offload queue polling for HW completions (confirming that add /
    mod / del requests were executed)

  * CT-del: a connection was offloaded by PMD-1. When the connection
    expires or another PMD (PMD-2) closes it, ct-clean or PMD-2 sends
    a CT-del request to PMD-1: PMD-1 must poll for CT-del requests and
    execute them locally.

  * Offload flush: when a port is deleted, all owning threads must
    process a blocking flush request from the main thread. The main
    thread only proceeds once all participating threads have completed
    their flush.

Completion is very lightweight work, but we must execute it.
Generally we only poll for completions as needed: we clear just enough
room in the offload queue for the current batch of requests we want to
enqueue. This is a problem on idle: stray completions can be left in
the queue and won't be processed if we rely only on activity.
Currently DOCA-flow does not support leaving the completions until the
port is deleted: they need to be processed.
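
To make the idle case concrete, here is a minimal sketch of the two
polling paths. This is not DOCA code: 'struct sketch_queue' and the
sketch_*() functions are hypothetical stand-ins, and the hardware is
reduced to a counter of finished requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread offload queue; not a DOCA type. */
struct sketch_queue {
    size_t depth;      /* Total number of slots. */
    size_t in_flight;  /* Submitted, completion not yet processed. */
    size_t hw_done;    /* Of those, how many the hardware has finished. */
};

/* Process up to 'budget' finished requests; returns how many were reaped. */
static size_t
sketch_poll_completions(struct sketch_queue *q, size_t budget)
{
    size_t n = q->hw_done < budget ? q->hw_done : budget;

    assert(n <= q->in_flight);
    q->hw_done -= n;
    q->in_flight -= n;
    return n;
}

/* On-demand path: free only as much room as the next batch needs. */
static bool
sketch_enqueue_batch(struct sketch_queue *q, size_t batch)
{
    size_t room = q->depth - q->in_flight;

    if (room < batch) {
        room += sketch_poll_completions(q, batch - room);
    }
    if (room < batch) {
        return false;   /* Hardware has not caught up yet. */
    }
    q->in_flight += batch;
    return true;
}

/* Idle path: run on every loop iteration, reaps whatever has finished. */
static size_t
sketch_upkeep(struct sketch_queue *q)
{
    return sketch_poll_completions(q, q->in_flight);
}
```

With a queue depth of 8, enqueueing 6 requests and then, once the
hardware has finished them, 4 more only reaps 2 completions, so 4
finished requests stay in the queue. If no further batch arrives, only
the per-iteration sketch_upkeep() call clears them.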

CT-del can be significant in some cases. We have a 'rolling-window' case
of constant open + close of short connections, and in this worst case,
CT-del takes ~30% (both local and distant). Some portion of it comes from
CT-del messages, in particular with multiple PMDs.
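
As an illustration only, the CT-del message path could be sketched as
below. It assumes one single-producer/single-consumer ring per (sender,
owner) pair, which is a simplification chosen for the example and not a
description of the actual implementation or of DOCA. All names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8   /* Must be a power of two. */

/* Hypothetical SPSC mailbox carrying CT-del requests to the owner. */
struct sketch_ctdel_ring {
    _Atomic size_t head;     /* Advanced by the owner (consumer). */
    _Atomic size_t tail;     /* Advanced by the sender (producer). */
    uint32_t conn_id[SKETCH_RING_SIZE];
};

static void
sketch_ctdel_init(struct sketch_ctdel_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Sender side (PMD-2 or ct-clean): ask the owner to delete 'conn_id'. */
static bool
sketch_ctdel_post(struct sketch_ctdel_ring *r, uint32_t conn_id)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == SKETCH_RING_SIZE) {
        return false;   /* Ring is full; the caller retries later. */
    }
    r->conn_id[tail % SKETCH_RING_SIZE] = conn_id;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Owner side (PMD-1): drain pending requests; returns how many ran. */
static size_t
sketch_ctdel_poll(struct sketch_ctdel_ring *r,
                  void (*del_cb)(uint32_t conn_id, void *aux), void *aux)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t n = tail - head;

    for (; head != tail; head++) {
        del_cb(r->conn_id[head % SKETCH_RING_SIZE], aux);
    }
    atomic_store_explicit(&r->head, head, memory_order_release);
    return n;
}

/* Example callback: counts deletions in the size_t 'aux' points to. */
static void
sketch_count_del(uint32_t conn_id, void *aux)
{
    (void) conn_id;
    ++*(size_t *) aux;
}
```

The owner has to call sketch_ctdel_poll() regularly from its own loop;
the sender never touches the owner's offload queue.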

Offload flush is generally quick, but we must answer the flush message
quickly to block the main thread as little as possible.

Some of the messages must be handled even if there is no RX-burst: a PMD
that is waiting for reload will need to execute a flush message that it
has received.
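
A rough sketch of such a flush handshake, assuming a single flush in
flight at a time and a fixed set of owner threads. The structure and
function names are hypothetical, and the actual queue flush is left as
a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 4

/* Hypothetical flush state shared by the main and owner threads. */
struct sketch_flush {
    atomic_uint requested;                  /* Bumped by the main thread. */
    atomic_uint acked[SKETCH_MAX_THREADS];  /* Last request each owner ran. */
};

static void
sketch_flush_init(struct sketch_flush *f)
{
    atomic_init(&f->requested, 0);
    for (size_t i = 0; i < SKETCH_MAX_THREADS; i++) {
        atomic_init(&f->acked[i], 0);
    }
}

/* Main thread: start a flush and return its sequence number. */
static unsigned
sketch_flush_request(struct sketch_flush *f)
{
    return atomic_fetch_add_explicit(&f->requested, 1,
                                     memory_order_release) + 1;
}

/* Owner thread: must be reachable even when there is no RX work, e.g.
 * while waiting for a reload.  Returns true if a flush was executed. */
static bool
sketch_flush_upkeep(struct sketch_flush *f, size_t tid)
{
    unsigned req = atomic_load_explicit(&f->requested,
                                        memory_order_acquire);
    unsigned ack = atomic_load_explicit(&f->acked[tid],
                                        memory_order_relaxed);

    if (ack == req) {
        return false;
    }
    /* The thread would flush its own offload queue here. */
    atomic_store_explicit(&f->acked[tid], req, memory_order_release);
    return true;
}

/* Main thread: true once all 'n' owner threads have acked 'seq'. */
static bool
sketch_flush_done(struct sketch_flush *f, unsigned seq, size_t n)
{
    assert(n <= SKETCH_MAX_THREADS);
    for (size_t i = 0; i < n; i++) {
        if (atomic_load_explicit(&f->acked[i],
                                 memory_order_acquire) != seq) {
            return false;
        }
    }
    return true;
}
```

The main thread stays blocked in its wait loop for as long as any owner
thread does not reach sketch_flush_upkeep(), which is why the call
cannot depend on packets being received.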

I think completions and flushes would be the main issues with the
rx-burst approach.

Thanks,
Gaetan

> thanks,
> Kevin.
>
> > Thanks,
> > 
> > Eelco
> > 
> >> Signed-off-by: Eelco Chaudron <[email protected]>
> >> ---
> >>  lib/dpif-netdev-perf.c        |  19 ++++-
> >>  lib/dpif-netdev-perf.h        |   3 +-
> >>  lib/dpif-netdev.c             |  42 ++++++++++-
> >>  lib/dpif-offload-dummy.c      |  38 ++++++++++
> >>  lib/dpif-offload-provider.h   |  26 +++++++
> >>  lib/dpif-offload.c            | 133 ++++++++++++++++++++++++++++++++++
> >>  lib/dpif-offload.h            |  11 +++
> >>  tests/pmd.at                  |  32 ++++++++
> >>  utilities/checkpatch_dict.txt |   2 +
> >>  9 files changed, 298 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> >> index 1cd4ee0842..39465ba819 100644
> >> --- a/lib/dpif-netdev-perf.c
> >> +++ b/lib/dpif-netdev-perf.c
> >> @@ -232,6 +232,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
> >> pmd_perf_stats *s,
> >>      uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
> >>      uint64_t sleep_iter = stats[PMD_SLEEP_ITER];
> >>      uint64_t tot_sleep_cycles = stats[PMD_CYCLES_SLEEP];
> >> +    uint64_t offload_cycles = stats[PMD_CYCLES_OFFLOAD];
> >>
> >>      ds_put_format(str,
> >>              "  Iterations:         %12"PRIu64"  (%.2f us/it)\n"
> >> @@ -242,7 +243,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
> >> pmd_perf_stats *s,
> >>              "  Sleep time (us):    %12.0f  (%3.0f us/iteration avg.)\n",
> >>              tot_iter,
> >>              tot_iter
> >> -                ? (tot_cycles + tot_sleep_cycles) * us_per_cycle / 
> >> tot_iter
> >> +                ? (tot_cycles + tot_sleep_cycles + offload_cycles)
> >> +                  * us_per_cycle / tot_iter
> >>                  : 0,
> >>              tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
> >>              idle_iter,
> >> @@ -252,6 +254,13 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
> >> pmd_perf_stats *s,
> >>              sleep_iter, tot_iter ? 100.0 * sleep_iter / tot_iter : 0,
> >>              tot_sleep_cycles * us_per_cycle,
> >>              sleep_iter ? (tot_sleep_cycles * us_per_cycle) / sleep_iter : 
> >> 0);
> >> +    if (offload_cycles > 0) {
> >> +        ds_put_format(str,
> >> +            "  Offload cycles:     %12" PRIu64 "  (%5.1f %% of used 
> >> cycles)\n",
> >> +            offload_cycles,
> >> +            100.0 * offload_cycles / (tot_cycles + tot_sleep_cycles
> >> +                                      + offload_cycles));
> >> +    }
> >>      if (rx_packets > 0) {
> >>          ds_put_format(str,
> >>              "  Rx packets:         %12"PRIu64"  (%.0f Kpps, %.0f 
> >> cycles/pkt)\n"
> >> @@ -532,14 +541,14 @@ OVS_REQUIRES(s->stats_mutex)
> >>  void
> >>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >>                         int tx_packets, uint64_t sleep_cycles,
> >> -                       bool full_metrics)
> >> +                       uint64_t offload_cycles, bool full_metrics)
> >>  {
> >>      uint64_t now_tsc = cycles_counter_update(s);
> >>      struct iter_stats *cum_ms;
> >>      uint64_t cycles, cycles_per_pkt = 0;
> >>      char *reason = NULL;
> >>
> >> -    cycles = now_tsc - s->start_tsc - sleep_cycles;
> >> +    cycles = now_tsc - s->start_tsc - sleep_cycles - offload_cycles;
> >>      s->current.timestamp = s->iteration_cnt;
> >>      s->current.cycles = cycles;
> >>      s->current.pkts = rx_packets;
> >> @@ -558,6 +567,10 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int 
> >> rx_packets,
> >>          pmd_perf_update_counter(s, PMD_CYCLES_SLEEP, sleep_cycles);
> >>      }
> >>
> >> +    if (offload_cycles) {
> >> +        pmd_perf_update_counter(s, PMD_CYCLES_OFFLOAD, offload_cycles);
> >> +    }
> >> +
> >>      if (!full_metrics) {
> >>          return;
> >>      }
> >> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> >> index 84beced151..2a055dacdd 100644
> >> --- a/lib/dpif-netdev-perf.h
> >> +++ b/lib/dpif-netdev-perf.h
> >> @@ -82,6 +82,7 @@ enum pmd_stat_type {
> >>      PMD_CYCLES_UPCALL,      /* Cycles spent processing upcalls. */
> > >>      PMD_SLEEP_ITER,         /* Iterations where a sleep has taken place. */
> >>      PMD_CYCLES_SLEEP,       /* Total cycles slept to save power. */
> > >> +    PMD_CYCLES_OFFLOAD,     /* Total cycles spent handling offload. */
> >>      PMD_N_STATS
> >>  };
> >>
> >> @@ -411,7 +412,7 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s);
> >>  void
> >>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >>                         int tx_packets, uint64_t sleep_cycles,
> >> -                       bool full_metrics);
> >> +                       uint64_t offload_cycles, bool full_metrics);
> >>
> >>  /* Formatting the output of commands. */
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 9df05c4c28..64531a02c0 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -329,6 +329,9 @@ struct dp_netdev {
> >>      uint64_t last_reconfigure_seq;
> >>      struct ovsthread_once once_set_config;
> >>
> >> +    /* When a reconfigure is requested, forcefully reload all PMDs. */
> >> +    bool force_pmd_reload;
> >> +
> >>      /* Cpu mask for pin of pmd threads. */
> >>      char *pmd_cmask;
> >>
> >> @@ -339,6 +342,7 @@ struct dp_netdev {
> >>
> >>      struct conntrack *conntrack;
> >>      struct pmd_auto_lb pmd_alb;
> >> +    bool offload_enabled;
> >>
> >>      /* Bonds. */
> >>      struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
> >> @@ -4556,6 +4560,14 @@ dpif_netdev_set_config(struct dpif *dpif, const 
> >> struct smap *other_config)
> >>          log_all_pmd_sleeps(dp);
> >>      }
> >>
> >> +    if (!dp->offload_enabled) {
> >> +        dp->offload_enabled = dpif_offload_enabled();
> >> +        if (dp->offload_enabled) {
> >> +            dp->force_pmd_reload = true;
> >> +            dp_netdev_request_reconfigure(dp);
> >> +        }
> >> +    }
> >> +
> >>      return 0;
> >>  }
> >>
> >> @@ -6216,6 +6228,14 @@ reconfigure_datapath(struct dp_netdev *dp)
> >>          ovs_mutex_unlock(&pmd->port_mutex);
> >>      }
> >>
> >> +    /* Do we need to forcefully reload all threads? */
> >> +    if (dp->force_pmd_reload) {
> >> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> >> +            pmd->need_reload = true;
> >> +        }
> >> +        dp->force_pmd_reload = false;
> >> +    }
> >> +
> >>      /* Reload affected pmd threads. */
> >>      reload_affected_pmds(dp);
> >>
> >> @@ -6516,6 +6536,7 @@ pmd_thread_main(void *f_)
> >>  {
> >>      struct dp_netdev_pmd_thread *pmd = f_;
> >>      struct pmd_perf_stats *s = &pmd->perf_stats;
> >> +    struct dpif_offload_pmd_ctx *offload_ctx = NULL;
> >>      unsigned int lc = 0;
> >>      struct polled_queue *poll_list;
> >>      bool wait_for_reload = false;
> >> @@ -6549,6 +6570,9 @@ reload:
> >>          dpdk_attached = dpdk_attach_thread(pmd->core_id);
> >>      }
> >>
> >> +    dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id,
> >> +                                   pmd->numa_id, &offload_ctx);
> >> +
> >>      /* List port/core affinity */
> >>      for (i = 0; i < poll_cnt; i++) {
> >>         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> >> @@ -6588,7 +6612,7 @@ reload:
> >>      ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
> >>      for (;;) {
> >>          uint64_t rx_packets = 0, tx_packets = 0;
> >> -        uint64_t time_slept = 0;
> >> +        uint64_t time_slept = 0, offload_cycles = 0;
> >>          uint64_t max_sleep;
> >>
> >>          pmd_perf_start_iteration(s);
> >> @@ -6628,6 +6652,10 @@ reload:
> >>                                                     ? true : false);
> >>          }
> >>
> >> +        /* Do work required by any of the hardware offload providers. */
> >> +        offload_cycles = dpif_offload_pmd_thread_do_work(offload_ctx,
> >> +                                                         
> >> &pmd->perf_stats);
> >> +
> >>          if (max_sleep) {
> >>              /* Check if a sleep should happen on this iteration. */
> >>              if (sleep_time) {
> >> @@ -6687,7 +6715,7 @@ reload:
> >>          }
> >>
> >>          pmd_perf_end_iteration(s, rx_packets, tx_packets, time_slept,
> >> -                               pmd_perf_metrics_enabled(pmd));
> > >> +                               offload_cycles, pmd_perf_metrics_enabled(pmd));
> >>      }
> >>      ovs_mutex_unlock(&pmd->perf_stats.stats_mutex);
> >>
> >> @@ -6708,6 +6736,7 @@ reload:
> >>          goto reload;
> >>      }
> >>
> >> +    dpif_offload_pmd_thread_exit(offload_ctx);
> >>      pmd_free_static_tx_qid(pmd);
> >>      dfc_cache_uninit(&pmd->flow_cache);
> >>      free(poll_list);
> >> @@ -9623,7 +9652,7 @@ dp_netdev_pmd_try_optimize(struct 
> >> dp_netdev_pmd_thread *pmd,
> >>                             struct polled_queue *poll_list, int poll_cnt)
> >>  {
> >>      struct dpcls *cls;
> >> -    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0;
> >> +    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0, tot_offload = 0;
> >>      unsigned int pmd_load = 0;
> >>
> >>      if (pmd->ctx.now > pmd->next_cycle_store) {
> >> @@ -9642,11 +9671,14 @@ dp_netdev_pmd_try_optimize(struct 
> >> dp_netdev_pmd_thread *pmd,
> >>                         pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
> >>              tot_sleep = pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP] -
> >>                          pmd->prev_stats[PMD_CYCLES_SLEEP];
> >> +            tot_offload = pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD] -
> >> +                          pmd->prev_stats[PMD_CYCLES_OFFLOAD];
> >>
> >>              if (pmd_alb->is_enabled && !pmd->isolated) {
> >>                  if (tot_proc) {
> >>                      pmd_load = ((tot_proc * 100) /
> >> -                                    (tot_idle + tot_proc + tot_sleep));
> >> +                                    (tot_idle + tot_proc + tot_sleep
> >> +                                     + tot_offload));
> >>                  }
> >>
> >>                  atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> @@ -9665,6 +9697,8 @@ dp_netdev_pmd_try_optimize(struct 
> >> dp_netdev_pmd_thread *pmd,
> >>                          pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
> >>          pmd->prev_stats[PMD_CYCLES_SLEEP] =
> >>                          pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP];
> >> +        pmd->prev_stats[PMD_CYCLES_OFFLOAD] =
> >> +                        pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD];
> >>
> > >>          /* Get the cycles that were used to process each queue and store. */
> >>          for (unsigned i = 0; i < poll_cnt; i++) {
> >> diff --git a/lib/dpif-offload-dummy.c b/lib/dpif-offload-dummy.c
> >> index 28f1f40013..80d6ce67c3 100644
> >> --- a/lib/dpif-offload-dummy.c
> >> +++ b/lib/dpif-offload-dummy.c
> >> @@ -17,6 +17,7 @@
> >>  #include <config.h>
> >>  #include <errno.h>
> >>
> >> +#include "coverage.h"
> >>  #include "dpif.h"
> >>  #include "dpif-offload.h"
> >>  #include "dpif-offload-provider.h"
> >> @@ -33,6 +34,8 @@
> >>
> >>  VLOG_DEFINE_THIS_MODULE(dpif_offload_dummy);
> >>
> >> +COVERAGE_DEFINE(dummy_offload_do_work);
> >> +
> >>  struct pmd_id_data {
> >>      struct hmap_node node;
> >>      void *flow_reference;
> >> @@ -1020,6 +1023,40 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
> >>      }
> >>  }
> >>
> >> +static void
> > >> +dummy_pmd_thread_work_cb(unsigned core_id OVS_UNUSED, int numa_id OVS_UNUSED,
> >> +                         void *ctx OVS_UNUSED)
> >> +{
> >> +    COVERAGE_INC(dummy_offload_do_work);
> >> +}
> >> +
> >> +static void
> >> +dummy_pmd_thread_lifecycle(const struct dpif_offload *dpif_offload,
> >> +                           bool exit, unsigned core_id, int numa_id,
> >> +                           dpif_offload_pmd_thread_work_cb **callback,
> >> +                           void **ctx)
> >> +{
> >> +    /* Only do this for the 'dummy' class, not for 'dummy_x'. */
> >> +    if (strcmp(dpif_offload_type(dpif_offload), "dummy")) {
> >> +        *callback = NULL;
> >> +        *ctx = NULL;
> >> +        return;
> >> +    }
> >> +
> >> +    VLOG_DBG(
> >> +        "pmd_thread_lifecycle; exit=%s, core=%u, numa=%d, cb=%p, ctx=%p",
> >> +        exit ? "true" : "false", core_id, numa_id, *callback, *ctx);
> >> +
> >> +    ovs_assert(!*callback || *callback == dummy_pmd_thread_work_cb);
> >> +
> >> +    if (exit) {
> >> +        free(*ctx);
> >> +    } else {
> >> +        *ctx = *ctx ? *ctx : xstrdup("DUMMY_OFFLOAD_WORK");
> >> +        *callback = dummy_pmd_thread_work_cb;
> >> +    }
> >> +}
> >> +
> > >>  #define DEFINE_DPIF_DUMMY_CLASS(NAME, TYPE_STR)                             \
> > >>      struct dpif_offload_class NAME = {                                      \
> > >>          .type = TYPE_STR,                                                   \
> > >> @@ -1039,6 +1076,7 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
> > >>          .netdev_flow_del = dummy_flow_del,                                  \
> > >>          .netdev_flow_stats = dummy_flow_stats,                              \
> > >>          .register_flow_unreference_cb = dummy_register_flow_unreference_cb, \
> > >> +        .pmd_thread_lifecycle = dummy_pmd_thread_lifecycle                  \
> >>  }
> >>
> >>  DEFINE_DPIF_DUMMY_CLASS(dpif_offload_dummy_class, "dummy");
> >> diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
> >> index 02ef46cb08..259de2c299 100644
> >> --- a/lib/dpif-offload-provider.h
> >> +++ b/lib/dpif-offload-provider.h
> >> @@ -87,6 +87,10 @@ dpif_offload_flow_dump_thread_init(
> >>  }
> >>
> >>
> >> +/* Offload Provider specific PMD thread work callback definition. */
> > >> +typedef void dpif_offload_pmd_thread_work_cb(unsigned core_id, int numa_id,
> >> +                                             void *ctx);
> >> +
> >>  struct dpif_offload_class {
> >>      /* Type of DPIF offload provider in this class, e.g., "tc", "dpdk",
> >>       * "dummy", etc. */
> >> @@ -305,6 +309,28 @@ struct dpif_offload_class {
> >>       * to netdev_flow_put() is no longer held by the offload provider. */
> >>      void (*register_flow_unreference_cb)(const struct dpif_offload *,
> > >>                                           dpif_offload_flow_unreference_cb *);
> > >> +
> > >> +
> > >> +    /* The API below is specific to PMD (userspace) thread lifecycle handling.
> > >> +     *
> > >> +     * This API allows a provider to supply a callback function
> > >> +     * (via `*callback`) and an optional context pointer (via `*ctx`) for a
> > >> +     * PMD thread.
> > >> +     *
> > >> +     * The lifecycle hook may be invoked multiple times for the same PMD
> > >> +     * thread.  For example, when the thread is reinitialized, this function
> > >> +     * will be called again and the previous `callback` and `ctx` values will
> > >> +     * be passed back in.  It is the provider's responsibility to decide
> > >> +     * whether those should be reused, replaced, or cleaned up before storing
> > >> +     * new values.
> > >> +     *
> > >> +     * When the PMD thread is terminating, this API is called with
> > >> +     * `exit == true`.  At that point, the provider must release any resources
> > >> +     * associated with the previously returned `callback` and `ctx`. */
> > >> +    void (*pmd_thread_lifecycle)(const struct dpif_offload *, bool exit,
> > >> +                                 unsigned core_id, int numa_id,
> > >> +                                 dpif_offload_pmd_thread_work_cb **callback,
> > >> +                                 void **ctx);
> >>  };
> >>
> >>  extern struct dpif_offload_class dpif_offload_dummy_class;
> >> diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
> >> index bb2feced9e..cbf1f6c704 100644
> >> --- a/lib/dpif-offload.c
> >> +++ b/lib/dpif-offload.c
> >> @@ -17,6 +17,7 @@
> >>  #include <config.h>
> >>  #include <errno.h>
> >>
> >> +#include "dpif-netdev-perf.h"
> >>  #include "dpif-offload.h"
> >>  #include "dpif-offload-provider.h"
> >>  #include "dpif-provider.h"
> >> @@ -54,6 +55,7 @@ static const struct dpif_offload_class 
> >> *base_dpif_offload_classes[] = {
> >>      &dpif_offload_dummy_x_class,
> >>  };
> >>
> >> +#define TOTAL_PROVIDERS ARRAY_SIZE(base_dpif_offload_classes)
> >>  #define DEFAULT_PROVIDER_PRIORITY_LIST "tc,dpdk,dummy,dummy_x"
> >>
> >>  static char *dpif_offload_provider_priority_list = NULL;
> >> @@ -1665,3 +1667,134 @@ dpif_offload_port_mgr_port_count(const struct 
> >> dpif_offload *offload)
> >>
> >>      return cmap_count(&offload->ports->odp_port_to_port);
> >>  }
> >> +
> >> +struct dpif_offload_pmd_ctx_node {
> >> +    const struct dpif_offload *offload;
> >> +    dpif_offload_pmd_thread_work_cb *callback;
> >> +    void *provider_ctx;
> >> +};
> >> +
> >> +struct dpif_offload_pmd_ctx {
> >> +    unsigned core_id;
> >> +    int numa_id;
> >> +    size_t n_nodes;
> >> +    struct dpif_offload_pmd_ctx_node nodes[TOTAL_PROVIDERS];
> >> +};
> >> +
> >> +void
> >> +dpif_offload_pmd_thread_reload(const char *dpif_name, unsigned core_id,
> > >> +                               int numa_id, struct dpif_offload_pmd_ctx **ctx_)
> >> +{
> >> +    struct dpif_offload_pmd_ctx_node old_nodes[TOTAL_PROVIDERS];
> >> +    struct dpif_offload_provider_collection *collection;
> >> +    struct dpif_offload_pmd_ctx *ctx;
> >> +    struct dpif_offload *offload;
> >> +    size_t old_n_nodes = 0;
> >> +
> >> +    if (!dpif_offload_enabled()) {
> >> +        ovs_assert(!*ctx_);
> >> +        return;
> >> +    }
> >> +
> >> +    ovs_mutex_lock(&dpif_offload_mutex);
> >> +    collection = shash_find_data(&dpif_offload_providers, dpif_name);
> >> +    ovs_mutex_unlock(&dpif_offload_mutex);
> >> +
> >> +    if (OVS_UNLIKELY(!collection)) {
> >> +        ovs_assert(!*ctx_);
> >> +        return;
> >> +    }
> >> +
> >> +    if (!*ctx_) {
> >> +        /* Would be nice if we have a numa specific xzalloc(). */
> >> +        ctx = xzalloc(sizeof *ctx);
> >> +        ctx->core_id = core_id;
> >> +        ctx->numa_id = numa_id;
> >> +        *ctx_ = ctx;
> >> +    } else {
> >> +        ctx = *ctx_;
> >> +        old_n_nodes = ctx->n_nodes;
> >> +
> >> +        if (old_n_nodes) {
> > >> +            memcpy(old_nodes, ctx->nodes, old_n_nodes * sizeof old_nodes[0]);
> >> +        }
> >> +
> >> +        /* Reset active nodes array. */
> >> +        memset(ctx->nodes, 0, sizeof ctx->nodes);
> >> +        ctx->n_nodes = 0;
> >> +    }
> >> +
> >> +    LIST_FOR_EACH (offload, dpif_list_node, &collection->list) {
> >> +
> >> +        ovs_assert(ctx->n_nodes < TOTAL_PROVIDERS);
> >> +
> >> +        if (!offload->class->pmd_thread_lifecycle) {
> >> +            continue;
> >> +        }
> >> +
> >> +        if (old_n_nodes) {
> > >> +            /* If this is a reload, try to find previous callback and ctx. */
> > >> +            for (size_t i = 0; i < old_n_nodes; i++) {
> > >> +                struct dpif_offload_pmd_ctx_node *node = &old_nodes[i];
> > >> +
> > >> +                if (offload == node->offload) {
> > >> +                    ctx->nodes[ctx->n_nodes].callback = node->callback;
> > >> +                    ctx->nodes[ctx->n_nodes].provider_ctx = node->provider_ctx;
> >> +                    break;
> >> +                }
> >> +            }
> >> +        }
> >> +
> >> +        offload->class->pmd_thread_lifecycle(
> >> +            offload, false, core_id, numa_id,
> >> +            &ctx->nodes[ctx->n_nodes].callback,
> >> +            &ctx->nodes[ctx->n_nodes].provider_ctx);
> >> +
> >> +        if (ctx->nodes[ctx->n_nodes].callback) {
> >> +            ctx->nodes[ctx->n_nodes].offload = offload;
> >> +            ctx->n_nodes++;
> >> +        } else {
> >> +            memset(&ctx->nodes[ctx->n_nodes], 0,
> >> +                   sizeof ctx->nodes[ctx->n_nodes]);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +uint64_t
> >> +dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *ctx,
> >> +                                struct pmd_perf_stats *stats)
> >> +{
> >> +    struct cycle_timer offload_work_timer;
> >> +
> >> +    if (!ctx || !ctx->n_nodes) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    cycle_timer_start(stats, &offload_work_timer);
> >> +
> >> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
> >> +        ctx->nodes[i].callback(ctx->core_id, ctx->numa_id,
> >> +                               ctx->nodes[i].provider_ctx);
> >> +    }
> >> +
> >> +    return cycle_timer_stop(stats, &offload_work_timer);
> >> +}
> >> +
> >> +void
> >> +dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *ctx)
> >> +{
> >> +    if (!ctx) {
> >> +        return;
> >> +    }
> >> +
> >> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
> >> +        struct dpif_offload_pmd_ctx_node *node = &ctx->nodes[i];
> >> +
> >> +        node->offload->class->pmd_thread_lifecycle(node->offload, true,
> > >> +                                                   ctx->core_id, ctx->numa_id,
> >> +                                                   &node->callback,
> >> +                                                   &node->provider_ctx);
> >> +    }
> >> +
> >> +    free(ctx);
> >> +}
> >> diff --git a/lib/dpif-offload.h b/lib/dpif-offload.h
> >> index 7fad3ebee3..0f66d8cd8e 100644
> >> --- a/lib/dpif-offload.h
> >> +++ b/lib/dpif-offload.h
> >> @@ -22,6 +22,7 @@
> >>  /* Forward declarations of private structures. */
> >>  struct dpif_offload_class;
> >>  struct dpif_offload;
> >> +struct pmd_perf_stats;
> >>
> >>  /* Definition of the DPIF offload implementation type.
> >>   *
> >> @@ -186,4 +187,14 @@ dpif_offload_datapath_flow_op_continue(struct 
> >> dpif_offload_flow_cb_data *cb,
> >>      }
> >>  }
> >>
> >> +/* PMD Thread helper functions. */
> >> +struct dpif_offload_pmd_ctx;
> >> +
> >> +void dpif_offload_pmd_thread_reload(const char *dpif_name,
> >> +                                    unsigned core_id, int numa_id,
> >> +                                    struct dpif_offload_pmd_ctx **);
> >> +uint64_t dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *,
> >> +                                         struct pmd_perf_stats *);
> >> +void dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *);
> >> +
> >>  #endif /* DPIF_OFFLOAD_H */
> >> diff --git a/tests/pmd.at b/tests/pmd.at
> >> index 8254ac3b0f..54184d8c92 100644
> >> --- a/tests/pmd.at
> >> +++ b/tests/pmd.at
> >> @@ -1689,3 +1689,35 @@ 
> >> recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.
> >>
> >>  OVS_VSWITCHD_STOP
> >>  AT_CLEANUP
> >> +
> >> +AT_SETUP([PMD - offload work])
> >> +OVS_VSWITCHD_START([], [], [], [DUMMY_NUMA],
> >> +                   [-- set Open_vSwitch . other_config:hw-offload=true])
> >> +
> >> +AT_CHECK([ovs-appctl vlog/set dpif_offload_dummy:dbg])
> >> +AT_CHECK([ovs-vsctl add-port br0 p0 -- set Interface p0 type=dummy-pmd])
> >> +
> >> +CHECK_CPU_DISCOVERED()
> >> +CHECK_PMD_THREADS_CREATED()
> >> +
> >> +OVS_WAIT_UNTIL(
> >> +  [test $(ovs-appctl coverage/read-counter dummy_offload_do_work) -gt 0])
> >> +
> >> +AT_CHECK([ovs-appctl dpif-netdev/pmd-perf-show \
> > >> +  | grep -Eq 'Offload cycles: +[[0-9]]+  \( *[[0-9.]]+ % of used cycles\)'])
> >> +
> >> +OVS_VSWITCHD_STOP
> >> +
> >> +LOG="$(sed -n 's/.*\(pmd_thread_lifecycle.*\)/\1/p' ovs-vswitchd.log)"
> >> +CB=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*cb=\([[^,]]*\).*/\1/p')
> >> +CTX=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*ctx=\(.*\)$/\1/p')
> >> +
> >> +AT_CHECK([echo "$LOG" | sed -n '1p' | sed 's/(nil)/0x0/g'], [0], [dnl
> >> +pmd_thread_lifecycle; exit=false, core=0, numa=0, cb=0x0, ctx=0x0
> >> +])
> >> +AT_CHECK([echo "$LOG" | sed -n '2p' \
> >> +          | grep -q "exit=false, core=0, numa=0, cb=$CB, ctx=$CTX"])
> >> +AT_CHECK([echo "$LOG" | sed -n '$p' \
> >> +          | grep -q "exit=true, core=0, numa=0, cb=$CB, ctx=$CTX"])
> >> +
> >> +AT_CLEANUP
> >> diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
> >> index c1f43e5afa..c9b758d63c 100644
> >> --- a/utilities/checkpatch_dict.txt
> >> +++ b/utilities/checkpatch_dict.txt
> >> @@ -36,6 +36,7 @@ cpu
> >>  cpus
> >>  cstime
> >>  csum
> >> +ctx
> >>  cutime
> >>  cvlan
> >>  datapath
> >> @@ -46,6 +47,7 @@ decap
> >>  decapsulation
> >>  defrag
> >>  defragment
> >> +deinitialization
> >>  deref
> >>  dereference
> >>  dest
> >> -- 
> >> 2.52.0
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev