> Currently, there is a hard limitation on the PMD power management
> support that only allows it to support a single queue per lcore. This is
> not ideal as most DPDK use cases will poll multiple queues per core.
>
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing. This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
>
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of cores to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism is
>   shared between all queues polled on a particular lcore, and is only
>   activated when a special designated "power saving" queue is polled. To
>   put it another way, we have no idea which queue the user will poll in
>   what order, so we rely on them telling us that queue X is the last one
>   in the polling loop, so any power management should happen there.
> - A new API is added to mark a specific Rx queue as "power saving".
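Just to check my understanding of the intended usage with multiple queues,
here is a minimal sketch of what I think the application side would look
like. rte_power_ethdev_pmgmt_queue_enable() is the existing API; the
queue_set_power_save() name, port_id and nb_queues below are my
assumptions, not taken from the patch:

	unsigned int lcore = rte_lcore_id();
	uint16_t qid;
	int ret;

	/* enable PMD power management on every queue this lcore polls */
	for (qid = 0; qid < nb_queues; qid++) {
		ret = rte_power_ethdev_pmgmt_queue_enable(lcore, port_id,
				qid, RTE_POWER_MGMT_TYPE_PAUSE);
		if (ret < 0)
			rte_exit(EXIT_FAILURE, "cannot enable power mgmt\n");
	}

	/* mark the queue polled last in the loop as the "power saving"
	 * one, so that all power management decisions happen there
	 * (hypothetical name for the new API) */
	ret = rte_power_ethdev_pmgmt_queue_set_power_save(lcore, port_id,
			nb_queues - 1);
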
Honestly, I don't understand the logic behind that new function. I understand
that depending on HW we can monitor either one or multiple queues. That's ok,
but why do we now need to mark one queue as a 'very special' one?
Why can't rte_power_ethdev_pmgmt_queue_enable() just:
1. check if the number of monitored queues exceeds HW/SW capabilities,
   and if so return a failure;
2. otherwise add the queue to the list and treat all queues equally, i.e.
   go to power save mode when the number of sequential empty polls on all
   monitored queues exceeds the EMPTYPOLL_MAX threshold (see the sketch at
   the end of this mail)?

> Failing to call this API will result in no power management, however
> when having only one queue per core it is obvious which queue is the
> "power saving" one, so things will still work without this new API for
> use cases that were previously working without it.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
>  lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>  lib/power/rte_power_pmd_mgmt.h |  34 ++++
>  lib/power/version.map          |   3 +
>  3 files changed, 306 insertions(+), 66 deletions(-)
>
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 0707c60a4f..60dd21a19c 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>  	PMD_MGMT_ENABLED
>  };
>
> -struct pmd_queue_cfg {
> +struct queue {
> +	uint16_t portid;
> +	uint16_t qid;
> +};

Just a thought: if that would help somehow, it can be changed to:

union queue {
	uint32_t raw;
	struct {
		uint16_t portid, qid;
	};
};

That way the queue find/cmp functions below can operate on single raw 32-bit
values. Probably not that important, as all these functions are on the slow
path, but it might look nicer.

> +struct pmd_core_cfg {
> +	struct queue queues[RTE_MAX_ETHPORTS];

If we are going to have the ability to monitor multiple queues per lcore,
will RTE_MAX_ETHPORTS entries always be enough? On the other hand, this is
updated on the control path only, so wouldn't a normal list with
malloc()/rte_malloc() be more suitable here?

> +	/**< Which port-queue pairs are associated with this lcore? */
> +	struct queue power_save_queue;
> +	/**< When polling multiple queues, all but this one will be ignored */
> +	bool power_save_queue_set;
> +	/**< When polling multiple queues, power save queue must be set */
> +	size_t n_queues;
> +	/**< How many queues are in the list? */
>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>  	/**< State of power management for this queue */
>  	enum rte_power_pmd_mgmt_type cb_mode;
> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>  	uint64_t empty_poll_stats;
>  	/**< Number of empty polls */
>  } __rte_cache_aligned;
> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>
> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static inline bool
> +queue_equal(const struct queue *l, const struct queue *r)
> +{
> +	return l->portid == r->portid && l->qid == r->qid;
> +}
> +
> +static inline void
> +queue_copy(struct queue *dst, const struct queue *src)
> +{
> +	dst->portid = src->portid;
> +	dst->qid = src->qid;
> +}
> +
> +static inline bool
> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {

Here and in other places: any reason why the standard DPDK coding style is
not used?
> +	const struct queue *pwrsave = &cfg->power_save_queue;
> +
> +	/* if there's only single queue, no need to check anything */
> +	if (cfg->n_queues == 1)
> +		return true;
> +	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
> +}
> +
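To illustrate the alternative I described above, here is a minimal sketch of
what the shared Rx callback could do if all queues were treated equally.
pmd_core_cfg, lcore_cfg, empty_poll_stats and EMPTYPOLL_MAX are taken from
the patch; the callback name and the n_queues-scaled threshold are just my
assumptions:

static uint16_t
clb_pause_multi(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
		uint16_t max_pkts __rte_unused, void *arg __rte_unused)
{
	struct pmd_core_cfg *cfg = &lcore_cfg[rte_lcore_id()];

	if (unlikely(nb_rx == 0)) {
		/* single counter shared by all monitored queues */
		cfg->empty_poll_stats++;
		/* go to power save mode once each queue has been polled
		 * empty EMPTYPOLL_MAX times on average */
		if (cfg->empty_poll_stats > EMPTYPOLL_MAX * cfg->n_queues)
			rte_pause();
	} else
		cfg->empty_poll_stats = 0;

	return nb_rx;
}

That way no queue is special, and the new "power saving" marker API wouldn't
be needed at all.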