> Currently, there is a hard limitation on the PMD power management
> support that only allows it to support a single queue per lcore. This is
> not ideal as most DPDK use cases will poll multiple queues per core.
> 
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing.  This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
> 
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of cores to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism are
>   shared between all queues polled on a particular lcore, and are only
>   activated when a special designated "power saving" queue is polled. To
>   put it another way, we have no idea which queue the user will poll in
>   what order, so we rely on them telling us that queue X is the last one
>   in the polling loop, so any power management should happen there.
> - A new API is added to mark a specific Rx queue as "power saving".

Honestly, I don't understand the logic behind that new function.
I understand that, depending on the HW, we can monitor either one or multiple queues.
That's OK, but why do we now need to mark one queue as a 'very special' one?
Why can't rte_power_ethdev_pmgmt_queue_enable() just:
check whether the number of monitored queues exceeds the HW/SW capabilities,
and if so, return a failure;
otherwise add the queue to the list and treat all of them equally, i.e.
go to power-save mode when the number of sequential empty polls on
all monitored queues exceeds the EMPTYPOLL_MAX threshold?
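
Something along these lines is what I have in mind (just a rough sketch; it
assumes a per-queue empty-poll counter, which this patch doesn't have, and the
function name is made up):

static inline bool
lcore_can_sleep(const struct pmd_core_cfg *cfg)
{
	size_t i;

	/* sleep only once *every* queue monitored by this lcore has been
	 * empty for more than EMPTYPOLL_MAX consecutive polls
	 */
	for (i = 0; i < cfg->n_queues; i++) {
		if (cfg->queues[i].empty_poll_stats <= EMPTYPOLL_MAX)
			return false; /* this queue still sees traffic */
	}
	return true;
}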

>   Failing to call this API will result in no power management, however
>   when having only one queue per core it is obvious which queue is the
>   "power saving" one, so things will still work without this new API for
>   use cases that were previously working without it.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
> 
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
>  lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>  lib/power/rte_power_pmd_mgmt.h |  34 ++++
>  lib/power/version.map          |   3 +
>  3 files changed, 306 insertions(+), 66 deletions(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 0707c60a4f..60dd21a19c 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>       PMD_MGMT_ENABLED
>  };
> 
> -struct pmd_queue_cfg {
> +struct queue {
> +     uint16_t portid;
> +     uint16_t qid;
> +};

Just a thought: if it would help somehow, it could be changed to:

union queue {
	uint32_t raw;
	struct {
		uint16_t portid, qid;
	};
};

That way, the queue find/cmp functions below could operate on single raw
32-bit values.
Probably not that important, as all these functions are on the slow path, but it
might look nicer.
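
For example, queue_equal() could then be reduced to a single comparison:

static inline bool
queue_equal(const union queue *l, const union queue *r)
{
	/* portid + qid packed into one 32-bit value */
	return l->raw == r->raw;
}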

> +struct pmd_core_cfg {
> +     struct queue queues[RTE_MAX_ETHPORTS];

If we are going to have the ability to monitor multiple queues per lcore, will a
fixed-size array always be enough?
On the other hand, this is updated on the control path only, so
wouldn't a normal list with malloc() (or rte_malloc()) be more suitable here?
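
E.g. something like this (just a rough sketch; since it is control path only, a
plain malloc() would do):

#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

struct queue_list_entry {
	TAILQ_ENTRY(queue_list_entry) next;
	struct queue queue;
};

struct pmd_core_cfg {
	TAILQ_HEAD(, queue_list_entry) head; /* TAILQ_INIT() at startup */
	size_t n_queues;
	/* ... rest of the per-lcore fields from this patch ... */
};

static int
queue_list_add(struct pmd_core_cfg *cfg, const struct queue *q)
{
	struct queue_list_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -ENOMEM;
	e->queue = *q;
	TAILQ_INSERT_TAIL(&cfg->head, e, next);
	cfg->n_queues++;
	return 0;
}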

> +     /**< Which port-queue pairs are associated with this lcore? */
> +     struct queue power_save_queue;
> +     /**< When polling multiple queues, all but this one will be ignored */
> +     bool power_save_queue_set;
> +     /**< When polling multiple queues, power save queue must be set */
> +     size_t n_queues;
> +     /**< How many queues are in the list? */
>       volatile enum pmd_mgmt_state pwr_mgmt_state;
>       /**< State of power management for this queue */
>       enum rte_power_pmd_mgmt_type cb_mode;
> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>       uint64_t empty_poll_stats;
>       /**< Number of empty polls */
>  } __rte_cache_aligned;
> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
> 
> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static inline bool
> +queue_equal(const struct queue *l, const struct queue *r)
> +{
> +     return l->portid == r->portid && l->qid == r->qid;
> +}
> +
> +static inline void
> +queue_copy(struct queue *dst, const struct queue *src)
> +{
> +     dst->portid = src->portid;
> +     dst->qid = src->qid;
> +}
> +
> +static inline bool
> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {

Here and in other places: any reason why the standard DPDK coding style is not
used?
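
I.e. the opening brace of a function definition should go on its own line:

static inline bool
queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q)
{
	const struct queue *pwrsave = &cfg->power_save_queue;

	/* if there's only a single queue, no need to check anything */
	if (cfg->n_queues == 1)
		return true;
	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
}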

> +     const struct queue *pwrsave = &cfg->power_save_queue;
> +
> +     /* if there's only single queue, no need to check anything */
> +     if (cfg->n_queues == 1)
> +             return true;
> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
> +}
> +
