On 14-Oct-20 7:41 PM, Ananyev, Konstantin wrote:
From: Liang Ma <liang.j...@intel.com>

Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queued per device are supported, but they have to be polled on different
cores).

This design is using PMD RX callbacks.

1. UMWAIT/UMONITOR:

    When a certain threshold of empty polls is reached, the core will go
    into a power optimized sleep while waiting on an address of next RX
    descriptor to be written to.

2. Pause instruction

    Instead of move the core into deeper C state, this method uses the
    pause instruction to avoid busy polling.

3. Frequency scaling
    Reuse existing DPDK power library to scale up/down core frequency
    depending on traffic volume.

Signed-off-by: Liang Ma <liang.j...@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

Notes:
     v6:
     - Added wakeup mechanism for UMWAIT
     - Removed memory allocation (everything is now allocated statically)
     - Fixed various typos and comments
     - Check for invalid queue ID
     - Moved release notes to this patch

     v5:
     - Make error checking more robust
       - Prevent initializing scaling if ACPI or PSTATE env wasn't set
       - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
     - Add some debug logging
     - Replace x86-specific code path to generic path using the intrinsic check



<snip>



I think you need to check state here, and _disable() have to set state with 
lock grabbed.
Otherwise this lock wouldn't protect you from race conditions.
As an example:

CP@T0:
rte_spinlock_lock(&queue_cfg->umwait_lock);
if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough
rte_spinlock_unlock(&queue_cfg->umwait_lock);

DP@T1:
rte_spinlock_lock(&queue_cfg->umwait_lock);
queue_cfg->wait_addr = target_addr;
monitor_sync(...);  // DP was put to sleep

CP@T2:
queue_cfg->cur_cb = NULL;
queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
ret = 0;

rte_power_pmd_mgmt_queue_disable() finished with success,
but DP core wasn't wokenup.

To be more specific:
clb_umwait(...) {
...
lock(&qcfg->lck);
if (qcfg->state == ENABLED)  {
qcfg->wake_addr = addr;
monitor_sync(addr, ...,&qcfg->lck);
}
unlock(&qcfg->lck);
...
}

_disable(...) {
...
lock(&qcfg->lck);
qcfg->state = DISABLED;
if (qcfg->wake_addr != NULL)
monitor_wakeup(qcfg->wake_addr);
unlock(&qcfg->lock);
...
}

True, didn't think of that. Will fix.

+
+if (!i.power_monitor) {
+RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+ret = -ENOTSUP;
+goto end;
+}
+
+/* check if the device supports the necessary PMD API */
+if (rte_eth_get_wake_addr(port_id, queue_id,
+&dummy_addr, &dummy_expected,
+&dummy_mask, &dummy_sz) == -ENOTSUP) {
+RTE_LOG(DEBUG, POWER, "The device does not support 
rte_eth_rxq_ring_addr_get\n");
+ret = -ENOTSUP;
+goto end;
+}
+/* initialize UMWAIT spinlock */
+rte_spinlock_init(&queue_cfg->umwait_lock);

I think don't need to do that.
It supposed to be in valid state (otherwise you are probably in trouble anyway).

This is mostly for initialization, for when we first run the callback. I suppose we could do it in an RTE_INIT() function or just leave it be since the spinlocks are part of a statically allocated structure and will default to 0 anyway (although it wouldn't be proper usage of the API as that would be relying on implementation detail).


+
+/* initialize data before enabling the callback */
+queue_cfg->empty_poll_stats = 0;
+queue_cfg->cb_mode = mode;
+queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+
+queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+clb_umwait, NULL);

Would be a bit cleaner/nicer to move add_rx_callback out of switch() {}
As you have to do it always anyway.
Same thought for disable() and remove_rx_callback().

The functions are different for each, so we can't move them out of switch (unless you're suggesting to unify the callback to handle all three modes?).

--
Thanks,
Anatoly

Reply via email to