> On 14-Oct-20 7:41 PM, Ananyev, Konstantin wrote: > >> From: Liang Ma <liang.j...@intel.com> > >> > >> Add a simple on/off switch that will enable saving power when no > >> packets are arriving. It is based on counting the number of empty > >> polls and, when the number reaches a certain threshold, entering an > >> architecture-defined optimized power state that will either wait > >> until a TSC timestamp expires, or when packets arrive. > >> > >> This API mandates a core-to-single-queue mapping (that is, multiple > >> queued per device are supported, but they have to be polled on different > >> cores). > >> > >> This design is using PMD RX callbacks. > >> > >> 1. UMWAIT/UMONITOR: > >> > >> When a certain threshold of empty polls is reached, the core will go > >> into a power optimized sleep while waiting on an address of next RX > >> descriptor to be written to. > >> > >> 2. Pause instruction > >> > >> Instead of move the core into deeper C state, this method uses the > >> pause instruction to avoid busy polling. > >> > >> 3. Frequency scaling > >> Reuse existing DPDK power library to scale up/down core frequency > >> depending on traffic volume. > >> > >> Signed-off-by: Liang Ma <liang.j...@intel.com> > >> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com> > >> --- > >> > >> Notes: > >> v6: > >> - Added wakeup mechanism for UMWAIT > >> - Removed memory allocation (everything is now allocated statically) > >> - Fixed various typos and comments > >> - Check for invalid queue ID > >> - Moved release notes to this patch > >> > >> v5: > >> - Make error checking more robust > >> - Prevent initializing scaling if ACPI or PSTATE env wasn't set > >> - Prevent initializing UMWAIT path if PMD doesn't support > >> get_wake_addr > >> - Add some debug logging > >> - Replace x86-specific code path to generic path using the intrinsic > >> check > >> > > > <snip> > > > > > > > I think you need to check state here, and _disable() have to set state with > > lock grabbed. > > Otherwise this lock wouldn't protect you from race conditions. > > As an example: > > > > CP@T0: > > rte_spinlock_lock(&queue_cfg->umwait_lock); > > if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough > > rte_spinlock_unlock(&queue_cfg->umwait_lock); > > > > DP@T1: > > rte_spinlock_lock(&queue_cfg->umwait_lock); > > queue_cfg->wait_addr = target_addr; > > monitor_sync(...); // DP was put to sleep > > > > CP@T2: > > queue_cfg->cur_cb = NULL; > > queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; > > ret = 0; > > > > rte_power_pmd_mgmt_queue_disable() finished with success, > > but DP core wasn't wokenup. > > > > To be more specific: > > clb_umwait(...) { > > ... > > lock(&qcfg->lck); > > if (qcfg->state == ENABLED) { > > qcfg->wake_addr = addr; > > monitor_sync(addr, ...,&qcfg->lck); > > } > > unlock(&qcfg->lck); > > ... > > } > > > > _disable(...) { > > ... > > lock(&qcfg->lck); > > qcfg->state = DISABLED; > > if (qcfg->wake_addr != NULL) > > monitor_wakeup(qcfg->wake_addr); > > unlock(&qcfg->lock); > > ... > > } > > True, didn't think of that. Will fix. > > >> + > >> +if (!i.power_monitor) { > >> +RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); > >> +ret = -ENOTSUP; > >> +goto end; > >> +} > >> + > >> +/* check if the device supports the necessary PMD API */ > >> +if (rte_eth_get_wake_addr(port_id, queue_id, > >> +&dummy_addr, &dummy_expected, > >> +&dummy_mask, &dummy_sz) == -ENOTSUP) { > >> +RTE_LOG(DEBUG, POWER, "The device does not support > >> rte_eth_rxq_ring_addr_get\n"); > >> +ret = -ENOTSUP; > >> +goto end; > >> +} > >> +/* initialize UMWAIT spinlock */ > >> +rte_spinlock_init(&queue_cfg->umwait_lock); > > > > I think don't need to do that. > > It supposed to be in valid state (otherwise you are probably in trouble > > anyway). > > This is mostly for initialization, for when we first run the callback. I > suppose we could do it in an RTE_INIT() function or just leave it be > since the spinlocks are part of a statically allocated structure and > will default to 0 anyway
Yes static initialization should be sufficient here. (although it wouldn't be proper usage of the > API as that would be relying on implementation detail). What I am trying to say - at that moment spinlock value should be zero. If it is not, then your DP callback is still running and holding the lock. I understand that it should be *very* rare situation, but it seems strange to me to silently over-write the locked value from other thread. It shouldn't cause any major trouble, but still... > > > > >> + > >> +/* initialize data before enabling the callback */ > >> +queue_cfg->empty_poll_stats = 0; > >> +queue_cfg->cb_mode = mode; > >> +queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; > >> + > >> +queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, > >> +clb_umwait, NULL); > > > > Would be a bit cleaner/nicer to move add_rx_callback out of switch() {} > > As you have to do it always anyway. > > Same thought for disable() and remove_rx_callback(). > > The functions are different for each, so we can't move them out of > switch (unless you're suggesting to unify the callback to handle all > three modes?). > > -- > Thanks, > Anatoly