On Aug 29, 2016 8:07 AM, "J Freyensee" <james_p_freyen...@linux.intel.com> wrote: > > On Mon, 2016-08-29 at 02:25 -0700, Andy Lutomirski wrote: > > NVME devices can advertise multiple power states. These states can > > be either "operational" (the device is fully functional but possibly > > slow) or "non-operational" (the device is asleep until woken up). > > Some devices can automatically enter a non-operational state when > > idle for a specified amount of time and then automatically wake back > > up when needed. > > > > The hardware configuration is a table. For each state, an entry in > > the table indicates the next deeper non-operational state, if any, > > to autonomously transition to and the idle time required before > > transitioning. > > > > This patch teaches the driver to program APST so that each > > successive non-operational state will be entered after an idle time > > equal to 100% of the total latency (entry plus exit) associated with > > that state. A sysfs attribute 'apst_max_latency_ns' gives the > > maximum acceptable latency in ns; non-operational states with total > > latency greater than this value will not be used. As a special > > case, apst_max_latency_ns=0 will disable APST entirely. > > > > On hardware without APST support, apst_max_latency_ns will not be > > exposed in sysfs. > > > > In theory, the device can expose "default" APST table, but this > > doesn't seem to function correctly on my device (Samsung 950), nor > > does it seem particularly useful. There is also an optional > > mechanism by which a configuration can be "saved" so it will be > > automatically loaded on reset. This can be configured from > > userspace, but it doesn't seem useful to support in the driver. > > > > On my laptop, enabling APST seems to save nearly 1W. > > > > The hardware tables can be decoded in userspace with nvme-cli. > > 'nvme id-ctrl /dev/nvmeN' will show the power state table and > > 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST > > configuration. > > > > Signed-off-by: Andy Lutomirski <l...@kernel.org> > > --- > > drivers/nvme/host/core.c | 167 > > +++++++++++++++++++++++++++++++++++++++++++++++ > > drivers/nvme/host/nvme.h | 6 ++ > > include/linux/nvme.h | 6 ++ > > 3 files changed, 179 insertions(+) > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 3f7561ab54dc..042137ad2437 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -1223,6 +1223,98 @@ static void nvme_set_queue_limits(struct > > nvme_ctrl *ctrl, > > blk_queue_write_cache(q, vwc, vwc); > > } > > > > +static void nvme_configure_apst(struct nvme_ctrl *ctrl) > > +{ > > + /* > > + * APST (Autonomous Power State Transition) lets us program > > a > > + * table of power state transitions that the controller will > > + * perform automatically. We configure it with a simple > > + * heuristic: we are willing to spend at most 2% of the time > > + * transitioning between power states. Therefore, when > > running > > + * in any given state, we will enter the next lower-power > > + * non-operational state after waiting 100 * (enlat + exlat) > > + * microseconds, as long as that state's total latency is > > under > > + * the requested maximum latency. > > + * > > + * We will not autonomously enter any non-operational state > > for > > + * which the total latency exceeds > > apst_max_latency_ns. Users > > + * can set apst_max_latency_ns to zero to turn off APST. > > + */ > > + > > + unsigned apste; > > + struct nvme_feat_auto_pst *table; > > + int ret; > > + > > + if (!ctrl->apsta) > > + return; /* APST isn't supported. */ > > + > > + if (ctrl->npss > 31) { > > + dev_warn(ctrl->device, "NPSS is invalid; disabling > > APST\n"); > > Quick question. A little bit below in a later if() block, apste is set > to 0 to turn off APST, which is to be used later in a > nvme_set_features() call to actually turn it off. You wouldn't want to > also set apste to zero too and call a nvme_set_features() to "disable > APST"? > > I guess I'm a little confused on the error statement, "disabling APST", > when it doesn't seem like anything is being done to actually disable > APST, it's just more of an invalid state retrieved from the HW.
I guess that should be "not using APST" instead. > > > > + return; > > + } > > + > > + table = kzalloc(sizeof(*table), GFP_KERNEL); > > + if (!table) > > + return; > > + > > + if (ctrl->apst_max_latency_ns == 0) { > > + /* Turn off APST. */ > > + apste = 0; > > + } else { > > + __le64 target = cpu_to_le64(0); > > + int state; > > + > > + /* > > + * Walk through all states from lowest- to highest- > > power. > > + * According to the spec, lower-numbered states use > > more > > + * power. NPSS, despite the name, is the index of > > the > > + * lowest-power state, not the number of states. > > + */ > > + for (state = (int)ctrl->npss; state >= 0; state--) { > > + u64 total_latency_us, transition_ms; > > + > > + if (target) > > + table->entries[state] = target; > > + > > + /* > > + * Is this state a useful non-operational > > state for > > + * higher-power states to autonomously > > transition to? > > + */ > > + if (!(ctrl->psd[state].flags & 2)) > > + continue; /* It's an operational > > state. */ > > + > > + total_latency_us = > > + (u64)cpu_to_le32(ctrl- > > >psd[state].entry_lat) + > > + + cpu_to_le32(ctrl- > > >psd[state].exit_lat); > > + if (total_latency_us * 1000 > ctrl- > > >apst_max_latency_ns) > > + continue; > > + > > + /* > > + * This state is good. Use it as the APST > > idle > > + * target for higher power states. > > + */ > > + transition_ms = total_latency_us + 19; > > + do_div(transition_ms, 20); > > + if (transition_ms >= (1 << 24)) > > + transition_ms = (1 << 24); > > Is it possible to use a macro for this bit shift as its used more than > once? Sure, will do. > > > + > > + target = cpu_to_le64((state << 3) | > > + (transition_ms << 8)); > > + } > > > > snip... > . > . > . > > > + /* > > + * By default, allow up to 25ms of APST-induced > > latency. This will > > + * have no effect on non-APST supporting controllers (i.e. > > any > > + * controller with APSTA == 0). > > + */ > > + ctrl->apst_max_latency_ns = 25000000; > > Is it possible to make that a #define please? I'll make it a module parameter as Keith suggested. > > Nice stuff! > > > >