Re: [PATCH V1 3/3] x86, bm: Add documentation on Intel Branch Monitoring

2017-11-11 Thread Randy Dunlap
On 11/11/17 13:20, Megha Dey wrote:
> This patch adds the Documentation/x86/intel_bm.txt file with some
> information about Intel Branch monitoring.

> +4. Window count select: /sys/devices/intel-bm/window_cnt_sel
> +   Possible values are:
> +   ‘00 = instructions retired
> +   ‘01 = branches retired
> +   ‘10 = returned instructions retired
> +   ‘11 = indirect branch instructions retired
> +   By default, it has a value of 0.

Hi,

Is the 'xx binary notation?  If so, it would be nice to say so..
or whatever it is.

thanks,
-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Add helper routines to find and return a suitable subsystem callback
during the "noirq" phases of system suspend/resume (or analogous)
transitions as well as during the "late" phase of system suspend and
the "early" phase of system resume (or analogous) transitions.

The helpers will be called from additional sites going forward.
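
For readers following along, the selection order used by these helpers
(PM domain first, then device type, class and bus) can be sketched in
plain userspace C.  This is only an illustration: every model_* name
below is a hypothetical stand-in, not a kernel definition, and a single
"noirq" suspend callback stands in for the pm_noirq_op() lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct model_device;
typedef int (*model_cb_t)(struct model_device *dev);

struct model_pm_ops {
	model_cb_t suspend_noirq;
};

struct model_device {
	const struct model_pm_ops *pm_domain;	/* checked first */
	const struct model_pm_ops *type;
	const struct model_pm_ops *class_pm;
	const struct model_pm_ops *bus;		/* checked last */
};

/* A do-nothing callback, present only so the sketch can be exercised. */
static int model_noop_cb(struct model_device *dev)
{
	(void)dev;
	return 0;
}

/*
 * Return the callback of the first subsystem level that provides PM
 * operations, or NULL if no level does.  *info_p is written only when
 * a level was found, as in the helpers added by the patch.
 */
static model_cb_t model_subsys_suspend_noirq_cb(const struct model_device *dev,
						 const char **info_p)
{
	const struct model_pm_ops *ops;
	const char *info;

	assert(dev != NULL);

	if (dev->pm_domain) {
		info = "noirq power domain ";
		ops = dev->pm_domain;
	} else if (dev->type) {
		info = "noirq type ";
		ops = dev->type;
	} else if (dev->class_pm) {
		info = "noirq class ";
		ops = dev->class_pm;
	} else if (dev->bus) {
		info = "noirq bus ";
		ops = dev->bus;
	} else {
		return NULL;
	}

	if (info_p)
		*info_p = info;

	/* May still be NULL if the chosen level has no such callback. */
	return ops->suspend_noirq;
}
```

Note that a level which provides PM operations but no matching callback
still "wins" the selection, so the result can be NULL even though a
lower-priority level has a callback.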

Signed-off-by: Rafael J. Wysocki 
---

v2 -> v3: No changes.

---
 drivers/base/power/main.c |  196 +++---
 1 file changed, 136 insertions(+), 60 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
 #define dpm_watchdog_clear(x)
 #endif
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+pm_message_t state,
+const char **info_p);
+
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+   pm_message_t state,
+   const char **info_p);
+
 /*- Resume routines -*/
 
 /**
@@ -539,6 +547,35 @@ bool dev_pm_may_skip_resume(struct devic
return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
 }
 
+static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
+   pm_message_t state,
+   const char **info_p)
+{
+   pm_callback_t callback;
+   const char *info;
+
+   if (dev->pm_domain) {
+   info = "noirq power domain ";
+   callback = pm_noirq_op(&dev->pm_domain->ops, state);
+   } else if (dev->type && dev->type->pm) {
+   info = "noirq type ";
+   callback = pm_noirq_op(dev->type->pm, state);
+   } else if (dev->class && dev->class->pm) {
+   info = "noirq class ";
+   callback = pm_noirq_op(dev->class->pm, state);
+   } else if (dev->bus && dev->bus->pm) {
+   info = "noirq bus ";
+   callback = pm_noirq_op(dev->bus->pm, state);
+   } else {
+   return NULL;
+   }
+
+   if (info_p)
+   *info_p = info;
+
+   return callback;
+}
+
 /**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
@@ -550,8 +587,8 @@ bool dev_pm_may_skip_resume(struct devic
  */
 static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
 {
-   pm_callback_t callback = NULL;
-   const char *info = NULL;
+   pm_callback_t callback;
+   const char *info;
int error = 0;
 
TRACE_DEVICE(dev);
@@ -565,19 +602,7 @@ static int device_resume_noirq(struct de
 
dpm_wait_for_superior(dev, async);
 
-   if (dev->pm_domain) {
-   info = "noirq power domain ";
-   callback = pm_noirq_op(&dev->pm_domain->ops, state);
-   } else if (dev->type && dev->type->pm) {
-   info = "noirq type ";
-   callback = pm_noirq_op(dev->type->pm, state);
-   } else if (dev->class && dev->class->pm) {
-   info = "noirq class ";
-   callback = pm_noirq_op(dev->class->pm, state);
-   } else if (dev->bus && dev->bus->pm) {
-   info = "noirq bus ";
-   callback = pm_noirq_op(dev->bus->pm, state);
-   }
+   callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
 
if (!callback && dev->driver && dev->driver->pm) {
info = "noirq driver ";
@@ -686,6 +711,35 @@ void dpm_resume_noirq(pm_message_t state
dpm_noirq_end();
 }
 
+static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
+   pm_message_t state,
+   const char **info_p)
+{
+   pm_callback_t callback;
+   const char *info;
+
+   if (dev->pm_domain) {
+   info = "early power domain ";
+   callback = pm_late_early_op(&dev->pm_domain->ops, state);
+   } else if (dev->type && dev->type->pm) {
+   info = "early type ";
+   callback = pm_late_early_op(dev->type->pm, state);
+   } else if (dev->class && dev->class->pm) {
+   info = "early class ";
+   callback = pm_late_early_op(dev->class->pm, state);
+   } else if (dev->bus && dev->bus->pm) {
+   info = "early bus ";
+   callback = pm_late_early_op(dev->bus->pm, state);
+   } else {
+   return NULL;
+   }
+
+   if (info_p)
+   *info_p = info;
+
+   return callback;
+}
+
 /**
  * 

[PATCH v3 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Add support for DPM_FLAG_LEAVE_SUSPENDED to the ACPI PM domain by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from acpi_subsys_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki 
Acked-by: Greg Kroah-Hartman 
---

v2 -> v3: No changes.

---
 drivers/acpi/device_pm.c |   27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -987,7 +987,7 @@ void acpi_subsys_complete(struct device
 * the sleep state it is going out of and it has never been resumed till
 * now, resume it in case the firmware powered it up.
 */
-   if (dev->power.direct_complete && pm_resume_via_firmware())
+   if (pm_runtime_suspended(dev) && pm_resume_via_firmware())
pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_complete);
@@ -1036,10 +1036,28 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_la
  */
 int acpi_subsys_suspend_noirq(struct device *dev)
 {
-   if (dev_pm_smart_suspend_and_suspended(dev))
+   int ret;
+
+   if (dev_pm_smart_suspend_and_suspended(dev)) {
+   dev->power.may_skip_resume = true;
return 0;
+   }
 
-   return pm_generic_suspend_noirq(dev);
+   ret = pm_generic_suspend_noirq(dev);
+   if (ret)
+   return ret;
+
+   /*
+* If the target system sleep state is suspend-to-idle, it is sufficient
+* to check whether or not the device's wakeup settings are good for
+* runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+* acpi_subsys_complete() to take care of fixing up the device's state
+* anyway, if need be.
+*/
+   dev->power.may_skip_resume = device_may_wakeup(dev) ||
+   !device_can_wakeup(dev);
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
 
@@ -1049,6 +1067,9 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_no
  */
 int acpi_subsys_resume_noirq(struct device *dev)
 {
+   if (dev_pm_may_skip_resume(dev))
+   return 0;
+
/*
 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 * during system suspend, so update their runtime PM status to "active"




[PATCH v3 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Make the PM core avoid invoking the "late" and "noirq" system-wide
suspend (or analogous) callbacks for devices that are in runtime
suspend during the corresponding phases of system-wide suspend
(or analogous) transitions.

The underlying observation is that runtime PM is disabled for
devices during those system-wide suspend phases, so their runtime
PM status should not change going forward and if it has not changed
so far, their state should be compatible with the target system
sleep state.

This change really makes it possible for, say, platform device
drivers to re-use runtime PM suspend and resume callbacks by
pointing ->suspend_late and ->resume_early, respectively (and
possibly the analogous hibernation-related callback pointers too),
to them without adding any extra "is the device already suspended?"
type of checks to the callback routines, as long as they will be
invoked directly by the core.
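
As a rough illustration of the callback reuse described above, here is
a self-contained userspace model.  All model_* names are hypothetical,
and the model assumes that no subsystem-level callbacks are present, so
the "core" calls the driver's callback directly; the real PM core
checks considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dev;
typedef int (*model_cb_t)(struct model_dev *dev);

struct model_pm_ops {
	model_cb_t runtime_suspend;
	model_cb_t suspend_late;
};

struct model_dev {
	bool smart_suspend;		/* models DPM_FLAG_SMART_SUSPEND */
	bool runtime_suspended;		/* models the runtime PM status */
	int power_down_calls;		/* how often the shared routine ran */
	const struct model_pm_ops *ops;
};

/* One routine shared by both callback pointers, with no status checks. */
static int model_power_down(struct model_dev *dev)
{
	dev->power_down_calls++;
	return 0;
}

static const struct model_pm_ops model_shared_ops = {
	.runtime_suspend = model_power_down,
	.suspend_late = model_power_down,
};

/* Runtime PM path: run the callback and record the new status. */
static int model_runtime_suspend(struct model_dev *dev)
{
	int ret;

	assert(dev != NULL && dev->ops != NULL);
	ret = dev->ops->runtime_suspend(dev);
	if (!ret)
		dev->runtime_suspended = true;
	return ret;
}

/* System-wide "late" suspend: skipped for runtime-suspended devices. */
static int model_core_suspend_late(struct model_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);
	if (dev->smart_suspend && dev->runtime_suspended)
		return 0;
	return dev->ops->suspend_late ? dev->ops->suspend_late(dev) : 0;
}
```

Because the model core skips the "late" callback for a device that is
already runtime-suspended, model_power_down() never runs twice for the
same device and does not need its own "already suspended?" check.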

Signed-off-by: Rafael J. Wysocki 
---

v2 -> v3: No changes.

---
 Documentation/driver-api/pm/devices.rst |   18 +
 drivers/base/power/main.c   |   62 
 2 files changed, 66 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -536,6 +536,24 @@ static pm_callback_t dpm_subsys_suspend_
 /*- Resume routines -*/
 
 /**
+ * suspend_event - Return a "suspend" message for given "resume" one.
+ * @resume_msg: PM message representing a system-wide resume transition.
+ */
+static pm_message_t suspend_event(pm_message_t resume_msg)
+{
+   switch (resume_msg.event) {
+   case PM_EVENT_RESUME:
+   return PMSG_SUSPEND;
+   case PM_EVENT_THAW:
+   case PM_EVENT_RESTORE:
+   return PMSG_FREEZE;
+   case PM_EVENT_RECOVER:
+   return PMSG_HIBERNATE;
+   }
+   return PMSG_ON;
+}
+
+/**
  * dev_pm_may_skip_resume - System-wide device resume optimization check.
  * @dev: Target device.
  *
@@ -609,6 +627,25 @@ static int device_resume_noirq(struct de
if (callback)
goto Run;
 
+   if (dev_pm_smart_suspend_and_suspended(dev)) {
+   pm_message_t suspend_msg = suspend_event(state);
+
+   /*
+* If "freeze" callbacks have been skipped during a transition
+* related to hibernation, the subsequent "thaw" callbacks must
+* be skipped too or bad things may happen.  Otherwise, if the
+* device is to be resumed, its runtime PM status must be
+* changed to reflect the new configuration.
+*/
+   if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) &&
+   !dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) {
+   if (state.event == PM_EVENT_THAW)
+   skip_resume = true;
+   else if (!skip_resume)
+   pm_runtime_set_active(dev);
+   }
+   }
+
if (skip_resume)
goto Skip;
 
@@ -1228,7 +1265,10 @@ static int __device_suspend_noirq(struct
if (callback)
goto Run;
 
-   direct_cb = true;
+   direct_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
+
+   if (dev_pm_smart_suspend_and_suspended(dev) && direct_cb)
+   goto Skip;
 
if (dev->driver && dev->driver->pm) {
info = "noirq driver ";
@@ -1242,6 +1282,7 @@ Run:
goto Complete;
}
 
+Skip:
dev->power.is_noirq_suspended = true;
 
if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
@@ -1249,7 +1290,6 @@ Run:
bool skip_resume;
 
if (direct_cb &&
-   !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
!dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
!dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
/*
@@ -1446,17 +1486,27 @@ static int __device_suspend_late(struct
goto Complete;
 
callback = dpm_subsys_suspend_late_cb(dev, state, &info);
+   if (callback)
+   goto Run;
 
-   if (!callback && dev->driver && dev->driver->pm) {
+   if (dev_pm_smart_suspend_and_suspended(dev) &&
+   !dpm_subsys_suspend_noirq_cb(dev, state, NULL))
+   goto Skip;
+
+   if (dev->driver && dev->driver->pm) {
info = "late driver ";
callback = pm_late_early_op(dev->driver->pm, state);
}
 
+Run:
error = dpm_run_callback(callback, dev, state, info);
-   if (!error)
-   dev->power.is_late_suspended = true;
-   else
+   if (error) {

[PATCH v3 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
devices whose "noirq", "late" and "early" driver callbacks are
invoked directly by it.

Namely, make it skip all of the system-wide resume callbacks for
such devices with DPM_FLAG_LEAVE_SUSPENDED set if they are in
runtime suspend during the "noirq" phase of system-wide suspend
(or analogous) transitions or the system transition under way is
a proper suspend (rather than anything related to hibernation) and
the device's wakeup settings are compatible with runtime PM (that
is, the device cannot generate wakeup signals at all or it is
allowed to wake up the system from sleep).
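
The condition in the paragraph above can be written out as a small
predicate.  The sketch below is a userspace model under stated
assumptions: the model_* names are hypothetical, the three booleans
stand in for pm_runtime_status_suspended(), device_can_wakeup() and
device_may_wakeup(), and it only covers the case in which all of the
relevant driver callbacks are invoked directly by the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the resume-side PM events. */
enum model_event {
	MODEL_EVENT_RESUME,	/* proper system suspend/resume */
	MODEL_EVENT_THAW,	/* hibernation-related */
	MODEL_EVENT_RESTORE,	/* hibernation-related */
};

struct model_dev_state {
	bool runtime_suspended;	/* status during the "noirq" suspend phase */
	bool can_wakeup;	/* device is able to generate wakeup signals */
	bool may_wakeup;	/* device is allowed to wake up the system */
};

/* Wakeup settings are compatible with runtime PM. */
static bool model_wakeup_compatible(const struct model_dev_state *s)
{
	return !s->can_wakeup || s->may_wakeup;
}

/* True if the system-wide resume callbacks may be skipped. */
static bool model_skip_resume(const struct model_dev_state *s,
			      enum model_event resume_event)
{
	assert(s != NULL);
	return s->runtime_suspended ||
	       (resume_event == MODEL_EVENT_RESUME &&
		model_wakeup_compatible(s));
}
```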

Signed-off-by: Rafael J. Wysocki 
---

v2 -> v3: Rebase on the v3 of patch [1/6].

---
 Documentation/driver-api/pm/devices.rst |9 ++
 drivers/base/power/main.c   |   47 
 2 files changed, 51 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -589,6 +589,7 @@ static int device_resume_noirq(struct de
 {
pm_callback_t callback;
const char *info;
+   bool skip_resume;
int error = 0;
 
TRACE_DEVICE(dev);
@@ -602,23 +603,33 @@ static int device_resume_noirq(struct de
 
dpm_wait_for_superior(dev, async);
 
+   skip_resume = dev_pm_may_skip_resume(dev);
+
callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
+   if (callback)
+   goto Run;
+
+   if (skip_resume)
+   goto Skip;
 
if (!callback && dev->driver && dev->driver->pm) {
info = "noirq driver ";
callback = pm_noirq_op(dev->driver->pm, state);
}
 
+Run:
error = dpm_run_callback(callback, dev, state, info);
+
+Skip:
dev->power.is_noirq_suspended = false;
 
-   if (dev_pm_may_skip_resume(dev)) {
+   if (skip_resume) {
pm_runtime_set_suspended(dev);
dev->power.is_late_suspended = false;
dev->power.is_suspended = false;
}
 
- Out:
+Out:
complete_all(&dev->power.completion);
TRACE_RESUME(error);
return error;
@@ -1194,6 +1205,7 @@ static int __device_suspend_noirq(struct
 {
pm_callback_t callback;
const char *info;
+   bool direct_cb = false;
int error = 0;
 
TRACE_DEVICE(dev);
@@ -1213,12 +1225,17 @@ static int __device_suspend_noirq(struct
goto Complete;
 
callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
+   if (callback)
+   goto Run;
 
-   if (!callback && dev->driver && dev->driver->pm) {
+   direct_cb = true;
+
+   if (dev->driver && dev->driver->pm) {
info = "noirq driver ";
callback = pm_noirq_op(dev->driver->pm, state);
}
 
+Run:
error = dpm_run_callback(callback, dev, state, info);
if (error) {
async_error = error;
@@ -1228,13 +1245,33 @@ static int __device_suspend_noirq(struct
dev->power.is_noirq_suspended = true;
 
if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+   pm_message_t resume_msg = resume_event(state);
+   bool skip_resume;
+
+   if (direct_cb &&
+   !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
+   !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
+   !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
+   /*
+* If all of the device driver's "noirq", "late" and
+* "early" callbacks are invoked directly by the core,
+* the decision to allow the device to stay in suspend
+* can be based on its current runtime PM status and its
+* wakeup settings.
+*/
+   skip_resume = pm_runtime_status_suspended(dev) ||
+   (resume_msg.event == PM_EVENT_RESUME &&
+(!device_can_wakeup(dev) ||
+ device_may_wakeup(dev)));
+   } else {
+   skip_resume = dev->power.may_skip_resume;
+   }
/*
 * The only safe strategy here is to require that if the device
 * may not be left in suspend, resume callbacks must be invoked
 * for it.
 */
-   dev->power.must_resume = dev->power.must_resume ||
-   !dev->power.may_skip_resume ||
+   dev->power.must_resume = dev->power.must_resume || !skip_resume ||
atomic_read(&dev->power.usage_count);

[PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanism is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).
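
To make the interplay of the two status bits easier to follow, here is
a minimal userspace sketch.  It assumes a device whose driver has set
DPM_FLAG_LEAVE_SUSPENDED, leaves out the dependency on children, and
uses hypothetical model_* names throughout; it is not the code from
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct dev_pm_info. */
struct model_power {
	bool must_resume;	/* owned by the PM core */
	bool may_skip_resume;	/* set by the middle layer in "noirq" suspend */
	int usage_count;	/* models the runtime PM usage counter */
};

/*
 * Models the update done at the end of "noirq" suspend: the device must
 * be resumed if that was already required, if the middle layer did not
 * allow skipping, or if the device is in use.
 */
static void model_update_must_resume(struct model_power *p)
{
	assert(p != NULL);
	p->must_resume = p->must_resume || !p->may_skip_resume ||
			 p->usage_count > 0;
}

/* Models dev_pm_may_skip_resume(): never skip while restoring an image. */
static bool model_may_skip_resume(const struct model_power *p, bool restoring)
{
	assert(p != NULL);
	return !p->must_resume && !restoring;
}
```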

Signed-off-by: Rafael J. Wysocki 
Acked-by: Greg Kroah-Hartman 
---

v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
  __device_suspend_noirq().

---
 Documentation/driver-api/pm/devices.rst |   24 ++-
 drivers/base/power/main.c   |   66 +---
 drivers/base/power/runtime.c|9 ++--
 include/linux/pm.h  |   14 +-
 include/linux/pm_runtime.h  |9 ++--
 5 files changed, 104 insertions(+), 18 deletions(-)

Index: linux-pm/include/linux/pm.h
===
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@ struct pm_subsys_data {
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -572,10 +573,14 @@ struct pm_subsys_data {
  * necessary from the driver's perspective.  It also may cause them to skip
  * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
  * the driver if they decide to leave the device in runtime suspend.
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
+ * driver prefers the device to be left in runtime suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP    BIT(0)
-#define DPM_FLAG_SMART_PREPARE BIT(1)
-#define DPM_FLAG_SMART_SUSPEND BIT(2)
+#define DPM_FLAG_NEVER_SKIP        BIT(0)
+#define DPM_FLAG_SMART_PREPARE     BIT(1)
+#define DPM_FLAG_SMART_SUSPEND     BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED   BIT(3)
 
 struct dev_pm_info {
pm_message_t power_state;
@@ -597,6 +602,8 @@ struct dev_pm_info {
bool wakeup_path:1;
bool syscore:1;
bool no_pm_callbacks:1;  /* Owned by the PM core */
+   unsigned int must_resume:1;  /* Owned by the PM core */
+   unsigned int may_skip_resume:1;  /* Set by subsystems */
 #else
unsigned int should_wakeup:1;
 #endif
@@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
 
 #else /* !CONFIG_PM_SLEEP */
Index: linux-pm/drivers/base/power/main.c
===
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
 /*- Resume routines -*/
 
 /**
+ * dev_pm_may_skip_resume - System-wide device resume optimization check.
+ * @dev: Target device.
+ *
+ * Checks whether or not the device may be left in suspend after a system-wide
+ * transition to the working state.
+ */
+bool dev_pm_may_skip_resume(struct device *dev)
+{
+   return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
+}
+
+/**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being 

[PATCH v3 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag

2017-11-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.
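
Steps (a) and (b) can be pictured with the following userspace model of
a bus type.  It is a sketch only: the model_* names are hypothetical,
the booleans stand in for dev_pm_smart_suspend_and_suspended(),
device_can_wakeup() and device_may_wakeup(), and the result of
dev_pm_may_skip_resume() is passed in as a plain argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state as seen by a model bus type. */
struct model_bus_dev {
	bool smart_suspended;	/* SMART_SUSPEND set and runtime-suspended */
	bool can_wakeup;
	bool may_wakeup;
	bool may_skip_resume;	/* written by the bus type, read by the core */
	int resume_work_done;	/* counts full "noirq" resume handling */
};

/* (a) Tell the core whether the device may stay in suspend. */
static int model_bus_suspend_noirq(struct model_bus_dev *d)
{
	assert(d != NULL);
	if (d->smart_suspended) {
		d->may_skip_resume = true;
		return 0;
	}
	/* ... the regular "noirq" suspend handling would go here ... */
	d->may_skip_resume = d->may_wakeup || !d->can_wakeup;
	return 0;
}

/* (b) Return early when the core is going to skip the remaining callbacks. */
static int model_bus_resume_noirq(struct model_bus_dev *d, bool core_may_skip)
{
	assert(d != NULL);
	if (core_may_skip)
		return 0;
	d->resume_work_done++;
	return 0;
}
```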

Signed-off-by: Rafael J. Wysocki 
Acked-by: Greg Kroah-Hartman 
Acked-by: Bjorn Helgaas 
---

v2 -> v3: Add the Acked-by from Bjorn, no changes in the patch.

---
 Documentation/power/pci.txt |   11 +++
 drivers/pci/pci-driver.c|   19 +--
 2 files changed, 28 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -699,7 +699,7 @@ static void pci_pm_complete(struct devic
pm_generic_complete(dev);
 
/* Resume device if platform firmware has put it in reset-power-on */
-   if (dev->power.direct_complete && pm_resume_via_firmware()) {
+   if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) {
pci_power_t pre_sleep_state = pci_dev->current_state;
 
pci_update_current_state(pci_dev, pci_dev->current_state);
@@ -783,8 +783,10 @@ static int pci_pm_suspend_noirq(struct d
struct pci_dev *pci_dev = to_pci_dev(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
-   if (dev_pm_smart_suspend_and_suspended(dev))
+   if (dev_pm_smart_suspend_and_suspended(dev)) {
+   dev->power.may_skip_resume = true;
return 0;
+   }
 
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
@@ -838,6 +840,16 @@ static int pci_pm_suspend_noirq(struct d
 Fixup:
pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
+   /*
+* If the target system sleep state is suspend-to-idle, it is sufficient
+* to check whether or not the device's wakeup settings are good for
+* runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+* pci_pm_complete() to take care of fixing up the device's state
+* anyway, if need be.
+*/
+   dev->power.may_skip_resume = device_may_wakeup(dev) ||
+   !device_can_wakeup(dev);
+
return 0;
 }
 
@@ -847,6 +859,9 @@ static int pci_pm_resume_noirq(struct de
struct device_driver *drv = dev->driver;
int error = 0;
 
+   if (dev_pm_may_skip_resume(dev))
+   return 0;
+
/*
 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 * during system suspend, so update their runtime PM status to "active"
Index: linux-pm/Documentation/power/pci.txt
===
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -994,6 +994,17 @@ into D0 going forward), but if it is in
 the function will set the power.direct_complete flag for it (to make the PM core
 skip the subsequent "thaw" callbacks for it) and return.
 
+Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
+device to be left in suspend after system-wide transitions to the working state.
+This flag is checked by the PM core, but the PCI bus type informs the PM core
+which devices may be left in suspend from its perspective (that happens during
+the "noirq" phase of system-wide suspend and analogous transitions) and next it
+uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
+pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
+callbacks for the device during the transition under way and will set its
+runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
+it.
+
 3.2. Device Runtime Power Management
 
 In addition to providing device power management callbacks PCI device drivers



[PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2)

2017-11-11 Thread Rafael J. Wysocki
Hi All,

The following still applies:

On Wednesday, November 8, 2017 1:41:35 AM CET Rafael J. Wysocki wrote:
>
> This is a follow-up for the first part of the PM driver flags series
> sent previously some time ago with an intro as follows:
> 
> On Saturday, October 28, 2017 12:11:55 AM CET Rafael J. Wysocki wrote:
> > The following part of the original cover letter still applies:
> > 
> > On Monday, October 16, 2017 3:12:35 AM CEST Rafael J. Wysocki wrote:
> > > 
> > > This work was triggered by attempts to fix and optimize PM in the
> > > i2c-designware-platdev driver that ended up with adding a couple of
> > > flags to the driver's internal data structures for the tracking of
> > > > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> > > That approach is sort of suboptimal, though, because other drivers will
> > > probably want to do similar things and if all of them need to use internal
> > > flags for that, quite a bit of code duplication may ensue at least.
> > > 
> > > > That can be avoided in a couple of ways and one of them is to provide a means
> > > > for drivers to tell the core what to do and to make the core take care of it
> > > if told to do so.  Hence, the idea to use driver flags for system-wide PM
> > > that was briefly discussed during the LPC in LA last month.
> > 
> > [...]
> > 
> > > What can work (and this is the only strategy that can work AFAICS) is to
> > > point different callback pointers *in* *a* *driver* to the same routine
> > > if the driver wants to reuse that code.  That actually will work for PCI
> > > and USB drivers today, at least most of the time, but unfortunately there
> > > are problems with it for, say, platform devices.
> > > 
> > > The first problem is the requirement to track the status of the device
> > > (suspended vs not suspended) in the callbacks, because the system-wide PM
> > > code in the PM core doesn't do that.  The runtime PM framework does it, so
> > > this means adding some extra code which isn't necessary for runtime PM to
> > > the callback routines and that is not particularly nice.
> > > 
> > > The second problem is that, if the driver wants to do anything in its
> > > ->suspend callback, it generally has to prevent runtime suspend of the
> > > device from taking place in parallel with that, which is quite cumbersome.
> > > Usually, that is taken care of by resuming the device from runtime suspend
> > > > upfront, but generally doing that is wasteful (there may be no real need to
> > > resume the device except for the fact that the code is designed this way).
> > > 
> > > > On top of the above, there are optimizations to be made, like leaving certain
> > > > devices in suspend after system resume to avoid wasting time on waiting for
> > > them to resume before user space can run again and similar.
> > > 
> > > This patch series focuses on addressing those problems so as to make it
> > > easier to reuse callback routines by pointing different callback pointers
> > > to them in device drivers.  The flags introduced here are to instruct the
> > > PM core and middle layers (whatever they are) on how the driver wants the
> > > device to be handled and then the driver has to provide callbacks to match
> > > > these instructions and the rest should be taken care of by the code above it.
> > > 
> > > The flags are introduced one by one to avoid making too many changes in
> > > > one go and to allow things to be explained better (hopefully).  They mostly
> > > are mutually independent with some clearly documented exceptions.
> > 
> > but I had to rework the core patches to address the problem with the
> > generic power domains (genpd) framework pointed out by Ulf.
> > 
> > Namely, genpd expects its "noirq" callbacks to be invoked for devices in
> > runtime suspend too and it has valid reasons for that, so its "noirq"
> > callbacks can never be skipped, even for devices with the SMART_SUSPEND
> > flag set.  For this reason, the logic related to DPM_FLAG_SMART_SUSPEND
> > had to be moved from the core to the PCI bus type and the ACPI PM domain
> > which are mostly affected by it anyway.  The code after the changes looks
> > more straightforward to me, but it generally is more code and some patterns
> > had to be repeated in a few places.
> 
> I promised to send the rest of the series then:
> 
> > I will send the core patches for the remaining two flags introduced by the
> > original series separately and the intel-lpss and i2c-designware ones will
> > be posted when the core patches have been reviewed and agreed on.
> 
> and here it goes.
> 
> It actually only adds support for one additional flag, namely for
> DPM_FLAG_LEAVE_SUSPENDED, to the PM core (basic bits), PCI bus type and the
> ACPI PM domain.
> 
> That part of the series (patches [1-3/6]) is rather straightforward and, as PCI
> and the ACPI PM domain are concerned, it should be functionally equivalent to
> the previous version of the 

[PATCH V1 1/3] x86/cpu/intel: Add Cannonlake to Intel family

2017-11-11 Thread Megha Dey
Add CPUID of Cannonlake (CNL) processors to Intel family list.

Signed-off-by: Megha Dey 
---
 arch/x86/include/asm/intel-family.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index 35a6bc4..056bd41 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -65,6 +65,8 @@
 #define INTEL_FAM6_ATOM_DENVERTON  0x5F /* Goldmont Microserver */
 #define INTEL_FAM6_ATOM_GEMINI_LAKE 0x7A
 
+#define INTEL_FAM6_CANNONLAKE_MOBILE   0x66
+
 /* Xeon Phi */
 
 #define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
-- 
1.9.1



[PATCH V1 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-11 Thread Megha Dey
Currently, the Cannonlake family of Intel processors supports the
branch monitoring feature. Intel's branch monitoring feature uses
heuristics to detect the occurrence of a ROP (Return Oriented
Programming) attack.

A perf-based kernel driver has been used to monitor the occurrence of
one of the 6 branch monitoring events. There are 2 counters that each
can select between one of these events for evaluation over a specified
instruction window size (0 to 1023). For each counter, a threshold value
(0 to 127) can be configured to set a point at which ROP detection event
action is taken (determined by user-space). Each task can monitor
a maximum of 2 events at any given time.
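
The limits quoted above (2 counters, window size 0 to 1023, threshold
0 to 127) can be captured in a small range check.  This is a userspace
sketch with hypothetical model_* names; it is not taken from bm.c and
says nothing about how the driver actually validates its sysfs input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits as stated in the patch description. */
#define MODEL_BM_NUM_COUNTERS	2
#define MODEL_BM_WINDOW_MAX	1023u
#define MODEL_BM_THRESHOLD_MAX	127u

/* Hypothetical configuration record. */
struct model_bm_config {
	unsigned int window_size;			/* global */
	unsigned int threshold[MODEL_BM_NUM_COUNTERS];	/* per counter */
};

/* Return true if every field is within its documented range. */
static bool model_bm_config_valid(const struct model_bm_config *cfg)
{
	int i;

	assert(cfg != NULL);

	if (cfg->window_size > MODEL_BM_WINDOW_MAX)
		return false;

	for (i = 0; i < MODEL_BM_NUM_COUNTERS; i++) {
		if (cfg->threshold[i] > MODEL_BM_THRESHOLD_MAX)
			return false;
	}

	return true;
}
```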

Apart from window_size(global) and threshold(per-counter), various sysfs
entries are provided for the user to configure: guest_disable, lbr_freeze,
window_cnt_sel, cnt_and_mode (all global) and mispred_evt_cnt(per-counter).
For all events belonging to the same task, the global parameters are
shared.

Every time a task is scheduled out, we save the current window and count
associated with the event being monitored. When the task is scheduled
in next, we resume counting from the previous count associated with this
event. Thus, a full context switch in this case is not necessary.
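
The save/restore scheme described above can be modelled as follows.
This is a userspace sketch only: the model_* structures and functions
are hypothetical and the "hardware" is a plain struct, whereas the
driver would read and write the real counter state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the live counter state. */
struct model_bm_hw {
	unsigned int window;
	unsigned int count;
};

/* Hypothetical per-task copy kept while the task is not running. */
struct model_bm_task_ctx {
	unsigned int saved_window;
	unsigned int saved_count;
};

/* On schedule-out: remember where the task's event stood. */
static void model_bm_sched_out(const struct model_bm_hw *hw,
			       struct model_bm_task_ctx *ctx)
{
	assert(hw != NULL && ctx != NULL);
	ctx->saved_window = hw->window;
	ctx->saved_count = hw->count;
}

/* On schedule-in: continue counting from the saved values. */
static void model_bm_sched_in(struct model_bm_hw *hw,
			      const struct model_bm_task_ctx *ctx)
{
	assert(hw != NULL && ctx != NULL);
	hw->window = ctx->saved_window;
	hw->count = ctx->saved_count;
}
```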

To monitor a user space application for ROP related events, perf command
line can be used as follows:

perf stat -e  

eg. For the following test program (test.c) and threshold = 100
(echo 100 > /sys/devices/intel_bm/threshold)

void func(void)
{
        return;
}

void main(void)
{
        int i;

        for (i = 0; i < 128; i++) {
                func();
        }

        return;
}

perf stat -e intel_bm/rets/ ./test

 Performance counter stats for './test':

 1  intel_bm/rets/

   0.104705937 seconds time elapsed

perf reports the number of branch monitoring interrupts that occurred
during the execution of the user-space application.
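
For callers that want to open such a counting event without the perf
tool, the following C sketch shows the generic perf_event_open() route
for a dynamic PMU. It is written under assumptions: that the PMU
registers as "intel_bm" so its type id can be read from
/sys/bus/event_source/devices/intel_bm/type, and that the caller
already knows the event's config value (not given in this message). The
helper names bm_read_pmu_type and bm_open_counter are made up here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a dynamic PMU type id from its sysfs "type" file, for example
 * /sys/bus/event_source/devices/intel_bm/type (assumed path).
 * Returns -1 if the file is missing or does not hold a number.
 */
int bm_read_pmu_type(const char *type_path)
{
	FILE *f;
	int type;

	assert(type_path != NULL);
	f = fopen(type_path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/*
 * Open a counting (non-sampling) event for task pid on any CPU.
 * config is the PMU-specific event encoding, supplied by the caller.
 * Returns a file descriptor, or -1 with errno set.
 */
int bm_open_counter(int pmu_type, uint64_t config, pid_t pid)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = (uint32_t)pmu_type;
	attr.config = config;
	attr.disabled = 1;	/* enable later with PERF_EVENT_IOC_ENABLE */

	return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}
```

On a machine without the feature, bm_read_pmu_type() simply returns -1
and nothing further should be attempted.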

Signed-off-by: Megha Dey 
Signed-off-by: Yu-Cheng Yu 
---
 arch/x86/events/Kconfig  |  10 +
 arch/x86/events/intel/Makefile   |   2 +
 arch/x86/events/intel/bm.c   | 618 +++
 arch/x86/include/asm/msr-index.h |   5 +
 arch/x86/include/asm/processor.h |   4 +
 include/linux/perf_event.h   |   9 +-
 kernel/events/core.c |  16 +
 7 files changed, 663 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/events/intel/bm.c

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 9a7a144..40903ca 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -9,6 +9,16 @@ config PERF_EVENTS_INTEL_UNCORE
Include support for Intel uncore performance events. These are
available on NehalemEX and more modern processors.
 
+config PERF_EVENTS_INTEL_BM
+   bool "Intel Branch Monitoring support"
+   depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+   ---help---
+ Include support for Intel Branch monitoring. This feature utilizes
+ heuristics for detecting ROP (Return Oriented Programming) like
+ attacks. These heuristics are based on certain performance
+ monitoring statistics, measured dynamically over a short
+ configurable window period.
+
 config PERF_EVENTS_INTEL_RAPL
tristate "Intel rapl performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index 3468b0c..14235ec 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -2,6 +2,8 @@
 obj-$(CONFIG_CPU_SUP_INTEL)+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= lbr.o p4.o p6.o pt.o
+obj-$(CONFIG_PERF_EVENTS_INTEL_BM) += intel-bm-perf.o
+intel-bm-perf-objs := bm.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)   += intel-rapl-perf.o
 intel-rapl-perf-objs   := rapl.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
diff --git a/arch/x86/events/intel/bm.c b/arch/x86/events/intel/bm.c
new file mode 100644
index 0000000..923c6e9
--- /dev/null
+++ b/arch/x86/events/intel/bm.c
@@ -0,0 +1,618 @@
+/*
+ * Support for Intel branch monitoring counters
+ *
+ * Intel branch monitoring MSRs are specified in the Intel® 64 and IA-32
+ * Software Developer’s Manual Volume 4 section 2.16.2 (October 2017)
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * Contact Information:
+ * Megha Dey 
+ * Yu-Cheng Yu 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A 

[PATCH V1 3/3] x86, bm: Add documentation on Intel Branch Monitoring

2017-11-11 Thread Megha Dey
This patch adds the Documentation/x86/intel_bm.txt file with some
information about Intel Branch monitoring.

Signed-off-by: Megha Dey 
---
 Documentation/x86/intel_bm.txt | 216 +
 1 file changed, 216 insertions(+)
 create mode 100644 Documentation/x86/intel_bm.txt

diff --git a/Documentation/x86/intel_bm.txt b/Documentation/x86/intel_bm.txt
new file mode 100644
index 0000000..25b7177
--- /dev/null
+++ b/Documentation/x86/intel_bm.txt
@@ -0,0 +1,216 @@
+Intel(R) Branch Monitoring
+
+Copyright (C) 2017 Intel Corporation
+
+Megha Dey 
+Yu-Cheng Yu 
+
+I. Overview
+===========
+
+The Cannonlake family of Intel processors supports the branch monitoring
+feature. This feature uses heuristics to detect the occurrence of an ROP
+(Return Oriented Programming) or ROP-like (JOP: Jump Oriented Programming)
+attack. These heuristics are based on certain performance monitoring
+statistics, measured dynamically over a short configurable window period.
+ROP is a malware trend in which the attacker compromises a return
+pointer held on the stack to redirect execution to a different desired
+instruction.
+
+Support for branch monitoring has been added via Linux kernel perf event
+infrastructure. This feature is enabled by CONFIG_PERF_EVENTS_INTEL_BM.
+
+Once the kernel is compiled with CONFIG_PERF_EVENTS_INTEL_BM=y on a
+Cannonlake system, the following perf events are added which can be viewed
+with perf list:
+  intel_bm/branch-misp/  [Kernel PMU event]
+  intel_bm/call-ret/ [Kernel PMU event]
+  intel_bm/far-branch/   [Kernel PMU event]
+  intel_bm/indirect-branch-misp/ [Kernel PMU event]
+  intel_bm/ret-misp/ [Kernel PMU event]
+  intel_bm/rets/ [Kernel PMU event]
+
+II. Hardware details
+====================
+
+The MSRs associated with branch monitoring are as follows:
+
+1. BR_DETECT_CTRL : Branch Monitoring Global control
+   Used for enabling and configuring global capability
+
+2. BR_DETECT_STATUS : Branch Monitoring Global Status
+   Used by SW handler for determining detect status
+
+3. BR_DETECT_COUNTER_CONFIG_i : Branch Monitoring Counter Configuration
+   Per-cpu branch monitoring counter Configuration
+
+There are two 8-bit counters, each of which can select one of the
+following 6 events:
+
+1. RET instructions: Counts the number of near return instructions retired
+
+2. CALL-RET instructions: Counts the difference between the number of near
+   return and call instructions retired
+
+3. RET mispredicts: Mispredicted return instructions retired
+
+4. Branch (all) mispredicts: Counts the number of mispredicted branches
+
+5. Indirect branch mispredicts: Counts the number of mispredicted indirect
+   near branch instructions. Includes indirect near jump/call instructions
+
+6. Far branch instructions: Counts the number of far branches retired
+
+Branch Monitoring hardware utilizes various existing performance related
+counter events. Of the 6 events above, only call-ret is newly implemented.
+
+The events are evaluated over a specified 10-bit instruction window size
+(0 to 1023). For each counter, a threshold value (0 to 127) can be
+configured to set a point at which an interrupt is generated and a
+detection event action is taken (determined by user-space). This can take
+the form of signaling an interrupt and/or freezing the state of the last
+branch record information.
+
+The event counters are reset after every 'window size' instructions by the
+hardware.
+
+The feature is for user mode (privilege level > 0) operation only, which is
+the known malware security threat target environment. While in supervisor
+mode, this heuristic detection counter activity is suspended. This behavior
+(user mode) is independent of root vs. non-root with respect to
+virtualization technology execution.
+
+III. Software Implementation
+============================
+
+A perf-based kernel driver monitors the occurrence of the 6 branch
+monitoring events.
+
+If a branch monitoring interrupt is generated, the interrupt bit is set;
+the interrupt handler clears it and the event counters are reset.
+
+The entire system can monitor a maximum of 2 events at any given time.
+These events can belong to the same or different tasks.
+
+Every time a task is scheduled out, we save the current window and count
+associated with the event being monitored. When the task is scheduled in
+again, we resume counting from the saved count for this event. Thus, a
+full context switch in this case is not necessary.
+
+The Branch Monitoring exception can be configured as a regular interrupt or
+an NMI. We chain an NMI handler after PMU, because
+1. It will not interfere with PMU events
+2. We only monitor for user-mode events, and this 

[PATCH V1 0/3] perf/x86/intel: Add Branch Monitoring support

2017-11-11 Thread Megha Dey
This patchset adds support for Intel's branch monitoring feature. This
feature uses heuristics to detect the occurrence of an ROP (Return Oriented
Programming) or ROP-like (JOP: Jump Oriented Programming) attack. These
heuristics are based on certain performance monitoring statistics,
measured dynamically over a short configurable window period. ROP is a
malware trend in which the attacker compromises a return pointer held
on the stack to redirect execution to a different desired instruction.

Currently, only the Cannonlake family of Intel processors supports this
feature. This feature is enabled by CONFIG_PERF_EVENTS_INTEL_BM.

Once the kernel is compiled with CONFIG_PERF_EVENTS_INTEL_BM=y on a
Cannonlake system, the following perf events are added which can be viewed
with perf list:
  intel_bm/branch-misp/  [Kernel PMU event]
  intel_bm/call-ret/ [Kernel PMU event]
  intel_bm/far-branch/   [Kernel PMU event]
  intel_bm/indirect-branch-misp/ [Kernel PMU event]
  intel_bm/ret-misp/ [Kernel PMU event]
  intel_bm/rets/ [Kernel PMU event]

A perf-based kernel driver monitors the occurrence of the 6 branch
monitoring events. There are 2 counters, each of which can select one
of these events for evaluation over a specified instruction window size
(0 to 1023). For each counter, a threshold value (0 to 127) sets the
point at which an interrupt is generated. Each task can monitor a
maximum of 2 events at any given time.

Apart from the kernel driver, this patchset adds the CPUID of Cannonlake
processors to the Intel family list and adds the
Documentation/x86/intel_bm.txt file with some information about Intel
Branch monitoring.

Changes V0->V1:
1. Used the 'is_sampling_event' function 
2. Added support to monitor 2 events for every task
3. Corrected typos
4. Added a lock to prevent race condition in concurrent perf_event_open()s
5. Got rid of start()/stop() and added its functionality in add()/del()
6. Removed read() callback as it was not doing anything.
7. Removed code for sampling events as we do not support sampling.
8. Added 'id' member to hw_perf_event::intel_bm to track which counter the
event is using.
9. Moved MSR accesses to the add()/del() callbacks
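
To make the role of the per-event 'id' concrete, here is a minimal C
sketch of handing out the two hardware counters. It is not taken from
bm.c: bm_counter_alloc, bm_counter_free and the bitmask are
hypothetical, and the locking that the driver adds around concurrent
perf_event_open() calls is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define BM_NUM_COUNTERS 2	/* the hardware provides 2 counters */

/*
 * Claim a free counter and return its id (0 or 1), or -1 if both are
 * already in use. The caller is assumed to hold the lock that
 * serializes allocations.
 */
int bm_counter_alloc(unsigned int *used_mask)
{
	int id;

	assert(used_mask != NULL);
	for (id = 0; id < BM_NUM_COUNTERS; id++) {
		if (!(*used_mask & (1u << id))) {
			*used_mask |= 1u << id;
			return id;
		}
	}
	return -1;
}

/* Release a counter previously returned by bm_counter_alloc(). */
void bm_counter_free(unsigned int *used_mask, int id)
{
	assert(used_mask != NULL);
	assert(id >= 0 && id < BM_NUM_COUNTERS);
	*used_mask &= ~(1u << id);
}
```

Storing the returned id with the event lets the later add()/del() paths
address the right counter without searching again.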

Megha Dey (3):
  x86/cpu/intel: Add Cannonlake to Intel family
  perf/x86/intel/bm.c: Add Intel Branch Monitoring support
  x86, bm: Add documentation on Intel Branch Monitoring

 Documentation/x86/intel_bm.txt  | 216 +
 arch/x86/events/Kconfig |  10 +
 arch/x86/events/intel/Makefile  |   2 +
 arch/x86/events/intel/bm.c  | 618 
 arch/x86/include/asm/intel-family.h |   2 +
 arch/x86/include/asm/msr-index.h|   5 +
 arch/x86/include/asm/processor.h|   4 +
 include/linux/perf_event.h  |   9 +-
 kernel/events/core.c|  16 +
 9 files changed, 881 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/x86/intel_bm.txt
 create mode 100644 arch/x86/events/intel/bm.c

-- 
1.9.1



Re: [PATCH v1 1/2] thunderbolt: Make pathname to force_power shorter

2017-11-11 Thread Mika Westerberg
On Fri, Nov 10, 2017 at 08:29:18PM +0200, Andy Shevchenko wrote:
> On Mon, 2017-10-23 at 13:30 +, mario.limoncie...@dell.com wrote:
> > Acked-by: Mario Limonciello 
> 
> Thanks.
> 
> Since Mika established a dedicated repository for Thunderbolt patches I
> assume he takes this.

Yes, I can pick it up after v4.15-rc1 is released.