Re: [PATCH] of: Document exactly what of_find_node_by_name() puts

2017-11-10 Thread Randy Dunlap
On 11/10/2017 05:45 PM, Stephen Boyd wrote:
> It isn't clear if this function of_node_put()s the 'from'
> argument, or the node it finds in the search. Clearly indicate
> which variable is touched.
> 
> Signed-off-by: Stephen Boyd 
> ---
>  drivers/of/base.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 63897531cd75..602c5d65734a 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -866,7 +866,7 @@ EXPORT_SYMBOL(of_find_node_opts_by_path);
>   *   @from:  The node to start searching from or NULL, the node

*   @from:  The node to start searching from or %NULL; the node

>   *   you pass will not be searched, only the next one
>   *   will; typically, you pass what the previous call

*   will. Typically, you pass what the previous call

> - *   returned. of_node_put() will be called on it
> + *   returned. of_node_put() will be called on @from.
>   *   @name:  The name string to match against
>   *
>   *   Returns a node pointer with refcount incremented, use
> 


-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] of: Document exactly what of_find_node_by_name() puts

2017-11-10 Thread Stephen Boyd
It isn't clear if this function of_node_put()s the 'from'
argument, or the node it finds in the search. Clearly indicate
which variable is touched.

Signed-off-by: Stephen Boyd 
---
 drivers/of/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 63897531cd75..602c5d65734a 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -866,7 +866,7 @@ EXPORT_SYMBOL(of_find_node_opts_by_path);
  * @from:  The node to start searching from or NULL, the node
  * you pass will not be searched, only the next one
  * will; typically, you pass what the previous call
- * returned. of_node_put() will be called on it
+ * returned. of_node_put() will be called on @from.
  * @name:  The name string to match against
  *
  * Returns a node pointer with refcount incremented, use
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag

2017-11-10 Thread Rafael J. Wysocki
On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki  wrote:
> On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson  wrote:
>> On 8 November 2017 at 14:25, Rafael J. Wysocki  wrote:

[cut]

>> Moreover, you should check the return value from
>> pm_runtime_set_suspended().
>
> This is in "noirq", so failures of that are meaningless here.

They *should* be meaningless, but __pm_runtime_set_status() is sort of
buggy and checks child_count regardless of whether or not runtime PM
is enabled for the children (but when changing the status to "active"
it actually checks if runtime PM is enabled for the parent before
returning -EBUSY, so it is not even consistent internally).  Oh well.

>> Then I wonder, what should you do when it fails here?
>>
>> Perhaps a better idea is to do this in the noirq suspend phase,
>> because it allows you to bail out in case pm_runtime_set_suspended()
>> fails.
>
> This doesn't make sense, sorry.

Not for the above reason, but that would allow the bug in
__pm_runtime_set_status() to be sort of worked around by setting the
status to "suspended" for children before doing that for their
parents.

Moreover, stuff with nonzero usage_counts cannot be left in suspend regardless.

Thanks,
Rafael


RE: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-10 Thread Dey, Megha


>-Original Message-
>From: Jiri Olsa [mailto:jo...@redhat.com]
>Sent: Saturday, November 4, 2017 6:26 AM
>To: Megha Dey 
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng ;
>Brown, Len ; gre...@linuxfoundation.org;
>pet...@infradead.org; a...@kernel.org;
>alexander.shish...@linux.intel.com; namhy...@kernel.org;
>vikas.shiva...@linux.intel.com; pombreda...@nexb.com;
>m...@kylehuey.com; b...@suse.de; Andrejczuk, Grzegorz
>; Luck, Tony ;
>cor...@lwn.net; Shankar, Ravi V ; Dey, Megha
>
>Subject: Re: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Fri, Nov 03, 2017 at 11:00:05AM -0700, Megha Dey wrote:
>
>SNIP
>
>> +
>> +static void intel_bm_event_update(struct perf_event *event)
>> +{
>> +	union bm_detect_status cur_stat;
>> +
>> +	rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
>> +	local64_set(&event->hw.prev_count, (uint64_t)cur_stat.raw);
>> +}
>> +
>> +static void intel_bm_event_stop(struct perf_event *event, int mode)
>> +{
>> +	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->id,
>> +	       (event->hw.bm_counter_conf & ~1));
>> +
>> +	intel_bm_event_update(event);
>> +}
>> +
>> +static void intel_bm_event_del(struct perf_event *event, int flags)
>> +{
>> +	intel_bm_event_stop(event, flags);
>> +}
>> +
>> +static void intel_bm_event_read(struct perf_event *event)
>> +{
>> +}
>
>should you call intel_bm_event_update in here? so the read syscall gets
>updated data in case the event is active

We do not update the event->count in the intel_bm_event_update function. We are 
basically saving the status register contents when a task is being scheduled 
out. So it has nothing to do with the read syscall. Having said that, I will 
probably put what stop() does in del() and get rid of the stop() function.
>
>jirka


RE: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-10 Thread Dey, Megha


>-Original Message-
>From: Jiri Olsa [mailto:jo...@redhat.com]
>Sent: Saturday, November 4, 2017 6:26 AM
>To: Megha Dey 
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng ;
>Brown, Len ; gre...@linuxfoundation.org;
>pet...@infradead.org; a...@kernel.org;
>alexander.shish...@linux.intel.com; namhy...@kernel.org;
>vikas.shiva...@linux.intel.com; pombreda...@nexb.com;
>m...@kylehuey.com; b...@suse.de; Andrejczuk, Grzegorz
>; Luck, Tony ;
>cor...@lwn.net; Shankar, Ravi V ; Dey, Megha
>
>Subject: Re: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Fri, Nov 03, 2017 at 11:00:05AM -0700, Megha Dey wrote:
>
>SNIP
>
>> +static unsigned int bm_threshold = BM_MAX_THRESHOLD;
>> +static unsigned int bm_mispred_evt_cnt;
>> +
>> +/* Branch monitoring counter owners */
>> +static struct perf_event *bm_counter_owner[2];
>
>SNIP
>
>> +	 * Find a hardware counter for the target task
>> +	 */
>> +	for (i = 0; i < bm_num_counters; i++) {
>> +		if ((bm_counter_owner[i] == NULL) ||
>> +		    (bm_counter_owner[i]->state == PERF_EVENT_STATE_DEAD)) {
>> +			counter_to_use = i;
>> +			bm_counter_owner[i] = event;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (counter_to_use == -1)
>> +		return -EBUSY;
>
>not sure I understand, your docs say: "There are 2 8-bit counters that each..."
>
>so there are 2 counters per CPU? if that's correct, isn't this check too strict
>then? you could have more events configured running on other CPUs for
>other tasks
>
>given that we do task only events here, should bm_counter_owner be part
>of task, together with the limit..? I'm probably missing something..

Yes you are right. Initially, we had support for 2 events (from one or two
tasks) to be monitored for the entire system. This indeed seems very limiting.
In the next patchset, I will add support for 2 events per task (this is what
the hardware can support).
>
>thanks,
>jirka


RE: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-10 Thread Dey, Megha


>-Original Message-
>From: Jiri Olsa [mailto:jo...@redhat.com]
>Sent: Saturday, November 4, 2017 6:25 AM
>To: Megha Dey 
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng ;
>Brown, Len ; gre...@linuxfoundation.org;
>pet...@infradead.org; a...@kernel.org;
>alexander.shish...@linux.intel.com; namhy...@kernel.org;
>vikas.shiva...@linux.intel.com; pombreda...@nexb.com;
>m...@kylehuey.com; b...@suse.de; Andrejczuk, Grzegorz
>; Luck, Tony ;
>cor...@lwn.net; Shankar, Ravi V ; Dey, Megha
>
>Subject: Re: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Fri, Nov 03, 2017 at 11:00:05AM -0700, Megha Dey wrote:
>
>SNIP
>
>> +	event->event_caps |= PERF_EV_CAP_BM;
>> +	/*
>> +	 * cfg contains one of the 6 possible Branch Monitoring events
>> +	 */
>> +	cfg = event->attr.config;
>> +	if (cfg < 0 || cfg > (BM_MAX_EVENTS - 1))
>> +		return -EINVAL;
>> +
>> +	if (event->attr.sample_period) /* no sampling */
>> +		return -EINVAL;
>
>you can use the 'is_sampling_event' function

Will make the change.
>
>jirka


RE: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-10 Thread Dey, Megha


>-Original Message-
>From: Jiri Olsa [mailto:jo...@redhat.com]
>Sent: Saturday, November 4, 2017 6:25 AM
>To: Megha Dey 
>Cc: x...@kernel.org; linux-ker...@vger.kernel.org; linux-
>d...@vger.kernel.org; t...@linutronix.de; mi...@redhat.com;
>h...@zytor.com; andriy.shevche...@linux.intel.com;
>kstew...@linuxfoundation.org; Yu, Yu-cheng ;
>Brown, Len ; gre...@linuxfoundation.org;
>pet...@infradead.org; a...@kernel.org;
>alexander.shish...@linux.intel.com; namhy...@kernel.org;
>vikas.shiva...@linux.intel.com; pombreda...@nexb.com;
>m...@kylehuey.com; b...@suse.de; Andrejczuk, Grzegorz
>; Luck, Tony ;
>cor...@lwn.net; Shankar, Ravi V ; Dey, Megha
>
>Subject: Re: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch
>Monitoring support
>
>On Fri, Nov 03, 2017 at 11:00:05AM -0700, Megha Dey wrote:
>
>SNIP
>
>> +
>> +static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>> +{
>> +	struct perf_event *event;
>> +	union bm_detect_status stat;
>> +	struct perf_sample_data data;
>> +	int i;
>> +	unsigned long x;
>> +
>> +	rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
>> +
>> +	if (stat.event) {
>> +		wrmsrl(BR_DETECT_STATUS_MSR, 0);
>> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +		/*
>> +		 * Issue wake-up to corresponding polling event
>> +		 */
>> +		x = stat.ctrl_hit;
>> +		for_each_set_bit(i, &x, bm_num_counters) {
>> +			event = bm_counter_owner[i];
>> +			perf_sample_data_init(&data, 0, event->hw.last_period);
>> +			perf_event_overflow(event, &data, regs);
>
>hum, it's non-sampling events only, right? then you don't need any of the
>perf_sample_data stuff.. the perf_event_overflow call is basically a nop

Yeah you are right. We were supporting sampling initially, forgot to get rid of 
this code.
Will change this in the next version.
>
>> +			local64_inc(&event->count);
>> +			atomic_set(&event->hw.bm_poll, POLLIN);
>> +			event->pending_wakeup = 1;
>> +			irq_work_queue(&event->pending);
>
>also this is for sampling events only
>
>seems like you only want to increment the event->count in here

We are currently working on a library similar to libperf which user-space apps
could make use of instead of perf command-line monitoring. This code has been
added so that the user can poll on when/if an interrupt is triggered and be
notified of its occurrence. If you think there is a better way of doing
this, please let me know :)

>
>thanks,
>jirka
>
>> +		}
>> +		return NMI_HANDLED;
>> +	}
>> +	return NMI_DONE;
>> +}
>
>
>SNIP


Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag

2017-11-10 Thread Rafael J. Wysocki
On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki  wrote:
> On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson  wrote:
>> On 8 November 2017 at 14:25, Rafael J. Wysocki  wrote:
>>> From: Rafael J. Wysocki 
>>>
>>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>>> code that it is desirable to leave the device in runtime suspend
>>> after system-wide transitions to the working state (for example,
>>> the device may be slow to resume and it may be better to avoid
>>> resuming it right away).
>>>
>>> Generally, the middle-layer code involved in the handling of the
>>> device is expected to indicate to the PM core whether or not the
>>> device may be left in suspend with the help of the device's
>>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>>> phase of the preceding system suspend (or analogous) transition.
>>> The middle layer is then responsible for handling the device as
>>> appropriate in its "noirq" resume callback which is executed
>>> regardless of whether or not the device may be left suspended, but
>>> the other resume callbacks (except for ->complete) will be skipped
>>> automatically by the core if the device really can be left in
>>> suspend.
>>
>> I don't understand the reason to why you need to skip invoking resume
>> callbacks to achieve this behavior, could you elaborate on that?
>
> The reason why it is done this way is because that takes less code and
> is easier (or at least less error-prone, because it avoids repeating
> patterns in middle layers).

Actually, it also is a matter of correctness, at least to some extent.

Namely, if the parent or any supplier of the device has
power.must_resume clear in dpm_noirq_resume_devices(), then the device
should not be touched during the whole system resume transition
(because the access may very well go through the suspended parent or
supplier) and the most straightforward way to make that happen is to
avoid running the code that may touch the device.  [Arguably, if
middle layers were made responsible for handling that, they would need
to do pretty much the same thing and so there is no reason for not
doing it in the core.]

Allowing the "noirq" callback from middle layers to run in that case
is a stretch already, but since genpd needs that, well, tough nuggets.

All of that said, if there is a middle layer wanting to set
power.skip_resume and needing to do something different for the resume
callbacks, then this piece can be moved from the core to the middle
layers at any time later.  So far there's none, though.  At least not
in this patch series.

Thanks,
Rafael


Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag

2017-11-10 Thread Rafael J. Wysocki
On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson  wrote:
> On 8 November 2017 at 14:25, Rafael J. Wysocki  wrote:
>> From: Rafael J. Wysocki 
>>
>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>> code that it is desirable to leave the device in runtime suspend
>> after system-wide transitions to the working state (for example,
>> the device may be slow to resume and it may be better to avoid
>> resuming it right away).
>>
>> Generally, the middle-layer code involved in the handling of the
>> device is expected to indicate to the PM core whether or not the
>> device may be left in suspend with the help of the device's
>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>> phase of the preceding system suspend (or analogous) transition.
>> The middle layer is then responsible for handling the device as
>> appropriate in its "noirq" resume callback which is executed
>> regardless of whether or not the device may be left suspended, but
>> the other resume callbacks (except for ->complete) will be skipped
>> automatically by the core if the device really can be left in
>> suspend.
>
> I don't understand the reason to why you need to skip invoking resume
> callbacks to achieve this behavior, could you elaborate on that?

The reason why it is done this way is because that takes less code and
is easier (or at least less error-prone, because it avoids repeating
patterns in middle layers).

Note that the callbacks only may be skipped by the core if the middle
layer has set power.skip_resume for the device (or if the core is
handling it in patch [5/6], but that's one more step ahead still).

> Couldn't the PM domain or the middle-layer instead decide what to do?

They still can, the whole thing is a total opt-in.

But to be constructive, do you have any specific examples in mind?

> To me it sounds a bit prone to errors by skipping callbacks from the
> PM core, and I wonder if the general driver author will be able to
> understand how to use this flag properly.

This has nothing to do with general driver authors and I'm not sure
what you mean here and where you are going with this.

> That said, as the series don't include any changes for drivers making
> use of the flag, could please fold in such change as it would provide
> a more complete picture?

I've already done so, see https://patchwork.kernel.org/patch/10007349/

IMHO it's not really useful to drag this stuff (which doesn't change
BTW) along with every iteration of the core patches.

>>
>> The additional power.must_resume status bit introduced for the
>> implementation of this mechanism is used internally by the PM core
>> to track the requirement to resume the device (which may depend on
>> its children etc).
>
> Yeah, clearly the PM core needs to be involved, because of the need of
> dealing with parent/child relations, however, as kind of indicated
> above, couldn't the PM core just set some flag/status bits, which
> instructs the middle-layer and PM domain on what to do? That sounds
> like an easier approach.

No, it is not easier.  And it is backwards.

>>
>> Signed-off-by: Rafael J. Wysocki 
>> Acked-by: Greg Kroah-Hartman 
>> ---
>>  Documentation/driver-api/pm/devices.rst |   24 ++-
>>  drivers/base/power/main.c   |   65 +---
>>  include/linux/pm.h  |   14 +-
>>  3 files changed, 93 insertions(+), 10 deletions(-)
>>
>> Index: linux-pm/include/linux/pm.h
>> ===
>> --- linux-pm.orig/include/linux/pm.h
>> +++ linux-pm/include/linux/pm.h
>> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>>   *
>>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>>   * system suspend/resume callbacks to be skipped for the device to return 0 from
>> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>>   * necessary from the driver's perspective.  It also may cause them to skip
>>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>>   * the driver if they decide to leave the device in runtime suspend.
>> + *
>> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>> + * driver prefers the device to be left in runtime suspend after system resume.
>>   */
>> -#define DPM_FLAG_NEVER_SKIP	BIT(0)
>> -#define DPM_FLAG_SMART_PREPARE BIT(1)
>> -#define DPM_FLAG_SMART_SUSPEND BIT(2)
>> +#define 

Re: [PATCH V0 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support

2017-11-10 Thread Megha Dey
On Mon, 2017-11-06 at 13:49 +0200, Alexander Shishkin wrote:
> On Fri, Nov 03, 2017 at 11:00:05AM -0700, Megha Dey wrote:
> > +static int intel_bm_event_init(struct perf_event *event)
> > +{
> 
> ...
> 
> > +   /*
> > +* Find a hardware counter for the target task
> > +*/
> > +   for (i = 0; i < bm_num_counters; i++) {
> > +   if ((bm_counter_owner[i] == NULL) ||
> > +   (bm_counter_owner[i]->state == PERF_EVENT_STATE_DEAD)) {
> > +   counter_to_use = i;
> > +   bm_counter_owner[i] = event;
> > +   break;
> 
> How are two concurrent perf_event_open()s not going to race here?
> Also, I'm not sure what's the value of looking at the ->state here.
> Shouldn't the ->destroy() method clear the corresponding array slot?

Yes you are right. I will add a locking mechanism here to prevent racing
and remove the ->state in the next version.
> 
> > +   }
> > +   }
> 
> ...
> 
> > +   wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + counter_to_use,
> > +   event->hw.bm_counter_conf);
> > +   wrmsrl(BR_DETECT_STATUS_MSR, 0);
> 
> These wrmsrs will happen on whatever CPU perf_event_open() is called on,
> as opposed to the CPU where the event will be scheduled. You probably want
> to keep the MSR accesses in the start()/stop() callbacks.

Agreed, don't think we need this code here. We are writing to the MSRs
in start() anyways.
> 
> Regards,
> --
> Alex
> 




Re: [PATCH v1 1/2] thunderbolt: Make pathname to force_power shorter

2017-11-10 Thread Andy Shevchenko
On Mon, 2017-10-23 at 13:30 +, mario.limoncie...@dell.com wrote:
> Acked-by: Mario Limonciello 

Thanks.

Since Mika established a dedicated repository for Thunderbolt patches I
assume he takes this.

-- 
Andy Shevchenko 
Intel Finland Oy


Re: [PATCH v9 00/51] powerpc, mm: Memory Protection Keys

2017-11-10 Thread Christophe LEROY

Hi

On 06/11/2017 at 09:56, Ram Pai wrote:

Memory protection keys enable applications to protect their
address space from inadvertent access or corruption
by themselves.

These patches along with the pte-bit freeing patch series
enables the protection key feature on powerpc; 4k and 64k
hashpage kernels. It also changes the generic and x86
code to expose memkey features through sysfs. Finally
testcases and Documentation is updated.

All patches can be found at --
https://github.com/rampai/memorykeys.git memkey.v9


As far as I can see you are focusing the implementation on 64-bit 
powerpc. This could also be implemented on 32-bit powerpc; for instance 
the 8xx has MMU Access Protection Registers which can be used to define 
16 domains and could, I think, be used for implementing protection keys.
Of course the challenge after that would be to find 4 spare PTE bits. 
I'm sure we can find them on the 8xx: at least when using 16k pages we 
have 2 bits already available, then by merging PAGE_SHARED and PAGE_USER 
and by reducing PAGE_RO to only one bit we can get the 4 spare bits.


Therefore I think it would be great if you could implement a framework 
common to both PPC32 and PPC64.


Christophe



The overall idea:
-
  A process allocates a key and associates it with
  an address range within its address space.
  The process then can dynamically set read/write
  permissions on the key without involving the
  kernel. Any code that violates the permissions
  of the address space; as defined by its associated
  key, will receive a segmentation fault.

This patch series enables the feature on PPC64 HPTE
platform.

ISA3.0 section 5.7.13 describes the detailed
specifications.


Highlevel view of the design:
---
When an application associates a key with an
address range, program the key in the Linux PTE.
When the MMU detects a page fault, allocate a hash
page and program the key into HPTE. And finally
when the MMU detects a key violation; due to
invalid application access, invoke the registered
signal handler and provide the violated key number.


Testing:
---
This patch series has passed all the protection key
tests available in the selftest directory.The
tests are updated to work on both x86 and powerpc.
The selftests have passed on x86 and powerpc hardware.

History:
---
version v9:
(1) used jump-labels to optimize code
-- Balbir
(2) fixed a register initialization bug noted
by Balbir
(3) fixed inappropriate use of paca to pass
siginfo and keys to signal handler
(4) Cleanup of comment style not to be right
justified -- mpe
(5) restructured the patches to depend on the
availability of VM_PKEY_BIT4 in
include/linux/mm.h
(6) Incorporated comments from Dave Hansen
towards changes to selftest and got
them tested on x86.

version v8:
(1) Contents of the AMR register withdrawn from
the siginfo structure. Applications can always
read the AMR register.
(2) AMR/IAMR/UAMOR are now available through
ptrace system call. -- thanks to Thiago
(3) code changes to handle legacy power cpus
that do not support execute-disable.
(4) incorporates many code improvement
suggestions.

version v7:
(1) refers to device tree property to enable
protection keys.
(2) adds 4K PTE support.
(3) fixes a couple of bugs noticed by Thiago
(4) decouples this patch series from arch-
 independent code. This patch series can
 now stand by itself, with one kludge
patch(2).

version v6:
(1) selftest changes are broken down into 20
incremental patches.
(2) A separate key allocation mask that
includes PKEY_DISABLE_EXECUTE is
added for powerpc
(3) pkey feature is enabled for 64K HPT case
only. RPT and 4k HPT is disabled.
(4) Documentation is updated to better
capture the semantics.
(5) introduced arch_pkeys_enabled() to find
if an arch enables pkeys. Correspondingly
changed the logic that displays
key value in smaps.
(6) code rearranged in many places based on
comments from Dave Hansen, Balbir,
Anshuman.   
(7) fixed one bug where a bogus key could be
associated successfully in

[PATCH v8 2/7] KVM: arm64: Save ESR_EL2 on guest SError

2017-11-10 Thread Dongjiu Geng
From: James Morse 

When we exit a guest due to an SError the vcpu fault info isn't updated
with the ESR. Today this is only done for traps.

The v8.2 RAS Extensions define ISS values for SError. Update the vcpu's
fault_info with the ESR on SError so that handle_exit() can determine
if this was a RAS SError and decode its severity.

Signed-off-by: James Morse 
Signed-off-by: Dongjiu Geng 
---
 arch/arm64/kvm/hyp/switch.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..c6f17c7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -226,13 +226,20 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
return true;
 }
 
+static void __hyp_text __populate_fault_info_esr(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
+}
+
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 {
-   u64 esr = read_sysreg_el2(esr);
-   u8 ec = ESR_ELx_EC(esr);
+   u8 ec;
+   u64 esr;
u64 hpfar, far;
 
-   vcpu->arch.fault.esr_el2 = esr;
+   __populate_fault_info_esr(vcpu);
+   esr = vcpu->arch.fault.esr_el2;
+   ec = ESR_ELx_EC(esr);
 
if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
return true;
@@ -321,6 +328,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 */
if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
goto again;
+   else if (ARM_EXCEPTION_CODE(exit_code) == ARM_EXCEPTION_EL1_SERROR)
+   __populate_fault_info_esr(vcpu);
 
if (static_branch_unlikely(_v2_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP) {
-- 
1.9.1



[PATCH v8 0/7] Support RAS virtualization in KVM

2017-11-10 Thread Dongjiu Geng
This patch series mainly does the following:

1. Trap RAS ERR* register accesses to EL2 from Non-secure EL1;
   KVM will do a minimal simulation, these registers are simulated
   as RAZ/WI in KVM.
2. Route synchronous External Abort exceptions from Non-secure EL0
   and EL1 to EL2. When exception EL3 routing is enabled by firmware,
   the system will trap to EL3 firmware instead of EL2 KVM; firmware then
   judges whether EL2 routing is enabled: if enabled, it jumps to EL2 KVM,
   otherwise to the EL1 host kernel.
3. Enable APEI ARMv8 SEI notification to parse the CPER records for SError
   in the ACPI GHES driver; KVM will call handle_guest_sei() to let the ACPI
   driver parse the CPER record for an SError which happened in the guest.
4. Although we can use the APEI driver to handle guest SErrors, not all
   systems support SEI notification (kernel-first systems, for example). So
   KVM will also classify the error through the Exception Syndrome Register
   and take different approaches according to the asynchronous error type.
5. If the guest SError is not propagated and not consumed, then KVM returns
   a recoverable error status to user space; user space will specify the guest
   ESR and inject a virtual SError. For other asynchronous error types, KVM
   directly injects a virtual SError with an IMPLEMENTATION DEFINED ESR, or
   panics if the error is fatal. In the RAS extension, the guest virtual ESR
   must be set, because all-zero means 'RAS error: Uncategorized' instead of
   'no valid ISS', so this ESR is set to IMPLEMENTATION DEFINED by default if
   user space does not specify it.

Dongjiu Geng (5):
  acpi: apei: Add SEI notification type support for ARMv8
  KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA
  arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
  arm64: kvm: Set Virtual SError Exception Syndrome for guest
  arm64: kvm: handle SError Interrupt by categorization

James Morse (1):
  KVM: arm64: Save ESR_EL2 on guest SError

Xie XiuQi (1):
  arm64: cpufeature: Detect CPU RAS Extentions

 Documentation/virtual/kvm/api.txt| 11 ++
 arch/arm/include/asm/kvm_host.h  |  1 +
 arch/arm/kvm/guest.c |  9 +
 arch/arm64/Kconfig   | 16 +
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  3 +-
 arch/arm64/include/asm/esr.h | 15 
 arch/arm64/include/asm/kvm_arm.h |  2 ++
 arch/arm64/include/asm/kvm_asm.h |  3 ++
 arch/arm64/include/asm/kvm_emulate.h | 17 +
 arch/arm64/include/asm/kvm_host.h|  2 ++
 arch/arm64/include/asm/sysreg.h  | 15 
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kernel/cpufeature.c   | 13 +++
 arch/arm64/kernel/process.c  |  3 ++
 arch/arm64/kvm/guest.c   | 14 
 arch/arm64/kvm/handle_exit.c | 67 +---
 arch/arm64/kvm/hyp/switch.c  | 31 +++--
 arch/arm64/kvm/inject_fault.c| 13 ++-
 arch/arm64/kvm/reset.c   |  3 ++
 arch/arm64/kvm/sys_regs.c| 10 ++
 arch/arm64/mm/fault.c| 16 +
 drivers/acpi/apei/Kconfig| 15 
 drivers/acpi/apei/ghes.c | 53 
 include/acpi/ghes.h  |  1 +
 include/uapi/linux/kvm.h |  3 ++
 virt/kvm/arm/arm.c   |  7 
 27 files changed, 336 insertions(+), 9 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 4/7] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA

2017-11-10 Thread Dongjiu Geng
ARMv8.2 adds a new bit HCR_EL2.TEA which routes synchronous external
aborts to EL2, and adds a trap control bit HCR_EL2.TERR which traps
all Non-secure EL1&0 error record accesses to EL2.

This patch enables the two bits for the guest OS, guaranteeing that
KVM takes external aborts and traps attempts to access the physical
error registers.

ERRIDR_EL1 advertises the number of error records; we return
zero, meaning we can treat all the other registers as RAZ/WI too.

Signed-off-by: Dongjiu Geng 
[removed specific emulation, use trap_raz_wi() directly for everything,
 rephrased parts of the commit message]
Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_arm.h |  2 ++
 arch/arm64/include/asm/kvm_emulate.h |  7 +++
 arch/arm64/include/asm/sysreg.h  | 10 ++
 arch/arm64/kvm/sys_regs.c| 10 ++
 4 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 61d694c..1188272 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,8 @@
 #include 
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_TEA		(UL(1) << 37)
+#define HCR_TERR	(UL(1) << 36)
 #define HCR_E2H		(UL(1) << 34)
 #define HCR_ID (UL(1) << 33)
 #define HCR_CD (UL(1) << 32)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index e5df3fc..555b28b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
if (is_kernel_in_hyp_mode())
vcpu->arch.hcr_el2 |= HCR_E2H;
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+   /* route synchronous external abort exceptions to EL2 */
+   vcpu->arch.hcr_el2 |= HCR_TEA;
+   /* trap error record accesses */
+   vcpu->arch.hcr_el2 |= HCR_TERR;
+   }
+
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
 }
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 64e2a80..47b967d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -169,6 +169,16 @@
 #define SYS_AFSR0_EL1  sys_reg(3, 0, 5, 1, 0)
 #define SYS_AFSR1_EL1  sys_reg(3, 0, 5, 1, 1)
 #define SYS_ESR_EL1		sys_reg(3, 0, 5, 2, 0)
+
+#define SYS_ERRIDR_EL1		sys_reg(3, 0, 5, 3, 0)
+#define SYS_ERRSELR_EL1		sys_reg(3, 0, 5, 3, 1)
+#define SYS_ERXFR_EL1		sys_reg(3, 0, 5, 4, 0)
+#define SYS_ERXCTLR_EL1		sys_reg(3, 0, 5, 4, 1)
+#define SYS_ERXSTATUS_EL1	sys_reg(3, 0, 5, 4, 2)
+#define SYS_ERXADDR_EL1		sys_reg(3, 0, 5, 4, 3)
+#define SYS_ERXMISC0_EL1	sys_reg(3, 0, 5, 5, 0)
+#define SYS_ERXMISC1_EL1	sys_reg(3, 0, 5, 5, 1)
+
 #define SYS_FAR_EL1		sys_reg(3, 0, 6, 0, 0)
 #define SYS_PAR_EL1		sys_reg(3, 0, 7, 4, 0)
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3..2b1fafa 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -953,6 +953,16 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
+
+   { SYS_DESC(SYS_ERRIDR_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERRSELR_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXFR_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXCTLR_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXSTATUS_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXADDR_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
+   { SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
+
{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
-- 
1.9.1



[PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8

2017-11-10 Thread Dongjiu Geng
ARMv8.2 requires implementation of the RAS extension, which
adds the SEI (SError Interrupt) notification type. This patch
adds new GHES error source handling functions for SEI. The
error source parsing and handling method is similar to that
of the SEA.

Expose the API ghes_notify_sei() to external users. External
modules can call this API to parse the APEI table and handle
the SEI notification.

Signed-off-by: Dongjiu Geng 
---
 drivers/acpi/apei/Kconfig | 15 ++
 drivers/acpi/apei/ghes.c  | 53 +++
 include/acpi/ghes.h   |  1 +
 3 files changed, 69 insertions(+)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 52ae543..ff4afc3 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,21 @@ config ACPI_APEI_SEA
  option allows the OS to look for such hardware error record, and
  take appropriate action.
 
+config ACPI_APEI_SEI
+   bool "APEI SError(System Error) Interrupt logging/recovering support"
+   depends on ARM64 && ACPI_APEI_GHES
+   default y
+   help
+ This option should be enabled if the system supports
+ firmware first handling of SEI (SError interrupt).
+
+ SEI happens with asynchronous external abort for errors on device
+ memory reads on ARMv8 systems. If a system supports firmware first
+ handling of SEI, the platform analyzes and handles hardware error
+ notifications from SEI, and it may then form a hardware error record for
+ the OS to parse and handle. This option allows the OS to look for
+ such hardware error record, and take appropriate action.
+
 config ACPI_APEI_MEMORY_FAILURE
bool "APEI memory error recovering support"
depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6a3f824..67cd3a7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -855,6 +855,46 @@ static inline void ghes_sea_add(struct ghes *ghes) { }
 static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+/*
+ * Return 0 only if one of the SEI error sources successfully reported an error
+ * record sent from the firmware.
+ */
+int ghes_notify_sei(void)
+{
+   struct ghes *ghes;
+   int ret = -ENOENT;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+   if (!ghes_proc(ghes))
+   ret = 0;
+   }
+   rcu_read_unlock();
+   return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+   mutex_lock(&ghes_list_mutex);
+   list_add_rcu(&ghes->list, &ghes_sei);
+   mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+   mutex_lock(&ghes_list_mutex);
+   list_del_rcu(&ghes->list);
+   mutex_unlock(&ghes_list_mutex);
+   synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes) { }
+static inline void ghes_sei_remove(struct ghes *ghes) { }
+#endif /* CONFIG_ACPI_APEI_SEI */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1086,6 +1126,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
goto err;
}
break;
+   case ACPI_HEST_NOTIFY_SEI:
+   if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+   generic->header.source_id);
+   goto err;
+   }
+   break;
case ACPI_HEST_NOTIFY_NMI:
if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1158,6 +1205,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_add(ghes);
break;
+   case ACPI_HEST_NOTIFY_SEI:
+   ghes_sei_add(ghes);
+   break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
break;
@@ -1211,6 +1261,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_remove(ghes);
break;
+   case ACPI_HEST_NOTIFY_SEI:
+   ghes_sei_remove(ghes);
+   break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);
break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c8..9ba59e2 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
 section = 

[PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions

2017-11-10 Thread Dongjiu Geng
From: Xie XiuQi 

ARM's v8.2 Extensions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending.

Add cpufeature detection and a barrier in the context-switch code.
There is no need to use alternatives for this as CPUs that don't
support this feature will treat the instruction as a nop.

Platform level RAS support may require additional firmware support.

Signed-off-by: Xie XiuQi 
[Rebased, added esb and config option, reworded commit message]
Signed-off-by: James Morse 
Signed-off-by: Dongjiu Geng 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/Kconfig   | 16 
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  2 ++
 arch/arm64/kernel/cpufeature.c   | 13 +
 arch/arm64/kernel/process.c  |  3 +++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..cc00d10 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -973,6 +973,22 @@ config ARM64_PMEM
  operations if DC CVAP is not supported (following the behaviour of
  DC CVAP itself if the system does not define a point of persistence).
 
+config ARM64_RAS_EXTN
+   bool "Enable support for RAS CPU Extensions"
+   default y
+   help
+ CPUs that support the Reliability, Availability and Serviceability
+ (RAS) Extensions, part of ARMv8.2 are able to track faults and
+ errors, classify them and report them to software.
+
+ On CPUs with these extensions system software can use additional
+ barriers to determine if faults are pending and read the
+ classification from a new set of registers.
+
+ Selecting this feature will allow the kernel to use these barriers
+ and access the new registers if the system supports the extension.
+ Platform RAS features may additionally depend on firmware support.
+
 endmenu
 
 config ARM64_MODULE_CMODEL_LARGE
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 0fe7e43..8b0a0eb 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -30,6 +30,7 @@
 #define isb()  asm volatile("isb" : : : "memory")
 #define dmb(opt)   asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)   asm volatile("dsb " #opt : : : "memory")
+#define esb()  asm volatile("hint #16"  : : : "memory")
 
 #define mb()   dsb(sy)
 #define rmb()  dsb(ld)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..4820d44 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921			19
 #define ARM64_WORKAROUND_CAVIUM_30115		20
 #define ARM64_HAS_DCPOP				21
+#define ARM64_HAS_RAS_EXTN 22
 
-#define ARM64_NCAPS				22
+#define ARM64_NCAPS				23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f707fed..64e2a80 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -332,6 +332,7 @@
 #define ID_AA64ISAR1_DPB_SHIFT 0
 
 /* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT  28
 #define ID_AA64PFR0_GIC_SHIFT  24
 #define ID_AA64PFR0_ASIMD_SHIFT20
 #define ID_AA64PFR0_FP_SHIFT   16
@@ -340,6 +341,7 @@
 #define ID_AA64PFR0_EL1_SHIFT  4
 #define ID_AA64PFR0_EL0_SHIFT  0
 
+#define ID_AA64PFR0_RAS_V1 0x1
 #define ID_AA64PFR0_FP_NI  0xf
 #define ID_AA64PFR0_FP_SUPPORTED   0x0
 #define ID_AA64PFR0_ASIMD_NI   0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 21e2c95..4846974 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -125,6 +125,7 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -900,6 +901,18 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.min_field_value = 1,
},
 #endif
+#ifdef CONFIG_ARM64_RAS_EXTN
+   {
+   

[PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest

2017-11-10 Thread Dongjiu Geng
The RAS Extension adds a VSESR_EL2 register which provides
the syndrome value reported to software on taking a virtual
SError interrupt exception. This patch adds support for
specifying this syndrome.

With the RAS Extensions we cannot set an all-zero syndrome
value for SError, because that means 'RAS error: Uncategorized'
instead of 'no valid ISS'. So set it to an IMPLEMENTATION
DEFINED syndrome by default.

We also need to allow userspace to specify a valid syndrome
value, because in some cases the recovery is driven by
userspace. This patch supports that.

In the guest/host world switch, restore this value to VSESR_EL2
only when HCR_EL2.VSE is set. This value does not need to be
saved because it is stale by the time the guest exits.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 
[Set an impdef ESR for Virtual-SError]
Signed-off-by: James Morse 
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/include/asm/sysreg.h  |  3 +++
 arch/arm64/kvm/guest.c   | 11 ++-
 arch/arm64/kvm/hyp/switch.c  | 16 
 arch/arm64/kvm/inject_fault.c| 13 -
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 555b28b..73c84d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
return vcpu->arch.fault.esr_el2;
 }
 
+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+   vcpu->arch.fault.vsesr_el2 = val;
+}
+
 static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 769cc58..53d1d81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_fault_info {
	u32 esr_el2;		/* Hyp Syndrom Register */
	u64 far_el2;		/* Hyp Fault Address Register */
u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
+   u32 vsesr_el2;  /* Virtual SError Exception Syndrome Register */
 };
 
 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 47b967d..3b035cc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,6 +86,9 @@
 #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
 #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)
 
+/* virtual SError exception syndrome register */
+#define REG_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
+
 #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM |
\
  (!!x)<<8 | 0x1f)
 #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM |
\
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 738ae90..ffad42b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
 {
-   return -EINVAL;
+   u64 reg = *syndrome;
+
+   /* inject virtual system Error or asynchronous abort */
+   kvm_inject_vabt(vcpu);
+
+   if (reg)
+   /* set vsesr_el2[24:0] with value that user space specified */
+   kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
+
+   return 0;
 }
 
 int __attribute_const__ kvm_target_cpu(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c6f17c7..06a71d2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,14 @@ static hyp_alternate_select(__activate_traps_arch,
__activate_traps_nvhe, __activate_traps_vhe,
ARM64_HAS_VIRT_HOST_EXTN);
 
+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu, u64 value)
+{
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+  (value & HCR_VSE))
+   write_sysreg_s(kvm_vcpu_get_vsesr(vcpu), REG_VSESR_EL2);
+}
+
+
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
u64 val;
@@ -86,6 +94,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
isb();
}
write_sysreg(val, hcr_el2);
+
+   /*
+* If the virtual SError interrupt is taken to EL1 using AArch64,
+* then VSESR_EL2 provides the syndrome value reported in ISS field
+* of ESR_EL1.
+*/
+   __sysreg_set_vsesr(vcpu, val);
+

[PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2017-11-10 Thread Dongjiu Geng
The ARM64 RAS SError Interrupt (SEI) syndrome value is specific to the
guest, and user space needs a way to tell KVM this value. So we add a
new ioctl. Before user space specifies the Exception Syndrome Register
(ESR), it first checks whether KVM has the capability to set the guest
ESR; if so, it sets it. Otherwise, there is nothing to do.

Specifying the ESR is only supported for AArch64, not for AArch32.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 

change the name to KVM_CAP_ARM_INJECT_SERROR_ESR instead of
X_ARM_RAS_EXTENSION, suggested here

https://patchwork.kernel.org/patch/9925203/
---
 Documentation/virtual/kvm/api.txt | 11 +++
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/guest.c  |  9 +
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/guest.c|  5 +
 arch/arm64/kvm/reset.c|  3 +++
 include/uapi/linux/kvm.h  |  3 +++
 virt/kvm/arm/arm.c|  7 +++
 8 files changed, 40 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e63a35f..6dfb9fc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4347,3 +4347,14 @@ This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr.  Its
 value is used to denote the target vcpu for a SynIC interrupt.  For
 compatibilty, KVM initializes this msr to KVM's internal vcpu index.  When this
 capability is absent, userspace can still query this msr's value.
+
+8.13 KVM_CAP_ARM_SET_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify syndrome value reported to
+guest OS when guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can only specify the ISS field of the ESR
+syndrome; it cannot specify the EC field, which is not under KVM's control.
+If this virtual SError is taken to EL1 using AArch64, this value will be
+reported in the ISS field of ESR_EL1.
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 4a879f6..6cf5c7b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -211,6 +211,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 1e0784e..1e15fa2 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -248,6 +248,15 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+/*
+ * We only support specifying the guest SError syndrome
+ * on ARM64, not on ARM32.
+ */
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+   return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e923b58..769cc58 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -317,6 +317,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..738ae90 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+   return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PMU_V3:
r = kvm_arm_support_pmu_v3();
break;
+   case KVM_CAP_ARM_INJECT_SERROR_ESR:
+   r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+   break;
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7e9..0c861c4 100644
--- a/include/uapi/linux/kvm.h
+++ 

[PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2017-11-10 Thread Dongjiu Geng
If it is not a RAS SError, directly inject a virtual SError,
which keeps the old behaviour. If it is a RAS SError, first
let the host ACPI module handle it. During the ACPI handling,
if the error address is invalid, the APEI driver cannot
identify the address to hwpoison memory and cannot notify the
guest to do the recovery. To be safe, KVM continues
categorizing errors and handles them separately.

If the RAS error is not propagated, let host user space handle
it. The reason is that sometimes we can only kill the affected
guest application instead of panicking the guest OS. Host user
space specifies a valid ESR and injects a virtual SError; the
guest can then just kill the current application if the
non-consumed error came from a guest application.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 
---
 arch/arm64/include/asm/esr.h | 15 
 arch/arm64/include/asm/kvm_asm.h |  3 ++
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kvm/handle_exit.c | 67 +---
 arch/arm64/mm/fault.c| 16 +
 5 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 66ed8b6..aca7eee 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -102,6 +102,7 @@
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT  (0x04)
 #define ESR_ELx_FSC_PERM   (0x0C)
+#define ESR_ELx_FSC_SERROR (0x11)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT  (24)
@@ -119,6 +120,20 @@
 #define ESR_ELx_CM_SHIFT   (8)
 #define ESR_ELx_CM (UL(1) << ESR_ELx_CM_SHIFT)
 
+/* ISS field definitions for SError interrupt */
+#define ESR_ELx_AET_SHIFT  (10)
+#define ESR_ELx_AET		(UL(0x7) << ESR_ELx_AET_SHIFT)
+/* Uncontainable error */
+#define ESR_ELx_AET_UC (UL(0) << ESR_ELx_AET_SHIFT)
+/* Unrecoverable error */
+#define ESR_ELx_AET_UEU(UL(1) << ESR_ELx_AET_SHIFT)
+/* Restartable error */
+#define ESR_ELx_AET_UEO(UL(2) << ESR_ELx_AET_SHIFT)
+/* Recoverable error */
+#define ESR_ELx_AET_UER(UL(3) << ESR_ELx_AET_SHIFT)
+/* Corrected */
+#define ESR_ELx_AET_CE (UL(6) << ESR_ELx_AET_SHIFT)
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 26a64d0..884f723 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -27,6 +27,9 @@
 #define ARM_EXCEPTION_IRQ	  0
 #define ARM_EXCEPTION_EL1_SERROR  1
 #define ARM_EXCEPTION_TRAP   2
+/* Error code for SError Interrupt (SEI) exception */
+#define KVM_SEI_SEV_RECOVERABLE  1
+
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE   HVC_STUB_ERR
 
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 07aa8e3..9ee13ad 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -57,6 +57,7 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
 })
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr);
+int handle_guest_sei(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 7debb74..1afdc87 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
return arm_exit_handlers[hsr_ec];
 }
 
+/**
+ * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
+ * @vcpu:  the VCPU pointer
+ *
+ * For a RAS SError interrupt, first let the host kernel handle it.
+ * If the AET is [ESR_ELx_AET_UER], then let user space handle it.
+ */
+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+   bool impdef_syndrome =  esr & ESR_ELx_ISV;  /* aka IDS */
+   unsigned int aet = esr & ESR_ELx_AET;
+
+   /*
+    * Without the RAS extension, this cannot be a RAS SError
+    */
+   if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+   kvm_inject_vabt(vcpu);
+   return 1;
+   }
+
+   /* The host kernel may handle this abort. */
+   handle_guest_sei();
+
+   /*
+* In below two conditions, it will directly inject the
+* virtual SError:
+* 1. The Syndrome is IMPLEMENTATION DEFINED
+* 2. It is Uncategorized SEI
+*/
+   if (impdef_syndrome ||
+   ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
+   kvm_inject_vabt(vcpu);
+   return 1;
+   }

Re: [PATCH v9 44/51] selftest/vm: powerpc implementation for generic abstraction

2017-11-10 Thread Breno Leitao
Hi Ram,

On Thu, Nov 09, 2017 at 03:37:46PM -0800, Ram Pai wrote:
> On Thu, Nov 09, 2017 at 04:47:15PM -0200, Breno Leitao wrote:
> > On Mon, Nov 06, 2017 at 12:57:36AM -0800, Ram Pai wrote:
> > > @@ -206,12 +209,14 @@ void signal_handler(int signum, siginfo_t *si, void 
> > > *vucontext)
> > >  
> > >   trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
> > >   ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
> > > - fpregset = uctxt->uc_mcontext.fpregs;
> > > - fpregs = (void *)fpregset;
> > 
> > Since you removed all references for fpregset now, you probably want to
> > remove the declaration of the variable above.
> 
> fpregs is still needed.

Right, fpregs is still needed, but not fpregset. Every reference to this
variable was removed by your patch.

Grepping this variable identifier on a tree with your patches, I see:

 $ grep fpregset protection_keys.c 
 fpregset_t fpregset;


Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag

2017-11-10 Thread Ulf Hansson
On 8 November 2017 at 14:25, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> code that it is desirable to leave the device in runtime suspend
> after system-wide transitions to the working state (for example,
> the device may be slow to resume and it may be better to avoid
> resuming it right away).
>
> Generally, the middle-layer code involved in the handling of the
> device is expected to indicate to the PM core whether or not the
> device may be left in suspend with the help of the device's
> power.may_skip_resume status bit.  That has to happen in the "noirq"
> phase of the preceding system suspend (or analogous) transition.
> The middle layer is then responsible for handling the device as
> appropriate in its "noirq" resume callback which is executed
> regardless of whether or not the device may be left suspended, but
> the other resume callbacks (except for ->complete) will be skipped
> automatically by the core if the device really can be left in
> suspend.

I don't understand the reason to why you need to skip invoking resume
callbacks to achieve this behavior, could you elaborate on that?

Couldn't the PM domain or the middle-layer instead decide what to do?
To me it sounds a bit prone to errors by skipping callbacks from the
PM core, and I wonder if the general driver author will be able to
understand how to use this flag properly.

That said, as the series doesn't include any changes for drivers making
use of the flag, could you please fold in such a change, as it would
provide a more complete picture?

>
> The additional power.must_resume status bit introduced for the
> implementation of this mechanisn is used internally by the PM core
> to track the requirement to resume the device (which may depend on
> its children etc).

Yeah, clearly the PM core needs to be involved, because of the need of
dealing with parent/child relations, however as kind of indicate
above, couldn't the PM core just set some flag/status bits, which
instructs the middle-layer and PM domain on what to do? That sounds
like an easier approach.

>
> Signed-off-by: Rafael J. Wysocki 
> Acked-by: Greg Kroah-Hartman 
> ---
>  Documentation/driver-api/pm/devices.rst |   24 ++-
>  drivers/base/power/main.c   |   65 
> +---
>  include/linux/pm.h  |   14 +-
>  3 files changed, 93 insertions(+), 10 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>   * necessary from the driver's perspective.  It also may cause them to skip
>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>   * the driver if they decide to leave the device in runtime suspend.
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> + * driver prefers the device to be left in runtime suspend after system 
> resume.
>   */
> -#define DPM_FLAG_NEVER_SKIP      BIT(0)
> -#define DPM_FLAG_SMART_PREPARE   BIT(1)
> -#define DPM_FLAG_SMART_SUSPEND   BIT(2)
> +#define DPM_FLAG_NEVER_SKIP      BIT(0)
> +#define DPM_FLAG_SMART_PREPARE   BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND   BIT(2)
> +#define DPM_FLAG_LEAVE_SUSPENDED BIT(3)
>
>  struct dev_pm_info {
> pm_message_t    power_state;
> @@ -597,6 +602,8 @@ struct dev_pm_info {
> bool            wakeup_path:1;
> bool            syscore:1;
> bool            no_pm_callbacks:1;      /* Owned by the PM core */
> +   unsigned int    must_resume:1;      /* Owned by the PM core */
> +   unsigned int    may_skip_resume:1;  /* Set by subsystems */
>  #else
> unsigned intshould_wakeup:1;
>  #endif
> @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
>  extern int pm_generic_poweroff(struct device *dev);
>  extern void pm_generic_complete(struct device *dev);
>
> +extern bool dev_pm_may_skip_resume(struct device *dev);
>  extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
>
>  #else /* 

Re: [v4,3/6] pmbus: core: Add fan control support

2017-11-10 Thread Guenter Roeck

On 11/09/2017 07:03 PM, Andrew Jeffery wrote:

On Sun, 2017-11-05 at 06:39 -0800, Guenter Roeck wrote:

On Fri, Nov 03, 2017 at 03:53:03PM +1100, Andrew Jeffery wrote:

Expose fanX_target, pwmX and pwmX_enable hwmon sysfs attributes.

Fans in a PMBus device are driven by the configuration of two registers:
FAN_CONFIG_x_y and FAN_COMMAND_x: FAN_CONFIG_x_y dictates how the fan
and the tacho operate (if installed), while FAN_COMMAND_x sets the
desired fan rate. The unit of FAN_COMMAND_x is dependent on the
operational fan mode, RPM or PWM percent duty, as determined by the
corresponding FAN_CONFIG_x_y.

The mapping of fanX_target, pwmX and pwmX_enable onto FAN_CONFIG_x_y and
FAN_COMMAND_x is implemented with the addition of virtual registers and
generic implementations in the core:

1. PMBUS_VIRT_FAN_TARGET_x
2. PMBUS_VIRT_PWM_x
3. PMBUS_VIRT_PWM_ENABLE_x

The virtual registers facilitate the necessary side-effects of each
access. Examples of the read case, assuming m = 1, b = 0, R = 0:

  Read        |      With      |               || Gives
  ------------+----------------+---------------++------
  Attribute   | FAN_CONFIG_x_y | FAN_COMMAND_y || Value
  ------------+----------------+---------------++------
  fanX_target | ~PB_FAN_z_RPM  | 0x0001        || 1
  pwm1        | ~PB_FAN_z_RPM  | 0x0064        || 255
  pwmX_enable | ~PB_FAN_z_RPM  | 0x0001        || 1
  fanX_target |  PB_FAN_z_RPM  | 0x0001        || 1
  pwm1        |  PB_FAN_z_RPM  | 0x0064        || 0
  pwmX_enable |  PB_FAN_z_RPM  | 0x0001        || 1

And the write case:

  Write       |       ||            Sets
  ------------+-------++----------------+---------------
  Attribute   | Value || FAN_CONFIG_x_y | FAN_COMMAND_x
  ------------+-------++----------------+---------------
  fanX_target | 1     ||  PB_FAN_z_RPM  | 0x0001
  pwmX        | 255   || ~PB_FAN_z_RPM  | 0x0064
  pwmX_enable | 1     || ~PB_FAN_z_RPM  | 0x0064

Also, the DIRECT mode scaling of some controllers is different between
RPM and PWM percent duty control modes, so PSC_PWM is introduced to
capture the necessary coefficients.

Signed-off-by: Andrew Jeffery 
---
  drivers/hwmon/pmbus/pmbus.h  |  29 +
  drivers/hwmon/pmbus/pmbus_core.c | 224 ---
  2 files changed, 238 insertions(+), 15 deletions(-)

diff --git a/drivers/hwmon/pmbus/pmbus.h b/drivers/hwmon/pmbus/pmbus.h
index 4efa2bd4f6d8..cdf3e288e626 100644
--- a/drivers/hwmon/pmbus/pmbus.h
+++ b/drivers/hwmon/pmbus/pmbus.h
@@ -190,6 +190,28 @@ enum pmbus_regs {
PMBUS_VIRT_VMON_UV_FAULT_LIMIT,
PMBUS_VIRT_VMON_OV_FAULT_LIMIT,
PMBUS_VIRT_STATUS_VMON,
+
+   /*
+* RPM and PWM Fan control
+*
+* Drivers wanting to expose PWM control must define the behaviour of
+* PMBUS_VIRT_PWM_ENABLE_[1-4] in the {read,write}_word_data callback.
+*
+* pmbus core provides default implementations for
+* PMBUS_VIRT_FAN_TARGET_[1-4] and PMBUS_VIRT_PWM_[1-4].
+*/
+   PMBUS_VIRT_FAN_TARGET_1,
+   PMBUS_VIRT_FAN_TARGET_2,
+   PMBUS_VIRT_FAN_TARGET_3,
+   PMBUS_VIRT_FAN_TARGET_4,
+   PMBUS_VIRT_PWM_1,
+   PMBUS_VIRT_PWM_2,
+   PMBUS_VIRT_PWM_3,
+   PMBUS_VIRT_PWM_4,
+   PMBUS_VIRT_PWM_ENABLE_1,
+   PMBUS_VIRT_PWM_ENABLE_2,
+   PMBUS_VIRT_PWM_ENABLE_3,
+   PMBUS_VIRT_PWM_ENABLE_4,
  };
  
  /*

@@ -223,6 +245,8 @@ enum pmbus_regs {
  #define PB_FAN_1_RPM  BIT(6)
  #define PB_FAN_1_INSTALLEDBIT(7)
  
+enum pmbus_fan_mode { percent = 0, rpm };

+
  /*
   * STATUS_BYTE, STATUS_WORD (lower)
   */
@@ -313,6 +337,7 @@ enum pmbus_sensor_classes {
PSC_POWER,
PSC_TEMPERATURE,
PSC_FAN,
+   PSC_PWM,
PSC_NUM_CLASSES /* Number of power sensor classes */
  };
  
@@ -339,6 +364,8 @@ enum pmbus_sensor_classes {

  #define PMBUS_HAVE_STATUS_FAN34   BIT(17)
  #define PMBUS_HAVE_VMON   BIT(18)
  #define PMBUS_HAVE_STATUS_VMONBIT(19)
+#define PMBUS_HAVE_PWM12   BIT(20)
+#define PMBUS_HAVE_PWM34   BIT(21)
  
  enum pmbus_data_format { linear = 0, direct, vid };

  enum vrm_version { vr11 = 0, vr12, vr13 };
@@ -413,6 +440,8 @@ int pmbus_write_byte_data(struct i2c_client *client, int 
page, u8 reg,
  u8 value);
  int pmbus_update_byte_data(struct i2c_client *client, int page, u8 reg,
   u8 mask, u8 value);
+int pmbus_update_fan(struct i2c_client *client, int page, int id,
+u8 config, u8 mask, u16 command);
  void pmbus_clear_faults(struct i2c_client *client);
  bool pmbus_check_byte_register(struct i2c_client *client, int page, int reg);
  bool pmbus_check_word_register(struct i2c_client *client, int page, int reg);
diff --git 

Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

2017-11-10 Thread Paolo Bonzini
On 10/11/2017 07:07, Wanpeng Li wrote:
>>> You should also add a cpuid flag in kvm part.
>> It is better without that.  The flag has no dependency on KVM (kernel
>> hypervisor) code.
> Do you mean -cpu host,+? I think it will result in "warning: host
> doesn't support requested feature: CPUID.4001H:eax.XX"

There are some exceptions where QEMU overrides the values of
KVM_GET_SUPPORTED_CPUID.

I think it is better to add the flag to KVM *and* to QEMU's override in
kvm_arch_get_supported_cpuid (target/i386/kvm.c).

Paolo


Re: [v4,4/6] pmbus: max31785: Add fan control

2017-11-10 Thread Guenter Roeck

On 11/09/2017 07:10 PM, Andrew Jeffery wrote:

On Sun, 2017-11-05 at 07:04 -0800, Guenter Roeck wrote:

On Fri, Nov 03, 2017 at 03:53:04PM +1100, Andrew Jeffery wrote:

The implementation makes use of the new fan control virtual registers
exposed by the pmbus core. It mixes use of the default implementations
with some overrides via the read/write handlers to handle FAN_COMMAND_1
on the MAX31785, whose definition breaks the value range into various
control bands dependent on RPM or PWM mode.

Signed-off-by: Andrew Jeffery 
---
  Documentation/hwmon/max31785   |   4 ++
  drivers/hwmon/pmbus/max31785.c | 104 -
  2 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/Documentation/hwmon/max31785 b/Documentation/hwmon/max31785
index 45fb6093dec2..e9edbf11948f 100644
--- a/Documentation/hwmon/max31785
+++ b/Documentation/hwmon/max31785
@@ -32,6 +32,7 @@ Sysfs attributes
  fan[1-4]_alarm      Fan alarm.
  fan[1-4]_fault      Fan fault.
  fan[1-4]_input      Fan RPM.
 +fan[1-4]_target     Fan input target
  
  in[1-6]_crit		Critical maximum output voltage

  in[1-6]_crit_alarm  Output voltage critical high alarm
@@ -44,6 +45,9 @@ in[1-6]_max_alarm Output voltage high alarm
  in[1-6]_min   Minimum output voltage
  in[1-6]_min_alarm Output voltage low alarm
  
+pwm[1-4]		Fan target duty cycle (0..255)

 +pwm[1-4]_enable     0: full-speed, 1: manual control, 2: automatic
+
  temp[1-11]_crit   Critical high temperature
  temp[1-11]_crit_alarm Chip temperature critical high alarm
  temp[1-11]_input  Measured temperature
diff --git a/drivers/hwmon/pmbus/max31785.c b/drivers/hwmon/pmbus/max31785.c
index 9313849d5160..0d97ddf67079 100644
--- a/drivers/hwmon/pmbus/max31785.c
+++ b/drivers/hwmon/pmbus/max31785.c
@@ -20,8 +20,102 @@ enum max31785_regs {
  
  #define MAX31785_NR_PAGES		23
  
+static int max31785_get_pwm(struct i2c_client *client, int page)

+{
+   int config;
+   int command;
+
+   config = pmbus_read_byte_data(client, page, PMBUS_FAN_CONFIG_12);
+   if (config < 0)
+   return config;
+
+   command = pmbus_read_word_data(client, page, PMBUS_FAN_COMMAND_1);
+   if (command < 0)
+   return command;
+
+   if (!(config & PB_FAN_1_RPM)) {
+   if (command >= 0x8000)
+   return 0;
+   else if (command >= 0x2711)
+   return 0x2710;
+
+   return command;
+   }
+
+   return 0;
+}
+
+static int max31785_get_pwm_mode(struct i2c_client *client, int page)
+{
+   int config;
+   int command;
+
+   config = pmbus_read_byte_data(client, page, PMBUS_FAN_CONFIG_12);
+   if (config < 0)
+   return config;
+
+   command = pmbus_read_word_data(client, page, PMBUS_FAN_COMMAND_1);
+   if (command < 0)
+   return command;
+
+   if (!(config & PB_FAN_1_RPM)) {
+   if (command >= 0x8000)
+   return 2;
+   else if (command >= 0x2711)
+   return 0;
+
+   return 1;
+   }
+
+   return (command >= 0x8000) ? 2 : 1;
+}
+
+static int max31785_read_word_data(struct i2c_client *client, int page,
+  int reg)
+{
+   int rv;
+
+   switch (reg) {
+   case PMBUS_VIRT_PWM_1:
+   rv = max31785_get_pwm(client, page);
+   if (rv < 0)
+   return rv;
+
+   rv *= 255;
+   rv /= 100;
+   break;
+   case PMBUS_VIRT_PWM_ENABLE_1:
+   rv = max31785_get_pwm_mode(client, page);
+   break;


I do wonder ... does it even make sense to specify generic code
for the new virtual attributes in the pmbus core code, or would
it be better to have it all in this driver, at least for now ?


I think I'll pull the generic implementations out in light of my
response on 3/6. At best the generic implementation for the PWM virtual
regs is a guess. We can always put it back if others come to need it.



SGTM.

Thanks,
Guenter


Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

2017-11-10 Thread Wanpeng Li
2017-11-10 15:59 GMT+08:00 Peter Zijlstra :
> On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:
>
>> >> Also, you should not put cpumask_t on stack, that's 'broken'.
>>
>> Thanks for pointing this out. I found a useful comment in arch/x86/kernel/irq.c:
>>
>> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>>  * below, which is protected by stop_machine().  Putting them on the stack
>>  * results in a stack frame overflow.  Dynamically allocating could result in a
>>  * failure so declare these two cpumasks as global.
>>  */
>> static struct cpumask affinity_new, online_new;
>
> That code no longer exists. Also, I'm not entirely sure how it would be
> helpful.
>
> What you probably want to do is have a per-cpu cpumask, since
> flush_tlb_others() is called with preemption disabled. But you probably
> don't want an unconditionally allocated one, since most kernels will not
> in fact be PV.
>
> So you'll want something like:
>
> static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
>
> And then you need something like:
>
> for_each_possible_cpu(cpu) {
> zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
> GFP_KERNEL, cpu_to_node(cpu));
> }
>
> before you set the pv-op or so.

Thanks Peterz. :) I will give it a try.

Regards,
Wanpeng Li