Re: [RFC PATCH] perf/core: don't sample kernel regs upon skid

2018-07-10 Thread Mark Rutland
On Mon, Jul 09, 2018 at 06:42:29PM -0400, Boris Ostrovsky wrote:
> On 07/02/2018 12:02 PM, Mark Rutland wrote:
> > On Mon, Jul 02, 2018 at 05:46:55PM +0200, Peter Zijlstra wrote:
> >> On Mon, Jul 02, 2018 at 04:12:50PM +0100, Mark Rutland wrote:
> >>> +static struct pt_regs *perf_get_sample_regs(struct perf_event *event,
> >>> + struct pt_regs *regs)
> >>> +{
> >>> + /*
> >>> +  * Due to interrupt latency (AKA "skid"), we may enter the kernel
> >>> +  * before taking an overflow, even if the PMU is only counting user
> >>> +  * events.
> >>> +  *
> >>> +  * If we're not counting kernel events, always use the user regs when
> >>> +  * sampling.
> >>> +  *
> >>> +  * TODO: what do we do about sampling a guest's registers? The IP is
> >>> +  * special-cased, but for the rest of the regs they'll get the
> >>> +  * user/kernel regs depending on whether exclude_kernel is set, which
> >>> +  * is nonsensical.
> >>> +  *
> >>> +  * We can't get at the full set of regs in all cases (e.g. Xen's PV PMU
> >>> +  * can't provide the GPRs), so should we just zero the GPRs when in a
> >>> +  * guest? Or skip outputting the regs in perf_output_sample?
> >> Seems daft Xen cannot provide registers; why is that? Boris?
> > The xen_pmu_regs structure simply doesn't have them, so I assume there's
> > no API to get them.
> >
> > Given we don't currently sample the guest regs, I'd be tempted to just
> > zero them for now, or skip the sample at output time (if that doesn't
> > break some other case).
> 
> (Was out on vacation, couldn't respond earlier)
> 
> Yes, PV guests only get a limited set of registers passed to the handler
> by the hypervisor. GPRs are not part of this set.

Is that also true for Dom0?

> I think we need to do
> 
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index 7d00d4a..95997e6 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -478,7 +478,7 @@ static void xen_convert_regs(const struct xen_pmu_regs *xen_regs,
>  irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
>  {
>     int err, ret = IRQ_NONE;
> -   struct pt_regs regs;
> +   struct pt_regs regs = {0};
>     const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
>     uint8_t xenpmu_flags = get_xenpmu_flags();
> 
> 
> Do you want me to submit a separate patch or can you make this part of
> yours?

I think this is going to become a series rather than a single patch, but I can
have a go. I need to get my head around how the various cases interact with
each other.

Thanks,
Mark.
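
Boris's one-line change relies on the C rule that "= {0}" zero-initialises every member that has no explicit initialiser, so registers the hypervisor never supplies read as zero instead of stack garbage. A minimal userspace sketch of that effect follows. The mock_* structures and functions are hypothetical, cut-down stand-ins and not the kernel's real xen_pmu_regs, pt_regs or xen_convert_regs():

```c
#include <stdint.h>

/* Hypothetical stand-in: the limited register set passed to the handler. */
struct mock_xen_pmu_regs {
	uint64_t ip, sp, flags;
};

/* Hypothetical stand-in for pt_regs: a few GPRs plus the fields above. */
struct mock_pt_regs {
	uint64_t ax, bx, cx, dx;   /* GPRs: not provided in this model */
	uint64_t ip, sp, flags;
};

/* Copies only the fields that exist in the source structure. */
static void mock_convert_regs(const struct mock_xen_pmu_regs *xen,
			      struct mock_pt_regs *regs)
{
	regs->ip = xen->ip;
	regs->sp = xen->sp;
	regs->flags = xen->flags;
}

static struct mock_pt_regs mock_irq_handler(const struct mock_xen_pmu_regs *xen)
{
	/* "= {0}" zeroes every member, including the GPRs never written below. */
	struct mock_pt_regs regs = {0};

	mock_convert_regs(xen, &regs);
	return regs;
}
```

Without the initialiser, the four GPR members would hold indeterminate values that a later sample could copy out.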


Re: [PATCH] tracing/irqtrace: only call trace_hardirqs_on/off when state changes

2018-07-10 Thread Joel Fernandes
On Tue, Jul 10, 2018 at 09:44:57PM -0400, Steven Rostedt wrote:
> On Wed, 2 May 2018 10:12:14 +1000
> Nicholas Piggin  wrote:
> 
> > > I have mixed feelings about this patch. I am OK with it, but I
> > > > suggest it's sent with the follow-up patch that shows its use.
> > > > I would also appreciate it if such a follow-up patch were rebased onto
> > > > the IRQ tracepoint work: https://patchwork.kernel.org/patch/10373129/
> > > 
> > > What do you think?  
> > 
> > I'll try to dig it up and resend. Thanks for the feedback on it.
> 
> Joel,
> 
> With your latest patches, is this obsolete?

Yes, that's right. This patch isn't needed for what I was doing (improving
the performance of the irq disable/enable path). I didn't see any performance
improvement with this patch. Instead, the patches in my series use SRCU to
improve the performance of the tracepoints.

thanks,

- Joel


[RFC PATCH] mm, page_alloc: double zone's batchsize

2018-07-10 Thread Aaron Lu
To improve the page allocator's performance for order-0 pages, each CPU has
a Per-CPU-Pageset (PCP) per zone. Whenever an order-0 page is needed,
the PCP is checked first before asking Buddy for pages. When the PCP is
used up, a batch of pages is fetched from Buddy, and the size of that
batch can affect performance.
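
The refill behaviour described above can be sketched as a small userspace model that tracks page counts only. All mock_* names are hypothetical and the model ignores pcp->high, freeing, and migrate types; it only illustrates why a larger batch means fewer trips to Buddy (and fewer zone lock acquisitions) for the same number of allocations:

```c
/* Hypothetical model that tracks page counts only; no real pages are handled. */
struct mock_zone {
	unsigned long free_pages;  /* pages still held by Buddy */
	unsigned long lock_taken;  /* how often the zone lock would be taken */
};

struct mock_pcp {
	unsigned long count;  /* pages currently cached on this CPU */
	unsigned long batch;  /* pages fetched from Buddy per refill */
};

/* Allocate one order-0 page; returns 1 on success, 0 when the zone is empty. */
static int mock_alloc_order0(struct mock_zone *zone, struct mock_pcp *pcp)
{
	if (pcp->count == 0) {
		unsigned long n = pcp->batch < zone->free_pages ?
				  pcp->batch : zone->free_pages;

		if (n == 0)
			return 0;
		zone->lock_taken++;      /* one lock round trip per batch */
		zone->free_pages -= n;
		pcp->count = n;
	}
	pcp->count--;
	return 1;
}

/* Count the zone lock acquisitions needed for nr_allocs allocations. */
static unsigned long mock_count_refills(unsigned long batch,
					unsigned long nr_allocs)
{
	struct mock_zone zone = { .free_pages = 1UL << 20, .lock_taken = 0 };
	struct mock_pcp pcp = { .count = 0, .batch = batch };
	unsigned long i;

	for (i = 0; i < nr_allocs; i++)
		if (!mock_alloc_order0(&zone, &pcp))
			break;
	return zone.lock_taken;
}
```

In this model, 630 allocations need 21 refills with a batch of 31 and 10 refills with a batch of 63.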

The zone's batch size was last doubled by commit ba56e91c9401 ("mm:
page_alloc: increase size of per-cpu-pages") over ten years ago. Since
then, CPUs have evolved a lot and their cache sizes have also increased.

Dave Hansen is concerned that the current batch size doesn't fit well with
modern hardware and suggested that I do two things: first, use a page
allocator intensive benchmark, e.g. will-it-scale/page_fault1, to find
out how performance changes with different batch sizes on various
machines and then choose a new default batch size; second, see how
this new batch size works with other workloads.

From the first test, we saw performance gains on high-core-count systems
and little to no effect on older systems with more modest core counts.
From this phase's test data, two candidates, 63 and 127, were chosen.

In the second step, ebizzy, oltp, kbuild, pigz, netperf, vm-scalability
and more will-it-scale sub-tests were run to see how these two
candidates work with these workloads and to decide on a new default
according to their results.

Most test results are flat. will-it-scale/page_fault2 process mode has a
10%-18% performance increase on 4-socket Skylake and Broadwell.
vm-scalability/lru-file-mmap-read has a 17%-47% performance increase on
4-socket servers, while on 2-socket servers it caused a 3%-8%
performance drop. Further analysis showed that, with a larger pcp->batch
and thus a larger pcp->high (the relationship pcp->high = 6 * pcp->batch
is maintained in this patch), zone lock contention shifted to LRU add
side lock contention, and that caused the performance drop. This
drop might be mitigated by others' work on optimizing the LRU lock.
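
As a worked example of the pcp->high = 6 * pcp->batch relationship, the sketch below computes pcp->high for the batch sizes discussed here. The helper names are hypothetical, and the 4 KiB page size used for the memory figure is an assumption that is not stated in this thread:

```c
/* Ratio stated in the commit message: pcp->high = 6 * pcp->batch. */
#define MOCK_HIGH_TO_BATCH_RATIO 6u

/* Assumption: 4 KiB base pages (not stated in the thread). */
#define MOCK_PAGE_SIZE_KIB 4u

/* Upper bound on pages cached per CPU per zone for a given batch size. */
static unsigned int mock_pcp_high(unsigned int batch)
{
	return MOCK_HIGH_TO_BATCH_RATIO * batch;
}

/* The same bound expressed in KiB under the page-size assumption above. */
static unsigned int mock_pcp_high_kib(unsigned int batch)
{
	return mock_pcp_high(batch) * MOCK_PAGE_SIZE_KIB;
}
```

Under these assumptions a batch of 31 gives a high mark of 186 pages (744 KiB), 63 gives 378 pages (1512 KiB), and 127 gives 762 pages (3048 KiB) per CPU per zone.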

Another downside of increasing pcp->batch is that, when the PCP is used up
and a batch of pages has to be fetched from Buddy, the fetch can take
longer than before because the batch is larger. My understanding is that
this doesn't affect the slowpath, where direct reclaim and compaction
dominate. For the fastpath, throughput is a win (according to
will-it-scale/page_fault1), but worst-case latency can be larger now.

Overall, I think doubling the batch size from 31 to 63 is relatively
safe and provides a good performance boost for high-core-count systems.

The test results of the two phases are listed below (all tests were done
with THP disabled).

Phase one (will-it-scale/page_fault1) test results:

Skylake-EX: an increased batch size has a good effect on zone->lock
contention, though LRU contention rises at the same time and
limits the final performance increase.

batch      score    change   zone_contention   lru_contention   total_contention
   31   15345900    +0.00%               64%               8%                72%
   53   17903847   +16.67%               32%              38%                70%
   63   17992886   +17.25%               24%              45%                69%
   73   18022825   +17.44%               10%              61%                71%
  119   18023401   +17.45%                4%              66%                70%
  127   18029012   +17.48%                3%              66%                69%
  137   18036075   +17.53%                4%              66%                70%
  165   18035964   +17.53%                2%              67%                69%
  188   18101105   +17.95%                2%              67%                69%
  223   18130951   +18.15%                2%              67%                69%
  255   18118898   +18.07%                2%              67%                69%
  267   18101559   +17.96%                2%              67%                69%
  299   18160468   +18.34%                2%              68%                70%
  320   18139845   +18.21%                2%              67%                69%
  393   18160869   +18.34%                2%              68%                70%
  424   18170999   +18.41%                2%              68%                70%
  458   18144868   +18.24%                2%              68%                70%
  467   18142366   +18.22%                2%              68%                70%
  498   18154549   +18.30%                1%              68%                69%
  511   18134525   +18.17%                1%              69%                70%

Broadwell-EX: similar pattern to Skylake-EX.

batch      score    change   zone_contention   lru_contention   total_contention
   31   16703983    +0.00%               67%               7%                74%
   53   18195393    +8.93%               43%              28%                71%
   63       1825    +9.49%               38%              33%                71%
   73   18344329    +9.82%               35%              37%                72%
  119   18535529   +10.96%               24%              46%                70%
  127   18513596   +10.83%               23%              48%                71%
  137   18514327   +10.84%               23%              48%                71%
  165   18511840   +10.82%               22%              49%                71%
  188   18593478   +11.31%               17%              53%

Re: [PATCH] orangefs: Adding new return type vm_fault_t

2018-07-10 Thread Souptick Joarder
On Wed, Jul 11, 2018 at 1:13 AM, Mike Marshall  wrote:
> Hi...
>
> I applied this patch to 4.18.0-rc4. It applied cleanly and there's no xfstests
> regressions. Sorry if I held you up any...
>
> You can add: Tested-By: Mike Marshall 
>

Thanks Mike. Can we get this patch queued for the 4.19 merge window?

> -Mike
>
> On Fri, Jul 6, 2018 at 10:05 AM, Mike Marshall  wrote:
>> Souptick Joarder: Any comment for this patch?
>>
>> Thanks for sending it ...
>>
>> I have it in my stack, but I haven't studied it, or xfstested it yet, so
>> no useful comments yet...
>>
>> -Mike
>>
>>
>>
>> On Fri, Jul 6, 2018 at 2:44 AM, Souptick Joarder wrote:
>>> On Fri, Jun 29, 2018 at 12:12 AM, Souptick Joarder wrote:
>>>> Use new return type vm_fault_t for fault handler. For now,
>>>> this is just documenting that the function returns a VM_FAULT
>>>> value rather than an errno. Once all instances are converted,
>>>> vm_fault_t will become a distinct type.
>>>>
>>>> See the following
>>>> commit 1c8f422059ae ("mm: change return type to vm_fault_t")
>>>>
>>>> Fixed checkpatch.pl warning.
>>>>
>>>> Signed-off-by: Souptick Joarder
>>>> ---
>>>>  fs/orangefs/file.c | 19 ++-
>>>>  1 file changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
>>>> index db0b521..a5a2fe7 100644
>>>> --- a/fs/orangefs/file.c
>>>> +++ b/fs/orangefs/file.c
>>>> @@ -528,18 +528,19 @@ static long orangefs_ioctl(struct file *file, unsigned int cmd, unsigned long ar
>>>>      return ret;
>>>>  }
>>>>
>>>> -static int orangefs_fault(struct vm_fault *vmf)
>>>> +static vm_fault_t orangefs_fault(struct vm_fault *vmf)
>>>>  {
>>>>      struct file *file = vmf->vma->vm_file;
>>>> -    int rc;
>>>> -    rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
>>>> +    int ret;
>>>> +
>>>> +    ret = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
>>>>                  STATX_SIZE);
>>>> -    if (rc == -ESTALE)
>>>> -        rc = -EIO;
>>>> -    if (rc) {
>>>> -        gossip_err("%s: orangefs_inode_getattr failed, "
>>>> -            "rc:%d:.\n", __func__, rc);
>>>> -        return rc;
>>>> +    if (ret == -ESTALE)
>>>> +        ret = -EIO;
>>>> +    if (ret) {
>>>> +        gossip_err("%s: orangefs_inode_getattr failed, ret:%d:.\n",
>>>> +                __func__, ret);
>>>> +        return VM_FAULT_SIGBUS;
>>>>      }
>>>>      return filemap_fault(vmf);
>>>>  }
>>>> --
>>>> 1.9.1

>>>
>>> Any comment for this patch ?
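
The point of the conversion quoted above is that a fault handler reports a VM_FAULT code and never a negative errno. A minimal userspace sketch of that pattern follows. Every mock_* name and the numeric values are hypothetical illustrations, not the kernel's definitions of vm_fault_t or VM_FAULT_SIGBUS:

```c
#include <errno.h>

/* Hypothetical stand-ins; the values are illustrative only. */
typedef unsigned int mock_vm_fault_t;
#define MOCK_VM_FAULT_OK     0x0000u
#define MOCK_VM_FAULT_SIGBUS 0x0002u

/* Pretend attribute refresh: returns 0 or a negative errno. */
static int mock_inode_getattr(int fail)
{
	return fail ? -ESTALE : 0;
}

/* Pretend generic fault path: always succeeds in this model. */
static mock_vm_fault_t mock_filemap_fault(void)
{
	return MOCK_VM_FAULT_OK;
}

static mock_vm_fault_t mock_fault(int fail)
{
	int ret = mock_inode_getattr(fail);

	/* Translate any errno into a fault code instead of returning it. */
	if (ret)
		return MOCK_VM_FAULT_SIGBUS;
	return mock_filemap_fault();
}
```

Keeping the errno in a separate int and returning only fault codes is what lets the return type become distinct later without further changes to the handler.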


Re: [PATCH v3] cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC

2018-07-10 Thread George Cherian

Hi Prakash,


On 07/10/2018 09:19 PM, Prakash, Prashanth wrote:


On 7/9/2018 11:42 PM, George Cherian wrote:

Hi Prakash,


On 07/09/2018 10:12 PM, Prakash, Prashanth wrote:


Hi George,


On 7/9/2018 4:10 AM, George Cherian wrote:

Per Section 8.4.7.1.3 of ACPI 6.2, The platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.

OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:

delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).

Implement the above and hook this to the cpufreq->get method.

Signed-off-by: George Cherian 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index a9d3eec..61132e8 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -296,10 +296,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
return ret;
   }

+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
+  struct cppc_perf_fb_ctrs fb_ctrs_t0,
+  struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+ u64 delta_reference, delta_delivered;
+ u64 reference_perf, delivered_perf;
+
+ reference_perf = fb_ctrs_t0.reference_perf;
+
+ delta_reference = (u32)fb_ctrs_t1.reference -
+ (u32)fb_ctrs_t0.reference;
+ delta_delivered = (u32)fb_ctrs_t1.delivered -
+ (u32)fb_ctrs_t0.delivered;

Why (u32)? These registers can be 64bits and that's why cppc_perf_fb_ctrs
have 64b fields for reference and delivered counters.

Moreover, the integer math is incorrect. You can run into a scenario where
t1.ref/del < t0.ref/del,  thus setting a negative number to u64! The likelihood
of this is very high especially when you throw away the higher 32bits.


Because of binary representation, unsigned subtraction will work even if
t1.ref/del < t0.ref/del. So essentially, the code should look like
this,

static inline u64 get_delta(u64 t1, u64 t0)
{
 if (t1 > t0 || t0 > ~(u32)0)
 return t1 - t0;

 return (u32)t1 - (u32)t0;
}

As a further optimization, I used (u32) since that also works,
as long as the momentary delta at any point is not greater than 2 ^ 32.
I don't foresee any reason for any platform to increment the counters at
an interval greater than 2 ^ 32.
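
The get_delta() helper proposed above can be tried in userspace with fixed-width types. This is a sketch under the same single-overflow assumption discussed in this thread: it yields the right delta only if the counter wrapped at most once between the two reads:

```c
#include <stdint.h>

/*
 * Userspace rendering of the get_delta() helper from this thread, with
 * u64/u32 replaced by <stdint.h> types.
 */
static inline uint64_t get_delta(uint64_t t1, uint64_t t0)
{
	/* No wrap seen, or the counter is clearly wider than 32 bits. */
	if (t1 > t0 || t0 > UINT32_MAX)
		return t1 - t0;

	/* Treat it as a 32-bit counter that wrapped: subtract modulo 2^32. */
	return (uint32_t)t1 - (uint32_t)t0;
}
```

For example, a 32-bit counter read as 0xFFFFFFFB and then as 5 gives a delta of 10, and the same holds for a 64-bit counter read as 0xFFFFFFFFFFFFFFFB and then as 5, because unsigned subtraction is defined modulo the type width.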


We are NOT running within any critical section to make sure that there will be
no context switch between feedback counter reads. Thus the assumptions that
the delta always represents a very short momentary window of time and that
it is always less than 2^32 are not accurate.

The single overflow assumption about when the above integer math will
work is also not acceptable, especially when we throw away the higher order bits.
There is hardware out there that uses 64b counters and can overflow the lower 32b
in quite a short time. Since the spec (and some hardware) provides 64 bits,
we should use them to make our implementation more robust instead of throwing away
the higher order bits.

I think it's ok to use the above integer math, but please add a comment about
the single overflow assumption and don't throw away the higher 32 bits.


Okay,
I will spin a v4 with the get_delta changes.
Also note that the get_delta function doesn't throw away the higher 32
bits.



To keep things simple, do something like below:

if (t1.reference <= t0.reference || t1.delivered <= t0.delivered) {
   /* At least one of them should have overflowed */
   return desired_perf;
}
else {
   compute the delivered perf using the counters.
}


There is no need to do it like this, as the current approach has been tested
and found to work across counter overruns on our platform.



+
+ /* Check to avoid divide-by zero */
+ if (delta_reference || delta_delivered)
+ delivered_perf = (reference_perf * delta_delivered) /
+ delta_reference;
+ else
+ delivered_perf = cpu->perf_ctrls.desired_perf;
+
+ return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
+}
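
The guard in the function above tests "delta_reference || delta_delivered", so a zero delta_reference combined with a non-zero delta_delivered would still reach the division. A minimal sketch that guards on the divisor alone follows; mock_delivered_perf() is a hypothetical userspace helper and not the function from the patch:

```c
#include <stdint.h>

/*
 * delivered_perf = reference_perf * delta_delivered / delta_reference,
 * falling back to desired_perf when the reference counter did not move.
 */
static uint64_t mock_delivered_perf(uint64_t reference_perf,
				    uint64_t delta_reference,
				    uint64_t delta_delivered,
				    uint64_t desired_perf)
{
	/* Only the divisor decides whether the division is safe. */
	if (delta_reference == 0)
		return desired_perf;

	return (reference_perf * delta_delivered) / delta_reference;
}
```

With reference_perf = 100, a reference delta of 2000 and a delivered delta of 3000, the result is 150. A reference delta of 0 returns the desired_perf fallback regardless of the delivered delta.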
+
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+ struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+ struct cppc_cpudata *cpu = all_cpu_data[cpunum];
+ int ret;
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
+ if (ret)
+ return ret;
+
+ udelay(2); /* 2usec delay between sampling */
+
+ ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
+ if (ret)
+ return ret;
+
+

Re: [PATCH] clk: aspeed: Support HPLL strapping on ast2400

2018-07-10 Thread Joel Stanley
Hi Stephen,

On 7 July 2018 at 03:55, Stephen Boyd  wrote:
> Quoting Joel Stanley (2018-06-28 16:15:40)
>> The HPLL can be configured through a register (SCU24), however some
>> platforms chose to configure it through the strapping settings and do
>> not use the register. This was not noticed as the logic for bit 18 in
>> SCU24 was confused: set means programmed, but the driver read it as set
>> means strapped.
>>
>> This gives us the correct HPLL value on Palmetto systems, from which
>> most of the peripheral clocks are generated.
>>
>> Fixes: 5eda5d79e4be ("clk: Add clock driver for ASPEED BMC SoCs")
>> Cc: sta...@vger.kernel.org # v4.15
>> Reviewed-by: Cédric Le Goater 
>> Signed-off-by: Joel Stanley 
>> ---
>
> Do you want this merged for -rc5? It sounds like on some systems this is
> a problem, but I don't know if these systems are supposed to work yet or
> not, so priority of this fix is not easy for me to understand.
>

Sure, some more background:

We did not notice this until we attempted to use the clock for the mtd
driver. However, this clock is used for the kernel clocksource, so e.g.
"sleep 1" takes two seconds to complete. This affects all of the systems
I have access to.

I suggest we merge for 4.18, and keep the cc: stable so it can be
backported to the stable trees.

Cheers,

Joel
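
The bit-18 logic described in the quoted commit message can be sketched as below. The names are hypothetical and only the polarity comes from this thread (set means the HPLL was programmed through SCU24, clear means it comes from strapping); how the rate is then derived in either case is not shown:

```c
#include <stdint.h>

/* Bit 18 of SCU24, per the commit message: set means "programmed". */
#define MOCK_HPLL_PROGRAMMED (1u << 18)

enum mock_hpll_source {
	MOCK_HPLL_FROM_STRAP,     /* use the hardware strapping settings */
	MOCK_HPLL_FROM_REGISTER,  /* use the parameters held in SCU24 */
};

/* Decide where the HPLL configuration comes from for a raw SCU24 value. */
static enum mock_hpll_source mock_hpll_source(uint32_t scu24)
{
	return (scu24 & MOCK_HPLL_PROGRAMMED) ? MOCK_HPLL_FROM_REGISTER
					      : MOCK_HPLL_FROM_STRAP;
}
```

Reading the bit with the opposite polarity, as the commit message says the driver did, would select the register path on boards that rely on strapping.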


Re: [PATCH] perf/core: fix possible spectre-v1 write

2018-07-10 Thread Mark Rutland
On Tue, Jul 10, 2018 at 07:06:07PM +0100, Mark Rutland wrote:
> It's possible for userspace to control event_id. Sanitize event_id when
> using it as an array index, to inhibit the potential spectre-v1 write
> gadget.
> 
> This class of issue is also known as CVE-2018-3693, or "bounds check bypass
> store".
> 
> Found by smatch.
> 
> Signed-off-by: Mark Rutland 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> ---
>  kernel/events/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> For Arm CPUs, more details can be found in the Arm Cache Speculation
> Side-channels whitepaper, available from the Arm security updates site [1].
> 
> Mark.
> 
> [1] https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8f0434a9951a..eece719bd18e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8155,6 +8155,7 @@ struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
>  static void sw_perf_event_destroy(struct perf_event *event)
>  {
>   u64 event_id = event->attr.config;
> + event_id = array_index_nospec(event_id, PERF_COUNT_SW_MAX);

As the kbuild test robot has pointed out, I've failed to include
 for this to compile.

I'll spin a v2 with that added, and the result tested.

Thanks,
Mark.
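
For readers following along, here is a minimal userspace sketch of the
index-clamping idea behind array_index_nospec(). It is not the kernel
implementation: index_mask_nospec_model() and clamp_index_nospec_model() are
names made up for illustration, and the mask arithmetic assumes that
right-shifting a negative long is an arithmetic shift (as GCC and Clang do)
and that size does not exceed LONG_MAX. In the kernel, the real helper is
declared in <linux/nospec.h>.

```c
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Hypothetical model: returns all ones when index < size, 0 otherwise,
 * without a conditional branch on the comparison.
 * Assumes arithmetic right shift of negative values and size <= LONG_MAX.
 */
static unsigned long index_mask_nospec_model(unsigned long index,
					     unsigned long size)
{
	/*
	 * If index >= size, either index or (size - 1 - index) has its top
	 * bit set, so the inverted value is non-negative and shifts to 0.
	 */
	return ~(long)(index | (size - 1UL - index)) >> (MODEL_BITS_PER_LONG - 1);
}

/* Clamp an index to [0, size): in-range values pass through, others become 0. */
static unsigned long clamp_index_nospec_model(unsigned long index,
					      unsigned long size)
{
	return index & index_mask_nospec_model(index, size);
}
```

The point of the mask form is that an out-of-range index is forced to 0 by
data flow rather than by a branch that could be mispredicted, so a speculated
array access stays inside the array.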


Re: [PATCH] refcount: always allow checked forms

2018-07-10 Thread Mark Rutland
On Wed, Jul 04, 2018 at 10:46:41AM +0200, David Sterba wrote:
> On Tue, Jul 03, 2018 at 11:01:02AM +0100, Mark Rutland wrote:
> > In many cases, it would be useful to be able to use the full
> > sanity-checked refcount helpers regardless of CONFIG_REFCOUNT_FULL, as
> > this would help to avoid duplicate warnings where callers try to
> > sanity-check refcount manipulation.
> > 
> > This patch refactors things such that the full refcount helpers were
> > always built, as refcount_${op}_checked(), such that they can be used
> > regardless of CONFIG_REFCOUNT_FULL. This will allow code which *always*
> > wants a checked refcount to opt-in, avoiding the need to duplicate the
> > logic for warnings.
> > 
> > There should be no functional change as a result of this patch.
> > 
> > Signed-off-by: Mark Rutland 
> > Cc: Boqun Feng 
> > Cc: David Sterba 
> > Cc: Ingo Molnar 
> > Cc: Kees Cook 
> > Cc: Peter Zijlstra 
> > Cc: Peter Zijlstra 
> > Cc: Will Deacon 
> 
> I dare to give it my
> 
> Reviewed-by: David Sterba 

Cheers!

> as my POC implementations were crap and Mark's version is much better.

Please don't think that your implementations were bad; I just already had an
idea as to what this could look like.

> > ---
> >  include/linux/refcount.h | 27 +---
> >  lib/refcount.c   | 53 +++-
> >  2 files changed, 45 insertions(+), 35 deletions(-)
> > 
> > Dave pointed out that it would be useful to be able to opt-in to full checks
> > regardless of CONFIG_REFCOUNT_FULL, so that we can simplify callsites where we
> > always want checks. I've spotted a few of these in code which is still awaiting
> > conversion.
> 
> The motivation was code like
> 
>   WARN_ON(refcount_read());
>   if (refcount_dec_and_test()) { ... }
> 
> so the warning is redundant for REFCOUNT_FULL, but I'm going to use the
> _checked versions everywhere the performance of refcounts is not
> critical.

If you will have conversion patches, do you want to pick this up as the start
of a series?

Thanks,
Mark.
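
To illustrate the kind of call site David describes, below is a small
userspace model of a checked decrement, written with C11 atomics. The names
model_refcount, model_warnings and model_dec_and_test_checked() are
hypothetical and this is not the code from lib/refcount.c; it only sketches
why a helper that always checks makes a separate WARN_ON() at the call site
redundant.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for refcount_t. */
struct model_refcount {
	atomic_uint refs;
};

/* Counts how many times the checked helper has complained. */
static unsigned int model_warnings;

/*
 * Checked decrement: refuses to drop below zero and records a warning
 * instead. Returns true only when this call released the last reference.
 */
static bool model_dec_and_test_checked(struct model_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0) {
			model_warnings++;
			fprintf(stderr, "model refcount: underflow\n");
			return false;
		}
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

With a helper of this shape, a caller only needs the single
"if (model_dec_and_test_checked(&obj)) { ... }" test, since the sanity check
is part of the operation itself.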


Re: [V9fs-developer] [PATCH] net/9p/client.c: put refcount of trans_mod in error case in parse_opts()

2018-07-10 Thread Dominique Martinet
Andrew,

there seems to be renewed interest in 9P lately, so if you'd like I
can take care of rounding these up and prepare a pull request for 4.19
(as we're already well into 4.18 release cycle, I believe most of the
patches can wait)

This patch however I consider important enough to take for 4.18 so could
you please grab it for now?

I've gathered the Review tags and added my own, feel free to change my
Reviewed-and-tested-by tag to Signed-off-by if it seems more appropriate
as I'm actively pushing for this patch.

piaojun wrote on Fri, Jul 06, 2018:
> From my test, the second mount will fail after umounting successfully.
> The reason is that we put refcount of trans_mod in the correct case rather
> than the error case in parse_opts() at last. That will cause the refcount
> decrease to -1, and when we try to get trans_mod again in
> try_module_get(), we could only increase refcount to 0 which will cause
> failure as follows:
> parse_opts
>   v9fs_get_trans_by_name
>     try_module_get : return NULL to caller which cause error
> 
> So we should put refcount of trans_mod in error case.
> 
> Fixes: 9421c3e64137ec ("net/9p/client.c: fix potential refcnt problem of trans module")
> 
> Signed-off-by: Jun Piao 
Reviewed-by: Yiwen Jiang 
Reviewed-by: Greg Kurz 
Reviewed-and-tested-by: Dominique Martinet 

> ---
>  net/9p/client.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 18c5271..5c13431 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -225,7 +225,8 @@ static int parse_opts(char *opts, struct p9_client *clnt)
>  	}
> 
>  free_and_return:
> -	v9fs_put_trans(clnt->trans_mod);
> +	if (ret)
> +		v9fs_put_trans(clnt->trans_mod);
>  	kfree(tmp_options);
>  	return ret;
>  }

Thanks,
-- 
Dominique Martinet
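
As a rough illustration of the reference accounting discussed above, here is
a userspace model of the fixed exit path. Everything named model_* is
invented for this sketch (it is not the net/9p code); the only thing it is
meant to show is that the reference taken while parsing is kept on success
and dropped only when parsing fails.

```c
#include <stddef.h>

/* Hypothetical stand-in for a transport module with a use count. */
struct model_trans {
	int refs;
};

/* Hypothetical stand-in for struct p9_client. */
struct model_client {
	struct model_trans *trans_mod;
};

static struct model_trans *model_get_trans(struct model_trans *t)
{
	t->refs++;
	return t;
}

static void model_put_trans(struct model_trans *t)
{
	if (t)
		t->refs--;
}

/*
 * Model of the option parser: takes a reference on the transport, then
 * optionally hits a later error. Mirrors the patched exit path, where the
 * reference is put back only if ret is non-zero.
 */
static int model_parse_opts(struct model_client *clnt, struct model_trans *t,
			    int fail_later)
{
	int ret = 0;

	clnt->trans_mod = model_get_trans(t);
	if (fail_later)
		ret = -22;	/* stands in for -EINVAL */

	if (ret)
		model_put_trans(clnt->trans_mod);
	return ret;
}
```

In this model a successful parse leaves the count at 1, so the put done at
teardown brings it back to 0 instead of below it.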


Re: [PATCH] i2c: aspeed: Fix initial values of master and slave state

2018-07-10 Thread Brendan Higgins
On Mon, Jul 2, 2018 at 2:20 PM Jae Hyun Yoo
 wrote:
>
> This patch changes the order of enum aspeed_i2c_master_state and
> enum aspeed_i2c_slave_state defines to make their initial value to
> ASPEED_I2C_MASTER_INACTIVE and ASPEED_I2C_SLAVE_STOP respectively.
> In case of multi-master use, if a slave data comes ahead of the
> first master xfer, master_state starts from an invalid state so
> this change fixes the issue.
>
> Signed-off-by: Jae Hyun Yoo 
> ---
>  drivers/i2c/busses/i2c-aspeed.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
> index 60e4d0e939a3..2714c7fbe7c9 100644
> --- a/drivers/i2c/busses/i2c-aspeed.c
> +++ b/drivers/i2c/busses/i2c-aspeed.c
> @@ -111,22 +111,22 @@
>  #define ASPEED_I2CD_DEV_ADDR_MASK  GENMASK(6, 0)
>
>  enum aspeed_i2c_master_state {
> +   ASPEED_I2C_MASTER_INACTIVE,
> ASPEED_I2C_MASTER_START,
> ASPEED_I2C_MASTER_TX_FIRST,
> ASPEED_I2C_MASTER_TX,
> ASPEED_I2C_MASTER_RX_FIRST,
> ASPEED_I2C_MASTER_RX,
> ASPEED_I2C_MASTER_STOP,
> -   ASPEED_I2C_MASTER_INACTIVE,
>  };
>
>  enum aspeed_i2c_slave_state {
> +   ASPEED_I2C_SLAVE_STOP,
> ASPEED_I2C_SLAVE_START,
> ASPEED_I2C_SLAVE_READ_REQUESTED,
> ASPEED_I2C_SLAVE_READ_PROCESSED,
> ASPEED_I2C_SLAVE_WRITE_REQUESTED,
> ASPEED_I2C_SLAVE_WRITE_RECEIVED,
> -   ASPEED_I2C_SLAVE_STOP,
>  };
>
>  struct aspeed_i2c_bus {
> --
> 2.17.1
>

Reviewed-by: Brendan Higgins 

Thanks!

BTW, sorry for the delay, just got back from vacation. I will review
the rest tomorrow.
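
The reasoning in the patch description can be shown with a small userspace
sketch. It assumes the bus structure comes from a zeroing allocator, so an
enum member that is never explicitly assigned starts out as the enumerator
with value 0. The MODEL_* names and model_bus_alloc() are invented for the
example and are not the driver's.

```c
#include <stdlib.h>

/* Ordering before the patch: the idle state is last, so 0 means START. */
enum model_old_master_state {
	MODEL_OLD_MASTER_START,
	MODEL_OLD_MASTER_STOP,
	MODEL_OLD_MASTER_INACTIVE,
};

/* Ordering after the patch: the idle state is first, so 0 means INACTIVE. */
enum model_new_master_state {
	MODEL_NEW_MASTER_INACTIVE,
	MODEL_NEW_MASTER_START,
	MODEL_NEW_MASTER_STOP,
};

/* Hypothetical bus structure holding both variants for comparison. */
struct model_bus {
	enum model_old_master_state old_state;
	enum model_new_master_state new_state;
};

/* Zeroing allocation, standing in for a kzalloc-style allocator. */
static struct model_bus *model_bus_alloc(void)
{
	return calloc(1, sizeof(struct model_bus));
}
```

Right after allocation, old_state reads as MODEL_OLD_MASTER_START while
new_state reads as MODEL_NEW_MASTER_INACTIVE, which is the difference the
reordering is after.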


Re: [PATCH] refcount: always allow checked forms

2018-07-10 Thread Mark Rutland
On Tue, Jul 03, 2018 at 11:30:38AM -0700, Kees Cook wrote:
> On Tue, Jul 3, 2018 at 3:01 AM, Mark Rutland  wrote:
> > In many cases, it would be useful to be able to use the full
> > sanity-checked refcount helpers regardless of CONFIG_REFCOUNT_FULL, as
> > this would help to avoid duplicate warnings where callers try to
> > sanity-check refcount manipulation.
> >
> > This patch refactors things such that the full refcount helpers were
> > always built, as refcount_${op}_checked(), such that they can be used
> > regardless of CONFIG_REFCOUNT_FULL. This will allow code which *always*
> > wants a checked refcount to opt-in, avoiding the need to duplicate the
> > logic for warnings.
> >
> > There should be no functional change as a result of this patch.
> >
> > Signed-off-by: Mark Rutland 
> > Cc: Boqun Feng 
> > Cc: David Sterba 
> > Cc: Ingo Molnar 
> > Cc: Kees Cook 
> > Cc: Peter Zijlstra 
> > Cc: Peter Zijlstra 
> > Cc: Will Deacon 
> 
> Looks good to me! Thanks for doing this. :)

Thank David; I rather stole his thunder here.

> Acked-by: Kees Cook 
> 
> > ---
> >  include/linux/refcount.h | 27 +---
> >  lib/refcount.c   | 53 +++-
> >  2 files changed, 45 insertions(+), 35 deletions(-)
> >
> > Dave pointed out that it would be useful to be able to opt-in to full checks
> > regardless of CONFIG_REFCOUNT_FULL, so that we can simplify callsites where we
> > always want checks. I've spotted a few of these in code which is still awaiting
> > conversion.
> 
> Yeah, I need to go through the cocci output -- Elena had several
> outstanding patches that never got picked up.
> 
> > I'm assuming that the atomics group is intended to own the refcount code, even
> > though this isn't currently the case in MAINTAINERS.
> 
> That's how it has landed in the past, yes, but if there is a
> dependency on these for code that will use it, maybe it should go that
> way?

That sounds reasonable to me. I just wanted to be clear as to why I'd Cc'd
the atomics maintainers. :)

I'll spin a v2 with the fixup Andrea noted.

Thanks,
Mark.


Re: [RFC][PATCH 15/42] lift fput() on late failures into path_openat()

2018-07-10 Thread Amir Goldstein
On Wed, Jul 11, 2018 at 5:21 AM, Al Viro  wrote:
[...]
> @@ -3407,8 +3407,6 @@ static int do_last(struct nameidata *nd,
> if (!error && will_truncate)
> error = handle_truncate(file);
>  out:
> -   if (unlikely(error) && (*opened & FILE_OPENED))
> -   fput(file);
> if (unlikely(error > 0)) {
> WARN_ON(1);

Another opportunity to squash WARN_ON(1) with condition.

> error = -EINVAL;
> @@ -3484,8 +3482,6 @@ static int do_tmpfile(struct nameidata *nd, unsigned flags,
> if (error)
> goto out2;
> error = open_check_o_direct(file);
> -   if (error)
> -   fput(file);
>  out2:
> mnt_drop_write(path.mnt);
>  out:
> @@ -3547,20 +3543,20 @@ static struct file *path_openat(struct nameidata *nd,
> }
> terminate_walk(nd);
>  out2:
> -   if (!(opened & FILE_OPENED)) {
> -   BUG_ON(!error);
> -   fput(file);
> +   if (likely(!error)) {
> +   if (likely(opened & FILE_OPENED))
> +   return file;
> +   WARN_ON(1);

This style may be open for debate but:

   if (!WARN_ON(!(opened & FILE_OPENED)))
           return file;

Maybe we need WARN_UNLESS() or WARN_ASSERT()

Thanks,
Amir.
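
The folded form suggested above works because WARN_ON() evaluates to the
truth value of its condition. A minimal userspace sketch of that pattern
follows; model_warn_on(), model_warn_count, MODEL_FILE_OPENED and
model_finish_open() are hypothetical names used only for this illustration,
and a plain function stands in for the kernel macro.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_FILE_OPENED 0x1

/* Counts how many warnings the model has emitted. */
static int model_warn_count;

/* Warns when the condition is true and hands the truth value back. */
static int model_warn_on(int condition)
{
	if (condition) {
		model_warn_count++;
		fprintf(stderr, "model warning triggered\n");
	}
	return !!condition;
}

/*
 * Returns the file when it was marked opened; otherwise warns once and
 * returns NULL. The warning and the test share a single condition.
 */
static const char *model_finish_open(const char *file, int opened)
{
	if (!model_warn_on(!(opened & MODEL_FILE_OPENED)))
		return file;
	return NULL;
}
```

The double negation is what a WARN_UNLESS()-style helper would hide: the
condition one wants to hold is written once, and the warning fires when it
does not.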


Re: [PATCH] i2c: aspeed: Add newline characters into message printings.

2018-07-10 Thread Brendan Higgins
On Mon, Jul 2, 2018 at 2:14 PM Jae Hyun Yoo
 wrote:
>
> There are some log printing without a newline character. This
> patch adds the missing newline characters.
>
> Signed-off-by: Jae Hyun Yoo 
> ---
>  drivers/i2c/busses/i2c-aspeed.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
> index 60e4d0e939a3..e3007c1c4ac5 100644
> --- a/drivers/i2c/busses/i2c-aspeed.c
> +++ b/drivers/i2c/busses/i2c-aspeed.c
> @@ -407,7 +407,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
>  */
> ret = aspeed_i2c_is_irq_error(irq_status);
> if (ret < 0) {
> -   dev_dbg(bus->dev, "received error interrupt: 0x%08x",
> +   dev_dbg(bus->dev, "received error interrupt: 0x%08x\n",
> irq_status);
> bus->cmd_err = ret;
> bus->master_state = ASPEED_I2C_MASTER_INACTIVE;
> @@ -416,7 +416,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
>
> /* We are in an invalid state; reset bus to a known state. */
> if (!bus->msgs) {
> -   dev_err(bus->dev, "bus in unknown state");
> +   dev_err(bus->dev, "bus in unknown state\n");
> bus->cmd_err = -EIO;
> if (bus->master_state != ASPEED_I2C_MASTER_STOP)
> aspeed_i2c_do_stop(bus);
> @@ -431,7 +431,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
>  */
> if (bus->master_state == ASPEED_I2C_MASTER_START) {
> if (unlikely(!(irq_status & ASPEED_I2CD_INTR_TX_ACK))) {
> -   pr_devel("no slave present at %02x", msg->addr);
> +   pr_devel("no slave present at %02x\n", msg->addr);

Unless something changed in the last couple versions of the kernel, this is the
only line that actually changes anything. dev_* inserts a newline for every
call.

Admittedly, the rest of the file is pretty inconsistent, so if you really want
to make all these changes, I don't feel super strongly about it.

> status_ack |= ASPEED_I2CD_INTR_TX_NAK;
> bus->cmd_err = -ENXIO;
> aspeed_i2c_do_stop(bus);
> @@ -451,11 +451,11 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
> switch (bus->master_state) {
> case ASPEED_I2C_MASTER_TX:
> if (unlikely(irq_status & ASPEED_I2CD_INTR_TX_NAK)) {
> -   dev_dbg(bus->dev, "slave NACKed TX");
> +   dev_dbg(bus->dev, "slave NACKed TX\n");
> status_ack |= ASPEED_I2CD_INTR_TX_NAK;
> goto error_and_stop;
> } else if (unlikely(!(irq_status & ASPEED_I2CD_INTR_TX_ACK))) {
> -   dev_err(bus->dev, "slave failed to ACK TX");
> +   dev_err(bus->dev, "slave failed to ACK TX\n");
> goto error_and_stop;
> }
> status_ack |= ASPEED_I2CD_INTR_TX_ACK;
> @@ -478,7 +478,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
> /* fallthrough intended */
> case ASPEED_I2C_MASTER_RX:
> if (unlikely(!(irq_status & ASPEED_I2CD_INTR_RX_DONE))) {
> -   dev_err(bus->dev, "master failed to RX");
> +   dev_err(bus->dev, "master failed to RX\n");
> goto error_and_stop;
> }
> status_ack |= ASPEED_I2CD_INTR_RX_DONE;
> @@ -509,7 +509,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
> goto out_no_complete;
> case ASPEED_I2C_MASTER_STOP:
> if (unlikely(!(irq_status & ASPEED_I2CD_INTR_NORMAL_STOP))) {
> -   dev_err(bus->dev, "master failed to STOP");
> +   dev_err(bus->dev, "master failed to STOP\n");
> bus->cmd_err = -EIO;
> /* Do not STOP as we have already tried. */
> } else {
> @@ -520,7 +520,7 @@ static bool aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus)
> goto out_complete;
> case ASPEED_I2C_MASTER_INACTIVE:
> dev_err(bus->dev,
> -   "master received interrupt 0x%08x, but is inactive",
> +   "master received interrupt 0x%08x, but is inactive\n",
> irq_status);
> bus->cmd_err = -EIO;
> /* Do not STOP as we should be inactive. */
> @@ -851,7 +851,7 @@ static int aspeed_i2c_probe_bus(struct platform_device *pdev)
> bus->rst = devm_reset_control_get_shared(&pdev->dev, NULL);
> if (IS_ERR(bus->rst)) {
> dev_err(&pdev->dev,
> -   "missing or invalid reset controller device tree entry");

RE: [PATCH v1 2/4] dmaengine: imx-sdma: add check_bd_buswidth() to kill the dulicated code

2018-07-10 Thread Robin Gong
> -Original Message-
> From: Vinod [mailto:vk...@kernel.org]
> Sent: July 10, 2018 23:31
> To: Robin Gong 
> Cc: dan.j.willi...@intel.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; Fabio Estevam ;
> li...@armlinux.org.uk; linux-arm-ker...@lists.infradead.org;
> ker...@pengutronix.de; dmaeng...@vger.kernel.org;
> linux-kernel@vger.kernel.org; dl-linux-imx 
> Subject: Re: [PATCH v1 2/4] dmaengine: imx-sdma: add check_bd_buswidth() to
> kill the dulicated code
> 
> On 11-07-18, 00:23, Robin Gong wrote:
> > Add check_bd_buswidth() to minimize the code size.
> 
> this looks mostly fine and I think this should be first patch..
Since there is no need to check the bus width in the memcpy case, I'll drop this patch too.
> 
> >
> > Signed-off-by: Robin Gong 
> > ---
> >  drivers/dma/imx-sdma.c | 64 +++---
> >  1 file changed, 29 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> > index 27ccabf..ed2267d 100644
> > --- a/drivers/dma/imx-sdma.c
> > +++ b/drivers/dma/imx-sdma.c
> > @@ -1326,6 +1326,33 @@ static struct sdma_desc *sdma_transfer_init(struct sdma_channel *sdmac,
> > return NULL;
> >  }
> >
> > +static int check_bd_buswidth(struct sdma_buffer_descriptor *bd,
> > +struct sdma_channel *sdmac, int count,
> > +dma_addr_t dma_dst, dma_addr_t dma_src) {
> > +   int ret = 0;
> > +
> > +   switch (sdmac->word_size) {
> > +   case DMA_SLAVE_BUSWIDTH_4_BYTES:
> > +   bd->mode.command = 0;
> > +   if ((count | dma_dst | dma_src) & 3)
> > +   ret = -EINVAL;
> > +   break;
> 
> empty line after each break please
> 
> > +   case DMA_SLAVE_BUSWIDTH_2_BYTES:
> > +   bd->mode.command = 2;
> > +   if ((count | dma_dst | dma_src) & 1)
> > +   ret = -EINVAL;
> > +   break;
> > +   case DMA_SLAVE_BUSWIDTH_1_BYTE:
> > +bd->mode.command = 1;
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> --
> ~Vinod
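
The alignment test in the quoted helper ORs the byte count and both
addresses together and masks the low bits, so a single comparison covers all
three values. A small userspace sketch of the same trick, generalised to any
power-of-two width, is below; model_xfer_is_aligned() is a hypothetical name
and the function is not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when count, dst and src are all multiples of width.
 * width must be a power of two (1, 2, 4, ...): any low bit set in any of
 * the three values survives the OR and is caught by the mask.
 */
static bool model_xfer_is_aligned(uint64_t count, uint64_t dst, uint64_t src,
				  unsigned int width)
{
	return ((count | dst | src) & (uint64_t)(width - 1)) == 0;
}
```

For example, a destination of 0x1002 passes for a 2-byte bus width but fails
for a 4-byte one, matching the "& 1" and "& 3" masks in the patch.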


RE: [PATCH v1 1/4] dmaengine: imx-sdma: add memcpy interface

2018-07-10 Thread Robin Gong
> Hi Robin,
> 
> On 11-07-18, 00:23, Robin Gong wrote:
> > Add MEMCPY support, meanwhile, add SDMA_BD_MAX_CNT instead of
> > '0x'.
> 
> latter part should be its own patch. Never mix things
Okay, I will split it even for this minor change.
> 
> > +static struct dma_async_tx_descriptor *sdma_prep_memcpy(
> > +   struct dma_chan *chan, dma_addr_t dma_dst,
> > +   dma_addr_t dma_src, size_t len, unsigned long flags) {
> > +   struct sdma_channel *sdmac = to_sdma_chan(chan);
> > +   struct sdma_engine *sdma = sdmac->sdma;
> > +   int channel = sdmac->channel;
> > +   size_t count;
> > +   int i = 0, param;
> > +   struct sdma_buffer_descriptor *bd;
> > +   struct sdma_desc *desc;
> > +
> > +   if (!chan || !len)
> > +   return NULL;
> > +
> > +   dev_dbg(sdma->dev, "memcpy: %pad->%pad, len=%zu, channel=%d.\n",
> > +   &dma_src, &dma_dst, len, channel);
> > +
> > +   desc = sdma_transfer_init(sdmac, DMA_MEM_TO_MEM, len / SDMA_BD_MAX_CNT
> > +   + 1);
> 
> this looks quite odd to read, consider:
> 
> desc = sdma_transfer_init(sdmac, DMA_MEM_TO_MEM,
>  len / SDMA_BD_MAX_CNT + 1);
> 
Okay, will fix in v2.
> > +   if (!desc)
> > +   goto err_out;
> > +
> > +   do {
> > +   count = min_t(size_t, len, SDMA_BD_MAX_CNT);
> > +   bd = &desc->bd[i];
> > +   bd->buffer_addr = dma_src;
> > +   bd->ext_buffer_addr = dma_dst;
> > +   bd->mode.count = count;
> > +   desc->chn_count += count;
> > +
> > +   switch (sdmac->word_size) {
> > +   case DMA_SLAVE_BUSWIDTH_4_BYTES:
> 
> This looks wrong, we are in memcpy, there is no SLAVE so no SLAVE widths..
> 
Okay, will remove the bus width check.
> >  static struct dma_async_tx_descriptor *sdma_prep_slave_sg(
> > struct dma_chan *chan, struct scatterlist *sgl,
> > unsigned int sg_len, enum dma_transfer_direction direction,
> > @@ -1344,9 +1431,9 @@ static struct dma_async_tx_descriptor *sdma_prep_slave_sg(
> >
> > count = sg_dma_len(sg);
> >
> > -   if (count > 0x) {
> > +   if (count > SDMA_BD_MAX_CNT) {
> > dev_err(sdma->dev, "SDMA channel %d: maximum bytes for sg entry exceeded: %d > %d\n",
> > -   channel, count, 0x);
> > +   channel, count, SDMA_BD_MAX_CNT);
> 
> these changes don't belong to this patch
Will split in v2.
> 
> > @@ -1486,6 +1573,8 @@ static int sdma_config(struct dma_chan *chan,
> > sdmac->watermark_level |= (dmaengine_cfg->dst_maxburst << 16) &
> > SDMA_WATERMARK_LEVEL_HWML;
> > sdmac->word_size = dmaengine_cfg->dst_addr_width;
> > +   } else if (dmaengine_cfg->direction == DMA_MEM_TO_MEM) {
> > +   sdmac->word_size = dmaengine_cfg->dst_addr_width;
> 
> same here too, we are in .device_config which deals with slave. Not memcpy!
Will remove it.
> 
> > } else {
> > sdmac->per_address = dmaengine_cfg->dst_addr;
> > sdmac->watermark_level = dmaengine_cfg->dst_maxburst *
> > @@ -1902,6 +1991,7 @@ static int sdma_probe(struct platform_device *pdev)
> >
> > dma_cap_set(DMA_SLAVE, sdma->dma_device.cap_mask);
> > dma_cap_set(DMA_CYCLIC, sdma->dma_device.cap_mask);
> > +   dma_cap_set(DMA_MEMCPY, sdma->dma_device.cap_mask);
> >
> > INIT_LIST_HEAD(&sdma->dma_device.channels);
> > /* Initialize channel parameters */
> > @@ -1968,9 +2058,11 @@ static int sdma_probe(struct platform_device *pdev)
> > sdma->dma_device.dst_addr_widths = SDMA_DMA_BUSWIDTHS;
> > sdma->dma_device.directions = SDMA_DMA_DIRECTIONS;
> > sdma->dma_device.residue_granularity =
> > DMA_RESIDUE_GRANULARITY_SEGMENT;
> > +   sdma->dma_device.device_prep_dma_memcpy = sdma_prep_memcpy;
> > sdma->dma_device.device_issue_pending = sdma_issue_pending;
> > sdma->dma_device.dev->dma_parms = &sdma->dma_parms;
> > -   dma_set_max_seg_size(sdma->dma_device.dev, 65535);
> > +   sdma->dma_device.copy_align = DMAENGINE_ALIGN_4_BYTES;
> > +   dma_set_max_seg_size(sdma->dma_device.dev, SDMA_BD_MAX_CNT);
> 
> this line should not be part of this patch
> 
> --
> ~Vinod
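
The quoted sdma_prep_memcpy() splits a copy into buffer descriptors of at most SDMA_BD_MAX_CNT bytes each. A rough userspace sketch of that splitting follows. The demo_* names are hypothetical, and the 65535-byte limit is an assumption taken from the dma_set_max_seg_size() call that the patch replaces. Note that the patch sizes the descriptor array as len / SDMA_BD_MAX_CNT + 1, whereas this sketch uses a ceiling division; the two differ by one when len is an exact multiple of the limit.

```c
#include <stddef.h>

/* Assumed per-descriptor byte limit (hypothetical name, value from the thread). */
#define DEMO_BD_MAX_CNT 65535u

/* Number of descriptors needed for len bytes, computed as a ceiling division. */
static size_t demo_bd_count(size_t len)
{
	return (len + DEMO_BD_MAX_CNT - 1) / DEMO_BD_MAX_CNT;
}

/*
 * Fill sizes[] with the per-descriptor byte counts for a copy of len bytes.
 * Returns the number of entries written, or 0 if sizes[] is too small.
 */
static size_t demo_split(size_t len, size_t *sizes, size_t max_entries)
{
	size_t n = 0;

	if (demo_bd_count(len) > max_entries)
		return 0;

	while (len) {
		/* Same clamp as min_t(size_t, len, SDMA_BD_MAX_CNT) in the patch. */
		size_t count = len < DEMO_BD_MAX_CNT ? len : DEMO_BD_MAX_CNT;

		sizes[n++] = count;
		len -= count;
	}

	return n;
}
```

The sketch ignores the source and destination addresses, which the driver advances by count on each iteration, and it does not guard against overflow for lengths near SIZE_MAX.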


[RFC PATCH v2 3/4] misc: Add bmc-misc-ctrl

2018-07-10 Thread Andrew Jeffery
The bmc-misc-ctrl platform driver stitches together the associated
devicetree bindings and the sysfs-devices-platform-field ABI to expose
fields described in the devicetree to userspace via sysfs.

While the userspace interface does not provide an abstraction over the
hardware, it does provide some improvements over devmem:

1. Removal of read-modify-write races, as register access is atomic
2. Reduced foot-gun, as only the defined field is accessible
3. Improved discoverability, as the fields are named

Userspace is expected to use its own means to discover fields of
interest in /sys/devices/platform, either via udev events or search.

Signed-off-by: Andrew Jeffery 
---

Since RFC v1:

* Fix issues pointed out by Greg
* Drop the device class

 MAINTAINERS  |   1 +
 drivers/misc/Kconfig |  11 +
 drivers/misc/Makefile|   1 +
 drivers/misc/bmc-misc-ctrl.c | 446 +++
 4 files changed, 459 insertions(+)
 create mode 100644 drivers/misc/bmc-misc-ctrl.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d167f0340c11..c29136614cb8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2742,6 +2742,7 @@ L:open...@lists.ozlabs.org (moderated for non-subscribers)
 S: Supported
 F: Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
 F: Documentation/ABI/testing/sysfs-devices-platform-field
+F: drivers/misc/bmc-misc-ctrl.c
 
 BPF (Safe dynamic programs and tools)
 M: Alexei Starovoitov 
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 3726eacdf65d..914f8d37645d 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -513,6 +513,17 @@ config MISC_RTSX
tristate
default MISC_RTSX_PCI || MISC_RTSX_USB
 
+config BMC_MISC_CTRL
+   tristate "Miscellaneous BMC Control Interfaces"
+   depends on REGMAP && MFD_SYSCON
+   help
+ Say yes to expose scratch registers used to communicate between the
+ host and BMC along with other miscellaneous control interfaces
+ provided by BMC SoCs.
+
+ Attributes for controlling the fields are exposed in sysfs according
+ to the sysfs-devices-platform-field ABI.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index af22bbc3d00c..4fb2fac7a486 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_ASPEED_LPC_SNOOP)+= aspeed-lpc-snoop.o
 obj-$(CONFIG_PCI_ENDPOINT_TEST)+= pci_endpoint_test.o
 obj-$(CONFIG_OCXL) += ocxl/
 obj-$(CONFIG_MISC_RTSX)+= cardreader/
+obj-$(CONFIG_BMC_MISC_CTRL) += bmc-misc-ctrl.o
diff --git a/drivers/misc/bmc-misc-ctrl.c b/drivers/misc/bmc-misc-ctrl.c
new file mode 100644
index ..93e1412f7087
--- /dev/null
+++ b/drivers/misc/bmc-misc-ctrl.c
@@ -0,0 +1,446 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2018 IBM Corp.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct bmc_misc_label {
+   const char *label;
+   struct device_attribute label_attr;
+};
+
+struct bmc_misc_field {
+   u32 shift;
+   u32 mask;
+   struct device_attribute mask_attr;
+};
+
+struct bmc_misc_type {
+   const char *type;
+   struct device_attribute type_attr;
+};
+
+struct bmc_misc_rw {
+   struct regmap *map;
+
+   struct bmc_misc_field field;
+   struct bmc_misc_label label;
+   struct bmc_misc_type type;
+
+   u32 value;
+   struct device_attribute value_attr;
+
+   struct attribute_group attr_grp;
+   struct attribute *attrs[5];
+};
+
+struct bmc_misc_sc {
+   struct regmap *map;
+
+   struct bmc_misc_field field;
+   struct bmc_misc_label label;
+   struct bmc_misc_type type;
+
+   u32 read;
+   u32 set;
+   u32 clear;
+
+   struct device_attribute read_attr;
+   struct device_attribute set_attr;
+   struct device_attribute clear_attr;
+
+   struct attribute_group attr_grp;
+   struct attribute *attrs[7];
+};
+
+static ssize_t bmc_misc_label_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct bmc_misc_label *priv;
+
+   priv = container_of(attr, struct bmc_misc_label, label_attr);
+
+   return sprintf(buf, "%s\n", priv->label);
+}
+
+static int bmc_misc_label_init(struct device_node *node,
+  struct bmc_misc_label *priv)
+{
+   int rc;
+
+   rc = of_property_read_string(node, "label", &priv->label);
+   if (rc < 0)
+   return rc;
+
+   sysfs_attr_init(&priv->label_attr.attr);
+   priv->label_attr.attr.name = "label";
+   priv->label_attr.attr.mode = 0440;
+   priv->label_attr.show = bmc_misc_label_show;
+
+   return 0;
+}
+
+static ssize_t bmc_misc_mask_show(struct device *dev,
+ struct 
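
The show callback in the patch above receives only a pointer to the embedded device_attribute and recovers the enclosing bmc_misc_label with container_of(). A small userspace sketch of that pattern follows; the demo_* types and the macro are hypothetical simplifications, not the kernel's definitions (the kernel macro also does type checking that is omitted here).

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device_attribute. */
struct demo_attr {
	const char *name;
};

/* Mirrors the shape of struct bmc_misc_label: data plus an embedded attribute. */
struct demo_label {
	const char *label;
	struct demo_attr label_attr;
};

/* The callback only sees the attribute, yet can reach the label next to it. */
static const char *demo_label_show(struct demo_attr *attr)
{
	struct demo_label *priv;

	priv = demo_container_of(attr, struct demo_label, label_attr);

	return priv->label;
}
```

Embedding the attribute in the structure it describes is what lets one show function serve every field instance without a lookup table.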

[RFC PATCH v2 4/4] dts: aspeed-g5: Describe VGA, SIO scratch and DAC mux fields

2018-07-10 Thread Andrew Jeffery
The AST2500 has VGA scratch registers that are read-only, SuperIO
scratch registers that are a mix of read-only and read-write, and a
graphics DAC mux that must be read or configured in the process of
booting e.g. an OpenPOWER system.

These capabilities do not really have a place in other drivers, so
expose them as fields via bmc-misc-ctrl.

Signed-off-by: Andrew Jeffery 
---

Since RFC v1:

* Rework labels to what is documented in the bindings
* Fix an incorrect offset property

 arch/arm/boot/dts/aspeed-g5.dtsi | 192 +++
 1 file changed, 192 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index 17f2714d18a7..c484ac637328 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -187,6 +187,77 @@
aspeed,external-nodes = < >;
 
};
+
+   field@2c.16 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x2c>;
+   mask = <0x0003>;
+   label = "dac-mux";
+   };
+
+   field@50.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x50>;
+   mask = <0x>;
+   label = "vga0";
+   read-only;
+   };
+
+   field@54.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x54>;
+   mask = <0x>;
+   label = "vga1";
+   read-only;
+   };
+
+   field@58.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x58>;
+   mask = <0x>;
+   label = "vga2";
+   read-only;
+   };
+
+   field@5c.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x5c>;
+   mask = <0x>;
+   label = "vga3";
+   read-only;
+   };
+
+   field@60.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x60>;
+   mask = <0x>;
+   label = "vga4";
+   read-only;
+   };
+
+   field@64.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x64>;
+   mask = <0x>;
+   label = "vga5";
+   read-only;
+   };
+
+   field@68.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x68>;
+   mask = <0x>;
+   label = "vga6";
+   read-only;
+   };
+
+   field@6c.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x6c>;
+   mask = <0x>;
+   label = "vga7";
+   read-only;
+   };
};
 
rng: hwrng@1e6e2078 {
@@ -343,6 +414,127 @@
#reset-cells = <1>;
};
 
+   field@f0.24 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0xf0>;
+   mask = <0xff00>;
+   label = "sio2b";
+   };
+
+   field@f0.16 {
+   compatible = "bmc-misc-ctrl";
+  

[RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields

2018-07-10 Thread Andrew Jeffery
Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
provide remote management of (primarily) server platforms. BMCs are
often tightly coupled to the platform in terms of behaviour and provide
many hardware features integral to booting and running the host system.

Some of these hardware features are simple, for example scratch
registers provided by the BMC that are exposed to both the host and the
BMC. In other cases there's a single bit switch to enable or disable
some of the provided functionality.

The documentation defines bindings for fields in registers that do not
integrate well into other driver models yet must be described to allow
the BMC kernel to assume control of these features.

Signed-off-by: Andrew Jeffery 
---

Since RFC v1:

* Add a commit message
* Minor changes to documented labels

 .../bindings/misc/bmc-misc-ctrl.txt   | 252 ++
 MAINTAINERS   |   6 +
 2 files changed, 258 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt

diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
new file mode 100644
index ..2c869fcc7ef2
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
@@ -0,0 +1,252 @@
+BMC Miscellaneous Control Interfaces
+====================================
+
+Baseboard Management Controllers (BMCs) often have an array of hardware
+features that need to be described but are awkward to sensibly expose.
+
+This bindings document provides a generic mechanism for describing such
+features, covering read-only (RO), read-modify-write (RMW) and
+write-1-set/write-1-clear (W1SC) semantics.
+
+All uses of bmc-misc-ctrl must be documented under Valid Uses below.
+
+The bindings are similar in nature to register-bit-led.
+
+Required Properties
+-------------------
+
+compatible:    Must be "bmc-misc-ctrl"
+offset:    A one or three cell property describing the registers
+   associated with the field.
+
+   If the optional property 'set-clear' is not present then the
+   node describes a register with read-modify-write semantics. The
+   offset property has one cell describing the register of
+   interest.
+
+   If the optional property 'set-clear' is present then the node
+   describes a register set that together implement read,
+   write-1-set and write-1-clear semantics. The offset property
+   must be three cells, the first is the address of the register
+   to read from, the second the write-1-set register and the third
+   write-1-clear.
+
+mask:  A mask whose set bits represent the bits of the field.
+label: The name of the field
+
+Optional Properties
+-------------------
+
+read-only: Define a read-only field (RMW/W1SC irrelevant).
+set-clear: Define whether the field exists in a RMW or W1SC register set
+default-value: Single cell applicable to RMW. The field will be updated to the
+   cell's value.
+default-set:   For W1SC, set all bits in the field
+default-clear: For W1SC, clear all bits in the field
+
+Valid Uses
+----------
+
+Description:   Control bit for switching the video display DAC mux between
+   host VGA and BMC CRT mode
+Machines:  aspeed,ast2500
+Parent:    compatible = "aspeed,ast2500-scu", "syscon", "simple-mfd";
+Node:
+   field@2c.16 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x2c>;
+   mask = <0x0003>;
+   label = "dac-mux";
+   };
+
+Description:   Host VGA scratch registers
+Machines:  aspeed,ast2500
+Parent:    compatible = "aspeed,ast2500-scu", "syscon", "simple-mfd";
+Node:
+   field@50.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x50>;
+   mask = <0x>;
+   label = "vga0";
+   read-only;
+   };
+
+   field@54.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x54>;
+   mask = <0x>;
+   label = "vga1";
+   read-only;
+   };
+
+   field@58.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x58>;
+   mask = <0x>;
+   label = "vga2";
+   read-only;
+   };
+
+   field@5c.0 {
+   compatible = "bmc-misc-ctrl";
+   offset = <0x5c>;
+   mask = <0x>;
+   label = "vga3";
+   read-only;
+
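
The bindings above distinguish a single read-modify-write register from a three-register write-1-set/write-1-clear (W1SC) set. A minimal software model of the W1SC case follows, to show why updating a field there needs no read of the current value. All demo_* names are hypothetical, and how the driver maps sysfs writes onto the set and clear registers is not visible in the truncated patch, so this is a sketch of the semantics only.

```c
#include <stdint.h>

/* Hypothetical model of a W1SC register set: one stored value, three access paths. */
struct demo_w1sc {
	uint32_t state;
};

/* Read register: returns the current value. */
static uint32_t demo_w1sc_read(const struct demo_w1sc *r)
{
	return r->state;
}

/* Write-1-set register: each 1 bit written sets that bit; 0 bits are ignored. */
static void demo_w1sc_write_set(struct demo_w1sc *r, uint32_t val)
{
	r->state |= val;
}

/* Write-1-clear register: each 1 bit written clears that bit; 0 bits are ignored. */
static void demo_w1sc_write_clear(struct demo_w1sc *r, uint32_t val)
{
	r->state &= ~val;
}

/*
 * Drive the bits under mask to field_bits using only the set and clear
 * paths. Bits outside mask are never written with a 1, so they keep
 * whatever value they had.
 */
static void demo_w1sc_update_field(struct demo_w1sc *r, uint32_t mask,
				   uint32_t field_bits)
{
	demo_w1sc_write_set(r, field_bits & mask);
	demo_w1sc_write_clear(r, ~field_bits & mask);
}
```

Because neither write depends on a prior read, two writers touching different fields of the same register set cannot overwrite each other's bits.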

[RFC PATCH v2 0/4] sysfs interface to miscellaneous BMC controls and fields

2018-07-10 Thread Andrew Jeffery
Hello,

This series is a second stab at exposing hardware controls on Baseboard
Management Controllers that are hard to fit into any coherent abstraction.

The patches introduce new devicetree bindings and sysfs attributes, along with
a platform driver to expose devicetree nodes of the former as the latter.

Obviously not having an abstract interface to these knobs and switches is not
ideal, but the proposal does have some advantages over devmem:

1. Removal of read-modify-write races, as register update is atomic
2. Reduced foot-gun, as only the defined field is accessible
3. Improved discoverability as the fields are named
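
To make points 1 and 2 concrete, here is a minimal userspace sketch of a
masked field update. It is not the driver code: struct field, field_read()
and field_write() are hypothetical names, the register is modelled as a C11
atomic rather than a regmap, and the mask in the usage note is an
illustrative value only.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical field description: one register plus a contiguous mask. */
struct field {
	_Atomic uint32_t *reg;	/* stand-in for a memory-mapped register */
	uint32_t mask;		/* bits of the register owned by the field */
};

/* Position of the lowest set bit in the mask. */
static unsigned int field_shift(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Read only the bits covered by the mask, shifted down to bit 0. */
static uint32_t field_read(const struct field *f)
{
	return (atomic_load(f->reg) & f->mask) >> field_shift(f->mask);
}

/*
 * Update only the bits covered by the mask. The compare-and-swap loop
 * retries if another writer changed the register in between, so bits
 * outside the field are never lost to a read-modify-write race.
 */
static int field_write(const struct field *f, uint32_t val)
{
	unsigned int shift = field_shift(f->mask);
	uint32_t old, next;

	/* Reject values wider than the field. */
	if (val & ~(f->mask >> shift))
		return -EINVAL;

	old = atomic_load(f->reg);
	do {
		next = (old & ~f->mask) | (val << shift);
	} while (!atomic_compare_exchange_weak(f->reg, &old, next));

	return 0;
}
```

With the register initialised to 0xdeadbeef and a field mask of 0x00030000,
field_write(&f, 2) leaves 0xdeaebeef in the register, and field_write(&f, 4)
is rejected with -EINVAL because 4 does not fit in a two-bit field.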

The intent is that the setup should be used as a second-last resort (over
devmem). I'm interested in feedback on:

a) Is this an acceptable improvement over devmem?
b) If a), is the devicetree the best way to describe the fields?
c) If b), is directly mapping them to a sysfs attr group manageable long-term?

My concern with b) and c) is that there's not a clear restriction on what
fields can be exposed using the driver, so I've tried to compensate by
explicitly documenting the recognised fields in the bindings.

Looking for feedback on all fronts.

Cheers,

Andrew

Andrew Jeffery (4):
  dt-bindings: misc: Add bindings for misc. BMC control fields
  Documentation: ABI: Add sysfs-devices-platform-field to testing
  misc: Add bmc-misc-ctrl
  dts: aspeed-g5: Describe VGA, SIO scratch and DAC mux fields

 .../ABI/testing/sysfs-devices-platform-field  |  95 
 .../bindings/misc/bmc-misc-ctrl.txt   | 252 ++
 MAINTAINERS   |   8 +
 arch/arm/boot/dts/aspeed-g5.dtsi  | 192 
 drivers/misc/Kconfig  |  11 +
 drivers/misc/Makefile |   1 +
 drivers/misc/bmc-misc-ctrl.c  | 446 ++
 7 files changed, 1005 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-platform-field
 create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
 create mode 100644 drivers/misc/bmc-misc-ctrl.c

-- 
2.17.1



[RFC PATCH v2 2/4] Documentation: ABI: Add sysfs-devices-platform-field to testing

2018-07-10 Thread Andrew Jeffery
"Fields" expose control of hardware directly to userspace where
appropriate. Examples of expected use are single bit switches or other
small masks of registers where the range of values is entirely policy
driven and the field is not part of a larger, coherent design.

These fields can be from read-only, read-write or
write-1-set/write-1-clear register sets.

Using fields to control the behaviour of hardware local to the kernel
exposing them is likely incorrect. The use-case motivating the fields
feature is for Baseboard Management Controllers (BMCs) to expose policy
controls for booting and running their host systems.

Signed-off-by: Andrew Jeffery 
---

Since RFC v1:

* Describe a 'type' attribute that determines the behaviour of the remaining
  attributes
* Rework paths to point through /sys/devices/platform
* Add a description to the commit message

 .../ABI/testing/sysfs-devices-platform-field  | 95 +++
 MAINTAINERS   |  1 +
 2 files changed, 96 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-platform-field

diff --git a/Documentation/ABI/testing/sysfs-devices-platform-field b/Documentation/ABI/testing/sysfs-devices-platform-field
new file mode 100644
index ..216481d8bc99
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-platform-field
@@ -0,0 +1,95 @@
+This document defines the sysfs attributes provided by the bmc-misc-ctrl
+driver. See Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt for
+an exhaustive list of field definitions.
+
+What:  /sys/devices/platform/...///label
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   open...@lists.ozlabs.org
+Description:
+   An RO attribute providing the name of the field of interest.
+   Corresponds to the value of  in the path
+Users: open...@lists.ozlabs.org
+
+What:  /sys/devices/platform/...///type
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   open...@lists.ozlabs.org
+Description:
+   An RO attribute describing the type of the field. The type
+   takes one of three values:
+
+   'ro':   The field is read-only. The 'value' attribute will be
+   read-only and neither 'set' nor 'clear' attributes will
+   be present.
+   'rw':   The field is read-write. The 'value' attribute will be
+   both readable and writable and neither 'set' nor
+   'clear' attributes will be present. Values written to
+   the 'value' attribute will be atomically updated.
+   'w1sc': The field uses write-1-{set,clear} semantics. The
+   'value' attribute will be read-only, and both 'set' and
+   'clear' attributes will be present to manipulate
+   'value'. 'set' and 'clear' will both be write-only.
+Users: open...@lists.ozlabs.org
+
+What:  /sys/devices/platform/...///mask
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   open...@lists.ozlabs.org
+Description:
+   An RO attribute providing the mask applied to the value
+   read/written from the 'value' attribute.
+Users: open...@lists.ozlabs.org
+
+What:  /sys/devices/platform/...///value
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   open...@lists.ozlabs.org
+Description:
+   The value of the field of interest.
+
+   If the field is exposed from a read-modify-write register this
+   attribute will be RW, where writes will set the field to the
+   value written. Writing values that exceed the width of the
+   field will return an error.
+
+   If the field is exposed from a write-1-set/write-1-clear
+   register this attribute will be RO, and the attributes 'set'
+   and 'clear' will be present as write-only.
+Users: open...@lists.ozlabs.org
+
+What:  /sys/devices/platform/...///set
+Users: open...@lists.ozlabs.org
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   open...@lists.ozlabs.org
+Description:
+   A WO attribute that when written will set bits in the backing
+   register corresponding to set bits in the value written.
+   Register bits corresponding to cleared bits in the written
+   value will remain unchanged.
+
+   This attribute is exposed when the field is identified as being
+   composed of write-1-set and write-1-clear registers.
+
+   Writing values that exceed the width of the mask value will
+   return an error.
+Users: open...@lists.ozlabs.org
+
+What:  /sys/devices/platform/...///clear
+Users: open...@lists.ozlabs.org
+Date:  July, 2018
+KernelVersion: v4.19
+Contact:   
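
The 'w1sc' behaviour described in the ABI text above can be modelled in a
few lines of userspace C. This is only a sketch of the documented semantics,
not the bmc-misc-ctrl implementation: the struct and function names are made
up for illustration, and the hardware is reduced to a single state word.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a write-1-set/write-1-clear register pair. */
struct w1sc_reg {
	uint32_t state;		/* what a read of the register returns */
};

/* Hypothetical field: which bits of the register it covers. */
struct w1sc_field {
	struct w1sc_reg *reg;
	uint32_t mask;		/* field bits within the register */
	unsigned int shift;	/* position of the field's lowest bit */
};

/* Values wider than the field are rejected, as the ABI text requires. */
static int w1sc_check(const struct w1sc_field *f, uint32_t val)
{
	assert(f->mask != 0);
	return (val & ~(f->mask >> f->shift)) ? -EINVAL : 0;
}

/* 'value' attribute: read-only view of the field. */
static uint32_t w1sc_value(const struct w1sc_field *f)
{
	return (f->reg->state & f->mask) >> f->shift;
}

/* 'set' attribute: 1 bits are set, 0 bits leave the register unchanged. */
static int w1sc_set(const struct w1sc_field *f, uint32_t val)
{
	int ret = w1sc_check(f, val);

	if (ret)
		return ret;
	f->reg->state |= val << f->shift;
	return 0;
}

/* 'clear' attribute: 1 bits are cleared, 0 bits leave the register unchanged. */
static int w1sc_clear(const struct w1sc_field *f, uint32_t val)
{
	int ret = w1sc_check(f, val);

	if (ret)
		return ret;
	f->reg->state &= ~(val << f->shift);
	return 0;
}
```

Because neither operation needs to read the register first, there is no
read-modify-write window to protect, which is why 'value' can stay read-only
for this type.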

Re: [PATCH v9 7/7] kselftests: Add tests for the preemptoff and irqsoff tracers

2018-07-10 Thread Joel Fernandes
On Tue, Jul 10, 2018 at 08:49:58PM -0400, Steven Rostedt wrote:
> On Thu, 28 Jun 2018 11:21:49 -0700
> Joel Fernandes  wrote:
> 
> > From: "Joel Fernandes (Google)" 
> > 
> > Here we add unit tests for the preemptoff and irqsoff tracer by using a
> > kernel module introduced previously to trigger atomic sections in the
> > kernel.
> > 
> > Reviewed-by: Masami Hiramatsu 
> > Acked-by: Masami Hiramatsu 
> > Signed-off-by: Joel Fernandes (Google) 
> 
> This looks fine. The only patch that needs to be changed and resent is
> patch 6 and 7. Just send 6, and this one again because it depends on
> patch 6.
> 
> I'll go ahead and apply 1-5 and kick off my other tests.

Sounds good, I'll resend those shortly with the changes you suggested.

Thanks!

- Joel


Re: [PATCH v9 6/7] lib: Add module to simulate atomic sections for testing preemptoff tracers

2018-07-10 Thread Joel Fernandes
On Tue, Jul 10, 2018 at 08:47:07PM -0400, Steven Rostedt wrote:
> On Thu, 28 Jun 2018 11:21:48 -0700
> Joel Fernandes  wrote:
> 
> > From: "Joel Fernandes (Google)" 
> > 
> > In this patch we introduce a test module for simulating a long atomic
> > section in the kernel which the preemptoff or irqsoff tracers can
> > detect. This module is to be used only for test purposes and is default
> > disabled.
> > 
> > Following is the expected output (only briefly shown) that can be parsed
> > to verify that the tracers are working correctly. We will use this from
> > the kselftests in future patches.
> > 
> > For the preemptoff tracer:
> > 
> > echo preemptoff > /d/tracing/current_tracer
> > sleep 1
> > insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=50
> > sleep 1
> > bash-4.3# cat /d/tracing/trace
> > preempt -10662...20us@: atomic_sect_run <-atomic_sect_run
> > preempt -10662...2 52us : atomic_sect_run <-atomic_sect_run
> > preempt -10662...2 54us : tracer_preempt_on <-atomic_sect_run
> > preempt -10662...2 500012us : 
> >  => kthread
> >  => ret_from_fork  
> > 
> > For the irqsoff tracer:
> > 
> > echo irqsoff > /d/tracing/current_tracer
> > sleep 1
> > insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=50
> > sleep 1
> > bash-4.3# cat /d/tracing/trace
> > irq dis -10691d..10us@: atomic_sect_run
> > irq dis -10691d..1 51us : atomic_sect_run
> > irq dis -10691d..1 52us : tracer_hardirqs_on <-atomic_sect_run
> > irq dis -10691d..1 55us : 
> >  => ret_from_fork  
> > 
> > Co-developed-by: Erick Reyes 
> > Cc: Andy Shevchenko 
> > Reviewed-by: Andy Shevchenko 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> >  lib/Kconfig.debug  |  8 
> >  lib/Makefile   |  1 +
> >  lib/test_atomic_sections.c | 77 ++
> 
> I think this code should reside in kernel/trace directory. I already
> have modules there. See the ring_buffer_benchmark code and the test
> module for mmio tracer.

Ok, I'll move it there.

> >  3 files changed, 86 insertions(+)
> >  create mode 100644 lib/test_atomic_sections.c
> > 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 8838d1158d19..622c90e1e066 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1956,6 +1956,14 @@ config TEST_KMOD
> >  
> >   If unsure, say N.
> >  
> > +config TEST_ATOMIC_SECTIONS
> > +   tristate "Simulate atomic sections for tracers to detect"
> 
> Hmm, I don't like this title. It's not very obvious to what it is
> about. What about "Preempt / IRQ disable delay thread to test latency
> tracers" ? Or something along those lines.

Sure, I agree it's better. I'll change the text to that and call the config
TEST_PREEMPT_IRQ_DISABLE_DELAY.

> > +   depends on m
> > +   help
> > + Select this option to build a test module that can help test atomic
> > + sections by simulating them with a duration supplied as a module
> > + parameter. Preempt disable and irq disable modes can be requested.
> 
> "If unsure say N"

Sure, sounds good.

> > +
> >  config TEST_DEBUG_VIRTUAL
> > tristate "Test CONFIG_DEBUG_VIRTUAL feature"
> > depends on DEBUG_VIRTUAL
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 90dc5520b784..7831e747bf72 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -44,6 +44,7 @@ obj-y += string_helpers.o
> >  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> >  obj-y += hexdump.o
> >  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
> > +obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
> >  obj-y += kstrtox.o
> >  obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
> >  obj-$(CONFIG_TEST_BPF) += test_bpf.o
> > diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
> > new file mode 100644
> > index ..1eef518f0974
> > --- /dev/null
> > +++ b/lib/test_atomic_sections.c
> > @@ -0,0 +1,77 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Atomic section emulation test module
> > + *
> > + * Emulates atomic sections by disabling IRQs or preemption
> > + * and doing a busy wait for a specified amount of time.
> > + * This can be used for testing of different atomic section
> > + * tracers such as irqsoff tracers.
> > + *
> > + * (c) 2018. Google LLC
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +static ulong atomic_time = 100;
> > +static char atomic_mode[10] = "irq";
> > +
> > +module_param_named(atomic_time, atomic_time, ulong, S_IRUGO);
> > +module_param_string(atomic_mode, atomic_mode, 10, S_IRUGO);
> > +MODULE_PARM_DESC(atomic_time, "Period in microseconds (100 uS default)");
> 
> It's not a "Period", it's a delay. "Length of time in critical section"

Sure.

> > +MODULE_PARM_DESC(atomic_mode, "Mode of the test such as preempt or irq 
> > (default 
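
For readers unfamiliar with the approach, the core of such a test is a timed
busy wait. The following is a userspace analogue only, assuming the module
spins until the requested delay has elapsed; the real module does this with
interrupts or preemption disabled, which userspace code cannot do. The names
now_us() and busy_wait_us() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Spin without sleeping until at least delay_us microseconds have passed.
 * Returns the time actually spent, which is the "length of time in the
 * critical section" a latency tracer would be expected to report.
 */
static uint64_t busy_wait_us(uint64_t delay_us)
{
	uint64_t start = now_us();
	uint64_t elapsed;

	do {
		elapsed = now_us() - start;
	} while (elapsed < delay_us);

	return elapsed;
}
```

The value is a delay rather than a period: the loop runs once for the
requested length of time and does not repeat.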

[PATCH v4 8/8] mm: Fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL

2018-07-10 Thread Dan Williams
Now that all producers of dev_pagemap instances in the kernel are
properly converted to EXPORT_SYMBOL_GPL, fix up implicit consumers that
interact with dev_pagemap owners via put_page(). To reiterate,
dev_pagemap producers are EXPORT_SYMBOL_GPL because they adopt and
modify core memory management interfaces such that the dev_pagemap owner
can interact with all other kernel infrastructure and sub-systems
(drivers, filesystems, etc...) that consume page structures.

Fixes: e76384884344 ("mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS")
Reported-by: Joe Gorse 
Reported-by: John Hubbard 
Tested-by: Joe Gorse 
Tested-by: John Hubbard 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 16141b608b63..ecee37b44aa1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
 DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
-EXPORT_SYMBOL_GPL(devmap_managed_key);
+EXPORT_SYMBOL(devmap_managed_key);
 static atomic_t devmap_enable;
 
 /*
@@ -371,5 +371,5 @@ void __put_devmap_managed_page(struct page *page)
} else if (!count)
__put_page(page);
 }
-EXPORT_SYMBOL_GPL(__put_devmap_managed_page);
+EXPORT_SYMBOL(__put_devmap_managed_page);
 #endif /* CONFIG_DEV_PAGEMAP_OPS */



[PATCH v4 1/8] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL

2018-07-10 Thread Dan Williams
The devm_memremap_pages() facility is tightly integrated with the
kernel's memory hotplug functionality. It injects an altmap argument
deep into the architecture-specific vmemmap implementation to allow
allocating from specific reserved pages, and it has Linux-specific
assumptions about page structure reference counting relative to
get_user_pages() and get_user_pages_fast(). It was an oversight that
this was not marked EXPORT_SYMBOL_GPL from the outset.

It exposes and relies upon core kernel internal assumptions and will
continue to evolve as memory hotplug and support for new memory types
and topologies is required. Only an in-kernel GPL-only driver is
expected to keep up with this ongoing evolution.

Cc: Michal Hocko 
Cc: "Jérôme Glisse" 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 5857267a4af5..4478e4688bb7 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -257,7 +257,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
pgmap_radix_release(res, pgoff);
return ERR_PTR(error);
 }
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {



[PATCH v4 5/8] mm, hmm: Use devm semantics for hmm_devmem_{add, remove}

2018-07-10 Thread Dan Williams
devm semantics arrange for resources to be torn down when
device-driver-probe fails or when device-driver-release completes.
Similar to devm_memremap_pages() there is no need to support an explicit
remove operation when the users properly adhere to devm semantics.

Note that devm_kzalloc() automatically handles allocating node-local
memory.
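
As a rough illustration of the semantics relied on here, the following
userspace sketch models a per-device action list. It is not the devres
implementation: struct fake_device, the fake_*() helpers and log_release()
are invented for this example. It shows the two properties that matter for
this patch: actions run in reverse registration order at teardown, and the
_or_reset variant runs the action immediately if registration fails, so the
caller can simply return the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8	/* arbitrary capacity for the sketch */

typedef void (*action_fn)(void *data);

/* Hypothetical stand-in for the per-device list of release actions. */
struct fake_device {
	struct {
		action_fn fn;
		void *data;
	} actions[MAX_ACTIONS];
	size_t nr;
};

/* Register an action to run at teardown; fails when the list is full. */
static int fake_add_action(struct fake_device *dev, action_fn fn, void *data)
{
	assert(fn != NULL);
	if (dev->nr == MAX_ACTIONS)
		return -ENOMEM;
	dev->actions[dev->nr].fn = fn;
	dev->actions[dev->nr].data = data;
	dev->nr++;
	return 0;
}

/* As above, but run the action right away if it could not be registered. */
static int fake_add_action_or_reset(struct fake_device *dev, action_fn fn,
				    void *data)
{
	int ret = fake_add_action(dev, fn, data);

	if (ret)
		fn(data);
	return ret;
}

/* Probe failure or driver release: unwind in reverse registration order. */
static void fake_release_all(struct fake_device *dev)
{
	while (dev->nr) {
		dev->nr--;
		dev->actions[dev->nr].fn(dev->actions[dev->nr].data);
	}
}

/* Example action: append a one-character tag so the order is visible. */
static char release_log[MAX_ACTIONS + 1];

static void log_release(void *data)
{
	size_t len = strlen(release_log);

	if (len < MAX_ACTIONS) {
		release_log[len] = *(const char *)data;
		release_log[len + 1] = '\0';
	}
}
```

Registering log_release with "a" and then "b" and calling fake_release_all()
leaves "ba" in release_log. No explicit remove call is needed, which is the
property that lets this patch drop hmm_devmem_remove().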

Reviewed-by: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 include/linux/hmm.h |4 --
 mm/hmm.c|  127 ++-
 2 files changed, 25 insertions(+), 106 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 4c92e3ba3e16..5ec8635f602c 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -499,8 +499,7 @@ struct hmm_devmem {
  * enough and allocate struct page for it.
  *
  * The device driver can wrap the hmm_devmem struct inside a private device
- * driver struct. The device driver must call hmm_devmem_remove() before the
- * device goes away and before freeing the hmm_devmem struct memory.
+ * driver struct.
  */
 struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
  struct device *device,
@@ -508,7 +507,6 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
   struct device *device,
   struct resource *res);
-void hmm_devmem_remove(struct hmm_devmem *devmem);
 
 /*
  * hmm_devmem_page_set_drvdata - set per-page driver data field
diff --git a/mm/hmm.c b/mm/hmm.c
index de7b6bf77201..d65a9419dbc2 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -934,7 +934,6 @@ static void hmm_devmem_ref_exit(void *data)
 
devmem = container_of(ref, struct hmm_devmem, ref);
percpu_ref_exit(ref);
-   devm_remove_action(devmem->device, &hmm_devmem_ref_exit, data);
 }
 
 static void hmm_devmem_ref_kill(void *data)
@@ -945,7 +944,6 @@ static void hmm_devmem_ref_kill(void *data)
devmem = container_of(ref, struct hmm_devmem, ref);
percpu_ref_kill(ref);
wait_for_completion(&devmem->completion);
-   devm_remove_action(devmem->device, &hmm_devmem_ref_kill, data);
 }
 
 static int hmm_devmem_fault(struct vm_area_struct *vma,
@@ -984,7 +982,7 @@ static void hmm_devmem_radix_release(struct resource *resource)
mutex_unlock(&hmm_devmem_lock);
 }
 
-static void hmm_devmem_release(struct device *dev, void *data)
+static void hmm_devmem_release(void *data)
 {
struct hmm_devmem *devmem = data;
struct resource *resource = devmem->resource;
@@ -992,11 +990,6 @@ static void hmm_devmem_release(struct device *dev, void *data)
struct zone *zone;
struct page *page;
 
-   if (percpu_ref_tryget_live(&devmem->ref)) {
-   dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
-   percpu_ref_put(&devmem->ref);
-   }
-
/* pages are dead and unused, undo the arch mapping */
start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT;
npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT;
@@ -1120,19 +1113,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
return ret;
 }
 
-static int hmm_devmem_match(struct device *dev, void *data, void *match_data)
-{
-   struct hmm_devmem *devmem = data;
-
-   return devmem->resource == match_data;
-}
-
-static void hmm_devmem_pages_remove(struct hmm_devmem *devmem)
-{
-   devres_release(devmem->device, &hmm_devmem_release,
-  &hmm_devmem_match, devmem->resource);
-}
-
 /*
  * hmm_devmem_add() - hotplug ZONE_DEVICE memory for device memory
  *
@@ -1160,8 +1140,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 
dev_pagemap_get_ops();
 
-   devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem),
-  GFP_KERNEL, dev_to_node(device));
+   devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL);
if (!devmem)
return ERR_PTR(-ENOMEM);
 
@@ -1175,11 +1154,11 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release,
  0, GFP_KERNEL);
if (ret)
-   goto error_percpu_ref;
+   return ERR_PTR(ret);
 
-   ret = devm_add_action(device, hmm_devmem_ref_exit, &devmem->ref);
+   ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit, &devmem->ref);
if (ret)
-   goto error_devm_add_action;
+   return ERR_PTR(ret);
 
size = ALIGN(size, PA_SECTION_SIZE);
addr = min((unsigned long)iomem_resource.end,
@@ -1199,16 +1178,12 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 
devmem->resource = devm_request_mem_region(device, addr, size,
 

[PATCH v4 7/8] mm, hmm: Mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL

2018-07-10 Thread Dan Williams
The routines hmm_devmem_add() and hmm_devmem_add_resource() duplicated
devm_memremap_pages() and are now simple wrappers around the core
facility to inject a dev_pagemap instance into the global pgmap_radix
and hook page-idle events. The devm_memremap_pages() interface is base
infrastructure for HMM. HMM has more and deeper ties into the kernel
memory management implementation than base ZONE_DEVICE, which is itself
an EXPORT_SYMBOL_GPL facility.

Originally, the HMM page structure creation routines copied the
devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify
the implementations was discussed during the initial review:
http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html
Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA
facility enabled this cleanup to move forward.

In addition to the integration with devm_memremap_pages() HMM depends on
other GPL-only symbols:

mmu_notifier_unregister_no_release
percpu_ref
region_intersects
__class_create

It goes further to consume / indirectly expose functionality that is not
exported to any other driver:

alloc_pages_vma
walk_page_range

HMM depends upon and extends deep core-kernel fundamentals. Mark its
main entry points EXPORT_SYMBOL_GPL().

Cc: "Jérôme Glisse" 
Cc: Logan Gunthorpe 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 mm/hmm.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index c5a6e61ee302..0af3220ffcba 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -1055,7 +1055,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
return result;
return devmem;
 }
-EXPORT_SYMBOL(hmm_devmem_add);
+EXPORT_SYMBOL_GPL(hmm_devmem_add);
 
 struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
   struct device *device,
@@ -1109,7 +1109,7 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
return result;
return devmem;
 }
-EXPORT_SYMBOL(hmm_devmem_add_resource);
+EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
 
 /*
  * A device driver that wants to handle multiple devices memory through a



[PATCH v4 6/8] mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages()

2018-07-10 Thread Dan Williams
Commit e8d513483300 "memremap: change devm_memremap_pages interface to
use struct dev_pagemap" refactored devm_memremap_pages() to allow a
dev_pagemap instance to be supplied. Passing in a dev_pagemap interface
simplifies the design of pgmap type drivers in that they can rely on
container_of() to look up any private data associated with the given
dev_pagemap instance.
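
The container_of() pattern mentioned above can be sketched in userspace
roughly as follows. This is only an illustration, not code from the patch:
struct dev_pagemap_sketch and struct my_devmem are hypothetical stand-ins,
and the macro is a simplified version of the kernel's container_of()
without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct dev_pagemap (the real one has more fields). */
struct dev_pagemap_sketch {
	int type;
};

/* Hypothetical driver structure that embeds the pagemap next to private data. */
struct my_devmem {
	int private_cookie;
	struct dev_pagemap_sketch pagemap;
};

/* Recover the enclosing driver structure from a pointer to the embedded pagemap. */
static struct my_devmem *to_my_devmem(struct dev_pagemap_sketch *pgmap)
{
	return container_of(pgmap, struct my_devmem, pagemap);
}

/* Read driver-private data when a callback is handed only the pagemap pointer. */
static int cookie_from_pagemap(struct dev_pagemap_sketch *pgmap)
{
	return to_my_devmem(pgmap)->private_cookie;
}
```

Because the pagemap is embedded rather than pointed to, a callback that
receives only the pagemap needs no separate lookup table to find its
driver state.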

In addition to the cleanups, this also gives HMM users the
multi-order-radix improvements that arrived with commit ab1b597ee0e4 "mm,
devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups".

As part of the conversion to the devm_memremap_pages() method of
handling the percpu_ref relative to when pages are put, the percpu_ref
completion needs to move to hmm_devmem_ref_exit(). See commit
71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page...")
for details.

Reviewed-by: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 mm/hmm.c |  197 --
 1 file changed, 26 insertions(+), 171 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index d65a9419dbc2..c5a6e61ee302 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -933,17 +933,16 @@ static void hmm_devmem_ref_exit(void *data)
struct hmm_devmem *devmem;
 
devmem = container_of(ref, struct hmm_devmem, ref);
+   wait_for_completion(&devmem->completion);
percpu_ref_exit(ref);
 }
 
-static void hmm_devmem_ref_kill(void *data)
+static void hmm_devmem_ref_kill(struct percpu_ref *ref)
 {
-   struct percpu_ref *ref = data;
struct hmm_devmem *devmem;
 
devmem = container_of(ref, struct hmm_devmem, ref);
percpu_ref_kill(ref);
-   wait_for_completion(&devmem->completion);
 }
 
 static int hmm_devmem_fault(struct vm_area_struct *vma,
@@ -964,155 +963,6 @@ static void hmm_devmem_free(struct page *page, void *data)
devmem->ops->free(devmem, page);
 }
 
-static DEFINE_MUTEX(hmm_devmem_lock);
-static RADIX_TREE(hmm_devmem_radix, GFP_KERNEL);
-
-static void hmm_devmem_radix_release(struct resource *resource)
-{
-   resource_size_t key, align_start, align_size;
-
-   align_start = resource->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(resource_size(resource), PA_SECTION_SIZE);
-
-   mutex_lock(&hmm_devmem_lock);
-   for (key = resource->start;
-key <= resource->end;
-key += PA_SECTION_SIZE)
-   radix_tree_delete(&hmm_devmem_radix, key >> PA_SECTION_SHIFT);
-   mutex_unlock(&hmm_devmem_lock);
-}
-
-static void hmm_devmem_release(void *data)
-{
-   struct hmm_devmem *devmem = data;
-   struct resource *resource = devmem->resource;
-   unsigned long start_pfn, npages;
-   struct zone *zone;
-   struct page *page;
-
-   /* pages are dead and unused, undo the arch mapping */
-   start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT;
-   npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT;
-
-   page = pfn_to_page(start_pfn);
-   zone = page_zone(page);
-
-   mem_hotplug_begin();
-   if (resource->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY)
-   __remove_pages(zone, start_pfn, npages, NULL);
-   else
-   arch_remove_memory(start_pfn << PAGE_SHIFT,
-  npages << PAGE_SHIFT, NULL);
-   mem_hotplug_done();
-
-   hmm_devmem_radix_release(resource);
-}
-
-static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
-{
-   resource_size_t key, align_start, align_size, align_end;
-   struct device *device = devmem->device;
-   int ret, nid, is_ram;
-   unsigned long pfn;
-
-   align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
-   align_size = ALIGN(devmem->resource->start +
-  resource_size(devmem->resource),
-  PA_SECTION_SIZE) - align_start;
-
-   is_ram = region_intersects(align_start, align_size,
-  IORESOURCE_SYSTEM_RAM,
-  IORES_DESC_NONE);
-   if (is_ram == REGION_MIXED) {
-   WARN_ONCE(1, "%s attempted on mixed region %pr\n",
-   __func__, devmem->resource);
-   return -ENXIO;
-   }
-   if (is_ram == REGION_INTERSECTS)
-   return -ENXIO;
-
-   if (devmem->resource->desc == IORES_DESC_DEVICE_PUBLIC_MEMORY)
-   devmem->pagemap.type = MEMORY_DEVICE_PUBLIC;
-   else
-   devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
-
-   devmem->pagemap.res = *devmem->resource;
-   devmem->pagemap.page_fault = hmm_devmem_fault;
-   devmem->pagemap.page_free = hmm_devmem_free;
-   devmem->pagemap.dev = devmem->device;
-   devmem->pagemap.ref = &devmem->ref;
-   devmem->pagemap.data = devmem;
-
-   mutex_lock(&hmm_devmem_lock);
-   align_end = align_start + align_size - 1;
-   for (key = align_start; 

[PATCH v4 1/8] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL

2018-07-10 Thread Dan Williams
The devm_memremap_pages() facility is tightly integrated with the
kernel's memory hotplug functionality. It injects an altmap argument
deep into the architecture-specific vmemmap implementation to allow
allocating from specific reserved pages, and it has Linux-specific
assumptions about page structure reference counting relative to
get_user_pages() and get_user_pages_fast(). It was an oversight that
this was not marked EXPORT_SYMBOL_GPL from the outset.

It exposes and relies upon core kernel internal assumptions and will
continue to evolve as memory hotplug and support for new memory types
and topologies are required. Only an in-kernel GPL-only driver is
expected to keep up with this ongoing evolution.

Cc: Michal Hocko 
Cc: "Jérôme Glisse" 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 5857267a4af5..4478e4688bb7 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -257,7 +257,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
pgmap_radix_release(res, pgoff);
return ERR_PTR(error);
 }
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {




[PATCH v4 3/8] mm, devm_memremap_pages: Fix shutdown handling

2018-07-10 Thread Dan Williams
The last step before devm_memremap_pages() returns success is to
allocate a release action, devm_memremap_pages_release(), to tear the
entire setup down. However, the result from devm_add_action() is not
checked.

Checking the error from devm_add_action() is not enough. The API
currently relies on the fact that the percpu_ref it is using is killed
by the time devm_memremap_pages_release() is run. Rather than
continue this awkward situation, offload the responsibility of killing
the percpu_ref to devm_memremap_pages_release() directly. This allows
devm_memremap_pages() to do the right thing relative to init failures
and shutdown.

Without this change we could fail to register the teardown of
devm_memremap_pages(). The likelihood of hitting this failure is tiny as
small memory allocations almost always succeed. However, the impact of
the failure is large given any future reconfiguration, or
disable/enable, of an nvdimm namespace will fail forever as subsequent
calls to devm_memremap_pages() will fail to set up the pgmap_radix since
there will be stale entries for the physical address range.

An argument could be made to require that the ->kill() operation be set
in the @pgmap arg rather than passed in separately. However, it helps
code readability, tracking the lifetime of a given instance, to be able
to grep the kill routine directly at the devm_memremap_pages() call
site.
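
The failure mode and the fix described above can be modelled with a small
userspace sketch. Everything here is a hypothetical toy (toy_device,
toy_mapping, toy_map and friends are invented names, not kernel APIs); it
only assumes the documented behaviour of devm_add_action_or_reset(), which
runs the action immediately when registration fails, and it is not the
code from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*action_fn)(void *data);

struct action {
	action_fn fn;
	void *data;
};

/* Toy stand-in for a device's devres list; capacity lets registration fail. */
struct toy_device {
	struct action actions[MAX_ACTIONS];
	size_t count;
	size_t capacity;
};

/* Models devm_add_action(): may fail, and on failure nothing is cleaned up. */
static int toy_add_action(struct toy_device *dev, action_fn fn, void *data)
{
	if (dev->count >= dev->capacity || dev->count >= MAX_ACTIONS)
		return -1;
	dev->actions[dev->count].fn = fn;
	dev->actions[dev->count].data = data;
	dev->count++;
	return 0;
}

/* Models devm_add_action_or_reset(): on failure, run the action right away. */
static int toy_add_action_or_reset(struct toy_device *dev, action_fn fn,
				   void *data)
{
	int rc = toy_add_action(dev, fn, data);

	if (rc)
		fn(data);
	return rc;
}

/* Run registered actions in reverse order, as happens at driver detach. */
static void toy_release_all(struct toy_device *dev)
{
	while (dev->count > 0) {
		struct action *a = &dev->actions[--dev->count];

		a->fn(a->data);
	}
}

struct toy_mapping {
	int killed;
	int torn_down;
	void (*kill)(struct toy_mapping *m);
};

/* Hypothetical kill callback supplied by the caller at the mapping call site. */
static void toy_kill(struct toy_mapping *m)
{
	m->killed = 1;
}

/* Release action: kill the reference first, then undo the mapping. */
static void toy_mapping_release(void *data)
{
	struct toy_mapping *m = data;

	m->kill(m);
	m->torn_down = 1;
}

/* Set up a mapping; the teardown is either registered or executed at once. */
static int toy_map(struct toy_device *dev, struct toy_mapping *m,
		   void (*kill)(struct toy_mapping *m))
{
	m->kill = kill;
	m->killed = 0;
	m->torn_down = 0;
	return toy_add_action_or_reset(dev, toy_mapping_release, m);
}
```

In this model a failed registration still leaves no stale state behind,
because the release path both kills the reference and tears the mapping
down, whether it runs at detach time or immediately on failure.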

Cc: 
Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Cc: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Reported-by: Logan Gunthorpe 
Reviewed-by: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 drivers/dax/pmem.c|   10 ++
 drivers/nvdimm/pmem.c |   18 --
 include/linux/memremap.h  |7 +--
 kernel/memremap.c |   36 +++-
 tools/testing/nvdimm/test/iomap.c |   21 ++---
 5 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24fd6af..54cba20c8ba6 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -48,9 +48,8 @@ static void dax_pmem_percpu_exit(void *data)
percpu_ref_exit(ref);
 }
 
-static void dax_pmem_percpu_kill(void *data)
+static void dax_pmem_percpu_kill(struct percpu_ref *ref)
 {
-   struct percpu_ref *ref = data;
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
 
dev_dbg(dax_pmem->dev, "trace\n");
@@ -111,15 +110,10 @@ static int dax_pmem_probe(struct device *dev)
return rc;
 
dax_pmem->pgmap.ref = &dax_pmem->ref;
-   addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
+   addr = devm_memremap_pages(dev, &dax_pmem->pgmap, dax_pmem_percpu_kill);
if (IS_ERR(addr))
return PTR_ERR(addr);
 
-   rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill,
-   &dax_pmem->ref);
-   if (rc)
-   return rc;
-
/* adjust the dax_region resource to the start of data */
memcpy(&res, &dax_pmem->pgmap.res, sizeof(res));
res.start += le64_to_cpu(pfn_sb->dataoff);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 68940356cad3..e8ac6f244d2b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -281,8 +281,11 @@ static void pmem_release_queue(void *q)
blk_cleanup_queue(q);
 }
 
-static void pmem_freeze_queue(void *q)
+static void pmem_freeze_queue(struct percpu_ref *ref)
 {
+   struct request_queue *q;
+
+   q = container_of(ref, typeof(*q), q_usage_counter);
blk_freeze_queue_start(q);
 }
 
@@ -377,7 +380,8 @@ static int pmem_attach_disk(struct device *dev,
if (is_nd_pfn(dev)) {
if (setup_pagemap_fsdax(dev, &pmem->pgmap))
return -ENOMEM;
-   addr = devm_memremap_pages(dev, &pmem->pgmap);
+   addr = devm_memremap_pages(dev, &pmem->pgmap,
+   pmem_freeze_queue);
pfn_sb = nd_pfn->pfn_sb;
pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
pmem->pfn_pad = resource_size(res) -
@@ -390,20 +394,14 @@ static int pmem_attach_disk(struct device *dev,
pmem->pgmap.altmap_valid = false;
if (setup_pagemap_fsdax(dev, &pmem->pgmap))
return -ENOMEM;
-   addr = devm_memremap_pages(dev, &pmem->pgmap);
+   addr = devm_memremap_pages(dev, &pmem->pgmap,
+   pmem_freeze_queue);
pmem->pfn_flags |= PFN_MAP;
memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
} else
addr = devm_memremap(dev, pmem->phys_addr,
pmem->size, ARCH_MEMREMAP_PMEM);
 
-   /*
-* At release time the queue must be frozen before
-* devm_memremap_pages is unwound
-*/
-   if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
-   return -ENOMEM;
-

[PATCH v4 4/8] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support

2018-07-10 Thread Dan Williams
In preparation for consolidating all ZONE_DEVICE enabling via
devm_memremap_pages(), teach it how to handle the constraints of
MEMORY_DEVICE_PRIVATE ranges.

Cc: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Reported-by: Logan Gunthorpe 
Reviewed-by: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |   38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 92b8d7057321..16141b608b63 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -131,8 +131,13 @@ static void devm_memremap_pages_release(void *data)
- align_start;
 
mem_hotplug_begin();
-   arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
-   &pgmap->altmap : NULL);
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   pfn = align_start >> PAGE_SHIFT;
+   __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
+   align_size >> PAGE_SHIFT, NULL);
+   } else
+   arch_remove_memory(align_start, align_size,
+   pgmap->altmap_valid ? &pgmap->altmap : NULL);
mem_hotplug_done();
 
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
@@ -216,11 +221,32 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap,
goto err_pfn_remap;
 
mem_hotplug_begin();
-   error = arch_add_memory(nid, align_start, align_size, altmap, false);
-   if (!error)
-   move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-   align_start >> PAGE_SHIFT,
+
+   /*
+* For device private memory we call add_pages() as we only need to
+* allocate and initialize struct page for the device memory. More-
+* over the device memory is un-accessible thus we do not want to
+* create a linear mapping for the memory like arch_add_memory()
+* would do.
+*
+* For all other device memory types, which are accessible by
+* the CPU, we do want the linear mapping and thus use
+* arch_add_memory().
+*/
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   error = add_pages(nid, align_start >> PAGE_SHIFT,
+   align_size >> PAGE_SHIFT, NULL, false);
+   } else {
+   struct zone *zone;
+
+   error = arch_add_memory(nid, align_start, align_size, altmap,
+   false);
+   zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+   if (!error)
+   move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
align_size >> PAGE_SHIFT, altmap);
+   }
+
mem_hotplug_done();
if (error)
goto err_add_memory;



[PATCH v4 3/8] mm, devm_memremap_pages: Fix shutdown handling

2018-07-10 Thread Dan Williams
The last step before devm_memremap_pages() returns success is to
allocate a release action, devm_memremap_pages_release(), to tear the
entire setup down. However, the result from devm_add_action() is not
checked.

Checking the error from devm_add_action() is not enough. The api
currently relies on the fact that the percpu_ref it is using is killed
by the time the devm_memremap_pages_release() is run. Rather than
continue this awkward situation, offload the responsibility of killing
the percpu_ref to devm_memremap_pages_release() directly. This allows
devm_memremap_pages() to do the right thing  relative to init failures
and shutdown.

Without this change we could fail to register the teardown of
devm_memremap_pages(). The likelihood of hitting this failure is tiny as
small memory allocations almost always succeed. However, the impact of
the failure is large given any future reconfiguration, or
disable/enable, of an nvdimm namespace will fail forever as subsequent
calls to devm_memremap_pages() will fail to setup the pgmap_radix since
there will be stale entries for the physical address range.

An argument could be made to require that the ->kill() operation be set
in the @pgmap arg rather than passed in separately. However, it helps
code readability, tracking the lifetime of a given instance, to be able
to grep the kill routine directly at the devm_memremap_pages() call
site.

Cc: 
Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Cc: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Reported-by: Logan Gunthorpe 
Reviewed-by: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 drivers/dax/pmem.c|   10 ++
 drivers/nvdimm/pmem.c |   18 --
 include/linux/memremap.h  |7 +--
 kernel/memremap.c |   36 +++-
 tools/testing/nvdimm/test/iomap.c |   21 ++---
 5 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24fd6af..54cba20c8ba6 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -48,9 +48,8 @@ static void dax_pmem_percpu_exit(void *data)
percpu_ref_exit(ref);
 }
 
-static void dax_pmem_percpu_kill(void *data)
+static void dax_pmem_percpu_kill(struct percpu_ref *ref)
 {
-   struct percpu_ref *ref = data;
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
 
dev_dbg(dax_pmem->dev, "trace\n");
@@ -111,15 +110,10 @@ static int dax_pmem_probe(struct device *dev)
return rc;
 
dax_pmem->pgmap.ref = _pmem->ref;
-   addr = devm_memremap_pages(dev, _pmem->pgmap);
+   addr = devm_memremap_pages(dev, _pmem->pgmap, dax_pmem_percpu_kill);
if (IS_ERR(addr))
return PTR_ERR(addr);
 
-   rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill,
-   _pmem->ref);
-   if (rc)
-   return rc;
-
/* adjust the dax_region resource to the start of data */
memcpy(, _pmem->pgmap.res, sizeof(res));
res.start += le64_to_cpu(pfn_sb->dataoff);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 68940356cad3..e8ac6f244d2b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -281,8 +281,11 @@ static void pmem_release_queue(void *q)
blk_cleanup_queue(q);
 }
 
-static void pmem_freeze_queue(void *q)
+static void pmem_freeze_queue(struct percpu_ref *ref)
 {
+   struct request_queue *q;
+
+   q = container_of(ref, typeof(*q), q_usage_counter);
blk_freeze_queue_start(q);
 }
 
@@ -377,7 +380,8 @@ static int pmem_attach_disk(struct device *dev,
if (is_nd_pfn(dev)) {
if (setup_pagemap_fsdax(dev, >pgmap))
return -ENOMEM;
-   addr = devm_memremap_pages(dev, >pgmap);
+   addr = devm_memremap_pages(dev, >pgmap,
+   pmem_freeze_queue);
pfn_sb = nd_pfn->pfn_sb;
pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
pmem->pfn_pad = resource_size(res) -
@@ -390,20 +394,14 @@ static int pmem_attach_disk(struct device *dev,
pmem->pgmap.altmap_valid = false;
if (setup_pagemap_fsdax(dev, >pgmap))
return -ENOMEM;
-   addr = devm_memremap_pages(dev, >pgmap);
+   addr = devm_memremap_pages(dev, >pgmap,
+   pmem_freeze_queue);
pmem->pfn_flags |= PFN_MAP;
memcpy(_res, >pgmap.res, sizeof(bb_res));
} else
addr = devm_memremap(dev, pmem->phys_addr,
pmem->size, ARCH_MEMREMAP_PMEM);
 
-   /*
-* At release time the queue must be frozen before
-* devm_memremap_pages is unwound
-*/
-   if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
-   return -ENOMEM;
-

[PATCH v4 4/8] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support

2018-07-10 Thread Dan Williams
In preparation for consolidating all ZONE_DEVICE enabling via
devm_memremap_pages(), teach it how to handle the constraints of
MEMORY_DEVICE_PRIVATE ranges.

Cc: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Reported-by: Logan Gunthorpe 
Reviewed-by: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |   38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 92b8d7057321..16141b608b63 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -131,8 +131,13 @@ static void devm_memremap_pages_release(void *data)
- align_start;
 
mem_hotplug_begin();
-   arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
-   &pgmap->altmap : NULL);
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   pfn = align_start >> PAGE_SHIFT;
+   __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
+   align_size >> PAGE_SHIFT, NULL);
+   } else
+   arch_remove_memory(align_start, align_size,
+   pgmap->altmap_valid ? &pgmap->altmap : NULL);
mem_hotplug_done();
 
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
@@ -216,11 +221,32 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap,
goto err_pfn_remap;
 
mem_hotplug_begin();
-   error = arch_add_memory(nid, align_start, align_size, altmap, false);
-   if (!error)
-   move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-   align_start >> PAGE_SHIFT,
+
+   /*
+* For device private memory we call add_pages() as we only need to
+* allocate and initialize struct page for the device memory. More-
+* over the device memory is un-accessible thus we do not want to
+* create a linear mapping for the memory like arch_add_memory()
+* would do.
+*
+* For all other device memory types, which are accessible by
+* the CPU, we do want the linear mapping and thus use
+* arch_add_memory().
+*/
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+   error = add_pages(nid, align_start >> PAGE_SHIFT,
+   align_size >> PAGE_SHIFT, NULL, false);
+   } else {
+   struct zone *zone;
+
+   error = arch_add_memory(nid, align_start, align_size, altmap,
+   false);
+   zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+   if (!error)
+   move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
align_size >> PAGE_SHIFT, altmap);
+   }
+
mem_hotplug_done();
if (error)
goto err_add_memory;



[PATCH v4 2/8] mm, devm_memremap_pages: Kill mapping "System RAM" support

2018-07-10 Thread Dan Williams
Given the fact that devm_memremap_pages() requires a percpu_ref that is
torn down by devm_memremap_pages_release() the current support for
mapping RAM is broken.

Support for remapping "System RAM" has been broken since the beginning
and there is no existing user of this code path, so just kill the
support and make it an explicit error.

This cleanup also simplifies a follow-on patch to fix the error path
when setting a devm release action for devm_memremap_pages_release()
fails.

Cc: Christoph Hellwig 
Cc: "Jérôme Glisse" 
Cc: Logan Gunthorpe 
Signed-off-by: Dan Williams 
---
 kernel/memremap.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 4478e4688bb7..2d2c901cbe23 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -183,15 +183,12 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
is_ram = region_intersects(align_start, align_size,
IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
-   if (is_ram == REGION_MIXED) {
-   WARN_ONCE(1, "%s attempted on mixed region %pr\n",
-   __func__, res);
+   if (is_ram != REGION_DISJOINT) {
+   WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
+   is_ram == REGION_MIXED ? "mixed" : "ram", res);
return ERR_PTR(-ENXIO);
}
 
-   if (is_ram == REGION_INTERSECTS)
-   return __va(res->start);
-
if (!pgmap->ref)
return ERR_PTR(-EINVAL);
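
The policy change in this patch can be modelled in userspace as below: once
the range has been classified against "System RAM", anything other than
REGION_DISJOINT is rejected with -ENXIO. This is a simplified sketch under
stated assumptions. struct ram_range, classify() and check_remap_allowed()
are hypothetical names, and classify() decides by byte coverage over a
sorted, non-overlapping table, whereas the kernel's region_intersects()
walks the resource tree, so edge cases may differ.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the three results of the kernel's region_intersects(). */
enum region_class { REGION_INTERSECTS, REGION_DISJOINT, REGION_MIXED };

/* Hypothetical, simplified resource: a half-open range [start, end). */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Classify [start, start + size) against a table of RAM ranges.
 * Assumes size > 0 and that the table is sorted and non-overlapping.
 */
static enum region_class classify(const struct ram_range *ram, int n,
				  uint64_t start, uint64_t size)
{
	uint64_t end = start + size;
	uint64_t covered = 0;
	int i;

	for (i = 0; i < n; i++) {
		uint64_t lo = start > ram[i].start ? start : ram[i].start;
		uint64_t hi = end < ram[i].end ? end : ram[i].end;

		if (lo < hi)
			covered += hi - lo;
	}

	if (covered == 0)
		return REGION_DISJOINT;
	return covered == size ? REGION_INTERSECTS : REGION_MIXED;
}

/* Post-patch rule: only a range that touches no RAM may be remapped. */
static int check_remap_allowed(const struct ram_range *ram, int n,
			       uint64_t start, uint64_t size)
{
	return classify(ram, n, start, size) == REGION_DISJOINT ? 0 : -ENXIO;
}
```

Before the patch, the fully covered case returned a direct-map address
instead of an error; the sketch reflects only the new behaviour.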
 



[PATCH v4 0/8] mm: Rework hmm to use devm_memremap_pages and other fixes

2018-07-10 Thread Dan Williams
Changes since v3 [1]:
* Collect Logan's reviewed-by on patch 3
* Collect John's and Joe's tested-by on patch 8
* Update the changelog for patch 1 and 7 to better explain the
  EXPORT_SYMBOL_GPL rationale.
* Update the changelog for patch 2 to clarify that it is a cleanup to
  make the following patch-3 fix easier

[1]: https://lkml.org/lkml/2018/6/19/108

---

Hi Andrew,

As requested, here is a resend of the devm_memremap_pages() fixups.
Please consider for 4.18.

---

As ZONE_DEVICE continues to attract new users, it is imperative to keep
all users consolidated on devm_memremap_pages() as the interface for
creating "device pages".

The devm_memremap_pages() implementation was recently reworked to make
it more generic for arbitrary users, like the proposed peer-to-peer
PCI-E enabling. HMM pre-dated this rework and opted to duplicate
devm_memremap_pages() as hmm_devmem_pages_create().

Rework hmm to be a consumer of devm_memremap_pages() directly and fix up
the licensing on the exports given the deep dependencies and exposure of
core mm internals.

With the exports of devm_memremap_pages() and hmm fixed up we can fix
the regression of inadvertently making put_page() have EXPORT_SYMBOL_GPL
dependencies, which breaks consumers like OpenAFS.

The series was tested against v4.18-rc2.

---

Dan Williams (8):
  mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
  mm, devm_memremap_pages: Kill mapping "System RAM" support
  mm, devm_memremap_pages: Fix shutdown handling
  mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support
  mm, hmm: Use devm semantics for hmm_devmem_{add,remove}
  mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages()
  mm, hmm: Mark hmm_devmem_{add,add_resource} EXPORT_SYMBOL_GPL
  mm: Fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL


 drivers/dax/pmem.c|   10 -
 drivers/nvdimm/pmem.c |   18 +-
 include/linux/hmm.h   |4 
 include/linux/memremap.h  |7 +
 kernel/memremap.c |   89 +++
 mm/hmm.c  |  306 +
 tools/testing/nvdimm/test/iomap.c |   21 ++-
 7 files changed, 132 insertions(+), 323 deletions(-)


[PATCH] ARM: dts: imx: Add ZII SCU3 ESB

2018-07-10 Thread Andrey Smirnov
Add support for the Zodiac Inflight Innovations i.MX51-based SCU3 Ethernet
Switch Board (ESB).

Cc: Fabio Estevam 
Cc: Nikita Yushchenko 
Cc: Lucas Stach 
Cc: cphe...@gmail.com
Cc: Shawn Guo 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: linux-arm-ker...@lists.infradead.org
Cc: devicet...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Fabio Estevam 
Signed-off-by: Andrey Gusakov 
Signed-off-by: Andrey Smirnov 
---

Shawn:

Similarly to [mezz] this is a spin-off of SCU3 (AFAIU, original
submission incorrectly called it SCU2 ESB) ESB board support
originally found in [v0].

Original submission was done by Andrey Gusakov, but he is too busy on
other projects, so the honors of submitting this were delegated to me.

NOTE: RAVE SP ("zii,rave-sp-esb") node is technically supported by
upstream, but it needs some fixes from [rave-sp-fixes] to work
correctly. If you want me to drop that node until [rave-sp-fixes] is
merged, let me know.

NOTE: This patch is currently generated on top of [mezz], but
that can easily be changed should it be accepted first.

Changes since [v0]:

 - Patch converted to be a standalone file not dependent on any
   ZII-specific .dtsi

 - Added RAVE SP node with all the children that are currently
   supported by upstream

 - Dropped ecspi2 node. That node didn't have any child devices in
   [v0] because none of the chips connected to that bus are supported
   upstream. This node can be added later once anything attached to it
   has upstream drivers.

 - Dropped i2c_gpio. That bus was originally added for RAVE SP related
   prototyping and is unused in actual product.

 - Various newline fixes pointed out in [v0]

 - Most of the nodes should be sorted alphabetically (I might have
   missed some)

 - Collected Reviewed-by from Fabio (Fabio, I assumed you won't mind,
   but let me know if you want me to drop it)

[mezz] lkml.kernel.org/r/20180707024902.439-1-andrew.smir...@gmail.com
[v0] 
lkml.kernel.org/r/1529603100-31958-3-git-send-email-andrey.gusa...@cogentembedded.com
[rave-sp-fixes] 
lkml.kernel.org/r/20180707024108.32373-1-andrew.smir...@gmail.com

 arch/arm/boot/dts/Makefile   |   3 +-
 arch/arm/boot/dts/imx51-zii-scu3-esb.dts | 459 +++
 2 files changed, 461 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/imx51-zii-scu3-esb.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 1d6acbab7062..bea41b129493 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -359,7 +359,8 @@ dtb-$(CONFIG_SOC_IMX51) += \
imx51-eukrea-mbimxsd51-baseboard.dtb \
imx51-ts4800.dtb \
imx51-zii-rdu1.dtb \
-   imx51-zii-scu2-mezz.dtb
+   imx51-zii-scu2-mezz.dtb \
+   imx51-zii-scu3-esb.dtb
 dtb-$(CONFIG_SOC_IMX53) += \
imx53-ard.dtb \
imx53-cx9020.dtb \
diff --git a/arch/arm/boot/dts/imx51-zii-scu3-esb.dts 
b/arch/arm/boot/dts/imx51-zii-scu3-esb.dts
new file mode 100644
index ..2941a92d40f1
--- /dev/null
+++ b/arch/arm/boot/dts/imx51-zii-scu3-esb.dts
@@ -0,0 +1,459 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+
+/*
+ * Copyright (C) 2018 Zodiac Inflight Innovations
+ */
+
+/dts-v1/;
+
+#include "imx51.dtsi"
+
+/ {
+   model = "ZII SCU3 ESB board";
+   compatible = "zii,imx51-scu3-esb", "fsl,imx51";
+
+   chosen {
+   stdout-path = 
+   };
+
+   /* Will be filled by the bootloader */
+   memory@9000 {
+   reg = <0x9000 0>;
+   };
+
+   usb_vbus: regulator-usb-vbus {
+   compatible = "regulator-fixed";
+   regulator-name = "usb_vbus";
+   regulator-min-microvolt = <500>;
+   regulator-max-microvolt = <500>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_usb_mmc_reset>;
+   gpio = < 19 GPIO_ACTIVE_LOW>;
+   startup-delay-us = <15>;
+   };
+};
+
+ {
+   cpu-supply = <_reg>;
+};
+
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <_ecspi1>;
+   cs-gpios = < 24 GPIO_ACTIVE_HIGH>,
+  < 25 GPIO_ACTIVE_LOW>;
+   status = "okay";
+
+   pmic@0 {
+   compatible = "fsl,mc13892";
+   pinctrl-names = "default";
+   pinctrl-0 = <_pmic>;
+   spi-max-frequency = <600>;
+   spi-cs-high;
+   reg = <0>;
+   interrupt-parent = <>;
+   interrupts = <8 IRQ_TYPE_LEVEL_HIGH>;
+   fsl,mc13xxx-uses-adc;
+
+   regulators {
+   sw1_reg: sw1 {
+   regulator-min-microvolt = <60>;
+   regulator-max-microvolt = <1375000>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   sw2_reg: sw2 {
+   regulator-min-microvolt = 

[PATCH v3 1/2] dt-bindings: regulator: add DT bindings for UniPhier regulator

2018-07-10 Thread Kunihiko Hayashi
Add DT bindings for regulators implemented in UniPhier SoCs.

Signed-off-by: Kunihiko Hayashi 
---
 .../bindings/regulator/uniphier-regulator.txt  | 57 ++
 1 file changed, 57 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/regulator/uniphier-regulator.txt

diff --git a/Documentation/devicetree/bindings/regulator/uniphier-regulator.txt 
b/Documentation/devicetree/bindings/regulator/uniphier-regulator.txt
new file mode 100644
index 000..c9919f4
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/uniphier-regulator.txt
@@ -0,0 +1,57 @@
+Socionext UniPhier Regulator Controller
+
+This describes the devicetree bindings for regulator controller implemented
+on Socionext UniPhier SoCs.
+
+USB3 Controller
+---
+
+This regulator controls VBUS and belongs to USB3 glue layer. Before using
+the regulator, it is necessary to control the clocks and resets to enable
+this layer. These clocks and resets should be described in each property.
+
+Required properties:
+- compatible: Should be
+"socionext,uniphier-pro4-usb3-regulator" - for Pro4 SoC
+"socionext,uniphier-pxs2-usb3-regulator" - for PXs2 SoC
+"socionext,uniphier-ld20-usb3-regulator" - for LD20 SoC
+"socionext,uniphier-pxs3-usb3-regulator" - for PXs3 SoC
+- reg: Specifies offset and length of the register set for the device.
+- clocks: A list of phandles to the clock gate for USB3 glue layer.
+   According to the clock-names, appropriate clocks are required.
+- clock-names: Should contain
+"gio", "link" - for Pro4 SoC
+"link"- for others
+- resets: A list of phandles to the reset control for USB3 glue layer.
+   According to the reset-names, appropriate resets are required.
+- reset-names: Should contain
+"gio", "link" - for Pro4 SoC
+"link"- for others
+
+See Documentation/devicetree/bindings/regulator/regulator.txt
+for more details about the regulator properties.
+
+Example:
+
+   usb-glue@65b0 {
+   compatible = "socionext,uniphier-ld20-dwc3-glue",
+"simple-mfd";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0 0x65b0 0x400>;
+
+   usb_vbus0: regulators@100 {
+   compatible = "socionext,uniphier-ld20-usb3-regulator";
+   reg = <0x100 0x10>;
+   clock-names = "link";
+   clocks = <_clk 14>;
+   reset-names = "link";
+   resets = <_rst 14>;
+   };
+
+   phy {
+   ...
+   phy-supply = <&usb_vbus0>;
+   };
+   ...
+   };
-- 
2.7.4



[PATCH v3 2/2] regulator: uniphier: add regulator driver for UniPhier SoC

2018-07-10 Thread Kunihiko Hayashi
Initial commit to add support for regulators implemented in UniPhier SoCs.
This supports USB VBUS only.

Signed-off-by: Kunihiko Hayashi 
---
 drivers/regulator/Kconfig  |   8 ++
 drivers/regulator/Makefile |   1 +
 drivers/regulator/uniphier-regulator.c | 213 +
 3 files changed, 222 insertions(+)
 create mode 100644 drivers/regulator/uniphier-regulator.c

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 097f617..7f7ad0d 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -932,6 +932,14 @@ config REGULATOR_TWL4030
  This driver supports the voltage regulators provided by
  this family of companion chips.
 
+config REGULATOR_UNIPHIER
+   tristate "UniPhier regulator driver"
+   depends on ARCH_UNIPHIER || COMPILE_TEST
+   depends on OF && MFD_SYSCON
+   default ARCH_UNIPHIER
+   help
+ Support for regulators implemented on Socionext UniPhier SoCs.
+
 config REGULATOR_VCTRL
tristate "Voltage controlled regulators"
depends on OF
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 590674f..c0dd281 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -116,6 +116,7 @@ obj-$(CONFIG_REGULATOR_TPS65912) += tps65912-regulator.o
 obj-$(CONFIG_REGULATOR_TPS80031) += tps80031-regulator.o
 obj-$(CONFIG_REGULATOR_TPS65132) += tps65132-regulator.o
 obj-$(CONFIG_REGULATOR_TWL4030) += twl-regulator.o twl6030-regulator.o
+obj-$(CONFIG_REGULATOR_UNIPHIER) += uniphier-regulator.o
 obj-$(CONFIG_REGULATOR_VCTRL) += vctrl-regulator.o
 obj-$(CONFIG_REGULATOR_VEXPRESS) += vexpress-regulator.o
 obj-$(CONFIG_REGULATOR_WM831X) += wm831x-dcdc.o
diff --git a/drivers/regulator/uniphier-regulator.c 
b/drivers/regulator/uniphier-regulator.c
new file mode 100644
index 000..abf22ac
--- /dev/null
+++ b/drivers/regulator/uniphier-regulator.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Regulator controller driver for UniPhier SoC
+// Copyright 2018 Socionext Inc.
+// Author: Kunihiko Hayashi 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MAX_CLKS   2
+#define MAX_RSTS   2
+
+struct uniphier_regulator_soc_data {
+   int nclks;
+   const char * const *clock_names;
+   int nrsts;
+   const char * const *reset_names;
+   const struct regulator_desc *desc;
+   const struct regmap_config *regconf;
+};
+
+struct uniphier_regulator_priv {
+   struct clk_bulk_data clk[MAX_CLKS];
+   struct reset_control *rst[MAX_RSTS];
+   const struct uniphier_regulator_soc_data *data;
+};
+
+static struct regulator_ops uniphier_regulator_ops = {
+   .enable = regulator_enable_regmap,
+   .disable= regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
+};
+
+static int uniphier_regulator_probe(struct platform_device *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct uniphier_regulator_priv *priv;
+   struct regulator_config config = { };
+   struct regulator_dev *rdev;
+   struct regmap *regmap;
+   struct resource *res;
+   void __iomem *base;
+   const char *name;
+   int i, ret, nr;
+
+   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   priv->data = of_device_get_match_data(dev);
+   if (WARN_ON(!priv->data))
+   return -EINVAL;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   base = devm_ioremap_resource(dev, res);
+   if (IS_ERR(base))
+   return PTR_ERR(base);
+
+   for (i = 0; i < priv->data->nclks; i++)
+   priv->clk[i].id = priv->data->clock_names[i];
+   ret = devm_clk_bulk_get(dev, priv->data->nclks, priv->clk);
+   if (ret)
+   return ret;
+
+   for (i = 0; i < priv->data->nrsts; i++) {
+   name = priv->data->reset_names[i];
+   priv->rst[i] = devm_reset_control_get_shared(dev, name);
+   if (IS_ERR(priv->rst[i]))
+   return PTR_ERR(priv->rst[i]);
+   }
+
+   ret = clk_bulk_prepare_enable(priv->data->nclks, priv->clk);
+   if (ret)
+   return ret;
+
+   for (nr = 0; nr < priv->data->nrsts; nr++) {
+   ret = reset_control_deassert(priv->rst[nr]);
+   if (ret)
+   goto out_rst_assert;
+   }
+
+   regmap = devm_regmap_init_mmio(dev, base, priv->data->regconf);
+   if (IS_ERR(regmap))
+   return PTR_ERR(regmap);
+
+   config.dev = dev;
+   config.driver_data = priv;
+   config.of_node = dev->of_node;
+   config.regmap = regmap;
+   config.init_data = of_get_regulator_init_data(dev, dev->of_node,
+ priv->data->desc);
+   rdev = devm_regulator_register(dev, 
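
The reset loop in the probe function above keeps its own counter (nr) so
that a failure part-way through can undo exactly the resets that were
already deasserted. The out_rst_assert label itself is not visible in this
excerpt, so the following is only a sketch of the conventional shape of such
an unwind, written for userspace. struct fake_reset, fake_deassert(),
fake_assert() and bring_up() are hypothetical stand-ins, not the driver's
code or the reset controller API.

```c
/* Hypothetical stand-in for a reset line; fail_on_deassert injects an error. */
struct fake_reset {
	int deasserted;
	int fail_on_deassert;
};

static int fake_deassert(struct fake_reset *r)
{
	if (r->fail_on_deassert)
		return -1;
	r->deasserted = 1;
	return 0;
}

static void fake_assert(struct fake_reset *r)
{
	r->deasserted = 0;
}

/*
 * Deassert n resets in order. If one fails, nr is the index of the reset
 * that failed, which equals the number of resets already deasserted, so
 * the unwind loop re-asserts exactly those, newest first.
 */
static int bring_up(struct fake_reset *rst, int n)
{
	int nr, ret = 0;

	for (nr = 0; nr < n; nr++) {
		ret = fake_deassert(&rst[nr]);
		if (ret)
			goto out_rst_assert;
	}
	return 0;

out_rst_assert:
	while (nr--)
		fake_assert(&rst[nr]);
	return ret;
}
```

A real probe function would also have to disable the clocks it enabled
before returning the error; that step is left out of the sketch.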

[PATCH v3 0/2] regulator: add new UniPhier regulator support

2018-07-10 Thread Kunihiko Hayashi
This series adds new regulator controller support for UniPhier SoCs.
Currently it supports only the USB3 VBUS controller, which belongs to the
USB3 glue layer.

Changes since v2:
- replace functions in regulator_ops with helper ones

Changes since v1:
- dt-bindings: add description of glue layer
- replace read/write accesses with regmap_mmio
- rewrite a header with C++ comment style
- reuse soc_data for pxs2, ld20 and pxs3
- replace clk operations with clk_bulk
- move nclks and nrsts to soc_data

Kunihiko Hayashi (2):
  dt-bindings: regulator: add DT bindings for UniPhier regulator
  regulator: uniphier: add regulator driver for UniPhier SoC

 .../bindings/regulator/uniphier-regulator.txt  |  57 ++
 drivers/regulator/Kconfig  |   8 +
 drivers/regulator/Makefile |   1 +
 drivers/regulator/uniphier-regulator.c | 213 +
 4 files changed, 279 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/regulator/uniphier-regulator.txt
 create mode 100644 drivers/regulator/uniphier-regulator.c

-- 
2.7.4



[PATCH v3 0/2] regulator: add new UniPhier regulator support

2018-07-10 Thread Kunihiko Hayashi
This series adds regulator controller support for UniPhier SoCs.
Currently it supports only the USB3 VBUS controller, which belongs to the
USB3 glue layer.

Changes since v2:
- replace functions in regulator_ops with helper ones

Changes since v1:
- dt-bindings: add description of glue layer
- replace read/write accesses with regmap_mmio
- rewrite a header with C++ comment style
- reuse soc_data for pxs2, ld20 and pxs3
- replace clk operations with clk_bulk
- move nclks and nrsts to soc_data

Kunihiko Hayashi (2):
  dt-bindings: regulator: add DT bindings for UniPhier regulator
  regulator: uniphier: add regulator driver for UniPhier SoC

 .../bindings/regulator/uniphier-regulator.txt  |  57 ++
 drivers/regulator/Kconfig  |   8 +
 drivers/regulator/Makefile |   1 +
 drivers/regulator/uniphier-regulator.c | 213 +
 4 files changed, 279 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/regulator/uniphier-regulator.txt
 create mode 100644 drivers/regulator/uniphier-regulator.c

-- 
2.7.4



Re: [PATCH v11 5/8] i2c: fsi: Add transfer implementation

2018-07-10 Thread Joel Stanley
On 11 July 2018 at 06:59, Eddie James  wrote:
>
>
> On 07/10/2018 02:39 PM, Wolfram Sang wrote:
>>>
>>> Sorry, what do you mean "show up as"? Yes, we could first shift all our
>>> addresses in user-space before passing them to the driver, so that the
>>> msg->addr field is exactly what the hardware expects already... This would
>>> be non-trivial for our users considering all our documentation represents
>>> the addresses as the top 7 bits of a byte :(
>>
>> Ah, now I understand the whole situation! Good that I asked. But I have
>> bad news for you:
>>
>> msg->addr is 7 bit and LSB aligned. No way around that. This is how
>> Linux I2C worked since the beginning. You have to adapt to it.
>>
>> I know what you mean. Most documentation I get has the addresses in 8
>> bit, i.e. 7 bit address shifted + RW bit. But sorry again, the Linux
>> representation is different and all drivers have to adhere to that.
>>
>> An EEPROM is at 0x50 in Linux. There is no write addr 0xa0 and read
>> addr 0xa1.
>
>
> OK, I understand! Will test and resend with conforming addressing. Thanks
> for all the feedback!

Nice one Wolfram. I wondered why the standard tools didn't work, but
hadn't gotten around to working out what was going on.

Thanks for taking a close look.

Cheers,

Joel
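
The address arithmetic Wolfram describes can be sketched as follows. The
helper names are hypothetical and are not part of the Linux I2C API; they
only illustrate the relation between the 8-bit "shifted address + RW bit"
form found in many datasheets and the 7-bit, LSB-aligned value expected in
msg->addr.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert an 8-bit datasheet-style address byte
 * (7-bit address in bits 7..1, R/W flag in bit 0) into the 7-bit,
 * LSB-aligned form that msg->addr uses.
 */
static inline uint16_t i2c_addr_from_8bit(unsigned int addr8)
{
	assert(addr8 <= 0xff);
	return (uint16_t)(addr8 >> 1);
}

/*
 * Hypothetical helper: build the first byte seen on the wire from a
 * 7-bit address. A non-zero 'read' sets the R/W bit.
 */
static inline uint8_t i2c_wire_byte(uint16_t addr7, int read)
{
	assert(addr7 <= 0x7f);
	return (uint8_t)((addr7 << 1) | (read ? 1 : 0));
}
```

With these helpers, both 0xa0 and 0xa1 map to the single Linux address 0x50,
which matches the EEPROM example quoted above.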


tools/include/asm-generic/bitsperlong.h:14:2: error: #error Inconsistent word size. Check asm/bitsperlong.h

2018-07-10 Thread kbuild test robot
Hi Alexei,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   1e09177acae32a61586af26d83ca5ef591cdcaf5
commit: 819dd92b9c0bc7bce9097d8c1f14240f471bb386 bpfilter: switch to CC from HOSTCC
date:   5 weeks ago
config: alpha-allyesconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 819dd92b9c0bc7bce9097d8c1f14240f471bb386
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=alpha

All errors (new ones prefixed by >>):

   In file included from tools/include/uapi/asm/bitsperlong.h:17:0,
from /usr/alpha-linux-gnu/include/asm-generic/int-l64.h:11,
from /usr/alpha-linux-gnu/include/asm/types.h:12,
from tools/include/linux/types.h:10,
from ./include/uapi/linux/bpf.h:11,
from net//bpfilter/main.c:9:
>> tools/include/asm-generic/bitsperlong.h:14:2: error: #error Inconsistent word size. Check asm/bitsperlong.h
#error Inconsistent word size. Check asm/bitsperlong.h
 ^

vim +14 tools/include/asm-generic/bitsperlong.h

bb970707 Arnaldo Carvalho de Melo 2016-07-12  12  
2a00f026 Arnaldo Carvalho de Melo 2016-07-13  13  #if BITS_PER_LONG != __BITS_PER_LONG
bb970707 Arnaldo Carvalho de Melo 2016-07-12 @14  #error Inconsistent word size. Check asm/bitsperlong.h
bb970707 Arnaldo Carvalho de Melo 2016-07-12  15  #endif
bb970707 Arnaldo Carvalho de Melo 2016-07-12  16  

:: The code at line 14 was first introduced by commit
:: bb9707077b4ee5f77bc9939b057ff8a0d410296f tools: Copy the bitsperlong.h files from the kernel

:: TO: Arnaldo Carvalho de Melo 
:: CC: Arnaldo Carvalho de Melo 
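
The check at line 14 compares two independently derived values for the width
of long. The report does not say why they differ on alpha, so the following
is only a minimal sketch of the same kind of check, using hypothetical
SKETCH_* names. It assumes a GCC or Clang compiler, which predefines
__CHAR_BIT__ and __SIZEOF_LONG__.

```c
#include <assert.h>
#include <limits.h>

/* Width of long as reported by the compiler's predefined macros. */
#define SKETCH_BITS_PER_LONG (__CHAR_BIT__ * __SIZEOF_LONG__)

/*
 * Hypothetical stand-in for a value taken from a per-architecture header.
 * If such a header falls back to the wrong default for the target, the
 * two values disagree and the #error below fires.
 */
#ifdef __LP64__
#define SKETCH_UAPI_BITS_PER_LONG 64
#else
#define SKETCH_UAPI_BITS_PER_LONG 32
#endif

#if SKETCH_BITS_PER_LONG != SKETCH_UAPI_BITS_PER_LONG
#error Inconsistent word size in sketch
#endif

/* Cross-check the preprocessor value against the real type at compile time. */
_Static_assert(SKETCH_BITS_PER_LONG == (int)(sizeof(long) * CHAR_BIT),
	       "preprocessor and sizeof disagree on the width of long");

static inline int sketch_bits_per_long(void)
{
	return SKETCH_BITS_PER_LONG;
}
```

The point of the sketch is that the comparison happens in the preprocessor,
so a mismatch between the header chosen for the target and the compiler's
own view stops the build before any code is compiled.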

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


linux-next: build warning after merge of the scsi-mkp tree

2018-07-10 Thread Stephen Rothwell
Hi all,

After merging the scsi-mkp tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

In file included from include/linux/spinlock_types.h:18:0,
 from include/linux/spinlock.h:82,
 from drivers/scsi/libfc/fc_rport.c:61:
drivers/scsi/libfc/fc_rport.c: In function 'fc_rport_recv_req':
include/linux/lockdep.h:347:45: warning: 'rdata' may be used uninitialized in this function [-Wmaybe-uninitialized]
 #define lockdep_is_held(lock)  lock_is_held(&(lock)->dep_map)
 ^
drivers/scsi/libfc/fc_rport.c:1832:24: note: 'rdata' was declared here
  struct fc_rport_priv *rdata;
^

Introduced by commit

  ee35624e1e4e ("scsi: libfc: Add lockdep annotations")

It is actually complaining about function fc_rport_recv_plogi_req()
(presumably it is being inlined) and this looks like an actual bug :-(



-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the scsi-mkp tree with Linus' tree

2018-07-10 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the scsi-mkp tree got a conflict in:

  MAINTAINERS

between commit:

  54e45716a84a ("scsi: remove NCR_D700 driver")

from Linus' tree and commit:

  01a21986f8ed ("MAINTAINERS: Add Sam as the maintainer for NCSI")

from the scsi-mkp tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc MAINTAINERS
index e5f8823b5f02,f3de5d37179a..
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@@ -9810,17 -9750,6 +9810,11 @@@ F:drivers/scsi/mac_scsi.
  F:drivers/scsi/sun3_scsi.*
  F:drivers/scsi/sun3_scsi_vme.c
  
- NCR DUAL 700 SCSI DRIVER (MICROCHANNEL)
- M:"James E.J. Bottomley" 
- L:linux-s...@vger.kernel.org
- S:Maintained
- F:drivers/scsi/NCR_D700.*
- 
 +NCSI LIBRARY:
 +M:Samuel Mendoza-Jonas 
 +S:Maintained
 +F:net/ncsi/
 +
  NCT6775 HARDWARE MONITOR DRIVER
  M:Guenter Roeck 
  L:linux-hw...@vger.kernel.org


[PATCH RESEND] KVM: Add coalesced PIO support

2018-07-10 Thread Wanpeng Li
Windows I/O, such as the real-time clock. The address register (port
0x70 in the RTC case) can use coalesced I/O, cutting the number of
userspace exits by half when reading or writing the RTC.

A guest accesses the RTC like this: it writes a register index to port 0x70,
then writes or reads data at port 0x71. The write to port 0x70 only selects
the index and does nothing else, so coalesced I/O can handle it and reduce
the time spent in VM exits.
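
The effect described above can be illustrated with a small userspace
simulation. This is a sketch under stated assumptions, not KVM code: the
struct and function names are hypothetical, the ring size is arbitrary, and
"exits" simply counts how often the simulated guest would have to return to
userspace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_RING_MAX 8 /* arbitrary size for the sketch */

struct sim_entry {
	uint64_t addr;
	uint32_t len;
	uint32_t pio;     /* 1 if the entry is a port I/O write */
	uint8_t  data[8];
};

struct sim_vm {
	uint32_t first, last;            /* ring consumer/producer indices */
	struct sim_entry ring[SIM_RING_MAX];
	unsigned int exits;              /* simulated exits to userspace */
	uint8_t rtc_index;               /* last index written to port 0x70 */
};

/* Queue a write in the ring. Returns 0 if the ring is full. */
static int sim_ring_push(struct sim_vm *vm, uint64_t addr,
			 const void *val, uint32_t len)
{
	uint32_t next = (vm->last + 1) % SIM_RING_MAX;

	assert(len <= sizeof(vm->ring[0].data));
	if (next == vm->first)
		return 0;
	vm->ring[vm->last].addr = addr;
	vm->ring[vm->last].len = len;
	vm->ring[vm->last].pio = 1;
	memcpy(vm->ring[vm->last].data, val, len);
	vm->last = next;
	return 1;
}

/* "Userspace" side: replay queued writes in order. */
static void sim_drain(struct sim_vm *vm)
{
	while (vm->first != vm->last) {
		const struct sim_entry *e = &vm->ring[vm->first];

		if (e->pio && e->addr == 0x70)
			vm->rtc_index = e->data[0];
		vm->first = (vm->first + 1) % SIM_RING_MAX;
	}
}

/* Simulated guest OUT instruction to an RTC port. */
static void sim_outb(struct sim_vm *vm, uint16_t port, uint8_t val,
		     int coalesce_0x70)
{
	if (coalesce_0x70 && port == 0x70 &&
	    sim_ring_push(vm, port, &val, 1))
		return; /* queued, no exit needed */

	vm->exits++;    /* everything else exits to userspace */
	sim_drain(vm);  /* queued writes are replayed first */
	if (port == 0x70)
		vm->rtc_index = val;
}
```

Running ten index/data pairs through sim_outb() gives ten exits with port
0x70 coalesced and twenty without, which is the halving of userspace exits
that the description refers to.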

In our environment, with 12 Windows guests running on a Skylake server:

Before patch:

IO Port Access  Samples  Samples%   Time%    Avg time

0x70:POUT       20675    46.04%     92.72%   67.15us ( +-   7.93% )

After patch:

IO Port Access  Samples  Samples%   Time%    Avg time

0x70:POUT       17509    45.42%     42.08%   6.37us ( +-  20.37% )

Thanks to Peng Hao's initial patch.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Eduardo Habkost 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/00-INDEX |  2 ++
 Documentation/virtual/kvm/api.txt  |  7 +++
 Documentation/virtual/kvm/coalesced-io.txt | 17 +
 include/uapi/linux/kvm.h   |  5 +++--
 virt/kvm/coalesced_mmio.c  | 16 +---
 virt/kvm/kvm_main.c|  2 ++
 6 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/virtual/kvm/coalesced-io.txt

diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX
index 3492458..4160620 100644
--- a/Documentation/virtual/kvm/00-INDEX
+++ b/Documentation/virtual/kvm/00-INDEX
@@ -9,6 +9,8 @@ arm
- internal ABI between the kernel and HYP (for arm/arm64)
 cpuid.txt
- KVM-specific cpuid leaves (x86).
+coalesced-io.txt
+   - Coalesced MMIO and coalesced PIO.
 devices/
- KVM_CAP_DEVICE_CTRL userspace API.
 halt-polling.txt
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d10944e..4190796 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4618,3 +4618,10 @@ This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
 hypercalls:
 HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
 HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
+
+8.19 KVM_CAP_COALESCED_PIO
+
+Architectures: x86, s390, ppc, arm64
+
+This capability indicates that KVM supports coalesced PIO: a write to a
+coalesced-pio region is not reported to userspace until the next
+non-coalesced pio is issued.
diff --git a/Documentation/virtual/kvm/coalesced-io.txt b/Documentation/virtual/kvm/coalesced-io.txt
new file mode 100644
index 000..4a96eaf
--- /dev/null
+++ b/Documentation/virtual/kvm/coalesced-io.txt
@@ -0,0 +1,17 @@
+
+Coalesced MMIO and coalesced PIO can be used to optimize writes to
+simple device registers. Writes to a coalesced-I/O region are not
+reported to userspace until the next non-coalesced I/O is issued,
+in a similar fashion to write combining hardware.  In KVM, coalesced
+writes are handled in the kernel without exits to userspace, and
+are thus several times faster.
+
+Examples of devices that can benefit from coalesced I/O include:
+
+- devices whose memory is accessed with many consecutive writes, for
+  example the EGA/VGA video RAM.
+
+- windows I/O, such as the real-time clock. The address register (port
+  0x70 in the RTC case) can use coalesced I/O, cutting the number of
+  userspace exits by half when reading or writing the RTC.
+
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3..9cc56d3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -420,13 +420,13 @@ struct kvm_run {
 struct kvm_coalesced_mmio_zone {
__u64 addr;
__u32 size;
-   __u32 pad;
+   __u32 pio;
 };
 
 struct kvm_coalesced_mmio {
__u64 phys_addr;
__u32 len;
-   __u32 pad;
+   __u32 pio;
__u8  data[8];
 };
 
@@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
 #define KVM_CAP_HYPERV_TLBFLUSH 155
+#define KVM_CAP_COALESCED_PIO 156
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 9e65feb..fc66a834 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -83,6 +83,7 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
ring->coalesced_mmio[ring->last].phys_addr = addr;
ring->coalesced_mmio[ring->last].len = len;
memcpy(ring->coalesced_mmio[ring->last].data, val, len);
+   ring->coalesced_mmio[ring->last].pio = dev->zone.pio;
smp_wmb();
ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
 spin_unlock(&dev->kvm->ring_lock);
@@ -149,8 +150,12 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
dev->zone = *zone;
 
 mutex_lock(&kvm->slots_lock);
-   ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, zone->addr,
- zone->size, 


Re: [PATCH v13 2/2] Add oom victim's memcg to the oom context information

2018-07-10 Thread 禹舟键
Hi Michal
Sorry, I forgot to update the changelog for the second patch, but
the cpuset information is not missing. Do I still need to make a
v14, or can I just update the changelog for v13?

Thanks


Re: [PATCH v3 1/3] vt: preserve unicode values corresponding to screen characters

2018-07-10 Thread Nicolas Pitre
I am on vacation away from an actual keyboard until next week. Will look at it 
then. 

> On 10 Jul 2018, at 20:52, Kees Cook wrote:
> 
>> On Tue, Jun 26, 2018 at 8:56 PM, Nicolas Pitre wrote:
>> The vt code translates UTF-8 strings into glyph index values and stores
>> those glyph values directly in the screen buffer. Because there can only
>> be at most 512 glyphs, it is impossible to represent most unicode
>> characters, in which case a default glyph (often '?') is displayed
>> instead. The original unicode value is then lost.
>> 
>> This patch implements the basic screen buffer handling to preserve unicode
>> values alongside corresponding display glyphs.  It is not activated by
>> default, meaning that people not relying on that functionality won't get
>> the implied overhead.
>> 
>> Signed-off-by: Nicolas Pitre 
>> Tested-by: Dave Mielke 
>> Acked-by: Adam Borowski 
>> ---
>> drivers/tty/vt/vt.c| 220 +++--
>> include/linux/console_struct.h |   2 +
>> 2 files changed, 211 insertions(+), 11 deletions(-)
>> 
>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
>> index 1eb1a376a0..7b636638b3 100644
>> --- a/drivers/tty/vt/vt.c
>> +++ b/drivers/tty/vt/vt.c
>> [...]
>> +static void vc_uniscr_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
>> +enum con_scroll dir, unsigned int nr)
>> +{
>> +   struct uni_screen *uniscr = get_vc_uniscr(vc);
>> +
>> +   if (uniscr) {
>> +   unsigned int s, d, rescue, clear;
>> +   char32_t *save[nr];
> 
> Can you adjust this to avoid the VLA here? I've almost gotten all VLAs
> removed from the kernel[1], and this is introducing a new one. :)
> 
> Thanks!
> 
> -Kees
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 
>> +
>> +   s = clear = t;
>> +   d = t + nr;
>> +   rescue = b - nr;
>> +   if (dir == SM_UP) {
>> +   swap(s, d);
>> +   swap(clear, rescue);
>> +   }
>> +   memcpy(save, uniscr->lines + rescue, nr * sizeof(*save));
>> +   memmove(uniscr->lines + d, uniscr->lines + s,
>> +   (b - t - nr) * sizeof(*uniscr->lines));
>> +   memcpy(uniscr->lines + clear, save, nr * sizeof(*save));
>> +   vc_uniscr_clear_lines(vc, clear, nr);
>> +   }
>> +}
> 
> 
> -- 
> Kees Cook
> Pixel Security
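
One possible way to drop the VLA that Kees points at is to treat the scroll
as an in-place rotation of the line-pointer array, so that no temporary
"save" buffer is needed. The following is a minimal sketch of that idea
using three reversals. The function names are hypothetical, uint32_t stands
in for char32_t, and this is not presented as the change that was or will be
made to vt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the pointers in lines[lo..hi), hi exclusive. */
static void reverse_lines(uint32_t **lines, size_t lo, size_t hi)
{
	while (lo + 1 < hi) {
		uint32_t *tmp = lines[lo];

		lines[lo++] = lines[--hi];
		lines[hi] = tmp;
	}
}

/*
 * Rotate lines[0..n) left by k positions without any temporary array:
 * reverse the first k entries, reverse the rest, then reverse the whole
 * range. A right rotation by k is a left rotation by n - k.
 */
static void rotate_lines_left(uint32_t **lines, size_t n, size_t k)
{
	assert(n == 0 || lines != NULL);
	if (n == 0)
		return;
	k %= n;
	reverse_lines(lines, 0, k);
	reverse_lines(lines, k, n);
	reverse_lines(lines, 0, n);
}
```

Applied to the region between t and b, a rotation by nr moves the same
pointers as the memcpy/memmove/memcpy sequence in the quoted code, at the
cost of touching each pointer twice instead of once.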


Re: [PATCH 04/14] thermal: ti-soc-thermal: cleanup COUNTER feature handling for OMAP5

2018-07-10 Thread J, KEERTHY




On 5/14/2018 5:12 PM, Bartlomiej Zolnierkiewicz wrote:

OMAP5 sensors don't claim COUNTER feature support (they use
COUNTER_DELAY feature instead) so there is no need to set fields
of struct temp_sensor_registers which are only used for COUNTER
feature.

There should be no functional changes caused by this patch.


Acked-by: Keerthy 



Signed-off-by: Bartlomiej Zolnierkiewicz 
---
  drivers/thermal/ti-soc-thermal/omap5-thermal-data.c | 9 -
  drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h   | 3 ---
  2 files changed, 12 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
index 8191bae..e384be1 100644
--- a/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
@@ -41,9 +41,6 @@
.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
.mask_freeze_mask = OMAP5430_MASK_FREEZE_MPU_MASK,
  
-   .bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-   .counter_mask = OMAP5430_COUNTER_MASK,
-
.bgap_threshold = OMAP5430_BGAP_THRESHOLD_MPU_OFFSET,
.threshold_thot_mask = OMAP5430_T_HOT_MASK,
.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
@@ -77,9 +74,6 @@
.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
.mask_freeze_mask = OMAP5430_MASK_FREEZE_GPU_MASK,
  
-   .bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-   .counter_mask = OMAP5430_COUNTER_MASK,
-
.bgap_threshold = OMAP5430_BGAP_THRESHOLD_GPU_OFFSET,
.threshold_thot_mask = OMAP5430_T_HOT_MASK,
.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
@@ -114,9 +108,6 @@
.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
.mask_freeze_mask = OMAP5430_MASK_FREEZE_CORE_MASK,
  
-   .bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-   .counter_mask = OMAP5430_COUNTER_MASK,
-
.bgap_threshold = OMAP5430_BGAP_THRESHOLD_CORE_OFFSET,
.threshold_thot_mask = OMAP5430_T_HOT_MASK,
.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
diff --git a/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h b/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
index dcbf903..223c7a8 100644
--- a/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
@@ -93,9 +93,6 @@
  #define OMAP5430_MASK_HOT_MPU_MASKBIT(1)
  #define OMAP5430_MASK_COLD_MPU_MASK   BIT(0)
  
-/* OMAP5430.BANDGAP_COUNTER */
-#define OMAP5430_COUNTER_MASK  (0xff << 0)
-
  /* OMAP5430.BANDGAP_THRESHOLD */
  #define OMAP5430_T_HOT_MASK   (0x3ff << 16)
  #define OMAP5430_T_COLD_MASK  (0x3ff << 0)



Re: [PATCH 04/14] thermal: ti-soc-thermal: cleanup COUNTER feature handling for OMAP5

2018-07-10 Thread J, KEERTHY




On 5/14/2018 5:12 PM, Bartlomiej Zolnierkiewicz wrote:

OMAP5 sensors don't claim COUNTER feature support (they use
COUNTER_DELAY feature instead) so there is no need to set fields
of struct temp_sensor_registers which are only used for COUNTER
feature.

There should be no functional changes caused by this patch.


Acked-by: Keerthy 



Signed-off-by: Bartlomiej Zolnierkiewicz 
---
  drivers/thermal/ti-soc-thermal/omap5-thermal-data.c | 9 ---------
  drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h   | 3 ---
  2 files changed, 12 deletions(-)

diff --git a/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
index 8191bae..e384be1 100644
--- a/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap5-thermal-data.c
@@ -41,9 +41,6 @@
 	.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
 	.mask_freeze_mask = OMAP5430_MASK_FREEZE_MPU_MASK,
 
-	.bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-	.counter_mask = OMAP5430_COUNTER_MASK,
-
 	.bgap_threshold = OMAP5430_BGAP_THRESHOLD_MPU_OFFSET,
 	.threshold_thot_mask = OMAP5430_T_HOT_MASK,
 	.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
@@ -77,9 +74,6 @@
 	.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
 	.mask_freeze_mask = OMAP5430_MASK_FREEZE_GPU_MASK,
 
-	.bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-	.counter_mask = OMAP5430_COUNTER_MASK,
-
 	.bgap_threshold = OMAP5430_BGAP_THRESHOLD_GPU_OFFSET,
 	.threshold_thot_mask = OMAP5430_T_HOT_MASK,
 	.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
@@ -114,9 +108,6 @@
 	.mask_counter_delay_mask = OMAP5430_MASK_COUNTER_DELAY_MASK,
 	.mask_freeze_mask = OMAP5430_MASK_FREEZE_CORE_MASK,
 
-	.bgap_counter = OMAP5430_BGAP_CTRL_OFFSET,
-	.counter_mask = OMAP5430_COUNTER_MASK,
-
 	.bgap_threshold = OMAP5430_BGAP_THRESHOLD_CORE_OFFSET,
 	.threshold_thot_mask = OMAP5430_T_HOT_MASK,
 	.threshold_tcold_mask = OMAP5430_T_COLD_MASK,
diff --git a/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h b/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
index dcbf903..223c7a8 100644
--- a/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/omap5xxx-bandgap.h
@@ -93,9 +93,6 @@
 #define OMAP5430_MASK_HOT_MPU_MASK		BIT(1)
 #define OMAP5430_MASK_COLD_MPU_MASK		BIT(0)
 
-/* OMAP5430.BANDGAP_COUNTER */
-#define OMAP5430_COUNTER_MASK			(0xff << 0)
-
 /* OMAP5430.BANDGAP_THRESHOLD */
 #define OMAP5430_T_HOT_MASK			(0x3ff << 16)
 #define OMAP5430_T_COLD_MASK			(0x3ff << 0)



linux-next: manual merge of the driver-core tree with the iommu tree

2018-07-10 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the driver-core tree got a conflict in:

  drivers/iommu/ipmmu-vmsa.c

between commits:

  0b8ac1409641 ("iommu/ipmmu-vmsa: Hook up r8a7796 DT matching code")
  3701c123e1c1 ("iommu/ipmmu-vmsa: Hook up r8a779(70|95) DT matching code")
  98dbffd39a65 ("iommu/ipmmu-vmsa: Hook up R8A77965 DT matching code")

from the iommu tree and commit:

  ac6bbf0cdf42 ("iommu: Remove IOMMU_OF_DECLARE")

from the driver-core tree.

I fixed it up (I removed the new IOMMU_OF_DECLARE() lines) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [RFC][PATCH 16/42] now we can fold open_check_o_direct() into do_dentry_open()

2018-07-10 Thread Linus Torvalds
On Tue, Jul 10, 2018 at 7:59 PM Al Viro  wrote:
>
> Umm...  Something like [..]

Ack.

Linus


Re: [RFC][PATCH 10/11] signal: Push pid type from signal senders down into __send_signal

2018-07-10 Thread Linus Torvalds



On Tue, 10 Jul 2018, Eric W. Biederman wrote:
>
> Use the information we already have to document which signals are sent
> to a group of processes and which signals are sent to a single process
> or a single thread.

Ahh.

This is much nicer than what I was playing with yesterday, trying to 
separate out the "bool group" logic in the signal sending code.

I didn't even think to use the pidtype. 

In my defense, I would never have done this whole pidtype cleanup that 
preceded this patch just to fix that odd fork() thing.

As I started reading this patch series, I went from "this seems a bit 
pointless" to "Ahhh" and as I did that I started liking the series a 
lot more.

My initial reaction was "this seems over-engineered" when I just looked at 
the subject lines in my mailbox.

But as I progressed through the series, I really appreciated it. And this 
"10/11" was when I went "ok, I don't even need to see patch 11, I know 
what he's doing".

Anyway, take that as a long-winded ack for the approach and the 
appreciation of the series.

Of course, that's just reading through the patches, no actual _testing_ of 
them. But it looks good to me.

Thanks,

Linus
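
For readers following along, the idea under discussion (deriving the old
"bool group" decision from the pid type the sender already knows) can be
modelled in a few lines of userspace C. This is only a sketch under stated
assumptions: the enum and function names below are hypothetical stand-ins
and not the kernel's definitions, and the mapping "single-thread type goes
to the per-thread queue, everything else goes to the shared queue" is an
illustrative reading of the patch description, not a copy of the patch.

```c
#include <assert.h>

/* Hypothetical model of the pid types a signal sender can target. */
enum model_pid_type {
	MODEL_PIDTYPE_PID,	/* one thread */
	MODEL_PIDTYPE_TGID,	/* one process (thread group) */
	MODEL_PIDTYPE_PGID,	/* a process group */
	MODEL_PIDTYPE_SID,	/* a session */
};

/* Hypothetical model of where a pending signal is queued. */
enum model_queue {
	MODEL_QUEUE_PER_THREAD,
	MODEL_QUEUE_SHARED,
};

/*
 * Instead of threading a separate "bool group" through every caller,
 * derive the queue from the pid type: only a signal aimed at a single
 * thread stays on that thread's private queue.
 */
enum model_queue model_pending_queue(enum model_pid_type type)
{
	return type == MODEL_PIDTYPE_PID ? MODEL_QUEUE_PER_THREAD
					 : MODEL_QUEUE_SHARED;
}

/* Small usage example exercising each pid type. */
void model_pidtype_usage(void)
{
	assert(model_pending_queue(MODEL_PIDTYPE_PID) == MODEL_QUEUE_PER_THREAD);
	assert(model_pending_queue(MODEL_PIDTYPE_TGID) == MODEL_QUEUE_SHARED);
	assert(model_pending_queue(MODEL_PIDTYPE_PGID) == MODEL_QUEUE_SHARED);
	assert(model_pending_queue(MODEL_PIDTYPE_SID) == MODEL_QUEUE_SHARED);
}
```

The point of the design is that the pid type carries strictly more
information than the boolean, so the boolean can be computed where it is
needed rather than passed down separately.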


[PATCH V2] mmc: core: improve reasonableness of bus width setting for HS400es

2018-07-10 Thread Hongjie Fang
mmc_select_hs400es() calls mmc_select_bus_width(), which falls back to
4-bit transfer mode if it fails to set 8-bit mode. The bus width
should not be set to 4 bits in HS400es.

If setting 8-bit mode fails, return the error directly for HS400es.

Signed-off-by: Hongjie Fang 
---
 drivers/mmc/core/mmc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 4466f5d..4bd6c09 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1021,8 +1021,11 @@ static int mmc_select_bus_width(struct mmc_card *card)
 				 EXT_CSD_BUS_WIDTH,
 				 ext_csd_bits[idx],
 				 card->ext_csd.generic_cmd6_time);
-		if (err)
+		if (err) {
+			if (card->mmc_avail_type & EXT_CSD_CARD_TYPE_HS400ES)
+				return err;
 			continue;
+		}
 
 		bus_width = bus_widths[idx];
 		mmc_set_bus_width(host, bus_width);
-- 
1.9.1
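
The control flow the patch above changes can be illustrated with a small
userspace model. This is a sketch under stated assumptions rather than the
driver code: select_bus_width(), switch_fn, the strict_8bit flag (standing
in for the HS400ES check) and the two helper callbacks are all hypothetical
names introduced here, and -5 is an arbitrary error value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: try to switch the card to the given width. */
typedef int (*switch_fn)(unsigned int width_bits);

/*
 * Try 8-bit first, then 4-bit. With strict_8bit set (modelling a card
 * that advertises HS400ES), a failed switch is returned immediately
 * instead of falling through to the next, narrower width.
 * Returns the selected width in bits, or a negative error code.
 */
int select_bus_width(switch_fn try_switch, bool strict_8bit)
{
	static const unsigned int widths[] = { 8, 4 };
	int err = -1;

	for (size_t idx = 0; idx < sizeof(widths) / sizeof(widths[0]); idx++) {
		err = try_switch(widths[idx]);
		if (err) {
			if (strict_8bit)
				return err;	/* no 4-bit fallback */
			continue;		/* try the next width */
		}
		return (int)widths[idx];
	}
	return err;
}

/* Hypothetical card that accepts any width. */
int switch_always_ok(unsigned int width_bits)
{
	(void)width_bits;
	return 0;
}

/* Hypothetical card that rejects 8-bit but accepts 4-bit. */
int switch_8bit_fails(unsigned int width_bits)
{
	return width_bits == 8 ? -5 : 0;
}

/* Small usage example covering the three interesting cases. */
void bus_width_usage(void)
{
	assert(select_bus_width(switch_always_ok, true) == 8);
	assert(select_bus_width(switch_8bit_fails, false) == 4);
	assert(select_bus_width(switch_8bit_fails, true) == -5);
}
```

Without the strict flag the second case silently ends up at 4 bits, which
is the behaviour the commit message describes as wrong for HS400es.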



Re: [RFC][PATCH 16/42] now we can fold open_check_o_direct() into do_dentry_open()

2018-07-10 Thread Al Viro
On Tue, Jul 10, 2018 at 07:44:59PM -0700, Linus Torvalds wrote:
> I like the patch, I hate the commit message.
> 
> It makes sense right now in this sequence, but I'd really like the
> commit message to say _why_ this sequence led up to this point.
> 
> Right now I still remember you trying this, and having to revert it
> because it didn't work before all the fput/put_filp issues. But a year
> from now? Five years from now?
> 
> So at least a "now that fput() works regardless of how far the open
> got.." kind of explanation, ok?

Umm...  Something like

These checks are better off in do_dentry_open(); the reason we couldn't
put them there used to be that callers couldn't tell what kind of cleanup
would do_dentry_open() failure call for.  Now that we have FMODE_OPENED,
cleanup is the same in all cases - it's simply fput().  So let's fold
that into do_dentry_open(), as Christoph's patch tried to.

perhaps?


Re: [PATCH 2/3] arm64: dts: qcom: sdm845-mtp: Add RPMh VRM/XOB regulators

2018-07-10 Thread kbuild test robot
Hi Douglas,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on agross/for-next]
[also build test ERROR on next-20180710]
[cannot apply to v4.18-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Douglas-Anderson/arm64-dts-sdm845-Add-RPMh-regulators-and-usb/20180711-061052
base:   https://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git for-next
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=arm64

All errors (new ones prefixed by >>):

>> arch/arm64/boot/dts/qcom/sdm845-mtp.dts:10:10: fatal error: dt-bindings/regulator/qcom,rpmh-regulator.h: No such file or directory
    #include <dt-bindings/regulator/qcom,rpmh-regulator.h>
             ^
   compilation terminated.

vim +10 arch/arm64/boot/dts/qcom/sdm845-mtp.dts

     9	
  > 10	#include <dt-bindings/regulator/qcom,rpmh-regulator.h>
    11	#include "sdm845.dtsi"
    12	
12  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [RFC][PATCH 01/42] drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()

2018-07-10 Thread Linus Torvalds
Ok, you didn't seem to have a coverletter email, so I'm just replying
to the first one.

Apart from the couple of totally trivial things I reacted to, this
looks very clean and nice. And now I sat in front of the computer
while reading it, so I could follow along better.

So apart from the small stylistic nits, all Acked-by from me.

Linus

