Re: [PATCH v5 11/13] KVM: s390: implement mediated device open callback

2018-06-05 Thread Pierre Morel

On 30/05/2018 16:33, Tony Krowiak wrote:

On 05/24/2018 05:08 AM, Pierre Morel wrote:

On 23/05/2018 16:45, Tony Krowiak wrote:

On 05/16/2018 04:03 AM, Pierre Morel wrote:

On 07/05/2018 17:11, Tony Krowiak wrote:

Implements the open callback on the mediated matrix device.
The function registers a group notifier to receive notification
of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
the vfio_ap device driver will get access to the guest's
kvm structure. With access to this structure the driver will:

1. Ensure that only one mediated device is opened for the guest


You should explain why.



2. Configure access to the AP devices for the guest.


...snip...

+void kvm_ap_refcount_inc(struct kvm *kvm)
+{
+    atomic_inc(&kvm->arch.crypto.aprefs);
+}
+EXPORT_SYMBOL(kvm_ap_refcount_inc);
+
+void kvm_ap_refcount_dec(struct kvm *kvm)
+{
+    atomic_dec(&kvm->arch.crypto.aprefs);
+}
+EXPORT_SYMBOL(kvm_ap_refcount_dec);


Why are these functions inside kvm-ap?
Will anyone use this outside of vfio-ap?


As I've stated before, I made the choice to contain all interfaces that
access KVM in kvm-ap because I don't think it is appropriate for the device
driver to have "knowledge" of the inner workings of KVM. Why does
it matter whether any entity outside of the vfio_ap device driver calls
these functions? I could ask a similar question if the interfaces were
contained in vfio-ap; what if another device driver needs access to these
interfaces?


This is very driver specific and only used during initialization.
It is not a common property of the cryptographic interface.

I really think you should handle this inside the driver.


We are going to have to agree to disagree on this one. Is it not possible
that future drivers - e.g., when full virtualization is implemented - will
require access to KVM?


I do not think that access to KVM is required for full virtualization.


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



Re: [PATCH] ksys_mount: check for permissions before resource allocation

2018-06-05 Thread David Sterba
On Tue, Jun 05, 2018 at 04:07:15PM +0400, Ilya Matveychikov wrote:
> > On Jun 5, 2018, at 3:53 PM, Al Viro  wrote:
> > On Tue, Jun 05, 2018 at 03:35:55PM +0400, Ilya Matveychikov wrote:
> >>> On Jun 5, 2018, at 3:26 PM, Al Viro  wrote:
> > On Jun 5, 2018, at 6:00 AM, Ilya Matveychikov  wrote:
> > Early check for mount permissions prevents possible allocation of 3
> > pages from the kmalloc() pool by an unprivileged user which can be used for
> > spraying the kernel heap.
> >>> 
> >>> I'm sorry, but there are arseloads of unprivileged syscalls that do the same,
> >>> starting with read() from procfs files.  So what the hell does it buy?
> >> 
> >> Means that if all do the same shit no reason to fix it? Sounds weird...
> > 
> > Fix *what*?  You do realize that there's no permission checks to stop e.g.
> > stat(2) from copying the pathname in, right?  With user-supplied contents,
> > even...
> > 
> > If you depend upon preventing kmalloc'ed temporary allocations filled
> > with user-supplied data, you are screwed, plain and simple.  It really can't
> > be prevented, in a lot of ways that are much less exotic than mount(2).
> > Most of syscall arguments are copied in, before we get any permission
> > checks.  It does happen and it will happen - examining them while they are
> > still in userland is a nightmare in a lot of respects, starting with
> > security.
> 
> I agree that it’s impossible to completely avoid this kind of allocation,
> and examining data in user-land would be a bigger problem than copying
> arguments to the kernel. But aside from that, what’s wrong with the idea of
> having the permission check before doing any kind of work?

Isn't there some sysctl knob or config option to sanitize freed memory?
I doubt that using kzfree everywhere unconditionally would be welcome,
also would not scale as there are too many of them. This IMHO leaves
only the build-time option for those willing to pay the performance hit.

> BTW, sys_umount() has this check in the right place - before doing anything.
> So, why not have the same logic for mount/umount?

What if the check is not equivalent to the one done later? may_mount
needs the namespace; it will be available at umount time but not
necessarily during mount, due to the security hooks.


Re: [PATCH RESEND] lib/test_printf.c: call wait_for_random_bytes() before plain %p tests

2018-06-05 Thread Andy Shevchenko
+Cc: Petr. I suppose test_printf is going through his tree as well as
vsnprintf itself. At least it logically makes sense.

On Mon, Jun 4, 2018 at 2:37 PM, Thierry Escande  wrote:
> If the test_printf module is loaded before the crng is initialized, the
> plain 'p' tests will fail because the printed address will not be hashed
> and the buffer will contain '(ptrval)' instead.
> This patch adds a call to wait_for_random_bytes() before plain 'p' tests
> to make sure the crng is initialized.
>
> Signed-off-by: Thierry Escande 
> Acked-by: Tobin C. Harding 
> ---
>  lib/test_printf.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/lib/test_printf.c b/lib/test_printf.c
> index 71ebfa43ad05..839be9385a8a 100644
> --- a/lib/test_printf.c
> +++ b/lib/test_printf.c
> @@ -260,6 +260,13 @@ plain(void)
>  {
> int err;
>
> +   /*
> +* Make sure crng is ready. Otherwise we get "(ptrval)" instead
> +* of a hashed address when printing '%p' in plain_hash() and
> +* plain_format().
> +*/
> +   wait_for_random_bytes();
> +
> err = plain_hash();
> if (err) {
> pr_warn("plain 'p' does not appear to be hashed\n");
> --
> 2.14.1
>



-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] ksys_mount: check for permissions before resource allocation

2018-06-05 Thread Ilya Matveychikov


> On Jun 5, 2018, at 4:28 PM, David Sterba  wrote:
> 
>> BTW, sys_umount() has this check in the right place - before doing anything.
>> So, why not have the same logic for mount/umount?
> 
> What if the check is not equivalent to the one done later? may_mount
> needs the namespace; it will be available at umount time but not
> necessarily during mount, due to the security hooks.

Might be an issue, you're right. I can't tell for sure as I'm not that
familiar with the linux/fs code.



Re: [PATCH 1/2] printk: remove unused flag LOG_NOCONS

2018-06-05 Thread Petr Mladek
On Mon 2018-06-04 17:33:42, Steven Rostedt wrote:
> On Thu, 31 May 2018 14:16:33 +0200
> Petr Mladek  wrote:
> 
> > >  enum log_flags {
> > > - LOG_NOCONS  = 1,/* already flushed, do not print to console */
> > > - LOG_NEWLINE = 2,/* text ended with a newline */
> > > - LOG_PREFIX  = 4,/* text started with a prefix */
> > > - LOG_CONT= 8,/* text is a fragment of a continuation line */
> > > + LOG_NEWLINE = 1,/* text ended with a newline */
> > > + LOG_PREFIX  = 2,/* text started with a prefix */
> > > + LOG_CONT= 4,/* text is a fragment of a continuation line */
> > >  };  
> > 
> > Please, do not renumber the bits if there is no real need for it.
> > The format of the log buffer is read also by external tool like
> > "crash". It seems that "crash" ignores these flags but...
> 
> Then what's the problem with renumbering? I've renumbered internal flags
> before. No one complained about it.

Steven, did you renumber enum log_flags or flags in a different
subsystem?

Note that struct printk_log is a bit special because it is used by
the "crash" tool to implement the dmesg/log command, while "crash"
does not have special handling for most other internal structures.

I have double-checked the "crash" sources and it ignores these flags
at the moment, but that might change in the future, so I suggest not
renumbering them if there is no real need.

Best Regards,
Petr


Re: [PATCH v3] PCI: Check for PCIe downtraining conditions

2018-06-05 Thread Andy Shevchenko
On Tue, Jun 5, 2018 at 3:27 PM, Andy Shevchenko  wrote:
> On Mon, Jun 4, 2018 at 6:55 PM, Alexandru Gagniuc  wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive nor straightforward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
>
> Have you seen any of my comments?
> For your convenience repeating below.

Ah, found the answer in a pile of emails. OK, I see your point about
helper, though the rest is still applicable here.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH net-next 0/6] use pci_zalloc_consistent

2018-06-05 Thread YueHaibing
On 2018/6/5 20:46, Andy Shevchenko wrote:
> On Tue, Jun 5, 2018 at 3:49 PM, Christoph Hellwig  wrote:
>> On Tue, Jun 05, 2018 at 03:39:16PM +0300, Andy Shevchenko wrote:
>>> On Tue, Jun 5, 2018 at 3:28 PM, YueHaibing  wrote:

>>>
>>> Hmm... Is the PCI case special in any way, or is it a simple wrapper
>>> on top of dma.*alloc()?
>>
>> All drivers should move from pci_dma* to dma_* eventually.  Converting
>> from one flavor of deprecated to another is completely pointless.
> 
> Exactly my impression. Thanks, Christoph for clarification.
> 
> YueHaibing, care to follow what Christoph said and change your series
> accordingly?

ok, will send v2



Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Quentin Perret
On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote:
> Hi Quentin,
> 
> On 05/06/18 11:57, Quentin Perret wrote:
> 
> [...]
> 
> > What about the diff below (just a quick hack to show the idea) applied
> > on tip/sched/core ?
> > 
> > ---8<---
> > diff --git a/kernel/sched/cpufreq_schedutil.c 
> > b/kernel/sched/cpufreq_schedutil.c
> > index a8ba6d1f262a..23a4fb1c2c25 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> > sg_cpu->util_dl  = cpu_util_dl(rq);
> >  }
> >  
> > +unsigned long scale_rt_capacity(int cpu);
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> > struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +   int cpu = sg_cpu->cpu;
> > +   unsigned long util, dl_bw;
> >  
> > if (rq->rt.rt_nr_running)
> > return sg_cpu->max;
> > @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct 
> > sugov_cpu *sg_cpu)
> >  * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  * ready for such an interface. So, we only do the latter for now.
> >  */
> > -   return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +   util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
> 
> Sorry to be pedantic, but this (ATM) includes the DL avg contribution, so,
> since we use max below, we will probably have the same problem that we
> discussed on Vincent's approach (overestimation of DL contribution while
> we could use running_bw).

Ah no, you're right, this isn't great for long running deadline tasks.
We should definitely account for the running_bw here, not the dl avg...

I was trying to address the issue of RT stealing time from CFS here, but
the DL integration isn't quite right with this patch as-is, I agree ...

> 
> > +   util >>= SCHED_CAPACITY_SHIFT;
> > +   util = arch_scale_cpu_capacity(NULL, cpu) - util;
> > +   util += sg_cpu->util_cfs;
> > +   dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> 
> Why this_bw instead of running_bw?

So IIUC, this_bw should basically give you the absolute reservation (== the
sum of runtime/deadline ratios of all DL tasks on that rq).

The reason I added this max is that I'm still not sure I understand
how we can safely drop the freq below that point. If we don't guarantee
to always stay at least at the freq required by DL, aren't we risking
starting a deadline task stuck at a low freq because of rate limiting? In
that case, if the task uses all of its runtime then you might start
missing deadlines ...

My feeling is that the only safe thing to do is to guarantee to never go
below the freq required by DL, and to optimistically add CFS tasks
without raising the OPP if we have good reasons to think that DL is
using less than it reserved (which is what we should get by using
running_bw above, I suppose). Does that make any sense?

Thanks !
Quentin


Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Quentin Perret
On Tuesday 05 Jun 2018 at 13:59:56 (+0200), Vincent Guittot wrote:
> On 5 June 2018 at 12:57, Quentin Perret  wrote:
> > Hi Vincent,
> >
> > On Tuesday 05 Jun 2018 at 10:36:26 (+0200), Vincent Guittot wrote:
> >> Hi Quentin,
> >>
> >> On 25 May 2018 at 15:12, Vincent Guittot  wrote:
> >> > This patchset initially tracked only the utilization of RT rq. During
> >> > OSPM summit, it has been discussed the opportunity to extend it in order
> >> > to get an estimate of the utilization of the CPU.
> >> >
> >> > - Patches 1-3 correspond to the content of patchset v4 and add
> >> >   utilization tracking for rt_rq.
> >> >
> >> > When both cfs and rt tasks compete to run on a CPU, we can see some
> >> > frequency drops with the schedutil governor. In such a case, the cfs_rq's
> >> > utilization no longer reflects the utilization of cfs tasks but only the
> >> > remaining part that is not used by rt tasks. We should monitor the stolen
> >> > utilization and take it into account when selecting the OPP. This patchset
> >> > doesn't change the OPP selection policy for RT tasks, only for CFS tasks.
> >> >
> >> > An rt-app use case which creates an always-running cfs thread and an rt
> >> > thread that wakes up periodically, with both threads pinned on the same
> >> > CPU, shows a lot of frequency switches of the CPU whereas the CPU never
> >> > goes idle during the test. I can share the json file that I used for the
> >> > test if someone is interested.
> >> >
> >> > For a 15-second test on a hikey 6220 (octo-core Cortex-A53 platform),
> >> > the cpufreq statistics output (stats are reset just before the test):
> >> > $ cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >> > without patchset : 1230
> >> > with patchset : 14
> >>
> >> I have attached the rt-app json file that I use for this test
> >
> > Thank you very much ! I did a quick test with a much simpler fix to this
> > RT-steals-time-from-CFS issue using just the existing scale_rt_capacity().
> > I get the following results on Hikey960:
> >
> > Without patch:
> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >12
> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >640
> > With patch
> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >8
> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >12
> >
> > Yes the rt_avg stuff is out of sync with the PELT signal, but do you think
> > this is an actual issue for realistic use-cases ?
> 
> yes I think that it's worth syncing and consolidating things on the
> same metric. The result will be saner and more robust as we will have
> the same behavior.

TBH I'm not disagreeing with that, the PELT-everywhere approach feels
cleaner in a way, but do you have a use-case in mind where this will
definitely help ?

I mean, yes the rt_avg is a slow response to the RT pressure, but is
this always a problem? Ramping down slower might actually help in some
cases, no?

> 
> >
> > What about the diff below (just a quick hack to show the idea) applied
> > on tip/sched/core ?
> >
> > ---8<---
> > diff --git a/kernel/sched/cpufreq_schedutil.c 
> > b/kernel/sched/cpufreq_schedutil.c
> > index a8ba6d1f262a..23a4fb1c2c25 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> > sg_cpu->util_dl  = cpu_util_dl(rq);
> >  }
> >
> > +unsigned long scale_rt_capacity(int cpu);
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> > struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +   int cpu = sg_cpu->cpu;
> > +   unsigned long util, dl_bw;
> >
> > if (rq->rt.rt_nr_running)
> > return sg_cpu->max;
> > @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct 
> > sugov_cpu *sg_cpu)
> >  * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  * ready for such an interface. So, we only do the latter for now.
> >  */
> > -   return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +   util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
> > +   util >>= SCHED_CAPACITY_SHIFT;
> > +   util = arch_scale_cpu_capacity(NULL, cpu) - util;
> > +   util += sg_cpu->util_cfs;
> > +   dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > +
> > +   /* Make sure to always provide the reserved freq to DL. */
> > +   return max(util, dl_bw);
> >  }
> >
> >  static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, 
> > unsigned int flags)
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index f01f0f395f9a..0e87cbe47c8b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7868,7 +7868,7 @@ static inline int get_sd_load_idx(struct sched_domain 
> 

[PATCH 3/9] mfd: cros_ec: Add or fix SPDX-License-Identifier in all files.

2018-06-05 Thread Enric Balletbo i Serra
And get rid of the license text that is no longer necessary. Also fix
the license where the header doesn't match the value in the
MODULE_LICENSE macro. Assuming that the desired license is GPL-2.0+,
all the files are updated to this license version.

Signed-off-by: Enric Balletbo i Serra 
---

 drivers/mfd/cros_ec.c| 27 +--
 drivers/mfd/cros_ec_dev.c| 24 ++--
 drivers/mfd/cros_ec_dev.h| 16 ++--
 drivers/mfd/cros_ec_i2c.c| 19 +--
 drivers/mfd/cros_ec_spi.c| 21 ++---
 include/linux/mfd/cros_ec.h  | 10 +-
 include/linux/mfd/cros_ec_commands.h | 10 +-
 include/linux/mfd/cros_ec_lpc_mec.h  | 12 ++--
 include/linux/mfd/cros_ec_lpc_reg.h  | 12 ++--
 9 files changed, 34 insertions(+), 117 deletions(-)

diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
index 58e05069163e..6f27c0ffb177 100644
--- a/drivers/mfd/cros_ec.c
+++ b/drivers/mfd/cros_ec.c
@@ -1,21 +1,12 @@
-/*
- * ChromeOS EC multi-function device
- *
- * Copyright (C) 2012 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * The ChromeOS EC multi function device is used to mux all the requests
- * to the EC device for its multiple features: keyboard controller,
- * battery charging and regulator control, firmware update.
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// ChromeOS EC multi-function device.
+//
+// Copyright (C) 2012 Google, Inc
+//
+// The ChromeOS EC multi function device is used to mux all the requests
+// to the EC device for its multiple features: keyboard controller,
+// battery charging and regulator control, firmware update.
 
 #include 
 #include 
diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c
index 5e5fbd40e9d0..192f6af6270f 100644
--- a/drivers/mfd/cros_ec_dev.c
+++ b/drivers/mfd/cros_ec_dev.c
@@ -1,21 +1,9 @@
-/*
- * cros_ec_dev - expose the Chrome OS Embedded Controller to user-space
- *
- * Copyright (C) 2014 Google, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see .
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Expose the ChromeOS Embedded Controller to user-space.
+//
+// Copyright (C) 2014 Google, Inc.
+// Author: Bill Richardson 
 
 #include 
 #include 
diff --git a/drivers/mfd/cros_ec_dev.h b/drivers/mfd/cros_ec_dev.h
index 45e9453608c5..f80088ffc3b3 100644
--- a/drivers/mfd/cros_ec_dev.h
+++ b/drivers/mfd/cros_ec_dev.h
@@ -1,20 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
 /*
- * cros_ec_dev - expose the Chrome OS Embedded Controller to userspace
+ * cros_ec_dev - Expose the Chrome OS Embedded Controller to userspace
  *
  * Copyright (C) 2014 Google, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see .
  */
 
 #ifndef _CROS_EC_DEV_H_
diff --git a/drivers/mfd/cros_ec_i2c.c b/drivers/mfd/cros_ec_i2c.c
index ef9b4763356f..891fcfd44f8a 100644
--- a/drivers/mfd/cros_ec_i2c.c
+++ b/drivers/mfd/cros_ec_i2c.c
@@ -1,17 +1,8 @@
-/*
- * ChromeOS EC multi-function device (I2C)
- *
- * Copyright (C) 2012 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in 

[PATCH 4/9] iio: cros_ec: Switch to SPDX identifier.

2018-06-05 Thread Enric Balletbo i Serra
Adopt the SPDX license identifier headers to ease license compliance
management.

Signed-off-by: Enric Balletbo i Serra 
---

 drivers/iio/accel/cros_ec_accel_legacy.c  | 23 --
 .../common/cros_ec_sensors/cros_ec_sensors.c  | 24 ++-
 .../cros_ec_sensors/cros_ec_sensors_core.c| 18 --
 .../cros_ec_sensors/cros_ec_sensors_core.h| 12 ++
 drivers/iio/light/cros_ec_light_prox.c| 18 --
 drivers/iio/pressure/cros_ec_baro.c   | 18 --
 6 files changed, 26 insertions(+), 87 deletions(-)

diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
b/drivers/iio/accel/cros_ec_accel_legacy.c
index 063e89eff791..e7350ddec328 100644
--- a/drivers/iio/accel/cros_ec_accel_legacy.c
+++ b/drivers/iio/accel/cros_ec_accel_legacy.c
@@ -1,21 +1,8 @@
-/*
- * Driver for older Chrome OS EC accelerometer
- *
- * Copyright 2017 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * This driver uses the memory mapper cros-ec interface to communicate
- * with the Chrome OS EC about accelerometer data.
- * Accelerometer access is presented through iio sysfs.
- */
+// SPDX-License-Identifier: GPL-2.0+
+// Driver for older Chrome OS EC accelerometer
+//
+// Copyright (C) 2017 Google, Inc.
+// Author: Gwendal Grignou 
 
 #include 
 #include 
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c 
b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
index 705cb3e72663..3dbc90baf6bb 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c
@@ -1,20 +1,10 @@
-/*
- * cros_ec_sensors - Driver for Chrome OS Embedded Controller sensors.
- *
- * Copyright (C) 2016 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * This driver uses the cros-ec interface to communicate with the Chrome OS
- * EC about sensors data. Data access is presented through iio sysfs.
- */
+// SPDX-License-Identifier: GPL-2.0
+// Driver for Chrome OS Embedded Controller sensors.
+//
+// Copyright (C) 2016 Google, Inc.
+//
+// This driver uses the cros-ec interface to communicate with the ChromeOS
+// EC about sensors data. Data access is presented through iio sysfs.
 
 #include 
 #include 
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c 
b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
index a620eb5ce202..05221994197c 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
@@ -1,17 +1,7 @@
-/*
- * cros_ec_sensors_core - Common function for Chrome OS EC sensor driver.
- *
- * Copyright (C) 2016 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
+// SPDX-License-Identifier: GPL-2.0
+// Common functions for ChromeOS EC sensor driver.
+//
+// Copyright (C) 2016 Google, Inc.
 
 #include 
 #include 
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h 
b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h
index 2edf68dc7336..a9935489030e 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h
@@ -1,16 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * ChromeOS EC sensor hub
  *
- * Copyright (C) 2016 Google, Inc
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  

Re: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-06-05 Thread Adrian Hunter
On 04/06/18 18:35, Aapo Vienamo wrote:
> The sdhci get_max_clock callback is set to sdhci_pltfm_clk_get_max_clock
> and tegra_sdhci_get_max_clock is removed. It appears that the
> sdhci-tegra specific callback was originally introduced due to the
> requirement that the host clock has to be twice the bus clock on DDR50
> mode. As far as I can tell the only effect the removal has on DDR50 mode
> is in cases where the parent clock is unable to supply the requested
> clock rate, causing the DDR50 mode to run at a lower frequency.
> Currently the DDR50 mode isn't enabled on any of the SoCs and would also
> require configuring the SDHCI clock divider register to function
> properly.
> 
> The problem with tegra_sdhci_get_max_clock is that it divides the clock
> rate by two and thus artificially limits the maximum frequency of faster
> signaling modes which don't have the host-bus frequency ratio requirement
> of DDR50 such as SDR104 and HS200. Furthermore, the call to
> clk_round_rate() may return an error which isn't handled by
> tegra_sdhci_get_max_clock.
> 
> Signed-off-by: Aapo Vienamo 

Acked-by: Adrian Hunter 

> ---
>  drivers/mmc/host/sdhci-tegra.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> index 970d38f6..c8745b5 100644
> --- a/drivers/mmc/host/sdhci-tegra.c
> +++ b/drivers/mmc/host/sdhci-tegra.c
> @@ -234,17 +234,6 @@ static void tegra_sdhci_set_uhs_signaling(struct 
> sdhci_host *host,
>   sdhci_set_uhs_signaling(host, timing);
>  }
>  
> -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host *host)
> -{
> - struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> -
> - /*
> -  * DDR modes require the host to run at double the card frequency, so
> -  * the maximum rate we can support is half of the module input clock.
> -  */
> - return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
> -}
> -
>  static void tegra_sdhci_set_tap(struct sdhci_host *host, unsigned int tap)
>  {
>   u32 reg;
> @@ -309,7 +298,7 @@ static const struct sdhci_ops tegra_sdhci_ops = {
>   .platform_execute_tuning = tegra_sdhci_execute_tuning,
>   .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
>   .voltage_switch = tegra_sdhci_voltage_switch,
> - .get_max_clock = tegra_sdhci_get_max_clock,
> + .get_max_clock = sdhci_pltfm_clk_get_max_clock,
>  };
>  
>  static const struct sdhci_pltfm_data sdhci_tegra20_pdata = {
> @@ -357,7 +346,7 @@ static const struct sdhci_ops tegra114_sdhci_ops = {
>   .platform_execute_tuning = tegra_sdhci_execute_tuning,
>   .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
>   .voltage_switch = tegra_sdhci_voltage_switch,
> - .get_max_clock = tegra_sdhci_get_max_clock,
> + .get_max_clock = sdhci_pltfm_clk_get_max_clock,
>  };
>  
>  static const struct sdhci_pltfm_data sdhci_tegra114_pdata = {
> 



Re: [reset-control] How to initialize hardware state with the shared reset line?

2018-06-05 Thread Philipp Zabel
Hi Masahiro,

On Wed, 2018-05-30 at 14:57 +0900, Masahiro Yamada wrote:
> One more thing.
> 
> I want to remove reset_control_reset() entirely.

reset_control_reset is for those cases where "the reset controller
knows" how to reset us. There are hardware reset controllers that can
control a bunch of actual reset signals in the right order and with the
right timings necessary for the connected IP cores by triggering a
single bit.
In that case it wouldn't make much sense to do assert / delay / deassert
in the driver, as the information about the delay is contained in the
reset controller hardware.

> [1] Some reset consumers (e.g. drivers/ata/sata_gemini.c)
> use reset_control_reset() to reset the HW.
> 
> [2] Some reset consumers (e.g. drivers/input/keyboard/tegra-kbc.c)
> use the combination of reset_control_assert() and reset_control_deassert()
> to reset the HW.
> 
> [1] is the only way if the reset controller only supports the pulse reset.
> 
> [2] is the only way if the reset controller only supports the level reset.
> 
> So, this is another strangeness, because the implementation of the
> reset controller affects the reset consumers.
> 
> We do not need [1].
> 
> [2] is more flexible than [1] because hardware usually specifies
> how long the reset line should be kept asserted.

This is not always the case.

> For all reset consumers,
> replace
>   reset_control_reset();
> with
>   reset_control_assert();
>   reset_control_deassert();

To be honest, it doesn't make sense to me. If the intention in the
driver is just to reset our internal state, and we have a system reset
controller that can reset us by writing a single bit, I'd prefer to call
a reset function over two assert/deassert functions, one of which ends
up doing nothing.

How about moving in the other direction, and allowing to replace

reset_control_assert(rstc);
udelay(delay);
reset_control_deassert(rstc);

and variants with calls like

reset_control_reset_udelay(rstc, delay);

? If the reset controller knows better, or can't change the delay in
hardware, it may ignore the delay parameter.
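For illustration, a self-contained model of what such a helper could look like. None of these names are the real kernel API (the ops structure, `assert_line`/`deassert_line` and the `udelay` stub are stand-ins); it only sketches the fallback logic being proposed:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the reset controller framework. */
struct reset_control_ops {
	int (*reset)(void *priv);         /* optional self-timed pulse */
	int (*assert_line)(void *priv);
	int (*deassert_line)(void *priv);
};

struct reset_control {
	const struct reset_control_ops *ops;
	void *priv;
};

static void udelay(unsigned long usecs) { (void)usecs; /* stub */ }

/*
 * If the controller can pulse the line itself, use that and ignore the
 * requested delay; otherwise fall back to assert / delay / deassert in
 * software.
 */
static int reset_control_reset_udelay(struct reset_control *rstc,
				      unsigned long delay_us)
{
	int ret;

	if (rstc->ops->reset)
		return rstc->ops->reset(rstc->priv);

	ret = rstc->ops->assert_line(rstc->priv);
	if (ret)
		return ret;
	udelay(delay_us);
	return rstc->ops->deassert_line(rstc->priv);
}
```

A consumer would then call one function in both cases, and the controller driver decides whether the delay parameter matters.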

> and deprecate reset_control_reset().
>
> I think this is the right thing to do.

I don't think this helps the API: with that change we would have to drop
a guarantee it currently makes. Either it then only works for shared
resets, or we accept that reset_control_assert for exclusive resets no
longer guarantees to return with the reset line asserted.
Also, drivers that deassert in probe and assert in remove would have to
issue the reset in deassert and let assert be the no-op, instead of the
other way around.

> The reset controller side should be implemented like this:
> 
> If your reset controller only supports the pulse reset,
>.deassert hook should be no-op.
>.assert hook should pulse the reset
> 
> Then .reset hook should be removed.

There is hardware where assert, deassert, and reset are three different
operations. See for example the tegra/reset-bpmp.c driver. Both assert /
deassert and module reset messages are part of the firmware ABI.

> Or, we can keep the reset drivers as they are.
> drivers/reset/core.c can take care of the proper fallback logic.

I prefer to keep assert, deassert and reset separate for those cases
where the hardware actually supports both variants.

regards
Philipp


[PATCH] power: supply: tps65217: Switch to SPDX identifier.

2018-06-05 Thread Enric Balletbo i Serra
Adopt the SPDX license identifier headers to ease license compliance
management.

Signed-off-by: Enric Balletbo i Serra 
---

 drivers/power/supply/tps65217_charger.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c 
b/drivers/power/supply/tps65217_charger.c
index 1f5234098aaf..814c2b81fdfe 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -1,20 +1,8 @@
-/*
- * Battery charger driver for TI's tps65217
- *
- * Copyright (c) 2015, Collabora Ltd.
-
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
-
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
-
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
- */
+// SPDX-License-Identifier: GPL-2.0
+// Battery charger driver for TI's tps65217
+//
+// Copyright (C) 2015 Collabora Ltd.
+// Author: Enric Balletbo i Serra 
 
 /*
  * Battery charger driver for TI's tps65217
-- 
2.17.1



Re: [PATCH 1/2] serial: 8250: enable SERIAL_MCTRL_GPIO by default.

2018-06-05 Thread Andy Shevchenko
On Mon, 2018-06-04 at 20:57 +0200, Giulio Benetti wrote:
> Il 04/06/2018 13:49, Andy Shevchenko ha scritto:
> > On Fri, 2018-06-01 at 16:11 +0200, Giulio Benetti wrote:
> > > It can be useful to override 8250 mctrl lines with gpios, for rts
> > > on
> > > rs485 for example, when rts is not mapped correctly to HW RTS pin.
> > > 
> > > Enable SERIAL_MCTRL_GPIO by default.
> > > 
> > 
> > Unfortunately NAK, see
> > 
> > commit 5db4f7f80d165fc9725f356e99feec409e446baa
> > Author: Andy Shevchenko 
> > Date:   Tue Aug 16 15:06:54 2016 +0300
> > 
> >  Revert "tty/serial/8250: use mctrl_gpio helpers"
> > 
> > for the details.
> > 
> > I would love to see a solution that will satisfy everyone, though I
> > have
> > only means to test proposals for now.
> 
> Thanks for pointing me to that.
> I will try to solve the serial breakage on Intel with the already
> existing patches, dropping this one.
> I'm going to try.
> 
> I can't tell whether qemu x86 is enough to reproduce the bug.
> If so I'm going to debug and check what makes the driver fail.

You need to provide an ACPI table in which the UART contains a GpioInt()
or GpioIo() resource.

The GPIO number there is the number of the pin related to the UART's RxD.


> Do you think it makes sense? Would it be accepted after bug fixing?

I can test on our hardware. Can't say about the rest, though.

-- 
Andy Shevchenko 
Intel Finland Oy


[PATCH v3] Make elf2ecoff work on 64bit host machines

2018-06-05 Thread Thomas Bogendoerfer
Use fixed-width integer types for the ECOFF structs to make elf2ecoff
work on 64-bit host machines.
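To see why the change matters: on an LP64 host, `long` is 8 bytes, so the old struct layouts no longer match the 32-bit on-disk ECOFF headers, while the fixed-width types always do. A quick host-side check (illustrative only, using the FILHDR layout from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* The old FILHDR layout, with `long`: on an LP64 host each long is
 * 8 bytes, inflating the struct well past the 20-byte on-disk header. */
struct filehdr_long {
	unsigned short	f_magic;
	unsigned short	f_nscns;
	long		f_timdat;
	long		f_symptr;
	long		f_nsyms;
	unsigned short	f_opthdr;
	unsigned short	f_flags;
};

/* The fixed-width layout from the patch matches the format everywhere. */
struct filehdr_fixed {
	uint16_t	f_magic;
	uint16_t	f_nscns;
	int32_t		f_timdat;
	int32_t		f_symptr;
	int32_t		f_nsyms;
	uint16_t	f_opthdr;
	uint16_t	f_flags;
};
```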

Signed-off-by: Thomas Bogendoerfer 
Reviewed-by: Paul Burton 
---

v3: include stdint.h in ecoff.h
missed one printf format

v2: include stdint.h and use inttypes.h for printf formats

 arch/mips/boot/ecoff.h | 61 --
 arch/mips/boot/elf2ecoff.c | 31 +++
 2 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/arch/mips/boot/ecoff.h b/arch/mips/boot/ecoff.h
index b3e73c22c345..5be79ebfc3f8 100644
--- a/arch/mips/boot/ecoff.h
+++ b/arch/mips/boot/ecoff.h
@@ -2,14 +2,17 @@
 /*
  * Some ECOFF definitions.
  */
+
+#include 
+
 typedef struct filehdr {
-   unsigned short  f_magic;/* magic number */
-   unsigned short  f_nscns;/* number of sections */
-   longf_timdat;   /* time & date stamp */
-   longf_symptr;   /* file pointer to symbolic header */
-   longf_nsyms;/* sizeof(symbolic hdr) */
-   unsigned short  f_opthdr;   /* sizeof(optional hdr) */
-   unsigned short  f_flags;/* flags */
+   uint16_tf_magic;/* magic number */
+   uint16_tf_nscns;/* number of sections */
+   int32_t f_timdat;   /* time & date stamp */
+   int32_t f_symptr;   /* file pointer to symbolic header */
+   int32_t f_nsyms;/* sizeof(symbolic hdr) */
+   uint16_tf_opthdr;   /* sizeof(optional hdr) */
+   uint16_tf_flags;/* flags */
 } FILHDR;
 #define FILHSZ sizeof(FILHDR)
 
@@ -18,32 +21,32 @@ typedef struct filehdr {
 
 typedef struct scnhdr {
chars_name[8];  /* section name */
-   longs_paddr;/* physical address, aliased s_nlib */
-   longs_vaddr;/* virtual address */
-   longs_size; /* section size */
-   longs_scnptr;   /* file ptr to raw data for section */
-   longs_relptr;   /* file ptr to relocation */
-   longs_lnnoptr;  /* file ptr to gp histogram */
-   unsigned short  s_nreloc;   /* number of relocation entries */
-   unsigned short  s_nlnno;/* number of gp histogram entries */
-   longs_flags;/* flags */
+   int32_t s_paddr;/* physical address, aliased s_nlib */
+   int32_t s_vaddr;/* virtual address */
+   int32_t s_size; /* section size */
+   int32_t s_scnptr;   /* file ptr to raw data for section */
+   int32_t s_relptr;   /* file ptr to relocation */
+   int32_t s_lnnoptr;  /* file ptr to gp histogram */
+   uint16_ts_nreloc;   /* number of relocation entries */
+   uint16_ts_nlnno;/* number of gp histogram entries */
+   int32_t s_flags;/* flags */
 } SCNHDR;
 #define SCNHSZ sizeof(SCNHDR)
-#define SCNROUND   ((long)16)
+#define SCNROUND   ((int32_t)16)
 
 typedef struct aouthdr {
-   short   magic;  /* see above*/
-   short   vstamp; /* version stamp*/
-   longtsize;  /* text size in bytes, padded to DW bdry*/
-   longdsize;  /* initialized data "  "*/
-   longbsize;  /* uninitialized data "   " */
-   longentry;  /* entry pt.*/
-   longtext_start; /* base of text used for this file  */
-   longdata_start; /* base of data used for this file  */
-   longbss_start;  /* base of bss used for this file   */
-   longgprmask;/* general purpose register mask*/
-   longcprmask[4]; /* co-processor register masks  */
-   longgp_value;   /* the gp value used for this object*/
+   int16_t magic;  /* see above*/
+   int16_t vstamp; /* version stamp*/
+   int32_t tsize;  /* text size in bytes, padded to DW bdry*/
+   int32_t dsize;  /* initialized data "  "*/
+   int32_t bsize;  /* uninitialized data "   " */
+   int32_t entry;  /* entry pt.*/
+   int32_t text_start; /* base of text used for this file  */
+   int32_t data_start; /* base of data used for this file  */
+   int32_t bss_start;  /* base of bss used for this file   */
+   int32_t gprmask;/* general purpose register mask*/
+   int32_t cprmask[4]; /* co-processor register masks  */
+   int32_t gp_value;   /* the gp value used for this object 

Re: [PATCH 8/9] extcon: usbc-cros-ec: Switch to SPDX identifier.

2018-06-05 Thread Enric Balletbo i Serra



On 05/06/18 11:30, Chanwoo Choi wrote:
> On 2018년 06월 05일 18:22, Enric Balletbo i Serra wrote:
>> Adopt the SPDX license identifier headers to ease license compliance
>> management.
>>
>> Signed-off-by: Enric Balletbo i Serra 
>> ---
>>
>>  drivers/extcon/extcon-usbc-cros-ec.c | 20 +---
>>  1 file changed, 5 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/extcon/extcon-usbc-cros-ec.c 
>> b/drivers/extcon/extcon-usbc-cros-ec.c
>> index 6721ab01fe7d..1a4888f2fe40 100644
>> --- a/drivers/extcon/extcon-usbc-cros-ec.c
>> +++ b/drivers/extcon/extcon-usbc-cros-ec.c
>> @@ -1,18 +1,8 @@
>> -/**
>> - * drivers/extcon/extcon-usbc-cros-ec - ChromeOS Embedded Controller 
>> extcon> - *
>> - * Copyright (C) 2017 Google, Inc
>> - * Author: Benson Leung 
>> - *
>> - * This software is licensed under the terms of the GNU General Public
>> - * License version 2, as published by the Free Software Foundation, and
>> - * may be copied, distributed, and modified under those terms.
>> - *
>> - * This program is distributed in the hope that it will be useful,
>> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> - * GNU General Public License for more details.
>> - */
>> +// SPDX-License-Identifier: GPL-2.0+
>> +// ChromeOS Embedded Controller extcon
>> +//
>> +// Copyright (C) 2012 Google, Inc.
> 
> 2012 is right?

No, sorry, copy paste error.
> The original copyright has '2017' year information.
> 

Should be 2017.

>> +// Author: Benson Leung 
>>  
>>  #include 
>>  #include 
>>
> 
> 


Re: [PATCH 5/9] rtc: cros-ec: Switch to SPDX identifier.

2018-06-05 Thread Alexandre Belloni
On 05/06/2018 11:22:05+0200, Enric Balletbo i Serra wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management.
> 
> Signed-off-by: Enric Balletbo i Serra 
> ---
> 
>  drivers/rtc/rtc-cros-ec.c | 21 +
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
Applied, thanks.

-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH 11/19] sched/numa: Restrict migrating in parallel to the same node.

2018-06-05 Thread Mel Gorman
On Mon, Jun 04, 2018 at 03:30:20PM +0530, Srikar Dronamraju wrote:
> Since task migration under numa balancing can happen in parallel, more
> than one task might choose to move to the same node at the same time.
> This can cause load imbalances at the node level.
> 
> The problem is more likely if there are more cores per node or more
> nodes in system.
> 
> Use a per-node variable to indicate if task migration
> to the node under numa balance is currently active.
> This per-node variable will not track swapping of tasks.
> 
> Testcase   Time:      Min      Max      Avg   StdDev
> numa01.sh  Real:   434.84   676.90   550.53   106.24
> numa01.sh   Sys:   125.98   217.34   179.41    30.35
> numa01.sh  User: 38318.48 53789.56 45864.17  6620.80
> numa02.sh  Real:    60.06    61.27    60.59     0.45
> numa02.sh   Sys:    14.25    17.86    16.09     1.28
> numa02.sh  User:  5190.13  5225.67  5209.24    13.19
> numa03.sh  Real:   748.21   960.25   823.15    73.51
> numa03.sh   Sys:    96.68   122.10   110.42    11.29
> numa03.sh  User: 58222.16 72595.27 63552.22  5048.87
> numa04.sh  Real:   433.08   630.55   499.30    68.15
> numa04.sh   Sys:   245.22   386.75   306.09    63.32
> numa04.sh  User: 35014.68 46151.72 38530.26  3924.65
> numa05.sh  Real:   394.77   410.07   401.41     5.99
> numa05.sh   Sys:   212.40   301.82   256.23    35.41
> numa05.sh  User: 33224.86 34201.40 33665.61   313.40
> 
> Testcase   Time:      Min      Max      Avg   StdDev   %Change
> numa01.sh  Real:   674.61   997.71   785.01   115.95   -29.86%
> numa01.sh   Sys:   180.87   318.88   270.13    51.32   -33.58%
> numa01.sh  User: 54001.30 71936.50 60495.48  6237.55   -24.18%
> numa02.sh  Real:    60.62    62.30    61.46     0.62   -1.415%
> numa02.sh   Sys:    15.01    33.63    24.38     6.81   -34.00%
> numa02.sh  User:  5234.20  5325.60  5276.23    38.85   -1.269%
> numa03.sh  Real:   827.62   946.85   914.48    44.58   -9.987%
> numa03.sh   Sys:   135.55   172.40   158.46    12.75   -30.31%
> numa03.sh  User: 64839.42 73195.44 70805.96  3061.20   -10.24%
> numa04.sh  Real:   481.01   608.76   521.14    47.28   -4.190%
> numa04.sh   Sys:   329.59   373.15   353.20    14.20   -13.33%
> numa04.sh  User: 37649.09 40722.94 38806.32  1072.32   -0.711%
> numa05.sh  Real:   399.21   415.38   409.88     5.54   -2.066%
> numa05.sh   Sys:   319.46   418.57   363.31    37.62   -29.47%
> numa05.sh  User: 33727.77 34732.68 34127.41   447.11   -1.353%
> 
> The commit does cause some performance regression but is needed from
> a fairness/correctness perspective.
> 

While it may cause some performance regressions, it may be due to either
a) some workloads benefit from overloading a node if the tasks idle
frequently or b) the regression may be due to delayed convergence. I'm
not 100% convinced this needs to be done from a correctness point of
view based on just this microbenchmark.

-- 
Mel Gorman
SUSE Labs


Re: [PATCHv2 05/16] atomics: prepare for atomic64_fetch_add_unless()

2018-06-05 Thread Mark Rutland
On Tue, Jun 05, 2018 at 11:26:37AM +0200, Peter Zijlstra wrote:
> On Tue, May 29, 2018 at 04:43:35PM +0100, Mark Rutland wrote:
> >  /**
> > + * atomic64_add_unless - add unless the number is already a given value
> > + * @v: pointer of type atomic_t
> > + * @a: the amount to add to v...
> > + * @u: ...unless v is equal to u.
> > + *
> > + * Atomically adds @a to @v, so long as @v was not already @u.
> > + * Returns non-zero if @v was not @u, and zero otherwise.
> 
> I always get confused by that wording; would something like: "Returns
> true if the addition was done" not be more clear?

Sounds clearer to me; I just stole the wording from the existing
atomic_add_unless().

I guess you'll want similar for the conditional inc/dec ops, e.g.

/**
 * atomic_inc_not_zero - increment unless the number is zero
 * @v: pointer of type atomic_t
 *
 * Atomically increments @v by 1, so long as @v is non-zero.
 * Returns non-zero if @v was non-zero, and zero otherwise.
 */

> > + */
> > +#ifdef atomic64_fetch_add_unless
> > +static inline int atomic64_add_unless(atomic64_t *v, long long a, long 
> > long u)
> 
> Do we want to make that a "bool' return?

I think so -- that's what the instrumented wrappers (and x86) do today
anyhow, and what I ended up using for the generated headers.

I'll spin a prep patch cleaning up the existing fallbacks in
, along with the comment fixup above, then rework the
additions likewise.
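For illustration, a userspace sketch (not the kernel implementation) of the fallback pattern under discussion, using C11 atomics: add_unless() built on fetch_add_unless(), with the bool return proposed above. The names mirror the thread but the code is illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Atomically add @a to @v, so long as @v was not already @u.
 * Returns the old value of @v.
 */
static int64_t atomic64_fetch_add_unless(_Atomic int64_t *v, int64_t a,
					 int64_t u)
{
	int64_t c = atomic_load(v);

	/* On CAS failure, c is reloaded with the current value. */
	while (c != u &&
	       !atomic_compare_exchange_weak(v, &c, c + a))
		;
	return c;
}

/* Returns true if the addition was done. */
static bool atomic64_add_unless(_Atomic int64_t *v, int64_t a, int64_t u)
{
	return atomic64_fetch_add_unless(v, a, u) != u;
}
```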

Thanks,
Mark.


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-05 Thread Bastien Nocera
On Tue, 2018-06-05 at 09:37 +0200, Hans de Goede wrote:
> Hi,
> 
> On 05-06-18 05:18, Chris Chiu wrote:
> > On Tue, Jun 5, 2018 at 10:31 AM, Darren Hart 
> > wrote:
> > > On Mon, Jun 04, 2018 at 04:23:04PM +0200, Hans de Goede wrote:
> > > > Hi,
> > > > 
> > > > On 04-06-18 15:51, Daniel Drake wrote:
> > > > > On Mon, Jun 4, 2018 at 7:22 AM, Hans de Goede wrote:
> > > > > > Is this really a case of the hardware itself processing the
> > > > > > keypress and then changing the brightness *itself* ?
> > > > > > 
> > > > > >   From the "[PATCH 2/2] platform/x86: asus-wmi: Add
> > > > > > keyboard backlight
> > > > > > toggle support" patch I get the impression that the driver
> > > > > > is
> > > > > > modifying the brightness from within the kernel rather than
> > > > > > the keyboard controller or ACPI embedded controller doing it
> > > > > > itself.
> > > > > > 
> > > > > > If that is the case then the right fix is for the driver to
> > > > > > stop
> > > > > > mucking with the brighness itself, it should simply report
> > > > > > the
> > > > > > right keyboard events and export a led interface and then
> > > > > > userspace
> > > > > > will do the right thing (and be able to offer flexible
> > > > > > policies
> > > > > > to the user).
> > > > > 
> > > > > Before this modification, the driver reports the brightness
> > > > > keypresses
> > > > > to userspace and then userspace can respond by changing the
> > > > > brightness
> > > > > level, as you describe.
> > > > > 
> > > > > You are right in that the hardware doesn't change the
> > > > > brightness
> > > > > directly itself, which is the normal usage of
> > > > > LED_BRIGHT_HW_CHANGED.
> > > > > 
> > > > > However this approach was suggested by Benjamin Berg and
> > > > > Bastien
> > > > > Nocera in the thread: Re: [PATCH v2] platform/x86: asus-wmi:
> > > > > Add
> > > > > keyboard backlight toggle support
> > > > > https://marc.info/?l=linux-kernel=152639169210655=2
> > > > > 
> > > > > The issue is that we need to support a new "keyboard
> > > > > backlight
> > > > > brightness cycle" key (in the patch that follows this one)
> > > > > which
> > > > > doesn't fit into any definitions of keys recognised by the
> > > > > kernel and
> > > > > likewise there's no userspace code to handle it.
> > > > > 
> > > > > If preferred we could leave the standard brightness keys
> > > > > behaving as
> > > > > they are (input events) and make the new special key type
> > > > > directly
> > > > > handled by the kernel?
> > > > 
> > > > I'm sorry that Benjamin and Bastien steered you in this
> > > > direction,
> > > > IMHO none of it should be handled in the kernel.
> > > > 
> > > > Anytime any sort of input is directly responded to by the
> > > > kernel
> > > > it is a huge PITA to deal with from userspace. The kernel will
> > > > have
> > > > a simplistic implementation which almost always is wrong.
> > > > 
> > > > Benjamin, remember the pain we went through with rfkill hotkey
> > > > presses being handled in the kernel ?
> > > > 
> > > > And then there is the whole
> > > > acpi_video.brightness_switch_enabled
> > > > debacle, which is an option which defaults to true which causes
> > > > the kernel to handle LCD brightness key presses, which all
> > > > distros
> > > > have been patching to default to off for ages.
> > > > 
> > > > To give a concrete example, we may want to implement software
> > > > dimming / auto-off of the kbd backlight when the no keys are
> > > > touched for x seconds. This would seriously get in the way of
> > > > that.
> > > > 
> > > > So sorry, but NACK to this series.
> > > 
> > > So if instead of modifying the LED value, the kernel platform
> > > drivers
> > > converted the TOGGLE into a cycle even by converting to an UP
> > > event
> > > based on awareness of the plaform specific max value and the read
> > > current value, leaving userspace to act on the TOGGLE/UP events -
> > > would
> > > that be preferable?
> > > 
> > > Something like:
> > > 
> > >  if (code == TOGGLE && ledval < ledmax)
> > >  code = UP;
> > > 
> > >  sparse_keymap_report_event(..., code, ...)
> > > 
> > > }
> > > --
> > > Darren Hart
> > > VMware Open Source Technology Center
> > 
> > That's what I was trying to do in  [PATCH v2] platform/x86: asus-
> > wmi: Add
> > keyboard backlight toggle support. However, that brought another
> > problem
> > discussed in the thread.
> > https://marc.info/?l=linux-kernel=152639169210655=2
> > 
> > So I moved the brightness change in the driver without passing to
> > userspace.
> > Per Hans, it seems there are some other concerns, and I also wonder
> > what happens if the TOGGLE event comes from ASUS HID (asus-hid.c),
> > which also converts and passes the keycode to userspace but has no
> > TOGGLE key support yet. What should we do then?
> 
> As I mentioned in my reply to Darren, there are 2 proper solutions to
> this:
> 
> 1) Make userspace treat KEY_KBDILLUMTOGGLE as a cycle key, 

Re: [PATCH 2/2] mm: don't skip memory guarantee calculations

2018-06-05 Thread Roman Gushchin
On Tue, Jun 05, 2018 at 11:03:49AM +0200, Michal Hocko wrote:
> On Mon 04-06-18 17:23:06, Roman Gushchin wrote:
> [...]
> > I'm happy to discuss any concrete issues/concerns, but I really see
> > no reasons to drop it from the mm tree now and start the discussion
> > from scratch.
> 
> I do not think this is ready for the current merge window. Sorry! I
> would really prefer to see the whole thing in one series to have a
> better picture.

Please provide a specific reason for that. I appreciate your opinion,
but *I think* that alone is not an argument, seriously.

We've been discussing the patchset since March, and I've made several
iterations based on the received feedback. Later we had a separate
discussion with Greg, who proposed an alternative solution which,
unfortunately, had some serious shortcomings. And, as I remember, some
time ago we discussed memory.min with you.
And now you want to start from scratch without providing any reason.
I find it counter-productive, sorry.

Thanks!


Re: [PATCH] irqchip/gic-v3-its: fix ITS queue timeout

2018-06-05 Thread Julien Thierry

Hi Yang,

On 05/06/18 07:30, Yang Yingliang wrote:

When the kernel is booted with maxcpus=x, where 'x' is smaller
than the actual number of CPUs, the TAs of offline CPUs won't
be set in its->collection.

If an LPI is bound to an offline CPU, the sync command will use
a zero TA, which leads to an ITS queue timeout.  Fix this by
choosing an online CPU if there is no online CPU in cpu_mask.

Signed-off-by: Yang Yingliang 
---
  drivers/irqchip/irq-gic-v3-its.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5416f2b..edd92a9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct irq_domain 
*domain,
cpu_mask = cpumask_of_node(its_dev->its->numa_node);
  
  	/* Bind the LPI to the first possible CPU */

-   cpu = cpumask_first(cpu_mask);
+   cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+   if (!cpu_online(cpu))


Testing for cpu being online here feels a bit redundant.

Since cpu is online if the cpumask_first_and returns a valid cpu, I 
think you could replace this test with:


if (cpu >= nr_cpu_ids)


+   cpu = cpumask_first(cpu_online_mask);
its_dev->event_map.col_map[event] = cpu;
irq_data_update_effective_affinity(d, cpumask_of(cpu));
  
@@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct irq_data *d,

bool force)
  {
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-   int cpu = cpumask_first(mask_val);
+   int cpu = cpumask_first_and(mask_val, cpu_online_mask);
+
+   if (!cpu_online(cpu))


Same thing here.


+   cpu = cpumask_first(cpu_online_mask);
  
  	/*

 * Changing affinity is mega expensive, so let's be as lazy as



Cheers,

--
Julien Thierry
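For illustration, a self-contained model of the fallback Julien suggests, using a plain bitmask in place of struct cpumask (all names here are illustrative, not the kernel API):

```c
#include <assert.h>

#define NR_CPU_IDS 8

/* Toy stand-in for cpumask_first(): index of the first set bit,
 * or NR_CPU_IDS if the mask is empty. */
static int mask_first(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Toy stand-in for cpumask_first_and(). */
static int mask_first_and(unsigned int a, unsigned int b)
{
	return mask_first(a & b);
}

/* The suggested pattern: prefer a CPU that is both in the requested
 * mask and online; if none exists (cpu >= nr_cpu_ids), fall back to
 * any online CPU. */
static int pick_target_cpu(unsigned int cpu_mask, unsigned int online_mask)
{
	int cpu = mask_first_and(cpu_mask, online_mask);

	if (cpu >= NR_CPU_IDS)
		cpu = mask_first(online_mask);
	return cpu;
}
```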


Re: [PATCH v5 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2

2018-06-05 Thread Sean Young
On Mon, Jun 04, 2018 at 07:47:30PM +0200, Matthias Reichl wrote:
> Hi Sean,
> 
> I finally found the time to test your patch series and noticed
> 2 issues - comments are inline
> 
> On Sun, May 27, 2018 at 12:24:09PM +0100, Sean Young wrote:
> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> > index eb2c3b6eca7f..d5b35a6ba899 100644
> > --- a/drivers/media/rc/Kconfig
> > +++ b/drivers/media/rc/Kconfig
> > @@ -25,6 +25,19 @@ config LIRC
> >passes raw IR to and from userspace, which is needed for
> >IR transmitting (aka "blasting") and for the lirc daemon.
> >  
> > +config BPF_LIRC_MODE2
> > +   bool "Support for eBPF programs attached to lirc devices"
> > +   depends on BPF_SYSCALL
> > +   depends on RC_CORE=y
> 
> Requiring rc-core to be built into the kernel could become
> problematic in the future for people using media_build.
> 
> Currently the whole media tree (including rc-core) can be built
> as modules so DVB and IR drivers can be replaced by newer versions.
> But with rc-core in the kernel things could easily break if internal
> data structures are changed.
> 
> Maybe we should add a small layer with a stable API/ABI between
> bpf-lirc and rc-core to decouple them? Or would it be possible
> to build rc-core with bpf support as a module?

Unfortunately bpf cannot be built as a module.

> > +   depends on LIRC
> > +   help
> > +  Allow attaching eBPF programs to a lirc device using the bpf(2)
> > +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
> > +  receivers.
> > +
> > +  These eBPF programs can be used to decode IR into scancodes, for
> > +  IR protocols not supported by the kernel decoders.
> > +
> >  menuconfig RC_DECODERS
> > bool "Remote controller decoders"
> > depends on RC_CORE
> > [...]
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 388d4feda348..3c104113d040 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -11,6 +11,7 @@
> >   */
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1578,6 +1579,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> > case BPF_SK_SKB_STREAM_PARSER:
> > case BPF_SK_SKB_STREAM_VERDICT:
> > return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> > +   case BPF_LIRC_MODE2:
> > +   return lirc_prog_attach(attr);
> > default:
> > return -EINVAL;
> > }
> > @@ -1648,6 +1651,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
> > case BPF_SK_SKB_STREAM_PARSER:
> > case BPF_SK_SKB_STREAM_VERDICT:
> > return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
> > +   case BPF_LIRC_MODE2:
> > +   return lirc_prog_detach(attr);
> > default:
> > return -EINVAL;
> > }
> > @@ -1695,6 +1700,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
> > case BPF_CGROUP_SOCK_OPS:
> > case BPF_CGROUP_DEVICE:
> > break;
> > +   case BPF_LIRC_MODE2:
> > +   return lirc_prog_query(attr, uattr);
> 
> When testing this patch series I was wondering why I always got
> -EINVAL when trying to query the registered programs.
> 
> Closer inspection revealed that bpf_prog_attach/detach/query and
> calls to them in the bpf syscall are in "#ifdef CONFIG_CGROUP_BPF"
> blocks - and as I built the kernel without CONFIG_CGROUP_BPF
> BPF_PROG_ATTACH/DETACH/QUERY weren't handled in the syscall switch
> and I got -EINVAL from the bpf syscall function.
> 
> I haven't checked in detail yet, but it looks to me like
> bpf_prog_attach/detach/query could always be built (or when
> either cgroup bpf or lirc bpf are enabled) and the #ifdefs moved
> inside the switch(). So lirc bpf could be used without cgroup bpf.
> Or am I missing something?

You are right, this feature depends on CONFIG_CGROUP_BPF right now. This
also affects the BPF_SK_MSG_VERDICT, BPF_SK_SKB_STREAM_VERDICT and
BPF_SK_SKB_STREAM_PARSER type bpf attachments, and as far as I know
these shouldn't depend on CONFIG_CGROUP_BPF either.


Sean


[PATCH] perf stat: Display user and system time

2018-06-05 Thread Jiri Olsa
Add support for reading rusage data once the
workload is finished and displaying the system/user
time values:

  $ perf stat --null ./perf bench sched pipe
  ...

   Performance counter stats for './perf bench sched pipe':

   5.342599256 seconds time elapsed

   2.544434000 seconds user
   4.549691000 seconds sys

It works only in non '-r' mode and only for the workload target.

So as of now, for workload targets, we display 3 types of
timings. The time we measure in perf stat from enable to
disable+period:

   5.342599256 seconds time elapsed

The time spent in user and system lands, displayed only
for workload session/target:

   2.544434000 seconds user
   4.549691000 seconds sys

Those times are the very same as displayed by the 'time' tool.
They are returned by the wait4 call via the getrusage struct
interface.
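For reference, the wait4()/rusage pattern described above can be sketched in a few lines of standalone C (the helper name is illustrative, not part of perf):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a workload, then use wait4() instead of waitpid() so the
 * child's rusage (user/sys time) is collected in the same call.
 */
static int run_workload(char *const argv[], struct rusage *ru)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the "workload". */
		execvp(argv[0], argv);
		_exit(127);
	}
	if (wait4(pid, &status, 0, ru) != pid)
		return -1;
	/* ru->ru_utime / ru->ru_stime now hold the child's times. */
	return status;
}
```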

Suggested-by: Ingo Molnar 
Link: http://lkml.kernel.org/n/tip-t8k6d3gs8sz8zqnz3aslk...@git.kernel.org
Signed-off-by: Jiri Olsa 
---
 tools/perf/Documentation/perf-stat.txt | 40 --
 tools/perf/builtin-stat.c  | 28 +++-
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 3a822f308e6d..5dfe102fb5b5 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -310,20 +310,38 @@ Users who wants to get the actual value can apply 
--no-metric-only.
 EXAMPLES
 
 
-$ perf stat -- make -j
+$ perf stat -- make
 
- Performance counter stats for 'make -j':
+   Performance counter stats for 'make':
 
-8117.370256  task clock ticks #  11.281 CPU utilization factor
-678  context switches #   0.000 M/sec
-133  CPU migrations   #   0.000 M/sec
- 235724  pagefaults   #   0.029 M/sec
-24821162526  CPU cycles   #3057.784 M/sec
-18687303457  instructions #2302.138 M/sec
-  172158895  cache references #  21.209 M/sec
-   27075259  cache misses #   3.335 M/sec
+83723.452481  task-clock:u (msec)   #1.004 CPUs utilized
+   0  context-switches:u#0.000 K/sec
+   0  cpu-migrations:u  #0.000 K/sec
+   3,228,188  page-faults:u #0.039 M/sec
+ 229,570,665,834  cycles:u  #2.742 GHz
+ 313,163,853,778  instructions:u#1.36  insn per cycle
+  69,704,684,856  branches:u#  832.559 M/sec
+   2,078,861,393  branch-misses:u   #2.98% of all branches
 
- Wall-clock time elapsed:   719.554352 msecs
+83.409183620 seconds time elapsed
+
+74.684747000 seconds user
+ 8.739217000 seconds sys
+
+TIMINGS
+---
+As displayed in the example above we can display 3 types of timings.
+We always display the time the counters were enabled/alive:
+
+83.409183620 seconds time elapsed
+
+For workload sessions we also display time the workloads spent in
+user/system lands:
+
+74.684747000 seconds user
+ 8.739217000 seconds sys
+
+Those times are the very same as displayed by the 'time' tool.
 
 CSV FORMAT
 --
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a4f662a462c6..100b3c795501 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -80,6 +80,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "sane_ctype.h"
 
@@ -175,6 +178,8 @@ static int  output_fd;
 static int print_free_counters_hint;
 static int print_mixed_hw_group_error;
 static u64 *walltime_run;
+static bool            ru_display  = false;
+static struct rusage   ru_data;
 
 struct perf_stat {
bool record;
@@ -726,7 +731,7 @@ static int __run_perf_stat(int argc, const char **argv, int 
run_idx)
break;
}
}
-   waitpid(child_pid, &status, 0);
+   wait4(child_pid, &status, 0, &ru_data);
 
if (workload_exec_errno) {
const char *emsg = str_error_r(workload_exec_errno, 
msg, sizeof(msg));
@@ -1804,6 +1809,11 @@ static void print_table(FILE *output, int precision, 
double avg)
fprintf(output, "\n%*s# Final result:\n", indent, "");
 }
 
+static double timeval2double(struct timeval *t)
+{
+   return t->tv_sec + (double) t->tv_usec/USEC_PER_SEC;
+}
+
 static void print_footer(void)
 {
double avg = avg_stats(&walltime_nsecs_stats) / NSEC_PER_SEC;
@@ -1815,6 +1825,15 @@ static void print_footer(void)
 
if (run_count == 1) {
fprintf(output, " %17.9f seconds time elapsed", avg);
+
+   if (ru_display) {
+ 

[PATCH v1] clk: tegra: emc: Avoid out-of-bounds bug

2018-06-05 Thread Dmitry Osipenko
Apparently there was an attempt to avoid out-of-bounds accesses when there
is only one memory timing available, but there is a typo in the code that
neglects that attempt.
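To see the problem concretely: with a single timing, the matching index i is 0, so min(i, 1) leaves i == 0 and timings[i - 1] reads timings[-1]; max(i, 1) clamps the index so the read stays in range. A minimal demonstration (min/max here are plain macros, not the type-checked kernel versions):

```c
#include <assert.h>

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Index of the timing actually used once entry i exceeds max_rate:
 * the driver reads tegra->timings[i - 1]. */
static int used_index_buggy(int i) { return min(i, 1) - 1; }
static int used_index_fixed(int i) { return max(i, 1) - 1; }
```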

Signed-off-by: Dmitry Osipenko 
---
 drivers/clk/tegra/clk-emc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/tegra/clk-emc.c b/drivers/clk/tegra/clk-emc.c
index 5234acd30e89..0621a3a82ea6 100644
--- a/drivers/clk/tegra/clk-emc.c
+++ b/drivers/clk/tegra/clk-emc.c
@@ -132,7 +132,7 @@ static int emc_determine_rate(struct clk_hw *hw, struct clk_rate_request *req)
timing = tegra->timings + i;
 
if (timing->rate > req->max_rate) {
-   i = min(i, 1);
+   i = max(i, 1);
req->rate = tegra->timings[i - 1].rate;
return 0;
}
-- 
2.17.0
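The one-character change matters because emc_determine_rate() dereferences timings[i - 1] once entry i exceeds req->max_rate: min(i, 1) can leave i == 0 and index entry -1, while max(i, 1) clamps the fallback index into bounds. A sketch of the selection logic with made-up rates, not Tegra's real tables:

```python
def emc_pick_rate(timings, max_rate):
    # timings are sorted ascending; when an entry exceeds max_rate, fall
    # back to the previous one.  max(i, 1) keeps the fallback index
    # i - 1 >= 0 even when the very first entry is already too fast; the
    # buggy min(i, 1) would allow i == 0 and, in C, read timings[-1]
    # out of bounds.
    for i, rate in enumerate(timings):
        if rate > max_rate:
            i = max(i, 1)
            return timings[i - 1]
    return timings[-1]

print(emc_pick_rate([100, 200, 400], 150))   # 100
print(emc_pick_rate([100, 200, 400], 50))    # 100 (clamped, not out of bounds)
print(emc_pick_rate([100, 200, 400], 1000))  # 400
```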



[PATCH] perf script powerpc: Python script for hypervisor call statistics

2018-06-05 Thread Ravi Bangoria
Add python script to show hypervisor call statistics. Ex,

  # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}"
  # perf script -s scripts/python/powerpc-hcalls.py
hcall                       count   min(ns)   max(ns)   avg(ns)

H_RANDOM                       82       838      1164       904
H_PUT_TCE                      47      1078      5928      2003
H_EOI                         266      1336      3546      1654
H_ENTER                        28      1646      4038      1952
H_PUT_TCE_INDIRECT            230      2166     18168      6109
H_IPI                         238      1072      3232      1688
H_SEND_LOGICAL_LAN             42      5488     21366      7694
H_STUFF_TCE                   294       986      6210      3591
H_XIRR                        266      2286      6990      3783
H_PROTECT                      10      2196      3556      2555
H_VIO_SIGNAL                  294      1028      2784      1311
H_ADD_LOGICAL_LAN_BUFFER       53      1978      3450      2600
H_SEND_CRQ                     77      1762      7240      2447

Signed-off-by: Ravi Bangoria 
---
 .../perf/scripts/python/bin/powerpc-hcalls-record  |   2 +
 .../perf/scripts/python/bin/powerpc-hcalls-report  |   2 +
 tools/perf/scripts/python/powerpc-hcalls.py| 200 +
 3 files changed, 204 insertions(+)
 create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-record
 create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-report
 create mode 100644 tools/perf/scripts/python/powerpc-hcalls.py

diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-record b/tools/perf/scripts/python/bin/powerpc-hcalls-record
new file mode 100644
index ..b7402aa9147d
--- /dev/null
+++ b/tools/perf/scripts/python/bin/powerpc-hcalls-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -e "{powerpc:hcall_entry,powerpc:hcall_exit}" $@
diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-report b/tools/perf/scripts/python/bin/powerpc-hcalls-report
new file mode 100644
index ..dd32ad7465f6
--- /dev/null
+++ b/tools/perf/scripts/python/bin/powerpc-hcalls-report
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/powerpc-hcalls.py
diff --git a/tools/perf/scripts/python/powerpc-hcalls.py b/tools/perf/scripts/python/powerpc-hcalls.py
new file mode 100644
index ..ff732118cae8
--- /dev/null
+++ b/tools/perf/scripts/python/powerpc-hcalls.py
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Copyright (C) 2018 Ravi Bangoria, IBM Corporation
+#
+# Hypervisor call statistics
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+   '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+# output: {
+#  opcode: {
+#  'min': minimum time nsec
+#  'max': maximum time nsec
+#  'time': average time nsec
+#  'cnt': counter
+#  } ...
+# }
+output = {}
+
+# d_enter: {
+#  cpu: {
+#  opcode: nsec
+#  } ...
+# }
+d_enter = {}
+
+hcall_table = {
+   4: 'H_REMOVE',
+   8: 'H_ENTER',
+   12: 'H_READ',
+   16: 'H_CLEAR_MOD',
+   20: 'H_CLEAR_REF',
+   24: 'H_PROTECT',
+   28: 'H_GET_TCE',
+   32: 'H_PUT_TCE',
+   36: 'H_SET_SPRG0',
+   40: 'H_SET_DABR',
+   44: 'H_PAGE_INIT',
+   48: 'H_SET_ASR',
+   52: 'H_ASR_ON',
+   56: 'H_ASR_OFF',
+   60: 'H_LOGICAL_CI_LOAD',
+   64: 'H_LOGICAL_CI_STORE',
+   68: 'H_LOGICAL_CACHE_LOAD',
+   72: 'H_LOGICAL_CACHE_STORE',
+   76: 'H_LOGICAL_ICBI',
+   80: 'H_LOGICAL_DCBF',
+   84: 'H_GET_TERM_CHAR',
+   88: 'H_PUT_TERM_CHAR',
+   92: 'H_REAL_TO_LOGICAL',
+   96: 'H_HYPERVISOR_DATA',
+   100: 'H_EOI',
+   104: 'H_CPPR',
+   108: 'H_IPI',
+   112: 'H_IPOLL',
+   116: 'H_XIRR',
+   120: 'H_MIGRATE_DMA',
+   124: 'H_PERFMON',
+   220: 'H_REGISTER_VPA',
+   224: 'H_CEDE',
+   228: 'H_CONFER',
+   232: 'H_PROD',
+   236: 'H_GET_PPP',
+   240: 'H_SET_PPP',
+   244: 'H_PURR',
+   248: 'H_PIC',
+   252: 'H_REG_CRQ',
+   256: 'H_FREE_CRQ',
+   260: 'H_VIO_SIGNAL',
+   264: 'H_SEND_CRQ',
+   272: 'H_COPY_RDMA',
+   276: 'H_REGISTER_LOGICAL_LAN',
+   280: 'H_FREE_LOGICAL_LAN',
+   284: 'H_ADD_LOGICAL_LAN_BUFFER',
+   288: 'H_SEND_LOGICAL_LAN',
+   292: 'H_BULK_REMOVE',
+   304: 'H_MULTICAST_CTRL',
+   308: 'H_SET_XDABR',
+   312: 'H_STUFF_TCE',
+   316: 'H_PUT_TCE_INDIRECT',
+   332: 'H_CHANGE_LOGICAL_LAN_MAC',
+   336: 'H_VTERM_PARTNER_INFO',
+   340: 'H_REGISTER_VTERM',
+   344: 'H_FREE_VTERM',
+   348: 
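[The rest of the script is truncated in the archive.] The script's core bookkeeping pairs each hcall_entry with the matching hcall_exit on the same CPU and folds the delta into per-opcode min/max/avg. A condensed, standalone sketch of that logic, detached from perf's trace callbacks; the timestamps are illustrative:

```python
d_enter = {}   # cpu -> {opcode: entry timestamp in ns}
output = {}    # opcode -> {'min', 'max', 'time', 'cnt'}

def hcall_entry(cpu, opcode, ns):
    d_enter.setdefault(cpu, {})[opcode] = ns

def hcall_exit(cpu, opcode, ns):
    start = d_enter.get(cpu, {}).pop(opcode, None)
    if start is None:
        return  # lost the entry event; skip, rather than guess a latency
    diff = ns - start
    o = output.setdefault(opcode, {'min': diff, 'max': diff, 'time': 0, 'cnt': 0})
    o['min'] = min(o['min'], diff)
    o['max'] = max(o['max'], diff)
    o['time'] += diff
    o['cnt'] += 1

hcall_entry(0, 8, 1000); hcall_exit(0, 8, 2646)   # opcode 8 = H_ENTER, 1646 ns
hcall_entry(0, 8, 5000); hcall_exit(0, 8, 9038)   # H_ENTER again, 4038 ns
o = output[8]
print(o['min'], o['max'], o['time'] // o['cnt'])  # 1646 4038 2842
```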

[PATCH V2] ARM: dts: armada388-helios4

2018-06-05 Thread Dennis Gilmore
The helios4 is an Armada388-based NAS board designed by SolidRun and
based on their SOM. It is sold by kobol.io. The dts file came from
https://raw.githubusercontent.com/armbian/build/master/patch/kernel/mvebu-default/95-helios4-device-tree.patch
I added an SPDX license line to match the clearfog it says it was based
on, and a compatible line for "kobol,helios4"

Signed-off-by: Dennis Gilmore 

---

changes since first submission
change solidrun to kobol in compatible line based on feedback
---
 arch/arm/boot/dts/Makefile   |   1 +
 arch/arm/boot/dts/armada-388-helios4.dts | 315 +++
 2 files changed, 316 insertions(+)
 create mode 100644 arch/arm/boot/dts/armada-388-helios4.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 7e2424957809..490bfd586198 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1123,6 +1123,7 @@ dtb-$(CONFIG_MACH_ARMADA_38X) += \
armada-388-clearfog-pro.dtb \
armada-388-db.dtb \
armada-388-gp.dtb \
+   armada-388-helios4.dtb \
armada-388-rd.dtb
 dtb-$(CONFIG_MACH_ARMADA_39X) += \
armada-398-db.dtb
diff --git a/arch/arm/boot/dts/armada-388-helios4.dts b/arch/arm/boot/dts/armada-388-helios4.dts
new file mode 100644
index ..16026bedc380
--- /dev/null
+++ b/arch/arm/boot/dts/armada-388-helios4.dts
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Device Tree file for Helios4
+ * based on SolidRun Clearfog revision A1 rev 2.0 (88F6828)
+ *
+ *  Copyright (C) 2017
+ *
+ */
+
+/dts-v1/;
+#include "armada-388.dtsi"
+#include "armada-38x-solidrun-microsom.dtsi"
+
+/ {
+   model = "Helios4";
+   compatible = "kobol,helios4", "marvell,armada388",
+   "marvell,armada385", "marvell,armada380";
+
+   memory {
+   device_type = "memory";
+   reg = <0x00000000 0x80000000>; /* 2 GB */
+   };
+
+   aliases {
+   /* So that mvebu u-boot can update the MAC addresses */
+   ethernet1 = 
+   };
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+
+   reg_12v: regulator-12v {
+   compatible = "regulator-fixed";
+   regulator-name = "power_brick_12V";
+   regulator-min-microvolt = <12000000>;
+   regulator-max-microvolt = <12000000>;
+   regulator-always-on;
+   };
+
+   reg_3p3v: regulator-3p3v {
+   compatible = "regulator-fixed";
+   regulator-name = "3P3V";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-always-on;
+   vin-supply = <_12v>;
+   };
+
+   reg_5p0v_hdd: regulator-5v-hdd {
+   compatible = "regulator-fixed";
+   regulator-name = "5V_HDD";
+   regulator-min-microvolt = <5000000>;
+   regulator-max-microvolt = <5000000>;
+   regulator-always-on;
+   vin-supply = <_12v>;
+   };
+
+   reg_5p0v_usb: regulator-5v-usb {
+   compatible = "regulator-fixed";
+   regulator-name = "USB-PWR";
+   regulator-min-microvolt = <5000000>;
+   regulator-max-microvolt = <5000000>;
+   regulator-boot-on;
+   regulator-always-on;
+   enable-active-high;
+   gpio = < 6 GPIO_ACTIVE_HIGH>;
+   vin-supply = <_12v>;
+   };
+
+   system-leds {
+   compatible = "gpio-leds";
+   status-led {
+   label = "helios4:green:status";
+   gpios = < 24 GPIO_ACTIVE_LOW>;
+   linux,default-trigger = "heartbeat";
+   default-state = "on";
+   };
+
+   fault-led {
+   label = "helios4:red:fault";
+   gpios = < 25 GPIO_ACTIVE_LOW>;
+   default-state = "keep";
+   };
+   };
+
+   io-leds {
+   compatible = "gpio-leds";
+   sata1-led {
+   label = "helios4:green:ata1";
+   gpios = < 17 GPIO_ACTIVE_LOW>;
+   linux,default-trigger = "ata1";
+   default-state = "off";
+   };
+   sata2-led {
+   label = "helios4:green:ata2";
+   gpios = < 18 GPIO_ACTIVE_LOW>;
+   linux,default-trigger = "ata2";
+   default-state = "off";
+   };
+   sata3-led {
+   label = "helios4:green:ata3";
+   gpios = < 20 GPIO_ACTIVE_LOW>;
+   linux,default-trigger = "ata3";
+   default-state = "off";
+   };
+   sata4-led {
+   label = "helios4:green:ata4";

Re: [PATCH V2] i8042: Increment wakeup_count for the respective port.

2018-06-05 Thread Rafael J. Wysocki
On Mon, Jun 4, 2018 at 11:53 PM, Dmitry Torokhov wrote:
> On Fri, Jun 01, 2018 at 06:07:08PM -0700, Ravi Chandra Sadineni wrote:
>> Call pm_wakeup_event on every irq. This should help us in identifying if
>> keyboard was a potential wake reason for the last resume.
>>
>> Signed-off-by: Ravi Chandra Sadineni 
>> ---
>> V2: Increment the wakeup count only when there is a irq and not when the
>> method is called internally.
>>
>> drivers/input/serio/i8042.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c
>> index 824f4c1c1f310..2bd6f2633e29a 100644
>> --- a/drivers/input/serio/i8042.c
>> +++ b/drivers/input/serio/i8042.c
>> @@ -573,6 +573,9 @@ static irqreturn_t i8042_interrupt(int irq, void *dev_id)
>>   port = &i8042_ports[port_no];
>>   serio = port->exists ? port->serio : NULL;
>>
>> + if (irq && serio && device_may_wakeup(>dev))
>> + pm_wakeup_event(>dev, 0);
>
> The constant checks for device_may_wakeup() before calling
> pm_wakeup_event(), needed to avoid warnings in wakeup_source_activate()
> (?), are annoying.

I'm not following you here.

pm_wakeup_event() ->
pm_wakeup_dev_event() ->
pm_wakeup_ws_event(dev->power.wakeup, ...)
Checks if the first arg is NULL and returns quietly if so.

I don't see why you need the device_may_wakeup() check.

> Rafael, can we move the check into pm_wakeup_dev_event()?

That would be redundant, wouldn't it?

> I am also confused when pm_wakeup_event() vs pm_wakeup_hard_event() vs
> pm_wakeup_dev_event() should be used, if any. Is there any guidance?

First off, the "hard" variant is for when you want to abort suspends
in progress or wake up from suspend to idle regardless of whether or
not wakeup source tracking is enabled.

Second, use pm_wakeup_dev_event() if the decision on "hard" vs "soft"
needs to be made at run time.

Thanks,
Rafael
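The call chain Rafael describes is why the guard is unnecessary: the innermost helper bails out quietly when the device has no wakeup source configured. A toy model of that shape, with plain dicts standing in for struct device rather than the kernel API:

```python
def pm_wakeup_ws_event(ws, msec=0, hard=False):
    # like the kernel chain: return quietly when there is no wakeup
    # source, so a device_may_wakeup()-style guard in the caller is
    # redundant
    if ws is None:
        return
    ws["event_count"] = ws.get("event_count", 0) + 1

def pm_wakeup_event(dev, msec=0):
    pm_wakeup_ws_event(dev.get("wakeup"), msec)

no_wakeup_dev = {}                 # wakeup was never enabled for this device
pm_wakeup_event(no_wakeup_dev)     # safe no-op, no guard needed
wakeup_dev = {"wakeup": {}}
pm_wakeup_event(wakeup_dev)
print(wakeup_dev["wakeup"]["event_count"])  # 1
```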


Re: [RFC/RFT] [PATCH v3 1/4] cpufreq: intel_pstate: Add HWP boost utility and sched util hooks

2018-06-05 Thread Rafael J. Wysocki
On Fri, Jun 1, 2018 at 12:51 AM, Srinivas Pandruvada wrote:
> Added two utility functions to HWP boost up gradually and boost down to
> the default cached HWP request values.
>
> Boost up:
> Boost up updates the HWP request minimum value in steps. This minimum value
> can reach up to the HWP request maximum value, depending on how frequently
> this boost up function is called. At most, boost up will take three steps
> to reach the maximum, depending on the current HWP request levels and HWP
> capabilities. For example, if the current settings are:
> If P0 (Turbo max) = P1 (Guaranteed max) = min
> No boost at all.
> If P0 (Turbo max) > P1 (Guaranteed max) = min
> Should result in one level boost only for P0.
> If P0 (Turbo max) = P1 (Guaranteed max) > min
> Should result in two level boost:
> (min + p1)/2 and P1.
> If P0 (Turbo max) > P1 (Guaranteed max) > min
> Should result in three level boost:
> (min + p1)/2, P1 and P0.
> We don't set any level between P0 and P1 as there is no guarantee that
> they will be honored.
>
> Boost down:
> After the system is idle for hold time of 3ms, the HWP request is reset
> to the default value from HWP init or user modified one via sysfs.
>
> Caching of HWP Request and Capabilities
> Store the HWP request value last set using MSR_HWP_REQUEST and read
> MSR_HWP_CAPABILITIES. This avoids reading MSRs in the boost utility
> functions.
>
> These boost utility functions calculated limits are based on the latest
> HWP request value, which can be modified by setpolicy() callback. So if
> user space modifies the minimum perf value, that will be accounted for
> every time the boost up is called. There will be cases where there can be
> contention with the user-modified minimum perf; in that case the user value
> will gain precedence. For example, just before the HWP_REQUEST MSR is updated
> from setpolicy() callback, the boost up function is called via scheduler
> tick callback. Here the cached MSR value is already the latest and limits
> are updated based on the latest user limits, but on return the MSR write
> callback called from setpolicy() callback will update the HWP_REQUEST
> value. This will be used till next time the boost up function is called.
>
> In addition add a variable to control HWP dynamic boosting. When HWP
> dynamic boost is active then set the HWP specific update util hook. The
> contents in the utility hooks will be filled in the subsequent patches.
>
> Signed-off-by: Srinivas Pandruvada 
> ---
>  drivers/cpufreq/intel_pstate.c | 99 
> --
>  1 file changed, 95 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 17e566afbb41..80bf61ae4b1f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -221,6 +221,9 @@ struct global_params {
>   * preference/bias
>   * @epp_saved: Saved EPP/EPB during system suspend or CPU offline
>   * operation
> + * @hwp_req_cached:Cached value of the last HWP Request MSR
> + * @hwp_cap_cached:Cached value of the last HWP Capabilities MSR
> + * @hwp_boost_min: Last HWP boosted min performance
>   *
>   * This structure stores per CPU instance data for all CPUs.
>   */
> @@ -253,6 +256,9 @@ struct cpudata {
> s16 epp_policy;
> s16 epp_default;
> s16 epp_saved;
> +   u64 hwp_req_cached;
> +   u64 hwp_cap_cached;
> +   int hwp_boost_min;

Why int?  That's a register value, so maybe u32?

>  };
>
>  static struct cpudata **all_cpu_data;
> @@ -285,6 +291,7 @@ static struct pstate_funcs pstate_funcs __read_mostly;
>
>  static int hwp_active __read_mostly;
>  static bool per_cpu_limits __read_mostly;
> +static bool hwp_boost __read_mostly;
>
>  static struct cpufreq_driver *intel_pstate_driver __read_mostly;
>
> @@ -689,6 +696,7 @@ static void intel_pstate_get_hwp_max(unsigned int cpu, int *phy_max,
> u64 cap;
>
> rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap);
> +   WRITE_ONCE(all_cpu_data[cpu]->hwp_cap_cached, cap);
> if (global.no_turbo)
> *current_max = HWP_GUARANTEED_PERF(cap);
> else
> @@ -763,6 +771,7 @@ static void intel_pstate_hwp_set(unsigned int cpu)
> intel_pstate_set_epb(cpu, epp);
> }
>  skip_epp:
> +   WRITE_ONCE(cpu_data->hwp_req_cached, value);
> wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
>  }
>
> @@ -1381,6 +1390,81 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
> intel_pstate_set_min_pstate(cpu);
>  }
>
> +/*
> + * Long hold time will keep high perf limits for long time,
> + * which negatively impacts perf/watt for some workloads,
> + * like specpower. 3ms is based on experiments on some
> + * workloads.
> + */
> +static int hwp_boost_hold_time_ms = 3;
> +
> +static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
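[The rest of the patch is truncated in the archive.] The boost ladder in the changelog can be checked with simple arithmetic. A sketch that enumerates the intermediate minimum values boost-up would step through for each of the four cases above; the pstate numbers are hypothetical, not MSR fields:

```python
def hwp_boost_levels(p0, p1, cur_min):
    # enumerate the boost-up steps from the commit message:
    # (min + P1)/2, then P1, then P0, skipping degenerate steps
    levels = []
    mid = (cur_min + p1) // 2
    if cur_min < mid < p1:
        levels.append(mid)
    if p1 > cur_min:
        levels.append(p1)
    if p0 > p1:
        levels.append(p0)
    return levels

print(hwp_boost_levels(10, 10, 10))  # []            P0 = P1 = min: no boost
print(hwp_boost_levels(12, 10, 10))  # [12]          P0 > P1 = min: one level
print(hwp_boost_levels(10, 10, 4))   # [7, 10]       P0 = P1 > min: two levels
print(hwp_boost_levels(12, 10, 4))   # [7, 10, 12]   P0 > P1 > min: three levels
```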

[PATCH] dm: Use kzalloc for all structs with embedded biosets/mempools

2018-06-05 Thread Kent Overstreet
mempool_init()/bioset_init() require that the mempools/biosets be zeroed
first; they probably should not _require_ this, but not allocating those
structs with kzalloc is a fairly nonsensical thing to do (calling
mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
and safe, but only works if said memory was zeroed.)

Signed-off-by: Kent Overstreet 
---

Linus,

I fucked up majorly on the bioset/mempool conversion - I forgot to check that
everything biosets/mempools were being embedded in was actually being zeroed on
allocation. Device mapper currently explodes, you'll probably want to apply this
patch post haste.

I have now done that auditing, for every single conversion - this patch fixes
everything I found. There do not seem to be any incorrect ones outside of device
mapper...

We'll probably want a second patch that either a) changes
bioset_init()/mempool_init() to zero the passed in bioset/mempool first, or b)
my preference, WARN() or BUG() if they're passed memory that isn't zeroed.

 drivers/md/dm-bio-prison-v1.c | 2 +-
 drivers/md/dm-bio-prison-v2.c | 2 +-
 drivers/md/dm-io.c| 2 +-
 drivers/md/dm-kcopyd.c| 2 +-
 drivers/md/dm-region-hash.c   | 2 +-
 drivers/md/dm-snap.c  | 2 +-
 drivers/md/dm-thin.c  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-bio-prison-v1.c b/drivers/md/dm-bio-prison-v1.c
index 8e33a38083..e794e3662f 100644
--- a/drivers/md/dm-bio-prison-v1.c
+++ b/drivers/md/dm-bio-prison-v1.c
@@ -33,7 +33,7 @@ static struct kmem_cache *_cell_cache;
  */
 struct dm_bio_prison *dm_bio_prison_create(void)
 {
-   struct dm_bio_prison *prison = kmalloc(sizeof(*prison), GFP_KERNEL);
+   struct dm_bio_prison *prison = kzalloc(sizeof(*prison), GFP_KERNEL);
int ret;
 
if (!prison)
diff --git a/drivers/md/dm-bio-prison-v2.c b/drivers/md/dm-bio-prison-v2.c
index 601b156920..f866bc97b0 100644
--- a/drivers/md/dm-bio-prison-v2.c
+++ b/drivers/md/dm-bio-prison-v2.c
@@ -35,7 +35,7 @@ static struct kmem_cache *_cell_cache;
  */
 struct dm_bio_prison_v2 *dm_bio_prison_create_v2(struct workqueue_struct *wq)
 {
-   struct dm_bio_prison_v2 *prison = kmalloc(sizeof(*prison), GFP_KERNEL);
+   struct dm_bio_prison_v2 *prison = kzalloc(sizeof(*prison), GFP_KERNEL);
int ret;
 
if (!prison)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 53c6ed0eaa..81ffc59d05 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -51,7 +51,7 @@ struct dm_io_client *dm_io_client_create(void)
unsigned min_ios = dm_get_reserved_bio_based_ios();
int ret;
 
-   client = kmalloc(sizeof(*client), GFP_KERNEL);
+   client = kzalloc(sizeof(*client), GFP_KERNEL);
if (!client)
return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index c89a675a2a..ce7efc7434 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -882,7 +882,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
int r;
struct dm_kcopyd_client *kc;
 
-   kc = kmalloc(sizeof(*kc), GFP_KERNEL);
+   kc = kzalloc(sizeof(*kc), GFP_KERNEL);
if (!kc)
return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c
index 43149eb493..abf3521b80 100644
--- a/drivers/md/dm-region-hash.c
+++ b/drivers/md/dm-region-hash.c
@@ -180,7 +180,7 @@ struct dm_region_hash *dm_region_hash_create(
;
nr_buckets >>= 1;
 
-   rh = kmalloc(sizeof(*rh), GFP_KERNEL);
+   rh = kzalloc(sizeof(*rh), GFP_KERNEL);
if (!rh) {
DMERR("unable to allocate region hash memory");
return ERR_PTR(-ENOMEM);
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index b11ddc55f2..f745404da7 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1120,7 +1120,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
origin_mode = FMODE_WRITE;
}
 
-   s = kmalloc(sizeof(*s), GFP_KERNEL);
+   s = kzalloc(sizeof(*s), GFP_KERNEL);
if (!s) {
ti->error = "Cannot allocate private snapshot structure";
r = -ENOMEM;
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 6c923824ec..5772756c63 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2861,7 +2861,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
return (struct pool *)pmd;
}
 
-   pool = kmalloc(sizeof(*pool), GFP_KERNEL);
+   pool = kzalloc(sizeof(*pool), GFP_KERNEL);
if (!pool) {
*error = "Error allocating memory for pool";
err_p = ERR_PTR(-ENOMEM);
-- 
2.17.1
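The underlying contract: mempool_exit()/bioset_exit() must be callable on a structure that was never initialized, and they can only distinguish "never set up" from garbage if the embedding struct started out zeroed. A toy model of why kzalloc (zeroed) is safe where kmalloc (garbage) blows up, written as illustrative Python rather than the dm code:

```python
class Mempool:
    # models a mempool embedded in a containing struct: exit() must be a
    # safe no-op when init() was never called, which only works if the
    # field starts in a known "zeroed" state (None) rather than garbage
    def __init__(self, zeroed=True):
        self.elements = None if zeroed else 0xdead  # garbage stand-in

    def init(self, n):
        self.elements = [bytearray(16) for _ in range(n)]

    def exit(self):
        if self.elements is None:
            return                      # never initialized: nothing to free
        for _ in self.elements:         # "free" each element
            pass
        self.elements = None

zeroed = Mempool(zeroed=True)
zeroed.exit()                           # safe, like kzalloc + mempool_exit()

garbage = Mempool(zeroed=False)
try:
    garbage.exit()                      # like kmalloc + mempool_exit(): boom
except TypeError as e:
    print("crashed on uninitialized pool:", type(e).__name__)
```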



Re: [PATCHv2 05/16] atomics: prepare for atomic64_fetch_add_unless()

2018-06-05 Thread Peter Zijlstra
On Tue, May 29, 2018 at 04:43:35PM +0100, Mark Rutland wrote:
>  /**
> + * atomic64_add_unless - add unless the number is already a given value
> + * @v: pointer of type atomic_t
> + * @a: the amount to add to v...
> + * @u: ...unless v is equal to u.
> + *
> + * Atomically adds @a to @v, so long as @v was not already @u.
> + * Returns non-zero if @v was not @u, and zero otherwise.

I always get confused by that wording; would something like: "Returns
true if the addition was done" not be more clear?

> + */
> +#ifdef atomic64_fetch_add_unless
> +static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)

Do we want to make that a "bool" return?

> +{
> + return atomic64_fetch_add_unless(v, a, u) != u;
> +}
> +#endif
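Peter's suggested wording ("returns true if the addition was done") falls straight out of the construction: fetch_add_unless returns the old value, and the addition happened iff that old value differed from u. A lock-based model of the semantics; the kernel's version is lock-free, and this only demonstrates the return-value contract:

```python
import threading

class Atomic64:
    def __init__(self, v=0):
        self._v = v
        self._lock = threading.Lock()

    def fetch_add_unless(self, a, u):
        # atomically add a unless the value equals u; return the OLD value
        with self._lock:
            old = self._v
            if old != u:
                self._v += a
            return old

    def add_unless(self, a, u):
        # true iff the addition was performed (old value was not u)
        return self.fetch_add_unless(a, u) != u

v = Atomic64(1)
print(v.add_unless(1, 0))  # True: 1 != 0, value becomes 2
print(v.add_unless(1, 2))  # False: value was already 2, left unchanged
```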


Re: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-06-05 Thread Thierry Reding
On Mon, Jun 04, 2018 at 06:35:40PM +0300, Aapo Vienamo wrote:
> The sdhci get_max_clock callback is set to sdhci_pltfm_clk_get_max_clock
> and tegra_sdhci_get_max_clock is removed. It appears that the
> sdhci-tegra specific callback was originally introduced due to the
> requirement that the host clock has to be twice the bus clock on DDR50
> mode. As far as I can tell the only effect the removal has on DDR50 mode
> is in cases where the parent clock is unable to supply the requested
> clock rate, causing the DDR50 mode to run at a lower frequency.
> Currently the DDR50 mode isn't enabled on any of the SoCs and would also
> require configuring the SDHCI clock divider register to function
> properly.
> 
> The problem with tegra_sdhci_get_max_clock is that it divides the clock
> rate by two and thus artificially limits the maximum frequency of faster
> signaling modes which don't have the host-bus frequency ratio requirement
> of DDR50 such as SDR104 and HS200. Furthermore, the call to
> clk_round_rate() may return an error which isn't handled by
> tegra_sdhci_get_max_clock.
> 
> Signed-off-by: Aapo Vienamo 
> ---
>  drivers/mmc/host/sdhci-tegra.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> index 970d38f6..c8745b5 100644
> --- a/drivers/mmc/host/sdhci-tegra.c
> +++ b/drivers/mmc/host/sdhci-tegra.c
> @@ -234,17 +234,6 @@ static void tegra_sdhci_set_uhs_signaling(struct sdhci_host *host,
>   sdhci_set_uhs_signaling(host, timing);
>  }
>  
> -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host *host)
> -{
> - struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> -
> - /*
> -  * DDR modes require the host to run at double the card frequency, so
> -  * the maximum rate we can support is half of the module input clock.
> -  */
> - return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
> -}

sdhci_pltfm_clk_get_max_clock() returns the current frequency of the
clock, which may not be an accurate maximum.

Also, even if we don't support DDR modes now, we may want to enable them
in the future, at which point we'll need to move to something similar to
the above again, albeit maybe with some of the issues that you mentioned
fixed.

I wonder if we have access to the target mode in this function, because
it seems to me like we'd need to take that into account when determining
the maximum clock rate. Or perhaps the double-rate aspect is already
dealt with in other parts of the MMC subsystem, so the value we should
return here may not even need to take the mode into account.

All of the above said, it is true that we don't enable DDR modes as of
now, and this patch seems like it shouldn't break anything either, so:

Acked-by: Thierry Reding 

I also gave this a brief run on Jetson TK1 and things seem to work fine,
so:

Tested-by: Thierry Reding 




[PATCH v2] ARM: dts: da850: Fix interrups property for gpio

2018-06-05 Thread Keerthy
The intc #interrupt-cells is equal to 1. Currently the gpio
node has 2 cells per IRQ, which is wrong. Remove the additional
cell for each of the interrupts.

Signed-off-by: Keerthy 
Fixes: 2e38b946dc54 ("ARM: davinci: da850: add GPIO DT node")
---

Changes in v2:

  * Fixed $Subject

 arch/arm/boot/dts/da850.dtsi | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index f6f1597..0f4f817 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -549,11 +549,7 @@
gpio-controller;
#gpio-cells = <2>;
reg = <0x226000 0x1000>;
-   interrupts = <42 IRQ_TYPE_EDGE_BOTH
-   43 IRQ_TYPE_EDGE_BOTH 44 IRQ_TYPE_EDGE_BOTH
-   45 IRQ_TYPE_EDGE_BOTH 46 IRQ_TYPE_EDGE_BOTH
-   47 IRQ_TYPE_EDGE_BOTH 48 IRQ_TYPE_EDGE_BOTH
-   49 IRQ_TYPE_EDGE_BOTH 50 IRQ_TYPE_EDGE_BOTH>;
+   interrupts = <42 43 44 45 46 47 48 49 50>;
ti,ngpio = <144>;
ti,davinci-gpio-unbanked = <0>;
status = "disabled";
-- 
1.9.1



Re: [PATCH] ARM: davinci: da850: Fix interrups property for gpio

2018-06-05 Thread Keerthy



On Tuesday 05 June 2018 03:35 PM, Keerthy wrote:
> The intc #interrupt-cells is equal to 1. Currently gpio
> node has 2 cells per IRQ which is wrong. Remove the additional
> cell for each of the interrupts.

Just noticed $Subject is not quite right. I will fix and send a v2 in a bit.

> 
> Signed-off-by: Keerthy 
> Fixes: 2e38b946dc54 ("ARM: davinci: da850: add GPIO DT node")
> ---
>  arch/arm/boot/dts/da850.dtsi | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
> index f6f1597..0f4f817 100644
> --- a/arch/arm/boot/dts/da850.dtsi
> +++ b/arch/arm/boot/dts/da850.dtsi
> @@ -549,11 +549,7 @@
>   gpio-controller;
>   #gpio-cells = <2>;
>   reg = <0x226000 0x1000>;
> - interrupts = <42 IRQ_TYPE_EDGE_BOTH
> - 43 IRQ_TYPE_EDGE_BOTH 44 IRQ_TYPE_EDGE_BOTH
> - 45 IRQ_TYPE_EDGE_BOTH 46 IRQ_TYPE_EDGE_BOTH
> - 47 IRQ_TYPE_EDGE_BOTH 48 IRQ_TYPE_EDGE_BOTH
> - 49 IRQ_TYPE_EDGE_BOTH 50 IRQ_TYPE_EDGE_BOTH>;
> + interrupts = <42 43 44 45 46 47 48 49 50>;
>   ti,ngpio = <144>;
>   ti,davinci-gpio-unbanked = <0>;
>   status = "disabled";
> 


Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-05 Thread Hans de Goede

Hi,

On 05-06-18 12:46, Benjamin Berg wrote:

Hey,

On Tue, 2018-06-05 at 12:31 +0200, Hans de Goede wrote:

On 05-06-18 12:14, Bastien Nocera wrote:

On Tue, 2018-06-05 at 12:05 +0200, Hans de Goede wrote:

On 05-06-18 11:58, Bastien Nocera wrote:

[SNIP]


Ok, so what are you suggesting? Do you really want to hardcode
the cycle behavior in the kernel as these 2 patches are doing,
without any option to intervene from userspace?

As mentioned before in the thread there are several examples
of the kernel deciding to handle key-presses itself, putting
policy in the kernel, and they have all ended poorly (think
e.g. rfkill, acpi-video dealing with LCD brightness key presses
itself).

I guess one thing we could do here is code out both solutions,
have a module option which controls if we:

1) Handle this in the kernel as these patches do
2) Or send a new KEY_KBDILLUMCYCLE event

Combined with a Kconfig option to select which is the default
behavior. Then Endless can select 1 for now and then in
Fedora (which defaults to Wayland now) we could default to
2. once all the code for handling 2 is in place.

This is ugly (on the kernel side) but it might be the best
compromise we can do.


I don't really mind which option is used, I'm listing the problems with
the different options. If you don't care about Xorg, then definitely go
for adding a new key. Otherwise, processing it in the kernel is the
least ugly, especially given that the key goes through the same driver
that controls the brightness anyway. There's no crazy cross driver
interaction as there was in the other cases you listed.


Unfortunately not caring about Xorg is not really an option.

Ok, new idea, how about we make g-s-d behavior upon detecting a
KEY_KBDILLUMTOGGLE event configurable, if we're on a Mac do a
toggle, otherwise do a cycle.

Or we could do this through hwdb, then we could add a hwdb entry
for this laptop setting the udev property to do a cycle instead of
a toggle on receiving the keypress.


If we are adding hwdb entries anyway to control the userspace
interpretation of the TOGGLE key, then we could also add the new CYCLE
key and explicitly re-map it to TOGGLE. That requires slightly more
logic in hwdb, but it does mean that we could theoretically just drop
the workaround if we ever stop caring about Xorg.


Hmm, interesting proposal, I say go for it :)

Regards,

Hans





Re: [PATCH 04/12] powerpc: Implement hw_breakpoint_arch_parse()

2018-06-05 Thread Michael Ellerman
Frederic Weisbecker  writes:
> On Mon, May 28, 2018 at 09:31:07PM +1000, Michael Ellerman wrote:
>> Frederic Weisbecker  writes:
>> 
>> > On Thu, May 24, 2018 at 12:01:52PM +1000, Michael Ellerman wrote:
>> >> Frederic Weisbecker  writes:
>> >> 
> >> >> > diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
>> >> > index 348cac9..fba6527 100644
>> >> > --- a/arch/powerpc/kernel/hw_breakpoint.c
>> >> > +++ b/arch/powerpc/kernel/hw_breakpoint.c
>> >> > @@ -139,30 +139,31 @@ int arch_bp_generic_fields(int type, int 
>> >> > *gen_bp_type)
>> >> >  /*
>> >> >   * Validate the arch-specific HW Breakpoint register settings
>> >> >   */
>> >> > -int arch_validate_hwbkpt_settings(struct perf_event *bp)
>> >> > +int hw_breakpoint_arch_parse(struct perf_event *bp,
>> >> > +struct perf_event_attr *attr,
>> >> > +struct arch_hw_breakpoint *hw)
>> >> 
>> >> I think the semantics here are that we are reading from bp/attr and
>> >> writing to hw?
>> >> 
>> >> If so would some sprinkling of const on the first two parameters help
>> >> make that clearer?
>> >
>> > I seem to remember there was an issue with that due to the various 
>> > functions
>> > we call that need to be converted to take const as well. I thought I would
> >> > do it in a separate series but actually it should be no big deal to do it
>> > on this one.
>> 
>> Yeah, that does sometimes snowball out of control.
>> 
>> > Let me try that and respin.
>> 
>> Cool. It would be nice to have, but obviously not crucial.
>
> So I managed to constify the perf_event_attr parameter but not the struct perf_event *bp,
> because the task target is fetched from it in is_compat_bp() on ARM64. I could constify it
> all the way up to test_ti_thread_flag() but the thread info can only be retrieved through
> a call to task_thread_info() and that's where the qualifier control ends. The const cannot
> be passed there and we can't afford to constify the function either, I fear, as it is used
> everywhere for any purpose, including thread_info modifications.

Thanks for trying. 1 out of 2 const parameters is better than none :)

cheers


Re: [PATCH v2 0/5] Tegra20 External Memory Controller driver

2018-06-05 Thread Peter De Schrijver
On Mon, Jun 04, 2018 at 01:36:49AM +0300, Dmitry Osipenko wrote:
> Hello,
> 
> Couple years ago the Tegra20 EMC driver was removed from the kernel
> due to incompatible changes in the Tegra's clock driver. This patchset
> introduces a modernized EMC driver. Currently the sole purpose of the
> driver is to initialize DRAM frequency to the maximum rate during the
> kernel's boot-up. Later we may consider implementing dynamic memory
> frequency scaling, utilizing functionality provided by this driver.
> 
> Changelog:
> 
> v2:
>   - Minor code cleanups like consistent use of writel_relaxed instead
> of non-relaxed version, reworded error messages, etc.
> 
>   - Factored out use_pllm_ud bit checking into a standalone patch for
> consistency.
> 
> Dmitry Osipenko (5):
>   dt: bindings: tegra20-emc: Document interrupt property
>   ARM: dts: tegra20: Add interrupt to External Memory Controller
>   clk: tegra20: Turn EMC clock gate into divider
>   clk: tegra20: Check whether direct PLLM sourcing is turned off for EMC
>   memory: tegra: Introduce Tegra20 EMC driver
> 

Series Acked-By: Peter De Schrijver 




[PATCH] x86: mark native_set_p4d() as __always_inline

2018-06-05 Thread Arnd Bergmann
When CONFIG_OPTIMIZE_INLINING is enabled, the function native_set_p4d()
may not be fully inlined into the caller, resulting in a false-positive
warning about an access to the __pgtable_l5_enabled variable from a
non-__init function, despite the original caller being an __init function:

WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_set_p4d() to the variable .init.data:__pgtable_l5_enabled
WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_p4d_clear() to the variable .init.data:__pgtable_l5_enabled
The function native_set_p4d() references
the variable __initdata __pgtable_l5_enabled.
This is often because native_set_p4d lacks a __initdata
annotation or the annotation of __pgtable_l5_enabled is wrong.

Marking the native_set_p4d() function and its caller native_p4d_clear()
as __always_inline avoids this problem.

I did not bisect the original cause, but I assume this is related to
the recent rework that turned pgtable_l5_enabled() into an inline
function, which in turn caused the compiler to make different inlining
decisions.

Fixes: ad3fe525b950 ("x86/mm: Unify pgtable_l5_enabled usage in early boot 
code")
Signed-off-by: Arnd Bergmann 
---
 arch/x86/include/asm/pgtable_64.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h 
b/arch/x86/include/asm/pgtable_64.h
index 3c5385f9a88f..0fdcd21dadbd 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -216,7 +216,7 @@ static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
 }
 #endif
 
-static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
+static __always_inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
pgd_t pgd;
 
@@ -230,7 +230,7 @@ static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
*p4dp = native_make_p4d(native_pgd_val(pgd));
 }
 
-static inline void native_p4d_clear(p4d_t *p4d)
+static __always_inline void native_p4d_clear(p4d_t *p4d)
 {
native_set_p4d(p4d, native_make_p4d(0));
 }
-- 
2.9.0



Re: [PATCH v2 06/10] vfio: ccw: Make FSM functions atomic

2018-06-05 Thread Cornelia Huck
On Fri, 25 May 2018 12:21:14 +0200
Pierre Morel  wrote:

> We use mutex around the FSM function call to make the FSM
> event handling and state change atomic.

I'm still not really clear as to what this mutex is supposed to
serialize:

- Modification of the state?
- Any calls in the state machine?
- A combination? (That would imply that we only deal with the state in
  the state machine.)

> 
> Signed-off-by: Pierre Morel 
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 3 +--
>  drivers/s390/cio/vfio_ccw_private.h | 3 +++
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 6b7112e..98951d5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -73,8 +73,6 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  
>   private = container_of(work, struct vfio_ccw_private, io_work);
>   vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
> - if (private->mdev)
> - private->state = VFIO_CCW_STATE_IDLE;

Looks like an unrelated change? If you want to do all state changes
under the mutex, that should rather be moved than deleted, shouldn't it?

>  }
>  
>  static void vfio_ccw_sch_event_todo(struct work_struct *work)


[PATCH] coresight: include vmalloc.h for vmap/vunmap

2018-06-05 Thread Arnd Bergmann
The newly introduced code fails to build in some configurations
unless we include the right headers:

drivers/hwtracing/coresight/coresight-tmc-etr.c: In function 
'tmc_free_table_pages':
drivers/hwtracing/coresight/coresight-tmc-etr.c:206:3: error: implicit 
declaration of function 'vunmap'; did you mean 'iounmap'? 
[-Werror=implicit-function-declaration]

Fixes: 79613ae8715a ("coresight: Add generic TMC sg table framework")
Signed-off-by: Arnd Bergmann 
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 6164eed0b5fe..3556d9a849e9 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
-- 
2.9.0



[PATCH] fs: stub out ioprio_check_cap for !CONFIG_BLOCK

2018-06-05 Thread Arnd Bergmann
When CONFIG_BLOCK is disabled, we now run into a link error:

fs/aio.o: In function `aio_prep_rw':
aio.c:(.text+0xf68): undefined reference to `ioprio_check_cap'

Since the priorities are unused without block devices, this adds a stub
that always returns success.

Fixes: d9a08a9e616b ("fs: Add aio iopriority support")
Signed-off-by: Arnd Bergmann 
---
Not sure if my assertion is correct, please check that returning zero
actually makes sense here.
---
 include/linux/ioprio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 4a28cec49ec3..ccc2a44483b6 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -77,6 +77,13 @@ extern int ioprio_best(unsigned short aprio, unsigned short 
bprio);
 
 extern int set_task_ioprio(struct task_struct *task, int ioprio);
 
+#ifdef CONFIG_BLOCK
 extern int ioprio_check_cap(int ioprio);
+#else
+static inline int ioprio_check_cap(int ioprio)
+{
+   return 0;
+}
+#endif
 
 #endif
-- 
2.9.0



Re: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-06-05 Thread Aapo Vienamo
On Tue, 5 Jun 2018 11:28:01 +0200
Thierry Reding  wrote:

> On Mon, Jun 04, 2018 at 06:35:40PM +0300, Aapo Vienamo wrote:
> > The sdhci get_max_clock callback is set to sdhci_pltfm_clk_get_max_clock
> > and tegra_sdhci_get_max_clock is removed. It appears that the
> > sdhci-tegra specific callback was originally introduced due to the
> > requirement that the host clock has to be twice the bus clock on DDR50
> > mode. As far as I can tell the only effect the removal has on DDR50 mode
> > is in cases where the parent clock is unable to supply the requested
> > clock rate, causing the DDR50 mode to run at a lower frequency.
> > Currently the DDR50 mode isn't enabled on any of the SoCs and would also
> > require configuring the SDHCI clock divider register to function
> > properly.
> > 
> > The problem with tegra_sdhci_get_max_clock is that it divides the clock
> > rate by two and thus artificially limits the maximum frequency of faster
> > signaling modes which don't have the host-bus frequency ratio requirement
> > of DDR50 such as SDR104 and HS200. Furthermore, the call to
> > clk_round_rate() may return an error which isn't handled by
> > tegra_sdhci_get_max_clock.
> > 
> > Signed-off-by: Aapo Vienamo 
> > ---
> >  drivers/mmc/host/sdhci-tegra.c | 15 ++-
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> > index 970d38f6..c8745b5 100644
> > --- a/drivers/mmc/host/sdhci-tegra.c
> > +++ b/drivers/mmc/host/sdhci-tegra.c
> > @@ -234,17 +234,6 @@ static void tegra_sdhci_set_uhs_signaling(struct 
> > sdhci_host *host,
> > sdhci_set_uhs_signaling(host, timing);
> >  }
> >  
> > -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host *host)
> > -{
> > -   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > -
> > -   /*
> > -* DDR modes require the host to run at double the card frequency, so
> > -* the maximum rate we can support is half of the module input clock.
> > -*/
> > -   return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
> > -}  
> 
> sdhci_pltfm_clk_get_max_clock() returns the current frequency of the
> clock, which may not be an accurate maximum.
> 
> Also, even if we don't support DDR modes now, we may want to enable them
> in the future, at which point we'll need to move to something similar to
> the above again, albeit maybe with some of the issues that you mentioned
> fixed.
> 
> I wonder if we have access to the target mode in this function, because
> it seems to me like we'd need to take that into account when determining
> the maximum clock rate. Or perhaps the double-rate aspect is already
> dealt with in other parts of the MMC subsystem, so the value we should
> return here may not even need to take the mode into account.

I don't think that's possible. The callback is only called during probe
from sdhci_setup_host() via sdhci_add_host(). Handling DDR50 properly
might require adding a new SDHCI quirk bit.

> All of the above said, it is true that we don't enable DDR modes as of
> now, and this patch seems like it shouldn't break anything either, so:
> 
> Acked-by: Thierry Reding 
> 
> I also gave this a brief run on Jetson TK1 and things seem to work fine,
> so:
> 
> Tested-by: Thierry Reding 



Re: [PATCH 1/9] platform/chrome: cros_ec: Switch to SPDX identifier.

2018-06-05 Thread Emilio Pozuelo Monfort
Hi Enric,

On 05/06/18 11:22, Enric Balletbo i Serra wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management.
> 
> Signed-off-by: Enric Balletbo i Serra 
> ---
> 
>  drivers/platform/chrome/cros_ec_debugfs.c | 22 +++-
>  drivers/platform/chrome/cros_ec_lightbar.c| 22 +++-
>  drivers/platform/chrome/cros_ec_lpc.c | 26 +++
>  drivers/platform/chrome/cros_ec_lpc_mec.c | 26 +++
>  drivers/platform/chrome/cros_ec_lpc_reg.c | 26 +++
>  drivers/platform/chrome/cros_ec_proto.c   | 19 +++---
>  drivers/platform/chrome/cros_ec_sysfs.c   | 22 +++-
>  drivers/platform/chrome/cros_ec_vbc.c | 24 -
>  .../platform/chrome/cros_kbd_led_backlight.c  | 19 +++---
>  9 files changed, 37 insertions(+), 169 deletions(-)
> 
> diff --git a/drivers/platform/chrome/cros_ec_debugfs.c 
> b/drivers/platform/chrome/cros_ec_debugfs.c
> index c62ee8e610a0..67c62934368a 100644
> --- a/drivers/platform/chrome/cros_ec_debugfs.c
> +++ b/drivers/platform/chrome/cros_ec_debugfs.c
> @@ -1,21 +1,7 @@
> -/*
> - * cros_ec_debugfs - debug logs for Chrome OS EC
> - *
> - * Copyright 2015 Google, Inc.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program. If not, see .
> - */
> +// SPDX-License-Identifier: GPL-2.0+
> +// Debug logs for ChromeOS EC.
> +//
> +// Copyright (C) 2012 Google, Inc.

Copy paste error, s/2012/2015/ ?

Cheers,
Emilio


Re: [PATCH V4] PCI: move early dump functionality from x86 arch into the common code

2018-06-05 Thread okaya

On 2018-06-05 05:12, Andy Shevchenko wrote:
On Tue, Jun 5, 2018 at 5:16 AM, Sinan Kaya  
wrote:
Move early dump functionality into common code so that it is available
for all architectures. No need to carry arch-specific reads around as the
read hooks are already initialized by the time pci_setup_device() is
getting called during scan.



Makes sense.

Reviewed-by: Andy Shevchenko 

One style comment below, though.

If you wait a bit, I perhaps would be able to test on x86.


Sure, no rush. This is a nice-to-have feature with no urgency.




Signed-off-by: Sinan Kaya 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/x86/include/asm/pci-direct.h   |  4 ---
 arch/x86/kernel/setup.c |  5 ---
 arch/x86/pci/common.c   |  4 ---
 arch/x86/pci/early.c| 44 -
 drivers/pci/pci.c   |  5 +++
 drivers/pci/pci.h   |  1 +
 drivers/pci/probe.c | 19 +++
 8 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e490902..e64f1d8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2995,7 +2995,7 @@
See also Documentation/blockdev/paride.txt.

pci=option[,option...]  [PCI] various PCI subsystem options:
-   earlydump   [X86] dump PCI config space before the kernel
+   earlydump   dump PCI config space before the kernel
changes anything
off [X86] don't probe for the PCI bus
bios    [X86-32] force use of PCI BIOS, don't access
diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h
index e1084f7..94597a3 100644
--- a/arch/x86/include/asm/pci-direct.h
+++ b/arch/x86/include/asm/pci-direct.h
@@ -15,8 +15,4 @@ extern void write_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset, u8 val);
 extern void write_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset, u16 val);

 extern int early_pci_allowed(void);
-
-extern unsigned int pci_early_dump_regs;
-extern void early_dump_pci_device(u8 bus, u8 slot, u8 func);
-extern void early_dump_pci_devices(void);
 #endif /* _ASM_X86_PCI_DIRECT_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2f86d88..480f250 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -991,11 +991,6 @@ void __init setup_arch(char **cmdline_p)
setup_clear_cpu_cap(X86_FEATURE_APIC);
}

-#ifdef CONFIG_PCI
-   if (pci_early_dump_regs)
-   early_dump_pci_devices();
-#endif
-
e820__reserve_setup_data();
e820__finish_early_params();

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 563049c..d4ec117 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -22,7 +22,6 @@
 unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
				PCI_PROBE_MMCONF;

-unsigned int pci_early_dump_regs;
 static int pci_bf_sort;
 int pci_routeirq;
 int noioapicquirk;
@@ -599,9 +598,6 @@ char *__init pcibios_setup(char *str)
pci_probe |= PCI_BIG_ROOT_WINDOW;
return NULL;
 #endif
-   } else if (!strcmp(str, "earlydump")) {
-   pci_early_dump_regs = 1;
-   return NULL;
} else if (!strcmp(str, "routeirq")) {
pci_routeirq = 1;
return NULL;
diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
index e5f753c..f5fc953 100644
--- a/arch/x86/pci/early.c
+++ b/arch/x86/pci/early.c
@@ -57,47 +57,3 @@ int early_pci_allowed(void)
PCI_PROBE_CONF1;
 }

-void early_dump_pci_device(u8 bus, u8 slot, u8 func)
-{
-   u32 value[256 / 4];
-   int i;
-
-   pr_info("pci 0000:%02x:%02x.%d config space:\n", bus, slot, func);
-
-   for (i = 0; i < 256; i += 4)
-   value[i / 4] = read_pci_config(bus, slot, func, i);
-
-   print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1, value, 256, false);
-}
-
-void early_dump_pci_devices(void)
-{
-   unsigned bus, slot, func;
-
-   if (!early_pci_allowed())
-   return;
-
-   for (bus = 0; bus < 256; bus++) {
-   for (slot = 0; slot < 32; slot++) {
-   for (func = 0; func < 8; func++) {
-   u32 class;
-   u8 type;
-
-   class = read_pci_config(bus, slot, func,
-   PCI_CLASS_REVISION);
-   if (class == 0x)
-   continue;
-
-   

Re: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-06-05 Thread Peter Geis




On 06/05/2018 05:28 AM, Thierry Reding wrote:

On Mon, Jun 04, 2018 at 06:35:40PM +0300, Aapo Vienamo wrote:

The sdhci get_max_clock callback is set to sdhci_pltfm_clk_get_max_clock
and tegra_sdhci_get_max_clock is removed. It appears that the
sdhci-tegra specific callback was originally introduced due to the
requirement that the host clock has to be twice the bus clock on DDR50
mode. As far as I can tell the only effect the removal has on DDR50 mode
is in cases where the parent clock is unable to supply the requested
clock rate, causing the DDR50 mode to run at a lower frequency.
Currently the DDR50 mode isn't enabled on any of the SoCs and would also
require configuring the SDHCI clock divider register to function
properly.

The problem with tegra_sdhci_get_max_clock is that it divides the clock
rate by two and thus artificially limits the maximum frequency of faster
signaling modes which don't have the host-bus frequency ratio requirement
of DDR50 such as SDR104 and HS200. Furthermore, the call to
clk_round_rate() may return an error which isn't handled by
tegra_sdhci_get_max_clock.

Signed-off-by: Aapo Vienamo 
---
  drivers/mmc/host/sdhci-tegra.c | 15 ++-
  1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
index 970d38f6..c8745b5 100644
--- a/drivers/mmc/host/sdhci-tegra.c
+++ b/drivers/mmc/host/sdhci-tegra.c
@@ -234,17 +234,6 @@ static void tegra_sdhci_set_uhs_signaling(struct 
sdhci_host *host,
sdhci_set_uhs_signaling(host, timing);
  }
  
-static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host *host)

-{
-   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
-
-   /*
-* DDR modes require the host to run at double the card frequency, so
-* the maximum rate we can support is half of the module input clock.
-*/
-   return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
-}


sdhci_pltfm_clk_get_max_clock() returns the current frequency of the
clock, which may not be an accurate maximum.

Also, even if we don't support DDR modes now, we may want to enable them
in the future, at which point we'll need to move to something similar to
the above again, albeit maybe with some of the issues that you mentioned
fixed.

I wonder if we have access to the target mode in this function, because
it seems to me like we'd need to take that into account when determining
the maximum clock rate. Or perhaps the double-rate aspect is already
dealt with in other parts of the MMC subsystem, so the value we should
return here may not even need to take the mode into account.

All of the above said, it is true that we don't enable DDR modes as of
now, and this patch seems like it shouldn't break anything either, so:

Acked-by: Thierry Reding 

I also gave this a brief run on Jetson TK1 and things seem to work fine,
so:

Tested-by: Thierry Reding 

I am currently testing this in my Ouya project, to see if it makes a
difference in my eMMC stability above 30 MHz.

As a drop in replacement it works.
I'll be cranking up the speed later.


Re: [PATCH] ksys_mount: check for permissions before resource allocation

2018-06-05 Thread Ilya Matveychikov



> On Jun 5, 2018, at 3:53 PM, Al Viro  wrote:
> 
> On Tue, Jun 05, 2018 at 03:35:55PM +0400, Ilya Matveychikov wrote:
>> 
>>> On Jun 5, 2018, at 3:26 PM, Al Viro  wrote:
 
> On Jun 5, 2018, at 6:00 AM, Ilya Matveychikov  
> wrote:
> 
> Early check for mount permissions prevents possible allocation of 3
> pages from kmalloc() pool by unpriveledged user which can be used for
> spraying the kernel heap.
>>> 
>>> I'm sorry, but there are arseloads of unpriveleged syscalls that do the 
>>> same,
>>> starting with read() from procfs files.  So what the hell does it buy?
>> 
>> Means that if all do the same shit no reason to fix it? Sounds weird...
> 
> Fix *what*?  You do realize that there's no permission checks to stop e.g.
> stat(2) from copying the pathname in, right?  With user-supplied contents,
> even...
> 
> If you depend upon preventing kmalloc'ed temporary allocations filled
> with user-supplied data, you are screwed, plain and simple.  It really can't
> be prevented, in a lot of ways that are much less exotic than mount(2).
> Most of syscall arguments are copied in, before we get any permission
> checks.  It does happen and it will happen - examining them while they are
> still in userland is a nightmare in a lot of respects, starting with
> security.

I agree that it’s impossible to completely avoid this kind of allocation,
and examining data in user-land would be a bigger problem than copying
arguments to the kernel. But aside from that, what’s wrong with the idea
of having the permission check before doing any kind of work?

BTW, sys_umount() has this check in the right place - before doing anything.
So, why not to have the same logic for mount/umount?



Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

2018-06-05 Thread Petr Mladek
On Thu 2018-05-17 16:39:03, Petr Mladek wrote:
> The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when
> logbuf_lock is available") tried to detect when logbuf_lock was taken
> on another CPU. Then it looked safe to wait for the lock even in NMI.
> 
> It would be safe if other locks were not involved. Ironically the same
> commit introduced an ABBA deadlock scenario. It added a spin lock into
> nmi_cpu_backtrace() to serialize logs from different CPUs. The effect
> is that also the NMI handlers are serialized. As a result, logbuf_lock
> might be blocked by NMI on another CPU:
> 
> CPU0  CPU1CPU2
> 
> printk()
>   vprintk_emit()
> spin_lock(&logbuf_lock)
> 
>   trigger_all_cpu_backtrace()
> raise()
> 
>   nmi_enter()
> printk_nmi_enter()
>   if (this_cpu_read(printk_context)
> & PRINTK_SAFE_CONTEXT_MASK)
> // false
>   else
> // looks safe to use printk_deferred()
> this_cpu_or(printk_context,
>   PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> 
> nmi_cpu_backtrace()
>   arch_spin_lock();
> show_regs()
> 
> nmi_enter()
>   nmi_cpu_backtrace()
> arch_spin_lock();
> 
> printk()
>   vprintk_func()
> vprintk_deferred()
>   vprintk_emit()
> spin_lock(&logbuf_lock)
> 
> DEADLOCK: between &logbuf_lock from vprintk_emit() and &lock
> from nmi_cpu_backtrace().
> 
> CPU0  CPU1
> lock(logbuf_lock) lock(lock)
>   lock(lock)lock(logbuf_lock)
> 
> I have found this problem when stress testing trigger_all_cpu_backtrace()
> and the system frozen.
> 
> Note that lockdep is not able to detect these dependencies because
> there is no support for NMI context. Let's stay on the safe side
> and always use printk_safe buffers when logbuf_lock is taken
> when entering NMI.
> 
> Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when 
> logbuf_lock is available")
> Cc: 4.13+  # v4.13+
> Signed-off-by: Petr Mladek 
> ---
>  kernel/printk/printk_safe.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 449d67edfa4b..a2ebd749c053 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -310,15 +310,12 @@ void printk_nmi_enter(void)
>  {
>   /*
>* The size of the extra per-CPU buffer is limited. Use it only when
> -  * the main one is locked. If this CPU is not in the safe context,
> -  * the lock must be taken on another CPU and we could wait for it.
> +  * the main one is locked.
>*/
> - if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
> - raw_spin_is_locked(&logbuf_lock)) {
> + if (raw_spin_is_locked(&logbuf_lock))
>   this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);

Grr, the ABBA deadlock is still there. NMIs are not sent to the other
CPUs atomically. Even if we detect that logbuf_lock is available
in printk_nmi_enter() on some CPUs, it might still get locked on
another CPU before the other CPU gets the NMI.

In other words, any check in printk_safe_enter() is racy and not
sufficient,

  => I suggest reverting the commit 719f6a7040f1bdaf96fcc70
 "printk: Use the main logbuf in NMI when logbuf_lock is available"
 for-4.18 and stable until we get a better solution.

The only safe solution seems to be a trylock() in NMI in
vprintk_emit() and fallback to vprintk_safe() when the lock
is not taken. I mean something like:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..4a5a0bf221b3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
printk_delay();
 
/* This stops the holder of console_sem just where we want him */
-   logbuf_lock_irqsave(flags);
+   printk_safe_enter_irqsave(flags);
+   if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
+   vprintk_nmi(fmt, args);
+   printk_safe_exit_irqrestore(flags);
+   return;
+   } else
+   raw_spin_lock(&logbuf_lock);
/*
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.


Sigh, this looks like material for 4.19. We might need to
revisit whether printk_context still makes sense, ...

Best Regards,
Petr

PS: I realized this when writing the pull request for-4.18.
I removed this patch from the pull request.


Re: [PATCH v2 14/21] sched/debug: use match_string() helper

2018-06-05 Thread Andy Shevchenko
On Thu, May 31, 2018 at 2:11 PM, Yisheng Xie  wrote:
> match_string() returns the index of an array for a matching string,
> which can be used instead of open coded variant.
>

FWIW,
Reviewed-by: Andy Shevchenko 


> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Signed-off-by: Yisheng Xie 
> ---
> v2:
>  - rename i to ret to show the change in returned value meaning - per Andy
>
>  kernel/sched/debug.c | 31 +++
>  1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 15b10e2..5591147 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -111,20 +111,19 @@ static int sched_feat_set(char *cmp)
> cmp += 3;
> }
>
> -   for (i = 0; i < __SCHED_FEAT_NR; i++) {
> -   if (strcmp(cmp, sched_feat_names[i]) == 0) {
> -   if (neg) {
> -   sysctl_sched_features &= ~(1UL << i);
> -   sched_feat_disable(i);
> -   } else {
> -   sysctl_sched_features |= (1UL << i);
> -   sched_feat_enable(i);
> -   }
> -   break;
> -   }
> +   i = match_string(sched_feat_names, __SCHED_FEAT_NR, cmp);
> +   if (i < 0)
> +   return i;
> +
> +   if (neg) {
> +   sysctl_sched_features &= ~(1UL << i);
> +   sched_feat_disable(i);
> +   } else {
> +   sysctl_sched_features |= (1UL << i);
> +   sched_feat_enable(i);
> }
>
> -   return i;
> +   return 0;
>  }
>
>  static ssize_t
> @@ -133,7 +132,7 @@ static int sched_feat_set(char *cmp)
>  {
> char buf[64];
> char *cmp;
> -   int i;
> +   int ret;
> struct inode *inode;
>
> if (cnt > 63)
> @@ -148,10 +147,10 @@ static int sched_feat_set(char *cmp)
> /* Ensure the static_key remains in a consistent state */
> inode = file_inode(filp);
> inode_lock(inode);
> -   i = sched_feat_set(cmp);
> +   ret = sched_feat_set(cmp);
> inode_unlock(inode);
> -   if (i == __SCHED_FEAT_NR)
> -   return -EINVAL;
> +   if (ret < 0)
> +   return ret;
>
> *ppos += cnt;
>
> --
> 1.7.12.4
>



-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-05 Thread Peter Zijlstra
On Tue, May 29, 2018 at 07:07:39PM +0100, Mark Rutland wrote:
> Longer-term, I think things could be simplified if we were to rework the
> headers such that we have:
> 
> * arch/*/include/asm/atomic.h providing arch_atomic_*().
> 
> * include/linux/atomic-raw.h building raw_atomic_*() atop of the
>   arch_atomic_*() definitions, filling in gaps in the API. Having
>   separate arch_ and raw_ namespaces would simplify the ifdeffery.
> 
> * include/linux/atomic.h building atomic_*() atop of the raw_atomic_*()
>   definitions, complete with any instrumentation. Instrumenting at this
>   level would lower the instrumentation overhead, and would not require
>   any ifdeffery as the whole raw_atomic_*() API would be available.
> 
> ... I've avoided this for the time being due to the necessary churn in
> arch code.

I'm not entirely sure I get the point of raw_atomic; we only need to
instrument the arch_atomic bits, right? When those are done, everything
that's built on top will also automagically be instrumented.



Re: [PATCH v9 2/7] i2c: Add FSI-attached I2C master algorithm

2018-06-05 Thread Eddie James




On 06/04/2018 06:38 PM, Benjamin Herrenschmidt wrote:

On Mon, 2018-06-04 at 22:21 +0300, Andy Shevchenko wrote:

+#define I2C_INT_ENABLE 0xff80
+#define I2C_INT_ERR    0xfcc0

Now it looks like a flags combinations.
For me as for reader would be better to see quickly a decoded line.

My proposal is to introduce something like following

_INT_ALL  GENMASK()
_INT_ENABLE (_INT_ALL & ~(_FOO | _BAR))
_INT_ERR ... similar way as above ...

What do you think?

I don't think this absolutely needs to change but yes, open coding is
error prone. However I would think it more readable to use positive
logic and just list all the bits that are *set* even if it's a bit more
text:

#define I2C_INT_ERR	(I2C_INT_INV_CMD |\
			 I2C_INT_PARITY |\
			 I2C_INT_BE_OVERRUN |\
			 .../...)

#define I2C_INT_ENABLE	(I2C_INT_ERR |\
			 I2C_INT_DAT_REQ |\
			 I2C_INT_CMD_COMP)

Note: Eddie, I notice I2C_INT_BUSY is in "ERR" but not in "ENABLE", any
reason for that ?


Yes, we don't want to enable an interrupt when I2C gets into the busy
state, as that happens during every transfer. However, it would likely
be an error condition if we see it when the transfer is supposed to be
complete. These were from the legacy driver... I just realized that
neither is actually being used in this driver, so I will drop them.


Thanks,
Eddie



Cheers,
Ben.





Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Quentin Perret
On Tuesday 05 Jun 2018 at 15:18:38 (+0200), Vincent Guittot wrote:
> On 5 June 2018 at 15:12, Quentin Perret  wrote:
> > On Tuesday 05 Jun 2018 at 13:59:56 (+0200), Vincent Guittot wrote:
> >> On 5 June 2018 at 12:57, Quentin Perret  wrote:
> >> > Hi Vincent,
> >> >
> >> > On Tuesday 05 Jun 2018 at 10:36:26 (+0200), Vincent Guittot wrote:
> >> >> Hi Quentin,
> >> >>
> >> >> On 25 May 2018 at 15:12, Vincent Guittot  
> >> >> wrote:
> >> >> > This patchset initially tracked only the utilization of RT rq. During
> >> >> > OSPM summit, it has been discussed the opportunity to extend it in 
> >> >> > order
> >> >> > to get an estimate of the utilization of the CPU.
> >> >> >
> >> >> > - Patches 1-3 correspond to the content of patchset v4 and add 
> >> >> > utilization
> >> >> >   tracking for rt_rq.
> >> >> >
> >> >> > When both cfs and rt tasks compete to run on a CPU, we can see some 
> >> >> > frequency
> >> >> > drops with schedutil governor. In such case, the cfs_rq's utilization 
> >> >> > doesn't
> >> >> > reflect anymore the utilization of cfs tasks but only the remaining 
> >> >> > part that
> >> >> > is not used by rt tasks. We should monitor the stolen utilization and 
> >> >> > take
> >> >> > it into account when selecting OPP. This patchset doesn't change the 
> >> >> > OPP
> >> >> > selection policy for RT tasks but only for CFS tasks
> >> >> >
> >> >> > A rt-app use case which creates an always running cfs thread and a rt 
> >> >> > threads
> >> >> > that wakes up periodically with both threads pinned on same CPU, show 
> >> >> > lot of
> >> >> > frequency switches of the CPU whereas the CPU never goes idles during 
> >> >> > the
> >> >> > test. I can share the json file that I used for the test if someone is
> >> >> > interested in.
> >> >> >
> >> >> > For a 15 seconds long test on a hikey 6220 (octo core cortex A53 
> >> >> > platfrom),
> >> >> > the cpufreq statistics outputs (stats are reset just before the test) 
> >> >> > :
> >> >> > $ cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >> >> > without patchset : 1230
> >> >> > with patchset : 14
> >> >>
> >> >> I have attached the rt-app json file that I use for this test
> >> >
> >> > Thank you very much ! I did a quick test with a much simpler fix to this
> >> > RT-steals-time-from-CFS issue using just the existing 
> >> > scale_rt_capacity().
> >> > I get the following results on Hikey960:
> >> >
> >> > Without patch:
> >> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >> >12
> >> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >> >640
> >> > With patch
> >> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >> >8
> >> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >> >12
> >> >
> >> > Yes the rt_avg stuff is out of sync with the PELT signal, but do you 
> >> > think
> >> > this is an actual issue for realistic use-cases ?
> >>
> >> yes I think that it's worth syncing and consolidating things on the
> >> same metric. The result will be saner and more robust as we will have
> >> the same behavior
> >
> > TBH I'm not disagreeing with that, the PELT-everywhere approach feels
> > cleaner in a way, but do you have a use-case in mind where this will
> > definitely help ?
> >
> > I mean, yes the rt_avg is a slow response to the RT pressure, but is
> > this always a problem ? Ramping down slower might actually help in some
> > cases no ?
> 
> I would say no, because when one decreases the other one will not
> increase at the same pace, and we will see some wrong behavior or
> decisions

I think I get your point. Yes, sometimes, the slow-moving rt_avg can be
off a little bit (which can be good or bad, depending on the case) if your
RT task runs a lot with very changing behaviour. And again, I'm not
fundamentally against the idea of having extra complexity for RT/IRQ PELT
signals _if_ we have a use-case. But is there a real use-case where we
really need all of that ? That's a true question, I honestly don't have
the answer :-)

> 
> >
> >>
> >> >
> >> > What about the diff below (just a quick hack to show the idea) applied
> >> > on tip/sched/core ?
> >> >
> >> > ---8<---
> >> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> >> > index a8ba6d1f262a..23a4fb1c2c25 100644
> >> > --- a/kernel/sched/cpufreq_schedutil.c
> >> > +++ b/kernel/sched/cpufreq_schedutil.c
> >> > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >> > sg_cpu->util_dl  = cpu_util_dl(rq);
> >> >  }
> >> >
> >> > +unsigned long scale_rt_capacity(int cpu);
> >> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >> >  {
> >> > struct rq *rq = cpu_rq(sg_cpu->cpu);
> >> > +   int cpu = sg_cpu->cpu;
> >> > +   unsigned long util, dl_bw;
> >> >
> >> > if (rq->rt.rt_nr_running)
> >> > return sg_cpu->max;
> >> > @@ -197,7 +200,14 @@ 

Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-05 Thread Mark Rutland
On Tue, Jun 05, 2018 at 03:29:49PM +0200, Peter Zijlstra wrote:
> On Tue, May 29, 2018 at 07:07:39PM +0100, Mark Rutland wrote:
> > Longer-term, I think things could be simplified if we were to rework the
> > headers such that we have:
> > 
> > * arch/*/include/asm/atomic.h providing arch_atomic_*().
> > 
> > * include/linux/atomic-raw.h building raw_atomic_*() atop of the
> >   arch_atomic_*() definitions, filling in gaps in the API. Having
> >   separate arch_ and raw_ namespaces would simplify the ifdeffery.
> > 
> > * include/linux/atomic.h building atomic_*() atop of the raw_atomic_*()
> >   definitions, complete with any instrumentation. Instrumenting at this
> >   level would lower the instrumentation overhead, and would not require
> >   any ifdeffery as the whole raw_atomic_*() API would be available.
> > 
> > ... I've avoided this for the time being due to the necessary churn in
> > arch code.
> 
> I'm not entirely sure I get the point of raw_atomic, we only need to
> instrument the arch_atomic bits, right?

Well, we only *need* to instrument the top-level atomic. KASAN (and
KTSAN/KMSAN) only care that we're touching a memory location, and not
how many times we happen to touch it.

e.g. when we have fallbacks we might have:

static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int c = atomic_read(v);

do {
if (unlikely(c == u))
break;
} while (!atomic_try_cmpxchg(v, &c, c + a));

return c;
}

... where:

* atomic_read() is a simple wrapper around arch_atomic_read(), with
  instrumentation.

* atomic_try_cmpxchg() might be a simple wrapper around
  arch_atomic_try_cmpxchg, or a wrapper around atomic_cmpxchg(), which
  calls arch_atomic_cmpxchg(). Either way, one of the two is
  instrumented.

... so each call to atomic_fetch_add_unless() calls the instrumentation
at least once for the read, and at least once per retry. Whereas if
implemented in arch code, it only calls the instrumentation once.

> When those are done, everything that's build on top will also
> automagically be instrumented.

Sure, it all works, it's just less than optimal as above, and also means
that we have to duplicate the ifdeffery for optional atomics -- once in
the instrumented atomics, then in the "real" atomics.

Whereas if we filled in the raw atomics atop of the arch atomics,
everything above that can assume the whole API is present, no ifdeffery
required.

Thanks,
Mark.


Re: [RFC PATCH 5/6] arm64: dts: ti: Add Support for AM654 SoC

2018-06-05 Thread Rob Herring
On Tue, Jun 5, 2018 at 1:05 AM, Nishanth Menon  wrote:
> The AM654 SoC is a lead device of the K3 Multicore SoC architecture
> platform, targeted for broad market and industrial control with aim to
> meet the complex processing needs of modern embedded products.
>
> Some highlights of this SoC are:
> * Quad ARMv8 A53 cores split over two clusters
> * GICv3 compliant GIC500
> * Configurable L3 Cache and IO-coherent architecture
> * Dual lock-step capable R5F uC for safety-critical applications
> * High data throughput capable distributed DMA architecture under NAVSS
> * Three Gigabit Industrial Communication Subsystems (ICSSG), each with dual
>   PRUs and dual RTUs
> * Hardware accelerator block containing AES/DES/SHA/MD5 called SA2UL
> * Centralized System Controller for Security, Power, and Resource
>   management.
> * Dual ADCSS, eQEP/eCAP, eHRPWM, dual CAN-FD
> * Flash subsystem with OSPI and Hyperbus interfaces
> * Multimedia capability with CAL, DSS7-UL, SGX544, McASP
> * Peripheral connectivity including USB3, PCIE, MMC/SD, GPMC, I2C, SPI,
>   GPIO
>
> See AM65x Technical Reference Manual (SPRUID7, April 2018)
> for further details: http://www.ti.com/lit/pdf/spruid7
>
> We introduce the Kconfig symbol for the SoC along with this patch since
> it is the logically relevant point; however, the usage is in subsequent
> patches.
>
> NOTE: AM654 is the first of the device variants, hence we introduce a
> generic am6.dtsi.
>
> Signed-off-by: Benjamin Fair 
> Signed-off-by: Nishanth Menon 
> ---
>  MAINTAINERS  |   1 +
>  arch/arm64/boot/dts/ti/k3-am6.dtsi   | 144 +++
>  arch/arm64/boot/dts/ti/k3-am654.dtsi | 117 
>  drivers/soc/ti/Kconfig   |  14 
>  4 files changed, 276 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/ti/k3-am6.dtsi
>  create mode 100644 arch/arm64/boot/dts/ti/k3-am654.dtsi
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cfb35b252ac7..5f5c4eddec7a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2092,6 +2092,7 @@ M:Nishanth Menon 
>  L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  S: Supported
>  F: Documentation/devicetree/bindings/arm/ti/k3.txt
> +F: arch/arm64/boot/dts/ti/k3-*
>
>  ARM/TEXAS INSTRUMENT KEYSTONE ARCHITECTURE
>  M: Santosh Shilimkar 
> diff --git a/arch/arm64/boot/dts/ti/k3-am6.dtsi b/arch/arm64/boot/dts/ti/k3-am6.dtsi
> new file mode 100644
> index ..cdfa12173aac
> --- /dev/null
> +++ b/arch/arm64/boot/dts/ti/k3-am6.dtsi
> @@ -0,0 +1,144 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Device Tree Source for AM6 SoC Family
> + *
> + * Copyright (C) 2016-2018 Texas Instruments Incorporated - http://www.ti.com/
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/ {
> +   model = "Texas Instruments K3 AM654 SoC";
> +   compatible = "ti,am654";
> +   interrupt-parent = <&gic>;
> +   #address-cells = <2>;
> +   #size-cells = <2>;
> +
> +   aliases {
> +   serial0 = &wkup_uart0;
> +   serial1 = &mcu_uart0;
> +   serial2 = &main_uart0;
> +   serial3 = &main_uart1;
> +   serial4 = &main_uart2;
> +   };
> +
> +   chosen { };
> +
> +   firmware {
> +   optee {
> +   compatible = "linaro,optee-tz";
> +   method = "smc";
> +   };
> +
> +   psci: psci {
> +   compatible = "arm,psci-1.0";
> +   method = "smc";
> +   };
> +   };
> +
> +   soc0: soc0 {
> +   compatible = "simple-bus";
> +   #address-cells = <2>;
> +   #size-cells = <2>;
> +   ranges;

Really need 64-bit addresses and sizes? Use ranges to limit the
address space if possible.

> +
> +   a53_timer0: timer-cl0-cpu0 {
> +   compatible = "arm,armv8-timer";
> +   interrupts = , /* 
> cntpsirq */
> +, /* 
> cntpnsirq */
> +, /* 
> cntvirq */
> +; /* 
> cnthpirq */
> +   };
> +
> +   pmu: pmu {
> +   compatible = "arm,armv8-pmuv3";
> +   /* Recommendation from GIC500 TRM Table A.3 */
> +   interrupts = ;
> +   };

These 2 nodes aren't on the bus, so move them up a level.

> +
> +   gic: interrupt-controller@180 {
> +   compatible = "arm,gic-v3";

gic-500?

> +   #address-cells = <2>;
> +   #size-cells = <2>;
> +   ranges;
> +   #interrupt-cells = <3>;
> +   interrupt-controller;
> +   /*
> +* NOTE: we are NOT gicv2 backward compat, so no GICC,
> +

Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Peter Zijlstra
On Mon, Jun 04, 2018 at 08:08:58PM +0200, Vincent Guittot wrote:
> On 4 June 2018 at 18:50, Peter Zijlstra  wrote:

> > So this patch-set tracks the !cfs occupation using the same function,
> > which is all good. But what, if instead of using that to compensate the
> > OPP selection, we employ that to renormalize the util signal?
> >
> > If we normalize util against the dynamic (rt_avg affected) cpu_capacity,
> > then I think your initial problem goes away. Because while the RT task
> > will push the util to .5, it will at the same time push the CPU capacity
> > to .5, and renormalized that gives 1.
> >
> >   NOTE: the renorm would then become something like:
> > scale_cpu = arch_scale_cpu_capacity() / rt_frac();

Should probably be:

scale_cpu = arch_scale_cpu_capacity() / (1 - rt_frac())

> >
> >
> > On IRC I mentioned stopping the CFS clock when preempted, and while that
> > would result in fixed numbers, Vincent was right in pointing out the
> > numbers will be difficult to interpret, since the meaning will be purely
> > CPU local and I'm not sure you can actually fix it again with
> > normalization.
> >
> > Imagine, running a .3 RT task, that would push the (always running) CFS
> > down to .7, but because we discard all !cfs time, it actually has 1. If
> > we try and normalize that we'll end up with ~1.43, which is of course
> > completely broken.
> >
> >
> > _However_, all that happens for util, also happens for load. So the above
> > scenario will also make the CPU appear less loaded than it actually is.
> 
> The load will continue to increase because we track runnable state and
> not running for the load

Duh yes. So renormalizing it once, like proposed for util, would actually
do the right thing there too. Wouldn't that allow us to get rid of
much of the capacity magic in the load balance code?

/me thinks more..

Bah, no.. because you don't want this dynamic renormalization part of
the sums. So you want to keep it after the fact. :/

> As you mentioned, scale_rt_capacity give the remaining capacity for
> cfs and it will behave like cfs util_avg now that it uses PELT. So as
> long as cfs util_avg < scale_rt_capacity (we probably need a margin)
> we keep using dl bandwidth + cfs util_avg + rt util_avg for selecting
> OPP because we have remaining spare capacity but if  cfs util_avg ==
> scale_rt_capacity, we make sure to use max OPP.

Good point, when cfs-util < cfs-cap then there is idle time and the util
number is 'right', when cfs-util == cfs-cap we're overcommitted and
should go max.

Since the util and cap values are aligned that should track nicely.


[PATCH] rtc: mrst: switch to devm functions

2018-06-05 Thread Alexandre Belloni
Switch to devm managed functions to simplify error handling and device
removal

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-mrst.c | 45 +-
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/rtc/rtc-mrst.c b/drivers/rtc/rtc-mrst.c
index fcb9de5218b2..097a4d4e2aba 100644
--- a/drivers/rtc/rtc-mrst.c
+++ b/drivers/rtc/rtc-mrst.c
@@ -45,7 +45,6 @@ struct mrst_rtc {
struct rtc_device   *rtc;
struct device   *dev;
int irq;
-   struct resource *iomem;
 
u8  enabled_wake;
u8  suspend_ctrl;
@@ -329,24 +328,22 @@ static int vrtc_mrst_do_probe(struct device *dev, struct resource *iomem,
if (!iomem)
return -ENODEV;
 
-   iomem = request_mem_region(iomem->start, resource_size(iomem),
-  driver_name);
+   iomem = devm_request_mem_region(dev, iomem->start, resource_size(iomem),
+   driver_name);
if (!iomem) {
dev_dbg(dev, "i/o mem already in use.\n");
return -EBUSY;
}
 
mrst_rtc.irq = rtc_irq;
-   mrst_rtc.iomem = iomem;
mrst_rtc.dev = dev;
dev_set_drvdata(dev, &mrst_rtc);
 
-   mrst_rtc.rtc = rtc_device_register(driver_name, dev,
-   &mrst_rtc_ops, THIS_MODULE);
-   if (IS_ERR(mrst_rtc.rtc)) {
-   retval = PTR_ERR(mrst_rtc.rtc);
-   goto cleanup0;
-   }
+   mrst_rtc.rtc = devm_rtc_allocate_device(dev);
+   if (IS_ERR(mrst_rtc.rtc))
+   return PTR_ERR(mrst_rtc.rtc);
+
+   mrst_rtc.rtc->ops = &mrst_rtc_ops;
 
rename_region(iomem, dev_name(&mrst_rtc.rtc->dev));
 
@@ -359,23 +356,27 @@ static int vrtc_mrst_do_probe(struct device *dev, struct resource *iomem,
dev_dbg(dev, "TODO: support more than 24-hr BCD mode\n");
 
if (rtc_irq) {
-   retval = request_irq(rtc_irq, mrst_rtc_irq,
-   0, dev_name(&mrst_rtc.rtc->dev),
-   mrst_rtc.rtc);
+   retval = devm_request_irq(dev, rtc_irq, mrst_rtc_irq,
+ 0, dev_name(&mrst_rtc.rtc->dev),
+ mrst_rtc.rtc);
if (retval < 0) {
dev_dbg(dev, "IRQ %d is already in use, err %d\n",
rtc_irq, retval);
-   goto cleanup1;
+   goto cleanup0;
}
}
+
+   retval = rtc_register_device(mrst_rtc.rtc);
+   if (retval) {
+   retval = PTR_ERR(mrst_rtc.rtc);
+   goto cleanup0;
+   }
+
dev_dbg(dev, "initialised\n");
return 0;
 
-cleanup1:
-   rtc_device_unregister(mrst_rtc.rtc);
 cleanup0:
mrst_rtc.dev = NULL;
-   release_mem_region(iomem->start, resource_size(iomem));
dev_err(dev, "rtc-mrst: unable to initialise\n");
return retval;
 }
@@ -390,20 +391,10 @@ static void rtc_mrst_do_shutdown(void)
 static void rtc_mrst_do_remove(struct device *dev)
 {
struct mrst_rtc *mrst = dev_get_drvdata(dev);
-   struct resource *iomem;
 
rtc_mrst_do_shutdown();
 
-   if (mrst->irq)
-   free_irq(mrst->irq, mrst->rtc);
-
-   rtc_device_unregister(mrst->rtc);
mrst->rtc = NULL;
-
-   iomem = mrst->iomem;
-   release_mem_region(iomem->start, resource_size(iomem));
-   mrst->iomem = NULL;
-
mrst->dev = NULL;
 }
 
-- 
2.17.1



Re: [PATCH] dm: Use kzalloc for all structs with embedded biosets/mempools

2018-06-05 Thread Jens Axboe
On 6/5/18 8:35 AM, David Sterba wrote:
> On Tue, Jun 05, 2018 at 08:22:22AM -0600, Jens Axboe wrote:
>>> I fucked up majorly on the bioset/mempool conversion - I forgot to check 
>>> that
>>> everything biosets/mempools were being embedded in was actually being 
>>> zeroed on
>>> allocation. Device mapper currently explodes, you'll probably want to apply 
>>> this
>>> patch post haste.
>>>
>>> I have now done that auditing, for every single conversion - this patch 
>>> fixes
>>> everything I found. There do not seem to be any incorrect ones outside of 
>>> device
>>> mapper...
>>>
>>> We'll probably want a second patch that either a) changes
>>> bioset_init()/mempool_init() to zero the passed in bioset/mempool first, or 
>>> b)
>>> my preference, WARN() or BUG() if they're passed memory that isn't zeroed.
>>
>> Odd, haven't seen a crash, but probably requires kasan or poisoning to
>> trigger anything? Mike's tree also had the changes, since they were based
>> on the block tree.
> 
> eg. fstests/generic/081 crashes (trace below), no KASAN, PAGE_POISONING=y,
> PAGE_POISONING_NO_SANITY=y.
> 
>> I can queue this up and ship it later today. Mike, you want to review
>> this one?
> 
> Would be great to push that soon. The fstests build on several DM targets, the
> crashes lead to many test failures. I'm going to test the kzalloc fix now.

For sure, it should go asap.

-- 
Jens Axboe



Re: [GIT PULL] fscrypt updates for 4.18

2018-06-05 Thread Richard Weinberger
Ted,

On Tue, Jun 5, 2018 at 5:07 PM, Theodore Y. Ts'o  wrote:
> The following changes since commit 75bc37fefc4471e718ba8e651aa74673d4e0a9eb:
>
>   Linux 4.17-rc4 (2018-05-06 16:57:38 -1000)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt.git 
> tags/fscrypt_for_linus
>
> for you to fetch changes up to 4f2f76f751433908364ccff82f437a57d0e6e9b7:
>
>   ext4: fix fencepost error in check for inode count overflow during resize 
> (2018-05-25 12:51:25 -0400)
>
> 
> Add a bunch of cleanups, and add support for the Speck128/256
> algorithms.  Yes, Speck is controversial, but the intention is to use
> them only for the lowest end Android devices, where the alternative
> *really* is no encryption at all for data stored at rest.

Will Android tell me that Speck is being used?

-- 
Thanks,
//richard


Re: include/linux/string.h:246:9: warning: '__builtin_strncpy' output truncated before terminating nul copying 4 bytes from a string of the same length

2018-06-05 Thread Josh Poimboeuf
On Tue, Jun 05, 2018 at 06:19:07PM +0300, Andy Shevchenko wrote:
> On Tue, May 29, 2018 at 4:35 AM, kbuild test robot  wrote:
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> > master
> > head:   786b71f5b754273ccef6d9462e52062b3e1f9877
> > commit: 854e55ad289efe7991f0ada85d5846f5afb9 objtool, perf: Fix GCC 8 
> > -Wrestrict error
> > date:   2 months ago
> > config: x86_64-randconfig-s4-05290856 (attached as .config)
> > compiler: gcc-8 (Debian 8.1.0-3) 8.1.0
> > reproduce:
> > git checkout 854e55ad289efe7991f0ada85d5846f5afb9
> > # save the attached .config to linux build tree
> > make ARCH=x86_64
> >
> 
> I guess it's easy to fix by
> 
> --- a/drivers/video/fbdev/uvesafb.c
> +++ b/drivers/video/fbdev/uvesafb.c
> @@ -422,7 +422,7 @@ static int uvesafb_vbe_getinfo(struct uvesafb_ktask *task,
>task->t.flags = TF_VBEIB;
>task->t.buf_len = sizeof(struct vbe_ib);
>task->buf = &par->vbe_ib;
> -   strncpy(par->vbe_ib.vbe_signature, "VBE2", 4);
> +   snprintf(par->vbe_ib.vbe_signature,
> sizeof(par->vbe_ib.vbe_signature), "VBE2");
> 
> The question is do we want this to just shut up a compiler? It's
> obviously false positive.

What about just changing it to a memcpy?  Seems like that would be
cleaner anyway, since the whole point of strncpy is to add the
terminating NUL, which isn't needed here.

> 
> > All warnings (new ones prefixed by >>):
> >
> >In file included from include/linux/bitmap.h:9,
> > from include/linux/cpumask.h:12,
> > from arch/x86/include/asm/cpumask.h:5,
> > from arch/x86/include/asm/msr.h:11,
> > from arch/x86/include/asm/processor.h:21,
> > from arch/x86/include/asm/cpufeature.h:5,
> > from arch/x86/include/asm/thread_info.h:53,
> > from include/linux/thread_info.h:38,
> > from arch/x86/include/asm/preempt.h:7,
> > from include/linux/preempt.h:81,
> > from include/linux/spinlock.h:51,
> > from include/linux/seqlock.h:36,
> > from include/linux/time.h:6,
> > from include/linux/stat.h:19,
> > from include/linux/module.h:10,
> > from drivers/video/fbdev/uvesafb.c:12:
> >In function 'strncpy',
> >inlined from 'uvesafb_vbe_getinfo' at 
> > drivers/video/fbdev/uvesafb.c:425:2:
> >>> include/linux/string.h:246:9: warning: '__builtin_strncpy' output 
> >>> truncated before terminating nul copying 4 bytes from a string of the 
> >>> same length [-Wstringop-truncation]
> >  return __builtin_strncpy(p, q, size);
> > ^
> >
> > vim +/__builtin_strncpy +246 include/linux/string.h
> >
> > 6974f0c4 Daniel Micay 2017-07-12  237
> > 6974f0c4 Daniel Micay 2017-07-12  238  #if !defined(__NO_FORTIFY) && 
> > defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE)
> > 6974f0c4 Daniel Micay 2017-07-12  239  __FORTIFY_INLINE char *strncpy(char 
> > *p, const char *q, __kernel_size_t size)
> > 6974f0c4 Daniel Micay 2017-07-12  240  {
> > 6974f0c4 Daniel Micay 2017-07-12  241   size_t p_size = 
> > __builtin_object_size(p, 0);
> > 6974f0c4 Daniel Micay 2017-07-12  242   if (__builtin_constant_p(size) && 
> > p_size < size)
> > 6974f0c4 Daniel Micay 2017-07-12  243   __write_overflow();
> > 6974f0c4 Daniel Micay 2017-07-12  244   if (p_size < size)
> > 6974f0c4 Daniel Micay 2017-07-12  245   fortify_panic(__func__);
> > 6974f0c4 Daniel Micay 2017-07-12 @246   return __builtin_strncpy(p, q, 
> > size);
> > 6974f0c4 Daniel Micay 2017-07-12  247  }
> > 6974f0c4 Daniel Micay 2017-07-12  248
> >
> > :: The code at line 246 was first introduced by commit
> > :: 6974f0c4555e285ab217cee58b6e874f776ff409 include/linux/string.h: add 
> > the option of fortified string.h functions
> >
> > :: TO: Daniel Micay 
> > :: CC: Linus Torvalds 
> >
> > ---
> > 0-DAY kernel test infrastructureOpen Source Technology 
> > Center
> > https://lists.01.org/pipermail/kbuild-all   Intel 
> > Corporation
> 
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

-- 
Josh


Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

2018-06-05 Thread Peter Zijlstra
On Tue, Jun 05, 2018 at 05:22:12PM +0200, Peter Zijlstra wrote:

> > OK, but __kthread_parkme() can be preempted before it calls schedule(), so 
> > the
> > caller still can be migrated? Plus kthread_park_complete() can be called 
> > twice.
> 
> Argh... I forgot TASK_DEAD does the whole thing with preempt_disable().
> Let me stare at that a bit.

This should ensure we only ever complete when we read PARKED, right?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d59b259af4a..e513b4600796 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2641,7 +2641,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
  * past. prev == current is still correct but we need to recalculate this_rq
  * because prev may have moved to another CPU.
  */
-static struct rq *finish_task_switch(struct task_struct *prev)
+static struct rq *finish_task_switch(struct task_struct *prev, bool preempt)
__releases(rq->lock)
 {
struct rq *rq = this_rq();
@@ -2674,7 +2674,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 *
 * We must observe prev->state before clearing prev->on_cpu (in
 * finish_task), otherwise a concurrent wakeup can get prev
-* running on another CPU and we could rave with its RUNNING -> DEAD
+* running on another CPU and we could race with its RUNNING -> DEAD
 * transition, resulting in a double drop.
 */
prev_state = prev->state;
@@ -2720,7 +2720,8 @@ static struct rq *finish_task_switch(struct task_struct *prev)
break;
 
case TASK_PARKED:
-   kthread_park_complete(prev);
+   if (!preempt)
+   kthread_park_complete(prev);
break;
}
}
@@ -2784,7 +2785,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 * PREEMPT_COUNT kernels).
 */
 
-   rq = finish_task_switch(prev);
+   rq = finish_task_switch(prev, false);
balance_callback(rq);
preempt_enable();
 
@@ -2797,7 +2798,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
  */
 static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
-  struct task_struct *next, struct rq_flags *rf)
+  struct task_struct *next, bool preempt, struct rq_flags *rf)
 {
struct mm_struct *mm, *oldmm;
 
@@ -2839,7 +2840,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
switch_to(prev, next, prev);
barrier();
 
-   return finish_task_switch(prev);
+   return finish_task_switch(prev, preempt);
 }
 
 /*
@@ -3478,7 +3479,7 @@ static void __sched notrace __schedule(bool preempt)
trace_sched_switch(preempt, prev, next);
 
/* Also unlocks the rq: */
-   rq = context_switch(rq, prev, next, &rf);
+   rq = context_switch(rq, prev, next, preempt, &rf);
} else {
rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
rq_unlock_irq(rq, );
@@ -3487,6 +3488,7 @@ static void __sched notrace __schedule(bool preempt)
balance_callback(rq);
 }
 
+/* called with preemption disabled */
 void __noreturn do_task_dead(void)
 {
/* Causes final put_task_struct in finish_task_switch(): */


Re: [PATCH] Use an IDR to allocate apparmor secids

2018-06-05 Thread John Johansen
On 06/05/2018 04:47 AM, Matthew Wilcox wrote:
> On Mon, Jun 04, 2018 at 07:35:24PM -0700, John Johansen wrote:
>> On 06/04/2018 07:27 PM, Matthew Wilcox wrote:
>>> On Mon, Jun 04, 2018 at 06:27:09PM -0700, John Johansen wrote:
 hey Mathew,

 I've pulled this into apparmor-next and done the retuning of
 AA_SECID_INVALID a follow on patch. The reworking of the api to
 return the specific error type can wait for another cycle.
>>>
>>> Oh ... here's what I currently have.  I decided that AA_SECID_INVALID
>>> wasn't needed.
>>>
>> well not needed in the allocation path, but definitely needed and it
>> needs to be 0.
>>
>> This is for catching some uninitialized or freed and zeroed values.
>> The debug checks aren't in the current version, as they were
>> residing in another debug patch, but I will pull them out into their
>> own patch.
> 
> With the IDR, I don't know if you need it for debug.
> 
>   BUG_ON(label != idr_find(&aa_secids, label->secid))
> 
> should do the trick.
> 

its not so much the idr as the network and audit structs where the
secid gets used and then security system is asked to convert from
the secid to the secctx.

We need an invalid secid for when conversion results in an error,
partly because the returned error is ignored in some paths (eg. scm)
so we also make sure to have an invalid value.

Having it be zero helps us catch bugs in structs that are zeroed
but that we missed initializing, and also works well with apparmor always
zeroing structs on free.

The debug patch didn't get included yet because the networking
code that make use of the secids hasn't landed yet (they are being
used in the audit path) so the debug patch ends up throwing a
lot of warning for the networking paths.

The patch I am testing on top of your patch is below

---

commit d5de3b1d21687c16df0a75b6309ab8481629a841
Author: John Johansen 
Date:   Mon Jun 4 19:44:59 2018 -0700

apparmor: cleanup secid map conversion to using idr

The idr conversion didn't handle the error case where allocating a
mapping fails.

In addition it did not ensure that mappings did not allocate or use a
0 value, which is used as an invalid secid because it allows debug
code to detect when objects have not been correctly initialized or
freed too early.

Signed-off-by: John Johansen 

diff --git a/security/apparmor/include/secid.h b/security/apparmor/include/secid.h
index 686de8e50a79..dee6fa3b6081 100644
--- a/security/apparmor/include/secid.h
+++ b/security/apparmor/include/secid.h
@@ -28,8 +28,10 @@ int apparmor_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void apparmor_release_secctx(char *secdata, u32 seclen);
 
 
-u32 aa_alloc_secid(struct aa_label *label, gfp_t gfp);
+int aa_alloc_secid(struct aa_label *label, gfp_t gfp);
 void aa_free_secid(u32 secid);
 void aa_secid_update(u32 secid, struct aa_label *label);
 
+void aa_secids_init(void);
+
 #endif /* __AA_SECID_H */
diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index a17574df611b..ba11bdf9043a 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -407,8 +407,7 @@ bool aa_label_init(struct aa_label *label, int size, gfp_t gfp)
AA_BUG(!label);
AA_BUG(size < 1);
 
-   label->secid = aa_alloc_secid(label, gfp);
-   if (label->secid == AA_SECID_INVALID)
+   if (aa_alloc_secid(label, gfp) < 0)
return false;
 
label->size = size; /* doesn't include null */
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index dab5409f2608..9ae7f9339513 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1546,6 +1546,8 @@ static int __init apparmor_init(void)
return 0;
}
 
+   aa_secids_init();
+
error = aa_setup_dfa_engine();
if (error) {
AA_ERROR("Unable to setup dfa engine\n");
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index 3ad94b2ffbb2..ad6221e1f25f 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -33,6 +33,8 @@
  * properly updating/freeing them
  */
 
+#define AA_FIRST_SECID 1
+
 static DEFINE_IDR(aa_secids);
 static DEFINE_SPINLOCK(secid_lock);
 
@@ -120,20 +122,30 @@ void apparmor_release_secctx(char *secdata, u32 seclen)
 
 /**
  * aa_alloc_secid - allocate a new secid for a profile
+ * @label: the label to allocate a secid for
+ * @gfp: memory allocation flags
+ *
+ * Returns: 0 with @label->secid initialized
+ *  <0 returns error with @label->secid set to AA_SECID_INVALID
  */
-u32 aa_alloc_secid(struct aa_label *label, gfp_t gfp)
+int aa_alloc_secid(struct aa_label *label, gfp_t gfp)
 {
unsigned long flags;
-   u32 secid;
+   int ret;
 
idr_preload(gfp);
spin_lock_irqsave(&secid_lock, flags);
-   secid = idr_alloc(&aa_secids, label, 0, 0, GFP_ATOMIC);
-   /* XXX: Can return -ENOMEM */
+   ret = idr_alloc(&aa_secids, label, 1, 

Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")

2018-06-05 Thread Linus Torvalds
On Tue, Jun 5, 2018 at 8:05 AM Ingo Molnar  wrote:
>
> Ok, fair point and agreed - if Alexey sends some measurements to back the 
> change
> I'll keep this, otherwise queue up a revert.

I don't think it needs to be reverted, it's not like it's likely to
hurt on any modern CPU's. The issues I talked about are fairly
historical - barely even 64-bit cpus - and I'm not sure an extra uop
to carry a constant around even matters in that code sequence.

It was more a generic issue - any micro-optimization should be based
on numbers (and there should be some numbers in the commit message),
not on "this should be faster". Because while intuitively immediates
_should_ be faster than registers, that's simply not always "obviously
true". It _may_ be true. But numbers talk.

  Linus


Re: [PATCH v5 2/4] mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-05 Thread Rob Herring
On Mon, Jun 04, 2018 at 04:18:30PM +0300, Matti Vaittinen wrote:
> Document devicetree bindings for ROHM BD71837 PMIC MFD.
> 
> Signed-off-by: Matti Vaittinen 
> ---
>  .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt  | 76 
> ++
>  1 file changed, 76 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt

I've replied on the prior version discussion. Please don't send new 
versions if the last one is still under discussion.

Rob


Re: [PATCH] arm64: cpu_errata: include required headers

2018-06-05 Thread Catalin Marinas
On Tue, Jun 05, 2018 at 01:50:07PM +0200, Arnd Bergmann wrote:
> Without including psci.h and arm-smccc.h, we now get a build failure in
> some configurations:
> 
> arch/arm64/kernel/cpu_errata.c: In function 'arm64_update_smccc_conduit':
> arch/arm64/kernel/cpu_errata.c:278:10: error: 'psci_ops' undeclared (first 
> use in this function); did you mean 'sysfs_ops'?
> 
> arch/arm64/kernel/cpu_errata.c: In function 'arm64_set_ssbd_mitigation':
> arch/arm64/kernel/cpu_errata.c:311:3: error: implicit declaration of function 
> 'arm_smccc_1_1_hvc' [-Werror=implicit-function-declaration]
>arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL);
> 
> Signed-off-by: Arnd Bergmann 
> ---
> This showed up only recently, but I have not bisected what caused it.

I haven't hit it yet, not sure why. I'm queuing the patch anyway.
Thanks.

-- 
Catalin


Re: [PATCH V2 2/2] arm: dts: sunxi: Add missing cooling device properties for CPUs

2018-06-05 Thread Chen-Yu Tsai
On Tue, Jun 5, 2018 at 3:11 PM, Maxime Ripard  wrote:
> On Tue, Jun 05, 2018 at 10:17:49AM +0530, Viresh Kumar wrote:
>> The cooling device properties, like "#cooling-cells" and
>> "dynamic-power-coefficient", should either be present for all the CPUs
>> of a cluster or none. If these are present only for a subset of CPUs of
>> a cluster then things will start falling apart as soon as the CPUs are
>> brought online in a different order. For example, this will happen
>> because the operating system looks for such properties in the CPU node
>> it is trying to bring up, so that it can register a cooling device.
>>
>> Add such missing properties.
>>
>> Fix other missing properties (clocks, OPP, clock latency) as well to
>> make it all work.
>>
>> Signed-off-by: Viresh Kumar 
>
> Applied both, thanks!

Please fix the "ARM" prefix when applying. :)

ChenYu


Re: [PATCH v2 2/3] ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled

2018-06-05 Thread Rafael J. Wysocki
On Tue, Jun 5, 2018 at 5:33 PM, Sudeep Holla  wrote:
>
>
> On 05/06/18 16:00, Rafael J. Wysocki wrote:
>> On Tue, Jun 5, 2018 at 4:35 PM, Sudeep Holla  wrote:
>>> Though CONFIG_ACPI_PPTT is selected by platforms and not user visible,
>>> it may be useful to support the build with CONFIG_ACPI_PPTT disabled.
>>>
>>> This patch adds the missing dummy/boilerplate implementation to fix
>>> the build.
>>>
>>> Cc: "Rafael J. Wysocki" 
>>> Signed-off-by: Sudeep Holla 
>>> ---
>>>  include/linux/acpi.h  | 15 +++
>>>  include/linux/cacheinfo.h |  2 +-
>>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>>
>>> Hi Rafael,
>>>
>>> If you are fine with this, can you provide Ack, so that we route this
>>> through ARM64 tree where most of the ACPI PPTT support is present.
>>>
>>> Regards,
>>> Sudeep
>>>
>>> v1->v2:
>>> - removed duplicate definition for acpi_find_last_cache_level
>>>
>>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>>> index 8f2cdb0eca71..4b35a66383f9 100644
>>> --- a/include/linux/acpi.h
>>> +++ b/include/linux/acpi.h
>>> @@ -1299,8 +1299,23 @@ static inline int 
>>> lpit_read_residency_count_address(u64 *address)
>>>  }
>>>  #endif
>>>
>>> +#ifdef CONFIG_ACPI_PPTT
>>>  int find_acpi_cpu_topology(unsigned int cpu, int level);
>>>  int find_acpi_cpu_topology_package(unsigned int cpu);
>>>  int find_acpi_cpu_cache_topology(unsigned int cpu, int level);
>>> +#else
>>> +static inline int find_acpi_cpu_topology(unsigned int cpu, int level)
>>> +{
>>> +   return -EINVAL;
>>
>> Why -EINVAL?
>>
>
> I am not sure either. I used to return -ENOTSUPP, but IIRC someone
> suggested to use it only for syscalls. Also I just based it on other
> existing functions in acpi.h
>
> I am open for any alternatives if you think that is better here.

It would be good to make it consistent with the error codes returned
by the functions when they are present.

Anyway, it's fine by me if that's consistent with the other acpi.h stubs.


Re: [PATCH 2/7] atomics/treewide: rework ordering barriers

2018-06-05 Thread Mark Rutland
On Tue, Jun 05, 2018 at 02:16:23PM +0200, Peter Zijlstra wrote:
> On Tue, May 29, 2018 at 07:07:41PM +0100, Mark Rutland wrote:
> > +#ifndef __atomic_mb__after_acquire
> > +#define __atomic_mb__after_acquire smp_mb__after_atomic
> > +#endif
> > +
> > +#ifndef __atomic_mb__before_release
> > +#define __atomic_mb__before_release smp_mb__before_atomic
> > +#endif
> > +
> > +#ifndef __atomic_mb__before_fence
> > +#define __atomic_mb__before_fence  smp_mb__before_atomic
> > +#endif
> > +
> > +#ifndef __atomic_mb__after_fence
> > +#define __atomic_mb__after_fence   smp_mb__after_atomic
> > +#endif
> 
> I really _really_ dislike those names.. because they imply providing an
> MB before/after something else.
> 
> But that is exactly what they do not.
> 
> How about:
> 
>   __atomic_acquire_fence
>   __atomic_release_fence
>
> for the acquire/release things,

Sure, those sound fine to me.

> and simply using smp_mb__{before,after}_atomic for the full fence, its
> exactly what they were made for.

The snag is arch/alpha, where we have:

/*
 * To ensure dependency ordering is preserved for the _relaxed and
 * _release atomics, an smp_read_barrier_depends() is unconditionally
 * inserted into the _relaxed variants, which are used to build the
 * barriered versions. To avoid redundant back-to-back fences, we can
 * define the _acquire and _fence versions explicitly.
 */
#define __atomic_op_acquire(op, args...)   op##_relaxed(args)
#define __atomic_op_fence   __atomic_op_release

... where alpha's smp_read_barrier_depends() is the same as
smp_mb__after_atomic().

Since alpha's non-value-returning atomics do not have the
smp_read_barrier_depends(), I can't just define an empty
smp_mb__after_atomic().

Thoughts?

Thanks,
Mark.


[PATCH 3/3] arm64: disable ACPI PPTT support temporarily

2018-06-05 Thread Sudeep Holla
Currently, ARM64 doesn't support updating the CPU topology masks on
CPU hotplug operations. ACPI PPTT support relies on that missing feature,
which is technically not incorrect. Instead of reverting all the PPTT
support, let's keep it simple and disable ACPI PPTT support on ARM64
for the time being until the topology updates are added for CPU hotplug
operations.

Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Sudeep Holla 
---
 arch/arm64/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9fd4a8ccce07..98a5c78a80f9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,7 +7,6 @@ config ARM64
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
select ACPI_MCFG if ACPI
select ACPI_SPCR_TABLE if ACPI
-   select ACPI_PPTT if ACPI
select ARCH_CLOCKSOURCE_DATA
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
-- 
2.7.4



Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Vincent Guittot
On 5 June 2018 at 15:52, Quentin Perret  wrote:
> On Tuesday 05 Jun 2018 at 15:18:38 (+0200), Vincent Guittot wrote:
>> On 5 June 2018 at 15:12, Quentin Perret  wrote:
>> > On Tuesday 05 Jun 2018 at 13:59:56 (+0200), Vincent Guittot wrote:
>> >> On 5 June 2018 at 12:57, Quentin Perret  wrote:
>> >> > Hi Vincent,
>> >> >
>> >> > On Tuesday 05 Jun 2018 at 10:36:26 (+0200), Vincent Guittot wrote:
>> >> >> Hi Quentin,
>> >> >>
>> >> >> On 25 May 2018 at 15:12, Vincent Guittot  
>> >> >> wrote:
>> >> >> > This patchset initially tracked only the utilization of RT rq. During
>> >> >> > OSPM summit, it has been discussed the opportunity to extend it in 
>> >> >> > order
>> >> >> > to get an estimate of the utilization of the CPU.
>> >> >> >
>> >> >> > - Patches 1-3 correspond to the content of patchset v4 and add 
>> >> >> > utilization
>> >> >> >   tracking for rt_rq.
>> >> >> >
>> >> >> > When both cfs and rt tasks compete to run on a CPU, we can see some 
>> >> >> > frequency
>> >> >> > drops with schedutil governor. In such case, the cfs_rq's 
>> >> >> > utilization doesn't
>> >> >> > reflect anymore the utilization of cfs tasks but only the remaining 
>> >> >> > part that
>> >> >> > is not used by rt tasks. We should monitor the stolen utilization 
>> >> >> > and take
>> >> >> > it into account when selecting OPP. This patchset doesn't change the 
>> >> >> > OPP
>> >> >> > selection policy for RT tasks but only for CFS tasks.
>> >> >> >
>> >> >> > A rt-app use case which creates an always-running cfs thread and an
>> >> >> > rt thread that wakes up periodically, with both threads pinned on the
>> >> >> > same CPU, shows lots of frequency switches of the CPU whereas the CPU
>> >> >> > never goes idle during the test. I can share the json file that I
>> >> >> > used for the test if someone is interested in it.
>> >> >> >
>> >> >> > For a 15-second-long test on a hikey 6220 (octa-core Cortex-A53
>> >> >> > platform), the cpufreq statistics output (stats are reset just
>> >> >> > before the test):
>> >> >> > $ cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
>> >> >> > without patchset : 1230
>> >> >> > with patchset : 14
>> >> >>
>> >> >> I have attached the rt-app json file that I use for this test
>> >> >
>> >> > Thank you very much ! I did a quick test with a much simpler fix to this
>> >> > RT-steals-time-from-CFS issue using just the existing 
>> >> > scale_rt_capacity().
>> >> > I get the following results on Hikey960:
>> >> >
>> >> > Without patch:
>> >> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
>> >> >12
>> >> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
>> >> >640
>> >> > With patch
>> >> >cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
>> >> >8
>> >> >cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
>> >> >12
>> >> >
>> >> > Yes the rt_avg stuff is out of sync with the PELT signal, but do you 
>> >> > think
>> >> > this is an actual issue for realistic use-cases ?
>> >>
>> >> yes I think that it's worth syncing and consolidating things on the
>> >> same metric. The result will be saner and more robust as we will have
>> >> the same behavior
>> >
>> > TBH I'm not disagreeing with that, the PELT-everywhere approach feels
>> > cleaner in a way, but do you have a use-case in mind where this will
>> > definitely help ?
>> >
>> > I mean, yes the rt_avg is a slow response to the RT pressure, but is
>> > this always a problem ? Ramping down slower might actually help in some
>> > cases no ?
>>
>> I would say no, because when one decreases the other one will not
>> increase at the same pace, and we will get some wrong behavior or
>> decisions
>
> I think I get your point. Yes, sometimes, the slow-moving rt_avg can be
> off a little bit (which can be good or bad, depending on the case) if your
> RT task runs a lot with very changing behaviour. And again, I'm not
> fundamentally against the idea of having extra complexity for RT/IRQ PELT
> signals _if_ we have a use-case. But is there a real use-case where we
> really need all of that ? That's a true question, I honestly don't have
> the answer :-)

The iperf test result is another example of the benefit

>
>>
>> >
>> >>
>> >> >
>> >> > What about the diff below (just a quick hack to show the idea) applied
>> >> > on tip/sched/core ?
>> >> >
>> >> > ---8<---
>> >> > diff --git a/kernel/sched/cpufreq_schedutil.c 
>> >> > b/kernel/sched/cpufreq_schedutil.c
>> >> > index a8ba6d1f262a..23a4fb1c2c25 100644
>> >> > --- a/kernel/sched/cpufreq_schedutil.c
>> >> > +++ b/kernel/sched/cpufreq_schedutil.c
>> >> > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu 
>> >> > *sg_cpu)
>> >> > sg_cpu->util_dl  = cpu_util_dl(rq);
>> >> >  }
>> >> >
>> >> > +unsigned long scale_rt_capacity(int cpu);
>> >> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>> >> >  {
>> >> >  

[PATCH 1/3] Revert "arm64: topology: divorce MC scheduling domain from core_siblings"

2018-06-05 Thread Sudeep Holla
This reverts commit 37c3ec2d810f87eac73822f76b30391a83bded19.

Currently on ARM64 platforms, we don't update the CPU topology masks
on each hotplug operation. However, the updates to cpu_coregroup_mask
done as part of ACPI PPTT support (in particular the commit being
reverted) make use of cpumask_of_node, which returns the cpu_online_mask
instead of core_sibling; since the core_sibling masks are not updated on
CPU hotplug operations, the comparison to find NUMA-in-package or LLC
siblings fails.

The original commit is technically correct, but since it depends on a
not-yet-supported feature, let's revert it for now. We can put it back
once support for updating the CPU topology masks on hotplug is merged.

Reported-by: Geert Uytterhoeven 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Sudeep Holla 
---
 arch/arm64/include/asm/topology.h |  2 --
 arch/arm64/kernel/topology.c  | 36 +---
 2 files changed, 1 insertion(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h 
b/arch/arm64/include/asm/topology.h
index df48212f767b..6b10459e6905 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -8,10 +8,8 @@ struct cpu_topology {
int thread_id;
int core_id;
int package_id;
-   int llc_id;
cpumask_t thread_sibling;
cpumask_t core_sibling;
-   cpumask_t llc_siblings;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 7415c166281f..047d98e68502 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -13,7 +13,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -215,19 +214,7 @@ EXPORT_SYMBOL_GPL(cpu_topology);
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
-   const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
-
-   /* Find the smaller of NUMA, core or LLC siblings */
-   if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
-   /* not numa in package, lets use the package siblings */
-   core_mask = &cpu_topology[cpu].core_sibling;
-   }
-   if (cpu_topology[cpu].llc_id != -1) {
-   if (cpumask_subset(&cpu_topology[cpu].llc_siblings, core_mask))
-   core_mask = &cpu_topology[cpu].llc_siblings;
-   }
-
-   return core_mask;
+   return &cpu_topology[cpu].core_sibling;
 }
 
 static void update_siblings_masks(unsigned int cpuid)
@@ -239,9 +226,6 @@ static void update_siblings_masks(unsigned int cpuid)
for_each_possible_cpu(cpu) {
cpu_topo = &cpu_topology[cpu];
 
-   if (cpuid_topo->llc_id == cpu_topo->llc_id)
-   cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
-
if (cpuid_topo->package_id != cpu_topo->package_id)
continue;
 
@@ -307,10 +291,6 @@ static void __init reset_cpu_topology(void)
cpu_topo->core_id = 0;
cpu_topo->package_id = -1;
 
-   cpu_topo->llc_id = -1;
-   cpumask_clear(&cpu_topo->llc_siblings);
-   cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
-
cpumask_clear(&cpu_topo->core_sibling);
cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
cpumask_clear(&cpu_topo->thread_sibling);
@@ -331,8 +311,6 @@ static int __init parse_acpi_topology(void)
is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
 
for_each_possible_cpu(cpu) {
-   int i, cache_id;
-
topology_id = find_acpi_cpu_topology(cpu, 0);
if (topology_id < 0)
return topology_id;
@@ -347,18 +325,6 @@ static int __init parse_acpi_topology(void)
}
topology_id = find_acpi_cpu_topology_package(cpu);
cpu_topology[cpu].package_id = topology_id;
-
-   i = acpi_find_last_cache_level(cpu);
-
-   if (i > 0) {
-   /*
-* this is the only part of cpu_topology that has
-* a direct relationship with the cache topology
-*/
-   cache_id = find_acpi_cpu_cache_topology(cpu, i);
-   if (cache_id > 0)
-   cpu_topology[cpu].llc_id = cache_id;
-   }
}
 
return 0;
-- 
2.7.4



Re: [PATCH 2/7] atomics/treewide: rework ordering barriers

2018-06-05 Thread Peter Zijlstra
On Tue, Jun 05, 2018 at 02:28:02PM +0100, Mark Rutland wrote:
> On Tue, Jun 05, 2018 at 02:16:23PM +0200, Peter Zijlstra wrote:
> > and simply using smp_mb__{before,after}_atomic for the full fence, its
> > exactly what they were made for.
> 
> The snag is arch/alpha, where we have:
> 
> /*
>  * To ensure dependency ordering is preserved for the _relaxed and
>  * _release atomics, an smp_read_barrier_depends() is unconditionally
>  * inserted into the _relaxed variants, which are used to build the
>  * barriered versions. To avoid redundant back-to-back fences, we can
>  * define the _acquire and _fence versions explicitly.
>  */
> #define __atomic_op_acquire(op, args...)   op##_relaxed(args)
> #define __atomic_op_fence   __atomic_op_release
> 
> ... where alpha's smp_read_barrier_depends() is the same as
> smp_mb__after_atomic().
> 
> Since alpha's non-value-returning atomics do not have the
> smp_read_barrier_depends(), I can't just define an empty
> smp_mb__after_atomic().
> 
> Thoughts?

Bah, of course there had to be a misfit.

Something along these lines then:

 __atomic_acquire_fence
 __atomic_release_fence
 __atomic_mb_before
 __atomic_mb_after

?


Re: [PATCH] module: exclude SHN_UNDEF symbols from kallsyms api

2018-06-05 Thread Josh Poimboeuf
On Tue, Jun 05, 2018 at 10:42:23AM +0200, Jessica Yu wrote:
> Livepatch modules are special in that we preserve their entire symbol
> tables in order to be able to apply relocations after module load. The
> unwanted side effect of this is that undefined (SHN_UNDEF) symbols of
> livepatch modules are accessible via the kallsyms api and this can
> confuse symbol resolution in livepatch (klp_find_object_symbol()) and
> cause subtle bugs in livepatch.
> 
> Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols
> are usually not available for normal modules anyway as we cut down their
> symbol tables to just the core (non-undefined) symbols, so this should
> really just affect livepatch modules. Note that this patch doesn't
> affect the display of undefined symbols in /proc/kallsyms.
> 
> Reported-by: Josh Poimboeuf 
> Tested-by: Josh Poimboeuf 
> Signed-off-by: Jessica Yu 

Reviewed-by: Josh Poimboeuf 

-- 
Josh


Re: [PATCH 2/7] atomics/treewide: rework ordering barriers

2018-06-05 Thread Peter Zijlstra
On Tue, Jun 05, 2018 at 03:02:56PM +0100, Mark Rutland wrote:
> Locally I've made this:
> 
> __atomic_acquire_fence()
> __atomic_release_fence()
> __atomic_pre_fence()
> __atomic_post_fence()
> 
> ... but I'm more than happy to rename however you prefer.

No that's fine. As long as we're rid of the whole mb_before_release and
friends.

Thanks!


Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-05 Thread Peter Zijlstra
On Tue, Jun 05, 2018 at 02:58:23PM +0100, Mark Rutland wrote:

> Sure, it all works, it's just less than optimal as above, and also means
> that we have to duplicate the ifdeffery for optional atomics -- once in
> the instrumented atomics, then in the "real" atomics.
> 
> Whereas if we filled in the raw atomics atop of the arch atomics,
> everything above that can assume the whole API is present, no ifdeffery
> required.

Aah, I see your point now. I don't think performance is a particular
concern when you enable K*SAN, but getting rid of a fair bunch of
ifdeffery is always nice.


Re: [RFC PATCH 5/6] arm64: dts: ti: Add Support for AM654 SoC

2018-06-05 Thread Tony Lindgren
* Rob Herring  [180605 14:08]:
> On Tue, Jun 5, 2018 at 1:05 AM, Nishanth Menon  wrote:
> > +   soc0: soc0 {
> > +   compatible = "simple-bus";
> > +   #address-cells = <2>;
> > +   #size-cells = <2>;
> > +   ranges;
> 
> Really need 64-bit addresses and sizes? Use ranges to limit the
> address space if possible.

And in addition to using ranges, please set up separate bus instances
for the interconnects. This will then allow you to probe WKUP or
similar instance first and the other bus instances after. And that
pretty much allows you to get rid of the annoying -EPROBE_DEFER
ping pong and allows making clocks proper device drivers ;)

Regards,

Tony


[PATCH 1/2] iio: dac: Add AD5758 support

2018-06-05 Thread Stefan Popa
The AD5758 is a single channel DAC with 16-bit precision which uses the SPI
interface that operates at clock rates up to 50MHz.

The output can be configured as voltage or current and is available on a
single terminal.

Datasheet:
http://www.analog.com/media/en/technical-documentation/data-sheets/ad5758.pdf

Signed-off-by: Stefan Popa 
---
 MAINTAINERS  |   7 +
 drivers/iio/dac/Kconfig  |  10 +
 drivers/iio/dac/Makefile |   1 +
 drivers/iio/dac/ad5758.c | 826 +++
 4 files changed, 844 insertions(+)
 create mode 100644 drivers/iio/dac/ad5758.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b65225..1993779 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -802,6 +802,13 @@ S: Supported
 F: drivers/iio/dac/ad5686*
 F: drivers/iio/dac/ad5696*
 
+ANALOG DEVICES INC AD5758 DRIVER
+M: Stefan Popa 
+L: linux-...@vger.kernel.org
+W: http://ez.analog.com/community/linux-device-drivers
+S: Supported
+F: drivers/iio/dac/ad5758.c
+
 ANALOG DEVICES INC AD9389B DRIVER
 M: Hans Verkuil 
 L: linux-me...@vger.kernel.org
diff --git a/drivers/iio/dac/Kconfig b/drivers/iio/dac/Kconfig
index 06e90de..80beb64 100644
--- a/drivers/iio/dac/Kconfig
+++ b/drivers/iio/dac/Kconfig
@@ -167,6 +167,16 @@ config AD5755
  To compile this driver as a module, choose M here: the
  module will be called ad5755.
 
+config AD5758
+   tristate "Analog Devices AD5758 DAC driver"
+   depends on SPI_MASTER
+   help
+ Say yes here to build support for Analog Devices AD5758 single channel
+ Digital to Analog Converter.
+
+ To compile this driver as a module, choose M here: the
+ module will be called ad5758.
+
 config AD5761
tristate "Analog Devices AD5761/61R/21/21R DAC driver"
depends on SPI_MASTER
diff --git a/drivers/iio/dac/Makefile b/drivers/iio/dac/Makefile
index 57aa230..e859f2d 100644
--- a/drivers/iio/dac/Makefile
+++ b/drivers/iio/dac/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_AD5592R_BASE) += ad5592r-base.o
 obj-$(CONFIG_AD5592R) += ad5592r.o
 obj-$(CONFIG_AD5593R) += ad5593r.o
 obj-$(CONFIG_AD5755) += ad5755.o
+obj-$(CONFIG_AD5758) += ad5758.o
 obj-$(CONFIG_AD5761) += ad5761.o
 obj-$(CONFIG_AD5764) += ad5764.o
 obj-$(CONFIG_AD5791) += ad5791.o
diff --git a/drivers/iio/dac/ad5758.c b/drivers/iio/dac/ad5758.c
new file mode 100644
index 000..0a26b9d
--- /dev/null
+++ b/drivers/iio/dac/ad5758.c
@@ -0,0 +1,826 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * AD5758 Digital to analog converters driver
+ *
+ * Copyright 2018 Analog Devices Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+/* AD5758 registers definition */
+#define AD5758_NOP 0x00
+#define AD5758_DAC_INPUT   0x01
+#define AD5758_DAC_OUTPUT  0x02
+#define AD5758_CLEAR_CODE  0x03
+#define AD5758_USER_GAIN   0x04
+#define AD5758_USER_OFFSET 0x05
+#define AD5758_DAC_CONFIG  0x06
+#define AD5758_SW_LDAC 0x07
+#define AD5758_KEY 0x08
+#define AD5758_GP_CONFIG1  0x09
+#define AD5758_GP_CONFIG2  0x0A
+#define AD5758_DCDC_CONFIG1 0x0B
+#define AD5758_DCDC_CONFIG2 0x0C
+#define AD5758_WDT_CONFIG  0x0F
+#define AD5758_DIGITAL_DIAG_CONFIG 0x10
+#define AD5758_ADC_CONFIG  0x11
+#define AD5758_FAULT_PIN_CONFIG 0x12
+#define AD5758_TWO_STAGE_READBACK_SELECT   0x13
+#define AD5758_DIGITAL_DIAG_RESULTS 0x14
+#define AD5758_ANALOG_DIAG_RESULTS 0x15
+#define AD5758_STATUS  0x16
+#define AD5758_CHIP_ID 0x17
+#define AD5758_FREQ_MONITOR 0x18
+#define AD5758_DEVICE_ID_0 0x19
+#define AD5758_DEVICE_ID_1 0x1A
+#define AD5758_DEVICE_ID_2 0x1B
+#define AD5758_DEVICE_ID_3 0x1C
+
+/* AD5758_DAC_CONFIG */
+#define AD5758_DAC_CONFIG_RANGE_MSK GENMASK(3, 0)
+#define AD5758_DAC_CONFIG_RANGE_MODE(x)(((x) & 0xF) << 0)
+#define AD5758_DAC_CONFIG_OVRNG_EN_MSK BIT(4)
+#define AD5758_DAC_CONFIG_OVRNG_EN_MODE(x) (((x) & 0x1) << 4)
+#define AD5758_DAC_CONFIG_INT_EN_MSK   BIT(5)
+#define AD5758_DAC_CONFIG_INT_EN_MODE(x)   (((x) & 0x1) << 5)
+#define AD5758_DAC_CONFIG_OUT_EN_MSK   BIT(6)
+#define AD5758_DAC_CONFIG_OUT_EN_MODE(x)  

Re: [PATCH 0/7] atomics: generate atomic headers

2018-06-05 Thread Mark Rutland
On Tue, Jun 05, 2018 at 04:14:16PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 05, 2018 at 02:58:23PM +0100, Mark Rutland wrote:
> 
> > Sure, it all works, it's just less than optimal as above, and also means
> > that we have to duplicate the ifdeffery for optional atomics -- once in
> > the instrumented atomics, then in the "real" atomics.
> > 
> > Whereas if we filled in the raw atomics atop of the arch atomics,
> > everything above that can assume the whole API is present, no ifdeffery
> > required.
> 
> Aah, I see your point now. I don't think performance is a particular
> concern when you enable K*SAN, but getting rid of a fair bunch of
> ifdeffery is always nice.

I agree that performance isn't a concern there when debugging, but I
would like to keep the overhead down when fuzzing.

Regardless, we'd have to move arches over to arch_atomic_* first, and
once that's done the raw_atomic_* stuff is fairly easy to implement by
reworking the scripts.

Thanks,
Mark.


Re: dm: Use kzalloc for all structs with embedded biosets/mempools

2018-06-05 Thread Jens Axboe
On 6/5/18 8:45 AM, Mike Snitzer wrote:
> On Tue, Jun 05 2018 at 10:22P -0400,
> Jens Axboe  wrote:
> 
>> On 6/5/18 3:26 AM, Kent Overstreet wrote:
>>> mempool_init()/bioset_init() require that the mempools/biosets be zeroed
>>> first; they probably should not _require_ this, but not allocating those
>>> structs with kzalloc is a fairly nonsensical thing to do (calling
>>> mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
>>> and safe, but only works if said memory was zeroed.)
>>>
>>> Signed-off-by: Kent Overstreet 
>>> ---
>>>
>>> Linus,
>>>
>>> I fucked up majorly on the bioset/mempool conversion - I forgot to check 
>>> that
>>> everything the biosets/mempools were being embedded in was actually being 
>>> zeroed on
>>> allocation. Device mapper currently explodes, you'll probably want to apply 
>>> this
>>> patch post haste.
>>>
>>> I have now done that auditing, for every single conversion - this patch 
>>> fixes
>>> everything I found. There do not seem to be any incorrect ones outside of 
>>> device
>>> mapper...
>>>
>>> We'll probably want a second patch that either a) changes
>>> bioset_init()/mempool_init() to zero the passed in bioset/mempool first, or 
>>> b)
>>> my preference, WARN() or BUG() if they're passed memory that isn't zeroed.
>>
>> Odd, haven't seen a crash, but probably requires kasan or poisoning to
>> trigger anything? Mike's tree also had the changes, since they were based
>> on the block tree.
>>
>> I can queue this up and ship it later today. Mike, you want to review
>> this one?
> 
> Yes, looks good.
> 
> From the start of revisiting these changes last week, Kent and I
> discussed whether it was safe to call mempool_exit() even if
> mempool_init() failed or was never called.  He advised that it was so
> long as the containing structure was zeroed.  But I forgot to audit that 
> aspect.  So this was an oversight by both of us.
> 
> DM core uses kvzalloc_node for struct mapped_device and cache, crypt,
> integrity, verity-fec and zoned targets are already using kzalloc as
> needed.
> 
> Acked-by: Mike Snitzer 

Thanks Mike, I'll push this out this morning.

-- 
Jens Axboe



Re: [PATCH v2 01/10] vfio: ccw: Moving state change out of IRQ context

2018-06-05 Thread Pierre Morel

On 04/06/2018 15:52, Cornelia Huck wrote:

On Fri, 25 May 2018 12:21:09 +0200
Pierre Morel  wrote:


Let's move the state change from the IRQ routine to the
workqueue callback.

Signed-off-by: Pierre Morel 
---
  drivers/s390/cio/vfio_ccw_drv.c | 20 +++-
  drivers/s390/cio/vfio_ccw_fsm.c | 14 --
  2 files changed, 15 insertions(+), 19 deletions(-)

This causes a change in behaviour for devices in the notoper state.

Now:
- vfio_ccw_sch_irq is called


This should not be done if the subchannel is not operational.


- via the state machine, disabling the subchannel is (re-)triggered


I removed the fsm_disabled_irq() callback from VFIO_CCW_STATE_NOT_OPER
because the subchannel is not even initialized at that moment.
We have no reference to the subchannel.

In the previous driver NOT_OPER and STANDBY were quite the same.
Now NOT_OPER means "we cannot operate on this subchannel"
because we do not have it in a correct state (no ISC, no mediated device,
the probe is not finished).

Now STANDBY means we have the device ready but it is disabled.
In this case the software infrastructure is ready, and if an interrupt comes
(which should not happen) we will disable the subchannel again.



With your patch:
- the work function is queued in any case; eventually, it will change
   the device's state to idle (unless we don't have an mdev at that
   point in time)
- completion is signaled

I'm not sure that's what we want.



Yes, it is queued in any case, but the IRQ is really handled only if the
subchannel is in the right state (STANDBY, BUSY, IDLE and QUIESCING).

In the NOT_OPER state we have neither the mdev nor the driver initialized.


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



Re: [PATCH 18/19] serdev: ttydev: Serdev driver that creates a standard TTY port

2018-06-05 Thread Andy Shevchenko
On Tue, May 29, 2018 at 4:10 PM, Ricardo Ribalda Delgado
 wrote:
> Standard TTY port that can be loaded/unloaded via serdev sysfs. This
> serdev driver can only be used by serdev controllers that are compatible
> with ttyport.

> +config SERIAL_DEV_CTRL_TTYDEV
> +   tristate "TTY port dynamically loaded by the Serial Device Bus"
> +   help
> + Say Y here if you want to create a bridge driver between the Serial
> + device bus and the TTY chardevice. This driver can be dynamically
> + loaded/unloaded by the Serial Device Bus.
> +
> + If unsure, say Y.
> +   depends on SERIAL_DEV_CTRL_TTYPORT

> +   default m

Hmm... Can't we survive w/o this by default?

> +static int __init ttydev_serdev_init(void)
> +{
> +   return serdev_device_driver_register(&ttydev_serdev_driver);
> +}
> +module_init(ttydev_serdev_init);
> +
> +static void __exit ttydev_serdev_exit(void)
> +{
> +   return serdev_device_driver_unregister(&ttydev_serdev_driver);
> +}
> +module_exit(ttydev_serdev_exit);

Isn't the above just a macro in serdev.h?
I.e. module_serdev_device_driver().

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v3 6/6] regulator: bd71837: BD71837 PMIC regulator driver

2018-06-05 Thread Andy Shevchenko
On Tue, May 29, 2018 at 1:02 PM, Matti Vaittinen
 wrote:
> Support for controlling the 8 bucks and 7 LDOs the PMIC contains.

> +#include 

> +#include 
> +#include 

One of these is redundant.

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Can you keep the list ordered?

> +   dev_dbg(&(pmic->pdev->dev), "Buck[%d] Set Ramp = %d\n", id + 1,
> +   ramp_delay);

Redundant parens.

> +static int bd71837_regulator_set_regmap(struct regulator_dev *rdev, int set)
> +{

> +   int ret = -EINVAL;

Redundant assignment.

> +   if (!set)
> +   ret = regulator_disable_regmap(rdev);
> +   else
> +   ret = regulator_enable_regmap(rdev);

> +   return ret;
> +}

> +static int bd71837_probe(struct platform_device *pdev)
> +{

> +   pmic = devm_kzalloc(&pdev->dev, sizeof(struct bd71837_pmic),
> +   GFP_KERNEL);

sizeof(*pmic) and one line as result?

> +   if (!pmic)
> +   return -ENOMEM;

> +   if (!pmic->mfd) {
> +   dev_err(&pdev->dev, "No MFD driver data\n");

> +   err = -EINVAL;
> +   goto err;

Plain return?

> +   }

> +   dev_dbg(&pmic->pdev->dev, "%s: Unlocked lock register 0x%x\n",
> +   __func__, BD71837_REG_REGLOCK);

__func__ part is redundant.

> +   for (i = 0; i < ARRAY_SIZE(pmic_regulator_inits); i++) {

> +

Redundant blank line.

> +   rdev = devm_regulator_register(&pdev->dev, desc, &config);
> +   if (IS_ERR(rdev)) {
> +   dev_err(pmic->mfd->dev,
> +   "failed to register %s regulator\n",
> +   desc->name);

> +   err = PTR_ERR(rdev);
> +   goto err;

Plain return ...

> +err:
> +   return err;

Redundant.

> +}

> +static struct platform_driver bd71837_regulator = {
> +   .driver = {
> +  .name = "bd71837-pmic",

> +  .owner = THIS_MODULE,

This done by macro you are using below. Thus, redundant.

> +  },
> +   .probe = bd71837_probe,
> +};
> +
> +module_platform_driver(bd71837_regulator);

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH 1/3] Revert "arm64: topology: divorce MC scheduling domain from core_siblings"

2018-06-05 Thread Sudeep Holla



On 05/06/18 15:09, Geert Uytterhoeven wrote:
> Hi Sudeep,
> 
> On Tue, Jun 5, 2018 at 3:55 PM, Sudeep Holla  wrote:
>> This reverts commit 37c3ec2d810f87eac73822f76b30391a83bded19.
>>
>> Currently on ARM64 platforms, we don't update the CPU topology masks
>> on each hotplug operation. However, the updates to cpu_coregroup_mask
> 
> I would add
> 
> "leading to e.g. a system hang during system suspend."
> 
> to avoid people thinking this is purely a small bookkeeping issue without any
> repercussions.
> 

Sure, thanks. Sorry for missing that.

-- 
Regards,
Sudeep


Re: [PATCH v5 00/10] track CPU utilization

2018-06-05 Thread Quentin Perret
On Tuesday 05 Jun 2018 at 15:09:54 (+0100), Quentin Perret wrote:
> On Tuesday 05 Jun 2018 at 15:55:43 (+0200), Vincent Guittot wrote:
> > On 5 June 2018 at 15:52, Quentin Perret  wrote:
> > > On Tuesday 05 Jun 2018 at 15:18:38 (+0200), Vincent Guittot wrote:
> > >> On 5 June 2018 at 15:12, Quentin Perret  wrote:
> > >> I would say no, because when one decreases the other one will not
> > >> increase at the same pace, and we will get some wrong behavior or
> > >> decisions
> > >
> > > I think I get your point. Yes, sometimes, the slow-moving rt_avg can be
> > > off a little bit (which can be good or bad, depending on the case) if your
> > > RT task runs a lot with very changing behaviour. And again, I'm not
> > > fundamentally against the idea of having extra complexity for RT/IRQ PELT
> > > signals _if_ we have a use-case. But is there a real use-case where we
> > > really need all of that ? That's a true question, I honestly don't have
> > > the answer :-)
> > 
> > The iperf test result is another example of the benefit
> 
> The iperf test result? The sysbench test, you mean?

Ah sorry I missed that one from the cover letter ... I'll look into that
then :-)

Thanks,
Quentin


Re: [PATCH] dm: Use kzalloc for all structs with embedded biosets/mempools

2018-06-05 Thread Jens Axboe
On 6/5/18 3:26 AM, Kent Overstreet wrote:
> mempool_init()/bioset_init() require that the mempools/biosets be zeroed
> first; they probably should not _require_ this, but not allocating those
> structs with kzalloc is a fairly nonsensical thing to do (calling
> mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
> and safe, but only works if said memory was zeroed.)
> 
> Signed-off-by: Kent Overstreet 
> ---
> 
> Linus,
> 
> I fucked up majorly on the bioset/mempool conversion - I forgot to check that
> everything biosets/mempools were being embedded in was actually being zeroed 
> on
> allocation. Device mapper currently explodes, you'll probably want to apply 
> this
> patch post haste.
> 
> I have now done that auditing, for every single conversion - this patch fixes
> everything I found. There do not seem to be any incorrect ones outside of 
> device
> mapper...
> 
> We'll probably want a second patch that either a) changes
> bioset_init()/mempool_init() to zero the passed in bioset/mempool first, or b)
> my preference, WARN() or BUG() if they're passed memory that isn't zeroed.

Odd, haven't seen a crash, but probably requires kasan or poisoning to
trigger anything? Mike's tree also had the changes, since they were based
on the block tree.

I can queue this up and ship it later today. Mike, you want to review
this one?

-- 
Jens Axboe



Re: [PATCH v6 0/4] enable early printing of hashed pointers

2018-06-05 Thread Anna-Maria Gleixner
On Thu, 31 May 2018, Steven Rostedt wrote:

> On Mon, 28 May 2018 11:46:38 +1000
> "Tobin C. Harding"  wrote:
> 
> > Steve,
> 
> Hi Tobin,
> 
> Sorry for the late reply, I'm currently at a conference and have had
> little time to read email.
> 
> > 
> > Could you please take a quick squiz at the final 2 patches if you get a
> > chance.  I assumed we are in preemptible context during early_init based
> > on your code (and code comment) and called static_branch_disable()
> > directly if hw RNG returned keying material.  It's a pretty simple
> > change but I'd love to get someone else to check I've not noob'ed it.
> 
> I can take a look, and perhaps do some tests. But it was Anna-Maria
> that originally triggered the issue. She's on Cc, perhaps she can try
> this and see if it works.

I'll test it today - sorry for the delay.

Anna-Maria


[PATCH v2 3/3] arm64: disable ACPI PPTT support temporarily

2018-06-05 Thread Sudeep Holla
Currently, ARM64 doesn't support updating the CPU topology masks on
CPU hotplug operations. ACPI PPTT support relies on that missing
feature, which is technically not incorrect. Instead of reverting all
of the PPTT support, let's keep it simple and disable ACPI PPTT support
on ARM64 for the time being, until topology updates are added for CPU
hotplug operations.

Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Sudeep Holla 
---
 arch/arm64/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9fd4a8ccce07..98a5c78a80f9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,7 +7,6 @@ config ARM64
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
select ACPI_MCFG if ACPI
select ACPI_SPCR_TABLE if ACPI
-   select ACPI_PPTT if ACPI
select ARCH_CLOCKSOURCE_DATA
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
-- 
2.7.4



[PATCH v2 1/3] Revert "arm64: topology: divorce MC scheduling domain from core_siblings"

2018-06-05 Thread Sudeep Holla
This reverts commit 37c3ec2d810f87eac73822f76b30391a83bded19.

Currently on ARM64 platforms, we don't update the CPU topology masks
on each hotplug operation. However, the commit being reverted, added
as part of ACPI PPTT support, makes cpu_coregroup_mask use
cpumask_of_node, which returns the cpu_online_mask instead of
core_sibling, because the core_sibling masks are not updated on CPU
hotplug operations; the comparison to find NUMA-in-package or LLC
siblings then fails.

This often leads to a system hang or crash during CPU hotplug and
system suspend. It is mostly observed on HMP systems, where the CPUs
have different compute capacities and end up in different scheduler
domains. Since cpumask_of_node is returned instead of core_sibling,
the scheduler gets confused by incorrect cpumasks (e.g. one CPU in two
different sched domains at the same time) on CPU hotplug.

The original commit is technically correct, but since it depends on a
not-yet-supported feature, let's revert it for now. We can put it back
once support for updating the CPU topology masks on hotplug is merged.

Reported-by: Geert Uytterhoeven 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Sudeep Holla 
---
 arch/arm64/include/asm/topology.h |  2 --
 arch/arm64/kernel/topology.c  | 36 +---
 2 files changed, 1 insertion(+), 37 deletions(-)

v1->v2:
- Updated commit log to describe the observations made as a
  consequence of the issue, as suggested by Geert

diff --git a/arch/arm64/include/asm/topology.h 
b/arch/arm64/include/asm/topology.h
index df48212f767b..6b10459e6905 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -8,10 +8,8 @@ struct cpu_topology {
int thread_id;
int core_id;
int package_id;
-   int llc_id;
cpumask_t thread_sibling;
cpumask_t core_sibling;
-   cpumask_t llc_siblings;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 7415c166281f..047d98e68502 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -13,7 +13,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -215,19 +214,7 @@ EXPORT_SYMBOL_GPL(cpu_topology);
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
-   const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
-
-   /* Find the smaller of NUMA, core or LLC siblings */
-   if (cpumask_subset(_topology[cpu].core_sibling, core_mask)) {
-   /* not numa in package, lets use the package siblings */
-   core_mask = _topology[cpu].core_sibling;
-   }
-   if (cpu_topology[cpu].llc_id != -1) {
-   if (cpumask_subset(_topology[cpu].llc_siblings, core_mask))
-   core_mask = _topology[cpu].llc_siblings;
-   }
-
-   return core_mask;
+   return _topology[cpu].core_sibling;
 }
 
 static void update_siblings_masks(unsigned int cpuid)
@@ -239,9 +226,6 @@ static void update_siblings_masks(unsigned int cpuid)
for_each_possible_cpu(cpu) {
cpu_topo = _topology[cpu];
 
-   if (cpuid_topo->llc_id == cpu_topo->llc_id)
-   cpumask_set_cpu(cpu, _topo->llc_siblings);
-
if (cpuid_topo->package_id != cpu_topo->package_id)
continue;
 
@@ -307,10 +291,6 @@ static void __init reset_cpu_topology(void)
cpu_topo->core_id = 0;
cpu_topo->package_id = -1;
 
-   cpu_topo->llc_id = -1;
-   cpumask_clear(_topo->llc_siblings);
-   cpumask_set_cpu(cpu, _topo->llc_siblings);
-
cpumask_clear(_topo->core_sibling);
cpumask_set_cpu(cpu, _topo->core_sibling);
cpumask_clear(_topo->thread_sibling);
@@ -331,8 +311,6 @@ static int __init parse_acpi_topology(void)
is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
 
for_each_possible_cpu(cpu) {
-   int i, cache_id;
-
topology_id = find_acpi_cpu_topology(cpu, 0);
if (topology_id < 0)
return topology_id;
@@ -347,18 +325,6 @@ static int __init parse_acpi_topology(void)
}
topology_id = find_acpi_cpu_topology_package(cpu);
cpu_topology[cpu].package_id = topology_id;
-
-   i = acpi_find_last_cache_level(cpu);
-
-   if (i > 0) {
-   /*
-* this is the only part of cpu_topology that has
-* a direct relationship with the cache topology
-*/
-   cache_id = find_acpi_cpu_cache_topology(cpu, i);
-   if (cache_id > 0)
-   cpu_topology[cpu].llc_id = cache_id;
-   }
}
 
return 0;

[PATCH v2 2/3] ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled

2018-06-05 Thread Sudeep Holla
Though CONFIG_ACPI_PPTT is selected by platforms and is not user visible,
it may be useful to support building with CONFIG_ACPI_PPTT disabled.

This patch adds the missing dummy/boilerplate implementations to fix
such builds.

Cc: "Rafael J. Wysocki" 
Signed-off-by: Sudeep Holla 
---
 include/linux/acpi.h  | 15 +++
 include/linux/cacheinfo.h |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

Hi Rafael,

If you are fine with this, can you provide Ack, so that we route this
through ARM64 tree where most of the ACPI PPTT support is present.

Regards,
Sudeep

v1->v2:
- removed duplicate definition for acpi_find_last_cache_level

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 8f2cdb0eca71..4b35a66383f9 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1299,8 +1299,23 @@ static inline int lpit_read_residency_count_address(u64 
*address)
 }
 #endif
 
+#ifdef CONFIG_ACPI_PPTT
 int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_cache_topology(unsigned int cpu, int level);
+#else
+static inline int find_acpi_cpu_topology(unsigned int cpu, int level)
+{
+   return -EINVAL;
+}
+static inline int find_acpi_cpu_topology_package(unsigned int cpu)
+{
+   return -EINVAL;
+}
+static inline int find_acpi_cpu_cache_topology(unsigned int cpu, int level)
+{
+   return -EINVAL;
+}
+#endif
 
 #endif /*_LINUX_ACPI_H*/
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 89397e30e269..70e19bc6cc9f 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -98,7 +98,7 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
 int cache_setup_acpi(unsigned int cpu);
-#ifndef CONFIG_ACPI
+#ifndef CONFIG_ACPI_PPTT
 /*
  * acpi_find_last_cache_level is only called on ACPI enabled
  * platforms using the PPTT for topology. This means that if
-- 
2.7.4



Re: [PATCH 2/2] sched/fair: util_est: add running_sum tracking

2018-06-05 Thread Patrick Bellasi
Hi Vincent,

On 05-Jun 08:57, Vincent Guittot wrote:
> On 4 June 2018 at 18:06, Patrick Bellasi  wrote:

[...]

> > Let's improve the estimated utilization by adding a new "sort-of" PELT
> > signal, explicitly only for SE which has the following behavior:
> >  a) at each enqueue time of a task, its value is the (already decayed)
> > util_avg of the task being enqueued
> >  b) it's updated at each update_load_avg
> >  c) it can just increase, whenever the task is actually RUNNING on a
> > > CPU, while it's kept stable while the task is RUNNABLE but not
> > actively consuming CPU bandwidth
> >
> > Such a defined signal is exactly equivalent to the util_avg for a task
> > running alone on a CPU while, in case the task is preempted, it allows
> > to know at dequeue time how much would have been the task utilization if
> > it was running alone on that CPU.
> 
> I don't agree with this statement above
> Let say that you have 2 periodic tasks which wants to run 4ms in every
> period of 10ms and wakes up at the same time.
> One task will execute 1st and the other will wait but at the end they
> have the same period and running time and as a result the same
> util_avg which is exactly what they need.

In your example above you say that both tasks will end up with a 40%
util_avg, and that's exactly also the value reported by the new
"running_avg" metric.

Both tasks, if they were running alone, would have generated a
40% utilization, and above I'm saying:

 it allows to know at dequeue time
   how much would have been the task utilization
  if it was running alone on that CPU

I don't see where this is incorrect; maybe I should explain it better?

> > This new signal is named "running_avg", since it tracks the actual
> > RUNNING time of a task by ignoring any form of preemption.
> 
> In fact, you still take into account this preemption as you remove
> some time whereas it's only a shift of the execution

When the task is enqueued we cache the (already decayed) util_avg, and
from this time on the running_avg can only increase. It increases only
for the portion of CPU time the task is running and it's never decayed
while the task is preempted.

In your example above, if we look at the second task running, the one
"delayed", you will have:

 @t1 wakeup time:running_avg@t1 = util_avg@t1
since we initialize it with the
"sleep decayed" util_avg

  NOTE: this initialization is the important part, more on that on your
  next comment below related to the period being changed.

 @t2 switch_in time: running_avg@t2 = running_avg@t1
since it's not decayed while RUNNABLE but !RUNNING

 while, meanwhile:

 util_avg@t2 < util_avg@t1
since it's decayed while RUNNABLE but !RUNNING

 @t3 dequeue time:   running_avg@t3 > running_avg@t2


When you say "it's only a shift of the execution" I agree, and indeed
the running_avg is not affected at all.

So, we can say that I'm "accounting for preemption time" but really,
the way I do it, is just by _not decaying_ PELT when the task is:

 RUNNABLE but !RUNNING

and that's why I say that running_avg gives you the same behavior of
util_avg _if_ the task was running alone, because in that case you never
have the condition above, and thus util_avg is never decayed.

[...]

> > @@ -3161,6 +3161,8 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg 
> > *sa,
> > sa->runnable_load_sum =
> > decay_load(sa->runnable_load_sum, periods);
> > sa->util_sum = decay_load((u64)(sa->util_sum), periods);
> > +   if (running)
> > +   sa->running_sum = decay_load(sa->running_sum, 
> > periods);
> 
> so you make some time disappear from the equation as the signal
> will not be decayed, and make the period look shorter than reality.

Since at enqueue time we always initialize running_avg to whatever is
util_avg, I don't think we are messing up with the period at all.

The util_avg signal properly accounts for the period.
Thus, the combined effect of:

  - initializing running_avg at enqueue time with the value of
util_avg, already decayed to properly account for the task period
  - not decaying running_avg when the task is RUNNABLE but !RUNNING

should just result in "virtually" considering the two tasks of your
example "as if" they were running concurrently on two different CPUs.

Isn't it?

> With the example I mentioned above, the 2nd task will be seen as if
> its period becomes 6ms instead of 10ms which is not true and the
> utilization of the task is overestimated without any good reason

I don't see that overestimation... and I cannot even measure it.

If I run an experiment with your example above, while using the
performance governor to rule out any possible scale invariance
difference, here is what I measure:

   Task1 (40ms 

Re: [PATCH] dm: Use kzalloc for all structs with embedded biosets/mempools

2018-06-05 Thread David Sterba
On Tue, Jun 05, 2018 at 04:35:07PM +0200, David Sterba wrote:
> On Tue, Jun 05, 2018 at 08:22:22AM -0600, Jens Axboe wrote:
> > > I fucked up majorly on the bioset/mempool conversion - I forgot to check 
> > > that
> > > everything biosets/mempools were being embedded in was actually being 
> > > zeroed on
> > > allocation. Device mapper currently explodes, you'll probably want to 
> > > apply this
> > > patch post haste.
> > > 
> > > I have now done that auditing, for every single conversion - this patch 
> > > fixes
> > > everything I found. There do not seem to be any incorrect ones outside of 
> > > device
> > > mapper...
> > > 
> > > We'll probably want a second patch that either a) changes
> > > bioset_init()/mempool_init() to zero the passed in bioset/mempool first, 
> > > or b)
> > > my preference, WARN() or BUG() if they're passed memory that isn't zeroed.
> > 
> > Odd, haven't seen a crash, but probably requires kasan or poisoning to
> > trigger anything? Mike's tree also had the changes, since they were based
> > on the block tree.
> 
> eg. fstests/generic/081 crashes (trace below), no KASAN, PAGE_POISONING=y,
> PAGE_POISONING_NO_SANITY=y.
> 
> > I can queue this up and ship it later today. Mike, you want to review
> > this one?
> 
> Would be great to push that soon. The fstests build on several DM targets, the
> crashes lead to many test failures. I'm going to test the kzalloc fix now.

Tested-by: David Sterba 

on targets snapshot and thin.


Re: [PATCH 2/2] sched/fair: util_est: add running_sum tracking

2018-06-05 Thread Patrick Bellasi
On 04-Jun 10:46, Joel Fernandes wrote:
> Hi Patrick,
> 
> On Mon, Jun 04, 2018 at 05:06:00PM +0100, Patrick Bellasi wrote:
> > The estimated utilization of a task is affected by the task being
> > preempted, either by another FAIR task of by a task of an higher
> > priority class (i.e. RT or DL). Indeed, when a preemption happens, the
> > PELT utilization of the preempted task is going to be decayed a bit.
> > That's actually correct for utilization, which goal is to measure the
> > actual CPU bandwidth consumed by a task.
> > 
> > However, the above behavior does not allow to know exactly what is the
> > utilization a task "would have used" if it was running without
> > being preempted. Thus, this reduces the effectiveness of util_est for a
> > task because it does not always allow to predict how much CPU a task is
> > likely to require.
> > 
> > Let's improve the estimated utilization by adding a new "sort-of" PELT
> > signal, explicitly only for SE which has the following behavior:
> >  a) at each enqueue time of a task, its value is the (already decayed)
> > util_avg of the task being enqueued
> >  b) it's updated at each update_load_avg
> >  c) it can just increase, whenever the task is actually RUNNING on a
> > CPU, while it's kept stable while the task is RUNNABLE but not
> > actively consuming CPU bandwidth
> > 
> > Such a defined signal is exactly equivalent to the util_avg for a task
> > running alone on a CPU while, in case the task is preempted, it allows
> > to know at dequeue time how much would have been the task utilization if
> > it was running alone on that CPU.
> > 
> > This new signal is named "running_avg", since it tracks the actual
> > RUNNING time of a task by ignoring any form of preemption.
> > 
> > From an implementation standpoint, since the sched_avg should fit into a
> > single cache line, we save space by tracking only a new runnable sum:
> >p->se.avg.running_sum
> > while the conversion into a running_avg is done on demand whenever we
> > need it, which is at task dequeue time when a new util_est sample has to
> > be collected.
> > 
> > The conversion from "running_sum" to "running_avg" is done by performing
> > a single division by LOAD_AVG_MAX, which introduces a small error since
> > in the division we do not consider the (sa->period_contrib - 1024)
> > compensation factor used in ___update_load_avg(). However:
> >  a) this error is expected to be limited (~2-3%)
> >  b) it can be safely ignored since the estimated utilization is the only
> > consumer which is already subject to small estimation errors
> > 
> > The additional corresponding benefit is that, at run-time, we pay the
> > cost for an additional sum and multiply, while the more expensive
> > division is required only at dequeue time.
> > 
> > Signed-off-by: Patrick Bellasi 
> > Cc: Ingo Molnar 
> > Cc: Peter Zijlstra 
> > Cc: Vincent Guittot 
> > Cc: Juri Lelli 
> > Cc: Todd Kjos 
> > Cc: Joel Fernandes 
> > Cc: Steve Muckle 
> > Cc: Dietmar Eggemann 
> > Cc: Morten Rasmussen 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@vger.kernel.org
> > ---
> >  include/linux/sched.h |  1 +
> >  kernel/sched/fair.c   | 16 ++--
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 9d8732dab264..2bd5f1c68da9 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -399,6 +399,7 @@ struct sched_avg {
> > u64 load_sum;
> > u64 runnable_load_sum;
> > u32 util_sum;
> > +   u32 running_sum;
> > u32 period_contrib;
> > unsigned long   load_avg;
> > unsigned long   runnable_load_avg;
> 
> Should update the documentation comments above the struct too?
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index f74441be3f44..5d54d6a4c31f 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3161,6 +3161,8 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg 
> > *sa,
> > sa->runnable_load_sum =
> > decay_load(sa->runnable_load_sum, periods);
> > sa->util_sum = decay_load((u64)(sa->util_sum), periods);
> > +   if (running)
> > +   sa->running_sum = decay_load(sa->running_sum, periods);
> >  
> > /*
> >  * Step 2
> > @@ -3176,8 +3178,10 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg 
> > *sa,
> > sa->load_sum += load * contrib;
> > if (runnable)
> > sa->runnable_load_sum += runnable * contrib;
> > -   if (running)
> > +   if (running) {
> > sa->util_sum += contrib * scale_cpu;
> > +   sa->running_sum += contrib * scale_cpu;
> > +   }
> >  
> > return periods;
> >  }
> > @@ -3963,6 +3967,12 @@ static inline void 

Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

2018-06-05 Thread Peter Zijlstra
On Tue, Jun 05, 2018 at 05:08:41PM +0200, Oleg Nesterov wrote:

> On 06/05, Kohli, Gaurav wrote:
> >
> > As last mentioned on mail, we are still seeing the issue with the latest
> > approach, and below is the suspected race as mentioned earlier..
> > controller Thread   CPUHP Thread
> > takedown_cpu
> > kthread_park
> > kthread_parkme
> > Set KTHREAD_SHOULD_PARK
> > smpboot_thread_fn
> > set Task interruptible
> >
> >
> > wake_up_process
> >  if (!(p->state & state))
> > goto out;
> >
> > Kthread_parkme
> > SET TASK_PARKED
> > schedule
> > raw_spin_lock(>lock)
> > ttwu_remote
> > waiting for __task_rq_lock
> > context_switch
> >
> > finish_lock_switch
> >
> >
> >
> > Case TASK_PARKED
> > kthread_park_complete
> >
> >
> > SET Running
> 
> I think you are right.
> 
> And, now that I look at 85f1abe0019fcb3ea10df7029056cf42702283a8
> ("kthread, sched/wait: Fix kthread_parkme() completion issue") I see this note
> in the changelog:
> 
>   The alternative is to promote TASK_PARKED to a special state, this
>   guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING
>   and we'll end up doing the right thing, but this preserves the whole
>   icky business of potentially migrating the still runnable thing.
> 
> OK, but __kthread_parkme() can be preempted before it calls schedule(), so the
> caller still can be migrated? Plus kthread_park_complete() can be called 
> twice.

Argh... I forgot TASK_DEAD does the whole thing with preempt_disable().
Let me stare at that a bit.


Re: [PATCH V3 00/17] perf tools and x86 PTI entry trampolines

2018-06-05 Thread Arnaldo Carvalho de Melo
Em Thu, May 31, 2018 at 03:09:38PM +0300, Adrian Hunter escreveu:
> On 22/05/18 13:54, Adrian Hunter wrote:
> > Hi
> > 
> > Here is V3 of patches to support x86 PTI entry trampolines in perf tools.
> > 
> > Patches also here:
> > 
> > http://git.infradead.org/users/ahunter/linux-perf.git/shortlog/refs/heads/perf-tools-kpti-v3
> > git://git.infradead.org/users/ahunter/linux-perf.git perf-tools-kpti-v3
> > 
> 
> Arnaldo has queued the tools patches, but there are still 3 kernel patches:
> 
>   kallsyms: Simplify update_iter_mod()
>   kallsyms, x86: Export addresses of syscall trampolines
>   x86: Add entry trampolines to kcore
> 
> Are there any further comments on these?  Can they be applied?

Would be interesting to have some acked-by from kernel folks :-\

- Arnaldo


Re: [GIT PULL] fscrypt updates for 4.18

2018-06-05 Thread Theodore Y. Ts'o
On Tue, Jun 05, 2018 at 05:13:35PM +0200, Richard Weinberger wrote:
> > Add a bunch of cleanups, and add support for the Speck128/256
> > algorithms.  Yes, Speck is controversial, but the intention is to use
> > them only for the lowest end Android devices, where the alternative
> > *really* is no encryption at all for data stored at rest.
> 
> Will Android tell me that Speck is being used?

Well, today Android doesn't tell you, "Your files aren't being
encrypted" in some big dialog box.  :-)  

Whether a phone is using no encryption or not, and what encryption
algorithm, is fundamentally a property of the phone.  It's used to
encrypt data at rest on the phone, so this isn't a data interchange
issue.  I'm sure there will be some way of finding out --- by looking
at the source code for that phone, if nothing else.

But I suspect that if you are buying a phone in a first world country,
you're never going to see a phone with Speck on it --- unless you
build your own AOSP build and deliberately enable it for yourself,
anyway.  :-)

This is really intended for "The Next Billion Users"; phones like
Android Go that was disclosed at the 2017 Google I/O conference, where
the unsubsidized price is well under $100 USD (so cheaper than the
original OLPC target).

- Ted


Re: [RFC PATCH -tip v5 17/27] arm64: kprobes: Don't call the ->break_handler() in arm kprobes code

2018-06-05 Thread Will Deacon
On Tue, Jun 05, 2018 at 12:56:44AM +0900, Masami Hiramatsu wrote:
> Don't call the ->break_handler() from the arm kprobes code,
> because it was only used by jprobes which got removed.
> 
> Signed-off-by: Masami Hiramatsu 
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org
> ---
>  arch/arm64/kernel/probes/kprobes.c |8 
>  1 file changed, 8 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Eric W. Biederman
Kirill Tkhai  writes:

> On 01.06.2018 18:25, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>>> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
 Michal Hocko  writes:
>>> [...]
> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

 The group leader exiting early is a completely legitimate and reasonable
 thing to do, even if it is rare.
>>>
>>> I am not saying it isn't legitimate. But the most common case is the
>>> main thread waiting for its threads or calling exit which would tear the
>>> whole group down. Is there any easy way to achieve this other than tkill
>>> to group leader? Calling exit(3) from the leader performs group exit
>>> IIRC.
>> 
>> pthread_exit from the group leader.
>> 
>>> I am not arguing this is non-issue. And it certainly is a problem once
>>> somebody wants to be nasty... I was more interested how often this
>>> really happens for sane workloads.
>> 
>> That is a fair question.  All I know for certain is that whatever Kirill
>> Tkhai's workload was it was triggering this the slow path.
>
> It was triggered on a server, where many VPS of many people are hosted.
> Sorry, I have no idea what they did.

That at least tells us it was naturally occurring.  Which makes this a
real problem in the real world.

Eric


Re: [PATCH v4 2/6] mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-05 Thread Rob Herring
On Mon, Jun 4, 2018 at 6:32 AM, Matti Vaittinen
 wrote:
> On Fri, Jun 01, 2018 at 12:32:16PM -0500, Rob Herring wrote:
>> On Fri, Jun 1, 2018 at 1:25 AM, Matti Vaittinen
>>  wrote:
>> > On Thu, May 31, 2018 at 09:07:24AM -0500, Rob Herring wrote:
>> >> On Thu, May 31, 2018 at 5:23 AM, Matti Vaittinen
>> >>  wrote:
>> >> > On Thu, May 31, 2018 at 10:17:17AM +0300, Matti Vaittinen wrote:
>> >> >> On Wed, May 30, 2018 at 10:01:29PM -0500, Rob Herring wrote:
>> >> >> > On Wed, May 30, 2018 at 11:42:03AM +0300, Matti Vaittinen wrote:
>> >> >> > > Document devicetree bindings for ROHM BD71837 PMIC MFD.
>> >> >> > > + - interrupts: The interrupt line the device is 
>> >> >> > > connected to.
>> >> >> > > + - interrupt-controller  : Marks the device node as an interrupt 
>> >> >> > > controller.
>> >> >> >
>> >> >> > What sub blocks have interrupts?
>> >> >>
>> >> >> The PMIC can generate interrupts from events which cause it to reset.
>> >> >> Eg, irq from watchdog line change, power button pushes, reset request
>> >> >> via register interface etc. I don't know any generic handling for these
>> >> >> interrupts. In "normal" use-case this PMIC is powering the processor
>> >> >> where driver is running and I do not see reasonable handling because
>> >> >> power-reset is going to follow the irq.
>> >> >>
>> >> >
>> >> > Oh, but when reading this I understand that the interrupt-controller
>> >> > property should at least be optional.
>> >>
>> >> I don't think it should. The h/w either has an interrupt controller or
>> >> it doesn't.
>> >
>> > I hope this explains why I did this interrupt controller - please tell
>> > me if this is legitimate use-case and what you think of following:
>> >
>> > +Optional properties:
>> > + - interrupt-controller: Marks the device node as an interrupt 
>> > controller.
>> > + BD71837MWV can report different power state 
>> > change
>> > + events to other devices. Different events can be 
>> > seen
>> > + as separate BD71837 domain interrupts.
>>
>> To what other devices?
>
> Would it be better if I wrote "other drivers" instead? I think I've seen
> examples where MFD driver is just providing the interrupts for other
> drivers - like power-button input driver. Currently I have no such "irq
> consumer" drivers written. Still I would like to expose these interrupts
> so that they are ready for using if any platform using PMIC needs them.

No, worse. Interrupt binding describes interrupt connections between a
controller and devices (which could be sub-blocks in a device), not to
drivers.

I'm just curious as to what sub-blocks/devices there are. You don't
have to have a driver (yet) to define the devices.

>
> I think there are other similar drivers in tree. For example TPS6591x
> driver seems to be doing this. (Has MFD driver exposing the interrupts
> but no driver handling those). Maybe explanation like this would help:
>
> "The BD71837 driver only provides the infrastructure for the IRQs. The
> users can write his own driver to convert the IRQ into the event they
> wish. The IRQ can be used with the standard
> request_irq/enable_irq/disable_irq API inside the kernel." (I found this
> text from NXP forums and ruthlessly copied and modified it over here)

That's all OS details that have nothing to do with the binding. The
binding describes the h/w.

> If this is not feasible, then I will remove the irq handling from MFD
> (or leave code there but remove the binding information?) as I don't
> know what the irq handles should do in generic case.

I don't understand what you mean by generic. An IRQ has to be wired to
something. The only generic IRQs are GPIOs.

>> > + - #interrupt-cells: The number of cells to describe an IRQ should be 
>> > 1.
>> > +   The first cell is the IRQ number.
>> > +   masks from 
>> > ../interrupt-controller/interrupts.txt.
>
> Sorry this "masks from ../interrupt-controller/interrupts.txt." was
> accidentally pasted here. I should have deleted it.
>
>> I'm still not clear. Generally in a PMIC, you'd define an interrupt
>> controller when there's a common set of registers to manage sub-block
>> interrupts (typical mask/unmask, ack regs) and the subblocks
>> themselves have control of masking/unmasking interrupts. If there's
>> not a need to have these 2 levels of interrupt handling, then you
>> don't really need to define an interrupt controller.
>
> And to clarify - the PMIC can generate an irq via one irq line. This is a
> typical ACTIVE_LOW irq with an 8-bit "write 1 to clear" status register and
> an 8-bit mask register. The role of the interrupt-controller code here is just
> to allow these 8 irq reasons to be seen as individual BD71837 domain
> interrupts. I just don't have the driver(s) for handling these
> interrupts.

If what I'm asking for above is still not clear: what are the 8 bits
defined as, or what are those 8 lines?
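
For reference, the single-line, ACTIVE_LOW setup described in the thread
would normally be expressed in the PMIC node itself; a sketch, where the
I2C address, interrupt parent, and GPIO line are assumptions:

```dts
pmic: pmic@4b {
	compatible = "rohm,bd71837";
	reg = <0x4b>;
	/* One ACTIVE_LOW line into the SoC's interrupt controller */
	interrupt-parent = <&gpio1>;
	interrupts = <29 IRQ_TYPE_LEVEL_LOW>;
	/* Demultiplexes the 8 status/mask register bits into 8 IRQs */
	interrupt-controller;
	#interrupt-cells = <1>;
};
```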

Re: [PATCH v5 3/4] clk: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-05 Thread Rob Herring
On Mon, Jun 04, 2018 at 04:18:53PM +0300, Matti Vaittinen wrote:
> Document devicetree bindings for ROHM BD71837 PMIC clock output.
> 
> Signed-off-by: Matti Vaittinen 
> ---
>  .../bindings/clock/rohm,bd71837-clock.txt | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> 
> diff --git a/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> new file mode 100644
> index ..771acfe34114
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> @@ -0,0 +1,38 @@
> +ROHM BD71837 Power Management Integrated Circuit clock bindings

This needs to be added to the MFD doc. One node should be covered by at 
most 1 document.

Rob
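
Folding the clock output into the MFD document would mean describing the
clock properties on the PMIC node itself rather than in a separate file.
A hedged sketch of what that combined node could look like (the parent
oscillator reference and output name are assumptions):

```dts
pmic: pmic@4b {
	compatible = "rohm,bd71837";
	reg = <0x4b>;
	/* 32.768 kHz clock output, described directly in the MFD node */
	clocks = <&osc32k>;
	#clock-cells = <0>;
	clock-output-names = "bd71837-32k-out";
};
```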


[PATCH] spdxcheck: Work with current HEAD LICENSES/ directory

2018-06-05 Thread Joe Perches
Depending on how old your -next tree is, it may
not have a master branch that contains the LICENSES directory.

Change the lookup to HEAD and find whatever
LICENSES directory files are used in that branch.

Miscellanea:

o Remove the checkpatch test as it will have its own
  SPDX license identifier.

Signed-off-by: Joe Perches 
---
 scripts/spdxcheck.py | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 7deaef297f52..a6041f29b18e 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -32,7 +32,7 @@ def read_spdxdata(repo):
 
     # The subdirectories of LICENSES in the kernel source
     license_dirs = [ "preferred", "other", "exceptions" ]
-    lictree = repo.heads.master.commit.tree['LICENSES']
+    lictree = repo.head.commit.tree['LICENSES']
 
     spdx = SPDXdata()
 
@@ -199,8 +199,6 @@ def scan_git_tree(tree):
             continue
         if el.path.find("license-rules.rst") >= 0:
             continue
-        if el.path == 'scripts/checkpatch.pl':
-            continue
         if not os.path.isfile(el.path):
             continue
         parser.parse_lines(open(el.path), args.maxlines, el.path)

