Re: [PATCH v2] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-07-15 Thread Rafael J. Wysocki
On Monday, July 11, 2016 11:47:53 AM Viresh Kumar wrote:
> On 30-06-16, 11:53, Akshay Adiga wrote:
> > Refactoring code to use frequency table index instead of pstate_id.
> > This abstraction will make the code independent of the pstate values.
> > 
> > - No functional changes
> > - The highest frequency is at frequency table index 0 and the frequency
> >   decreases as the index increases.
> > - Macros pstate_to_idx() and idx_to_pstate() can be used for conversion
> >   between pstate_id and index.
> > - powernv_pstate_info now contains frequency table indexes for the min,
> >   max and nominal frequencies (instead of pstate_ids).
> > - global_pstate_info now stores index values instead of pstate ids.
> > - Variables that now store an index instead of a pstate are renamed *_idx.
> > 
> > Signed-off-by: Akshay Adiga 
> > Reviewed-by: Gautham R. Shenoy 
> > ---
> > Changes from v1:
> >   - Changed macro names from get_pstate()/get_index() to
> >     idx_to_pstate()/pstate_to_idx()
> >   - Renamed variables that store index instead of pstate_id to *_idx
> >   - Retained previous printks
> > 
> > v1 : http://marc.info/?l=linux-pm&m=146677701501225&w=1
> > 
> >  drivers/cpufreq/powernv-cpufreq.c | 177 ++
> >  1 file changed, 102 insertions(+), 75 deletions(-)
> 
> Haven't done in-depth review, but I trust that Gautham has done it :)
> 
> Acked-by: Viresh Kumar 

Patch applied, thanks!
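A minimal user-space sketch of the index/pstate mapping described in the commit message (index 0 is the highest frequency, frequency decreases as the index grows). It assumes a small made-up frequency table whose pstate ids are consecutive and decrease by one per entry; the table contents and that layout are assumptions for illustration, not the powernv driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: the values below are invented for the sketch. */
struct freq_entry {
	unsigned int khz;
	int pstate_id;
};

/* Highest frequency at index 0; frequency decreases as the index grows.
 * Assumption: pstate ids are consecutive and drop by one per entry. */
static const struct freq_entry freq_table[] = {
	{ 3690000,  0 },
	{ 3660000, -1 },
	{ 3620000, -2 },
	{ 3590000, -3 },
};

#define NR_FREQS (sizeof(freq_table) / sizeof(freq_table[0]))

/* Index -> pstate id: a plain table lookup. */
static int idx_to_pstate(unsigned int idx)
{
	assert(idx < NR_FREQS);
	return freq_table[idx].pstate_id;
}

/* pstate id -> index: distance from the pstate stored at index 0. */
static unsigned int pstate_to_idx(int pstate)
{
	unsigned int idx = (unsigned int)(freq_table[0].pstate_id - pstate);

	assert(idx < NR_FREQS);
	return idx;
}
```

Keeping the rest of the code in terms of indexes means only these two helpers need to know how pstate ids are laid out.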

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread H. Peter Anvin

On July 15, 2016 6:59:56 AM PDT, Peter Zijlstra  wrote:
>On Fri, Jul 15, 2016 at 01:52:48PM +, Topi Miettinen wrote:
>> On 07/15/16 12:43, Peter Zijlstra wrote:
>> > On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
>> >> Hello,
>> >>
>> >> There are many basic ways to control processes, including capabilities,
>> >> cgroups and resource limits. However, there are far fewer ways to find out
>> >> useful values for the limits, except blind trial and error.
>> >>
>> >> This patch series attempts to fix that by giving at least a nice starting
>> >> point from the highwater mark values of the resources in question.
>> >> I looked where each limit is checked and added a call to update the mark
>> >> nearby.
>> > 
>> > And how is that useful? Setting things to the high watermark is
>> > basically the same as not setting the limit at all.
>> 
>> What else would you use, too small limits?
>
>That question doesn't make sense.
>
>What's the point of setting a limit if it ends up being the same as
>no-limit (aka unlimited).
>
>If you cannot explain; and you have not so far; what use these values
>are, why would we look at the patches.

One reason is to catch a malfunctioning process before it drags the whole
system down with it. It could also be useful for development.

Re: [kernel-hardening] Re: [PATCH v2 02/11] mm: Hardened usercopy

2016-07-15 Thread Daniel Micay
> I'd like it to dump stack and be fatal to the process involved, but
> yeah, I guess BUG() would work. Creating an infrastructure for
> handling security-related Oopses can be done separately from this
> (and I'd like to see that added, since it's a nice bit of configurable
> reactivity to possible attacks).

In grsecurity, the oops handling also uses do_group_exit instead of
do_exit, but both that change (or at least the option to do it) and the
exploit handling could be done separately from this without actually
needing special treatment for USERCOPY. It could be exposed as something
like panic_on_oops=2, as a balance between the existing options.


Re: [kernel-hardening] Re: [PATCH v2 02/11] mm: Hardened usercopy

2016-07-15 Thread Daniel Micay
> This could be a BUG, but I'd rather not panic the entire kernel.

It seems unlikely that it will panic without panic_on_oops and that's
an explicit opt-in to taking down the system on kernel logic errors
exactly like this. In grsecurity, it calls the kernel exploit handling
logic (panic if root, otherwise kill all processes of that user and ban
them until reboot) but that same logic is also called for BUG via oops
handling so there's only really a distinction with panic_on_oops=1.

Does it make sense to be less fatal for a fatal assertion that's more
likely to be security-related? Maybe you're worried about having some
false positives for the whitelisting portion, but I don't think those
will lurk around very long with the way this works.


Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Topi Miettinen
On 07/15/16 14:19, Richard Weinberger wrote:
> Hi!
> 
> Am 15.07.2016 um 12:35 schrieb Topi Miettinen:
>> Hello,
>>
>> There are many basic ways to control processes, including capabilities,
>> cgroups and resource limits. However, there are far fewer ways to find out
>> useful values for the limits, except blind trial and error.
>>
>> This patch series attempts to fix that by giving at least a nice starting
>> point from the highwater mark values of the resources in question.
>> I looked where each limit is checked and added a call to update the mark
>> nearby.
>>
>> Example run of program from Documentation/accounting/getdelays.c:
>>
>> ./getdelays -R -p `pidof smartd`
>> printing resource accounting
>> RLIMIT_CPU=0
>> RLIMIT_FSIZE=0
>> RLIMIT_DATA=18198528
>> RLIMIT_STACK=135168
>> RLIMIT_CORE=0
>> RLIMIT_RSS=0
>> RLIMIT_NPROC=1
>> RLIMIT_NOFILE=55
>> RLIMIT_MEMLOCK=0
>> RLIMIT_AS=130879488
>> RLIMIT_LOCKS=0
>> RLIMIT_SIGPENDING=0
>> RLIMIT_MSGQUEUE=0
>> RLIMIT_NICE=0
>> RLIMIT_RTPRIO=0
>> RLIMIT_RTTIME=0
>>
>> ./getdelays -R -C /sys/fs/cgroup/systemd/system.slice/smartd.service/
>> printing resource accounting
>> sleeping 1, blocked 0, running 0, stopped 0, uninterruptible 0
>> RLIMIT_CPU=0
>> RLIMIT_FSIZE=0
>> RLIMIT_DATA=18198528
>> RLIMIT_STACK=135168
>> RLIMIT_CORE=0
>> RLIMIT_RSS=0
>> RLIMIT_NPROC=1
>> RLIMIT_NOFILE=55
>> RLIMIT_MEMLOCK=0
>> RLIMIT_AS=130879488
>> RLIMIT_LOCKS=0
>> RLIMIT_SIGPENDING=0
>> RLIMIT_MSGQUEUE=0
>> RLIMIT_NICE=0
>> RLIMIT_RTPRIO=0
>> RLIMIT_RTTIME=0
>>
>> In this example, smartd is running as a non-root user. The presented
>> values can be used as a starting point for giving new limits to the
>> service.
> 
> I don't think it is worth sprinkling the kernel with update_resource_highwatermark()
> calls just to get these metrics.
> 
> Can't we teach the existing perf infrastructure to collect these highwatermarks for us?

I don't know. What kind of changes do you think would be needed?

-Topi

> 
> Thanks,
> //richard
> 


Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Topi Miettinen
On 07/15/16 13:59, Peter Zijlstra wrote:
> On Fri, Jul 15, 2016 at 01:52:48PM +, Topi Miettinen wrote:
>> On 07/15/16 12:43, Peter Zijlstra wrote:
>>> On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
>>>> Hello,
>>>>
>>>> There are many basic ways to control processes, including capabilities,
>>>> cgroups and resource limits. However, there are far fewer ways to find out
>>>> useful values for the limits, except blind trial and error.
>>>>
>>>> This patch series attempts to fix that by giving at least a nice starting
>>>> point from the highwater mark values of the resources in question.
>>>> I looked where each limit is checked and added a call to update the mark
>>>> nearby.
>>>
>>> And how is that useful? Setting things to the high watermark is
>>> basically the same as not setting the limit at all.
>>
>> What else would you use, too small limits?
> 
> That question doesn't make sense.
> 
> What's the point of setting a limit if it ends up being the same as
> no-limit (aka unlimited).

Having a limit is not the same as not having any limits at all. You're
in a way right that good limits don't affect the program normally. But
they can make a difference if the flow is not normal. For example a
successful exploit or a memory leak bug could cause RLIMIT_AS to trigger.
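To illustrate that a limit only matters once execution goes off the normal path, here is a sketch that simulates a descriptor leak under a soft RLIMIT_NOFILE cap and reports the errno that stops it. RLIMIT_NOFILE stands in for RLIMIT_AS only because it is easier to trip safely in a demo; leak_fds_until_limit() is a hypothetical helper, not part of the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Lower the soft RLIMIT_NOFILE to `cap`, then "leak" descriptors until
 * open() fails. Returns the errno of the first failing open(), or 0 if
 * none failed within max_attempts. The old limit is restored afterwards.
 */
static int leak_fds_until_limit(rlim_t cap, int max_attempts)
{
	struct rlimit old, capped;
	int fds[256];
	int opened = 0, err = 0;

	assert(max_attempts <= 256);
	if (getrlimit(RLIMIT_NOFILE, &old) != 0)
		return errno;
	capped = old;
	capped.rlim_cur = cap < old.rlim_max ? cap : old.rlim_max;
	if (setrlimit(RLIMIT_NOFILE, &capped) != 0)
		return errno;

	while (opened < max_attempts) {
		int fd = open("/dev/null", O_RDONLY);

		if (fd < 0) {
			err = errno;	/* expected: EMFILE once the cap is hit */
			break;
		}
		fds[opened++] = fd;
	}

	/* Close the "leaked" descriptors and lift the cap again. */
	while (opened > 0)
		close(fds[--opened]);
	setrlimit(RLIMIT_NOFILE, &old);
	return err;
}
```

With a cap of 16, the loop stops with EMFILE after a handful of opens instead of consuming descriptors without bound.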

> 
> If you cannot explain; and you have not so far; what use these values
> are, why would we look at the patches.
> 

The use case is to allow system administrators, distro maintainers and
developers to configure systems to use the resource limits. The limits
are not very useful right now, as there is no way to figure out what
values to use. There are a few /proc files to look at; for example, the
current number of file descriptors (for RLIMIT_NOFILE) can be counted via
/proc/pid/fd. But there is no way to know whether more were in use at
some point. Likewise, a program can use more address space when you are
not looking. The source code does not tell these things explicitly.
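Counting entries in /proc/<pid>/fd, as mentioned above, can be sketched in C as follows. count_open_fds() is a hypothetical helper; it only gives a point-in-time snapshot, which is exactly the limitation being described (it cannot show the peak).

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count the open file descriptors of a process by listing /proc/<pid>/fd.
 * Returns -1 if the directory cannot be opened (no such process, or no
 * permission). This is a snapshot only, not a high-water mark.
 */
static int count_open_fds(pid_t pid)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int count = 0;

	snprintf(path, sizeof(path), "/proc/%d/fd", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		count++;
	}
	closedir(dir);

	/* When listing ourselves, the DIR's own descriptor shows up too. */
	if (pid == getpid())
		count--;
	return count;
}
```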

-Topi


Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Topi Miettinen
On 07/15/16 13:04, Balbir Singh wrote:
> On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
>> Hello,
>>
>> There are many basic ways to control processes, including capabilities,
>> cgroups and resource limits. However, there are far fewer ways to find out
>> useful values for the limits, except blind trial and error.
>>
>> This patch series attempts to fix that by giving at least a nice starting
>> point from the highwater mark values of the resources in question.
>> I looked where each limit is checked and added a call to update the mark
>> nearby.
>>
>> Example run of program from Documentation/accounting/getdelays.c:
>>
>> ./getdelays -R -p `pidof smartd`
>> printing resource accounting
>> RLIMIT_CPU=0
>> RLIMIT_FSIZE=0
>> RLIMIT_DATA=18198528
>> RLIMIT_STACK=135168
>> RLIMIT_CORE=0
>> RLIMIT_RSS=0
>> RLIMIT_NPROC=1
>> RLIMIT_NOFILE=55
>> RLIMIT_MEMLOCK=0
>> RLIMIT_AS=130879488
>> RLIMIT_LOCKS=0
>> RLIMIT_SIGPENDING=0
>> RLIMIT_MSGQUEUE=0
>> RLIMIT_NICE=0
>> RLIMIT_RTPRIO=0
>> RLIMIT_RTTIME=0
>>
>> ./getdelays -R -C /sys/fs/cgroup/systemd/system.slice/smartd.service/
>> printing resource accounting
>> sleeping 1, blocked 0, running 0, stopped 0, uninterruptible 0
>> RLIMIT_CPU=0
>> RLIMIT_FSIZE=0
>> RLIMIT_DATA=18198528
>> RLIMIT_STACK=135168
>> RLIMIT_CORE=0
>> RLIMIT_RSS=0
>> RLIMIT_NPROC=1
>> RLIMIT_NOFILE=55
>> RLIMIT_MEMLOCK=0
>> RLIMIT_AS=130879488
>> RLIMIT_LOCKS=0
>> RLIMIT_SIGPENDING=0
>> RLIMIT_MSGQUEUE=0
>> RLIMIT_NICE=0
>> RLIMIT_RTPRIO=0
>> RLIMIT_RTTIME=0
> 
> Does this mean that rlimit_data and rlimit_stack should be set to the
> values as specified by the data above?

My plan is that either system administrator, distro maintainer or even
upstream developer can get reasonable values for the limits. They may
still be wrong, but things would be better than without any help to
configure the system.
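One way such a starting point could be turned into an actual limit is to add some headroom to the observed high-water mark and apply it as the soft limit. This is a sketch under stated assumptions: apply_limit_with_headroom() is hypothetical, the margin is arbitrary, and setrlimit() only affects the calling process (for example, a wrapper that then execs the service).

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: set the soft limit of `resource` to the observed
 * high-water mark plus headroom_pct percent, never above the hard limit.
 */
static int apply_limit_with_headroom(int resource, rlim_t highwater,
				     unsigned int headroom_pct)
{
	struct rlimit rl;
	rlim_t wanted;

	if (getrlimit(resource, &rl) != 0)
		return -1;

	wanted = highwater + highwater * headroom_pct / 100;
	if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
		wanted = rl.rlim_max;	/* unprivileged code cannot exceed it */

	rl.rlim_cur = wanted;
	return setrlimit(resource, &rl);
}
```

For example, the RLIMIT_NOFILE=55 reading from smartd with a 25% margin would give a soft limit of 68 descriptors.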

> 
> Do we expect a smart user space daemon to then tweak the RLIMIT values?

Someone could write an autotuning daemon that checks if the system has
changed (for example due to upgrade) and then run some tests to
reconfigure the system. But the limits are a bit too fragile, or rather,
applications can't handle failure, so I don't know if that would really
work.

-Topi


> 
> Balbir Singh.
> 


Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Richard Weinberger
Hi!

Am 15.07.2016 um 12:35 schrieb Topi Miettinen:
> Hello,
> 
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find out
> useful values for the limits, except blind trial and error.
> 
> This patch series attempts to fix that by giving at least a nice starting
> point from the highwater mark values of the resources in question.
> I looked where each limit is checked and added a call to update the mark
> nearby.
> 
> Example run of program from Documentation/accounting/getdelays.c:
> 
> ./getdelays -R -p `pidof smartd`
> printing resource accounting
> RLIMIT_CPU=0
> RLIMIT_FSIZE=0
> RLIMIT_DATA=18198528
> RLIMIT_STACK=135168
> RLIMIT_CORE=0
> RLIMIT_RSS=0
> RLIMIT_NPROC=1
> RLIMIT_NOFILE=55
> RLIMIT_MEMLOCK=0
> RLIMIT_AS=130879488
> RLIMIT_LOCKS=0
> RLIMIT_SIGPENDING=0
> RLIMIT_MSGQUEUE=0
> RLIMIT_NICE=0
> RLIMIT_RTPRIO=0
> RLIMIT_RTTIME=0
> 
> ./getdelays -R -C /sys/fs/cgroup/systemd/system.slice/smartd.service/
> printing resource accounting
> sleeping 1, blocked 0, running 0, stopped 0, uninterruptible 0
> RLIMIT_CPU=0
> RLIMIT_FSIZE=0
> RLIMIT_DATA=18198528
> RLIMIT_STACK=135168
> RLIMIT_CORE=0
> RLIMIT_RSS=0
> RLIMIT_NPROC=1
> RLIMIT_NOFILE=55
> RLIMIT_MEMLOCK=0
> RLIMIT_AS=130879488
> RLIMIT_LOCKS=0
> RLIMIT_SIGPENDING=0
> RLIMIT_MSGQUEUE=0
> RLIMIT_NICE=0
> RLIMIT_RTPRIO=0
> RLIMIT_RTTIME=0
> 
> In this example, smartd is running as a non-root user. The presented
> values can be used as a starting point for giving new limits to the
> service.

I don't think it is worth sprinkling the kernel with update_resource_highwatermark()
calls just to get these metrics.

Can't we teach the existing perf infrastructure to collect these highwatermarks for us?

Thanks,
//richard

Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Peter Zijlstra
On Fri, Jul 15, 2016 at 01:52:48PM +, Topi Miettinen wrote:
> On 07/15/16 12:43, Peter Zijlstra wrote:
> > On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
> >> Hello,
> >>
> >> There are many basic ways to control processes, including capabilities,
> >> cgroups and resource limits. However, there are far fewer ways to find out
> >> useful values for the limits, except blind trial and error.
> >>
> >> This patch series attempts to fix that by giving at least a nice starting
> >> point from the highwater mark values of the resources in question.
> >> I looked where each limit is checked and added a call to update the mark
> >> nearby.
> > 
> > And how is that useful? Setting things to the high watermark is
> > basically the same as not setting the limit at all.
> 
> What else would you use, too small limits?

That question doesn't make sense.

What's the point of setting a limit if it ends up being the same as
no-limit (aka unlimited).

If you cannot explain; and you have not so far; what use these values
are, why would we look at the patches.

Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Topi Miettinen
On 07/15/16 12:43, Peter Zijlstra wrote:
> On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
>> Hello,
>>
>> There are many basic ways to control processes, including capabilities,
>> cgroups and resource limits. However, there are far fewer ways to find out
>> useful values for the limits, except blind trial and error.
>>
>> This patch series attempts to fix that by giving at least a nice starting
>> point from the highwater mark values of the resources in question.
>> I looked where each limit is checked and added a call to update the mark
>> nearby.
> 
> And how is that useful? Setting things to the high watermark is
> basically the same as not setting the limit at all.

What else would you use, too small limits?

-Topi


Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Balbir Singh
On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
> Hello,
> 
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find out
> useful values for the limits, except blind trial and error.
> 
> This patch series attempts to fix that by giving at least a nice starting
> point from the highwater mark values of the resources in question.
> I looked where each limit is checked and added a call to update the mark
> nearby.
> 
> Example run of program from Documentation/accounting/getdelays.c:
> 
> ./getdelays -R -p `pidof smartd`
> printing resource accounting
> RLIMIT_CPU=0
> RLIMIT_FSIZE=0
> RLIMIT_DATA=18198528
> RLIMIT_STACK=135168
> RLIMIT_CORE=0
> RLIMIT_RSS=0
> RLIMIT_NPROC=1
> RLIMIT_NOFILE=55
> RLIMIT_MEMLOCK=0
> RLIMIT_AS=130879488
> RLIMIT_LOCKS=0
> RLIMIT_SIGPENDING=0
> RLIMIT_MSGQUEUE=0
> RLIMIT_NICE=0
> RLIMIT_RTPRIO=0
> RLIMIT_RTTIME=0
> 
> ./getdelays -R -C /sys/fs/cgroup/systemd/system.slice/smartd.service/
> printing resource accounting
> sleeping 1, blocked 0, running 0, stopped 0, uninterruptible 0
> RLIMIT_CPU=0
> RLIMIT_FSIZE=0
> RLIMIT_DATA=18198528
> RLIMIT_STACK=135168
> RLIMIT_CORE=0
> RLIMIT_RSS=0
> RLIMIT_NPROC=1
> RLIMIT_NOFILE=55
> RLIMIT_MEMLOCK=0
> RLIMIT_AS=130879488
> RLIMIT_LOCKS=0
> RLIMIT_SIGPENDING=0
> RLIMIT_MSGQUEUE=0
> RLIMIT_NICE=0
> RLIMIT_RTPRIO=0
> RLIMIT_RTTIME=0

Does this mean that rlimit_data and rlimit_stack should be set to the
values as specified by the data above?

Do we expect a smart user space daemon to then tweak the RLIMIT values?

Balbir Singh.

Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Peter Zijlstra
On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
> Hello,
> 
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find out
> useful values for the limits, except blind trial and error.
> 
> This patch series attempts to fix that by giving at least a nice starting
> point from the highwater mark values of the resources in question.
> I looked where each limit is checked and added a call to update the mark
> nearby.

And how is that useful? Setting things to the high watermark is
basically the same as not setting the limit at all.



Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms

2016-07-15 Thread Paul Gortmaker
[Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms] On 15/07/2016 (Fri 14:12) Scott Wood wrote:

> On Fri, 2016-07-15 at 12:43 -0400, Paul Gortmaker wrote:
> > > +source "drivers/soc/fsl/qe/Kconfig"

[...]

> > > +
> > > +config FSL_GUTS
> > > +   bool
> > > diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> > > index 203307f..02afb7f 100644
> > > --- a/drivers/soc/fsl/Makefile
> > > +++ b/drivers/soc/fsl/Makefile
> > > @@ -4,3 +4,4 @@
> > > 
> > >  obj-$(CONFIG_QUICC_ENGINE) += qe/
> > >  obj-$(CONFIG_CPM)  += qe/
> > > +obj-$(CONFIG_FSL_GUTS) += guts.o
> > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> > > new file mode 100644
> > > index 000..fa155e6
> > > --- /dev/null
> > > +++ b/drivers/soc/fsl/guts.c
> > > @@ -0,0 +1,119 @@
> > > +/*
> > > + * Freescale QorIQ Platforms GUTS Driver
> > > + *
> > > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License as published by
> > > + * the Free Software Foundation; either version 2 of the License, or
> > > + * (at your option) any later version.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > Seems there was lots of discussion on this.  If it does end up being
> > resent, it would be nice to get the module.h and other modular stuff
> > gone since it is a bool Kconfig.
> 
> I plan to resend just the GUTS driver portion and send it through the PPC
> tree.
> 
> I don't see any modular stuff in there besides the linux/module.h include.

Great.  Normally I'm seeing the MODULE_DEVICE_TABLE and MODULE_AUTHOR
and MODULE_LICENSE etc, so it has (unfortunately) become a knee jerk
reaction to assume the latter follows a module.h presence...  thanks for
removing the extraneous include.

Paul.
--

> 
> -Scott
> 
> 

[PATCH v3 11/11] mm: SLUB hardened usercopy support

2016-07-15 Thread Kees Cook
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLUB allocator to catch any copies that may span objects. Includes a
redzone handling fix discovered by Michael Ellerman.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
Tested-by: Michael Ellerman 
---
 init/Kconfig |  1 +
 mm/slub.c| 36 
 2 files changed, 37 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 798c2020ee7c..1c4711819dfd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1765,6 +1765,7 @@ config SLAB
 
 config SLUB
bool "SLUB (Unqueued Allocator)"
+   select HAVE_HARDENED_USERCOPY_ALLOCATOR
help
   SLUB is a slab allocator that minimizes cache line usage
   instead of managing queues of cached objects (SLAB approach).
diff --git a/mm/slub.c b/mm/slub.c
index 825ff4505336..7dee3d9a5843 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3614,6 +3614,42 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 EXPORT_SYMBOL(__kmalloc_node);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+/*
+ * Rejects objects that are incorrectly sized.
+ *
+ * Returns NULL if check passes, otherwise const char * to name of cache
+ * to indicate an error.
+ */
+const char *__check_heap_object(const void *ptr, unsigned long n,
+   struct page *page)
+{
+   struct kmem_cache *s;
+   unsigned long offset;
+   size_t object_size;
+
+   /* Find object and usable object size. */
+   s = page->slab_cache;
+   object_size = slab_ksize(s);
+
+   /* Find offset within object. */
+   offset = (ptr - page_address(page)) % s->size;
+
+   /* Adjust for redzone and reject if within the redzone. */
+   if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) {
+   if (offset < s->red_left_pad)
+   return s->name;
+   offset -= s->red_left_pad;
+   }
+
+   /* Allow address range falling entirely within object size. */
+   if (offset <= object_size && n <= object_size - offset)
+   return NULL;
+
+   return s->name;
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 static size_t __ksize(const void *object)
 {
struct page *page;
-- 
2.7.4
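The offset arithmetic in the SLUB __check_heap_object() above can be modelled in plain C to see which ranges it accepts. This is a simplified model, not kernel code: struct model_cache and copy_fits_object() are hypothetical, and the redzone condition is reduced to red_left_pad being non-zero instead of the kmem_cache_debug()/SLAB_RED_ZONE test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kmem_cache fields the check reads. */
struct model_cache {
	size_t size;          /* stride between objects in the slab page */
	size_t object_size;   /* usable bytes per object (cf. slab_ksize()) */
	size_t red_left_pad;  /* left redzone; 0 when redzoning is off */
};

/* True if copying n bytes starting at ptr stays inside a single object. */
static bool copy_fits_object(const struct model_cache *s, uintptr_t page_base,
			     uintptr_t ptr, size_t n)
{
	size_t offset;

	assert(ptr >= page_base && s->size > 0);
	offset = (ptr - page_base) % s->size;

	/* Reject pointers into the left redzone, then rebase past it. */
	if (s->red_left_pad) {
		if (offset < s->red_left_pad)
			return false;
		offset -= s->red_left_pad;
	}

	/* Subtraction form: offset + n could overflow for a huge n. */
	return offset <= s->object_size && n <= s->object_size - offset;
}
```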


[PATCH v3 10/11] mm: SLAB hardened usercopy support

2016-07-15 Thread Kees Cook
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLAB allocator to catch any copies that may span objects.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
Tested-By: Valdis Kletnieks 
---
 init/Kconfig |  1 +
 mm/slab.c| 30 ++
 2 files changed, 31 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index f755a602d4a1..798c2020ee7c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1757,6 +1757,7 @@ choice
 
 config SLAB
bool "SLAB"
+   select HAVE_HARDENED_USERCOPY_ALLOCATOR
help
  The regular slab allocator that is established and known to work
  well in all environments. It organizes cache hot objects in
diff --git a/mm/slab.c b/mm/slab.c
index cc8bbc1e6bc9..5e2d5f349aca 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4477,6 +4477,36 @@ static int __init slab_proc_init(void)
 module_init(slab_proc_init);
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+/*
+ * Rejects objects that are incorrectly sized.
+ *
+ * Returns NULL if check passes, otherwise const char * to name of cache
+ * to indicate an error.
+ */
+const char *__check_heap_object(const void *ptr, unsigned long n,
+   struct page *page)
+{
+   struct kmem_cache *cachep;
+   unsigned int objnr;
+   unsigned long offset;
+
+   /* Find and validate object. */
+   cachep = page->slab_cache;
+   objnr = obj_to_index(cachep, page, (void *)ptr);
+   BUG_ON(objnr >= cachep->num);
+
+   /* Find offset within object. */
+   offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
+
+   /* Allow address range falling entirely within object size. */
+   if (offset <= cachep->object_size && n <= cachep->object_size - offset)
+   return NULL;
+
+   return cachep->name;
+}
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 /**
  * ksize - get the actual amount of memory allocated for a given object
  * @objp: Pointer to the object
-- 
2.7.4
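The SLAB variant first finds the object index and then the offset inside that object. A rough user-space model of that logic, assuming plain division in place of obj_to_index() and assert() in place of BUG_ON(); struct model_slab and slab_copy_fits() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the slab/cache fields used by the check. */
struct model_slab {
	uintptr_t s_mem;      /* address of the first object */
	size_t size;          /* stride between objects */
	size_t obj_offset;    /* debug padding before each object, else 0 */
	size_t object_size;   /* bytes usable by the caller */
	unsigned int num;     /* objects in this slab */
};

static bool slab_copy_fits(const struct model_slab *sl, uintptr_t ptr, size_t n)
{
	unsigned int objnr;
	size_t offset;

	assert(ptr >= sl->s_mem);
	/* Plain division stands in for the kernel's obj_to_index(). */
	objnr = (unsigned int)((ptr - sl->s_mem) / sl->size);
	assert(objnr < sl->num);	/* the patch uses BUG_ON() here */

	/* A pointer into the padding wraps to a huge unsigned offset,
	 * so the range test below rejects it. */
	offset = ptr - (sl->s_mem + (uintptr_t)objnr * sl->size) - sl->obj_offset;

	return offset <= sl->object_size && n <= sl->object_size - offset;
}
```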


[PATCH v3 08/11] sparc/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on sparc.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
---
 arch/sparc/Kconfig  |  1 +
 arch/sparc/include/asm/uaccess_32.h | 14 ++
 arch/sparc/include/asm/uaccess_64.h | 11 +--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 546293d9e6c5..59b09600dd32 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -43,6 +43,7 @@ config SPARC
select OLD_SIGSUSPEND
select ARCH_HAS_SG_CHAIN
select CPU_NO_EFFICIENT_FFS
+   select HAVE_ARCH_HARDENED_USERCOPY
 
 config SPARC32
def_bool !64BIT
diff --git a/arch/sparc/include/asm/uaccess_32.h b/arch/sparc/include/asm/uaccess_32.h
index 57aca2792d29..341a5a133f48 100644
--- a/arch/sparc/include/asm/uaccess_32.h
+++ b/arch/sparc/include/asm/uaccess_32.h
@@ -248,22 +248,28 @@ unsigned long __copy_user(void __user *to, const void __user *from, unsigned lon
 
 static inline unsigned long copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-   if (n && __access_ok((unsigned long) to, n))
+   if (n && __access_ok((unsigned long) to, n)) {
+   if (!__builtin_constant_p(n))
+   check_object_size(from, n, true);
return __copy_user(to, (__force void __user *) from, n);
-   else
+   } else
return n;
 }
 
 static inline unsigned long __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
+   if (!__builtin_constant_p(n))
+   check_object_size(from, n, true);
return __copy_user(to, (__force void __user *) from, n);
 }
 
 static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-   if (n && __access_ok((unsigned long) from, n))
+   if (n && __access_ok((unsigned long) from, n)) {
+   if (!__builtin_constant_p(n))
+   check_object_size(to, n, false);
return __copy_user((__force void __user *) to, from, n);
-   else
+   } else
return n;
 }
 
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index e9a51d64974d..8bda94fab8e8 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -210,8 +210,12 @@ unsigned long copy_from_user_fixup(void *to, const void __user *from,
 static inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long size)
 {
-   unsigned long ret = ___copy_from_user(to, from, size);
+   unsigned long ret;
 
+   if (!__builtin_constant_p(size))
+   check_object_size(to, size, false);
+
+   ret = ___copy_from_user(to, from, size);
if (unlikely(ret))
ret = copy_from_user_fixup(to, from, size);
 
@@ -227,8 +231,11 @@ unsigned long copy_to_user_fixup(void __user *to, const void *from,
 static inline unsigned long __must_check
 copy_to_user(void __user *to, const void *from, unsigned long size)
 {
-   unsigned long ret = ___copy_to_user(to, from, size);
+   unsigned long ret;
 
+   if (!__builtin_constant_p(size))
+   check_object_size(from, size, true);
+   ret = ___copy_to_user(to, from, size);
if (unlikely(ret))
ret = copy_to_user_fixup(to, from, size);
return ret;
-- 
2.7.4
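The sparc hunks only call check_object_size() when the length is not a compile-time constant. A sketch of that guard, using a hypothetical mock_check_object_size() that just counts calls; __builtin_constant_p() is a GCC/Clang builtin, and the test uses a volatile length so the compiler cannot fold it into a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how many times the (mock) runtime check ran. */
static unsigned int checks_run;

/* Hypothetical stand-in for check_object_size(); it only counts calls. */
static void mock_check_object_size(const void *ptr, size_t n, int to_user)
{
	(void)ptr;
	(void)n;
	(void)to_user;
	checks_run++;
}

/*
 * Same shape as the hunks above: the runtime check is skipped when the
 * compiler can prove the length is a constant. Note that `n` is
 * expanded twice, so it must not have side effects.
 */
#define checked_copy(dst, src, n)					\
	do {								\
		if (!__builtin_constant_p(n))				\
			mock_check_object_size((src), (n), 1);		\
		memcpy((dst), (src), (n));				\
	} while (0)
```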


[PATCH v3 09/11] s390/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on s390.

Signed-off-by: Kees Cook 
---
 arch/s390/Kconfig   | 1 +
 arch/s390/lib/uaccess.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index a8c259059adf..9f694311c9ed 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -122,6 +122,7 @@ config S390
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_EARLY_PFN_TO_NID
+   select HAVE_ARCH_HARDENED_USERCOPY
select HAVE_ARCH_JUMP_LABEL
select CPU_NO_EFFICIENT_FFS if !HAVE_MARCH_Z9_109_FEATURES
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c
index ae4de559e3a0..6986c20166f0 100644
--- a/arch/s390/lib/uaccess.c
+++ b/arch/s390/lib/uaccess.c
@@ -104,6 +104,7 @@ static inline unsigned long copy_from_user_mvcp(void *x, const void __user *ptr,
 
 unsigned long __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
+   check_object_size(to, n, false);
if (static_branch_likely(&have_mvcos))
return copy_from_user_mvcos(to, from, n);
return copy_from_user_mvcp(to, from, n);
@@ -177,6 +178,7 @@ static inline unsigned long copy_to_user_mvcs(void __user *ptr, const void *x,
 
 unsigned long __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
+   check_object_size(from, n, true);
if (static_branch_likely(&have_mvcos))
return copy_to_user_mvcos(to, from, n);
return copy_to_user_mvcs(to, from, n);
-- 
2.7.4


[PATCH v3 06/11] ia64/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on ia64.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
---
 arch/ia64/Kconfig   |  1 +
 arch/ia64/include/asm/uaccess.h | 18 +++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index f80758cb7157..32a87ef516a0 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -53,6 +53,7 @@ config IA64
select MODULES_USE_ELF_RELA
select ARCH_USE_CMPXCHG_LOCKREF
select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_HARDENED_USERCOPY
default y
help
  The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 2189d5ddc1ee..465c70982f40 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -241,12 +241,18 @@ extern unsigned long __must_check __copy_user (void __user *to, const void __use
 static inline unsigned long
 __copy_to_user (void __user *to, const void *from, unsigned long count)
 {
+   if (!__builtin_constant_p(count))
+   check_object_size(from, count, true);
+
return __copy_user(to, (__force void __user *) from, count);
 }
 
 static inline unsigned long
 __copy_from_user (void *to, const void __user *from, unsigned long count)
 {
+   if (!__builtin_constant_p(count))
+   check_object_size(to, count, false);
+
return __copy_user((__force void __user *) to, from, count);
 }
 
@@ -258,8 +264,11 @@ __copy_from_user (void *to, const void __user *from, unsigned long count)
const void *__cu_from = (from);   \
long __cu_len = (n);   \
   \
-   if (__access_ok(__cu_to, __cu_len, get_fs()))   \
-   __cu_len = __copy_user(__cu_to, (__force void __user *) __cu_from, __cu_len);   \
+   if (__access_ok(__cu_to, __cu_len, get_fs())) {   \
+   if (!__builtin_constant_p(n))   \
+   check_object_size(__cu_from, __cu_len, true);   \
+   __cu_len = __copy_user(__cu_to, (__force void __user *) __cu_from, __cu_len);   \
+   }   \
__cu_len;   \
 })
 
@@ -270,8 +279,11 @@ __copy_from_user (void *to, const void __user *from, unsigned long count)
long __cu_len = (n);   \
   \
__chk_user_ptr(__cu_from);   \
-   if (__access_ok(__cu_from, __cu_len, get_fs()))   \
+   if (__access_ok(__cu_from, __cu_len, get_fs())) {   \
+   if (!__builtin_constant_p(n))   \
+   check_object_size(__cu_to, __cu_len, false);   \
__cu_len = __copy_user((__force void __user *) __cu_to, __cu_from, __cu_len);   \
+   }   \
__cu_len;   \
 })
 
-- 
2.7.4
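The ia64 hunks above only call check_object_size() when the length is not a compile-time constant (`!__builtin_constant_p(n)`). The sketch below reproduces that guard in userspace so the two branches can be observed; `sketch_check_object_size()`, `sketch_copy_from()` and the counter are hypothetical stand-ins, `memcpy()` replaces the real copy routine, and `__builtin_constant_p` assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter: how many times the runtime check actually ran. */
static unsigned int sketch_runtime_checks;

/* Hypothetical stand-in for check_object_size(); it only counts calls. */
static void sketch_check_object_size(const void *ptr, size_t n, bool to_user)
{
	(void)ptr;
	(void)n;
	(void)to_user;
	sketch_runtime_checks++;
}

/*
 * Same shape as the ia64 guard: skip the runtime check when the compiler
 * can prove the length is a constant, otherwise check before copying.
 */
#define sketch_copy_from(to, from, n)                               \
	do {                                                        \
		if (!__builtin_constant_p(n))                       \
			sketch_check_object_size((to), (n), false); \
		memcpy((to), (from), (n));                          \
	} while (0)
```

Whether a call site counts as "constant" depends on what the compiler can fold, so the same source may take either branch at different optimization levels; the volatile read in a test is one way to force the runtime path.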


[PATCH v3 07/11] powerpc/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on powerpc.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
Tested-by: Michael Ellerman 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/uaccess.h | 21 +++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 01f7464d9fea..b7a18b2604be 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -164,6 +164,7 @@ config PPC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
select HAVE_LIVEPATCH if HAVE_DYNAMIC_FTRACE_WITH_REGS
+   select HAVE_ARCH_HARDENED_USERCOPY
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index b7c20f0b8fbe..c1dc6c14deb8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -310,10 +310,15 @@ static inline unsigned long copy_from_user(void *to,
 {
unsigned long over;
 
-   if (access_ok(VERIFY_READ, from, n))
+   if (access_ok(VERIFY_READ, from, n)) {
+   if (!__builtin_constant_p(n))
+   check_object_size(to, n, false);
return __copy_tofrom_user((__force void __user *)to, from, n);
+   }
if ((unsigned long)from < TASK_SIZE) {
over = (unsigned long)from + n - TASK_SIZE;
+   if (!__builtin_constant_p(n - over))
+   check_object_size(to, n - over, false);
return __copy_tofrom_user((__force void __user *)to, from,
n - over) + over;
}
@@ -325,10 +330,15 @@ static inline unsigned long copy_to_user(void __user *to,
 {
unsigned long over;
 
-   if (access_ok(VERIFY_WRITE, to, n))
+   if (access_ok(VERIFY_WRITE, to, n)) {
+   if (!__builtin_constant_p(n))
+   check_object_size(from, n, true);
return __copy_tofrom_user(to, (__force void __user *)from, n);
+   }
if ((unsigned long)to < TASK_SIZE) {
over = (unsigned long)to + n - TASK_SIZE;
+   if (!__builtin_constant_p(n))
+   check_object_size(from, n - over, true);
return __copy_tofrom_user(to, (__force void __user *)from,
n - over) + over;
}
@@ -372,6 +382,10 @@ static inline unsigned long __copy_from_user_inatomic(void *to,
if (ret == 0)
return 0;
}
+
+   if (!__builtin_constant_p(n))
+   check_object_size(to, n, false);
+
return __copy_tofrom_user((__force void __user *)to, from, n);
 }
 
@@ -398,6 +412,9 @@ static inline unsigned long __copy_to_user_inatomic(void __user *to,
if (ret == 0)
return 0;
}
+   if (!__builtin_constant_p(n))
+   check_object_size(from, n, true);
+
return __copy_tofrom_user(to, (__force const void __user *)from, n);
 }
 
-- 
2.7.4
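When access_ok() fails but the user pointer still starts below TASK_SIZE, the powerpc copy_from_user()/copy_to_user() above copy only the part that fits, compute `over = addr + n - TASK_SIZE`, check `n - over` bytes of the kernel object, and add `over` to the returned "bytes not copied". A minimal sketch of that arithmetic, assuming `addr < limit < addr + n` and ignoring faults during the partial copy; `limit` stands in for TASK_SIZE and the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result of the powerpc fallback split. */
struct sketch_split {
	uint64_t copied;	/* bytes passed to the copy routine (n - over) */
	uint64_t not_copied;	/* "over": bytes reported back as uncopied */
};

/* Assumes addr < limit and addr + n > limit, i.e. the range straddles limit. */
static struct sketch_split sketch_split_at_limit(uint64_t addr, uint64_t n,
						 uint64_t limit)
{
	struct sketch_split s;
	uint64_t over = addr + n - limit;	/* part beyond the user range */

	s.copied = n - over;
	s.not_copied = over;
	return s;
}
```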


[PATCH v3 03/11] x86/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
Tested-By: Valdis Kletnieks 
---
 arch/x86/Kconfig  |  2 ++
 arch/x86/include/asm/uaccess.h| 10 ++
 arch/x86/include/asm/uaccess_32.h |  2 ++
 arch/x86/include/asm/uaccess_64.h |  2 ++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4407f596b72c..39d89e058249 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -80,11 +80,13 @@ config X86
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_AOUT  if X86_32
select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_HARDENED_USERCOPY
select HAVE_ARCH_HUGE_VMAP  if X86_64 || X86_PAE
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if X86_64 && SPARSEMEM_VMEMMAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_KMEMCHECK
+   select HAVE_ARCH_LINEAR_KERNEL_MAPPING  if X86_64
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 2982387ba817..d3312f0fcdfc 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -742,9 +742,10 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 * case, and do only runtime checking for non-constant sizes.
 */
 
-   if (likely(sz < 0 || sz >= n))
+   if (likely(sz < 0 || sz >= n)) {
+   check_object_size(to, n, false);
n = _copy_from_user(to, from, n);
-   else if(__builtin_constant_p(n))
+   } else if (__builtin_constant_p(n))
copy_from_user_overflow();
else
__copy_from_user_overflow(sz, n);
@@ -762,9 +763,10 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
might_fault();
 
/* See the comment in copy_from_user() above. */
-   if (likely(sz < 0 || sz >= n))
+   if (likely(sz < 0 || sz >= n)) {
+   check_object_size(from, n, true);
n = _copy_to_user(to, from, n);
-   else if(__builtin_constant_p(n))
+   } else if (__builtin_constant_p(n))
copy_to_user_overflow();
else
__copy_to_user_overflow(sz, n);
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 4b32da24faaf..7d3bdd1ed697 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -37,6 +37,7 @@ unsigned long __must_check __copy_from_user_ll_nocache_nozero
 static __always_inline unsigned long __must_check
 __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
 {
+   check_object_size(from, n, true);
return __copy_to_user_ll(to, from, n);
 }
 
@@ -95,6 +96,7 @@ static __always_inline unsigned long
 __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
might_fault();
+   check_object_size(to, n, false);
if (__builtin_constant_p(n)) {
unsigned long ret;
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 2eac2aa3e37f..673059a109fe 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -54,6 +54,7 @@ int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size)
 {
int ret = 0;
 
+   check_object_size(dst, size, false);
if (!__builtin_constant_p(size))
return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
@@ -119,6 +120,7 @@ int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
 {
int ret = 0;
 
+   check_object_size(src, size, true);
if (!__builtin_constant_p(size))
return copy_user_generic((__force void *)dst, src, size);
switch (size) {
-- 
2.7.4
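In the x86 copy_from_user()/copy_to_user() hunks, the new check only runs in the branch that already passed the compile-time size test `sz < 0 || sz >= n`, where `sz` is the destination's compile-time object size and is negative when unknown (an assumption based on the `sz < 0` test). A sketch of just the branch selection; the enum names are hypothetical, and the constness of `n` is passed in explicitly rather than taken from `__builtin_constant_p()`.

```c
#include <stdbool.h>

/* Hypothetical names for the three outcomes of the x86 branch. */
enum sketch_copy_path {
	SKETCH_COPY_WITH_RUNTIME_CHECK,	/* check_object_size(), then _copy_from_user() */
	SKETCH_OVERFLOW_COMPILE_TIME,	/* copy_from_user_overflow(): constant n too big */
	SKETCH_OVERFLOW_RUNTIME,	/* __copy_from_user_overflow(sz, n) */
};

/* sz < 0 means the object size is not known at compile time. */
static enum sketch_copy_path sketch_pick_path(int sz, unsigned long n,
					      bool n_is_constant)
{
	if (sz < 0 || (unsigned long)sz >= n)
		return SKETCH_COPY_WITH_RUNTIME_CHECK;
	if (n_is_constant)
		return SKETCH_OVERFLOW_COMPILE_TIME;
	return SKETCH_OVERFLOW_RUNTIME;
}
```

Only the first outcome actually copies data, which is why that is the only branch the patch adds check_object_size() to.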


[PATCH v3 05/11] arm64/uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on arm64. As done by KASAN in -next,
renames the low-level functions to __arch_copy_*_user() so a static inline
can do additional work before the copy.

Signed-off-by: Kees Cook 
---
 arch/arm64/Kconfig   |  2 ++
 arch/arm64/include/asm/uaccess.h | 16 ++--
 arch/arm64/kernel/arm64ksyms.c   |  4 ++--
 arch/arm64/lib/copy_from_user.S  |  4 ++--
 arch/arm64/lib/copy_to_user.S |  4 ++--
 5 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5a0a691d4220..b771cd97f74b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -51,10 +51,12 @@ config ARM64
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
+   select HAVE_ARCH_HARDENED_USERCOPY
select HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP && !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
select HAVE_ARCH_KGDB
+   select HAVE_ARCH_LINEAR_KERNEL_MAPPING
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 9e397a542756..5d0dacdb695b 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -256,11 +256,23 @@ do {  \
-EFAULT;\
 })
 
-extern unsigned long __must_check __copy_from_user(void *to, const void __user *from, unsigned long n);
-extern unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n);
+extern unsigned long __must_check __arch_copy_from_user(void *to, const void __user *from, unsigned long n);
+extern unsigned long __must_check __arch_copy_to_user(void __user *to, const void *from, unsigned long n);
 extern unsigned long __must_check __copy_in_user(void __user *to, const void __user *from, unsigned long n);
 extern unsigned long __must_check __clear_user(void __user *addr, unsigned long n);
 
+static inline unsigned long __must_check __copy_from_user(void *to, const void __user *from, unsigned long n)
+{
+   check_object_size(to, n, false);
+   return __arch_copy_from_user(to, from, n);
+}
+
+static inline unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n)
+{
+   check_object_size(from, n, true);
+   return __arch_copy_to_user(to, from, n);
+}
+
 static inline unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n)
 {
if (access_ok(VERIFY_READ, from, n))
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index 678f30b05a45..2dc44406a7ad 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -34,8 +34,8 @@ EXPORT_SYMBOL(copy_page);
 EXPORT_SYMBOL(clear_page);
 
/* user mem (segment) */
-EXPORT_SYMBOL(__copy_from_user);
-EXPORT_SYMBOL(__copy_to_user);
+EXPORT_SYMBOL(__arch_copy_from_user);
+EXPORT_SYMBOL(__arch_copy_to_user);
 EXPORT_SYMBOL(__clear_user);
 EXPORT_SYMBOL(__copy_in_user);
 
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 17e8306dca29..0b90497d4424 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -66,7 +66,7 @@
.endm
 
 end .req x5
-ENTRY(__copy_from_user)
+ENTRY(__arch_copy_from_user)
 ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \
CONFIG_ARM64_PAN)
add end, x0, x2
@@ -75,7 +75,7 @@ ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
CONFIG_ARM64_PAN)
mov x0, #0  // Nothing to copy
ret
-ENDPROC(__copy_from_user)
+ENDPROC(__arch_copy_from_user)
 
.section .fixup,"ax"
.align  2
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 21faae60f988..7a7efe255034 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -65,7 +65,7 @@
.endm
 
 end .req x5
-ENTRY(__copy_to_user)
+ENTRY(__arch_copy_to_user)
 ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \
CONFIG_ARM64_PAN)
add end, x0, x2
@@ -74,7 +74,7 @@ ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
CONFIG_ARM64_PAN)
mov x0, #0
ret
-ENDPROC(__copy_to_user)
+ENDPROC(__arch_copy_to_user)
 
.section .fixup,"ax"
.align  2
-- 
2.7.4
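The arm64 change is a rename-and-wrap: the assembly routines become `__arch_copy_*_user()` and the old names turn into static inline wrappers that run the check first, so callers keep the same contract (return the number of bytes not copied). The same shape in plain C, as a sketch only: `memcpy()` stands in for the assembly routine, and every `sketch_*` name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the most recent check, for illustration only. */
static size_t sketch_last_checked_len;
static bool sketch_last_to_user;

/* Stand-in for check_object_size(): just remembers what it was asked. */
static void sketch_check_object_size(const void *ptr, size_t n, bool to_user)
{
	(void)ptr;
	sketch_last_checked_len = n;
	sketch_last_to_user = to_user;
}

/* Stand-in for the renamed low-level routine; returns bytes NOT copied. */
static unsigned long sketch_arch_copy_to_user(void *to, const void *from,
					      unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Wrapper keeps the old name's contract and adds the check up front. */
static inline unsigned long sketch_copy_to_user(void *to, const void *from,
						unsigned long n)
{
	sketch_check_object_size(from, n, true);
	return sketch_arch_copy_to_user(to, from, n);
}
```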


[PATCH v3 02/11] mm: Hardened usercopy

2016-07-15 Thread Kees Cook
This is the start of porting PAX_USERCOPY into the mainline kernel. This
is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
work is based on code by PaX Team and Brad Spengler, and an earlier port
from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.

This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- address range doesn't wrap around
- address range isn't NULL or zero-allocated (with a non-zero copy size)
- if on the slab allocator:
  - copy size must be less than or equal to object size (when the check
    is implemented in the allocator, which appears in subsequent patches)
- otherwise, object must not span page allocations
- if on the stack:
  - object must not extend before/after the current process stack
  - object must be contained by the current stack frame (when there is
    arch/build support for identifying stack frames)
- object must not overlap with kernel text
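The wrap-around, NULL/zero-allocation and kernel-text rules in the list above reduce to interval arithmetic on the range `[ptr, ptr + n)`. A minimal sketch, assuming the kernel's convention that pointers at or below ZERO_SIZE_PTR (16) mean NULL or a zero-size allocation; the function names are hypothetical, and the text bounds are passed in rather than read from the kernel's own symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the kernel's ZERO_SIZE_PTR value of 16. */
#define SKETCH_ZERO_SIZE_PTR ((uintptr_t)16)

/* Returns a reason string if [ptr, ptr + n) is bogus, else NULL. */
static const char *sketch_check_bogus_address(uintptr_t ptr, size_t n)
{
	if (n == 0)
		return NULL;			/* nothing is copied */
	if (ptr + n < ptr)
		return "<wrapped address>";	/* range wraps past the top */
	if (ptr <= SKETCH_ZERO_SIZE_PTR)
		return "<null>";		/* NULL or zero-size allocation */
	return NULL;
}

/* True if [ptr, ptr + n) intersects [low, high), e.g. the kernel text. */
static bool sketch_overlaps(uintptr_t ptr, size_t n, uintptr_t low,
			    uintptr_t high)
{
	uintptr_t check_low = ptr;
	uintptr_t check_high = ptr + n;

	/* No overlap if the range lies entirely above or entirely below. */
	return !(check_low >= high || check_high <= low);
}
```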

Signed-off-by: Kees Cook 
Tested-By: Valdis Kletnieks 
Tested-by: Michael Ellerman 
---
 arch/Kconfig|   7 ++
 include/linux/slab.h|  12 +++
 include/linux/thread_info.h |  15 +++
 mm/Makefile |   4 +
 mm/usercopy.c   | 234 
 security/Kconfig|  28 ++
 6 files changed, 300 insertions(+)
 create mode 100644 mm/usercopy.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 5e2776562035..195ee4cc939a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -433,6 +433,13 @@ config HAVE_ARCH_WITHIN_STACK_FRAMES
  and similar) by implementing an inline arch_within_stack_frames(),
  which is used by CONFIG_HARDENED_USERCOPY.
 
+config HAVE_ARCH_LINEAR_KERNEL_MAPPING
+   bool
+   help
+ An architecture should select this if it has a secondary linear
+ mapping of the kernel text. This is used to verify that kernel
+ text exposures are not visible under CONFIG_HARDENED_USERCOPY.
+
 config HAVE_CONTEXT_TRACKING
bool
help
diff --git a/include/linux/slab.h b/include/linux/slab.h
index aeb3e6d00a66..96a16a3fb7cb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -155,6 +155,18 @@ void kfree(const void *);
 void kzfree(const void *);
 size_t ksize(const void *);
 
+#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
+const char *__check_heap_object(const void *ptr, unsigned long n,
+   struct page *page);
+#else
+static inline const char *__check_heap_object(const void *ptr,
+ unsigned long n,
+ struct page *page)
+{
+   return NULL;
+}
+#endif
+
 /*
  * Some archs want to perform DMA into kmalloc caches and need a guaranteed
  * alignment larger than the alignment of a 64-bit integer.
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 3d5c80b4391d..f24b99eac969 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -155,6 +155,21 @@ static inline int arch_within_stack_frames(const void * const stack,
 }
 #endif
 
+#ifdef CONFIG_HARDENED_USERCOPY
+extern void __check_object_size(const void *ptr, unsigned long n,
+   bool to_user);
+
+static inline void check_object_size(const void *ptr, unsigned long n,
+bool to_user)
+{
+   __check_object_size(ptr, n, to_user);
+}
+#else
+static inline void check_object_size(const void *ptr, unsigned long n,
+bool to_user)
+{ }
+#endif /* CONFIG_HARDENED_USERCOPY */
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */
diff --git a/mm/Makefile b/mm/Makefile
index 78c6f7dedb83..32d37247c7e5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -21,6 +21,9 @@ KCOV_INSTRUMENT_memcontrol.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 
+# Since __builtin_frame_address does work as used, disable the warning.
+CFLAGS_usercopy.o += $(call cc-disable-warning, frame-address)
+
 mmu-y  := nommu.o
 mmu-$(CONFIG_MMU)  := gup.o highmem.o memory.o mincore.o \
   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
@@ -99,3 +102,4 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
+obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
diff --git a/mm/usercopy.c b/mm/usercopy.c
new file mode 100644
index ..e4bf4e7ccdf6
--- /dev/null
+++ b/mm/usercopy.c
@@ -0,0 +1,234 @@
+/*
+ * This implements the various checks for CONFIG_HARDENED_USERCOPY*,
+ * which are designed to protect kernel memory from needless exposure
+ * and overwrite under many unintended conditions. This code is based
+ * on PAX_USERCOPY, which is:
+ *
+ * Copyright (C) 2

[PATCH v3 00/11] mm: Hardened usercopy

2016-07-15 Thread Kees Cook
Hi,

[I'm going to carry this series in my kspp -next tree now, though I'd
really love to have some explicit Acked-bys or Reviewed-bys. If you've
looked through it or tested it, please consider it. :) (I added Valdis
and mpe's Tested-bys where they seemed correct, thank you!)]

This is a start of the mainline port of PAX_USERCOPY[1]. After I started
writing tests (now in lkdtm in -next) for Casey's earlier port[2], I kept
tweaking things further and further until I ended up with a whole new
patch series. To that end, I took Rik and other people's feedback along
with other changes and clean-ups.

Based on my understanding, PAX_USERCOPY was designed to catch a
few classes of flaws (mainly bad bounds checking) around the use of
copy_to_user()/copy_from_user(). These changes don't touch get_user() and
put_user(), since these operate on constant sized lengths, and tend to be
much less vulnerable. There are effectively three distinct protections in
the whole series, each of which I've given a separate CONFIG, though this
patch set is only the first of the three intended protections. (Generally
speaking, PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY
(this) and CONFIG_HARDENED_USERCOPY_WHITELIST (future), and
PAX_USERCOPY_SLABS covers CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
(future).)

This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects
being copied to/from userspace meet certain criteria:
- if address is a heap object, the size must not exceed the object's
  allocated size. (This will catch all kinds of heap overflow flaws.)
- if address range is in the current process stack, it must be within the
  current stack frame (if such checking is possible) or at least entirely
  within the current process's stack. (This could catch large lengths that
  would have extended beyond the current process stack, or overflows if
  their length extends back into the original stack.)
- if the address range is part of kernel data, rodata, or bss, allow it.
- if address range is page-allocated, it must not span multiple
  allocations.
- if address is within the kernel text, reject it.
- everything else is accepted
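The stack rule in the list above yields a "not on the stack" result (the range is left to the other checks) plus three outcomes for a range that touches the stack. A sketch of that classification in the order the list implies: rule out ranges that miss the stack, reject partial overlaps, then defer to a frame check. The enum names and the `frame_result` parameter (standing in for what arch_within_stack_frames() returns: 1, -1 or 0) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for classifying a copy against the stack. */
enum sketch_stack_result {
	SKETCH_BAD_STACK = -1,	/* partially on the stack, or crosses frames */
	SKETCH_NOT_STACK = 0,	/* entirely off the stack: other checks apply */
	SKETCH_GOOD_FRAME,	/* verified to lie within a single frame */
	SKETCH_GOOD_STACK,	/* inside the stack; frames not checkable */
};

static enum sketch_stack_result
sketch_check_stack_object(uintptr_t obj, size_t len, uintptr_t stack,
			  uintptr_t stackend, int frame_result)
{
	/* Range does not touch [stack, stackend) at all. */
	if (obj + len <= stack || stackend <= obj)
		return SKETCH_NOT_STACK;

	/* One end is inside the stack; reject if the other end is outside. */
	if (obj < stack || stackend < obj + len)
		return SKETCH_BAD_STACK;

	/* frame_result: 1 = within a frame, -1 = crosses a frame, 0 = unknown. */
	if (frame_result > 0)
		return SKETCH_GOOD_FRAME;
	if (frame_result < 0)
		return SKETCH_BAD_STACK;
	return SKETCH_GOOD_STACK;
}
```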

The patches in the series are:
- Support for arch-specific stack frame checking (which will likely be
  replaced in the future by Josh's more comprehensive unwinder):
1- mm: Implement stack frame object validation
- The core copy_to/from_user() checks, without the slab object checks:
2- mm: Hardened usercopy
- Per-arch enablement of the protection:
3- x86/uaccess: Enable hardened usercopy
4- ARM: uaccess: Enable hardened usercopy
5- arm64/uaccess: Enable hardened usercopy
6- ia64/uaccess: Enable hardened usercopy
7- powerpc/uaccess: Enable hardened usercopy
8- sparc/uaccess: Enable hardened usercopy
9- s390/uaccess: Enable hardened usercopy
- The heap allocator implementation of object size checking:
10- mm: SLAB hardened usercopy support
11- mm: SLUB hardened usercopy support

Some notes:

- This is expected to apply on top of -next which contains fixes for the
  position of _etext on both arm and arm64, though it has minor conflicts
  with KASAN that are trivial to fix up. Living in -next are also tests
  for this protection in lkdtm, prefixed with USERCOPY_.

- I couldn't detect a measurable performance change with these features
  enabled. Kernel build times were unchanged, hackbench was unchanged,
  etc. I think we could flip this to "on by default" at some point, but
  for now, I'm leaving it off until I can get some more definitive
  measurements. I would love it if someone with greater familiarity with
  perf could give this a spin and report results.

- The SLOB support extracted from grsecurity seems entirely broken. I
  have no idea what's going on there; I spent my time testing SLAB and
  SLUB. Having someone else look at SLOB would be nice, but this series
  doesn't depend on it.

Additional features that would be nice, but aren't blocking this series:

- Needs more architecture support for stack frame checking (only x86 now,
  but it seems Josh will have a good solution for this soon).


Thanks!

-Kees

[1] https://grsecurity.net/download.php "grsecurity - test kernel patch"
[2] http://www.openwall.com/lists/kernel-hardening/2016/05/19/5

v3:
- switch to using BUG for better Oops integration
- when checking page allocations, check each for Reserved
- use enums for the stack check return for readability

v2:
- added s390 support
- handle slub red zone
- disallow writes to rodata area
- stack frame walker now CONFIG-controlled arch-specific helper
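For page-allocated memory, the cheapest case is a copy whose first and last byte sit on the same page, which cannot span allocations by construction. A sketch of such a same-page test, assuming 4 KiB pages; the kernel would use PAGE_SIZE/PAGE_MASK instead, and the slower multi-page path (including the per-page Reserved check mentioned in the v3 notes) is not modelled. The names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t)4096)	/* assumed 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* True if every byte of [ptr, ptr + n) lies on one page; assumes n > 0. */
static bool sketch_within_one_page(uintptr_t ptr, size_t n)
{
	uintptr_t end = ptr + n - 1;	/* last byte actually copied */

	return (ptr & SKETCH_PAGE_MASK) == (end & SKETCH_PAGE_MASK);
}
```

Using the last byte (`ptr + n - 1`) rather than `ptr + n` keeps a copy that ends exactly on a page boundary from being treated as a two-page span.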


[PATCH v3 01/11] mm: Implement stack frame object validation

2016-07-15 Thread Kees Cook
This creates a per-architecture function arch_within_stack_frames() that
should validate whether a given object is contained by a kernel stack frame.
The initial implementation is for x86.

This is based on code from PaX.

Signed-off-by: Kees Cook 
---
 arch/Kconfig   |  9 
 arch/x86/Kconfig   |  1 +
 arch/x86/include/asm/thread_info.h | 44 ++
 include/linux/thread_info.h|  9 
 4 files changed, 63 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index d794384a0404..5e2776562035 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -424,6 +424,15 @@ config CC_STACKPROTECTOR_STRONG
 
 endchoice
 
+config HAVE_ARCH_WITHIN_STACK_FRAMES
+   bool
+   help
+ An architecture should select this if it can walk the kernel stack
+ frames to determine if an object is part of either the arguments
+ or local variables (i.e. that it excludes saved return addresses,
+ and similar) by implementing an inline arch_within_stack_frames(),
+ which is used by CONFIG_HARDENED_USERCOPY.
+
 config HAVE_CONTEXT_TRACKING
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0a7b885964ba..4407f596b72c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config X86
select HAVE_ARCH_SOFT_DIRTY if X86_64
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+   select HAVE_ARCH_WITHIN_STACK_FRAMES
select HAVE_EBPF_JIT  if X86_64
select HAVE_CC_STACKPROTECTOR
select HAVE_CMPXCHG_DOUBLE
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 30c133ac05cd..ab386f1336f2 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -180,6 +180,50 @@ static inline unsigned long current_stack_pointer(void)
return sp;
 }
 
+/*
+ * Walks up the stack frames to make sure that the specified object is
+ * entirely contained by a single stack frame.
+ *
+ * Returns:
+ *  1 if within a frame
+ * -1 if placed across a frame boundary (or outside stack)
+ *  0 unable to determine (no frame pointers, etc)
+ */
+static inline int arch_within_stack_frames(const void * const stack,
+  const void * const stackend,
+  const void *obj, unsigned long len)
+{
+#if defined(CONFIG_FRAME_POINTER)
+   const void *frame = NULL;
+   const void *oldframe;
+
+   oldframe = __builtin_frame_address(1);
+   if (oldframe)
+   frame = __builtin_frame_address(2);
+   /*
+    * low ----------------------------------------------> high
+    * [saved bp][saved ip][args][local vars][saved bp][saved ip]
+    *                     ^----------------^
+    *               allow copies only within here
+    */
+   while (stack <= frame && frame < stackend) {
+   /*
+* If obj + len extends past the last frame, this
+* check won't pass and the next frame will be 0,
+* causing us to bail out and correctly report
+* the copy as invalid.
+*/
+   if (obj + len <= frame)
+   return obj >= oldframe + 2 * sizeof(void *) ? 1 : -1;
+   oldframe = frame;
+   frame = *(const void * const *)frame;
+   }
+   return -1;
+#else
+   return 0;
+#endif
+}
+
 #else /* !__ASSEMBLY__ */
 
 #ifdef CONFIG_X86_64
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index b4c2a485b28a..3d5c80b4391d 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -146,6 +146,15 @@ static inline bool test_and_clear_restore_sigmask(void)
 #error "no set_restore_sigmask() provided and default one won't work"
 #endif
 
+#ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
+static inline int arch_within_stack_frames(const void * const stack,
+  const void * const stackend,
+  const void *obj, unsigned long len)
+{
+   return 0;
+}
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */
-- 
2.7.4
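The x86 arch_within_stack_frames() above follows the saved frame-pointer chain until it reaches the first frame that ends at or above the object, then requires the object to start after the older frame's saved bp and saved ip. The walk can be exercised in userspace on a fake stack. In this sketch the caller supplies the starting `oldframe`/`frame` values that the kernel takes from `__builtin_frame_address(1)` and `(2)`, each frame slot holds the address of the next saved bp, and integers are used instead of pointers so the comparisons are well defined; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Same loop as the x86 helper, over integers. Returns 1 if [obj, obj + len)
 * sits inside one frame's args/locals, -1 if it crosses a frame boundary or
 * runs off the end of the frame chain.
 */
static int sketch_within_stack_frames(uintptr_t stack, uintptr_t stackend,
				      uintptr_t oldframe, uintptr_t frame,
				      uintptr_t obj, size_t len)
{
	while (stack <= frame && frame < stackend) {
		if (obj + len <= frame)
			/* Skip the saved bp and saved ip of the older frame. */
			return obj >= oldframe + 2 * sizeof(uintptr_t) ? 1 : -1;
		oldframe = frame;
		frame = *(const uintptr_t *)frame;	/* follow saved bp */
	}
	return -1;
}
```

A fake stack for this sketch is an array of `uintptr_t` whose frame slots point at the next frame slot, with 0 terminating the chain, so an object past the last frame falls out of the loop and is reported as invalid.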


[PATCH v3 04/11] ARM: uaccess: Enable hardened usercopy

2016-07-15 Thread Kees Cook
Enables CONFIG_HARDENED_USERCOPY checks on arm.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook 
---
 arch/arm/Kconfig   |  1 +
 arch/arm/include/asm/uaccess.h | 11 +--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 90542db1220d..f56b29b3f57e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -35,6 +35,7 @@ config ARM
select HARDIRQS_SW_RESEND
select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
+   select HAVE_ARCH_HARDENED_USERCOPY
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 35c9db857ebe..7fb59199c6bb 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -496,7 +496,10 @@ arm_copy_from_user(void *to, const void __user *from, unsigned long n);
 static inline unsigned long __must_check
 __copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-   unsigned int __ua_flags = uaccess_save_and_enable();
+   unsigned int __ua_flags;
+
+   check_object_size(to, n, false);
+   __ua_flags = uaccess_save_and_enable();
n = arm_copy_from_user(to, from, n);
uaccess_restore(__ua_flags);
return n;
@@ -511,11 +514,15 @@ static inline unsigned long __must_check
 __copy_to_user(void __user *to, const void *from, unsigned long n)
 {
 #ifndef CONFIG_UACCESS_WITH_MEMCPY
-   unsigned int __ua_flags = uaccess_save_and_enable();
+   unsigned int __ua_flags;
+
+   check_object_size(from, n, true);
+   __ua_flags = uaccess_save_and_enable();
n = arm_copy_to_user(to, from, n);
uaccess_restore(__ua_flags);
return n;
 #else
+   check_object_size(from, n, true);
return arm_copy_to_user(to, from, n);
 #endif
 }
-- 
2.7.4


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Thiago Jung Bauermann
On Friday, July 15, 2016, 22:26:09, Arnd Bergmann wrote:
> On Friday, July 15, 2016 2:42:10 PM CEST Russell King - ARM Linux wrote:
> > On other architectures, DT can also contain open-firmware "functions"
> > but I don't think there's much support in the kernel for that - maybe
> > the PPC folk can reply on that point.
> 
> The open firmware runtime interfaces are shut down by the time we have
> a flattened device tree, so those are not accessible any more. IIRC
> SPARC leaves the open firmware interface live, but it doesn't use
> fdt, so that's not relevant here.
> 
> However, the powerpc specific RTAS runtime services provide a similar
> interface to the UEFI runtime support and allow to call into
> binary code from the kernel, which gets mapped from a physical
> address in the "linux,rtas-base" property in the rtas device node.
> 
> Modifying the /rtas node will definitely give you a backdoor into
> privileged code, but modifying only /chosen should not let you get
> in through that specific method.

Except that arch/powerpc/kernel/rtas.c looks for any node in the tree called 
"rtas", so it will try to use /chosen/rtas, or /chosen/foo/rtas.

We can forbid subnodes in /chosen in the dtb passed to kexec_file_load, 
though that means userspace can't use the simple-framebuffer binding via 
this mechanism.

We also have to blacklist the device_type and compatible properties in 
/chosen to avoid the problem Mark mentioned.

Still doable, but not ideal. :-/
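One hypothetical way to express the restrictions proposed above (no subnodes under /chosen, and no `device_type` or `compatible` properties on /chosen) is a pair of path and property-name filters applied while walking the user-supplied DTB. This is only a sketch of the policy on plain strings: it does not use libfdt or any existing kernel helper, and the function names are invented for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Reject any node below /chosen, e.g. "/chosen/rtas" or "/chosen/foo/rtas". */
static bool sketch_chosen_node_allowed(const char *path)
{
	static const char prefix[] = "/chosen/";

	return strncmp(path, prefix, sizeof(prefix) - 1) != 0;
}

/* Properties on /chosen that could change how the node is interpreted. */
static bool sketch_chosen_prop_allowed(const char *name)
{
	return strcmp(name, "device_type") != 0 &&
	       strcmp(name, "compatible") != 0;
}
```

As noted in the thread, rejecting every subnode of /chosen also rules out passing a simple-framebuffer node through this path; that trade-off is inherent in the policy, not in the sketch.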

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Arnd Bergmann
On Friday, July 15, 2016 2:42:10 PM CEST Russell King - ARM Linux wrote:
> 
> On other architectures, DT can also contain open-firmware "functions"
> but I don't think there's much support in the kernel for that - maybe
> the PPC folk can reply on that point.

The open firmware runtime interfaces are shut down by the time we have
a flattened device tree, so those are not accessible any more. IIRC
SPARC leaves the open firmware interface live, but it doesn't use
fdt, so that's not relevant here.

However, the powerpc specific RTAS runtime services provide a similar
interface to the UEFI runtime support and allow to call into
binary code from the kernel, which gets mapped from a physical
address in the "linux,rtas-base" property in the rtas device node.

Modifying the /rtas node will definitely give you a backdoor into
privileged code, but modifying only /chosen should not let you get
in through that specific method.

Arnd

Re: [kernel-hardening] Re: [PATCH v2 02/11] mm: Hardened usercopy

2016-07-15 Thread Kees Cook
On Fri, Jul 15, 2016 at 12:19 PM, Daniel Micay  wrote:
>> I'd like it to dump stack and be fatal to the process involved, but
>> yeah, I guess BUG() would work. Creating an infrastructure for
>> handling security-related Oopses can be done separately from this (and
>> I'd like to see that added, since it's a nice bit of configurable
>> reactivity to possible attacks).
>
> In grsecurity, the oops handling also uses do_group_exit instead of
> do_exit but both that change (or at least the option to do it) and the
> exploit handling could be done separately from this without actually
> needing special treatment for USERCOPY. Could expose it as something
> like panic_on_oops=2 as a balance between the existing options.

I'm also uncomfortable about BUG() being removed by unsetting
CONFIG_BUG, but that seems unlikely. :)
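To make the panic_on_oops=2 idea quoted above more concrete, here is a
hypothetical sketch in plain user-space C (not kernel code) of how an
oops handler might map the setting to an action. The enum, the function
name and the meaning of the value 2 are assumptions drawn from Daniel's
suggestion, not an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical actions an oops handler could take (sketch only). */
enum oops_action {
	OOPS_KILL_TASK,         /* current behaviour: do_exit() on the oopsing task */
	OOPS_KILL_THREAD_GROUP, /* suggested: do_group_exit(), killing the whole process */
	OOPS_PANIC,             /* panic_on_oops=1: take the whole system down */
};

/* Map a panic_on_oops value to an action; 2 is the proposed new mode. */
static enum oops_action oops_action_for(int panic_on_oops)
{
	/* Negative values are not meaningful for this sysctl-style knob. */
	assert(panic_on_oops >= 0);

	switch (panic_on_oops) {
	case 0:
		return OOPS_KILL_TASK;
	case 2:
		return OOPS_KILL_THREAD_GROUP;
	default:
		/* In this sketch, any other non-zero value keeps panic semantics. */
		return OOPS_PANIC;
	}
}
```

The value 2 sits between the two existing behaviours: fatal to the whole
process, as Kees wants, without taking down the system.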

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

Re: [patch] ibmvfc: prevent a potential deadlock

2016-07-15 Thread Martin K. Petersen
> "Dan" == Dan Carpenter  writes:

Dan> My static checker complains that we need to unlock on this path.
Dan> Seems true.

Applied to 4.8/scsi-queue.
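The bug class Dan's checker flags is an early return that skips the
unlock. A minimal user-space sketch of the usual fix, funnelling every
exit through a single unlock label, is below. The lock is modelled with
a plain flag standing in for spin_lock()/spin_unlock(), and all names
are hypothetical rather than taken from ibmvfc.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a spinlock: tracks whether the lock is held. */
static bool demo_lock_held;

static void demo_lock(void)
{
	assert(!demo_lock_held); /* taking it twice would deadlock in real code */
	demo_lock_held = true;
}

static void demo_unlock(void)
{
	assert(demo_lock_held);
	demo_lock_held = false;
}

static int demo_pending; /* hypothetical state protected by the lock */

/* Every return path after demo_lock() goes through out_unlock. */
static int demo_submit(bool valid)
{
	int rc = 0;

	demo_lock();
	if (!valid) {
		rc = -EINVAL;
		goto out_unlock; /* a bare "return -EINVAL;" here would leak the lock */
	}
	demo_pending++;
out_unlock:
	demo_unlock();
	return rc;
}
```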

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [kernel-hardening] Re: [PATCH v2 02/11] mm: Hardened usercopy

2016-07-15 Thread Kees Cook
On Fri, Jul 15, 2016 at 12:00 PM, Daniel Micay  wrote:
>> This could be a BUG, but I'd rather not panic the entire kernel.
>
> It seems unlikely that it will panic without panic_on_oops and that's
> an explicit opt-in to taking down the system on kernel logic errors
> exactly like this. In grsecurity, it calls the kernel exploit handling
> logic (panic if root, otherwise kill all processes of that user and ban
> them until reboot) but that same logic is also called for BUG via oops
> handling so there's only really a distinction with panic_on_oops=1.
>
> Does it make sense to be less fatal for a fatal assertion that's more
> likely to be security-related? Maybe you're worried about having some
> false positives for the whitelisting portion, but I don't think those
> will lurk around very long with the way this works.

I'd like it to dump stack and be fatal to the process involved, but
yeah, I guess BUG() would work. Creating an infrastructure for
handling security-related Oopses can be done separately from this (and
I'd like to see that added, since it's a nice bit of configurable
reactivity to possible attacks).

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms

2016-07-15 Thread Scott Wood
On Fri, 2016-07-15 at 12:43 -0400, Paul Gortmaker wrote:
> On Wed, May 4, 2016 at 11:12 PM, Yangbo Lu  wrote:
> > 
> > The global utilities block controls power management, I/O device
> > enabling, power-on reset (POR) configuration monitoring, alternate
> > function selection for multiplexed signals, and clock control.
> > 
> > This patch adds GUTS driver to manage and access global utilities
> > block.
> > 
> > Signed-off-by: Yangbo Lu 
> > Acked-by: Scott Wood 
> > ---
> > Changes for v4:
> > - Added this patch
> > Changes for v5:
> > - Modified copyright info
> > - Changed MODULE_LICENSE to GPL
> > - Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL
> > - Made FSL_GUTS user-invisible
> > - Added a complete compatible list for GUTS
> > - Stored guts info in file-scope variable
> > - Added mfspr() getting SVR
> > - Redefined GUTS APIs
> > - Called fsl_guts_init rather than using platform driver
> > - Removed useless parentheses
> > - Removed useless 'extern' key words
> > Changes for v6:
> > - Made guts thread safe in fsl_guts_init
> > Changes for v7:
> > - Removed 'ifdef' for function declaration in guts.h
> > Changes for v8:
> > - Fixes lines longer than 80 characters checkpatch issue
> > - Added 'Acked-by: Scott Wood'
> > Changes for v9:
> > - None
> > Changes for v10:
> > - None
> > ---
> >  drivers/soc/Kconfig  |   2 +-
> >  drivers/soc/fsl/Kconfig  |   8 +++
> >  drivers/soc/fsl/Makefile |   1 +
> >  drivers/soc/fsl/guts.c   | 119
> >  include/linux/fsl/guts.h | 126 +--
> >  5 files changed, 207 insertions(+), 49 deletions(-)
> >  create mode 100644 drivers/soc/fsl/Kconfig
> >  create mode 100644 drivers/soc/fsl/guts.c
> > 
> > diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
> > index cb58ef0..7106463 100644
> > --- a/drivers/soc/Kconfig
> > +++ b/drivers/soc/Kconfig
> > @@ -2,7 +2,7 @@ menu "SOC (System On Chip) specific Drivers"
> > 
> >  source "drivers/soc/bcm/Kconfig"
> >  source "drivers/soc/brcmstb/Kconfig"
> > -source "drivers/soc/fsl/qe/Kconfig"
> > +source "drivers/soc/fsl/Kconfig"
> >  source "drivers/soc/mediatek/Kconfig"
> >  source "drivers/soc/qcom/Kconfig"
> >  source "drivers/soc/rockchip/Kconfig"
> > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> > new file mode 100644
> > index 000..b313759
> > --- /dev/null
> > +++ b/drivers/soc/fsl/Kconfig
> > @@ -0,0 +1,8 @@
> > +#
> > +# Freescale SOC drivers
> > +#
> > +
> > +source "drivers/soc/fsl/qe/Kconfig"
> > +
> > +config FSL_GUTS
> > +   bool
> > diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> > index 203307f..02afb7f 100644
> > --- a/drivers/soc/fsl/Makefile
> > +++ b/drivers/soc/fsl/Makefile
> > @@ -4,3 +4,4 @@
> > 
> >  obj-$(CONFIG_QUICC_ENGINE) += qe/
> >  obj-$(CONFIG_CPM)  += qe/
> > +obj-$(CONFIG_FSL_GUTS) += guts.o
> > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> > new file mode 100644
> > index 000..fa155e6
> > --- /dev/null
> > +++ b/drivers/soc/fsl/guts.c
> > @@ -0,0 +1,119 @@
> > +/*
> > + * Freescale QorIQ Platforms GUTS Driver
> > + *
> > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> Seems there was lots of discussion on this.  If it does end up being
> resent, it would be nice to get the module.h and other modular stuff
> gone since it is a bool Kconfig.

I plan to resend just the GUTS driver portion and send it through the PPC
tree.

I don't see any modular stuff in there besides the linux/module.h include.

-Scott



Re: [patch v2 -next] wan/fsl_ucc_hdlc: info leak in uhdlc_ioctl()

2016-07-15 Thread David Miller
From: Dan Carpenter 
Date: Thu, 14 Jul 2016 14:16:53 +0300

> There is a 2 byte struct hole after line.loopback so we need to clear
> that out to avoid disclosing stack information.
> 
> Fixes: c19b6d246a35 ('drivers/net: support hdlc function for QE-UCC')
> Signed-off-by: Dan Carpenter 
> ---
> v2: remove the other initialization to zero

Applied.
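Dan's fix addresses a classic padding leak: the compiler inserts unnamed
bytes between members, and if the structure is built on the stack and
copied to user space without being cleared, those bytes carry stale
stack contents. The sketch below uses a hypothetical layout (a 16-bit
field followed by a 32-bit one, which leaves a 2-byte hole on ABIs where
uint32_t is 4-byte aligned), not the real driver structure, and shows
the memset-first pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not the real driver struct. */
struct demo_line_settings {
	uint16_t loopback;
	/* compiler padding: 2 bytes when uint32_t is 4-byte aligned */
	uint32_t clock_rate;
};

/* Number of padding bytes between loopback and clock_rate. */
static size_t demo_hole_size(void)
{
	return offsetof(struct demo_line_settings, clock_rate) -
	       (offsetof(struct demo_line_settings, loopback) + sizeof(uint16_t));
}

/*
 * Clear the whole object, padding included, before filling in members,
 * so that a later copy_to_user()-style copy does not expose old stack data.
 */
static void demo_fill_settings(struct demo_line_settings *s,
			       uint16_t loopback, uint32_t clock_rate)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
	s->loopback = loopback;
	s->clock_rate = clock_rate;
}
```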

Re: [PATCH 09/14] resource limits: track highwater mark of locked memory

2016-07-15 Thread Topi Miettinen
On 07/15/16 15:14, Oleg Nesterov wrote:
> On 07/15, Topi Miettinen wrote:
>>
>> Track maximum size of locked memory, to be able to configure
>> RLIMIT_MEMLOCK resource limits. The information is available
>> with taskstats and cgroupstats netlink socket.
> 
> So I personally still dislike the very idea of this series... but I won't
> argue if you convince maintainers.
> 
>> @@ -2020,6 +2020,10 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>>  return -ENOMEM;
>>  
>>  update_resource_highwatermark(RLIMIT_STACK, actual_size);
>> +if (vma->vm_flags & VM_LOCKED)
>> +update_resource_highwatermark(RLIMIT_MEMLOCK,
>> +  (mm->locked_vm + grow) <<
>> +  PAGE_SHIFT);
> 
> Btw this is not right. The same for the previous patch which tracks
> RLIMIT_STACK. The "current" task can be a debugger, etc.

acct_stack_growth() is called from expand_upwards() and
expand_downwards(). They call security_mmap_addr() and the various LSM
implementations also use current task in the checks. Are these also not
right?

> 
> Yes, yes, this just reminds that the whole rlimit logic in this path
> is broken but still...

I'd be happy to fix the logic with a separate prerequisite patch and
then use the right logic for this patch, but I'm not sure I know how.
Could you elaborate a bit?

-Topi

> 
> Oleg.
> 


Re: [patch V2 30/67] powerpc/numa: Convert to hotplug state machine

2016-07-15 Thread Sebastian Andrzej Siewior
* Anton Blanchard | 2016-07-15 10:28:25 [+1000]:

>Hi Anna-Maria,
Hi Anton,

>> >> Install the callbacks via the state machine and let the core invoke
>> >> the callbacks on the already online CPUs.  
>> >
>> > This is causing an oops on ppc64le QEMU, looks like a NULL
>> > pointer:  
>> 
>> Did you test it against tip WIP.hotplug?
>
>I noticed tip started failing in my CI environment which tests on QEMU.
>The failure bisected to commit 425209e0abaf2c6e3a90ce4fedb935c10652bf80
>
>It reproduces running ppc64le QEMU on an x86-64 box. On Ubuntu:
…

Thanks for that. I can reproduce this on top of the latest WIP.hotplug with
this patch.
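The conversion described in the quoted changelog follows a simple
contract: when a callback is installed, the core immediately runs it on
every CPU that is already online, and afterwards on each CPU as it comes
up. The user-space model below illustrates only that contract; the
names, the fixed CPU count and the callback signature are assumptions,
not the kernel's cpuhp API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

static bool demo_cpu_online[DEMO_NR_CPUS];
static int (*demo_startup_cb)(unsigned int cpu);

/* Install a startup callback and run it on CPUs that are already online. */
static int demo_setup_state(int (*startup)(unsigned int cpu))
{
	demo_startup_cb = startup;
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (demo_cpu_online[cpu]) {
			int ret = startup(cpu);
			if (ret)
				return ret; /* a real core would also roll back */
		}
	}
	return 0;
}

/* Bring a CPU online; the installed callback runs for it first. */
static int demo_cpu_up(unsigned int cpu)
{
	int ret = 0;

	assert(cpu < DEMO_NR_CPUS);
	if (demo_startup_cb)
		ret = demo_startup_cb(cpu);
	if (!ret)
		demo_cpu_online[cpu] = true;
	return ret;
}

/* Example callback: count how many CPUs it has been run on. */
static unsigned int demo_calls;

static int demo_count_cpu(unsigned int cpu)
{
	(void)cpu;
	demo_calls++;
	return 0;
}
```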

>Anton

Sebastian

Re: [v10, 3/7] soc: fsl: add GUTS driver for QorIQ platforms

2016-07-15 Thread Paul Gortmaker
On Wed, May 4, 2016 at 11:12 PM, Yangbo Lu  wrote:
> The global utilities block controls power management, I/O device
> enabling, power-on reset (POR) configuration monitoring, alternate
> function selection for multiplexed signals, and clock control.
>
> This patch adds GUTS driver to manage and access global utilities
> block.
>
> Signed-off-by: Yangbo Lu 
> Acked-by: Scott Wood 
> ---
> Changes for v4:
> - Added this patch
> Changes for v5:
> - Modified copyright info
> - Changed MODULE_LICENSE to GPL
> - Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL
> - Made FSL_GUTS user-invisible
> - Added a complete compatible list for GUTS
> - Stored guts info in file-scope variable
> - Added mfspr() getting SVR
> - Redefined GUTS APIs
> - Called fsl_guts_init rather than using platform driver
> - Removed useless parentheses
> - Removed useless 'extern' key words
> Changes for v6:
> - Made guts thread safe in fsl_guts_init
> Changes for v7:
> - Removed 'ifdef' for function declaration in guts.h
> Changes for v8:
> - Fixes lines longer than 80 characters checkpatch issue
> - Added 'Acked-by: Scott Wood'
> Changes for v9:
> - None
> Changes for v10:
> - None
> ---
>  drivers/soc/Kconfig  |   2 +-
>  drivers/soc/fsl/Kconfig  |   8 +++
>  drivers/soc/fsl/Makefile |   1 +
>  drivers/soc/fsl/guts.c   | 119 
>  include/linux/fsl/guts.h | 126 +--
>  5 files changed, 207 insertions(+), 49 deletions(-)
>  create mode 100644 drivers/soc/fsl/Kconfig
>  create mode 100644 drivers/soc/fsl/guts.c
>
> diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
> index cb58ef0..7106463 100644
> --- a/drivers/soc/Kconfig
> +++ b/drivers/soc/Kconfig
> @@ -2,7 +2,7 @@ menu "SOC (System On Chip) specific Drivers"
>
>  source "drivers/soc/bcm/Kconfig"
>  source "drivers/soc/brcmstb/Kconfig"
> -source "drivers/soc/fsl/qe/Kconfig"
> +source "drivers/soc/fsl/Kconfig"
>  source "drivers/soc/mediatek/Kconfig"
>  source "drivers/soc/qcom/Kconfig"
>  source "drivers/soc/rockchip/Kconfig"
> diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> new file mode 100644
> index 000..b313759
> --- /dev/null
> +++ b/drivers/soc/fsl/Kconfig
> @@ -0,0 +1,8 @@
> +#
> +# Freescale SOC drivers
> +#
> +
> +source "drivers/soc/fsl/qe/Kconfig"
> +
> +config FSL_GUTS
> +   bool
> diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> index 203307f..02afb7f 100644
> --- a/drivers/soc/fsl/Makefile
> +++ b/drivers/soc/fsl/Makefile
> @@ -4,3 +4,4 @@
>
>  obj-$(CONFIG_QUICC_ENGINE) += qe/
>  obj-$(CONFIG_CPM)  += qe/
> +obj-$(CONFIG_FSL_GUTS) += guts.o
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> new file mode 100644
> index 000..fa155e6
> --- /dev/null
> +++ b/drivers/soc/fsl/guts.c
> @@ -0,0 +1,119 @@
> +/*
> + * Freescale QorIQ Platforms GUTS Driver
> + *
> + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 

Seems there was lots of discussion on this.  If it does end up being
resent, it would be nice to get the module.h and other modular stuff
gone since it is a bool Kconfig.

Thanks,
Paul.
--

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
>

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Mark Rutland
On Fri, Jul 15, 2016 at 12:29:09PM -0300, Thiago Jung Bauermann wrote:
> Am Freitag, 15 Juli 2016, 14:33:47 schrieb Mark Rutland:
> > On Fri, Jul 15, 2016 at 09:26:10AM -0400, Vivek Goyal wrote:
> > > I don't know anything about DTB. So here comes a very basic question.
> > > Does DTB allow passing an executable blob to kernel or pass the
> > > location of some unsigned executable code at kernel level. I think from
> > > secureboot point of view that would be a concern. Being able to trick
> > > kernel to execute an unsigned code at privileged level.
> > 
> > The DTB itself won't contain executable code.
> > 
> > However, arbitrary bindings could point kernel at such code. For
> > instance, /chosen/linux,uefi-system-table could point the kernel at a
> > faked EFI system table, with pointers to malicious code. So
> > arbitrary modification of /chosen is not safe.
> 
> PowerPC doesn't have UEFI so this option is not a concern in that 
> architecture. I'm having a look at what a PowerPC kernel gets from /chosen 
> and haven't found anything of concern so far, but I'm still looking.
> 
> On the other hand, the kernel command line has the option acpi_rsdp, which 
> is used to pass the address of the RSDP. I don't really know much about EFI 
> so I'm not sure if it can be used to point to code that the kernel can 
> execute, but it does point to tables that contain AML code.

Please let's not conflate EFI and ACPI, the two are distinct.

I believe that there aren't any ACPI tables which contain native code,
or which contain pointers to native code, but I could be mistaken. It
doesn't seem unlikely that malicious AML is possible, but I'm not
familiar enough with AML to know how we sandbox that.

From a scan of Documentation/kernel-parameters.txt, it doesn't look like
there are options to override the EFI system table (or related tables),
so it doesn't look like there's a trivial mechanism to trigger arbitrary
code execution. It looks like efi_fake_mem could be used to trick the
kernel to poke things it shouldn't, though that likely brings the system
down entirely.

> > Bindings describe arbitrary system features (devices, firmware
> > interfaces, etc), so in general they might provide mechanisms to execute
> > code.
> 
> Even bindings in /chosen?

Yes, even bindings in /chosen. As above, the linux,uefi-system-table
property lives under /chosen, and provides pointers to native code.
Control over this property could yield arbitrary code execution.

Additionally, there are drivers that just go looking for a compatible
string, and will probe regardless of where the node is in the hierarchy.
e.g. clock controller drivers, memory nodes. So /chosen isn't sandboxed
as such. 

I fear that there are many things that one could place under /chosen
that could make the kernel do the wrong thing. Given the example of
drivers, I'm not sure it's going to be possible to audit all the
relevant code.
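The point that /chosen is not sandboxed follows from how
compatible-based matching usually works: the lookup walks every node in
the tree and compares the compatible string, regardless of where the
node sits. The sketch below models that with a flat array of nodes; the
structure, the example paths and the compatible string are hypothetical,
and real code would use the kernel's OF helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_node {
	const char *path;
	const char *compatible; /* NULL if the node has none */
};

/* Return the first node with a matching compatible string, wherever it lives. */
static const struct demo_node *demo_find_compatible(const struct demo_node *nodes,
						    size_t n, const char *compat)
{
	assert(compat != NULL);
	for (size_t i = 0; i < n; i++) {
		/* No check on nodes[i].path, so a node under /chosen matches too. */
		if (nodes[i].compatible && strcmp(nodes[i].compatible, compat) == 0)
			return &nodes[i];
	}
	return NULL;
}
```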

Thanks,
Mark.

Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check

2016-07-15 Thread Pan Xinhui

Hi Balbir,
Sorry for the late response, I missed your mail.

On 16/7/6 18:54, Balbir Singh wrote:

On Tue, 2016-06-28 at 10:43 -0400, Pan Xinhui wrote:

This is to fix some lock holder preemption issues. Some other locks
implementation do a spin loop before acquiring the lock itself. Currently
kernel has an interface of bool vcpu_is_preempted(int cpu). It take the cpu

^^ takes

as parameter and return true if the cpu is preempted. Then kernel can break
the spin loops upon on the retval of vcpu_is_preempted.

As kernel has used this interface, So lets support it.

Only pSeries need supoort it. And the fact is powerNV are built into same

   ^^ support

kernel image with pSeries. So we need return false if we are runnig as
powerNV. The another fact is that lppaca->yiled_count keeps zero on

  ^^ yield

powerNV. So we can just skip the machine type.



My fault, I do need to avoid such typos.
Thanks for pointing them out.


Suggested-by: Boqun Feng 
Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/spinlock.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d..3ac9fcb 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,24 @@
 #define SYNC_IO
 #endif

+/*
+ * This support kernel to check if one cpu is preempted or not.
+ * Then we can fix some lock holder preemption issue.
+ */
+#ifdef CONFIG_PPC_PSERIES
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+   /*
+* pSeries and powerNV can be built into same kernel image. In
+* principle we need return false directly if we are running as
+* powerNV. However the yield_count is always zero on powerNV, So
+* skip such machine type check


Or you could use the ppc_md interface callbacks if required, but your
solution works as well



Thanks, so I can keep my code as is.

Thanks,
xinhui


+*/
+   return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
+}
+#endif
+
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
return lock.slock == 0;



Balbir Singh.
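The check in the quoted patch relies on the yield_count in the lppaca:
it is stored big-endian, an odd value means the vCPU is currently
preempted, and it stays zero on powerNV, so no machine-type test is
needed. A user-space sketch of just that test follows, with a
hand-rolled big-endian decode standing in for be32_to_cpu() and a
hypothetical function name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a 32-bit big-endian value, as be32_to_cpu() does for lppaca fields. */
static uint32_t demo_be32_decode(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Odd yield count: the vCPU is preempted. Zero (as on powerNV): it is not. */
static bool demo_vcpu_is_preempted(const uint8_t yield_count_be[4])
{
	return demo_be32_decode(yield_count_be) & 1;
}
```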




Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Thiago Jung Bauermann
Am Freitag, 15 Juli 2016, 14:33:47 schrieb Mark Rutland:
> On Fri, Jul 15, 2016 at 09:26:10AM -0400, Vivek Goyal wrote:
> > On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> > > On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> > > > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:
> > > > > Right, but the question remains whether this helps while you allow the
> > > > > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > > > > the kernel or initrd but can modify the DT, a successful attack would
> > > > > be a bit harder than having a modified kernel, but you may still need
> > > > > to treat the system as compromised.
> > > > 
> > > > Yes, and the same question also remains regarding the kernel command
> > > > line.
> > > > 
> > > > We can have the kernel perform sanity checks on the device tree,
> > > > just as the kernel needs to sanity check the command line.
> > > > 
> > > > There's the point that was raised about not wanting to increase the
> > > > attack surface, and that's a valid point. But at least in the way
> > > > Petitboot works today, it needs to modify the device tree and pass
> > > > it to the kernel.
> > > > 
> > > > One thing that is unavoidable to come from userspace is
> > > > /chosen/linux,stdout-path, because it's Petitboot that knows from which
> > > > console the user is interacting with. The other modification to set
> > > > properties in vga@0 can be done in the kernel.
> > > > 
> > > > Given that on DTB-based systems /chosen is an important and
> > > > established way to pass information to the operating system being
> > > > booted, I'd like to suggest the following, then:
> > > > 
> > > > Extend the syscall as shown in this RFC from Takahiro AKASHI, but
> > > > instead of accepting a complete DTB from userspace, the syscall
> > > > would accept a DTB containing only a /chosen node. If the DTB
> > > > contains any other node, the syscall fails with EINVAL. The kernel
> > > > can then add the properties in /chosen to the device tree that it
> > > > will pass to the next kernel.
> > > > 
> > > > What do you think?
> > > 
> > > I think that helps, as it makes the problem space correspond to that
> > > of modifying the command line, but I can still come up with countless
> > > attacks based on modifications of the /chosen node and/or the command
> > > line, in fact it's probably easier than any other node.
> > 
> > I don't know anything about DTB. So here comes a very basic question.
> > Does DTB allow passing an executable blob to kernel or pass the
> > location of some unsigned executable code at kernel level. I think from
> > secureboot point of view that would be a concern. Being able to trick
> > kernel to execute an unsigned code at privileged level.
> 
> The DTB itself won't contain executable code.
> 
> However, arbitrary bindings could point kernel at such code. For
> instance, /chosen/linux,uefi-system-table could point the kernel at a
> faked EFI system table, with pointers to malicious code. So
> arbitrary modification of /chosen is not safe.

PowerPC doesn't have UEFI so this option is not a concern in that 
architecture. I'm having a look at what a PowerPC kernel gets from /chosen 
and haven't found anything of concern so far, but I'm still looking.

On the other hand, the kernel command line has the option acpi_rsdp, which 
is used to pass the address of the RSDP. I don't really know much about EFI 
so I'm not sure if it can be used to point to code that the kernel can 
execute, but it does point to tables that contain AML code.

> Bindings describe arbitrary system features (devices, firmware
> interfaces, etc), so in general they might provide mechanisms to execute
> code.

Even bindings in /chosen?
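The proposal quoted above (accept a DTB that contains only a /chosen
node and fail with EINVAL otherwise) boils down to a structural check
before the properties are merged. The sketch below assumes the blob has
already been parsed into a list of node paths; the function name and
that representation are hypothetical, and a real implementation would
walk the flattened tree instead. It also assumes that subnodes of
/chosen are rejected, which the proposal does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Accept only the root node and /chosen; anything else (including
 * subnodes of /chosen, by assumption) is rejected with -EINVAL.
 */
static int demo_validate_chosen_only(const char *const paths[], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(paths[i] != NULL);
		if (strcmp(paths[i], "/") != 0 && strcmp(paths[i], "/chosen") != 0)
			return -EINVAL;
	}
	return 0;
}
```

Passing such a check narrows what userspace can change; it does not by
itself make the properties under /chosen safe.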

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH 09/14] resource limits: track highwater mark of locked memory

2016-07-15 Thread Oleg Nesterov
On 07/15, Topi Miettinen wrote:
>
> Track maximum size of locked memory, to be able to configure
> RLIMIT_MEMLOCK resource limits. The information is available
> with taskstats and cgroupstats netlink socket.

So I personally still dislike the very idea of this series... but I won't
argue if you convince maintainers.

> @@ -2020,6 +2020,10 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
>   return -ENOMEM;
>  
>   update_resource_highwatermark(RLIMIT_STACK, actual_size);
> + if (vma->vm_flags & VM_LOCKED)
> + update_resource_highwatermark(RLIMIT_MEMLOCK,
> +   (mm->locked_vm + grow) <<
> +   PAGE_SHIFT);

Btw this is not right. The same for the previous patch which tracks
RLIMIT_STACK. The "current" task can be a debugger, etc.

Yes, yes, this just reminds that the whole rlimit logic in this path
is broken but still...
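For reference, the quoted hunk records a running maximum: when a
VM_LOCKED stack grows, the locked size (mm->locked_vm plus the growth,
in pages) is shifted left by PAGE_SHIFT to get bytes and compared with
the previous peak. A user-space model of that bookkeeping follows; the
per-limit array, the function names and the 4 KiB page size are
assumptions, and the model deliberately ignores the question of which
task should be charged.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

enum demo_limit { DEMO_LIMIT_STACK, DEMO_LIMIT_MEMLOCK, DEMO_NR_LIMITS };

static uint64_t demo_highwater[DEMO_NR_LIMITS];

/* Remember the largest value ever seen for a given limit. */
static void demo_update_highwater(enum demo_limit which, uint64_t bytes)
{
	assert(which < DEMO_NR_LIMITS);
	if (bytes > demo_highwater[which])
		demo_highwater[which] = bytes;
}

/* Locked size after growth, converted from pages to bytes. */
static uint64_t demo_locked_bytes(uint64_t locked_vm_pages, uint64_t grow_pages)
{
	return (locked_vm_pages + grow_pages) << DEMO_PAGE_SHIFT;
}
```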

Oleg.


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Russell King - ARM Linux
On Fri, Jul 15, 2016 at 09:26:10AM -0400, Vivek Goyal wrote:
> On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> > I think that helps, as it makes the problem space correspond to that
> > of modifying the command line, but I can still come up with countless
> > attacks based on modifications of the /chosen node and/or the command
> > line, in fact it's probably easier than any other node.
> 
> I don't know anything about DTB. So here comes a very basic question. Does
> DTB allow passing an executable blob to kernel or pass the location of
> some unsigned executable code at kernel level.

DT on ARM is a description of the hardware - it can be thought of as a
set of nodes with properties attached.  The properties can describe
anything (we have documentation in Documentation/devicetree/bindings
which describes what we expect the properties to contain.)

On other architectures, DT can also contain open-firmware "functions"
but I don't think there's much support in the kernel for that - maybe
the PPC folk can reply on that point.

It is possible that someone may, at some point, decide to create a
property that points to some executable blob, but I can't think of a
reason why we should ever allow such a monstrosity in mainline kernels.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [patch] ibmvfc: prevent a potential deadlock

2016-07-15 Thread Brian King
Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Mark Rutland
On Fri, Jul 15, 2016 at 09:26:10AM -0400, Vivek Goyal wrote:
> On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> > On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> > > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:
> > 
> > > > 
> > > > Right, but the question remains whether this helps while you allow the
> > > > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > > > the kernel or initrd but can modify the DT, a successful attack would
> > > > be a bit harder than having a modified kernel, but you may still need
> > > > to treat the system as compromised.
> > > 
> > > Yes, and the same question also remains regarding the kernel command line.
> > > 
> > > We can have the kernel perform sanity checks on the device tree, just as
> > > the kernel needs to sanity check the command line.
> > > 
> > > There's the point that was raised about not wanting to increase the
> > > attack surface, and that's a valid point. But at least in the way
> > > Petitboot works today, it needs to modify the device tree and pass it to
> > > the kernel.
> > > 
> > > One thing that is unavoidable to come from userspace is
> > > /chosen/linux,stdout-path, because it's Petitboot that knows from which
> > > console the user is interacting with. The other modification to set
> > > properties in vga@0 can be done in the kernel.
> > > 
> > > Given that on DTB-based systems /chosen is an important and established
> > > way to pass information to the operating system being booted, I'd like
> > > to suggest the following, then:
> > > 
> > > Extend the syscall as shown in this RFC from Takahiro AKASHI, but
> > > instead of accepting a complete DTB from userspace, the syscall would
> > > accept a DTB containing only a /chosen node. If the DTB contains any
> > > other node, the syscall fails with EINVAL. The kernel can then add the
> > > properties in /chosen to the device tree that it will pass to the next
> > > kernel.
> > > 
> > > What do you think?
> > 
> > I think that helps, as it makes the problem space correspond to that
> > of modifying the command line, but I can still come up with countless
> > attacks based on modifications of the /chosen node and/or the command
> > line, in fact it's probably easier than any other node.
> 
> I don't know anything about DTB. So here comes a very basic question. Does
> DTB allow passing an executable blob to kernel or pass the location of
> some unsigned executable code at kernel level. I think from secureboot
> point of view that would be a concern. Being able to trick kernel to
> execute an unsigned code at privileged level.

The DTB itself won't contain executable code.

However, arbitrary bindings could point kernel at such code. For
instance, /chosen/linux,uefi-system-table could point the kernel at a
faked EFI system table, with pointers to malicious code. So
arbitrary modification of /chosen is not safe.

Bindings describe arbitrary system features (devices, firmware
interfaces, etc), so in general they might provide mechanisms to execute
code.

Thanks,
Mark.

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Fri, Jul 15, 2016 at 09:31:02AM +0200, Arnd Bergmann wrote:
> On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> > Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:
> 
> > > 
> > > Right, but the question remains whether this helps while you allow the
> > > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > > the kernel or initrd but can modify the DT, a successful attack would
> > > be a bit harder than having a modified kernel, but you may still need
> > > to treat the system as compromised.
> > 
> > Yes, and the same question also remains regarding the kernel command line.
> > 
> > We can have the kernel perform sanity checks on the device tree, just as
> > the kernel needs to sanity check the command line.
> > 
> > There's the point that was raised about not wanting to increase the attack
> > surface, and that's a valid point. But at least in the way Petitboot works
> > today, it needs to modify the device tree and pass it to the kernel.
> > 
> > One thing that is unavoidable to come from userspace is
> > /chosen/linux,stdout-path, because it's Petitboot that knows from which
> > console the user is interacting with. The other modification to set
> > properties in vga@0 can be done in the kernel.
> > 
> > Given that on DTB-based systems /chosen is an important and established
> > way to pass information to the operating system being booted, I'd like to
> > suggest the following, then:
> > 
> > Extend the syscall as shown in this RFC from Takahiro AKASHI, but instead
> > of accepting a complete DTB from userspace, the syscall would accept a DTB
> > containing only a /chosen node. If the DTB contains any other node, the
> > syscall fails with EINVAL. The kernel can then add the properties in
> > /chosen to the device tree that it will pass to the next kernel.
> > 
> > What do you think?
> 
> I think that helps, as it makes the problem space correspond to that
> of modifying the command line, but I can still come up with countless
> attacks based on modifications of the /chosen node and/or the command
> line, in fact it's probably easier than any other node.

I don't know anything about DTB. So here comes a very basic question. Does
DTB allow passing an executable blob to kernel or pass the location of
some unsigned executable code at kernel level. I think from secureboot point of
view that would be a concern. Being able to trick kernel to execute an
unsigned code at privileged level.

Vivek

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-15 Thread Mark Rutland
On Fri, Jul 15, 2016 at 09:09:55AM -0400, Vivek Goyal wrote:
> On Tue, Jul 12, 2016 at 10:42:01AM +0900, AKASHI Takahiro wrote:
> 
> [..]
> > -SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> > +SYSCALL_DEFINE6(kexec_file_load, int, kernel_fd, int, initrd_fd,
> > unsigned long, cmdline_len, const char __user *, cmdline_ptr,
> > -   unsigned long, flags)
> > +   unsigned long, flags, const struct kexec_fdset __user *, ufdset)
> 
> Can one add more parameters to an existing syscall? Can it break existing
> programs with a new kernel? I was under the impression that one can't do
> that. But maybe I am missing something.

I think the idea was that we would only look at the new params if a new
flag was set, and otherwise it would behave as the old syscall.

Regardless, I think it makes far more sense to add a kexec_file_load2
syscall if we're going to modify the prototype at all. It's a rather
different proposition to the existing syscall, and needs to be treated
as such.

Thanks,
Mark.
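One way to read the flag-gated approach described above, as a userspace sketch: the new argument is only examined when a new flag bit is set, so a caller built against the five-argument prototype, whose sixth argument slot holds whatever happened to be there, keeps the old behaviour. The flag value, the `sketch_fdset` layout and the legacy flag mask are all assumptions for illustration, not the proposed ABI; a separate kexec_file_load2 syscall would avoid the question entirely.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed mask for the existing flag bits; SKETCH_KEXEC_FILE_FDSET is a
 * hypothetical new flag, not part of any real kexec ABI. */
#define SKETCH_LEGACY_FLAGS	0x7UL
#define SKETCH_KEXEC_FILE_FDSET	0x8UL

struct sketch_fdset {
	int nr_fds;	/* hypothetical: number of extra file descriptors */
};

/*
 * Returns the number of extra fds that would be consumed, or a negative
 * errno. Old callers never set SKETCH_KEXEC_FILE_FDSET, so the sixth
 * argument is never looked at for them.
 */
static int sketch_kexec_file_load(unsigned long flags,
				  const struct sketch_fdset *ufdset)
{
	if (flags & ~(SKETCH_LEGACY_FLAGS | SKETCH_KEXEC_FILE_FDSET))
		return -EINVAL;

	if (!(flags & SKETCH_KEXEC_FILE_FDSET))
		return 0;	/* legacy behaviour: ignore ufdset entirely */

	if (ufdset == NULL)
		return -EFAULT;
	if (ufdset->nr_fds < 0)
		return -EINVAL;
	return ufdset->nr_fds;
}
```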

Re: [PATCH v7 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle

2016-07-15 Thread Balbir Singh
On Fri, Jul 08, 2016 at 02:17:02AM +0530, Shreyas B. Prabhu wrote:
> Signed-off-by: Shreyas B. Prabhu 
> ---
> -No changes since v4
> 
> Changes in v4
> =
> - New in v4
> 
>  arch/powerpc/kernel/idle_power7.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
> index 470ceeb..705c867 100644
> --- a/arch/powerpc/kernel/idle_power7.S
> +++ b/arch/powerpc/kernel/idle_power7.S
> @@ -252,7 +252,7 @@ _GLOBAL(power7_sleep)
>   /* No return */
>  
>  _GLOBAL(power7_winkle)
> - li  r3,3
> + li  r3,PNV_THREAD_WINKLE
>   li  r4,1
>   b   power7_powersave_common

Acked-by: Balbir Singh 

Re: [RFC 3/3] kexec: extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Tue, Jul 12, 2016 at 10:42:01AM +0900, AKASHI Takahiro wrote:

[..]
> -SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
> +SYSCALL_DEFINE6(kexec_file_load, int, kernel_fd, int, initrd_fd,
>   unsigned long, cmdline_len, const char __user *, cmdline_ptr,
> - unsigned long, flags)
> + unsigned long, flags, const struct kexec_fdset __user *, ufdset)

Can one add more parameters to an existing syscall? Can it break existing
programs with a new kernel? I was under the impression that one can't do
that. But maybe I am missing something.

Vivek

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Vivek Goyal
On Fri, Jul 15, 2016 at 09:49:25AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 13, 2016 at 03:13:42PM +0200, Arnd Bergmann wrote:
> > On Wednesday, July 13, 2016 10:41:28 AM CEST Mark Rutland wrote:
> > > The big question is whether this is a realistic case on a secure boot
> > > system.
> > 
> > What does x86 do here? I assume changes to the command line are also
> > limited.
> 
> They aren't.  You can specify /anything/ even with a fully-signed kernel
> and initrd, which was one of the things I pointed out in my previous
> set of responses.

Yes, the kernel command line is not signed. For that matter, even the initrd
is not signed. Only the kernel is signed and its signature is verified. The
idea is that unsigned code should not be able to execute in kernel space.

Vivek

Re: [PATCH v2 02/11] mm: Hardened usercopy

2016-07-15 Thread Balbir Singh
On Thu, Jul 14, 2016 at 09:53:31PM -0700, Kees Cook wrote:
> On Thu, Jul 14, 2016 at 9:05 PM, Kees Cook  wrote:
> > On Thu, Jul 14, 2016 at 6:41 PM, Balbir Singh  wrote:
> >> On Thu, Jul 14, 2016 at 09:04:18PM -0400, Rik van Riel wrote:
> >>> On Fri, 2016-07-15 at 09:20 +1000, Balbir Singh wrote:
> >>>
> >>> > > ==
> >>> > > +((unsigned long)end & (unsigned long)PAGE_MASK)))
> >>> > > + return NULL;
> >>> > > +
> >>> > > + /* Allow if start and end are inside the same compound page. */
> >>> > > + endpage = virt_to_head_page(end);
> >>> > > + if (likely(endpage == page))
> >>> > > + return NULL;
> >>> > > +
> >>> > > + /* Allow special areas, device memory, and sometimes kernel data. */
> >>> > > + if (PageReserved(page) && PageReserved(endpage))
> >>> > > + return NULL;
> >>> >
> >>> > If we came here, it's likely that endpage > page, do we need to check
> >>> > that only the first and last pages are reserved? What about the ones
> >>> > in the middle?
> >>>
> >>> I think this will be so rare, we can get away with just
> >>> checking the beginning and the end.
> >>>
> >>
> >> But do we want to leave a hole where an aware user space
> >> can try a longer copy_* to avoid this check? If it is unlikely
> >> should we just bite the bullet and do the check for the entire
> >> range?
> >
> > I'd be okay with expanding the test -- it should be an extremely rare
> > situation already since the common Reserved areas (kernel data) will
> > have already been explicitly tested.
> >
> > What's the best way to do "next page"? Should it just be:
> >
> > for ( ; page <= endpage ; ptr += PAGE_SIZE, page = virt_to_head_page(ptr) ) {
> > if (!PageReserved(page))
> > return "";
> > }
> >
> > return NULL;
> >
> > ?
> 
> Er, I was testing the wrong thing. How about:
> 
> /*
>  * Reject if range is not Reserved (i.e. special or device memory),
>  * since then the object spans several independently allocated pages.
>  */
> for (; ptr <= end ; ptr += PAGE_SIZE, page = virt_to_head_page(ptr)) {
> if (!PageReserved(page))
> return "";
> }
> 
> return NULL;

That looks reasonable to me

Balbir
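A userspace model of the "check every page in the range" variant discussed above. `sketch_page_reserved()` is a hypothetical stand-in for `PageReserved()`, pages are modelled as an array of flags indexed by address, and the page size is assumed to be 4096 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical stand-in for PageReserved(): one flag per page frame. */
static bool sketch_page_reserved(const bool *reserved, uintptr_t addr)
{
	return reserved[addr / SKETCH_PAGE_SIZE];
}

/*
 * Returns NULL if every page touched by [start, start + n) is reserved,
 * otherwise a non-NULL reason string. Assumes n > 0.
 */
static const char *sketch_check_reserved_span(const bool *reserved,
					      uintptr_t start, size_t n)
{
	uintptr_t end = start + n - 1;
	/* Start from the page boundary so the page containing 'end' is
	 * always visited, even when 'start' is not page-aligned. */
	uintptr_t ptr = start & ~(SKETCH_PAGE_SIZE - 1);

	for (; ptr <= end; ptr += SKETCH_PAGE_SIZE) {
		if (!sketch_page_reserved(reserved, ptr))
			return "<spans non-reserved page>";
	}
	return NULL;
}
```

Starting from the page boundary is a deliberate choice in this sketch: stepping by the page size from an unaligned start can overshoot the last page. With `start = 4196` and `n = 4096`, `end` is 8291; stepping from 4196 checks only page 1 and then reaches 8292, never checking page 2.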


Re: [patch V2 30/67] powerpc/numa: Convert to hotplug state machine

2016-07-15 Thread Anton Blanchard via Linuxppc-dev

> > I noticed tip started failing in my CI environment which tests on
> > QEMU. The failure bisected to commit
> > 425209e0abaf2c6e3a90ce4fedb935c10652bf80  
> 
> That's very useful, thanks Anton!
> 
> I have removed this commit from the series for the time being,
> refactored the followup commits (there was one trivial conflict). We
> can re-try this patch when a fix is found.

Thanks Ingo, my tests are passing again after your last push.

Anton

Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU

2016-07-15 Thread Tejun Heo
On Fri, Jul 15, 2016 at 03:30:41PM +1000, Michael Ellerman wrote:
> It looks like this still hasn't gone to Linus for 4.7?
> 
> Could it please, it's a pretty nasty regression on our boxes.

Sorry about that.  Just sent out the pull request.

Thanks.

-- 
tejun

[PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread Topi Miettinen
Hello,

There are many basic ways to control processes, including capabilities,
cgroups and resource limits. However, there are far fewer ways to find out
useful values for the limits, except blind trial and error.

This patch series attempts to fix that by giving at least a nice starting
point from the highwater mark values of the resources in question.
I looked where each limit is checked and added a call to update the mark
nearby.
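The series' own update helper is not shown in this cover letter, so the following is only a sketch of how a highwater value can be raised locklessly at the point where a limit is checked. It assumes one machine word per resource and uses C11 atomics; `sketch_update_highwater()` is a hypothetical name.

```c
#include <stdatomic.h>

/* Raise *mark to value if value is larger; never lowers it. */
static void sketch_update_highwater(_Atomic unsigned long *mark,
				    unsigned long value)
{
	unsigned long old = atomic_load_explicit(mark, memory_order_relaxed);

	/* On failure, 'old' is reloaded with the current value, so the
	 * loop stops as soon as the stored mark is already >= value. */
	while (value > old &&
	       !atomic_compare_exchange_weak_explicit(mark, &old, value,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}
```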

Example run of the program from Documentation/accounting/getdelays.c:

./getdelays -R -p `pidof smartd`
printing resource accounting
RLIMIT_CPU=0
RLIMIT_FSIZE=0
RLIMIT_DATA=18198528
RLIMIT_STACK=135168
RLIMIT_CORE=0
RLIMIT_RSS=0
RLIMIT_NPROC=1
RLIMIT_NOFILE=55
RLIMIT_MEMLOCK=0
RLIMIT_AS=130879488
RLIMIT_LOCKS=0
RLIMIT_SIGPENDING=0
RLIMIT_MSGQUEUE=0
RLIMIT_NICE=0
RLIMIT_RTPRIO=0
RLIMIT_RTTIME=0

./getdelays -R -C /sys/fs/cgroup/systemd/system.slice/smartd.service/
printing resource accounting
sleeping 1, blocked 0, running 0, stopped 0, uninterruptible 0
RLIMIT_CPU=0
RLIMIT_FSIZE=0
RLIMIT_DATA=18198528
RLIMIT_STACK=135168
RLIMIT_CORE=0
RLIMIT_RSS=0
RLIMIT_NPROC=1
RLIMIT_NOFILE=55
RLIMIT_MEMLOCK=0
RLIMIT_AS=130879488
RLIMIT_LOCKS=0
RLIMIT_SIGPENDING=0
RLIMIT_MSGQUEUE=0
RLIMIT_NICE=0
RLIMIT_RTPRIO=0
RLIMIT_RTTIME=0

In this example, smartd is running as a non-root user. The presented
values can be used as a starting point for giving new limits to the
service.

There's one problem with patch 07/13: kernel initialization calls
create_worker(), which seems to use a different locking model, or something
along those lines:

[0.145410] =
[0.148000] [ INFO: possible irq lock inversion dependency detected ]
[0.148000] 4.7.0-rc7+ #155 Not tainted
[0.148000] -
[0.148000] swapper/0/1 just changed the state of lock:
[0.148000]  (&(&(&sig->stats_lock)->lock)->rlock){+.....}, at: [] __sched_setscheduler+0x339/0xbd0
[0.148000] but this lock was taken by another, HARDIRQ-safe lock in the past:
[0.148000]  (&rq->lock){-.....}

and interrupts could create inverse lock ordering between them.

[0.148000] 
[0.148000] other info that might help us debug this:
[0.148000]  Possible interrupt unsafe locking scenario:
[0.148000] 
[0.148000]        CPU0                    CPU1
[0.148000]        ----                    ----
[0.148000]   lock(&(&(&sig->stats_lock)->lock)->rlock);
[0.148000]                                local_irq_disable();
[0.148000]                                lock(&rq->lock);
[0.148000]                                lock(&(&(&sig->stats_lock)->lock)->rlock);
[0.148000]   <Interrupt>
[0.148000]     lock(&rq->lock);
[0.148000] 
[0.148000]  *** DEADLOCK ***
[0.148000] 
[0.148000] 2 locks held by swapper/0/1:
[0.148000]  #0:  (cpu_hotplug.lock){.+.+.+}, at: [] get_online_cpus+0x24/0x70
[0.148000]  #1:  (smpboot_threads_lock){+.+.+.}, at: [] smpboot_register_percpu_thread_cpumask+0x37/0xf0
[0.148000] 
[0.148000] the shortest dependencies between 2nd lock and 1st lock:
[0.148000]  -> (&rq->lock){-.....} ops: 181 {
[0.148000] IN-HARDIRQ-W at:
[0.148000]   [] __lock_acquire+0x6e9/0x1440
[0.148000]   [] lock_acquire+0xe3/0x1c0
[0.148000]   [] _raw_spin_lock+0x31/0x40
[0.148000]   [] scheduler_tick+0x41/0xd0
[0.148000]   [] update_process_times+0x51/0x60
[0.148000]   [] tick_periodic+0x2f/0xc0
[0.148000]   [] tick_handle_periodic+0x25/0x70
[0.148000]   [] timer_interrupt+0x15/0x20
[0.148000]   [] handle_irq_event_percpu+0x41/0x320
[0.148000]   [] handle_irq_event+0x39/0x60
[0.148000]   [] handle_level_irq+0x88/0x110
[0.148000]   [] handle_irq+0x1a/0x30
[0.148000]   [] do_IRQ+0x61/0x120
[0.148000]   [] ret_from_intr+0x0/0x19
[0.148000]   [] __setup_irq+0x3f9/0x5e0
[0.148000]   [] setup_irq+0x46/0xa0
[0.148000]   [] setup_default_timer_irq+0x1e/0x20
[0.148000]   [] hpet_time_init+0x17/0x19
[0.148000]   [] x86_late_time_init+0xa/0x11
[0.148000]   [] start_kernel+0x39d/0x465
[0.148000]   [] x86_64_start_reservations+0x2f/0x31
[0.148000]   [] x86_64_start_kernel+0x178/0x18b
[0.148000] INITIAL USE at:
[0.148000]  [] __lock_acquire+0x240/0x1440
[0.148000]  [] lock_acquire+0xe3/0x1c0
[0.148000]  [] _raw_spin_lock_irqsave+0x3c/0x50
[0.148000]  [] rq_attach

Re: [4.1.28 PATCH] powerpc: Fix build break due to missing PPC_FEATURE2_HTM_NOSC

2016-07-15 Thread Sasha Levin
On 07/14/2016 07:10 AM, Michael Ellerman wrote:
> The backport of 4705e02498d6 ("powerpc: Update TM user feature bits in
> scan_features()") (f49eb503f0f9), missed the fact that 4.1 doesn't
> include the commit that added PPC_FEATURE2_HTM_NOSC.
> 
> The correct fix is simply to omit PPC_FEATURE2_HTM_NOSC.
> 
> Fixes: f49eb503f0f9 ("powerpc: Update TM user feature bits in scan_features()")
> Reported-by: Christian Zigotzky 
> Signed-off-by: Michael Ellerman 

Thanks Michael, I've queued it up.


Thanks,
Sasha

RE: [PATCH for-4.8 03/12] powerpc/mm: use _raw variant of page table accessors

2016-07-15 Thread David Laight
From: Aneesh Kumar K.V
> Sent: 13 July 2016 10:35
> 
> This switches a few of the page table accessors to use the __raw variants
> and does the CPU-to-big-endian conversion of constants. This helps in
> generating better code.

It might be better to say that checks for a value being 0 don't depend
on the endianness.

In which case you want a function that returns !!xxx_raw() itself.

OTOH it might be worth finding out why the cpu's byteswapping memory
accessors aren't used - which might save the byteswap instruction
sequence in all paths.

David
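The endianness point above can be illustrated in userspace. In this sketch `sketch_be64` models a big-endian page table entry as stored in memory, and the byte swap relies on the GCC/Clang `__builtin_bswap64()` builtin and the `__BYTE_ORDER__` macro (compiler-specific assumptions, not kernel API). Testing a bit by converting the constant gives the same answer as converting the stored entry, and a test against zero needs no conversion at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a big-endian PTE; the kernel uses __be64 with
 * cpu_to_be64()/be64_to_cpu(). */
typedef uint64_t sketch_be64;

/* Byte swap on little-endian hosts, identity on big-endian ones.
 * The same operation converts in both directions. */
static inline uint64_t sketch_swab_if_le(uint64_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap64(x);
#else
	return x;
#endif
}

/* Convert the stored entry on every test. */
static bool sketch_pte_test_swapped(sketch_be64 raw, uint64_t mask)
{
	return (sketch_swab_if_le(raw) & mask) != 0;
}

/* Convert the constant instead; with a compile-time mask the swap can be
 * folded away, so no byte swap of the entry is needed. */
static bool sketch_pte_test_raw(sketch_be64 raw, uint64_t mask)
{
	return (raw & sketch_swab_if_le(mask)) != 0;
}

/* Zero is zero in either byte order. */
static bool sketch_pte_none_raw(sketch_be64 raw)
{
	return raw == 0;
}
```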


[patch] ibmvfc: prevent a potential deadlock

2016-07-15 Thread Dan Carpenter
My static checker complains that we need to unlock on this path.  Seems
true.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index fc523c3..ab67ec4 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4722,6 +4722,8 @@ static void ibmvfc_rport_add_thread(struct work_struct *work)
tgt_dbg(tgt, "Setting rport roles\n");
fc_remote_port_rolechg(rport, tgt->ids.roles);
put_device(&rport->dev);
+   } else {
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
}
 
kref_put(&tgt->kref, ibmvfc_release_tgt);
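The invariant the fix restores is that a lock taken before a branch is released on every branch. This is a toy model of that shape, assumed from the diff context rather than copied from ibmvfc; `toy_lock` and `toy_add_rport()` are hypothetical, and the assertions catch a missing or doubled release.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a leaked lock would trip this */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);	/* a double release would trip this */
	l->held = false;
}

/* Hypothetical function: the lock is taken before the branch and each
 * branch drops it exactly once. */
static int toy_add_rport(struct toy_lock *host_lock, bool have_rport)
{
	int registered = 0;

	toy_lock_acquire(host_lock);
	if (have_rport) {
		toy_lock_release(host_lock);
		registered = 1;	/* work that may sleep runs unlocked */
	} else {
		toy_lock_release(host_lock);	/* the release the patch adds */
	}
	return registered;
}
```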

[patch] usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()

2016-07-15 Thread Dan Carpenter
We can't assign -EINVAL to a u16.

Fixes: 3948f0e0c999 ('usb: add Freescale QE/CPM USB peripheral controller driver')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/usb/gadget/udc/fsl_qe_udc.c b/drivers/usb/gadget/udc/fsl_qe_udc.c
index 93d28cb..901366f 100644
--- a/drivers/usb/gadget/udc/fsl_qe_udc.c
+++ b/drivers/usb/gadget/udc/fsl_qe_udc.c
@@ -1878,11 +1878,8 @@ static int qe_get_frame(struct usb_gadget *gadget)
 
tmp = in_be16(&udc->usb_param->frame_n);
if (tmp & 0x8000)
-   tmp = tmp & 0x07ff;
-   else
-   tmp = -EINVAL;
-
-   return (int)tmp;
+   return tmp & 0x07ff;
+   return -EINVAL;
 }
 
 static int fsl_qe_start(struct usb_gadget *gadget,
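A small userspace reproduction of the signedness problem. Both functions are hypothetical stand-ins for `qe_get_frame()` that take the raw frame register value as an argument instead of reading it with `in_be16()`.

```c
#include <errno.h>
#include <stdint.h>

/* Before the fix: -EINVAL stored in a u16 wraps to a large positive
 * number (the cast makes the original implicit conversion explicit). */
static int sketch_get_frame_buggy(uint16_t frame_n)
{
	uint16_t tmp = frame_n;

	if (tmp & 0x8000)
		tmp = tmp & 0x07ff;
	else
		tmp = (uint16_t)-EINVAL;

	return (int)tmp;
}

/* After the fix: the error is returned directly as an int. */
static int sketch_get_frame_fixed(uint16_t frame_n)
{
	if (frame_n & 0x8000)
		return frame_n & 0x07ff;
	return -EINVAL;
}
```

On Linux, where `EINVAL` is 22, the buggy version returns 65514 for an invalid frame, which a caller cannot tell apart from a valid frame number.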

[patch] crypto: nx - off by one bug in nx_of_update_msc()

2016-07-15 Thread Dan Carpenter
The props->ap[] array is defined like this:

struct alg_props ap[NX_MAX_FC][NX_MAX_MODE][3];

So we can see that if msc->fc is equal to NX_MAX_FC, or msc->mode is equal to
NX_MAX_MODE, then we're off by one.

Fixes: ae0222b7289d ('powerpc/crypto: nx driver code supporting nx encryption')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/crypto/nx/nx.c b/drivers/crypto/nx/nx.c
index 0794f1c..42f0f22 100644
--- a/drivers/crypto/nx/nx.c
+++ b/drivers/crypto/nx/nx.c
@@ -392,7 +392,7 @@ static void nx_of_update_msc(struct device   *dev,
 ((bytes_so_far + sizeof(struct msc_triplet)) <= lenp) &&
 i < msc->triplets;
 i++) {
-   if (msc->fc > NX_MAX_FC || msc->mode > NX_MAX_MODE) {
+   if (msc->fc >= NX_MAX_FC || msc->mode >= NX_MAX_MODE) {
dev_err(dev, "unknown function code/mode "
"combo: %d/%d (ignored)\n", msc->fc,
msc->mode);
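The general rule behind this fix: an array dimension declared with N elements accepts indices 0 to N-1, so a range check must reject `index >= N`. The sizes below are placeholders rather than the driver's real `NX_MAX_FC`/`NX_MAX_MODE` values, `sketch_store()` is hypothetical, and the third index is bounds-checked too for completeness.

```c
#include <stdbool.h>

/* Placeholder sizes; the driver's real NX_MAX_FC/NX_MAX_MODE may differ. */
#define SKETCH_MAX_FC	3
#define SKETCH_MAX_MODE	5

struct sketch_alg_props {
	unsigned int sglen;	/* hypothetical field */
};

static struct sketch_alg_props ap[SKETCH_MAX_FC][SKETCH_MAX_MODE][3];

/* Store into ap[fc][mode][slot] only if every index is in range. */
static bool sketch_store(unsigned int fc, unsigned int mode,
			 unsigned int slot, unsigned int sglen)
{
	if (fc >= SKETCH_MAX_FC || mode >= SKETCH_MAX_MODE || slot >= 3)
		return false;
	ap[fc][mode][slot].sglen = sglen;
	return true;
}
```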

[PATCH] powerpc/mm: Cleanup LPCR defines

2016-07-15 Thread Michael Ellerman
From: "Aneesh Kumar K.V" 

This makes it easy to verify we are not overloading the bits.
No functionality change by this patch.

mpe: Cleanup more. Completely fixup whitespace, convert all UL values to
ASM_CONST(), and replace all occurrences of 63-x with the actual shift.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/reg.h | 74 +-
 1 file changed, 37 insertions(+), 37 deletions(-)

v2: Clean it up even more.

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 295a19aec9b9..d7e9ab5e4709 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -330,43 +330,43 @@
 #define   HFSCR_FP __MASK(FSCR_FP_LG)
 #define SPRN_TAR   0x32f   /* Target Address Register */
 #define SPRN_LPCR  0x13E   /* LPAR Control Register */
-#define   LPCR_VPM0	(1ul << (63-0))
-#define   LPCR_VPM1	(1ul << (63-1))
-#define   LPCR_ISL	(1ul << (63-2))
-#define   LPCR_VC_SH	(63-2)
-#define   LPCR_DPFD_SH	(63-11)
-#define   LPCR_DPFD	(7ul << LPCR_DPFD_SH)
-#define   LPCR_VRMASD	(0x1ful << (63-16))
-#define   LPCR_VRMA_L	(1ul << (63-12))
-#define   LPCR_VRMA_LP0	(1ul << (63-15))
-#define   LPCR_VRMA_LP1	(1ul << (63-16))
-#define   LPCR_VRMASD_SH (63-16)
-#define   LPCR_RMLS	0x1C000000	/* impl dependent rmo limit sel */
-#define  LPCR_RMLS_SH  (63-37)
-#define   LPCR_ILE	0x02000000	/* !HV irqs set MSR:LE */
-#define   LPCR_AIL	0x01800000	/* Alternate interrupt location */
-#define   LPCR_AIL_0	0x00000000	/* MMU off exception offset 0x0 */
-#define   LPCR_AIL_3	0x01800000	/* MMU on exception offset 0xc00...4xxx */
-#define   LPCR_ONL	0x00040000	/* online - PURR/SPURR count */
-#define   LPCR_LD	0x00020000	/* large decremeter */
-#define   LPCR_PECE	0x0001f000	/* powersave exit cause enable */
-#define     LPCR_PECEDP	0x00010000	/* directed priv dbells cause exit */
-#define     LPCR_PECEDH	0x00008000	/* directed hyp dbells cause exit */
-#define     LPCR_PECE0	0x00004000	/* ext. exceptions can cause exit */
-#define     LPCR_PECE1	0x00002000	/* decrementer can cause exit */
-#define     LPCR_PECE2	0x00001000	/* machine check etc can cause exit */
-#define   LPCR_MER	0x00000800	/* Mediated External Exception */
-#define   LPCR_MER_SH  11
-#define   LPCR_TC	0x00000200	/* Translation control */
-#define   LPCR_LPES	0x0000000c
-#define   LPCR_LPES0	0x00000008	/* LPAR Env selector 0 */
-#define   LPCR_LPES1	0x00000004	/* LPAR Env selector 1 */
-#define   LPCR_LPES_SH 2
-#define   LPCR_RMI	0x00000002	/* real mode is cache inhibit */
-#define   LPCR_HVICE	0x00000002	/* P9: HV interrupt enable */
-#define   LPCR_HDICE	0x00000001	/* Hyp Decr enable (HV,PR,EE) */
-#define   LPCR_UPRT	0x00400000	/* Use Process Table (ISA 3) */
-#define   LPCR_HR	0x00100000
+#define   LPCR_VPM0	ASM_CONST(0x8000000000000000)
+#define   LPCR_VPM1	ASM_CONST(0x4000000000000000)
+#define   LPCR_ISL	ASM_CONST(0x2000000000000000)
+#define   LPCR_VC_SH   61
+#define   LPCR_DPFD_SH 52
+#define   LPCR_DPFD	(ASM_CONST(7) << LPCR_DPFD_SH)
+#define   LPCR_VRMASD_SH   47
+#define   LPCR_VRMASD  (ASM_CONST(1) << LPCR_VRMASD_SH)
+#define   LPCR_VRMA_L  ASM_CONST(0x0008000000000000)
+#define   LPCR_VRMA_LP0	ASM_CONST(0x0001000000000000)
+#define   LPCR_VRMA_LP1	ASM_CONST(0x0000800000000000)
+#define   LPCR_RMLS	0x1C000000	/* Implementation dependent RMO limit sel */
+#define   LPCR_RMLS_SH 26
+#define   LPCR_ILE	ASM_CONST(0x0000000002000000)	/* !HV irqs set MSR:LE */
+#define   LPCR_AIL	ASM_CONST(0x0000000001800000)	/* Alternate interrupt location */
+#define   LPCR_AIL_0	ASM_CONST(0x0000000000000000)	/* MMU off exception offset 0x0 */
+#define   LPCR_AIL_3	ASM_CONST(0x0000000001800000)	/* MMU on exception offset 0xc00...4xxx */
+#define   LPCR_ONL	ASM_CONST(0x0000000000040000)	/* online - PURR/SPURR count */
+#define   LPCR_LD	ASM_CONST(0x0000000000020000)	/* large decremeter */
+#define   LPCR_PECE	ASM_CONST(0x000000000001f000)	/* powersave exit cause enable */
+#define     LPCR_PECEDP	ASM_CONST(0x0000000000010000)	/* directed priv dbells cause exit */
+#define     LPCR_PECEDH	ASM_CONST(0x0000000000008000)	/* directed hyp dbells cause exit */
+#define     LPCR_PECE0	ASM_CONST(0x0000000000004000)	/* ext. exceptions can cause exit */
+#define     LPCR_PECE1	ASM_CONST(0x0000000000002000)	/* decrementer can cause exit */
+#define     LPCR_PECE2	ASM_CONST(0x0000000000001000)	/* machine check etc can cause exit */
+#define   LPCR_MER
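The "63-x" conversion mentioned in the changelog is the usual translation from IBM bit numbering (bit 0 is the most significant bit of a 64-bit register) to C shift counts. A sketch with hypothetical macros, checking a few of the shifts that appear in the diff: 63-2 = 61 for `LPCR_VC_SH`, 63-11 = 52 for `LPCR_DPFD_SH`, 63-16 = 47 for `LPCR_VRMASD_SH`, and 63-37 = 26 for `LPCR_RMLS_SH`.

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel macros: IBM bit b of a 64-bit
 * register corresponds to the C shift count 63 - b. */
#define SKETCH_IBM_SHIFT(b)	(63 - (b))
#define SKETCH_IBM_BIT(b)	(UINT64_C(1) << SKETCH_IBM_SHIFT(b))

/* Compile-time checks against values that appear in the diff. */
_Static_assert(SKETCH_IBM_SHIFT(2) == 61, "LPCR_VC_SH");
_Static_assert(SKETCH_IBM_SHIFT(11) == 52, "LPCR_DPFD_SH");
_Static_assert(SKETCH_IBM_SHIFT(16) == 47, "LPCR_VRMASD_SH");
_Static_assert(SKETCH_IBM_SHIFT(37) == 26, "LPCR_RMLS_SH");
_Static_assert(SKETCH_IBM_BIT(0) == UINT64_C(0x8000000000000000), "LPCR_VPM0");
_Static_assert(SKETCH_IBM_BIT(12) == UINT64_C(0x0008000000000000), "LPCR_VRMA_L");
```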

Re: selftests/powerpc: Add a test for PROT_SAO

2016-07-15 Thread Michael Ellerman
On Mon, 2016-11-07 at 05:25:18 UTC, Michael Ellerman wrote:
> PROT_SAO is a powerpc-specific flag to mmap(), and we rely on arch
> specific logic to allow it to be passed to mmap().
> 
> Add a small test to ensure mmap() accepts PROT_SAO. We don't have a good
> way to test that it actually causes the mapping to be created with the
> right flags, so for now we just touch the mapping so it's faulted in. In
> future we might be able to do something better.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/24af8c5a52a70bbfd275f59836

cheers
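A rough userspace version of such a test, assuming `PROT_SAO` is provided by the system headers; when it is not (for example on non-powerpc builds) the function reports the test as skipped. This is an illustration, not the selftest that was merged, and `sketch_test_prot_sao()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if mmap() accepted PROT_SAO and the page could be touched,
 * 1 on failure, and 2 ("skipped") when PROT_SAO is not defined.
 */
static int sketch_test_prot_sao(void)
{
#ifdef PROT_SAO
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_SAO,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap(PROT_SAO)");
		return 1;
	}
	/* Touch the mapping so it is actually faulted in. */
	*(volatile char *)p = 1;
	munmap(p, len);
	return 0;
#else
	return 2;
#endif
}
```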

Re: powerpc/crash: Rearrange loop condition to avoid out of bounds array access

2016-07-15 Thread Michael Ellerman
On Mon, 2016-11-07 at 04:17:31 UTC, Suraj Jitindar Singh wrote:
> The array crash_shutdown_handles[] has size CRASH_HANDLER_MAX, thus when
> we loop over the elements of the list we check crash_shutdown_handles[i]
> && i < CRASH_HANDLER_MAX. However this means that when we increment i to
> CRASH_HANDLER_MAX we will perform an out-of-bounds array access checking
> the first condition before exiting on the second condition.
> 
> To avoid the out of bounds access, simply reorder the loop conditions.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a7d6392866e9777cb287ad194c

cheers
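The reordering relies on `&&` short-circuiting: once the bound check fails, the array element is never read. A sketch with a hypothetical handler table; its size is an assumption, not the kernel's `CRASH_HANDLER_MAX`.

```c
#include <stddef.h>

#define SKETCH_HANDLER_MAX 8	/* assumed size for the sketch */

typedef void (*sketch_handler_t)(void);

static sketch_handler_t handles[SKETCH_HANDLER_MAX];

/* Call every registered handler. The bound is tested first, so
 * handles[i] is never read when i == SKETCH_HANDLER_MAX. */
static int sketch_run_handlers(void)
{
	int i, called = 0;

	for (i = 0; i < SKETCH_HANDLER_MAX && handles[i]; i++) {
		handles[i]();
		called++;
	}
	return called;
}
```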

Re: powerpc/pmac/smp: Add missing FROZEN hotplug notifier transitions

2016-07-15 Thread Michael Ellerman
On Mon, 2016-04-04 at 09:30:01 UTC, Anna-Maria Gleixner wrote:
> The FROZEN transitions are used when a CPU suspends/resumes. In case
> of a suspend/resume, only the up prepare (CPU_UP_PREPARE_FROZEN) is
> handled. The error handling transition CPU_UP_CANCELED_FROZEN as well
> as the CPU_ONLINE_FROZEN transition are not handled.
> 
> Mask the switch case action argument with ~CPU_TASKS_FROZEN, to
> handle all FROZEN transitions the same way as the corresponding
> non-frozen transitions.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Anna-Maria Gleixner 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c011926fcbeb9565599f278148

cheers
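A sketch of the masking idea. The notifier action values are assumed to match `include/linux/cpu.h` from that period and should be checked against the header; `sketch_classify()` and the `STEP_*` names are hypothetical.

```c
/* Assumed values; verify against include/linux/cpu.h. */
#define SKETCH_CPU_ONLINE		0x0002
#define SKETCH_CPU_UP_PREPARE		0x0003
#define SKETCH_CPU_UP_CANCELED		0x0004
#define SKETCH_CPU_TASKS_FROZEN		0x0010

enum sketch_step { STEP_NONE, STEP_PREPARE, STEP_ONLINE, STEP_CANCEL };

/* Masking off CPU_TASKS_FROZEN makes the suspend/resume variants
 * (e.g. CPU_ONLINE_FROZEN) take the same case as the normal ones. */
static enum sketch_step sketch_classify(unsigned long action)
{
	switch (action & ~(unsigned long)SKETCH_CPU_TASKS_FROZEN) {
	case SKETCH_CPU_UP_PREPARE:
		return STEP_PREPARE;
	case SKETCH_CPU_ONLINE:
		return STEP_ONLINE;
	case SKETCH_CPU_UP_CANCELED:
		return STEP_CANCEL;
	default:
		return STEP_NONE;
	}
}
```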

Re: [v2,5/5] powerpc/sparse: Make ppc_md.{halt, restart} __noreturn

2016-07-15 Thread Michael Ellerman
On Tue, 2016-12-07 at 00:54:52 UTC, Daniel Axtens wrote:
> PowerNV marks its halt and restart calls as __noreturn. However,
> ppc_md does not have this annotation. Add the annotation to ppc_md,
> and then to every halt/restart function that is missing it.
> 
> Additionally, I have verified that all of these functions do not
> return. Occasionally I have added a spin loop to be sure.
> 
> Signed-off-by: Daniel Axtens 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/95ec77c06e8e63fff50c497eca

cheers

Re: [5/5] powerpc/xmon: Dump ISA 2.07 SPRs

2016-07-15 Thread Michael Ellerman
On Thu, 2016-07-07 at 12:54:30 UTC, Michael Ellerman wrote:
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/e0ddf7a24558b356d5cf5ecc12

cheers

Re: [v2,4/5] powerpc/sparse: Pass endianness to sparse

2016-07-15 Thread Michael Ellerman
On Tue, 2016-12-07 at 00:54:51 UTC, Daniel Axtens wrote:
> Explicitly give sparse an endianness in the Makefile, so that it
> doesn't get confused.
> 
> Normally we have #ifdef one and #else the other, so it doesn't usually
> matter, but we have been bitten by it before, and indeed this patch
> fixes a number of sparse errors.
> 
> Suggested-by: Arnd Bergmann 
> Signed-off-by: Daniel Axtens 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/62c2c5cf387beb4bbf45045c30

cheers

Re: [4/5] powerpc/xmon: Dump ISA 2.06 SPRs

2016-07-15 Thread Michael Ellerman
On Thu, 2016-07-07 at 12:54:29 UTC, Michael Ellerman wrote:
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/1846193b178dcc58435fdc5735

cheers

Re: [3/5] powerpc/xmon: Adjust spacing of existing SPRs to make room for more

2016-07-15 Thread Michael Ellerman
On Thu, 2016-07-07 at 12:54:28 UTC, Michael Ellerman wrote:
> Purely to make it pleasing to the eye.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/56346ad88d65fd60dde7b0535f

cheers

Re: [28/41] powerpc/85xx/mpc85xx_rdb: Don't use the flat device-tree after boot

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:04:04 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/acd3578ed9100565ef1b39685e

cheers

Re: [27/41] powerpc/85xx/mpc85xx_ds: Don't use the flat device-tree after boot

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:04:03 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5b0f9f83684dff40014ce1d3c0

cheers

Re: [26/41] powerpc/85xx/ge_imp3a: Don't use the flat device-tree after boot

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:04:02 UTC, Benjamin Herrenschmidt wrote:
> ge_imp3a_pic_init() is called well after the unflattening of
> the tree, so it shouldn't be using of_flat_dt_*.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b282788341933c4dcd462f3c93

cheers

Re: [25/41] powerpc/cell: Don't use flat device-tree after boot

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:04:01 UTC, Benjamin Herrenschmidt wrote:
> Some bit of SPU code was using the FDT rather than the expanded
> device-tree. Fix it.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/69a94d84c7efc7bc146b5a8d6f

cheers

Re: [2/5] powerpc/xmon: Move static regno into its only user

2016-07-15 Thread Michael Ellerman
On Thu, 2016-07-07 at 12:54:27 UTC, Michael Ellerman wrote:
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/13629dad1e30e310bb21baa102

cheers

Re: [19/41] powerpc: Don't test for machine type in smp_setup_cpu_maps()

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:03:55 UTC, Benjamin Herrenschmidt wrote:
> The subsequent test for RTAS along with the LPAR test are sufficient
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0f2b3442fb850626d50a9d7e53

cheers

Re: [18/41] powerpc/rtas: Don't test for machine type in rtas_initialize()

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:03:54 UTC, Benjamin Herrenschmidt wrote:
> The test is unnecessary: the FW_FEATURE_LPAR test is sufficient, as
> there is no other LPAR type that has RTAS.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/484cc1ed3c6b90459f02977f6f

cheers

Re: [15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:14 UTC, Ian Munsie wrote:
> From: Andrew Donnellan 
> 
> Add a new API, cxl_check_and_switch_mode() to allow for switching of
> bi-modal CAPI cards, such as the Mellanox CX-4 network card.
...
> 
> Co-authored-by: Ian Munsie 
> Cc: Gavin Shan 
> Signed-off-by: Andrew Donnellan 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b0b5e5918ad1babfd1d43d98c7

cheers

Re: [14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:13 UTC, Ian Munsie wrote:
> From: Andrew Donnellan 
> 
> When calling pnv_php_set_slot_power_state() with state ==
> OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're
> dealing with OPAL_PCI_SLOT_POWER_OFF.
> 
> Cc: Gavin Shan 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Signed-off-by: Andrew Donnellan 
> Signed-off-by: Ian Munsie 
> Acked-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5473a6bf635d35d5c1d12d0e13

cheers

Re: [13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:12 UTC, Ian Munsie wrote:
> From: Andrew Donnellan 
> 
> The cxl driver will use infrastructure from pnv_php to handle device tree
> updates when switching bi-modal CAPI cards into CAPI mode.
> 
> To enable this, export pnv_php_find_slot() and
> pnv_php_set_slot_power_state(), and add corresponding declarations, as well
> as the definition of struct pnv_php_slot, to asm/pnv-pci.h.
> 
> Cc: Gavin Shan 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Signed-off-by: Andrew Donnellan 
> Signed-off-by: Ian Munsie 
> Acked-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/89379f165a1be13aa9b4731a90

cheers

Re: [12/15] cxl: Workaround PE=0 hardware limitation in Mellanox CX4

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:11 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The CX4 card cannot cope with a context with PE=0 due to a hardware
> limitation, resulting in:
> 
> [   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
> [   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting
> 
> Since the kernel API allocates a default context very early during
> device init, and that context will almost certainly get Process Element
> ID 0, there is no easy way for us to extend the API to allow the
> Mellanox to inform us of this limitation ahead of time.
> 
> Instead, work around the issue by extending the XSL structure to include
> a minimum PE to allocate. Although the bug is not in the XSL, it is the
> easiest place to work around this limitation given that the CX4 is
> currently the only card that uses an XSL.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f67a6722d650b864b020b19b39

cheers
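A sketch of the "minimum PE" idea described above, using a plain array of flags in place of whatever allocator the driver actually uses. `SKETCH_MAX_PE` and `sketch_alloc_pe()` are hypothetical; with `min_pe = 1`, the first (default) context can never be given PE 0.

```c
#include <stdbool.h>

#define SKETCH_MAX_PE 16	/* assumed number of process elements */

/* Return the lowest free PE number >= min_pe and mark it used,
 * or -1 if none is free. Assumes min_pe >= 0. */
static int sketch_alloc_pe(bool used[SKETCH_MAX_PE], int min_pe)
{
	int pe;

	for (pe = min_pe; pe < SKETCH_MAX_PE; pe++) {
		if (!used[pe]) {
			used[pe] = true;
			return pe;
		}
	}
	return -1;
}
```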

Re: [11/15] cxl: Add support for interrupts on the Mellanox CX4

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:10 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
> interrupts are routed from the networking hardware to the XSL using the
> MSIX table, and from there will be transformed back into an MSIX
> interrupt using the cxl style interrupts (i.e. using IVTE entries and
> ranges to map a PE and AFU interrupt number to an MSIX address).
> 
> We want to hide the implementation details of cxl interrupts as much as
> possible. To this end, we use a special version of the MSI setup &
> teardown routines in the PHB while in cxl mode to allocate the cxl
> interrupts and configure the IVTE entries in the process element.
> 
> This function does not configure the MSIX table - the CX4 card uses a
> custom format in that table and it would not be appropriate to fill that
> out in generic code. The rest of the functionality is similar to the
> "Full MSI-X mode" described in the CAIA, and this could be easily
> extended to support other adapters that use that mode in the future.
> 
> The interrupts will be associated with the default context. If the
> maximum number of interrupts per context has been limited (e.g. by the
> mlx5 driver), it will automatically allocate additional kernel contexts
> to associate extra interrupts as required. These contexts will be
> started using the same WED that was used to start the default context.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a2f67d5ee8d950caaa7a6144cf

cheers

Re: [10/41] powerpc: Add comment explaining the purpose of setup_kdump_trampoline()

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:03:46 UTC, Benjamin Herrenschmidt wrote:
> Anything in early_setup() needs to be justified to be there. In
> this case, we need the trampolines before we can take exceptions
> and thus before we turn on the MMU.
> 
> Also remove a pretty meaningless and misplaced debug message.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/63c254a501049f70c53aea6025

cheers

Re: [10/15] cxl: Add preliminary workaround for CX4 interrupt limitation

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:09 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The Mellanox CX4 has a hardware limitation where only 4 bits of the
> AFU interrupt number can be passed to the XSL when sending an interrupt,
> limiting it to only 15 interrupts per context (AFU interrupt number 0 is
> invalid).
> 
> In order to overcome this, we will allocate additional contexts linked
> to the default context as extra address space for the extra interrupts -
> this will be implemented in the next patch.
> 
> This patch adds the preliminary support to allow this, by way of adding
> a linked list in the context structure that we use to keep track of the
> contexts dedicated to interrupts, and an API to simultaneously iterate
> over the related context structures, AFU interrupt numbers and hardware
> interrupt numbers. The point of using a single API to iterate these is
> to hide some of the details of the iteration from external code, and to
> reduce the number of APIs that need to be exported via base.c to allow
> built in code to call.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Frederic Barrat 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/cbce0917e2e47d4bf5aa3b5fd6

cheers
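The arithmetic behind this limitation can be sketched as follows, assuming interrupts are simply packed 15 per context in order. IRQS_PER_CTX, contexts_needed() and map_irq() are hypothetical names for illustration, not the iteration API the patch adds.

```c
/* Only 4 bits of the AFU interrupt number reach the XSL, and number 0
 * is invalid, so each context can carry at most 2^4 - 1 = 15 interrupts. */
#define AFU_IRQ_BITS 4
#define IRQS_PER_CTX ((1 << AFU_IRQ_BITS) - 1)	/* 15 */

/* Number of contexts (default plus extra) needed for nr_irqs interrupts. */
static int contexts_needed(int nr_irqs)
{
	if (nr_irqs <= 0)
		return 1;	/* the default context always exists */
	return (nr_irqs + IRQS_PER_CTX - 1) / IRQS_PER_CTX;
}

/* Map the n-th interrupt (0-based) to a context index and an AFU
 * interrupt number in the range 1..15. */
static void map_irq(int n, int *ctx_idx, int *afu_irq)
{
	*ctx_idx = n / IRQS_PER_CTX;
	*afu_irq = n % IRQS_PER_CTX + 1;
}
```

For example, 16 interrupts need two contexts, and the 16th interrupt lands on AFU interrupt number 1 of the first extra context.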

Re: [1/5] powerpc/xmon: Remove unused externs

2016-07-15 Thread Michael Ellerman
On Thu, 2016-07-07 at 12:54:26 UTC, Michael Ellerman wrote:
> None of these are used, or have been since we merged ppc & ppc64.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/5b71eff78267a1e0d2f178a8b5

cheers

Re: [v2,1/5] powerpc/kvm: Clarify __user annotations

2016-07-15 Thread Michael Ellerman
On Tue, 2016-12-07 at 00:54:48 UTC, Daniel Axtens wrote:
> kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also
> labels the u64 where get_user puts the result as __user. This isn't
> a pointer and so doesn't need to be labelled __user.
> 
> Split the u64 value definition onto a new line to make it clear that
> it doesn't get the annotation.
> 
> Signed-off-by: Daniel Axtens 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8750513b7001d5ae96313d4e1

cheers
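A small sketch of why the split matters, assuming the usual arrangement where __user expands to nothing outside sparse: in a combined declaration the annotation sits in the shared declaration specifiers, so sparse applies it to the plain value as well as to the pointer. The function names and the direct dereference standing in for get_user() are illustrative only; kernel code must not dereference a __user pointer directly.

```c
#include <stdint.h>

/* Outside sparse, __user is defined as nothing; mimic that so the
 * sketch builds with a plain C compiler. */
#ifndef __user
#define __user
#endif

typedef uint64_t u64;

/* Before: __user is part of the shared specifiers, so under sparse it
 * also (wrongly) annotates the plain value tce. */
static u64 read_first_before(u64 __user *src)
{
	u64 __user *tces = src, tce;

	tce = *tces;	/* stands in for get_user(tce, tces) */
	return tce;
}

/* After: one declaration per line makes it clear tce is a plain u64. */
static u64 read_first_after(u64 __user *src)
{
	u64 __user *tces = src;
	u64 tce;

	tce = *tces;	/* stands in for get_user(tce, tces) */
	return tce;
}
```

Both functions behave identically at run time; the difference only shows up in sparse's address-space checking.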

Re: [09/41] powerpc: Update obsolete comments in setup_32.c about entry conditions

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:03:45 UTC, Benjamin Herrenschmidt wrote:
> early_init() is called in-place before kernel relocation and using
> whatever MMU setup exists at the point the kernel is entered.
> 
> machine_init() is called after relocation and after some initial
> mapping of PAGE_OFFSET has been established (typically using BATs
> on 6xx/7xx/7xxx processors or some form of bolted TLB on others).
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd7c93cca36911baf2eb2bc386

cheers

Re: [09/15] cxl: Add kernel APIs to get & set the max irqs per context

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:08 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> These APIs will be used by the Mellanox CX4 support. While they function
> standalone to configure existing behaviour, their primary purpose is to
> allow the Mellanox driver to inform the cxl driver of a hardware
> limitation, which will be used in a future patch.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Frederic Barrat 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/79384e4b71240abf50c375eea5

cheers

Re: [08/41] powerpc: Move epapr_paravirt_early_init() to early_init_devtree()

2016-07-15 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:03:44 UTC, Benjamin Herrenschmidt wrote:
> The function is called by both 32-bit and 64-bit early setup right
> after early_init_devtree(). All it does is run yet another early
> DT parser which is precisely what early_init_devtree() is about,
> so move it in there.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/da6a97bf12d57e341029b3624e

cheers

Re: [08/15] cxl: Add support for using the kernel API with a real PHB

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:07 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> This hooks up support for using the kernel API with a real PHB. After
> the AFU initialisation has completed it calls into the PHB code to pass
> it the AFU that will be used by other peer physical functions on the
> adapter.
> 
> The cxl_pci_to_afu API is extended to work with peer PCI devices,
> retrieving the peer AFU from the PHB. This API may also now return an
> error if it is called on a PCI device that is not associated with either
> a cxl vPHB or a peer PCI device to an AFU, and this error is propagated
> down.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/317f5ef1b363417b6f1e93b90d

cheers

Re: [07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:06 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> This adds support for the peer model of the cxl kernel api to the
> PowerNV PHB, in which physical function 0 represents the cxl function on
> the card (an XSL in the case of the CX4), which other physical functions
> will use for memory access and interrupt services. It is referred to as
> the peer model as these functions are peers of one another, as opposed
> to the Virtual PHB model which forms a hierarchy.
> 
> This patch exports APIs to enable the peer mode, check if a PCI device
> is attached to a PHB in this mode, and to set and get the peer AFU for
> this mode.
> 
> The cxl driver will enable this mode for supported cards by calling
> pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
> that this mode is enabled, and switch out its controller_ops for the
> cxl version.
> 
> The cxl version of the controller_ops struct implements its own
> versions of the enable_device_hook and release_device to handle
> refcounting on the peer AFU and to allocate a default context for the
> device.
> 
> Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
> there is no safe way to disable cxl mode short of a reboot, so until
> that changes there is no reason to support the disable path.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4361b03430d685610e5feea3ec

cheers

Re: [06/15] cxl: Do not create vPHB if there are no AFU configuration records

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:05 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The vPHB model of the cxl kernel API is a hierarchy where the AFU is
> represented by the vPHB, and its AFU configuration records are exposed
> as functions under that vPHB. If there are no AFU configuration records
> we will create a vPHB with nothing under it, which is a waste of
> resources and will opt us into EEH handling despite not having anything
> special to handle.
> 
> This also does not make sense for cards using the peer model of the cxl
> kernel API, where the other functions of the device are exposed via
> additional peer physical functions rather than AFU configuration
> records. This model will also not work with the existing EEH handling in
> the cxl driver, as that is designed around the vPHB model.
> 
> Skip creating the vPHB for AFUs without any AFU configuration records,
> and opt out of EEH handling for them.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e4f5fc001a6cb82bef91037245

cheers

Re: [05/15] cxl: Allow a default context to be associated with an external pci_dev

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:04 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The cxl kernel API has a concept of a default context associated with
> each PCI device under the virtual PHB. The Mellanox CX4 will also use
> the cxl kernel API, but it does not use a virtual PHB - rather, the AFU
> appears as a physical function as a peer to the networking functions.
> 
> In order to allow the kernel API to work with those networking
> functions, we will need to associate a default context with them as
> well. To this end, refactor the corresponding code to do this in vphb.c
> and export it so that it can be called from the PHB code.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Frederic Barrat 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a19bd79e31769626d288cc016e

cheers

Re: [04/15] cxl: Move cxl_afu_get / cxl_afu_put to base

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:03 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The Mellanox CX4 uses a model where the AFU is one physical function of
> the device, and is used by other peer physical functions of the same
> device. This will require those other devices to grab a reference on the
> AFU when they are initialised to make sure that it does not go away
> during their lifetime.
> 
> Move the AFU refcount functions to base.c so they can be called from
> the PHB code.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/62ccf2d2efefa01d0eb92cd6ec

cheers
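A minimal sketch of the lifetime rule described here, assuming a plain C11 atomic counter: each peer function takes a reference when it is initialised and drops it on release, and the AFU is only torn down when the last reference goes away. struct afu, afu_init(), afu_get() and afu_put() are hypothetical stand-ins; they are not the cxl_afu_get()/cxl_afu_put() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct afu {
	atomic_int refcount;
	bool released;	/* set when the last reference is dropped */
};

static void afu_init(struct afu *afu)
{
	atomic_init(&afu->refcount, 1);	/* reference held by the creator */
	afu->released = false;
}

/* Called by a peer function during its initialisation. */
static struct afu *afu_get(struct afu *afu)
{
	atomic_fetch_add(&afu->refcount, 1);
	return afu;
}

/* Called when a peer function (or the creator) is done with the AFU. */
static void afu_put(struct afu *afu)
{
	/* atomic_fetch_sub() returns the old value; 1 means we were last. */
	if (atomic_fetch_sub(&afu->refcount, 1) == 1)
		afu->released = true;	/* a real driver would free here */
}
```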

Re: [03/15] cxl: Enable bus mastering for devices using CAPP DMA mode

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:02 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
> master to be enabled in order for the CAPI traffic to flow. This should
> be harmless to enable for other cxl devices, so unconditionally enable
> it in the adapter init flow.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/48b3adf33459c1c42766d9c206

cheers

Re: [02/15] cxl: Add cxl_slot_is_supported API

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:01 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> This extends the check that the adapter is in a CAPI capable slot so
> that it may be called by external users in the kernel API. This will be
> used by the upcoming Mellanox CX4 support, which needs to know ahead of
> time if the card can be switched to cxl mode so that it can leave it in
> PCI mode if it is not.
> 
> This API takes a parameter to check if CAPP DMA mode is supported, which
> it currently only allows on P8NVL systems, since that mode currently has
> issues accessing memory < 4GB on P8, and we cannot realistically avoid
> that.
> 
> This API does not currently check if a CAPP unit is available (i.e. not
> already assigned to another PHB) on P8. Doing so would be racy since it
> is assigned on a first-come, first-served basis, and so long as CAPP DMA
> mode is not supported on P8 we don't need this, since the only
> anticipated user of this API requires CAPP DMA mode.
> 
> Cc: Philippe Bergheaud 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4e56f858bdde5cbfb70f61badd

cheers

Re: [01/15] powerpc/powernv: Split cxl code out into a separate file

2016-07-15 Thread Michael Ellerman
On Wed, 2016-13-07 at 21:17:00 UTC, Ian Munsie wrote:
> From: Ian Munsie 
> 
> The support for using the Mellanox CX4 in cxl mode will require
> additions to the PHB code. In preparation for this, move the existing
> cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
> more organised.
> 
> Signed-off-by: Ian Munsie 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Frederic Barrat 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f456834a6c1db36c290fdfe8ab

cheers

Re: [-next] cxl: Use for_each_compatible_node() macro

2016-07-15 Thread Michael Ellerman
On Tue, 2016-12-07 at 11:30:11 UTC, weiyj...@163.com wrote:
> From: Wei Yongjun 
> 
> Use for_each_compatible_node() macro instead of open coding it.
> 
> Generated by Coccinelle.
> 
> Signed-off-by: Wei Yongjun 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fc9f75ef2fdf46fc859b991dbf

cheers

[PATCH 09/14] resource limits: track highwater mark of locked memory

2016-07-15 Thread Topi Miettinen
Track maximum size of locked memory, to be able to configure
RLIMIT_MEMLOCK resource limits. The information is available
with taskstats and cgroupstats netlink sockets.
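A minimal sketch of highwater-mark tracking, assuming a single updater (a kernel version would need appropriate locking or atomics): keep the largest locked-memory size seen so far, converting pages to bytes with a shift as the diff below does with locked << PAGE_SHIFT. struct rlimit_highwater, update_highwater() and SKETCH_PAGE_SHIFT (4 KiB pages) are hypothetical; they are not the patch's update_resource_highwatermark().

```c
#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct rlimit_highwater {
	unsigned long memlock_bytes;	/* largest value observed so far */
};

/* Record a new locked-memory size; only ever moves upwards. */
static void update_highwater(struct rlimit_highwater *hw, unsigned long bytes)
{
	if (bytes > hw->memlock_bytes)
		hw->memlock_bytes = bytes;
}
```

Usage: after locking 10 pages and later dropping to 3, the recorded mark stays at 10 << SKETCH_PAGE_SHIFT bytes, which is the value an administrator would use to size RLIMIT_MEMLOCK.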

Signed-off-by: Topi Miettinen 
---
 arch/ia64/kernel/perfmon.c | 1 +
 arch/powerpc/kvm/book3s_64_vio.c   | 2 ++
 arch/powerpc/mm/mmu_context_iommu.c| 2 ++
 drivers/infiniband/core/umem.c | 1 +
 drivers/infiniband/hw/hfi1/user_pages.c| 2 ++
 drivers/infiniband/hw/qib/qib_user_pages.c | 2 ++
 drivers/infiniband/hw/usnic/usnic_uiom.c   | 2 ++
 drivers/misc/mic/scif/scif_rma.c   | 1 +
 drivers/vfio/vfio_iommu_spapr_tce.c| 2 ++
 drivers/vfio/vfio_iommu_type1.c| 5 +
 kernel/bpf/syscall.c   | 8 
 kernel/events/core.c   | 1 +
 mm/mlock.c | 8 
 mm/mmap.c  | 4 
 mm/mremap.c| 4 
 15 files changed, 45 insertions(+)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 2436ad5..7c6ae72 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2341,6 +2341,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
ctx->ctx_smpl_vaddr = (void *)vma->vm_start;
*(unsigned long *)user_vaddr = vma->vm_start;
 
+   task_update_resource_highwatermark(task, RLIMIT_MEMLOCK, size);
return 0;
 
 error:
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 18cf6d1..40ea177 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -71,6 +71,8 @@ static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
ret = -ENOMEM;
else
current->mm->locked_vm += stt_pages;
+   update_resource_highwatermark(RLIMIT_MEMLOCK,
+ locked << PAGE_SHIFT);
} else {
if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
stt_pages = current->mm->locked_vm;
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..8c6bcbf 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -46,6 +46,8 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
ret = -ENOMEM;
else
mm->locked_vm += npages;
+   update_resource_highwatermark(RLIMIT_MEMLOCK,
+ locked << PAGE_SHIFT);
} else {
if (WARN_ON_ONCE(npages > mm->locked_vm))
npages = mm->locked_vm;
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index fe4d2e1..3c454eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -224,6 +224,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
ret = 0;
 
+   update_resource_highwatermark(RLIMIT_MEMLOCK, locked << PAGE_SHIFT);
 out:
if (ret < 0) {
if (need_release)
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 88e10b5f..ca55f8c 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -111,6 +111,8 @@ int hfi1_acquire_user_pages(unsigned long vaddr, size_t npages, bool writable,
 
down_write(&current->mm->mmap_sem);
current->mm->pinned_vm += ret;
+   update_resource_highwatermark(RLIMIT_MEMLOCK,
+ current->mm->pinned_vm << PAGE_SHIFT);
up_write(&current->mm->mmap_sem);
 
return ret;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 2d2b94f..3a103c4 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -74,6 +74,8 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
}
 
current->mm->pinned_vm += num_pages;
+   update_resource_highwatermark(RLIMIT_MEMLOCK,
+ current->mm->pinned_vm << PAGE_SHIFT);
 
ret = 0;
goto bail;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index a0b6ebe..6180654 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -178,6 +178,8 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
ret = 0;
}
 
+   update_resource_highwatermark(RLIMIT_MEMLOCK, locked << PAGE_SHIFT);
+
 out:
if (ret < 0)
usnic_uiom_put_pages(chunk_list, 0);
diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
i

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Russell King - ARM Linux
On Wed, Jul 13, 2016 at 03:13:42PM +0200, Arnd Bergmann wrote:
> On Wednesday, July 13, 2016 10:41:28 AM CEST Mark Rutland wrote:
> > The big question is whether this is a realistic case on a secure boot
> > system.
> 
> What does x86 do here? I assume changes to the command line are also
> limited.

They aren't.  You can specify /anything/ even with a fully-signed kernel
and initrd, which was one of the things I pointed out in my previous
set of responses.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [patch V2 30/67] powerpc/numa: Convert to hotplug state machine

2016-07-15 Thread Ingo Molnar

* Anton Blanchard  wrote:

> Hi Anna-Maria,
> 
> > >> Install the callbacks via the state machine and let the core invoke
> > >> the callbacks on the already online CPUs.  
> > >
> > > This is causing an oops on ppc64le QEMU, looks like a NULL
> > > pointer:  
> > 
> > Did you test it against tip WIP.hotplug?
> 
> I noticed tip started failing in my CI environment which tests on QEMU.
> The failure bisected to commit 425209e0abaf2c6e3a90ce4fedb935c10652bf80

That's very useful, thanks Anton!

I have removed this commit from the series for the time being, refactored the
followup commits (there was one trivial conflict). We can re-try this patch
when a fix is found.

Thanks,

Ingo

Re: [PATCH V2 5/5] powerpc/kvm/stats: Implement existing and add new halt polling vcpu stats

2016-07-15 Thread Suraj Jitindar Singh


On 14/07/16 03:20, David Matlack wrote:
> On Tue, Jul 12, 2016 at 11:07 PM, Suraj Jitindar Singh
>  wrote:
>> On 12/07/16 16:17, Suraj Jitindar Singh wrote:
>>> On 12/07/16 02:49, David Matlack wrote:
> [snip]
>>>> It's possible to poll and wait in one halt, conflating this stat with
>>>> polling time. Is it useful to split out a third stat,
>>>> halt_poll_fail_ns which counts how long we polled which ended up
>>>> sleeping? Then halt_wait_time only counts the time the VCPU spent on
>>>> the wait queue. The sum of all 3 is still the total time spent halted.

>>> I see what you're saying. I would say that in the event that you do wait
>>> then the most useful number is going to be the total block time (the sum
>>> of the wait and poll time) as this is the minimum value you would have to
>>> set the halt_poll_max_ns module parameter in order to ensure you poll
>>> for long enough (in most circumstances) to avoid waiting, which is the main
>>> use case I envision for this statistic. That being said this is definitely
>>> a source of ambiguity and splitting this into two statistics would make the
>>> distinction clearer without any loss of data, you could simply sum the two
>>> stats to get the same number.
>>>
>>> Either way I don't think it really makes much of a difference, but in the
>>> interest of clarity I think I'll split the statistic.
>> On further thought, I really think that splitting this statistic is an
>> unnecessary source of ambiguity. In reality the interesting piece of
>> information is going to be the average time that you blocked on
>> either an unsuccessful poll or a successful poll.
>>
>> So instead of splitting the statistic I'm going to rename them as:
>> halt_poll_time -> halt_block_time_successful_poll
>> halt_wait_time -> halt_block_time_waited
> The downside of having only these 2 stats is there is no way to see
> the total time spent halt-polling. Halt-polling shows up as host
> kernel CPU usage on the VCPU thread, despite it really being idle
> cycles that could be reclaimed. It's useful to have the total amount
> of time spent halt-polling (halt_poll_fail + halt_poll_success) to
> feed into provisioning/monitoring systems that look at CPU usage.
>
> FWIW, I have a very similar patch internally. It adds 2 stats,
> halt_poll_success_ns and halt_poll_fail_ns, to the halt-polling code
> in virt/kvm/kvm_main.c. So if you agree splitting the stats makes
> sense, it would be helpful to us if we can adopt the same naming
> convention.

Ok, I didn't realise that was a use case.

Makes sense, I'll split it and adopt those names.

Thanks
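The split agreed on above can be sketched like this, assuming a halt either ends during polling (wait time 0) or falls through to the wait queue. halt_poll_success_ns and halt_poll_fail_ns are the names proposed in the thread; halt_wait_ns, struct halt_stats and the helper functions are hypothetical. The point is that both the total polling time (useful for monitoring CPU usage) and the total halted time remain recoverable by summing fields.

```c
#include <stdint.h>

typedef uint64_t u64;

struct halt_stats {
	u64 halt_poll_success_ns;	/* polled and a wakeup arrived */
	u64 halt_poll_fail_ns;		/* polled, then had to sleep */
	u64 halt_wait_ns;		/* time actually on the wait queue */
};

/* Account one halt: poll_ns spent polling, wait_ns spent sleeping
 * (assumed to be 0 exactly when the poll succeeded). */
static void account_halt(struct halt_stats *s, u64 poll_ns, u64 wait_ns)
{
	if (wait_ns == 0) {
		s->halt_poll_success_ns += poll_ns;
	} else {
		s->halt_poll_fail_ns += poll_ns;
		s->halt_wait_ns += wait_ns;
	}
}

/* Total time spent halt-polling, successful or not. */
static u64 total_poll_ns(const struct halt_stats *s)
{
	return s->halt_poll_success_ns + s->halt_poll_fail_ns;
}

/* Total time spent halted: polling plus waiting. */
static u64 total_halt_ns(const struct halt_stats *s)
{
	return total_poll_ns(s) + s->halt_wait_ns;
}
```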


Re: [PATCH V2 4/5] kvm/stats: Add provisioning for 64-bit vcpu statistics

2016-07-15 Thread Suraj Jitindar Singh


On 14/07/16 19:42, Paolo Bonzini wrote:
>
> On 13/07/2016 20:00, Christian Borntraeger wrote:
>> I thought u64 still existed on 32-bit architectures. unsigned long
>> would be fine but with the caveat that certain stats would overflow on
>> 32-bit architectures.
>>> Yes, but not all 32-bit architectures can do atomic read-modify-write
>>> (e.g. add) operations on 64-bit values.
>> So what about only doing it for the VCPU events? Those should be only
>> modified by one CPU. We would have some odd values on 32bit overflow, but
>> this will be certainly better than just start with 0
> If that's good enough for PPC, that's fine.
>
> Paolo

I don't feel great about having vcpu_stats as u64 and vm_stats still as u32;
it's just a bit inconsistent.

That being said, it's only the vcpu_stats which I require to be u64 at this
stage so it's possible to just upgrade those.


Re: [PATCH V3 4/5] kvm/stats: Add provisioning for 64-bit vm and vcpu statistics

2016-07-15 Thread Suraj Jitindar Singh


On 13/07/16 19:04, Christian Borntraeger wrote:
> On 07/13/2016 10:53 AM, Suraj Jitindar Singh wrote:
>> vms and vcpus have statistics associated with them which can be viewed
>> within the debugfs. Currently it is assumed within the vcpu_stat_get() and
>> vm_stat_get() functions that all of these statistics are represented as
>> u32s, however the next patch adds some u64 statistics.
>>
>> Thus modify these two functions, vcpu_stat_get() and vm_stat_get(), such
>> that they expect u64 statistics and update vm and vcpu statistics to u64s
>> accordingly.
>>
>> ---
>> Change Log:
>>
>> V1 -> V2:
>>  - Nothing
>> V2 -> V3:
>>  - Instead of implementing separate u32 and u64 functions keep the
>>generic functions and modify them to expect u64s. Thus update all
>>vm and vcpu statistics to u64s accordingly.
> Have not looked into everything, but I agree with changing everything to 
> 64bit.
>
>
>> @@ -3583,8 +3583,8 @@ static const struct file_operations 
>> vcpu_stat_get_per_vm_fops = {
>>  };
>>
>>  static const struct file_operations *stat_fops_per_vm[] = {
>> -[KVM_STAT_VCPU] = &vcpu_stat_get_per_vm_fops,
>> -[KVM_STAT_VM]   = &vm_stat_get_per_vm_fops,
>> +[KVM_STAT_VCPU] = &vcpu_stat_get_per_vm_fops,
>> +[KVM_STAT_VM]   = &vm_stat_get_per_vm_fops,
>>  };
> unrelated white space changes?

Woops, I'll fix that. Thanks

>
>>  static int vm_stat_get(void *_offset, u64 *val)
>> @@ -3628,8 +3628,8 @@ static int vcpu_stat_get(void *_offset, u64 *val)
>>  DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n");
>>
>>  static const struct file_operations *stat_fops[] = {
>> -[KVM_STAT_VCPU] = &vcpu_stat_fops,
>> -[KVM_STAT_VM]   = &vm_stat_fops,
>> +[KVM_STAT_VCPU] = &vcpu_stat_fops,
>> +[KVM_STAT_VM]   = &vm_stat_fops,
>>  };
>>
>>  static int kvm_init_debug(void)
>>
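A minimal sketch of the offset-based getter pattern this patch modifies, assuming every statistic is a u64: the caller passes the byte offset of a field, and the getter sums that field over all vCPUs, so one function serves every stat. struct vcpu_stat_sketch, struct vcpu_sketch and vcpu_stat_sum() are hypothetical and simplified; they are not KVM's vcpu_stat_get().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

struct vcpu_stat_sketch {
	u64 halt_successful_poll;
	u64 halt_wakeup;
};

struct vcpu_sketch {
	struct vcpu_stat_sketch stat;
};

/* Sum the u64 field at byte offset 'offset' of each vCPU's stats. */
static int vcpu_stat_sum(const struct vcpu_sketch *vcpus, int nr_vcpus,
			 size_t offset, u64 *val)
{
	*val = 0;
	for (int i = 0; i < nr_vcpus; i++)
		*val += *(const u64 *)((const char *)&vcpus[i].stat + offset);
	return 0;
}
```

Because the getter reads a fixed-width u64 at the given offset, mixing u32 and u64 fields in the same struct would require separate getters, which is why the patch converts all of them to u64.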


Re: [RFC 0/3] extend kexec_file_load system call

2016-07-15 Thread Arnd Bergmann
On Thursday, July 14, 2016 10:44:14 PM CEST Thiago Jung Bauermann wrote:
> Am Donnerstag, 14 Juli 2016, 10:29:11 schrieb Arnd Bergmann:

> > 
> > Right, but the question remains whether this helps while you allow the
> > boot loader to modify the dtb. If an attacker gets in and cannot modify
> > the kernel or initrd but can modify the DT, a successful attack would
> > be a bit harder than having a modified kernel, but you may still need
> > to treat the system as compromised.
> 
> Yes, and the same question also remains regarding the kernel command line.
> 
> We can have the kernel perform sanity checks on the device tree, just as the 
> kernel needs to sanity check the command line.
> 
> There's the point that was raised about not wanting to increase the attack 
> surface, and that's a valid point. But at least in the way Petitboot works 
> today, it needs to modify the device tree and pass it to the kernel.
> 
> One thing that is unavoidable to come from userspace is 
> /chosen/linux,stdout-path, because it's Petitboot that knows from which 
> console the user is interacting with. The other modification to set 
> properties in vga@0 can be done in the kernel.
> 
> Given that on DTB-based systems /chosen is an important and established way 
> to pass information to the operating system being booted, I'd like to 
> suggest the following, then:
> 
> Extend the syscall as shown in this RFC from Takahiro AKASHI, but instead of 
> accepting a complete DTB from userspace, the syscall would accept a DTB 
> containing only a /chosen node. If the DTB contains any other node, the 
> syscall fails with EINVAL. The kernel can then add the properties in /chosen 
> to the device tree that it will pass to the next kernel.
> 
> What do you think?

I think that helps, as it makes the problem space correspond to that
of modifying the command line, but I can still come up with countless
attacks based on modifications of the /chosen node and/or the command
line, in fact it's probably easier than any other node.

What methods do we have in place for command line changes today on
other architectures?

Arnd
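A minimal sketch of the check Thiago proposes, assuming the root node's children have already been extracted into an array of names (a real implementation would walk the flattened tree, for example with libfdt's fdt_first_subnode()/fdt_next_subnode() and fdt_get_name()). validate_chosen_only() is a hypothetical name. As Arnd points out, this only limits which nodes may be supplied; it says nothing about whether the properties inside /chosen are safe.

```c
#include <errno.h>
#include <string.h>

/* Accept the user-supplied tree only if every child of the root is
 * "chosen"; any other node makes the call fail with -EINVAL. */
static int validate_chosen_only(const char *const *root_children, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (strcmp(root_children[i], "chosen") != 0)
			return -EINVAL;
	}
	return 0;
}
```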
