Re: Latest Git kernel: avahi-daemon[2410]: ioctl(): Inappropriate ioctl for device

2020-02-02 Thread Christophe Leroy

Hello,

On 02/02/2020 at 01:08, Christian Zigotzky wrote:

Hello,

We compile and test Linux kernels every day during the merge 
window. Since Thursday we have had very high CPU loads because of the avahi 
daemon on our desktop Linux systems (Ubuntu, Debian, etc.).


Error message: avahi-daemon[2410]: ioctl(): Inappropriate ioctl for device


Do you know which ioctl, on which device ?
Can you take a trace of running avahi-daemon with 'strace' ?
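For example (a typical invocation, using the PID 2410 from the error line 
above):

strace -f -p 2410 -e trace=ioctl

-f follows forked children and -e trace=ioctl limits the output to ioctl 
calls, so the failing request and file descriptor become visible.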

Can you bisect ?

Christophe


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-02-02 Thread Anshuman Khandual



On 01/30/2020 06:34 PM, Anshuman Khandual wrote:
> On 01/28/2020 10:35 PM, Christophe Leroy wrote:
>>
>> On 28/01/2020 at 02:27, Anshuman Khandual wrote:
>>> This adds tests which will validate architecture page table helpers and
>>> other accessors in their compliance with expected generic MM semantics.
>>> This will help various architectures in validating changes to existing
>>> page table helpers or addition of new ones.
>>>
>>> This test covers basic page table entry transformations including but not
>>> limited to old, young, dirty, clean, write, write protect, etc. at various
>>> levels, along with populating intermediate entries with the next page table
>>> page and validating them.
>>>
>>> Test page table pages are allocated from system memory with required size
>>> and alignments. The mapped pfns at page table levels are derived from a
>>> real pfn representing a valid kernel text symbol. This test gets called
>>> right after page_alloc_init_late().
>>>
>>> This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>> CONFIG_VM_DEBUG. Architectures willing to subscribe to this test also need
>>> to select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE, which for now is limited to x86
>>> and arm64. Going forward, other architectures can enable this too after
>>> fixing build or runtime problems (if any) with their page table helpers.
>>>
>>> Folks interested in making sure that a given platform's page table helpers
>>> conform to expected generic MM semantics should enable the above config,
>>> which will just trigger this test during boot. Any non-conformity here will
>>> be reported as a warning which would need to be fixed. This test will help
>>> catch any changes to the agreed-upon semantics expected from generic MM and
>>> enable platforms to accommodate them thereafter.
>>>
>> [...]
>>
>>> Tested-by: Christophe Leroy     #PPC32
>> Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k pages 
>> and book3e/64
> Hmm, but earlier Michael Ellerman had reported some problems while
> running these tests on PPC64: a soft lockup in hash__pte_update()
> and a kernel BUG (radix MMU). Have those problems gone away now ?
> 
> Details in this thread - https://patchwork.kernel.org/patch/11214603/
> 

It is always better to have more platforms enabled than not. But let's keep
this test disabled on PPC64 for now if there is any inconsistency between
the results when running this under QEMU and on actual systems.
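For context, a minimal sketch of the kind of invariant the test asserts (a
condensed, hypothetical form; the real test covers many more helpers and all
page table levels):

static void pte_basic_check(unsigned long pfn, pgprot_t prot)
{
	pte_t pte = pfn_pte(pfn, prot);

	/* Setting an attribute must be observable; clearing must undo it. */
	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
	WARN_ON(pte_dirty(pte_mkclean(pte)));
	WARN_ON(!pte_young(pte_mkyoung(pte)));
	WARN_ON(pte_young(pte_mkold(pte)));
	WARN_ON(!pte_write(pte_mkwrite(pte)));
	WARN_ON(pte_write(pte_wrprotect(pte)));
}

Any non-conformity then shows up as a boot-time warning, as described above.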


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-02-02 Thread Christophe Leroy




On 02/02/2020 at 08:18, Anshuman Khandual wrote:

On 01/30/2020 07:43 PM, Christophe Leroy wrote:



On 30/01/2020 at 14:04, Anshuman Khandual wrote:


On 01/28/2020 10:35 PM, Christophe Leroy wrote:





I think we could make it standalone and 'default y if DEBUG_VM' instead.


That would yield the same result as before, just in a different way. This
test could go either way, but I don't see a good enough reason to change
the current one.


I think if we want people to really use it on other architectures, it must be 
possible to activate it without having to modify Kconfig. Otherwise people 
won't even know that the test exists or that their architecture fails it.

The purpose of a test suite is to detect bugs. If you can't run the test until 
you have fixed the bugs, I guess nobody will ever detect the bugs and they will 
never be fixed.

So I think:
- the test should be 'default y' when ARCH_HAS_DEBUG_VM_PGTABLE is selected
- the test should be 'default n' when ARCH_HAS_DEBUG_VM_PGTABLE is not 
selected, and it should be user selectable if EXPERT is selected.

Something like:

config DEBUG_VM_PGTABLE
     bool "Debug arch page table for semantics compliance" if 
ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
     depends on MMU


Should (ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT) be moved alongside MMU on the same line ?


Yes, it could also go alongside MMU, or be a dependency by itself:
depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT




     default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
     default 'y' if DEBUG_VM


This looks good, at least until we get all platforms enabled. Will make all
these changes along with the s390 enablement and re-spin.
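Putting the fragments above together, the proposed entry would look roughly
like this (a sketch of what the re-spin might contain):

config DEBUG_VM_PGTABLE
	bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
	depends on MMU
	default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
	default 'y' if DEBUG_VM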


Christophe


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-02-02 Thread Qian Cai



> On Jan 30, 2020, at 9:13 AM, Christophe Leroy  wrote:
> 
> config DEBUG_VM_PGTABLE
>bool "Debug arch page table for semantics compliance" if 
> ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>depends on MMU
>default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
>default 'y' if DEBUG_VM

Is it really necessary to potentially force all the bots to run this? Syzbot, 
the kernel test robot, etc.? Does it ever pay off given all the machine time 
they would spend on it?

Re: Latest Git kernel: avahi-daemon[2410]: ioctl(): Inappropriate ioctl for device

2020-02-02 Thread Christian Zigotzky

On 02 February 2020 at 09:19 am, Christophe Leroy wrote:

Hello,

On 02/02/2020 at 01:08, Christian Zigotzky wrote:

Hello,

We compile and test Linux kernels every day during the 
merge window. Since Thursday we have had very high CPU loads because of 
the avahi daemon on our desktop Linux systems (Ubuntu, Debian, etc.).


Error message: avahi-daemon[2410]: ioctl(): Inappropriate ioctl for 
device


Do you know which ioctl, on which device ?
Can you take a trace of running avahi-daemon with 'strace' ?

Can you bisect ?

Christophe

Hi Christophe,
Hi All,

I figured out that the avahi-daemon has had a problem with the IPv6 address 
of a network interface since the Git kernel from Thursday. (Log attached.)
This generates high CPU usage because the avahi-daemon tries to access 
the IPv6 address again and again, producing a lot of log messages.


We figured out that the networking updates aren't responsible for this 
issue, because we created a test kernel on Wednesday. The issue is 
somewhere in the commits from Wednesday night to Thursday (CET).
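That range maps directly onto a bisect, should anyone find the time (the
commit IDs below are placeholders for the Wednesday and Thursday trees):

git bisect start
git bisect bad  <Thursday-build commit>
git bisect good <Wednesday-test-kernel commit>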


Please compile the latest Git kernel and test it with a desktop Linux 
distribution, for example Ubuntu. From my point of view, many desktop 
machines are affected. Many server systems don't use the avahi 
daemon, so they aren't affected.


It's possible to deactivate access to the IPv6 address with the 
following line in the file "/etc/avahi/avahi-daemon.conf":


use-ipv6=no
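For completeness: in a stock avahi-daemon.conf this key lives in the
[server] section, so the full stanza looks like:

[server]
use-ipv6=no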

After a reboot the CPU usage is normal again. This is only a temporary 
solution.


Unfortunately I don't have time for bisecting next week; I have a 
lot of other work to do. From my point of view, it is very important that 
you also compile the latest Git kernels. Then you will see the issue 
yourselves and be in a better position to fix it.


Thanks,
Christian
Kernel 5.5.0: journalctl | grep -i avahi
Feb 02 13:57:05 DC1 systemd[1]: Listening on Avahi mDNS/DNS-SD Stack Activation 
Socket.
Feb 02 13:57:05 DC1 systemd[1]: Starting Avahi mDNS/DNS-SD Stack...
Feb 02 13:57:05 DC1 avahi-daemon[4314]: Found user 'avahi' (UID 112) and group 
'avahi' (GID 122).
Feb 02 13:57:05 DC1 avahi-daemon[4314]: Successfully dropped root privileges.
Feb 02 13:57:05 DC1 avahi-daemon[4314]: avahi-daemon 0.6.32-rc starting up.
Feb 02 13:57:06 DC1 systemd[1]: Started Avahi DNS Configuration Daemon.
Feb 02 13:57:06 DC1 avahi-daemon[4314]: Successfully called chroot().
Feb 02 13:57:06 DC1 avahi-daemon[4314]: Successfully dropped remaining 
capabilities.
Feb 02 13:57:06 DC1 avahi-daemon[4314]: No service file found in 
/etc/avahi/services.
Feb 02 13:57:06 DC1 avahi-daemon[4314]: Network interface enumeration completed.
Feb 02 13:57:06 DC1 avahi-daemon[4314]: Server startup complete. Host name is 
DC1.local. Local service cookie is 3202921551.
Feb 02 13:57:06 DC1 avahi-daemon[4314]: Failed to parse address 'localhost', 
ignoring.
Feb 02 13:57:06 DC1 avahi-dnsconfd[4487]: Successfully connected to Avahi 
daemon.
Feb 02 13:57:06 DC1 systemd[1]: Started Avahi mDNS/DNS-SD Stack.
Feb 02 13:57:07 DC1 root[4749]: /etc/dhcp/dhclient-enter-hooks.d/avahi-autoipd 
returned non-zero exit status 1
Feb 02 13:57:07 DC1 avahi-daemon[4314]: Joining mDNS multicast group on 
interface enP4096p4s4.IPv4 with address 192.168.178.47.
Feb 02 13:57:07 DC1 avahi-daemon[4314]: New relevant interface enP4096p4s4.IPv4 
for mDNS.
Feb 02 13:57:07 DC1 avahi-daemon[4314]: Registering new address record for 
192.168.178.47 on enP4096p4s4.IPv4.
Feb 02 13:57:09 DC1 avahi-daemon[4314]: Joining mDNS multicast group on 
interface enP4096p4s4.IPv6 with address fe80::250:fcff:fecb:5181.
Feb 02 13:57:09 DC1 avahi-daemon[4314]: New relevant interface enP4096p4s4.IPv6 
for mDNS.
Feb 02 13:57:09 DC1 avahi-daemon[4314]: Registering new address record for 
fe80::250:fcff:fecb:5181 on enP4096p4s4.*.
Feb 02 13:57:10 DC1 avahi-daemon[4314]: Leaving mDNS multicast group on 
interface enP4096p4s4.IPv6 with address fe80::250:fcff:fecb:5181.
Feb 02 13:57:10 DC1 avahi-daemon[4314]: Joining mDNS multicast group on 
interface enP4096p4s4.IPv6 with address 2a02:8109:89c0:ebfc:250:fcff:fecb:5181.
Feb 02 13:57:10 DC1 avahi-daemon[4314]: Registering new address record for 
2a02:8109:89c0:ebfc:250:fcff:fecb:5181 on enP4096p4s4.*.
Feb 02 13:57:10 DC1 avahi-daemon[4314]: Withdrawing address record for 
fe80::250:fcff:fecb:5181 on enP4096p4s4.


--


Latest Git kernel (5.6): journalctl | grep -i avahi

Feb 02 14:04:04 DC1 systemd[1]: Listening on Avahi mDNS/DNS-SD Stack Activation 
Socket.
Feb 02 14:04:05 DC1 systemd[1]: Started Avahi DNS Configuration Daemon.
Feb 02 14:04:05 DC1 systemd[1]: Starting Avahi mDNS/DNS-SD Stack...
Feb 02 14:04:05 DC1 avahi-daemon[4573]: Found user 'avahi' (UID 112) and group 
'avahi' (GID 122).
Feb 02 14:04:05 DC1 avahi-daemon[4573]: Successfully dropped root privileges.
Feb 02 14:04:05 DC1 avahi-daemon[4573]: avahi-daemon 0.6.32-rc starting up.
Feb 02 14:04:05 DC1 avahi-daemon[4573]: Successfully called chroot().
Feb 02

Re: [PATCH v6 1/5] powerpc/mm: Implement set_memory() routines

2020-02-02 Thread Russell Currey
On Wed, 2020-01-08 at 13:52 +0100, Christophe Leroy wrote:
> 
> On 24/12/2019 at 06:55, Russell Currey wrote:
> > The set_memory_{ro/rw/nx/x}() functions are required for
> > STRICT_MODULE_RWX,
> > and are generally useful primitives to have.  This implementation
> > is
> > designed to be completely generic across powerpc's many MMUs.
> > 
> > It's possible that this could be optimised to be faster for
> > specific
> > MMUs, but the focus is on having a generic and safe implementation
> > for
> > now.
> > 
> > This implementation does not handle cases where the caller is
> > attempting
> > to change the mapping of the page it is executing from, or if
> > another
> > CPU is concurrently using the page being altered.  These cases
> > likely
> > shouldn't happen, but a more complex implementation with MMU-
> > specific code
> > could safely handle them, so that is left as a TODO for now.
> > 
> > Signed-off-by: Russell Currey 
> > ---
> >   arch/powerpc/Kconfig  |  1 +
> >   arch/powerpc/include/asm/set_memory.h | 32 +++
> >   arch/powerpc/mm/Makefile  |  1 +
> >   arch/powerpc/mm/pageattr.c| 83
> > +++
> >   4 files changed, 117 insertions(+)
> >   create mode 100644 arch/powerpc/include/asm/set_memory.h
> >   create mode 100644 arch/powerpc/mm/pageattr.c
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 1ec34e16ed65..f0b9b47b5353 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -133,6 +133,7 @@ config PPC
> > select ARCH_HAS_PTE_SPECIAL
> > select ARCH_HAS_MEMBARRIER_CALLBACKS
> > select ARCH_HAS_SCALED_CPUTIME  if
> > VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
> > +   select ARCH_HAS_SET_MEMORY
> > select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 ||
> > PPC32) && !RELOCATABLE && !HIBERNATION)
> > select ARCH_HAS_TICK_BROADCAST  if
> > GENERIC_CLOCKEVENTS_BROADCAST
> > select ARCH_HAS_UACCESS_FLUSHCACHE
> > diff --git a/arch/powerpc/include/asm/set_memory.h
> > b/arch/powerpc/include/asm/set_memory.h
> > new file mode 100644
> > index ..5230ddb2fefd
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/set_memory.h
> > @@ -0,0 +1,32 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_POWERPC_SET_MEMORY_H
> > +#define _ASM_POWERPC_SET_MEMORY_H
> > +
> > +#define SET_MEMORY_RO  1
> > +#define SET_MEMORY_RW  2
> > +#define SET_MEMORY_NX  3
> > +#define SET_MEMORY_X   4
> 
> Maybe going from 0 to 3 would be better than 1 to 4
> 
> > +
> > +int change_memory_attr(unsigned long addr, int numpages, int
> > action);
> 
> action could be unsigned.
> 
> > +
> > +static inline int set_memory_ro(unsigned long addr, int numpages)
> > +{
> > +   return change_memory_attr(addr, numpages, SET_MEMORY_RO);
> > +}
> > +
> > +static inline int set_memory_rw(unsigned long addr, int numpages)
> > +{
> > +   return change_memory_attr(addr, numpages, SET_MEMORY_RW);
> > +}
> > +
> > +static inline int set_memory_nx(unsigned long addr, int numpages)
> > +{
> > +   return change_memory_attr(addr, numpages, SET_MEMORY_NX);
> > +}
> > +
> > +static inline int set_memory_x(unsigned long addr, int numpages)
> > +{
> > +   return change_memory_attr(addr, numpages, SET_MEMORY_X);
> > +}
> > +
> > +#endif
> > diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> > index 5e147986400d..d0a0bcbc9289 100644
> > --- a/arch/powerpc/mm/Makefile
> > +++ b/arch/powerpc/mm/Makefile
> > @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM) += highmem.o
> >   obj-$(CONFIG_PPC_COPRO_BASE)  += copro_fault.o
> >   obj-$(CONFIG_PPC_PTDUMP)  += ptdump/
> >   obj-$(CONFIG_KASAN)   += kasan/
> > +obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pageattr.o
> 
> CONFIG_ARCH_HAS_SET_MEMORY is set unconditionally, so I think you should
> add pageattr.o to obj-y instead. CONFIG_ARCH_HAS_XXX options are almost
> never used in Makefiles.

Fair enough, will keep that in mind

> 
> > diff --git a/arch/powerpc/mm/pageattr.c
> > b/arch/powerpc/mm/pageattr.c
> > new file mode 100644
> > index ..15d5fb04f531
> > --- /dev/null
> > +++ b/arch/powerpc/mm/pageattr.c
> > @@ -0,0 +1,83 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * MMU-generic set_memory implementation for powerpc
> > + *
> > + * Copyright 2019, IBM Corporation.
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +
> > +/*
> > + * Updates the attributes of a page in three steps:
> > + *
> > + * 1. invalidate the page table entry
> > + * 2. flush the TLB
> > + * 3. install the new entry with the updated attributes
> > + *
> > + * This is unsafe if the caller is attempting to change the
> > mapping of the
> > + * page it is executing from, or if another CPU is concurrently
> > using the
> > + * page being altered.
> > + *
> > + * TODO make the implementation resistant to this.
> > + */
> > +static int __c

Re: [PATCH v2 1/7] powerpc/mm: Implement set_memory() routines

2020-02-02 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20200131]
[cannot apply to v5.5]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-mm-Implement-set_memory-routines/20200203-060234
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ppc64_defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 7.5.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/mm/pageattr.c: In function 'change_page_attr':
>> arch/powerpc/mm/pageattr.c:32:15: error: cast from pointer to integer of 
>> different size [-Werror=pointer-to-int-cast]
 int action = (int)data;
  ^
   arch/powerpc/mm/pageattr.c: In function 'change_memory_attr':
>> arch/powerpc/mm/pageattr.c:73:68: error: cast to pointer from integer of 
>> different size [-Werror=int-to-pointer-cast]
 return apply_to_page_range(&init_mm, start, sz, change_page_attr, (void 
*)action);
   ^
   cc1: all warnings being treated as errors
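Both warnings come from round-tripping an int through void *, which has a
different width on 64-bit targets. The v3 posted later in this thread avoids
this by using a pointer-sized integer type, e.g.:

	long action = (long)data;	/* same width as void * on PPC32 and PPC64 */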

vim +32 arch/powerpc/mm/pageattr.c

15  
16  
17  /*
18   * Updates the attributes of a page in three steps:
19   *
20   * 1. invalidate the page table entry
21   * 2. flush the TLB
22   * 3. install the new entry with the updated attributes
23   *
24   * This is unsafe if the caller is attempting to change the mapping of 
the
25   * page it is executing from, or if another CPU is concurrently using 
the
26   * page being altered.
27   *
28   * TODO make the implementation resistant to this.
29   */
30  static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
31  {
  > 32  int action = (int)data;
33  pte_t pte;
34  
35  spin_lock(&init_mm.page_table_lock);
36  
37  /* invalidate the PTE so it's safe to modify */
38  pte = ptep_get_and_clear(&init_mm, addr, ptep);
39  flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
40  
41  /* modify the PTE bits as desired, then apply */
42  switch (action) {
43  case SET_MEMORY_RO:
44  pte = pte_wrprotect(pte);
45  break;
46  case SET_MEMORY_RW:
47  pte = pte_mkwrite(pte);
48  break;
49  case SET_MEMORY_NX:
50  pte = pte_exprotect(pte);
51  break;
52  case SET_MEMORY_X:
53  pte = pte_mkexec(pte);
54  break;
55  default:
56  break;
57  }
58  
59  set_pte_at(&init_mm, addr, ptep, pte);
60  spin_unlock(&init_mm.page_table_lock);
61  
62  return 0;
63  }
64  
65  int change_memory_attr(unsigned long addr, int numpages, int action)
66  {
67  unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
68  unsigned long sz = numpages * PAGE_SIZE;
69  
70  if (!numpages)
71  return 0;
72  
  > 73  return apply_to_page_range(&init_mm, start, sz, 
change_page_attr, (void *)action);

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2020-02-02 Thread Gautham R Shenoy
Hello Nathan,


On Wed, Dec 04, 2019 at 04:24:31PM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> > @@ -1067,6 +1097,8 @@ static int __init topology_init(void)
> > register_cpu(c, cpu);
> >  
> > device_create_file(&c->dev, &dev_attr_physical_id);
> > +   if (firmware_has_feature(FW_FEATURE_SPLPAR))
> > +   create_idle_purr_spurr_sysfs_entry(&c->dev);
> 
> Architecturally speaking PURR/SPURR aren't strongly linked to the PAPR
> SPLPAR option, are they? I'm not sure it's right for these attributes to
> be absent if the platform does not support shared processor mode.

Doesn't FW_FEATURE_SPLPAR refer to all Pseries guests ? It is perhaps
incorrectly named, but from the other uses in the kernel, it seems to
indicate that we are running as a guest instead of on a bare-metal
system.

--
Thanks and Regards
gautham.


Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2020-02-02 Thread Gautham R Shenoy
Hi Naveen,

On Thu, Dec 05, 2019 at 10:23:58PM +0530, Naveen N. Rao wrote:
> >diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> >index 80a676d..42ade55 100644
> >--- a/arch/powerpc/kernel/sysfs.c
> >+++ b/arch/powerpc/kernel/sysfs.c
> >@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> > }
> > static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> >
> >+static ssize_t idle_purr_show(struct device *dev,
> >+  struct device_attribute *attr, char *buf)
> >+{
> >+struct cpu *cpu = container_of(dev, struct cpu, dev);
> >+unsigned int cpuid = cpu->dev.id;
> >+struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> >+u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> >+
> >+return sprintf(buf, "%llx\n", idle_purr_cycles);
> >+}
> >+static DEVICE_ATTR_RO(idle_purr);
> >+
> >+DECLARE_PER_CPU(u64, idle_spurr_cycles);
> >+static ssize_t idle_spurr_show(struct device *dev,
> >+   struct device_attribute *attr, char *buf)
> >+{
> >+struct cpu *cpu = container_of(dev, struct cpu, dev);
> >+unsigned int cpuid = cpu->dev.id;
> >+u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
> 
> Is it possible for a user to read stale values if a particular cpu is in an
> extended cede? Is it possible to use smp_call_function_single() to force the
> cpu out of idle?

Yes, if the CPU whose idle_spurr cycles are being read is still idle,
then we will miss reporting the delta of spurr cycles for that last
idle duration. We can use smp_call_function_single(), though that will
introduce IPI noise. How often will idle_[s]purr be read ?
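A minimal sketch of that approach (the handler name is hypothetical; an
empty IPI suffices, assuming that exiting idle runs the epilog which
refreshes the counter, and wait=1 orders the update before the read):

static void kick_out_of_idle(void *unused)
{
	/* Empty on purpose: merely handling the IPI exits the idle loop. */
}

static u64 read_idle_spurr(unsigned int cpuid)
{
	smp_call_function_single(cpuid, kick_out_of_idle, NULL, 1);
	return *per_cpu_ptr(&idle_spurr_cycles, cpuid);
}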

> 
> - Naveen
>

--
Thanks and Regards
gautham.


Re: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

2020-02-02 Thread Gautham R Shenoy
Hello Nathan,

On Wed, Dec 04, 2019 at 04:24:52PM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> > index a36fd05..708ec68 100644
> > --- a/arch/powerpc/kernel/idle.c
> > +++ b/arch/powerpc/kernel/idle.c
> > @@ -33,6 +33,8 @@
> >  unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
> >  EXPORT_SYMBOL(cpuidle_disable);
> >  
> > +DEFINE_PER_CPU(u64, idle_spurr_cycles);
> > +
> 
> Does idle_spurr_cycles need any special treatment for CPU
> online/offline?

If offline uses extended cede, then we need to take a snapshot of the
idle_spurr_cycles before going offline and add the delta once we are
back online. However, since the plan is to deprecate the use of
extended cede for CPU-Offline and use only rtas-stop-self, we don't
need any special handling there.


> 
> >  static int __init powersave_off(char *arg)
> >  {
> > ppc_md.power_save = NULL;
> > diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> > b/drivers/cpuidle/cpuidle-pseries.c
> > index 74c2479..45e2be4 100644
> > --- a/drivers/cpuidle/cpuidle-pseries.c
> > +++ b/drivers/cpuidle/cpuidle-pseries.c
> > @@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
> >  static struct cpuidle_state *cpuidle_state_table __read_mostly;
> >  static u64 snooze_timeout __read_mostly;
> >  static bool snooze_timeout_en __read_mostly;
> > +DECLARE_PER_CPU(u64, idle_spurr_cycles);
> 
> This belongs in a header...

Will move it to the header file.

> 
> 
> > -static inline void idle_loop_prolog(unsigned long *in_purr)
> > +static inline void idle_loop_prolog(unsigned long *in_purr,
> > +   unsigned long *in_spurr)
> >  {
> > ppc64_runlatch_off();
> > *in_purr = mfspr(SPRN_PURR);
> > +   *in_spurr = mfspr(SPRN_SPURR);
> > /*
> >  * Indicate to the HV that we are idle. Now would be
> >  * a good time to find other work to dispatch.
> > @@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long 
> > *in_purr)
> > get_lppaca()->idle = 1;
> >  }
> >  
> > -static inline void idle_loop_epilog(unsigned long in_purr)
> > +static inline void idle_loop_epilog(unsigned long in_purr,
> > +   unsigned long in_spurr)
> >  {
> > u64 wait_cycles;
> > +   u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
> >  
> > wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> > wait_cycles += mfspr(SPRN_PURR) - in_purr;
> > get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> > +   *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
> 
> ... and the sampling and increment logic probably should be further
> encapsulated in accessor functions that can be used in both the cpuidle
> driver and the default/generic idle implementation. Or is there some
> reason this is specific to the pseries cpuidle driver?

I am not sure if we use SPURR and PURR for performing accounting on
bare-metal systems. IIUC, the patches proposed by Kamalesh only use
idle_[s]purr and [s]purr on POWERVM LPARs. This is why I put the
sampling/increment logic in the pseries cpuidle driver. But you are
right: in the absence of cpuidle, when we use the default idle
implementation, we will still need to record the value of
idle_purr/spurr.

--
Thanks and Regards
gautham.




Re: [PATCH v2 2/7] powerpc/kprobes: Mark newly allocated probes as RO

2020-02-02 Thread Russell Currey
On Fri, 2020-01-31 at 13:34 +, Christophe Leroy wrote:
> With CONFIG_STRICT_KERNEL_RWX=y and CONFIG_KPROBES=y, there will be
> one
> W+X page at boot by default.  This can be tested with
> CONFIG_PPC_PTDUMP=y and CONFIG_PPC_DEBUG_WX=y set, and checking the
> kernel log during boot.
> 
> powerpc doesn't implement its own alloc() for kprobes like other
> architectures do, but we couldn't immediately mark RO anyway since we
> do
> a memcpy to the page we allocate later.  After that, nothing should
> be
> allowed to modify the page, and write permissions are removed well
> before the kprobe is armed.
> 
> The memcpy() would fail if >1 probes were allocated, so use
> patch_instruction() instead which is safe for RO.
> 
> Reviewed-by: Daniel Axtens 
> Signed-off-by: Russell Currey 
> Signed-off-by: Christophe Leroy 
> ---
> v2: removed the redundant flush
> ---
>  arch/powerpc/kernel/kprobes.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c
> b/arch/powerpc/kernel/kprobes.c
> index 2d27ec4feee4..d3e594e6094c 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
>  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
> @@ -124,13 +125,12 @@ int arch_prepare_kprobe(struct kprobe *p)
>   }
>  
>   if (!ret) {
> - memcpy(p->ainsn.insn, p->addr,
> - MAX_INSN_SIZE *
> sizeof(kprobe_opcode_t));
> + patch_instruction(p->ainsn.insn, *p->addr);
>   p->opcode = *p->addr;
> - flush_icache_range((unsigned long)p->ainsn.insn,
> - (unsigned long)p->ainsn.insn +
> sizeof(kprobe_opcode_t));
>   }
>  
> + set_memory_ro((unsigned long)p->ainsn.insn, 1);
> +


Since this can be called multiple times on the same page, we can avoid
that by implementing:

void *alloc_insn_page(void)
{
void *page;

page = vmalloc_exec(PAGE_SIZE);
if (page)
set_memory_ro((unsigned long)page, 1);

return page;
}

Which is pretty much the same as what's in arm64.  Works for me and
passes ftracetest. I was originally doing this but cut it because it
broke with the memcpy; it works with patch_instruction().

>   p->ainsn.boostable = 0;
>   return ret;
>  }



Re: [linuxppc-dev] Patch notification: 1 patch updated

2020-02-02 Thread Christophe Leroy




On 03/02/2020 at 02:00, Patchwork wrote:

Hello,

The following patch (submitted by you) has been updated in Patchwork:

  * linuxppc-dev: powerpc/nohash: Don't flush all TLBs when flushing one page
  - http://patchwork.ozlabs.org/patch/1231983/
  - for: Linux PPC development
 was: New
 now: Superseded


Superseded ? By what ?

I sent a v2 for book3s/32, but this one is for nohash.

Christophe


Re: [PATCH v2 2/7] powerpc/kprobes: Mark newly allocated probes as RO

2020-02-02 Thread Christophe Leroy




On 03/02/2020 at 05:50, Russell Currey wrote:

On Fri, 2020-01-31 at 13:34 +, Christophe Leroy wrote:

With CONFIG_STRICT_KERNEL_RWX=y and CONFIG_KPROBES=y, there will be
one
W+X page at boot by default.  This can be tested with
CONFIG_PPC_PTDUMP=y and CONFIG_PPC_DEBUG_WX=y set, and checking the
kernel log during boot.

powerpc doesn't implement its own alloc() for kprobes like other
architectures do, but we couldn't immediately mark RO anyway since we
do
a memcpy to the page we allocate later.  After that, nothing should
be
allowed to modify the page, and write permissions are removed well
before the kprobe is armed.

The memcpy() would fail if >1 probes were allocated, so use
patch_instruction() instead which is safe for RO.

Reviewed-by: Daniel Axtens 
Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
v2: removed the redundant flush
---
  arch/powerpc/kernel/kprobes.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c
b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..d3e594e6094c 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
+#include 
  
  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;

  DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -124,13 +125,12 @@ int arch_prepare_kprobe(struct kprobe *p)
}
  
  	if (!ret) {

-   memcpy(p->ainsn.insn, p->addr,
-   MAX_INSN_SIZE *
sizeof(kprobe_opcode_t));
+   patch_instruction(p->ainsn.insn, *p->addr);
p->opcode = *p->addr;
-   flush_icache_range((unsigned long)p->ainsn.insn,
-   (unsigned long)p->ainsn.insn +
sizeof(kprobe_opcode_t));
}
  
+	set_memory_ro((unsigned long)p->ainsn.insn, 1);

+



Since this can be called multiple times on the same page, we can avoid
that by implementing:

void *alloc_insn_page(void)
{
void *page;

page = vmalloc_exec(PAGE_SIZE);
if (page)
set_memory_ro((unsigned long)page, 1);

return page;
}

Which is pretty much the same as what's in arm64.  Works for me and
passes ftracetest. I was originally doing this but cut it because it
broke with the memcpy; it works with patch_instruction().


p->ainsn.boostable = 0;
return ret;
  }


Ok. I'll send out a v3 since patch 1 fails on PPC64, so I'll take that in.

Christophe


Re: [PATCH v6 1/5] powerpc/mm: Implement set_memory() routines

2020-02-02 Thread Christophe Leroy




On 03/02/2020 at 01:46, Russell Currey wrote:

On Wed, 2020-01-08 at 13:52 +0100, Christophe Leroy wrote:


On 24/12/2019 at 06:55, Russell Currey wrote:

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..d0a0bcbc9289 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM) += highmem.o
   obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
   obj-$(CONFIG_PPC_PTDUMP) += ptdump/
   obj-$(CONFIG_KASAN)  += kasan/
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pageattr.o


CONFIG_ARCH_HAS_SET_MEMORY is set inconditionnally, I think you
should
add pageattr.o to obj-y instead. CONFIG_ARCH_HAS_XXX are almost
never
used in Makefiles


Fair enough, will keep that in mind


I forgot I commented that. I'll do it in v3.


+   pte_t pte_val;
+
+   // invalidate the PTE so it's safe to modify
+   pte_val = ptep_get_and_clear(&init_mm, addr, ptep);
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE);


Why flush a range for a single page ? On most targets this will do a
tlbia which is heavy, while a tlbie would suffice.

I think flush_tlb_kernel_range() should be replaced by something
flushing only a single page.


You might be able to help me out here: I wanted to do that, but the only
functions I could find that flush single pages need a
vm_area_struct, which I can't get.


I sent out two patches for that, one for book3s/32 and one for nohash:
https://patchwork.ozlabs.org/patch/1231983/
https://patchwork.ozlabs.org/patch/1232223/

Maybe one for book3s/64 would be needed as well ? Can you do it if needed ?







+
+   // modify the PTE bits as desired, then apply
+   switch (action) {
+   case SET_MEMORY_RO:
+   pte_val = pte_wrprotect(pte_val);
+   break;
+   case SET_MEMORY_RW:
+   pte_val = pte_mkwrite(pte_val);
+   break;
+   case SET_MEMORY_NX:
+   pte_val = pte_exprotect(pte_val);
+   break;
+   case SET_MEMORY_X:
+   pte_val = pte_mkexec(pte_val);
+   break;
+   default:
+   WARN_ON(true);
+   return -EINVAL;


Is it worth checking that the action is valid for each page ? I think
the validity of the action should be checked in change_memory_attr(). All
the other functions are static, so you know they won't be called from
outside.

Once done, you can squash __change_page_attr() into
change_page_attr(),
remove the ret var and return 0 all the time.


Makes sense to fold things into a single function, but in terms of
performance it shouldn't make a difference, right?  I still have to
check the action to determine what to change (unless I replace passing
SET_MEMORY_RO into apply_to_page_range() with a function pointer to
pte_wrprotect() for example).


pte_wrprotect() is a static inline.
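For illustration, the function-pointer variant would look roughly like
this (a sketch; the typedef and wrapper names are hypothetical):

typedef pte_t (*pte_xform_t)(pte_t pte);

/* Thin wrapper so the static inline has an address we can pass around. */
static pte_t pte_set_ro(pte_t pte)
{
	return pte_wrprotect(pte);
}

static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
{
	pte_xform_t xform = (pte_xform_t)data;
	pte_t pte;

	spin_lock(&init_mm.page_table_lock);

	/* invalidate the PTE, flush, then install the transformed entry */
	pte = ptep_get_and_clear(&init_mm, addr, ptep);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	set_pte_at(&init_mm, addr, ptep, xform(pte));

	spin_unlock(&init_mm.page_table_lock);

	return 0;
}

/* caller:
 *	apply_to_page_range(&init_mm, start, size, change_page_attr,
 *			    (void *)pte_set_ro);
 */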






+   }
+
+   set_pte_at(&init_mm, addr, ptep, pte_val);
+
+   return 0;
+}
+
+static int change_page_attr(pte_t *ptep, unsigned long addr, void
*data)
+{
+   int ret;
+
+   spin_lock(&init_mm.page_table_lock);
+   ret = __change_page_attr(ptep, addr, data);
+   spin_unlock(&init_mm.page_table_lock);
+
+   return ret;
+}
+
+int change_memory_attr(unsigned long addr, int numpages, int
action)
+{
+   unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
+   unsigned long size = numpages * PAGE_SIZE;
+
+   if (!numpages)
+   return 0;
+
+   return apply_to_page_range(&init_mm, start, size,
change_page_attr, &action);


Use (void*)action instead of &action (see upper comment)


To get this to work I had to use (void *)(size_t)action to stop the
compiler from complaining about casting an int to a void *. Is there a
better way to go about it?  It works fine, it just looks gross.


Yes, use long instead (see my v3)

Christophe


[PATCH v3 1/7] powerpc/mm: Implement set_memory() routines

2020-02-02 Thread Christophe Leroy
The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX,
and are generally useful primitives to have.  This implementation is
designed to be completely generic across powerpc's many MMUs.

It's possible that this could be optimised to be faster for specific
MMUs, but the focus is on having a generic and safe implementation for
now.

This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered.  These cases likely
shouldn't happen, but a more complex implementation with MMU-specific code
could safely handle them, so that is left as a TODO for now.

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
v3:
- Changes 'action' from int to long to avoid build failure on PPC64 when 
casting to/from void*
- Move pageattr.o into obj-y in Makefile

v2:
- use integers instead of pointers for action
- drop the action check; nobody should call change_memory_attr() directly.
Should it happen, the function will just do nothing.
- Renamed confusing 'pte_val' var to 'pte' as pte_val() is already a function.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/set_memory.h | 32 
 arch/powerpc/mm/Makefile  |  2 +-
 arch/powerpc/mm/pageattr.c| 74 +++
 4 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 730c06668f22..d0c6e7b7a62d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,7 @@ config PPC
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
+   select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
!HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
diff --git a/arch/powerpc/include/asm/set_memory.h 
b/arch/powerpc/include/asm/set_memory.h
new file mode 100644
index ..64011ea444b4
--- /dev/null
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_SET_MEMORY_H
+#define _ASM_POWERPC_SET_MEMORY_H
+
+#define SET_MEMORY_RO  0
+#define SET_MEMORY_RW  1
+#define SET_MEMORY_NX  2
+#define SET_MEMORY_X   3
+
+int change_memory_attr(unsigned long addr, int numpages, long action);
+
+static inline int set_memory_ro(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_RO);
+}
+
+static inline int set_memory_rw(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_RW);
+}
+
+static inline int set_memory_nx(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_NX);
+}
+
+static inline int set_memory_x(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_X);
+}
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..a998fdac52f9 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,7 +5,7 @@
 
 ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
-obj-y  := fault.o mem.o pgtable.o mmap.o \
+obj-y  := fault.o mem.o pgtable.o mmap.o pageattr.o \
   init_$(BITS).o pgtable_$(BITS).o \
   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
   init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
new file mode 100644
index ..2b573768a7f7
--- /dev/null
+++ b/arch/powerpc/mm/pageattr.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * MMU-generic set_memory implementation for powerpc
+ *
+ * Copyright 2019, IBM Corporation.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+
+/*
+ * Updates the attributes of a page in three steps:
+ *
+ * 1. invalidate the page table entry
+ * 2. flush the TLB
+ * 3. install the new entry with the updated attributes
+ *
+ * This is unsafe if the caller is attempting to change the mapping of the
+ * page it is executing from, or if another CPU is concurrently using the
+ * page being altered.
+ *
+ * TODO make the implementation resistant to this.
+ */
+static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
+{
+   long action = (long)data;
+   pte_t pte;
+
+   spin_lock(&init_mm.page_table_lock);
+
+   /* invalidate the PTE so it's safe to modify */
+   pte = ptep_get_and_clear(&init_mm, addr, ptep);
+   f

[PATCH v3 2/7] powerpc/kprobes: Mark newly allocated probes as RO

2020-02-02 Thread Christophe Leroy
With CONFIG_STRICT_KERNEL_RWX=y and CONFIG_KPROBES=y, there will be one
W+X page at boot by default.  This can be tested with
CONFIG_PPC_PTDUMP=y and CONFIG_PPC_DEBUG_WX=y set, and checking the
kernel log during boot.

powerpc doesn't implement its own alloc() for kprobes like other
architectures do, but we couldn't immediately mark RO anyway since we do
a memcpy to the page we allocate later.  After that, nothing should be
allowed to modify the page, and write permissions are removed well
before the kprobe is armed.

The memcpy() would fail if >1 probes were allocated, so use
patch_instruction() instead which is safe for RO.

Reviewed-by: Daniel Axtens 
Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
v3: copied alloc_insn_page() from arm64, set_memory_ro() is now called there.
v2: removed the redundant flush
---
 arch/powerpc/kernel/kprobes.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..bfab91ded234 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -102,6 +104,16 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
+void *alloc_insn_page(void)
+{
+   void *page = vmalloc_exec(PAGE_SIZE);
+
+   if (page)
+   set_memory_ro((unsigned long)page, 1);
+
+   return page;
+}
+
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
@@ -124,11 +136,8 @@ int arch_prepare_kprobe(struct kprobe *p)
}
 
if (!ret) {
-   memcpy(p->ainsn.insn, p->addr,
-   MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   patch_instruction(p->ainsn.insn, *p->addr);
p->opcode = *p->addr;
-   flush_icache_range((unsigned long)p->ainsn.insn,
-   (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
}
 
p->ainsn.boostable = 0;
-- 
2.25.0



[PATCH v3 3/7] powerpc/mm/ptdump: debugfs handler for W+X checks at runtime

2020-02-02 Thread Christophe Leroy
From: Russell Currey 

Very rudimentary, just

echo 1 > [debugfs]/check_wx_pages

and check the kernel log.  Useful for testing strict module RWX.

Updated the Kconfig entry to reflect this.

Also fixed a typo.

Signed-off-by: Russell Currey 
---
v3: no change
v2: no change
---
 arch/powerpc/Kconfig.debug  |  6 --
 arch/powerpc/mm/ptdump/ptdump.c | 21 -
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 0b063830eea8..e37960ef68c6 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -370,7 +370,7 @@ config PPC_PTDUMP
  If you are unsure, say N.
 
 config PPC_DEBUG_WX
-   bool "Warn on W+X mappings at boot"
+   bool "Warn on W+X mappings at boot & enable manual checks at runtime"
depends on PPC_PTDUMP && STRICT_KERNEL_RWX
help
  Generate a warning if any W+X mappings are found at boot.
@@ -384,7 +384,9 @@ config PPC_DEBUG_WX
  of other unfixed kernel bugs easier.
 
  There is no runtime or memory usage effect of this option
- once the kernel has booted up - it's a one time check.
+ once the kernel has booted up, it only automatically checks once.
+
+ Enables the "check_wx_pages" debugfs entry for checking at runtime.
 
  If in doubt, say "Y".
 
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 206156255247..a15e19a3b14e 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -4,7 +4,7 @@
  *
  * This traverses the kernel pagetables and dumps the
  * information about the used sections of memory to
- * /sys/kernel/debug/kernel_pagetables.
+ * /sys/kernel/debug/kernel_page_tables.
  *
  * Derived from the arm64 implementation:
  * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
@@ -413,6 +413,25 @@ void ptdump_check_wx(void)
else
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
 }
+
+static int check_wx_debugfs_set(void *data, u64 val)
+{
+   if (val != 1ULL)
+   return -EINVAL;
+
+   ptdump_check_wx();
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(check_wx_fops, NULL, check_wx_debugfs_set, "%llu\n");
+
+static int ptdump_check_wx_init(void)
+{
+   return debugfs_create_file("check_wx_pages", 0200, NULL,
+  NULL, &check_wx_fops) ? 0 : -ENOMEM;
+}
+device_initcall(ptdump_check_wx_init);
 #endif
 
 static int ptdump_init(void)
-- 
2.25.0



[PATCH v3 4/7] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX

2020-02-02 Thread Christophe Leroy
From: Russell Currey 

To enable strict module RWX on powerpc, set:

CONFIG_STRICT_MODULE_RWX=y

You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
security benefit.

ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
makes STRICT_MODULE_RWX *on by default* in configurations where
STRICT_KERNEL_RWX is *unavailable*.

Since this doesn't make much sense, and module RWX without kernel RWX
doesn't make much sense, having the same dependencies as kernel RWX
works around this problem.

Signed-off-by: Russell Currey 
---
v3: no change
v2: no change
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d0c6e7b7a62d..1a26b8526be0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
!HIBERNATION)
+   select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UACCESS_MCSAFE  if PPC64
-- 
2.25.0



[PATCH v3 5/7] powerpc/configs: Enable STRICT_MODULE_RWX in skiroot_defconfig

2020-02-02 Thread Christophe Leroy
From: Russell Currey 

skiroot_defconfig is the only powerpc defconfig with STRICT_KERNEL_RWX
enabled, and if you want memory protection for kernel text you'd want it
for modules too, so enable STRICT_MODULE_RWX there.

Acked-by: Joel Stanley 
Signed-off-by: Russell Currey 
---
v3: no change
v2: no change
---
 arch/powerpc/configs/skiroot_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/skiroot_defconfig 
b/arch/powerpc/configs/skiroot_defconfig
index 1b6bdad36b13..66d20dbe67b7 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -51,6 +51,7 @@ CONFIG_CMDLINE="console=tty0 console=hvc0 ipr.fast_reboot=1 
quiet"
 # CONFIG_PPC_MEM_KEYS is not set
 CONFIG_JUMP_LABEL=y
 CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_STRICT_MODULE_RWX=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_SIG_FORCE=y
-- 
2.25.0



[PATCH v3 7/7] powerpc/32: use set_memory_attr()

2020-02-02 Thread Christophe Leroy
Use set_memory_attr() instead of the PPC32 specific change_page_attr()

change_page_attr() was checking that the address was not mapped by
blocks and was handling highmem, but that's unneeded because the
affected pages can't be in highmem and block mapping verification
is already done by the callers.

Signed-off-by: Christophe Leroy 
---
v3: no change
v2: new
---
 arch/powerpc/mm/pgtable_32.c | 95 
 1 file changed, 10 insertions(+), 85 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5fb90edd865e..3d92eaf3ee2f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -121,99 +122,20 @@ void __init mapin_ram(void)
}
 }
 
-/* Scan the real Linux page tables and return a PTE pointer for
- * a virtual address in a context.
- * Returns true (1) if PTE was found, zero otherwise.  The pointer to
- * the PTE pointer is unmodified if PTE is not found.
- */
-static int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t 
**pmdp)
-{
-pgd_t  *pgd;
-   pud_t   *pud;
-pmd_t  *pmd;
-pte_t  *pte;
-int retval = 0;
-
-pgd = pgd_offset(mm, addr & PAGE_MASK);
-if (pgd) {
-   pud = pud_offset(pgd, addr & PAGE_MASK);
-   if (pud && pud_present(*pud)) {
-   pmd = pmd_offset(pud, addr & PAGE_MASK);
-   if (pmd_present(*pmd)) {
-   pte = pte_offset_map(pmd, addr & PAGE_MASK);
-   if (pte) {
-   retval = 1;
-   *ptep = pte;
-   if (pmdp)
-   *pmdp = pmd;
-   /* XXX caller needs to do pte_unmap, 
yuck */
-   }
-   }
-   }
-}
-return(retval);
-}
-
-static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
-{
-   pte_t *kpte;
-   pmd_t *kpmd;
-   unsigned long address;
-
-   BUG_ON(PageHighMem(page));
-   address = (unsigned long)page_address(page);
-
-   if (v_block_mapped(address))
-   return 0;
-   if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
-   return -EINVAL;
-   __set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
-   pte_unmap(kpte);
-
-   return 0;
-}
-
-/*
- * Change the page attributes of an page in the linear mapping.
- *
- * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
- */
-static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
-{
-   int i, err = 0;
-   unsigned long flags;
-   struct page *start = page;
-
-   local_irq_save(flags);
-   for (i = 0; i < numpages; i++, page++) {
-   err = __change_page_attr_noflush(page, prot);
-   if (err)
-   break;
-   }
-   wmb();
-   local_irq_restore(flags);
-   flush_tlb_kernel_range((unsigned long)page_address(start),
-  (unsigned long)page_address(page));
-   return err;
-}
-
 void mark_initmem_nx(void)
 {
-   struct page *page = virt_to_page(_sinittext);
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 PFN_DOWN((unsigned long)_sinittext);
 
if (v_block_mapped((unsigned long)_stext + 1))
mmu_mark_initmem_nx();
else
-   change_page_attr(page, numpages, PAGE_KERNEL);
+   set_memory_attr((unsigned long)_sinittext, numpages, 
PAGE_KERNEL);
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
 void mark_rodata_ro(void)
 {
-   struct page *page;
unsigned long numpages;
 
if (v_block_mapped((unsigned long)_sinittext)) {
@@ -222,20 +144,18 @@ void mark_rodata_ro(void)
return;
}
 
-   page = virt_to_page(_stext);
numpages = PFN_UP((unsigned long)_etext) -
   PFN_DOWN((unsigned long)_stext);
 
-   change_page_attr(page, numpages, PAGE_KERNEL_ROX);
+   set_memory_attr((unsigned long)_stext, numpages, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
 */
-   page = virt_to_page(__start_rodata);
numpages = PFN_UP((unsigned long)__init_begin) -
   PFN_DOWN((unsigned long)__start_rodata);
 
-   change_page_attr(page, numpages, PAGE_KERNEL_RO);
+   set_memory_attr((unsigned long)__start_rodata, numpages, 
PAGE_KERNEL_RO);
 
// mark_initmem_nx() should have already run by now
ptdump_check_wx();
@@ -245,9 +165,14 @@ void mark_rodata_ro(void)
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_

[PATCH v3 6/7] powerpc/mm: implement set_memory_attr()

2020-02-02 Thread Christophe Leroy
In addition to the set_memory_xx() functions, which allow changing
the memory attributes of not (yet) used memory regions, implement a
set_memory_attr() function to:
- set the final memory protection after init on currently used
kernel regions.
- enable/disable kernel memory regions in the scope of DEBUG_PAGEALLOC.

Unlike the set_memory_xx() functions, which can act in three steps because
the regions are unused, this function must modify pages 'on the fly', as
the kernel is executing from them. At the moment only PPC32 will use it,
and changing page attributes on the fly is not an issue there.

Signed-off-by: Christophe Leroy 
---
v3: no change
v2: new
---
 arch/powerpc/include/asm/set_memory.h |  2 ++
 arch/powerpc/mm/pageattr.c| 33 +++
 2 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/set_memory.h 
b/arch/powerpc/include/asm/set_memory.h
index 64011ea444b4..b040094f7920 100644
--- a/arch/powerpc/include/asm/set_memory.h
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -29,4 +29,6 @@ static inline int set_memory_x(unsigned long addr, int 
numpages)
return change_memory_attr(addr, numpages, SET_MEMORY_X);
 }
 
+int set_memory_attr(unsigned long addr, int numpages, pgprot_t prot);
+
 #endif
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 2b573768a7f7..b78a2a656dea 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -72,3 +72,36 @@ int change_memory_attr(unsigned long addr, int numpages, 
long action)
 
return apply_to_page_range(&init_mm, start, sz, change_page_attr, (void 
*)action);
 }
+
+/*
+ * Set the attributes of a page:
+ *
+ * This function is used by PPC32 at the end of init to set final kernel memory
+ * protection. It includes changing the mapping of the page it is executing from
+ * and data pages it is using.
+ */
+static int set_page_attr(pte_t *ptep, unsigned long addr, void *data)
+{
+   pgprot_t prot = __pgprot((unsigned long)data);
+
+   spin_lock(&init_mm.page_table_lock);
+
+   set_pte_at(&init_mm, addr, ptep, pte_modify(*ptep, prot));
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
+   spin_unlock(&init_mm.page_table_lock);
+
+   return 0;
+}
+
+int set_memory_attr(unsigned long addr, int numpages, pgprot_t prot)
+{
+   unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
+   unsigned long sz = numpages * PAGE_SIZE;
+
+   if (!numpages)
+   return 0;
+
+   return apply_to_page_range(&init_mm, start, sz, set_page_attr,
+  (void *)pgprot_val(prot));
+}
-- 
2.25.0