[PATCH] powerpc/mm: Fix build error with FLATMEM book3s64 config

2019-04-02 Thread Aneesh Kumar K.V
The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs.
We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM, and 32
bit configs never expected a value to be set for MAX_PHYSMEM_BITS.

Dependent code such as zsmalloc derived the right values based on other
fields. Instead of finding a value that works with different configs,
use new values only for book3s_64. For 64 bit booke, use the definition
of MAX_PHYSMEM_BITS as per commit a7df61a0e2b6 ("[PATCH] ppc64: Increase
sparsemem defaults"). That change was done in 2005 and hopefully will
work with book3e 64.

Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 15 +++
 arch/powerpc/include/asm/mmu.h   | 15 ---
 arch/powerpc/include/asm/nohash/64/mmu.h |  2 ++
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1ceee000c18d..a809bdd77322 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -35,6 +35,21 @@ typedef pte_t *pgtable_t;
 
 #endif /* __ASSEMBLY__ */
 
+/*
+ * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
+ * page_to_nid does a page->section->node lookup
+ * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
+ * memory requirements with large number of sections.
+ * 51 bits is the max physical real address on POWER9
+ */
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
+	defined(CONFIG_PPC_64K_PAGES)
+#define MAX_PHYSMEM_BITS 51
+#else
+#define MAX_PHYSMEM_BITS 46
+#endif
+
 /* 64-bit classic hash table MMU */
 #include 
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 598cdcdd1355..78d53c4396ac 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,21 +341,6 @@ static inline bool strict_kernel_rwx_enabled(void)
  */
 #define MMU_PAGE_COUNT 16
 
-/*
- * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
- * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
- * page_to_nid does a page->section->node lookup
- * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
- * memory requirements with large number of sections.
- * 51 bits is the max physical real address on POWER9
- */
-#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
-	defined (CONFIG_PPC_64K_PAGES)
-#define MAX_PHYSMEM_BITS	51
-#elif defined(CONFIG_SPARSEMEM)
-#define MAX_PHYSMEM_BITS	46
-#endif
-
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
 #else /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h
index e6585480dfc4..81cf30c370e5 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
 #define _ASM_POWERPC_NOHASH_64_MMU_H_
 
+#define MAX_PHYSMEM_BITS	44
+
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
-- 
2.20.1



Re: VLC doesn't play videos anymore since the PowerPC fixes 5.1-3

2019-04-02 Thread Christophe Leroy

On 03/04/2019 at 05:52, Christian Zigotzky wrote:

Please test VLC with the RC3 of kernel 5.1.

Removing the PowerPC fixes 5.1-3 has solved the VLC issue. Another user
has already confirmed that [1]. This isn't an April Fools' joke. ;-)


Could you bisect to identify the guilty commit?

Thanks
Christophe



Thanks

[1] 
http://forum.hyperion-entertainment.com/viewtopic.php?f=58&t=4256&start=20#p47561



Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 22:41, Daniel Jordan wrote:

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.


Could you please detail the change? It looks like this is not the only
change. I'm wondering what the consequences are.


Before we did:
- lock
- calculate future value
- check the future value is acceptable
- update the value if the future value is acceptable
- return an error if the future value is not acceptable
- unlock

Now we do:
- atomic update with the future (possibly too high) value
- check the new value is acceptable
- atomic update back with the older value and return an error if the new
value is not acceptable


So if a concurrent action wants to increase locked_vm with an acceptable 
step while another one has temporarily set it too high, it will now fail.


I think we should keep the previous approach and do a cmpxchg after 
validating the new value.
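
A rough, untested sketch of that validate-then-cmpxchg loop (illustration only, not code from this series; pr_debug and exact limit handling glossed over):

static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
				      unsigned long npages, bool incr)
{
	s64 old, new;
	s64 lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

	if (!npages)
		return 0;

	do {
		old = atomic64_read(&mm->locked_vm);
		if (incr) {
			new = old + npages;
			/* validate before publishing the new value */
			if (new > lock_limit && !capable(CAP_IPC_LOCK))
				return -ENOMEM;
		} else {
			new = old - npages;
			if (WARN_ON_ONCE(new < 0))
				new = 0;
		}
	} while (atomic64_cmpxchg(&mm->locked_vm, old, new) != old);

	return 0;
}

That way no other task ever observes a transiently over-limit locked_vm.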


Christophe



Signed-off-by: Daniel Jordan 
Cc: Alexey Kardashevskiy 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Davidlohr Bueso 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: 
Cc: 
Cc: 
---
  arch/powerpc/mm/mmu_context_iommu.c | 27 +++
  1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 8038ac24a312..a4ef22b67c07 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t {
  static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
unsigned long npages, bool incr)
  {
-   long ret = 0, locked, lock_limit;
+   long ret = 0;
+   unsigned long lock_limit;
s64 locked_vm;
  
  	if (!npages)

return 0;
  
-	down_write(&mm->mmap_sem);

-   locked_vm = atomic64_read(&mm->locked_vm);
if (incr) {
-   locked = locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+   locked_vm = atomic64_add_return(npages, &mm->locked_vm);
+   if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
ret = -ENOMEM;
-   else
-   atomic64_add(npages, &mm->locked_vm);
+   atomic64_sub(npages, &mm->locked_vm);
+   }
} else {
-   if (WARN_ON_ONCE(npages > locked_vm))
-   npages = locked_vm;
-   atomic64_sub(npages, &mm->locked_vm);
+   locked_vm = atomic64_sub_return(npages, &mm->locked_vm);
+   WARN_ON_ONCE(locked_vm < 0);
}
  
-	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",

-   current ? current->pid : 0,
-   incr ? '+' : '-',
-   npages << PAGE_SHIFT,
-   atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
+   pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n",
+   current ? current->pid : 0, incr ? '+' : '-',
+   npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK));
-   up_write(&mm->mmap_sem);
  
  	return ret;

  }



Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 22:41, Daniel Jordan wrote:

Taking and dropping mmap_sem to modify a single counter, locked_vm, is
overkill when the counter could be synchronized separately.

Make mmap_sem a little less coarse by changing locked_vm to an atomic,
the 64-bit variety to avoid issues with overflow on 32-bit systems.


Can you elaborate on the above? Previously it was 'unsigned long'; what
were the issues? If there were such issues, shouldn't there be a first
patch moving it from unsigned long to u64 before this atomic64_t change?
Or at least it should be clearly explained here what the issues are and
how switching to a 64 bit counter fixes them.


Christophe



Signed-off-by: Daniel Jordan 
Cc: Alan Tull 
Cc: Alexey Kardashevskiy 
Cc: Alex Williamson 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Davidlohr Bueso 
Cc: Michael Ellerman 
Cc: Moritz Fischer 
Cc: Paul Mackerras 
Cc: Wu Hao 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
---
  arch/powerpc/kvm/book3s_64_vio.c| 14 --
  arch/powerpc/mm/mmu_context_iommu.c | 15 ---
  drivers/fpga/dfl-afu-dma-region.c   | 18 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 17 +
  drivers/vfio/vfio_iommu_type1.c | 10 ++
  fs/proc/task_mmu.c  |  2 +-
  include/linux/mm_types.h|  2 +-
  kernel/fork.c   |  2 +-
  mm/debug.c  |  5 +++--
  mm/mlock.c  |  4 ++--
  mm/mmap.c   | 18 +-
  mm/mremap.c |  6 +++---
  12 files changed, 61 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index f02b04973710..e7fdb6d10eeb 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
  {
long ret = 0;
+   s64 locked_vm;
  
  	if (!current || !current->mm)

return ret; /* process exited */
  
  	down_write(¤t->mm->mmap_sem);
  
+	locked_vm = atomic64_read(¤t->mm->locked_vm);

if (inc) {
unsigned long locked, lock_limit;
  
-		locked = current->mm->locked_vm + stt_pages;

+   locked = locked_vm + stt_pages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
ret = -ENOMEM;
else
-   current->mm->locked_vm += stt_pages;
+   atomic64_add(stt_pages, &current->mm->locked_vm);
} else {
-   if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
-   stt_pages = current->mm->locked_vm;
+   if (WARN_ON_ONCE(stt_pages > locked_vm))
+   stt_pages = locked_vm;
  
-		current->mm->locked_vm -= stt_pages;

+   atomic64_sub(stt_pages, &current->mm->locked_vm);
}
  
  	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,

inc ? '+' : '-',
stt_pages << PAGE_SHIFT,
-   current->mm->locked_vm << PAGE_SHIFT,
+   atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK),
ret ? " - exceeded" : "");
  
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c

index e7a9c4f6bfca..8038ac24a312 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
unsigned long npages, bool incr)
  {
long ret = 0, locked, lock_limit;
+   s64 locked_vm;
  
  	if (!npages)

return 0;
  
  	down_write(&mm->mmap_sem);

-
+   locked_vm = atomic64_read(&mm->locked_vm);
if (incr) {
-   locked = mm->locked_vm + npages;
+   locked = locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
ret = -ENOMEM;
else
-   mm->locked_vm += npages;
+   atomic64_add(npages, &mm->locked_vm);
} else {
-   if (WARN_ON_ONCE(npages > mm->locked_vm))
-   npages = mm->locked_vm;
-   mm->locked_vm -= npages;
+   if (WARN_ON_ONCE(npages > locked_vm))
+   npages = locked_vm;
+   atomic64_sub(npages, &mm->locked_vm);
}
  
  	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",

current ? current->pid : 0,
incr ? '+' : '-',
npages << PAGE_SHIFT,
- 

Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E

2019-04-02 Thread Joakim Tjernlund
On Wed, 2019-04-03 at 11:53 +1100, Michael Ellerman wrote:
> 
> Joakim Tjernlund  writes:
> > On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote:
> > > Joakim Tjernlund  writes:
> ...
> > > > Can I compile it away?
> > > 
> > > You can't actually, but you can disable it at runtime with
> > > "nospectre_v1" on the kernel command line.
> > > 
> > > We could make it a user selectable compile time option if you really
> > > want it to be.
> > 
> > I think yes. Considering that these patches are fairly untested and the 
> > impact
> > in the wild unknown. Requiring systems to change their boot config over 
> > night is
> > too fast.
> 
> OK. Just to be clear, you're actually using 4.14 on an NXP board and
> would actually use this option? I don't want to add another option just
> for a theoretical use case.

Correct, we use 4.14 on several custom boards using NXP CPUs and would
appreciate being able to control Spectre mitigations with a build switch.

Thanks a lot!
   Jocke


Re: [PATCH] powerpc/xmon: add read-only mode

2019-04-02 Thread Christophe Leroy




On 03/04/2019 at 05:38, Christopher M Riedl wrote:

On March 29, 2019 at 3:41 AM Christophe Leroy  wrote:




On 29/03/2019 at 05:21, cmr wrote:

Operations which write to memory should be restricted on secure systems
and optionally to avoid self-destructive behaviors.

Add a config option, XMON_RO, to control default xmon behavior along
with kernel cmdline options xmon=ro and xmon=rw for explicit control.
The default is to enable read-only mode.

The following xmon operations are affected:
memops:
disable memmove
disable memset
memex:
no-op'd mwrite
super_regs:
no-op'd write_spr
bpt_cmds:
disable
proc_call:
disable

Signed-off-by: cmr 


A fully qualified name should be used.


What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY?


I mean it should be

Signed-off-by: Christopher M Riedl 

instead of

Signed-off-by: cmr 






---
   arch/powerpc/Kconfig.debug |  7 +++
   arch/powerpc/xmon/xmon.c   | 24 
   2 files changed, 31 insertions(+)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 4e00cb0a5464..33cc01adf4cb 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -117,6 +117,13 @@ config XMON_DISASSEMBLY
  to say Y here, unless you're building for a memory-constrained
  system.
   
+config XMON_RO

+   bool "Set xmon read-only mode"
+   depends on XMON
+   default y


Should it really always be default y?
I would set default 'y' only when some security options are also set.



This is a good point. I based this on an internal Slack suggestion, but giving
this more thought, disabling read-only mode by default makes more sense. I'm
not sure what security options could be set though?



Maybe starting with CONFIG_STRICT_KERNEL_RWX

Another point that may also be addressed by your patch is the definition 
of PAGE_KERNEL_TEXT:


#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#endif

The above makes me think that it would be better if you add a config
XMON_RW instead of XMON_RO, with default !STRICT_KERNEL_RWX, i.e. something
like the sketch below.
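
Untested sketch of that Kconfig entry (name and wording are only an illustration of the suggestion, not an actual patch):

config XMON_RW
	bool "Allow xmon to write to memory and registers"
	depends on XMON
	default !STRICT_KERNEL_RWX

That keeps today's behaviour on kernels that don't enforce strict kernel RWX, and defaults xmon to read-only where they do.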


Christophe


[PATCH kernel v2 1/2] powerpc/mm_iommu: Fix potential deadlock

2019-04-02 Thread Alexey Kardashevskiy
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
this locks &mem_list_mutex and then locks mm::mmap_sem
several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
this is called with mm::mmap_sem held already and it locks
&mem_list_mutex.

So one can craft a userspace program to do special ioctl and mmap in
2 threads concurrently and cause a deadlock which lockdep warns about
(below).

We did not hit this yet because QEMU constructs the machine in a single
thread.

This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.

This moves locked_vm adjustment from under &mem_list_mutex.

This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.

This is one of the lockdep warnings:
==
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
--
qemu-system-ppc/8038 is trying to acquire lock:
2ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490

but task is already holding lock:
fd7da97f (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xf0/0x160

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){}:
   lock_acquire+0xf8/0x260
   down_write+0x44/0xa0
   mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
   mm_iommu_do_alloc+0x310/0x490
   tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
   vfio_fops_unl_ioctl+0x94/0x430 [vfio]
   do_vfs_ioctl+0xe4/0x930
   ksys_ioctl+0xc4/0x110
   sys_ioctl+0x28/0x80
   system_call+0x5c/0x70

-> #0 (mem_list_mutex){+.+.}:
   __lock_acquire+0x1484/0x1900
   lock_acquire+0xf8/0x260
   __mutex_lock+0x88/0xa70
   mm_iommu_do_alloc+0x70/0x490
   vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
   vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
   vfio_device_fops_mmap+0x44/0x70 [vfio]
   mmap_region+0x5d4/0x770
   do_mmap+0x42c/0x650
   vm_mmap_pgoff+0x124/0x160
   ksys_mmap_pgoff+0xdc/0x2f0
   sys_mmap+0x40/0x80
   system_call+0x5c/0x70

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(mem_list_mutex);
                               lock(&mm->mmap_sem);
  lock(mem_list_mutex);

 *** DEADLOCK ***

1 lock held by qemu-system-ppc/8038:
 #0: fd7da97f (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xf0/0x160

Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/mm/mmu_context_iommu.c | 75 +++--
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e7a9c4f6bfca..9d9be850f8c2 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -95,28 +95,14 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
  unsigned long entries, unsigned long dev_hpa,
  struct mm_iommu_table_group_mem_t **pmem)
 {
-   struct mm_iommu_table_group_mem_t *mem;
-   long i, ret, locked_entries = 0;
+   struct mm_iommu_table_group_mem_t *mem, *mem2;
+   long i, ret, locked_entries = 0, pinned = 0;
unsigned int pageshift;
 
-   mutex_lock(&mem_list_mutex);
-
-   list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list,
-   next) {
-   /* Overlap? */
-   if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
-   (ua < (mem->ua +
-			   (mem->entries << PAGE_SHIFT)))) {
-   ret = -EINVAL;
-   goto unlock_exit;
-   }
-
-   }
-
if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
ret = mm_iommu_adjust_locked_vm(mm, entries, true);
if (ret)
-   goto unlock_exit;
+   return ret;
 
locked_entries = entries;
}
@@ -150,15 +136,10 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
down_read(&mm->mmap_sem);
	ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
up_read(&mm->mmap_sem);
+   pinned = ret > 0 ? ret : 0;
if (ret != entries) {
-   /* free the reference taken */
-   for (i = 0; i < ret; i++)
-   put_page(mem->hpages[i]);
-
-   vfree(mem->hpas);
-   kfree(mem);
ret

[PATCH kernel v2 2/2] powerpc/mm_iommu: Allow pinning large regions

2019-04-02 Thread Alexey Kardashevskiy
When called with vmas_arg==NULL, get_user_pages_longterm() allocates
an array of nr_pages*8 bytes which can easily exceed the max order;
for example, registering memory for a 256GB guest does this and fails
in __alloc_pages_nodemask().

This adds a loop over chunks of entries to fit the max order limit.
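
For scale, an illustrative calculation (numbers are mine, not from the patch; assumes 64K pages and the typical powerpc MAX_ORDER of 9):

	entries for 256GB    = 256GB / 64KB              = 4M
	vmas array           = 4M entries * 8 bytes      = 32MB
	largest single alloc = 2^(MAX_ORDER - 1) * 64KB  = 16MB

so a single get_user_pages_longterm() call cannot allocate its internal vmas array, while chunking keeps each call comfortably under the limit.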

Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/mm/mmu_context_iommu.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 9d9be850f8c2..8330f135294f 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -98,6 +98,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
struct mm_iommu_table_group_mem_t *mem, *mem2;
long i, ret, locked_entries = 0, pinned = 0;
unsigned int pageshift;
+   unsigned long entry, chunk;
 
if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
ret = mm_iommu_adjust_locked_vm(mm, entries, true);
@@ -134,11 +135,26 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
}
 
down_read(&mm->mmap_sem);
-	ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
+   chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
+   sizeof(struct vm_area_struct *);
+   chunk = min(chunk, entries);
+   for (entry = 0; entry < entries; entry += chunk) {
+   unsigned long n = min(entries - entry, chunk);
+
+   ret = get_user_pages_longterm(ua + (entry << PAGE_SHIFT), n,
+   FOLL_WRITE, mem->hpages + entry, NULL);
+   if (ret == n) {
+   pinned += n;
+   continue;
+   }
+   if (ret > 0)
+   pinned += ret;
+   break;
+   }
up_read(&mm->mmap_sem);
-   pinned = ret > 0 ? ret : 0;
-   if (ret != entries) {
-   ret = -EFAULT;
+   if (pinned != entries) {
+   if (!ret)
+   ret = -EFAULT;
goto free_exit;
}
 
-- 
2.17.1



[PATCH kernel v2 0/2] powerpc/mm_iommu: Fixes

2019-04-02 Thread Alexey Kardashevskiy
The patches do independent things but touch exact same code so
the order in which they should apply matters.

This supercedes:
[PATCH kernel] powerpc/mm_iommu: Allow pinning large regions
[PATCH kernel 1/2] powerpc/mm_iommu: Prepare for less locking
[PATCH kernel 2/2] powerpc/mm_iommu: Fix potential deadlock


This is based on sha1
5e7a8ca31926 Linus Torvalds "Merge branch 'work.aio' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/mm_iommu: Fix potential deadlock
  powerpc/mm_iommu: Allow pinning large regions

 arch/powerpc/mm/mmu_context_iommu.c | 97 +
 1 file changed, 58 insertions(+), 39 deletions(-)

-- 
2.17.1



VLC doesn't play videos anymore since the PowerPC fixes 5.1-3

2019-04-02 Thread Christian Zigotzky
Please test VLC with the RC3 of kernel 5.1.

Removing the PowerPC fixes 5.1-3 has solved the VLC issue. Another user
has already confirmed that [1]. This isn't an April Fools' joke. ;-)

Thanks

[1] 
http://forum.hyperion-entertainment.com/viewtopic.php?f=58&t=4256&start=20#p47561

Re: [PATCH] powerpc/xmon: add read-only mode

2019-04-02 Thread Christopher M Riedl
> On March 29, 2019 at 3:41 AM Christophe Leroy  wrote:
> 
> 
> 
> 
> On 29/03/2019 at 05:21, cmr wrote:
> > Operations which write to memory should be restricted on secure systems
> > and optionally to avoid self-destructive behaviors.
> > 
> > Add a config option, XMON_RO, to control default xmon behavior along
> > with kernel cmdline options xmon=ro and xmon=rw for explicit control.
> > The default is to enable read-only mode.
> > 
> > The following xmon operations are affected:
> > memops:
> > disable memmove
> > disable memset
> > memex:
> > no-op'd mwrite
> > super_regs:
> > no-op'd write_spr
> > bpt_cmds:
> > disable
> > proc_call:
> > disable
> > 
> > Signed-off-by: cmr 
> 
> A fully qualified name should be used.

What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY?

> 
> > ---
> >   arch/powerpc/Kconfig.debug |  7 +++
> >   arch/powerpc/xmon/xmon.c   | 24 
> >   2 files changed, 31 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> > index 4e00cb0a5464..33cc01adf4cb 100644
> > --- a/arch/powerpc/Kconfig.debug
> > +++ b/arch/powerpc/Kconfig.debug
> > @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY
> >   to say Y here, unless you're building for a memory-constrained
> >   system.
> >   
> > +config XMON_RO
> > +   bool "Set xmon read-only mode"
> > +   depends on XMON
> > +   default y
> 
> Should it really always be default y?
> I would set default 'y' only when some security options are also set.
> 

This is a good point. I based this on an internal Slack suggestion, but giving
this more thought, disabling read-only mode by default makes more sense. I'm
not sure what security options could be set though?


Re: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

2019-04-02 Thread Ravi Bangoria



On 4/2/19 4:55 PM, Nicholas Piggin wrote:
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
> 
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
> 
> Reported-by: Ravikumar Bangoria 

Reported-by: Ravi Bangoria 

Thanks,
Ravi
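
For readers unfamiliar with the quoted fix: a per-CPU hrtimer is driven by the local CPU's timer interrupt rather than by the jiffies tick, so each CPU's heartbeat keeps firing even if tick_do_timer_cpu is stuck. A rough, untested sketch of the shape of it (wd_hrtimer, wd_timer_period_ms and watchdog_timer_interrupt() are illustrative names, not necessarily what the actual patch uses):

static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);

static enum hrtimer_restart wd_hrtimer_fn(struct hrtimer *hrtimer)
{
	/* per-CPU heartbeat: touch this CPU's state, check the other CPUs */
	watchdog_timer_interrupt(smp_processor_id());

	hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
	return HRTIMER_RESTART;
}

/* called on each online CPU, e.g. from a cpuhp online callback */
static void wd_start_heartbeat(void)
{
	struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);

	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
	hrtimer->function = wd_hrtimer_fn;
	hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
		      HRTIMER_MODE_REL_PINNED);
}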



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-02 Thread Michael Ellerman
Arnd Bergmann  writes:
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index b18abb0c3dae..00f5a63c8d9a 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -505,3 +505,7 @@
>  421	32	rt_sigtimedwait_time64		sys_rt_sigtimedwait		compat_sys_rt_sigtimedwait_time64
>  422	32	futex_time64			sys_futex			sys_futex
>  423	32	sched_rr_get_interval_time64	sys_sched_rr_get_interval	sys_sched_rr_get_interval
> +424  common  pidfd_send_signal   sys_pidfd_send_signal
> +425  common  io_uring_setup  sys_io_uring_setup
> +426  common  io_uring_enter  sys_io_uring_enter
> +427  common  io_uring_register   sys_io_uring_register

Acked-by: Michael Ellerman  (powerpc)

Lightly tested.

The pidfd_test selftest passes.

Ran the io_uring example from fio, which prints lots of:

IOPS=209952, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=116 (116), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=112 (112), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=110 (110), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=104 (104), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=102 (102), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=100 (100), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/31, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=125 (125), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=119 (119), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=111 (111), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=108 (108), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=107 (107), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%

Which is good I think?


cheers


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-02 Thread Michael Ellerman
Arnd Bergmann  writes:
> On Sun, Mar 31, 2019 at 5:47 PM Michael Ellerman  wrote:
>>
>> Arnd Bergmann  writes:
>> > Add the io_uring and pidfd_send_signal system calls to all architectures.
>> >
>> > These system calls are designed to handle both native and compat tasks,
>> > so all entries are the same across architectures, only arm-compat and
> > the generic table still use an old format.
>> >
>> > Signed-off-by: Arnd Bergmann 
>> > ---
>> >  arch/alpha/kernel/syscalls/syscall.tbl  | 4 
>> >  arch/arm/tools/syscall.tbl  | 4 
>> >  arch/arm64/include/asm/unistd.h | 2 +-
>> >  arch/arm64/include/asm/unistd32.h   | 8 
>> >  arch/ia64/kernel/syscalls/syscall.tbl   | 4 
>> >  arch/m68k/kernel/syscalls/syscall.tbl   | 4 
>> >  arch/microblaze/kernel/syscalls/syscall.tbl | 4 
>> >  arch/mips/kernel/syscalls/syscall_n32.tbl   | 4 
>> >  arch/mips/kernel/syscalls/syscall_n64.tbl   | 4 
>> >  arch/mips/kernel/syscalls/syscall_o32.tbl   | 4 
>> >  arch/parisc/kernel/syscalls/syscall.tbl | 4 
>> >  arch/powerpc/kernel/syscalls/syscall.tbl| 4 
>>
>> Have you done any testing?
>>
>> I'd rather not wire up syscalls that have never been tested at all on
>> powerpc.
>
> No, I have not. I did review the system calls carefully and added the first
> patch to fix the bug on x86 compat mode before adding the same bug
> on the other compat architectures though ;-)
>
> Generally, my feeling is that adding system calls is not fundamentally
> different from adding other ABIs, and we should really do it at
> the same time across all architectures, rather than waiting for each
> maintainer to get around to reviewing and testing the new calls
> first. This is not a problem on powerpc, but a lot of other architectures
> are less active, which is how we have always ended up with
> different sets of system calls across architectures.

Well it's still something of a problem on powerpc. No one has
volunteered to test io_uring on powerpc, so at this stage it will go in
completely untested.

If there was a selftest in the tree I'd be a bit happier, because at
least then our CI would start testing it as soon as the syscalls were
wired up in linux-next.
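
Even a trivial probe would help here, e.g. a userspace check that the syscall is wired up at all (a sketch only, not an existing selftest; 425 is the io_uring_setup number from the table in this patch, and the 5.1 uapi <linux/io_uring.h> header is assumed to be installed):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

int main(void)
{
	struct io_uring_params p;
	long fd;

	memset(&p, 0, sizeof(p));
	fd = syscall(425, 4, &p);	/* io_uring_setup(4, &p) */

	if (fd < 0 && errno == ENOSYS) {
		printf("io_uring_setup: not wired up (ENOSYS)\n");
		return 1;
	}
	printf("io_uring_setup: returned %ld (errno %d)\n", fd, fd < 0 ? errno : 0);
	if (fd >= 0)
		close(fd);
	return 0;
}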

And yeah obviously I should test it, but I don't have infinite time
unfortunately.

> The problem here is that this makes it harder for the C library to
> know when a system call is guaranteed to be available. glibc
> still needs a feature test for newly added syscalls to see if they
> are working (they might be backported to an older kernel, or
> disabled), but whenever the minimum kernel version is increased,
> it makes sense to drop those checks and assume non-optional
> system calls will work if they were part of that minimum version.

But that's the thing, if we just wire them up untested they may not
actually work. And then you have the far worse situation where the
syscall exists in kernel version x but does not actually work properly.

See the mess we have with pkeys for example.

> In the future, I'd hope that any new system calls get added
> right away on all architectures when they land (it was a bit
> tricky this time, because I still did a bunch of reworks that
> conflicted with the new calls). Bugs will happen of course, but
> I think adding them sooner makes it more likely to catch those
> bugs early on so we have a chance to fix them properly,
> and need fewer arch specific workarounds (ideally none)
> for system calls.

For syscalls that have a selftest in the tree, and don't rely on
anything arch specific I agree.

I'm a bit more wary of things that are not easily tested and have the
potential to work differently across arches.

cheers


Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E

2019-04-02 Thread Michael Ellerman
Joakim Tjernlund  writes:
> On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote:
>> Joakim Tjernlund  writes:
...
>> 
>> > Can I compile it away?
>> 
>> You can't actually, but you can disable it at runtime with
>> "nospectre_v1" on the kernel command line.
>> 
>> We could make it a user selectable compile time option if you really
>> want it to be.
>
> I think yes. Considering that these patches are fairly untested and the impact
> in the wild unknown. Requiring systems to change their boot config over night 
> is
> too fast.

OK. Just to be clear, you're actually using 4.14 on an NXP board and
would actually use this option? I don't want to add another option just
for a theoretical use case.

cheers


[PATCH] powerpc: config: skiroot: Add (back) MLX5 ethernet support

2019-04-02 Thread Joel Stanley
It turns out that some defconfig changes and kernel config option
changes meant we accidentally dropped Ethernet support for Mellanox CX5
cards.

Reported-by: Carol L Soto 
Suggested-by: Carol L Soto 
Signed-off-by: Stewart Smith 
Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/skiroot_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index 5ba131c30f6b..6038b9347d9e 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -163,6 +163,8 @@ CONFIG_S2IO=m
 CONFIG_MLX4_EN=m
 # CONFIG_MLX4_CORE_GEN2 is not set
 CONFIG_MLX5_CORE=m
+CONFIG_MLX5_CORE_EN=y
+# CONFIG_MLX5_EN_RXNFC is not set
 # CONFIG_NET_VENDOR_MICREL is not set
 # CONFIG_NET_VENDOR_MICROSEMI is not set
 CONFIG_MYRI10GE=m
-- 
2.20.1



Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-02 Thread Davidlohr Bueso

On Tue, 02 Apr 2019, Andrew Morton wrote:


Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
thinking that the benefit of removing a few mmap_sem-takings from a few
obscure drivers (sorry ;)) is pretty small.


afaik porting the remaining incorrect users of locked_vm to pinned_vm was
the next step before this one, which made converting locked_vm to atomic
hardly worth it. Daniel?

Thanks,
Davidlohr


Re: [PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-02 Thread Claudio Carvalho


On 4/2/19 6:51 PM, Matthew Garrett wrote:
> On Tue, Apr 2, 2019 at 2:11 PM Claudio Carvalho  wrote:
>> We want to use the efivarfs for compatibility with existing userspace
>> tools. We will track and match any EFI changes that affect us.
> So you implement the full PK/KEK/db/dbx/dbt infrastructure, and
> updates are signed in the same way?

For the first version, our firmware will implement a simplistic PK, KEK and
db infrastructure (without dbx and dbt) where only the Setup and User modes
will be supported.

PK, KEK and db updates will be signed the same way, that is, using
userspace tooling like efitools in PowerNV. As for the authentication
descriptors, only the EFI_VARIABLE_AUTHENTICATION_2 descriptor will be
supported.


>> Our use case is restricted to secure boot - this is not going to be a
>> general purpose EFI variable implementation.
> In that case we might be better off with a generic interface for this
> purpose that we can expose on all platforms that implement a secure
> boot key hierarchy. Having an efivarfs that doesn't allow the creation
> of arbitrary attributes may break other existing userland
> expectations.
>
For what it's worth, gsmi uses the efivars infrastructure for EFI-like
variables.

What might a generic interface look like?  It would have to work for
existing secure boot solutions - including EFI - which would seem to imply
changes to userspace tools.

Claudio




Re: [PATCH v1 2/4] soc/fsl/guts: Add definition for LX2160A

2019-04-02 Thread Li Yang
On Tue, Feb 26, 2019 at 4:12 AM Vabhav Sharma  wrote:
>
> Adding compatible string "lx2160a-dcfg" to
> initialize guts driver for lx2160 and SoC die
> attribute definition for LX2160A


Applied to branch next.  Thanks.

Regards,
Leo
>
> Signed-off-by: Vabhav Sharma 
> Signed-off-by: Yinbo Zhu 
> Acked-by: Li Yang 
> ---
>  drivers/soc/fsl/guts.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> index 302e0c8..bcab1ee 100644
> --- a/drivers/soc/fsl/guts.c
> +++ b/drivers/soc/fsl/guts.c
> @@ -100,6 +100,11 @@ static const struct fsl_soc_die_attr fsl_soc_die[] = {
>   .svr  = 0x8700,
>   .mask = 0xfff7,
> },
> +   /* Die: LX2160A, SoC: LX2160A/LX2120A/LX2080A */
> +   { .die  = "LX2160A",
> + .svr  = 0x8736,
> + .mask = 0xff3f,
> +   },
> { },
>  };
>
> @@ -222,6 +227,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
> { .compatible = "fsl,ls1088a-dcfg", },
> { .compatible = "fsl,ls1012a-dcfg", },
> { .compatible = "fsl,ls1046a-dcfg", },
> +   { .compatible = "fsl,lx2160a-dcfg", },
> {}
>  };
>  MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
> --
> 2.7.4
>


Re: [PATCH] soc/fsl/qe: Fix an error code in qe_pin_request()

2019-04-02 Thread Li Yang
On Thu, Mar 28, 2019 at 9:21 AM Dan Carpenter  wrote:
>
> We forgot to set "err" on this error path.
>
> Fixes: 1a2d397a6eb5 ("gpio/powerpc: Eliminate duplication of of_get_named_gpio_flags()")
> Signed-off-by: Dan Carpenter 

Applied to fix branch.  Thanks.

Regards,
Leo
> ---
>  drivers/soc/fsl/qe/gpio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/soc/fsl/qe/gpio.c b/drivers/soc/fsl/qe/gpio.c
> index 819bed0f5667..51b3a47b5a55 100644
> --- a/drivers/soc/fsl/qe/gpio.c
> +++ b/drivers/soc/fsl/qe/gpio.c
> @@ -179,8 +179,10 @@ struct qe_pin *qe_pin_request(struct device_node *np, int index)
> if (err < 0)
> goto err0;
> gc = gpio_to_chip(err);
> -   if (WARN_ON(!gc))
> +   if (WARN_ON(!gc)) {
> +   err = -ENODEV;
> goto err0;
> +   }
>
> if (!of_device_is_compatible(gc->of_node, 
> "fsl,mpc8323-qe-pario-bank")) {
> pr_debug("%s: tried to get a non-qe pin\n", __func__);
> --
> 2.17.1
>


Re: [PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-02 Thread Matthew Garrett
On Tue, Apr 2, 2019 at 2:11 PM Claudio Carvalho  wrote:
> We want to use the efivarfs for compatibility with existing userspace
> tools. We will track and match any EFI changes that affect us.

So you implement the full PK/KEK/db/dbx/dbt infrastructure, and
updates are signed in the same way?

> Our use case is restricted to secure boot - this is not going to be a
> general purpose EFI variable implementation.

In that case we might be better off with a generic interface for this
purpose that we can expose on all platforms that implement a secure
boot key hierarchy. Having an efivarfs that doesn't allow the creation
of arbitrary attributes may break other existing userland
expectations.


[PATCH 0/6] convert locked_vm from unsigned long to atomic64_t

2019-04-02 Thread Daniel Jordan
Hi,

From patch 1:

  Taking and dropping mmap_sem to modify a single counter, locked_vm, is
  overkill when the counter could be synchronized separately.
  
  Make mmap_sem a little less coarse by changing locked_vm to an atomic,
  the 64-bit variety to avoid issues with overflow on 32-bit systems.

This is a more conservative alternative to [1] with no user-visible
effects.  Thanks to Alexey Kardashevskiy for pointing out the racy
atomics and to Alex Williamson, Christoph Lameter, Ira Weiny, and Jason
Gunthorpe for their comments on [1].

Davidlohr Bueso recently did a similar conversion for pinned_vm[2].

Testing
 1. passes LTP mlock[all], munlock[all], fork, mmap, and mremap tests in an
x86 kvm guest
 2. a VFIO-enabled x86 kvm guest shows the same VmLck in
/proc/pid/status before and after this change
 3. cross-compiles on powerpc

The series is based on v5.1-rc3.  Please consider for 5.2.

Daniel

[1] 
https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jor...@oracle.com/
[2] https://lore.kernel.org/linux-mm/20190206175920.31082-1-d...@stgolabs.net/

Daniel Jordan (6):
  mm: change locked_vm's type from unsigned long to atomic64_t
  vfio/type1: drop mmap_sem now that locked_vm is atomic
  vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic
  fpga/dlf/afu: drop mmap_sem now that locked_vm is atomic
  powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  kvm/book3s: drop mmap_sem now that locked_vm is atomic

 arch/powerpc/kvm/book3s_64_vio.c| 34 ++--
 arch/powerpc/mm/mmu_context_iommu.c | 28 +---
 drivers/fpga/dfl-afu-dma-region.c   | 40 -
 drivers/vfio/vfio_iommu_spapr_tce.c | 37 --
 drivers/vfio/vfio_iommu_type1.c | 31 +-
 fs/proc/task_mmu.c  |  2 +-
 include/linux/mm_types.h|  2 +-
 kernel/fork.c   |  2 +-
 mm/debug.c  |  5 ++--
 mm/mlock.c  |  4 +--
 mm/mmap.c   | 18 ++---
 mm/mremap.c |  6 ++---
 12 files changed, 89 insertions(+), 120 deletions(-)


base-commit: 79a3aaa7b82e3106be97842dedfd8429248896e6
-- 
2.21.0



Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-02 Thread Andrew Morton
On Tue,  2 Apr 2019 16:41:53 -0400 Daniel Jordan  wrote:

> Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> overkill when the counter could be synchronized separately.
> 
> Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> ...
>
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
>  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
>  {
>   long ret = 0;
> + s64 locked_vm;
>  
>   if (!current || !current->mm)
>   return ret; /* process exited */
>  
> 	down_write(&current->mm->mmap_sem);
>  
> +	locked_vm = atomic64_read(&current->mm->locked_vm);
>   if (inc) {
>   unsigned long locked, lock_limit;
>  
> - locked = current->mm->locked_vm + stt_pages;
> + locked = locked_vm + stt_pages;
>   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>   ret = -ENOMEM;
>   else
> - current->mm->locked_vm += stt_pages;
> +	atomic64_add(stt_pages, &current->mm->locked_vm);
>   } else {
> - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> - stt_pages = current->mm->locked_vm;
> + if (WARN_ON_ONCE(stt_pages > locked_vm))
> + stt_pages = locked_vm;
>  
> - current->mm->locked_vm -= stt_pages;
> +	atomic64_sub(stt_pages, &current->mm->locked_vm);
>   }

With the current code, current->mm->locked_vm cannot go negative.
After the patch, it can go negative, if someone else decreased
current->mm->locked_vm between this function's atomic64_read() and
atomic64_sub().

I guess this is a can't-happen in this case because the racing code
which performed the modification would have taken it negative anyway.

But this all makes me rather queasy.


Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
thinking that the benefit of removing a few mmap_sem-takings from a few
obscure drivers (sorry ;)) is pretty small.


Also, the argument for switching 32-bit arches to a 64-bit counter was
suspiciously vague.  What overflow issues?  Or are we just being lazy?



[PATCH 6/6] kvm/book3s: drop mmap_sem now that locked_vm is atomic

2019-04-02 Thread Daniel Jordan
With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.

Signed-off-by: Daniel Jordan 
Cc: Alexey Kardashevskiy 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Davidlohr Bueso 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: 
Cc: 
Cc: 
Cc: 
---
 arch/powerpc/kvm/book3s_64_vio.c | 34 +++-
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index e7fdb6d10eeb..8e034c3a5d25 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -56,7 +56,7 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
 }
 
-static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
+static long kvmppc_account_memlimit(unsigned long pages, bool inc)
 {
long ret = 0;
s64 locked_vm;
@@ -64,33 +64,23 @@ static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
if (!current || !current->mm)
return ret; /* process exited */
 
-   down_write(¤t->mm->mmap_sem);
-
-   locked_vm = atomic64_read(¤t->mm->locked_vm);
if (inc) {
-   unsigned long locked, lock_limit;
+   unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-   locked = locked_vm + stt_pages;
-   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+   locked_vm = atomic64_add_return(pages, &current->mm->locked_vm);
+   if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
ret = -ENOMEM;
-   else
-   atomic64_add(stt_pages, ¤t->mm->locked_vm);
+   atomic64_sub(pages, &current->mm->locked_vm);
+   }
} else {
-   if (WARN_ON_ONCE(stt_pages > locked_vm))
-   stt_pages = locked_vm;
-
-   atomic64_sub(stt_pages, ¤t->mm->locked_vm);
+   locked_vm = atomic64_sub_return(pages, &current->mm->locked_vm);
+   WARN_ON_ONCE(locked_vm < 0);
}
 
-   pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
-   inc ? '+' : '-',
-   stt_pages << PAGE_SHIFT,
-   atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT,
-   rlimit(RLIMIT_MEMLOCK),
-   ret ? " - exceeded" : "");
-
-   up_write(¤t->mm->mmap_sem);
+   pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%lu %lld/%lu%s\n", current->pid,
+   inc ? '+' : '-', pages << PAGE_SHIFT,
+   locked_vm << PAGE_SHIFT,
+   rlimit(RLIMIT_MEMLOCK), ret ? " - exceeded" : "");
 
return ret;
 }
-- 
2.21.0



Re: [PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-02 Thread Claudio Carvalho


On 4/2/19 4:36 PM, Matthew Garrett wrote:
> On Tue, Apr 2, 2019 at 11:15 AM Claudio Carvalho  wrote:
>> 1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR
>>introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can
>>be used to manage the secure variables.
> efivarfs has some pretty significant behavioural semantics that
> directly reflect the EFI specification. Using it to expose non-EFI
> variable data feels like it's going to increase fragility - there's a
> risk that we'll change things in a way that makes sense for the EFI
> spec but breaks your use case. Is the desire to use efivarfs to
> maintain consistency with existing userland tooling, or just to avoid
> having a separate filesystem?
>
We want to use the efivarfs for compatibility with existing userspace
tools. We will track and match any EFI changes that affect us.

Our use case is restricted to secure boot - this is not going to be a
general purpose EFI variable implementation.

Claudio




[PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic

2019-04-02 Thread Daniel Jordan
With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.

Signed-off-by: Daniel Jordan 
Cc: Alexey Kardashevskiy 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Davidlohr Bueso 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: 
Cc: 
Cc: 
---
 arch/powerpc/mm/mmu_context_iommu.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 8038ac24a312..a4ef22b67c07 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t {
 static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
unsigned long npages, bool incr)
 {
-   long ret = 0, locked, lock_limit;
+   long ret = 0;
+   unsigned long lock_limit;
s64 locked_vm;
 
if (!npages)
return 0;
 
-   down_write(&mm->mmap_sem);
-   locked_vm = atomic64_read(&mm->locked_vm);
if (incr) {
-   locked = locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+   locked_vm = atomic64_add_return(npages, &mm->locked_vm);
+   if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
ret = -ENOMEM;
-   else
-   atomic64_add(npages, &mm->locked_vm);
+   atomic64_sub(npages, &mm->locked_vm);
+   }
} else {
-   if (WARN_ON_ONCE(npages > locked_vm))
-   npages = locked_vm;
-   atomic64_sub(npages, &mm->locked_vm);
+   locked_vm = atomic64_sub_return(npages, &mm->locked_vm);
+   WARN_ON_ONCE(locked_vm < 0);
}
 
-   pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
-   current ? current->pid : 0,
-   incr ? '+' : '-',
-   npages << PAGE_SHIFT,
-   atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
+   pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n",
+   current ? current->pid : 0, incr ? '+' : '-',
+   npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK));
-   up_write(&mm->mmap_sem);
 
return ret;
 }
-- 
2.21.0



[PATCH v3 5/5] Lib: sort.h: remove the size argument from the swap function

2019-04-02 Thread Andrey Abramov
Removes size argument from the swap function because:
1) It wasn't used.
2) Custom swap function knows what kind of objects it swaps,
so it already knows their sizes.

Signed-off-by: Andrey Abramov 
Reviewed-by: George Spelvin 
---
 arch/x86/kernel/unwind_orc.c | 2 +-
 include/linux/sort.h | 2 +-
 kernel/jump_label.c  | 2 +-
 lib/extable.c| 2 +-
 lib/sort.c   | 7 +++
 5 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 89be1be1790c..dc410b567189 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -176,7 +176,7 @@ static struct orc_entry *orc_find(unsigned long ip)
return orc_ftrace_find(ip);
 }
 
-static void orc_sort_swap(void *_a, void *_b, int size)
+static void orc_sort_swap(void *_a, void *_b)
 {
struct orc_entry *orc_a, *orc_b;
struct orc_entry orc_tmp;
diff --git a/include/linux/sort.h b/include/linux/sort.h
index 2b99a5dd073d..13bb4635b5f1 100644
--- a/include/linux/sort.h
+++ b/include/linux/sort.h
@@ -6,6 +6,6 @@
 
 void sort(void *base, size_t num, size_t size,
  int (*cmp)(const void *, const void *),
- void (*swap)(void *, void *, int));
+ void (*swap)(void *, void *));
 
 #endif
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index bad96b476eb6..6b1187b8a060 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -45,7 +45,7 @@ static int jump_label_cmp(const void *a, const void *b)
return 0;
 }
 
-static void jump_label_swap(void *a, void *b, int size)
+static void jump_label_swap(void *a, void *b)
 {
long delta = (unsigned long)a - (unsigned long)b;
struct jump_entry *jea = a;
diff --git a/lib/extable.c b/lib/extable.c
index f54996fdd0b8..0515a94538ca 100644
--- a/lib/extable.c
+++ b/lib/extable.c
@@ -28,7 +28,7 @@ static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
 #ifndef ARCH_HAS_RELATIVE_EXTABLE
 #define swap_ex		NULL
 #else
-static void swap_ex(void *a, void *b, int size)
+static void swap_ex(void *a, void *b)
 {
struct exception_table_entry *x = a, *y = b, tmp;
int delta = b - a;
diff --git a/lib/sort.c b/lib/sort.c
index 50855ea8c262..8704750e6bde 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -114,7 +114,7 @@ static void swap_bytes(void *a, void *b, size_t n)
} while (n);
 }
 
-typedef void (*swap_func_t)(void *a, void *b, int size);
+typedef void (*swap_func_t)(void *a, void *b);
 
 /*
  * The values are arbitrary as long as they can't be confused with
@@ -138,7 +138,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
else if (swap_func == SWAP_BYTES)
swap_bytes(a, b, size);
else
-   swap_func(a, b, (int)size);
+   swap_func(a, b);
 }
 
 /**
@@ -186,8 +186,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size)
  * it less suitable for kernel use.
  */
 void sort(void *base, size_t num, size_t size,
- int (*cmp_func)(const void *, const void *),
- void (*swap_func)(void *, void *, int size))
+ int (*cmp_func)(const void *, const void *), swap_func_t swap_func)
 {
/* pre-scale counters for performance */
size_t n = num * size, a = (num/2) * size;
-- 
2.21.0




[PATCH v3 4/5] ubifs: find.c: replace swap function with built-in one

2019-04-02 Thread Andrey Abramov
Replace swap_dirty_idx function with built-in one,
because swap_dirty_idx does only a simple byte to byte swap.

Since Spectre mitigations have made indirect function calls more
expensive, and the default simple byte copies swap is implemented
without them, an "optimized" custom swap function is now
a waste of time as well as code.

Signed-off-by: Andrey Abramov 
Reviewed-by: George Spelvin 
---
v2->v3: nothing changed

 fs/ubifs/find.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c
index f9646835b026..5deaae7fcead 100644
--- a/fs/ubifs/find.c
+++ b/fs/ubifs/find.c
@@ -747,12 +747,6 @@ static int cmp_dirty_idx(const struct ubifs_lprops **a,
return lpa->dirty + lpa->free - lpb->dirty - lpb->free;
 }
 
-static void swap_dirty_idx(struct ubifs_lprops **a, struct ubifs_lprops **b,
-  int size)
-{
-   swap(*a, *b);
-}
-
 /**
  * ubifs_save_dirty_idx_lnums - save an array of the most dirty index LEB nos.
  * @c: the UBIFS file-system description object
@@ -772,8 +766,7 @@ int ubifs_save_dirty_idx_lnums(struct ubifs_info *c)
   sizeof(void *) * c->dirty_idx.cnt);
/* Sort it so that the dirtiest is now at the end */
sort(c->dirty_idx.arr, c->dirty_idx.cnt, sizeof(void *),
-(int (*)(const void *, const void *))cmp_dirty_idx,
-(void (*)(void *, void *, int))swap_dirty_idx);
+(int (*)(const void *, const void *))cmp_dirty_idx, NULL);
dbg_find("found %d dirty index LEBs", c->dirty_idx.cnt);
if (c->dirty_idx.cnt)
dbg_find("dirtiest index LEB is %d with dirty %d and free %d",
-- 
2.21.0




[PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-02 Thread Daniel Jordan
Taking and dropping mmap_sem to modify a single counter, locked_vm, is
overkill when the counter could be synchronized separately.

Make mmap_sem a little less coarse by changing locked_vm to an atomic,
the 64-bit variety to avoid issues with overflow on 32-bit systems.

Signed-off-by: Daniel Jordan 
Cc: Alan Tull 
Cc: Alexey Kardashevskiy 
Cc: Alex Williamson 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Davidlohr Bueso 
Cc: Michael Ellerman 
Cc: Moritz Fischer 
Cc: Paul Mackerras 
Cc: Wu Hao 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
---
 arch/powerpc/kvm/book3s_64_vio.c| 14 --
 arch/powerpc/mm/mmu_context_iommu.c | 15 ---
 drivers/fpga/dfl-afu-dma-region.c   | 18 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 17 +
 drivers/vfio/vfio_iommu_type1.c | 10 ++
 fs/proc/task_mmu.c  |  2 +-
 include/linux/mm_types.h|  2 +-
 kernel/fork.c   |  2 +-
 mm/debug.c  |  5 +++--
 mm/mlock.c  |  4 ++--
 mm/mmap.c   | 18 +-
 mm/mremap.c |  6 +++---
 12 files changed, 61 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index f02b04973710..e7fdb6d10eeb 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
 static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
 {
long ret = 0;
+   s64 locked_vm;
 
if (!current || !current->mm)
return ret; /* process exited */
 
down_write(¤t->mm->mmap_sem);
 
+   locked_vm = atomic64_read(¤t->mm->locked_vm);
if (inc) {
unsigned long locked, lock_limit;
 
-   locked = current->mm->locked_vm + stt_pages;
+   locked = locked_vm + stt_pages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
ret = -ENOMEM;
else
-   current->mm->locked_vm += stt_pages;
+   atomic64_add(stt_pages, &current->mm->locked_vm);
} else {
-   if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
-   stt_pages = current->mm->locked_vm;
+   if (WARN_ON_ONCE(stt_pages > locked_vm))
+   stt_pages = locked_vm;
 
-   current->mm->locked_vm -= stt_pages;
+   atomic64_sub(stt_pages, &current->mm->locked_vm);
}
 
pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
inc ? '+' : '-',
stt_pages << PAGE_SHIFT,
-   current->mm->locked_vm << PAGE_SHIFT,
+   atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK),
ret ? " - exceeded" : "");
 
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e7a9c4f6bfca..8038ac24a312 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
unsigned long npages, bool incr)
 {
long ret = 0, locked, lock_limit;
+   s64 locked_vm;
 
if (!npages)
return 0;
 
down_write(&mm->mmap_sem);
-
+   locked_vm = atomic64_read(&mm->locked_vm);
if (incr) {
-   locked = mm->locked_vm + npages;
+   locked = locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
ret = -ENOMEM;
else
-   mm->locked_vm += npages;
+   atomic64_add(npages, &mm->locked_vm);
} else {
-   if (WARN_ON_ONCE(npages > mm->locked_vm))
-   npages = mm->locked_vm;
-   mm->locked_vm -= npages;
+   if (WARN_ON_ONCE(npages > locked_vm))
+   npages = locked_vm;
+   atomic64_sub(npages, &mm->locked_vm);
}
 
pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
current ? current->pid : 0,
incr ? '+' : '-',
npages << PAGE_SHIFT,
-   mm->locked_vm << PAGE_SHIFT,
+   atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK));
up_write(&mm->mmap_sem);
 
diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
index e18a786fc943..08132fd9b6b7 100644
--- a/drivers/fpga/dfl-afu-dma-region.c
+++ b/drivers/f

[PATCH v3 3/5] ocfs2: dir, refcounttree, xattr: replace swap functions with built-in one

2019-04-02 Thread Andrey Abramov
Replace dx_leaf_sort_swap, swap_refcount_rec and swap_xe functions
with built-in one, because they do only a simple byte to byte swap.

Since Spectre mitigations have made indirect function calls more
expensive, and the default simple byte copies swap is implemented
without them, an "optimized" custom swap function is now
a waste of time as well as code.

Signed-off-by: Andrey Abramov 
Reviewed by: George Spelvin 
---
v2->v3: nothing changed

 fs/ocfs2/dir.c  | 13 +
 fs/ocfs2/refcounttree.c | 13 +++--
 fs/ocfs2/xattr.c| 15 +++
 3 files changed, 7 insertions(+), 34 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index c121abbdfc7d..4b86b181df0a 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -3529,16 +3529,6 @@ static int dx_leaf_sort_cmp(const void *a, const void *b)
return 0;
 }
 
-static void dx_leaf_sort_swap(void *a, void *b, int size)
-{
-   struct ocfs2_dx_entry *entry1 = a;
-   struct ocfs2_dx_entry *entry2 = b;
-
-   BUG_ON(size != sizeof(*entry1));
-
-   swap(*entry1, *entry2);
-}
-
 static int ocfs2_dx_leaf_same_major(struct ocfs2_dx_leaf *dx_leaf)
 {
struct ocfs2_dx_entry_list *dl_list = &dx_leaf->dl_list;
@@ -3799,8 +3789,7 @@ static int ocfs2_dx_dir_rebalance(struct ocfs2_super 
*osb, struct inode *dir,
 * This block is changing anyway, so we can sort it in place.
 */
sort(dx_leaf->dl_list.de_entries, num_used,
-sizeof(struct ocfs2_dx_entry), dx_leaf_sort_cmp,
-dx_leaf_sort_swap);
+sizeof(struct ocfs2_dx_entry), dx_leaf_sort_cmp, NULL);
 
ocfs2_journal_dirty(handle, dx_leaf_bh);
 
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 1dc9a08e8bdc..7bbc94d23a0c 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -1400,13 +1400,6 @@ static int cmp_refcount_rec_by_cpos(const void *a, const 
void *b)
return 0;
 }
 
-static void swap_refcount_rec(void *a, void *b, int size)
-{
-   struct ocfs2_refcount_rec *l = a, *r = b;
-
-   swap(*l, *r);
-}
-
 /*
  * The refcount cpos are ordered by their 64bit cpos,
  * But we will use the low 32 bit to be the e_cpos in the b-tree.
@@ -1482,7 +1475,7 @@ static int ocfs2_divide_leaf_refcount_block(struct 
buffer_head *ref_leaf_bh,
 */
sort(&rl->rl_recs, le16_to_cpu(rl->rl_used),
 sizeof(struct ocfs2_refcount_rec),
-cmp_refcount_rec_by_low_cpos, swap_refcount_rec);
+cmp_refcount_rec_by_low_cpos, NULL);
 
ret = ocfs2_find_refcount_split_pos(rl, &cpos, &split_index);
if (ret) {
@@ -1507,11 +1500,11 @@ static int ocfs2_divide_leaf_refcount_block(struct 
buffer_head *ref_leaf_bh,
 
sort(&rl->rl_recs, le16_to_cpu(rl->rl_used),
 sizeof(struct ocfs2_refcount_rec),
-cmp_refcount_rec_by_cpos, swap_refcount_rec);
+cmp_refcount_rec_by_cpos, NULL);
 
sort(&new_rl->rl_recs, le16_to_cpu(new_rl->rl_used),
 sizeof(struct ocfs2_refcount_rec),
-cmp_refcount_rec_by_cpos, swap_refcount_rec);
+cmp_refcount_rec_by_cpos, NULL);
 
*split_cpos = cpos;
return 0;
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3a24ce3deb01..b3e6f42baf78 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -4175,15 +4175,6 @@ static int cmp_xe(const void *a, const void *b)
return 0;
 }
 
-static void swap_xe(void *a, void *b, int size)
-{
-   struct ocfs2_xattr_entry *l = a, *r = b, tmp;
-
-   tmp = *l;
-   memcpy(l, r, sizeof(struct ocfs2_xattr_entry));
-   memcpy(r, &tmp, sizeof(struct ocfs2_xattr_entry));
-}
-
 /*
  * When the ocfs2_xattr_block is filled up, new bucket will be created
  * and all the xattr entries will be moved to the new bucket.
@@ -4249,7 +4240,7 @@ static void ocfs2_cp_xattr_block_to_bucket(struct inode 
*inode,
trace_ocfs2_cp_xattr_block_to_bucket_end(offset, size, off_change);
 
sort(target + offset, count, sizeof(struct ocfs2_xattr_entry),
-cmp_xe, swap_xe);
+cmp_xe, NULL);
 }
 
 /*
@@ -,7 +4435,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
 */
sort(entries, le16_to_cpu(xh->xh_count),
 sizeof(struct ocfs2_xattr_entry),
-cmp_xe_offset, swap_xe);
+cmp_xe_offset, NULL);
 
/* Move all name/values to the end of the bucket. */
xe = xh->xh_entries;
@@ -4486,7 +4477,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
/* sort the entries by their name_hash. */
sort(entries, le16_to_cpu(xh->xh_count),
 sizeof(struct ocfs2_xattr_entry),
-cmp_xe, swap_xe);
+cmp_xe, NULL);
 
buf = bucket_buf;
for (i = 0; i < bucket->bu_blocks; i++, buf += blocksize)
-- 
2.21.0




[PATCH v3 2/5] powerpc: module_[32|64].c: replace swap function with built-in one

2019-04-02 Thread Andrey Abramov
Replace relaswap with built-in one, because relaswap
does a simple byte to byte swap.

Since Spectre mitigations have made indirect function calls more
expensive, and the default simple byte copies swap is implemented
without them, an "optimized" custom swap function is now
a waste of time as well as code.

Signed-off-by: Andrey Abramov 
Reviewed by: George Spelvin 
Acked-by: Michael Ellerman  (powerpc)
---
v2->v3: nothing changed

 arch/powerpc/kernel/module_32.c | 17 +
 arch/powerpc/kernel/module_64.c | 17 +
 2 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 88d83771f462..c311e8575d10 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -79,21 +79,6 @@ static int relacmp(const void *_x, const void *_y)
return 0;
 }
 
-static void relaswap(void *_x, void *_y, int size)
-{
-   uint32_t *x, *y, tmp;
-   int i;
-
-   y = (uint32_t *)_x;
-   x = (uint32_t *)_y;
-
-   for (i = 0; i < sizeof(Elf32_Rela) / sizeof(uint32_t); i++) {
-   tmp = x[i];
-   x[i] = y[i];
-   y[i] = tmp;
-   }
-}
-
 /* Get the potential trampolines size required of the init and
non-init sections */
 static unsigned long get_plt_size(const Elf32_Ehdr *hdr,
@@ -130,7 +115,7 @@ static unsigned long get_plt_size(const Elf32_Ehdr *hdr,
 */
sort((void *)hdr + sechdrs[i].sh_offset,
 sechdrs[i].sh_size / sizeof(Elf32_Rela),
-sizeof(Elf32_Rela), relacmp, relaswap);
+sizeof(Elf32_Rela), relacmp, NULL);
 
ret += count_relocs((void *)hdr
 + sechdrs[i].sh_offset,
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 8661eea78503..0c833d7f36f1 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -231,21 +231,6 @@ static int relacmp(const void *_x, const void *_y)
return 0;
 }
 
-static void relaswap(void *_x, void *_y, int size)
-{
-   uint64_t *x, *y, tmp;
-   int i;
-
-   y = (uint64_t *)_x;
-   x = (uint64_t *)_y;
-
-   for (i = 0; i < sizeof(Elf64_Rela) / sizeof(uint64_t); i++) {
-   tmp = x[i];
-   x[i] = y[i];
-   y[i] = tmp;
-   }
-}
-
 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
const Elf64_Shdr *sechdrs)
@@ -269,7 +254,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 */
sort((void *)sechdrs[i].sh_addr,
 sechdrs[i].sh_size / sizeof(Elf64_Rela),
-sizeof(Elf64_Rela), relacmp, relaswap);
+sizeof(Elf64_Rela), relacmp, NULL);
 
relocs += count_relocs((void *)sechdrs[i].sh_addr,
   sechdrs[i].sh_size
-- 
2.21.0




[PATCH v3 1/5] arch/arc: unwind.c: replace swap function with built-in one

2019-04-02 Thread Andrey Abramov
Replace swap_eh_frame_hdr_table_entries with built-in one, because
swap_eh_frame_hdr_table_entries does a simple byte to byte swap.

Since Spectre mitigations have made indirect function calls more
expensive, and the default simple byte copies swap is implemented
without them, an "optimized" custom swap function is now
a waste of time as well as code.

Signed-off-by: Andrey Abramov 
Reviewed by: George Spelvin 
Acked-by: Vineet Gupta 
---
v2->v3: nothing changed

 arch/arc/kernel/unwind.c | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c
index 271e9fafa479..7610fe84afea 100644
--- a/arch/arc/kernel/unwind.c
+++ b/arch/arc/kernel/unwind.c
@@ -248,20 +248,6 @@ static int cmp_eh_frame_hdr_table_entries(const void *p1, 
const void *p2)
return (e1->start > e2->start) - (e1->start < e2->start);
 }
 
-static void swap_eh_frame_hdr_table_entries(void *p1, void *p2, int size)
-{
-   struct eh_frame_hdr_table_entry *e1 = p1;
-   struct eh_frame_hdr_table_entry *e2 = p2;
-   unsigned long v;
-
-   v = e1->start;
-   e1->start = e2->start;
-   e2->start = v;
-   v = e1->fde;
-   e1->fde = e2->fde;
-   e2->fde = v;
-}
-
 static void init_unwind_hdr(struct unwind_table *table,
void *(*alloc) (unsigned long))
 {
@@ -354,10 +340,8 @@ static void init_unwind_hdr(struct unwind_table *table,
}
WARN_ON(n != header->fde_count);
 
-   sort(header->table,
-n,
-sizeof(*header->table),
-cmp_eh_frame_hdr_table_entries, swap_eh_frame_hdr_table_entries);
+   sort(header->table, n,
+sizeof(*header->table), cmp_eh_frame_hdr_table_entries, NULL);
 
table->hdrsz = hdrSize;
smp_wmb();
-- 
2.21.0




[PATCH v3 0/5] simple sort swap function improvements

2019-04-02 Thread Andrey Abramov
This is the logical continuation of the "lib/sort & lib/list_sort:
faster and smaller" series by George Spelvin (added to linux-next
recently).

Since Spectre mitigations have made indirect function calls more
expensive, and the previous patch series implements the default
simple byte copies without them, an "optimized" custom swap
function is now a waste of time as well as code.

Patches 1 to 4 replace trivial swap functions with the built-in
(which is now much faster) and are grouped by subsystem.
Being pure code deletion patches, they are sure to bring joy to
Linus's heart.

Having reviewed all call sites, only three non-trivial swap
functions remain:  arch/x86/kernel/unwind_orc.c,
kernel/jump_label.c and lib/extable.c.

Patch #5 removes size argument from the swap function because:
1) It wasn't used.
2) Custom swap function knows what kind of objects it swaps,
so it already knows their sizes.
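
For illustration only, the sketch below (standalone userspace C, with a made-up
struct and callback names; the real prototypes live in include/linux/sort.h and
the actual change is in patch #5) shows why the size parameter is redundant for
a custom swap callback:

/*
 * A custom swap callback is written for one concrete element type, so the
 * "int size" parameter the old prototype carried is never needed.
 */
#include <stdio.h>

struct entry {
	unsigned long start;
	unsigned long fde;
};

/* Old-style callback: size is passed in but ignored. */
static void swap_entry_old(void *a, void *b, int size)
{
	struct entry tmp = *(struct entry *)a;

	(void)size;			/* sizeof(struct entry) is already known */
	*(struct entry *)a = *(struct entry *)b;
	*(struct entry *)b = tmp;
}

/* New-style callback once the unused argument is dropped. */
static void swap_entry(void *a, void *b)
{
	struct entry tmp = *(struct entry *)a;

	*(struct entry *)a = *(struct entry *)b;
	*(struct entry *)b = tmp;
}

int main(void)
{
	struct entry e[2] = { { 2, 20 }, { 1, 10 } };

	swap_entry_old(&e[0], &e[1], (int)sizeof(struct entry));
	swap_entry(&e[0], &e[1]);	/* swapped back to the original order */
	printf("%lu %lu\n", e[0].start, e[1].start);
	return 0;
}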

v1->v2: Only commit messages have changed to better explain
the purpose of commits. (Thanks to George Spelvin and Greg KH)
v2->v3: Patch #5 now completely removes the size argument

Andrey Abramov (5):
  arch/arc: unwind.c: replace swap function with built-in one
  powerpc: module_[32|64].c: replace swap function with built-in one
  ocfs2: dir,refcounttree,xattr: replace swap functions with built-in
one
  ubifs: find.c: replace swap function with built-in one
  Lib: sort.h: remove the size argument from the swap function

 arch/arc/kernel/unwind.c| 20 ++--
 arch/powerpc/kernel/module_32.c | 17 +
 arch/powerpc/kernel/module_64.c | 17 +
 arch/x86/kernel/unwind_orc.c|  2 +-
 fs/ocfs2/dir.c  | 13 +
 fs/ocfs2/refcounttree.c | 13 +++--
 fs/ocfs2/xattr.c| 15 +++
 fs/ubifs/find.c |  9 +
 include/linux/sort.h|  2 +-
 kernel/jump_label.c |  2 +-
 lib/extable.c   |  2 +-
 lib/sort.c  |  7 +++
 12 files changed, 19 insertions(+), 100 deletions(-)

-- 
2.21.0




Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 18:14, Andrey Ryabinin wrote:



On 4/2/19 12:43 PM, Christophe Leroy wrote:

Hi Dmitry, Andrey and others,

Do you have any comments to this series ?



I don't see justification for adding all these non-instrumented functions. We
need only a subset of these functions, and only on powerpc so far. Arches that
don't use str*() that early simply don't need a non-instrumented __str*() variant.

Also, I don't think that auto-replacing str* with __str* in all non-instrumented
files is a good idea, as this will reduce KASAN coverage.
E.g. we don't instrument slub.c, but there is no reason to use the
non-instrumented __str*() functions there.


Ok, I didn't see it that way.

In fact I was seeing the opposite and considered it an opportunity to
increase KASAN coverage. E.g., at the moment things like the following
(from arch/xtensa/include/asm/string.h) are not covered at all, I believe:


#define __HAVE_ARCH_STRCPY
static inline char *strcpy(char *__dest, const char *__src)
{
register char *__xdest = __dest;
unsigned long __dummy;

__asm__ __volatile__("1:\n\t"
"l8ui  %2, %1, 0\n\t"
"s8i   %2, %0, 0\n\t"
"addi  %1, %1, 1\n\t"
"addi  %0, %0, 1\n\t"
"bnez  %2, 1b\n\t"
: "=r" (__dest), "=r" (__src), "=&r" (__dummy)
: "0" (__dest), "1" (__src)
: "memory");

return __xdest;
}

In my series, I have deactivated the optimised string functions when KASAN
is selected, as arm64 does. See https://patchwork.ozlabs.org/patch/1055780/
But not every arch does that, meaning that some string functions remain
uninstrumented.


Also, I was seeing it as a way to reduce KASAN's impact on performance,
because instrumenting each byte access of the non-optimised string
functions is a performance killer.




And finally, this series makes bug reporting slightly worse. E.g. let's look at
strcpy():

+char *strcpy(char *dest, const char *src)
+{
+   size_t len = __strlen(src) + 1;
+
+   check_memory_region((unsigned long)src, len, false, _RET_IP_);
+   check_memory_region((unsigned long)dest, len, true, _RET_IP_);
+
+   return __strcpy(dest, src);
+}

If src is a non-null-terminated string, we might not see a proper out-of-bounds
report from KASAN, only a crash in __strlen().
That might make it harder to identify where 'src' comes from, where it was
allocated, and what the size of the allocated area is.



I'd like to know if this approach is ok or if it is better to keep doing as in 
https://patchwork.ozlabs.org/patch/1055788/


I think the patch from link is a better solution to the problem.



Ok, I'll stick with it then. Thanks for your feedback

Christophe


Re: [PATCH] ASoC: fsl_esai: Support synchronous mode

2019-04-02 Thread Nicolin Chen
> > On Mon, Apr 01, 2019 at 11:39:10AM +, S.j. Wang wrote:
> > > In ESAI synchronous mode, the clock is generated by Tx, So we should
> > > always set registers of Tx which relate with the bit clock and frame
> > > clock generation (TCCR, TCR, ECR), even there is only Rx is working.
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  sound/soc/fsl/fsl_esai.c | 28 +++-
> > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index
> > > 3623aa9a6f2e..d9fcddd55c02 100644
> > > --- a/sound/soc/fsl/fsl_esai.c
> > > +++ b/sound/soc/fsl/fsl_esai.c
> > > @@ -230,6 +230,21 @@ static int fsl_esai_set_dai_sysclk(struct
> > snd_soc_dai *dai, int clk_id,
> > >   return -EINVAL;
> > >   }
> > >
> > > + if (esai_priv->synchronous && !tx) {
> > > + switch (clk_id) {
> > > + case ESAI_HCKR_FSYS:
> > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_FSYS,
> > > + freq, dir);
> > > + break;
> > > + case ESAI_HCKR_EXTAL:
> > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_EXTAL,
> > > + freq, dir);
> > 
> > Not sure why you call set_dai_sysclk inside set_dai_sysclk again. It feels 
> > very
> > confusing to do so, especially without a comments.
> 
> For sync mode, only RX is enabled, but the TX registers should be set, so
> set_dai_sysclk() is called again.

Yea, I understood that. But why not just replace RX with TX on the
register-writing level? Do we need to set both TCCR and RCCR? Your
change in hw_params() only sets TCCR inside fsl_esai_set_bclk(), so
we probably only need to change TCCR for recordings running in sync
mode, right?

From the commit message, it feels like only the clock-related
fields in the TX registers need to be set. Things like the calculation
and setting the direction of the HCKx pin don't need to run again.

> > > @@ -537,10 +552,21 @@ static int fsl_esai_hw_params(struct
> > > snd_pcm_substream *substream,
> > >
> > >   bclk = params_rate(params) * slot_width * esai_priv->slots;
> > >
> > > - ret = fsl_esai_set_bclk(dai, tx, bclk);
> > > + ret = fsl_esai_set_bclk(dai, esai_priv->synchronous ? true : tx,
> > > +bclk);
> > >   if (ret)
> > >   return ret;
> > >
> > > + if (esai_priv->synchronous && !tx) {
> > > + /* Use Normal mode to support monaural audio */
> > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> > > +ESAI_xCR_xMOD_MASK,
> > params_channels(params) > 1 ?
> > > +ESAI_xCR_xMOD_NETWORK : 0);
> > > +
> > > + mask = ESAI_xCR_xSWS_MASK | ESAI_xCR_PADC;
> > > + val = ESAI_xCR_xSWS(slot_width, width) | ESAI_xCR_PADC;
> > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> > mask, val);
> > > + }
> > 
> > Does synchronous mode require to set both TCR and RCR? or just TCR?

> Both TCR and RCR.  RCR will be set in normal flow.

OK. Setting both xCRs makes sense. Would you please try this:

===
@@ -537,14 +552,20 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
*substream,

bclk = params_rate(params) * slot_width * esai_priv->slots;

-   ret = fsl_esai_set_bclk(dai, tx, bclk);
+   /* Synchronous mode uses TX clock generator */
+   ret = fsl_esai_set_bclk(dai, esai_priv->synchronous || tx, bclk);
if (ret)
return ret;

+   mask = ESAI_xCR_xMOD_MASK | ESAI_xCR_xSWS_MASK;
+   val = ESAI_xCR_xSWS(slot_width, width);
/* Use Normal mode to support monaural audio */
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
-  ESAI_xCR_xMOD_MASK, params_channels(params) > 1 ?
-  ESAI_xCR_xMOD_NETWORK : 0);
+   val |= params_channels(params) > 1 ? ESAI_xCR_xMOD_NETWORK : 0;
+
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
+   /* Recording in synchronous mode needs to set TCR also */
+   if (!tx && esai_priv->synchronous)
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, mask, val);

regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx),
   ESAI_xFCR_xFR_MASK, ESAI_xFCR_xFR);
@@ -556,10 +577,10 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
*substream,

regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), mask, val);

-   mask = ESAI_xCR_xSWS_MASK | (tx ? ESAI_xCR_PADC : 0);
-   val = ESAI_xCR_xSWS(slot_width, width) | (tx ? ESAI_xCR_PADC : 0);
-
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
+   /* Only TCR has padding bit and needs to be set for synchronous mode */
+   if (tx || esai_priv->synchronous)
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
+  ESAI_xCR_PADC, ESAI_xCR_PA

Re: [PATCH 2/5] powerpc: module_[32|64].c: replace swap function with built-in one

2019-04-02 Thread Andrey Abramov
01.04.2019, 13:11, "Michael Ellerman" :
> This looks OK. It's a bit of a pity to replace the 8-byte-at-a-time copy
> with a byte-at-a-time copy, but I suspect it's insignificant compared to
> the overhead of calling the comparison and swap functions.
>
> And we could always add a generic 8-byte-at-a-time swap function if it's
> a bottleneck.

I am sorry, I forgot to quickly comment on your letter.
Now (after George Spelvin's patches) the generic swap is able
to use u64 or u32 accesses when the alignment and size are divisible
by 8 or 4 respectively, so we lose nothing here.
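
For reference, the idea can be sketched as below. This is a simplified,
standalone userspace approximation of what the reworked generic swap does,
not the lib/sort.c code itself: when the element size (and, on machines
without efficient unaligned accesses, the base address) is a multiple of
the word size, whole words are moved instead of single bytes.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

static bool is_aligned(const void *base, size_t size, unsigned int align)
{
	/* On arches with cheap unaligned accesses the base check can be skipped. */
	return (((uintptr_t)base | size) & (align - 1)) == 0;
}

static void swap_words_64(void *a, void *b, size_t n)
{
	uint64_t *x = a, *y = b;
	size_t i;

	for (i = 0; i < n / 8; i++) {
		uint64_t t = x[i];
		x[i] = y[i];
		y[i] = t;
	}
}

static void swap_bytes(void *a, void *b, size_t n)
{
	unsigned char *x = a, *y = b;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned char t = x[i];
		x[i] = y[i];
		y[i] = t;
	}
}

static void generic_swap(void *a, void *b, size_t size)
{
	if (is_aligned(a, size, 8) && is_aligned(b, size, 8))
		swap_words_64(a, b, size);	/* 8 bytes at a time */
	else
		swap_bytes(a, b, size);		/* 1 byte at a time */
}

int main(void)
{
	uint64_t x[2] = { 1, 2 }, y[2] = { 3, 4 };

	generic_swap(x, y, sizeof(x));		/* takes the 64-bit path */
	printf("%lu %lu %lu %lu\n",
	       (unsigned long)x[0], (unsigned long)x[1],
	       (unsigned long)y[0], (unsigned long)y[1]);
	return 0;
}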


Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 20:31, Christophe Leroy wrote:



On 02/04/2019 at 16:34, Aneesh Kumar K.V wrote:

Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us to save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we can 
do this

without all the #ifdef and other overhead for 8xx book3e


  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
  arch/powerpc/include/asm/book3s/64/mmu.h  | 48 +++
  arch/powerpc/include/asm/book3s/64/slice.h    |  6 +--
  arch/powerpc/kernel/paca.c    |  9 ++--
  arch/powerpc/kernel/setup-common.c    |  7 ++-
  arch/powerpc/mm/hash_utils_64.c   | 10 ++--
  arch/powerpc/mm/mmu_context_book3s64.c    | 16 ++-
  arch/powerpc/mm/slb.c |  2 +-
  arch/powerpc/mm/slice.c   | 48 +--
  arch/powerpc/mm/subpage-prot.c    |  8 ++--
  10 files changed, 91 insertions(+), 65 deletions(-)



[...]


@@ -253,7 +253,7 @@ static void slice_convert(struct mm_struct *mm,
   */
  spin_lock_irqsave(&slice_convert_lock, flags);
-    lpsizes = mm->context.low_slices_psize;
+    lpsizes = mm->context.hash_context->low_slices_psize;


A helper to get ->low_slices_psize would help,
something like:

In nohash/32/mmu-8xx:

unsigned char *slice_low_slices_psize(context_t *ctx)
{
 return mm->context.low_slices_psize;


Of course here I meant:

unsigned char *slice_low_slices_psize(mm_context_t *ctx)
{
return ctx->low_slices_psize;
}


}

And in book3s/64/mmu.h:

unsigned char *slice_low_slices_psize(context_t *ctx)
{
 return mm->context.hash_context->low_slices_psize;


and

unsigned char *slice_low_slices_psize(mm_context_t *ctx)
{
return ctx->hash_context->low_slices_psize;
}

Christophe


Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 17:42, Aneesh Kumar K.V wrote:

On 4/2/19 9:06 PM, Christophe Leroy wrote:



On 02/04/2019 at 16:34, Aneesh Kumar K.V wrote:

Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us to save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we 
can do this

without all the #ifdef and other overhead for 8xx book3e


Did you have a look at my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ?


It tries to reduce as much as feasible the #ifdefs and stuff.




Not yet. But a cursory look tells me introducing hash_mm_context
complicates this further unless I introduce something similar for nohash
32. Are you ok with that?


Have a look at my review in the other mail, I think we can limit the 
changes and avoid introducing the hash_mm_context for 8xx.


Otherwise, we should call it something else, for instance 
extended_mm_context, but that looks unnecessary from my point of view.


Christophe


Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 16:34, Aneesh Kumar K.V wrote:

Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us to save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we can do this
without all the #ifdef and other overhead for 8xx book3e


  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
  arch/powerpc/include/asm/book3s/64/mmu.h  | 48 +++
  arch/powerpc/include/asm/book3s/64/slice.h|  6 +--
  arch/powerpc/kernel/paca.c|  9 ++--
  arch/powerpc/kernel/setup-common.c|  7 ++-
  arch/powerpc/mm/hash_utils_64.c   | 10 ++--
  arch/powerpc/mm/mmu_context_book3s64.c| 16 ++-
  arch/powerpc/mm/slb.c |  2 +-
  arch/powerpc/mm/slice.c   | 48 +--
  arch/powerpc/mm/subpage-prot.c|  8 ++--
  10 files changed, 91 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index a28a28079edb..d801be977623 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -657,7 +657,7 @@ extern void slb_set_size(u16 size);
  
  /* 4 bits per slice and we have one slice per 1TB */

  #define SLICE_ARRAY_SIZE  (H_PGTABLE_RANGE >> 41)
-#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41)
+#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 
41)
  
  #ifndef __ASSEMBLY__
  
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h

index a809bdd77322..07e76e304a3b 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -114,6 +114,33 @@ struct slice_mask {
DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
  };
  
+struct hash_mm_context {

+
+   u16 user_psize; /* page size index */


Could we keep that in mm_context_t ?


+
+#ifdef CONFIG_PPC_MM_SLICES


CONFIG_PPC_MM_SLICES is always selected on book3s64 so this #ifdef is 
useless.



+   /* SLB page size encodings*/
+   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
+   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
+   unsigned long slb_addr_limit;


Could we keep slb_addr_limit in mm_context_t too ?



+#ifdef CONFIG_PPC_64K_PAGES
+   struct slice_mask mask_64k;
+#endif
+   struct slice_mask mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_16m;
+   struct slice_mask mask_16g;
+#endif
+#else
+   u16 sllp; /* SLB page size encoding */


This can get away as CONFIG_PPC_MM_SLICES is always set.


+#endif
+
+#ifdef CONFIG_PPC_SUBPAGE_PROT
+   struct subpage_prot_table spt;
+#endif /* CONFIG_PPC_SUBPAGE_PROT */
+
+};
+
  typedef struct {
union {
/*
@@ -127,7 +154,6 @@ typedef struct {
mm_context_id_t id;
mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
};
-   u16 user_psize; /* page size index */
  
  	/* Number of bits in the mm_cpumask */

atomic_t active_cpus;
@@ -137,27 +163,9 @@ typedef struct {
  
  	/* NPU NMMU context */

struct npu_context *npu_context;
+   struct hash_mm_context *hash_context;
  
-#ifdef CONFIG_PPC_MM_SLICES

-/* SLB page size encodings*/
-   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
-   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long slb_addr_limit;
-# ifdef CONFIG_PPC_64K_PAGES
-   struct slice_mask mask_64k;
-# endif
-   struct slice_mask mask_4k;
-# ifdef CONFIG_HUGETLB_PAGE
-   struct slice_mask mask_16m;
-   struct slice_mask mask_16g;
-# endif
-#else
-   u16 sllp;   /* SLB page size encoding */
-#endif
unsigned long vdso_base;
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-   struct subpage_prot_table spt;
-#endif /* CONFIG_PPC_SUBPAGE_PROT */
/*
 * pagetable fragment support
 */
diff --git a/arch/powerpc/include/asm/book3s/64/slice.h 
b/arch/powerpc/include/asm/book3s/64/slice.h
index db0dedab65ee..3ca1bebe258e 100644
--- a/arch/powerpc/include/asm/book3s/64/slice.h
+++ b/arch/powerpc/include/asm/book3s/64/slice.h
@@ -15,11 +15,11 @@
  
  #else /* CONFIG_PPC_MM_SLICES */


That never happens since book3s/64 always selects CONFIG_PPC_MM_SLICES

  
-#define get_slice_psize(mm, addr)	((mm)->context.user_psize)

+#define get_slice_psize(mm, addr)  ((mm)->context.hash_context->user_psize)
  #define slice_set_user_psize(mm, psize)   \
  do {  \
-   

[PATCH 4/4] powerpc: Add support to initialize ima policy rules

2019-04-02 Thread Claudio Carvalho
From: Nayna Jain 

PowerNV secure boot relies on the kernel IMA security subsystem to
perform the OS kernel image signature verification. Since each secure
boot mode has different IMA policy requirements, dynamic definition of
the policy rules based on the runtime secure boot mode of the system is
required. On systems that support secure boot, but have it disabled,
only measurement policy rules of the kernel image and modules are
defined.

This patch defines the arch-specific implementation to retrieve the
secure boot mode of the system and accordingly configures the IMA policy
rules.

This patch will provide arch-specific IMA policies if PPC_SECURE_BOOT
config is enabled.

Signed-off-by: Nayna Jain 
---
 arch/powerpc/Kconfig   | 12 
 arch/powerpc/kernel/Makefile   |  1 +
 arch/powerpc/kernel/ima_arch.c | 54 ++
 include/linux/ima.h|  3 +-
 4 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/ima_arch.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d0be82c3061..e0ba9a9114b3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -901,6 +901,18 @@ config PPC_MEM_KEYS
 
  If unsure, say y.
 
+config PPC_SECURE_BOOT
+   prompt "Enable PowerPC Secure Boot"
+   bool
+   default n
+   depends on IMA
+   depends on IMA_ARCH_POLICY
+   help
+ Linux on POWER with firmware secure boot enabled needs to define
+ security policies to extend secure boot to the OS.
+ This config allows user to enable OS Secure Boot on PowerPC systems
+ that have firmware secure boot support.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index cddadccf551d..0f08ed7dfd1b 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -119,6 +119,7 @@ ifdef CONFIG_IMA
 obj-y  += ima_kexec.o
 endif
 endif
+obj-$(CONFIG_IMA)  += ima_arch.o
 
 obj-$(CONFIG_AUDIT)+= audit.o
 obj64-$(CONFIG_AUDIT)  += compat_audit.o
diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
new file mode 100644
index ..871b321656fb
--- /dev/null
+++ b/arch/powerpc/kernel/ima_arch.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ * ima_arch.c
+ *  - initialize ima policies for PowerPC Secure Boot
+ */
+
+#include 
+#include 
+
+bool arch_ima_get_secureboot(void)
+{
+   bool sb_mode;
+
+   sb_mode = get_powerpc_sb_mode();
+   if (sb_mode)
+   return true;
+   else
+   return false;
+}
+
+/*
+ * File signature verification is not needed, include only measurements
+ */
+static const char *const default_arch_rules[] = {
+   "measure func=KEXEC_KERNEL_CHECK",
+   "measure func=MODULE_CHECK",
+   NULL
+};
+
+/* Both file signature verification and measurements are needed */
+static const char *const sb_arch_rules[] = {
+   "measure func=KEXEC_KERNEL_CHECK",
+   "measure func=MODULE_CHECK",
+   "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig",
+#if !IS_ENABLED(CONFIG_MODULE_SIG)
+   "appraise func=MODULE_CHECK appraise_type=imasig",
+#endif
+   NULL
+};
+
+/*
+ * On PowerPC, file measurements are to be added to the IMA measurement list
+ * irrespective of the secure boot state of the system. Signature verification
+ * is conditionally enabled based on the secure boot state.
+ */
+const char *const *arch_get_ima_policy(void)
+{
+   if (IS_ENABLED(CONFIG_IMA_ARCH_POLICY) && arch_ima_get_secureboot())
+   return sb_arch_rules;
+   return default_arch_rules;
+}
diff --git a/include/linux/ima.h b/include/linux/ima.h
index dc12fbcf484c..32f46d69ebd7 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -31,7 +31,8 @@ extern void ima_post_path_mknod(struct dentry *dentry);
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
 
-#if defined(CONFIG_X86) && defined(CONFIG_EFI)
+#if defined(CONFIG_X86) && defined(CONFIG_EFI) \
+   || defined(CONFIG_PPC_SECURE_BOOT)
 extern bool arch_ima_get_secureboot(void);
 extern const char * const *arch_get_ima_policy(void);
 #else
-- 
2.20.1



[PATCH 3/4] powerpc/powernv: Detect the secure boot mode of the system

2019-04-02 Thread Claudio Carvalho
From: Nayna Jain 

PowerNV secure boot defines different IMA policies based on the secure
boot state of the system.

This patch defines a function to detect the secure boot state of the
system.

Signed-off-by: Nayna Jain 
---
 arch/powerpc/include/asm/secboot.h   | 21 +
 arch/powerpc/platforms/powernv/Makefile  |  2 +-
 arch/powerpc/platforms/powernv/secboot.c | 54 
 3 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/secboot.h
 create mode 100644 arch/powerpc/platforms/powernv/secboot.c

diff --git a/arch/powerpc/include/asm/secboot.h 
b/arch/powerpc/include/asm/secboot.h
new file mode 100644
index ..1904fb4a3352
--- /dev/null
+++ b/arch/powerpc/include/asm/secboot.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerPC secure boot definitions
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ */
+#ifndef POWERPC_SECBOOT_H
+#define POWERPC_SECBOOT_H
+
+#if defined(CONFIG_OPAL_SECVAR)
+extern bool get_powerpc_sb_mode(void);
+#else
+static inline bool get_powerpc_sb_mode(void)
+{
+   return false;
+}
+#endif
+
+#endif
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 1511d836fd19..a36e22f8ecf8 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,4 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
 obj-$(CONFIG_OCXL_BASE)+= ocxl.o
-obj-$(CONFIG_OPAL_SECVAR)  += opal-secvar.o
+obj-$(CONFIG_OPAL_SECVAR)  += opal-secvar.o secboot.o
diff --git a/arch/powerpc/platforms/powernv/secboot.c 
b/arch/powerpc/platforms/powernv/secboot.c
new file mode 100644
index ..afb1552636c5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/secboot.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ * secboot.c
+ *  - util functions to get powerpc secboot state
+ *
+ */
+#include 
+#include 
+
+bool get_powerpc_sb_mode(void)
+{
+   efi_char16_t efi_SecureBoot_name[] = L"SecureBoot";
+   efi_char16_t efi_SetupMode_name[] = L"SetupMode";
+   efi_guid_t efi_variable_guid = EFI_GLOBAL_VARIABLE_GUID;
+   efi_status_t status;
+   u8 secboot, setupmode;
+   unsigned long size = sizeof(secboot);
+
+   status = efi.get_variable(efi_SecureBoot_name, &efi_variable_guid,
+ NULL, &size, &secboot);
+
+   /*
+* For now assume all failures reading the SecureBoot variable implies
+* secure boot is not enabled. Later differentiate failure types.
+*/
+   if (status != EFI_SUCCESS) {
+   secboot = 0;
+   setupmode = 0;
+   goto out;
+   }
+
+   size = sizeof(setupmode);
+   status = efi.get_variable(efi_SetupMode_name, &efi_variable_guid,
+ NULL, &size, &setupmode);
+
+   /*
+* Failure to read the SetupMode variable does not prevent
+* secure boot mode
+*/
+   if (status != EFI_SUCCESS)
+   setupmode = 0;
+
+out:
+   if ((secboot == 0) || (setupmode == 1)) {
+   pr_info("ima: secureboot mode disabled\n");
+   return false;
+   }
+
+   pr_info("ima: secureboot mode enabled\n");
+   return true;
+}
-- 
2.20.1



[PATCH 2/4] powerpc/powernv: Add support for OPAL secure variables

2019-04-02 Thread Claudio Carvalho
The X.509 certificates trusted by the platform and other information
required to secure boot the host OS kernel are wrapped in secure
variables, which are controlled by OPAL.

The OPAL secure variables can be handled through the following OPAL
calls.

OPAL_SECVAR_GET:
Returns the data for a given secure variable name and vendor GUID.

OPAL_SECVAR_GET_NEXT:
For a given secure variable, it returns the name and vendor GUID
of the next variable.

OPAL_SECVAR_ENQUEUE:
Enqueue the supplied secure variable update so that it can be processed
by OPAL in the next boot. Variable updates cannot be processed right
away because the variable storage is write locked at runtime.

OPAL_SECVAR_INFO:
Returns size information about the variable.

This patch adds support for OPAL secure variables by setting up the EFI
runtime variable services to make OPAL calls.

This patch also introduces CONFIG_OPAL_SECVAR for enabling the OPAL
secure variables support in the kernel. Since CONFIG_OPAL_SECVAR selects
CONFIG_EFI, it also allows us to manage the OPAL secure variables from
userspace via efivarfs.

Signed-off-by: Claudio Carvalho 
---
This patch depends on new OPAL calls that are being added to skiboot.
The patch set that implements the new calls has been posted to
https://patchwork.ozlabs.org/project/skiboot/list/?series=99805
---
 arch/powerpc/include/asm/opal-api.h  |   6 +-
 arch/powerpc/include/asm/opal.h  |  10 ++
 arch/powerpc/platforms/Kconfig   |   3 +
 arch/powerpc/platforms/powernv/Kconfig   |   9 +
 arch/powerpc/platforms/powernv/Makefile  |   1 +
 arch/powerpc/platforms/powernv/opal-call.c   |   4 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 179 +++
 7 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 870fb7b239ea..d3066f29cb7a 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -210,7 +210,11 @@
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   165
 #defineOPAL_NX_COPROC_INIT 167
-#define OPAL_LAST  167
+#define OPAL_SECVAR_GET170
+#define OPAL_SECVAR_GET_NEXT   171
+#define OPAL_SECVAR_ENQUEUE172
+#define OPAL_SECVAR_INFO   173
+#define OPAL_LAST  173
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a55b01c90bb1..fdfd8dd7b326 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -385,6 +385,16 @@ void opal_powercap_init(void);
 void opal_psr_init(void);
 void opal_sensor_groups_init(void);
 
+extern int opal_secvar_get(uint64_t name, uint64_t vendor, uint64_t attr,
+  uint64_t data_size, uint64_t data);
+extern int opal_secvar_get_next(uint64_t name_size, uint64_t name,
+   uint64_t vendor);
+extern int opal_secvar_enqueue(uint64_t name, uint64_t vendor, uint64_t attr,
+  uint64_t data_size, uint64_t data);
+extern int opal_secvar_info(uint64_t attr, uint64_t storage_space,
+   uint64_t remaining_space,
+   uint64_t max_variable_size);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f3fb79fccc72..8e30510bc0c1 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -326,4 +326,7 @@ config XILINX_PCI
bool "Xilinx PCI host bridge support"
depends on PCI && XILINX_VIRTEX
 
+config EFI
+   bool
+
 endmenu
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 850eee860cf2..879f8e766098 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -47,3 +47,12 @@ config PPC_VAS
  VAS adapters are found in POWER9 based systems.
 
  If unsure, say N.
+
+config OPAL_SECVAR
+   bool "OPAL Secure Variables"
+   depends on PPC_POWERNV && !CPU_BIG_ENDIAN
+   select UCS2_STRING
+   select EFI
+   help
+ This enables the kernel to access OPAL secure variables via EFI
+ runtime variable services.
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index da2e99efbd04..1511d836fd19 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_

[PATCH 1/4] powerpc/include: Override unneeded early ioremap functions

2019-04-02 Thread Claudio Carvalho
When CONFIG_EFI is enabled, the EFI driver includes the generic
early_ioremap header, which assumes that architectures may want to
provide their own early ioremap functions.

This patch overrides the ioremap functions in powerpc because they are
not required for secure boot on powerpc systems.

Signed-off-by: Claudio Carvalho 
---
 arch/powerpc/include/asm/early_ioremap.h | 41 
 1 file changed, 41 insertions(+)
 create mode 100644 arch/powerpc/include/asm/early_ioremap.h

diff --git a/arch/powerpc/include/asm/early_ioremap.h 
b/arch/powerpc/include/asm/early_ioremap.h
new file mode 100644
index ..a86a06e9f3b9
--- /dev/null
+++ b/arch/powerpc/include/asm/early_ioremap.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Early ioremap definitions
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Claudio Carvalho 
+ *
+ */
+#ifndef _ASM_POWERPC_EARLY_IOREMAP_H
+#define _ASM_POWERPC_EARLY_IOREMAP_H
+
+static inline void __iomem *early_ioremap(resource_size_t phys_addr,
+ unsigned long size)
+{
+   return NULL;
+}
+
+static inline void *early_memremap(resource_size_t phys_addr,
+  unsigned long size)
+{
+   return NULL;
+}
+
+static inline void *early_memremap_ro(resource_size_t phys_addr,
+ unsigned long size)
+{
+   return NULL;
+}
+
+static inline void *early_memremap_prot(resource_size_t phys_addr,
+   unsigned long size,
+   unsigned long prot_val)
+{
+   return NULL;
+}
+
+static inline void early_iounmap(void __iomem *addr, unsigned long size) { }
+static inline void early_memunmap(void *addr, unsigned long size) { }
+static inline void early_ioremap_shutdown(void) { }
+
+#endif
-- 
2.20.1



[PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-02 Thread Claudio Carvalho
This patch set is part of a series that implements secure boot on
PowerNV systems.

In order to verify the OS kernel on PowerNV, secure boot requires X.509
certificates trusted by the platform, the secure boot modes, and several
other pieces of information. These are stored in secure variables
controlled by OPAL, also known as OPAL secure variables.

This patch set adds the following features:

1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR
   introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can
   be used to manage the secure variables.
2. Add support for OPAL secure variables by overwriting the EFI hooks
   (get_variable, get_next_variable, set_variable and query_variable_info)
   with OPAL call wrappers. There is probably a better way to add this
   support, for example, we are investigating if we could register the
   efivar_operations rather than overwriting the EFI hooks. In this patch
   set, CONFIG_OPAL_SECVAR selects CONFIG_EFI. If, instead, we registered
   efivar_operations, CONFIG_EFIVAR_FS would need to depend on
   CONFIG_EFI || CONFIG_OPAL_SECVAR. Comments or suggestions on the
   preferred technique would be greatly appreciated (a rough sketch of the
   efivars_register() alternative is shown below, after this list).
3. Define IMA arch-specific policies based on the secure boot state and
   mode of the system. On secure boot enabled powernv systems, the host OS
   kernel signature will be verified by IMA appraisal.
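
Regarding the alternative mentioned in item 2 above, a rough sketch of what
registering efivar_operations could look like is included below for discussion.
It is not part of this series: the opal_secvar_efi_*() names and the kobject
used are placeholders, error handling is omitted, and it has not been verified
that efivars_register() is usable without further EFI setup on powernv.

#include <linux/efi.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/kobject.h>

static efi_status_t opal_secvar_efi_get_variable(efi_char16_t *name,
						 efi_guid_t *vendor, u32 *attr,
						 unsigned long *data_size,
						 void *data)
{
	/* would translate to OPAL_SECVAR_GET via the opal_secvar_get() wrapper */
	return EFI_UNSUPPORTED;
}

static const struct efivar_operations opal_secvar_efivar_ops = {
	.get_variable = opal_secvar_efi_get_variable,
	/* .get_next_variable, .set_variable, .query_variable_store, ... */
};

static struct efivars opal_secvar_efivars;

static int __init opal_secvar_efivars_init(void)
{
	struct kobject *kobj;

	kobj = kobject_create_and_add("opal-secvar", firmware_kobj);
	if (!kobj)
		return -ENOMEM;

	/* efivarfs would then sit on top of these ops instead of the efi.* hooks */
	return efivars_register(&opal_secvar_efivars,
				&opal_secvar_efivar_ops, kobj);
}
device_initcall(opal_secvar_efivars_init);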

Claudio Carvalho (2):
  powerpc/include: Override unneeded early ioremap functions
  powerpc/powernv: Add support for OPAL secure variables

Nayna Jain (2):
  powerpc/powernv: Detect the secure boot mode of the system
  powerpc: Add support to initialize ima policy rules

 arch/powerpc/Kconfig |  12 ++
 arch/powerpc/include/asm/early_ioremap.h |  41 +
 arch/powerpc/include/asm/opal-api.h  |   6 +-
 arch/powerpc/include/asm/opal.h  |  10 ++
 arch/powerpc/include/asm/secboot.h   |  21 +++
 arch/powerpc/kernel/Makefile |   1 +
 arch/powerpc/kernel/ima_arch.c   |  54 ++
 arch/powerpc/platforms/Kconfig   |   3 +
 arch/powerpc/platforms/powernv/Kconfig   |   9 +
 arch/powerpc/platforms/powernv/Makefile  |   1 +
 arch/powerpc/platforms/powernv/opal-call.c   |   4 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 179 +++
 arch/powerpc/platforms/powernv/secboot.c |  54 ++
 include/linux/ima.h  |   3 +-
 14 files changed, 396 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/early_ioremap.h
 create mode 100644 arch/powerpc/include/asm/secboot.h
 create mode 100644 arch/powerpc/kernel/ima_arch.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c
 create mode 100644 arch/powerpc/platforms/powernv/secboot.c

-- 
2.20.1



Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions

2019-04-02 Thread Andrey Ryabinin



On 4/2/19 12:43 PM, Christophe Leroy wrote:
> Hi Dmitry, Andrey and others,
> 
> Do you have any comments to this series ?
> 

I don't see justification for adding all these non-instrumented functions. We
need only a subset of these functions, and only on powerpc so far. Arches that
don't use str*() that early simply don't need a non-instrumented __str*() variant.

Also, I don't think that auto-replacing str* with __str* in all non-instrumented
files is a good idea, as this will reduce KASAN coverage.
E.g. we don't instrument slub.c, but there is no reason to use the
non-instrumented __str*() functions there.

And finally, this series makes bug reporting slightly worse. E.g. let's look at
strcpy():

+char *strcpy(char *dest, const char *src)
+{
+   size_t len = __strlen(src) + 1;
+
+   check_memory_region((unsigned long)src, len, false, _RET_IP_);
+   check_memory_region((unsigned long)dest, len, true, _RET_IP_);
+
+   return __strcpy(dest, src);
+}

If src is a non-null-terminated string, we might not see a proper out-of-bounds
report from KASAN, only a crash in __strlen().
That might make it harder to identify where 'src' comes from, where it was
allocated, and what the size of the allocated area is.


> I'd like to know if this approach is ok or if it is better to keep doing as 
> in https://patchwork.ozlabs.org/patch/1055788/
>
I think the patch from link is a better solution to the problem.



Re: [PATCH stable v4.14 00/32] powerpc spectre backports for 4.14

2019-04-02 Thread Greg KH
On Tue, Apr 02, 2019 at 03:21:09PM +, Diana Madalina Craciun wrote:
> On 3/31/2019 12:53 PM, Michael Ellerman wrote:
> > Greg KH  writes:
> >> On Fri, Mar 29, 2019 at 03:51:16PM +0100, Greg KH wrote:
> >>> On Fri, Mar 29, 2019 at 10:25:48PM +1100, Michael Ellerman wrote:
>  Hi Greg, Please queue
>  up these powerpc patches for 4.14 if you have no objections. 
> >>> Some of these also need to go to 4.19, right? Want me to add them
> >>> there, or are you going to provide a backported series? 
> > Yes some of them do, but I wasn't sure if they'd go cleanly.
> >> Nevermind, I've queued up the missing ones to 4.19.y, and one missing
> >> one to 5.0.y. If I've missed anything, please let me know. 
> > Thanks. I'll check everything's working as expected.
> 
> I have validated on NXP PowerPC and it worked as expected on both kernel
> 4.14 and kernel 4.19.

Great, thanks for testing!

greg k-h


Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Aneesh Kumar K.V

On 4/2/19 9:06 PM, Christophe Leroy wrote:



On 02/04/2019 at 16:34, Aneesh Kumar K.V wrote:

Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us to save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we can 
do this

without all the #ifdef and other overhead for 8xx book3e


Did you have a look at my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ?


It tries to reduce as much as feasible the #ifdefs and stuff.




Not yet. But a cursory look tells me introducing hash_mm_context
complicates this further unless I introduce something similar for nohash
32. Are you ok with that?


-aneesh



Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 16:34, Aneesh Kumar K.V wrote:

Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us to save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we can do this
without all the #ifdef and other overhead for 8xx book3e


Did you have a look at my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ?


It tries to reduce as much as feasible the #ifdefs and stuff.

Christophe




  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
  arch/powerpc/include/asm/book3s/64/mmu.h  | 48 +++
  arch/powerpc/include/asm/book3s/64/slice.h|  6 +--
  arch/powerpc/kernel/paca.c|  9 ++--
  arch/powerpc/kernel/setup-common.c|  7 ++-
  arch/powerpc/mm/hash_utils_64.c   | 10 ++--
  arch/powerpc/mm/mmu_context_book3s64.c| 16 ++-
  arch/powerpc/mm/slb.c |  2 +-
  arch/powerpc/mm/slice.c   | 48 +--
  arch/powerpc/mm/subpage-prot.c|  8 ++--
  10 files changed, 91 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index a28a28079edb..d801be977623 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -657,7 +657,7 @@ extern void slb_set_size(u16 size);
  
  /* 4 bits per slice and we have one slice per 1TB */

  #define SLICE_ARRAY_SIZE  (H_PGTABLE_RANGE >> 41)
-#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41)
+#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 
41)
  
  #ifndef __ASSEMBLY__
  
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h

index a809bdd77322..07e76e304a3b 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -114,6 +114,33 @@ struct slice_mask {
DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
  };
  
+struct hash_mm_context {

+
+   u16 user_psize; /* page size index */
+
+#ifdef CONFIG_PPC_MM_SLICES
+   /* SLB page size encodings*/
+   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
+   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
+   unsigned long slb_addr_limit;
+#ifdef CONFIG_PPC_64K_PAGES
+   struct slice_mask mask_64k;
+#endif
+   struct slice_mask mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_16m;
+   struct slice_mask mask_16g;
+#endif
+#else
+   u16 sllp; /* SLB page size encoding */
+#endif
+
+#ifdef CONFIG_PPC_SUBPAGE_PROT
+   struct subpage_prot_table spt;
+#endif /* CONFIG_PPC_SUBPAGE_PROT */
+
+};
+
  typedef struct {
union {
/*
@@ -127,7 +154,6 @@ typedef struct {
mm_context_id_t id;
mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
};
-   u16 user_psize; /* page size index */
  
  	/* Number of bits in the mm_cpumask */

atomic_t active_cpus;
@@ -137,27 +163,9 @@ typedef struct {
  
  	/* NPU NMMU context */

struct npu_context *npu_context;
+   struct hash_mm_context *hash_context;
  
-#ifdef CONFIG_PPC_MM_SLICES

-/* SLB page size encodings*/
-   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
-   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long slb_addr_limit;
-# ifdef CONFIG_PPC_64K_PAGES
-   struct slice_mask mask_64k;
-# endif
-   struct slice_mask mask_4k;
-# ifdef CONFIG_HUGETLB_PAGE
-   struct slice_mask mask_16m;
-   struct slice_mask mask_16g;
-# endif
-#else
-   u16 sllp;   /* SLB page size encoding */
-#endif
unsigned long vdso_base;
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-   struct subpage_prot_table spt;
-#endif /* CONFIG_PPC_SUBPAGE_PROT */
/*
 * pagetable fragment support
 */
diff --git a/arch/powerpc/include/asm/book3s/64/slice.h 
b/arch/powerpc/include/asm/book3s/64/slice.h
index db0dedab65ee..3ca1bebe258e 100644
--- a/arch/powerpc/include/asm/book3s/64/slice.h
+++ b/arch/powerpc/include/asm/book3s/64/slice.h
@@ -15,11 +15,11 @@
  
  #else /* CONFIG_PPC_MM_SLICES */
  
-#define get_slice_psize(mm, addr)	((mm)->context.user_psize)

+#define get_slice_psize(mm, addr)  ((mm)->context.hash_context->user_psize)
  #define slice_set_user_psize(mm, psize)   \
  do {  \
-   (mm)->context.user_psize = (psize);  \
-   (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \
+   (mm)->co

Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-02 Thread Jan Kara
On Tue 02-04-19 17:21:25, Aneesh Kumar K.V wrote:
> With some architectures like ppc64, set_pmd_at() cannot cope with
> a situation where there is already some (different) valid entry present.
> 
> Use pmdp_set_access_flags() instead to modify the pfn which is built to
> deal with modifying existing PMD entries.
> 
> This is similar to
> commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by 
> insert_pfn()")
> 
> We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support
> pud pfn entries now.
> 
> Without this patch we also see the below message in kernel log
> "BUG: non-zero pgtables_bytes on freeing mm:"
> 
> CC: sta...@vger.kernel.org
> Reported-by: Chandan Rajendra 
> Signed-off-by: Aneesh Kumar K.V 

Looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
> Changes from v1:
> * Fix the pgtable leak 
> 
>  mm/huge_memory.c | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 404acdcd0455..165ea46bf149 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -755,6 +755,21 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, 
> unsigned long addr,
>   spinlock_t *ptl;
>  
>   ptl = pmd_lock(mm, pmd);
> + if (!pmd_none(*pmd)) {
> + if (write) {
> + if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
> + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
> + goto out_unlock;
> + }
> + entry = pmd_mkyoung(*pmd);
> + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
> + update_mmu_cache_pmd(vma, addr, pmd);
> + }
> +
> + goto out_unlock;
> + }
> +
>   entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
>   if (pfn_t_devmap(pfn))
>   entry = pmd_mkdevmap(entry);
> @@ -766,11 +781,16 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, 
> unsigned long addr,
>   if (pgtable) {
>   pgtable_trans_huge_deposit(mm, pmd, pgtable);
>   mm_inc_nr_ptes(mm);
> + pgtable = NULL;
>   }
>  
>   set_pmd_at(mm, addr, pmd, entry);
>   update_mmu_cache_pmd(vma, addr, pmd);
> +
> +out_unlock:
>   spin_unlock(ptl);
> + if (pgtable)
> + pte_free(mm, pgtable);
>  }
>  
>  vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> @@ -821,6 +841,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, 
> unsigned long addr,
>   spinlock_t *ptl;
>  
>   ptl = pud_lock(mm, pud);
> + if (!pud_none(*pud)) {
> + if (write) {
> + if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
> + WARN_ON_ONCE(!is_huge_zero_pud(*pud));
> + goto out_unlock;
> + }
> + entry = pud_mkyoung(*pud);
> + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
> + if (pudp_set_access_flags(vma, addr, pud, entry, 1))
> + update_mmu_cache_pud(vma, addr, pud);
> + }
> + goto out_unlock;
> + }
> +
>   entry = pud_mkhuge(pfn_t_pud(pfn, prot));
>   if (pfn_t_devmap(pfn))
>   entry = pud_mkdevmap(entry);
> @@ -830,6 +864,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, 
> unsigned long addr,
>   }
>   set_pud_at(mm, addr, pud, entry);
>   update_mmu_cache_pud(vma, addr, pud);
> +
> +out_unlock:
>   spin_unlock(ptl);
>  }
>  
> -- 
> 2.20.1
> 
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH stable v4.14 00/32] powerpc spectre backports for 4.14

2019-04-02 Thread Diana Madalina Craciun
On 3/31/2019 12:53 PM, Michael Ellerman wrote:
> Greg KH  writes:
>> On Fri, Mar 29, 2019 at 03:51:16PM +0100, Greg KH wrote:
>>> On Fri, Mar 29, 2019 at 10:25:48PM +1100, Michael Ellerman wrote:
 Hi Greg, Please queue
 up these powerpc patches for 4.14 if you have no objections. 
>>> Some of these also need to go to 4.19, right? Want me to add them
>>> there, or are you going to provide a backported series? 
> Yes some of them do, but I wasn't sure if they'd go cleanly.
>> Nevermind, I've queued up the missing ones to 4.19.y, and one missing
>> one to 5.0.y. If I've missed anything, please let me know. 
> Thanks. I'll check everything's working as expected.

I have validated on NXP PowerPC and it worked as expected on both kernel
4.14 and kernel 4.19.

Thanks,
Diana



Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions

2019-04-02 Thread Christophe Leroy




On 02/04/2019 at 14:58, Dmitry Vyukov wrote:

On Tue, Apr 2, 2019 at 11:43 AM Christophe Leroy
 wrote:


Hi Dmitry, Andrey and others,

Do you have any comments to this series ?

I'd like to know if this approach is ok or if it is better to keep doing
as in https://patchwork.ozlabs.org/patch/1055788/


Hi Christophe,

Forking every kernel function does not look like a scalable approach
to me. There is not much special about str* functions. There is
something a bit special about memset/memcpy as compiler emits them for
struct set/copy.
Could powerpc do the same as x86 and map some shadow early enough
(before "prom")? Then we would not need anything of this? Sorry if we
already discussed this, I am losing context quickly.


Hi Dmitry,

I'm afraid we can't map shadow ram that early. This code gets run by 
third party BIOS SW which manages the MMU and provides a 1:1 mapping, so 
there is no way we can map shadow memory.


If you feel providing interceptors for the string functions is not a 
good idea, I'm ok with it, I'll keep the necessary string functions in 
prom_init.c


I was proposing the interceptor approach because, beyond the specific 
need of handling the early prom_init code, I thought it was also a way to 
limit the KASAN performance impact on string functions, and a way to handle 
all the optimised string functions provided by architectures.
In my series I have a patch that disables powerpc's optimised string 
functions (https://patchwork.ozlabs.org/patch/1055780/). The 
interceptor approach was a way to avoid that. As far as I can see, the 
other arches currently don't disable their optimised string functions, 
meaning the KASAN checks are skipped there.
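
For reference, a single interceptor follows the pattern sketched below. This
is a minimal sketch only, assuming the existing kasan check_memory_region()
helper and the __strlen() name used by the series; it is not the exact code
of the patch quoted further down.

/*
 * Sketch of one string interceptor: lib/string.o provides the
 * uninstrumented __strlen(), and the exported strlen() wrapper
 * performs the shadow check before returning.
 */
#undef strlen

size_t strlen(const char *s)
{
        size_t ret = __strlen(s);

        /* check the bytes actually read, including the trailing NUL */
        check_memory_region((unsigned long)s, ret + 1, false, _RET_IP_);

        return ret;
}
EXPORT_SYMBOL(strlen);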


Thanks
Christophe







Thanks
Christophe

Le 28/03/2019 à 16:00, Christophe Leroy a écrit :

In the same spirit as commit 393f203f5fd5 ("x86_64: kasan: add
interceptors for memset/memmove/memcpy functions"), this patch
adds interceptors for string manipulation functions so that we
can compile lib/string.o without kasan support, hence allowing the
string functions to also be used from places where kasan has
to be disabled.

Signed-off-by: Christophe Leroy 
---
   v2: Fixed a few checkpatch stuff and added missing EXPORT_SYMBOL() and 
missing #undefs

   include/linux/string.h |  79 ++
   lib/Makefile   |   2 +
   lib/string.c   |   8 +
   mm/kasan/string.c  | 394 +
   4 files changed, 483 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index 7927b875f80c..3d2aff2ed402 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -19,54 +19,117 @@ extern void *memdup_user_nul(const void __user *, size_t);
*/
   #include 

+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * should use not instrumented version of mem* functions.
+ */
+#define memset16 __memset16
+#define memset32 __memset32
+#define memset64 __memset64
+#define memzero_explicit __memzero_explicit
+#define strcpy   __strcpy
+#define strncpy  __strncpy
+#define strlcpy  __strlcpy
+#define strscpy  __strscpy
+#define strcat   __strcat
+#define strncat  __strncat
+#define strlcat  __strlcat
+#define strcmp   __strcmp
+#define strncmp  __strncmp
+#define strcasecmp   __strcasecmp
+#define strncasecmp  __strncasecmp
+#define strchr   __strchr
+#define strchrnul__strchrnul
+#define strrchr  __strrchr
+#define strnchr  __strnchr
+#define skip_spaces  __skip_spaces
+#define strim__strim
+#define strstr   __strstr
+#define strnstr  __strnstr
+#define strlen   __strlen
+#define strnlen  __strnlen
+#define strpbrk  __strpbrk
+#define strsep   __strsep
+#define strspn   __strspn
+#define strcspn  __strcspn
+#define memscan  __memscan
+#define memcmp   __memcmp
+#define memchr   __memchr
+#define memchr_inv   __memchr_inv
+#define strreplace   __strreplace
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
   #ifndef __HAVE_ARCH_STRCPY
   extern char * strcpy(char *,const char *);
+char *__strcpy(char *, const char *);
   #endif
   #ifndef __HAVE_ARCH_STRNCPY
   extern char * strncpy(char *,const char *, __kernel_size_t);
+char *__strncpy(char *, const char *, __kernel_size_t);
   #endif
   #ifndef __HAVE_ARCH_STRLCPY
   size_t strlcpy(char *, const char *, size_t);
+size_t __strlcpy(char *, const char *, size_t);
   #endif
   #ifndef __HAVE_ARCH_STRSCPY
   ssize_t strscpy(char *, const char *, size_t);
+ssize_t __strscpy(char *, const char *, size_t);
   #endif
   #ifndef __HAVE_ARCH_STRCAT
   extern char * strcat(char *, c

[RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix

2019-04-02 Thread Aneesh Kumar K.V
Currently, our mm_context_t on book3s64 includes all hash-specific
context details such as the slice mask and subpage protection details. We
can skip allocating those on radix. This will help us save
8K per mm_context with radix translation.

With the patch applied we have

sizeof(mm_context_t)  = 136
sizeof(struct hash_mm_context)  = 8288

Signed-off-by: Aneesh Kumar K.V 
---
NOTE:

If we want to do this, I am still trying to figure out how best we can do it
without all the #ifdef and other overhead for 8xx/book3e.
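
As a rough sketch of where the saving comes from: a radix mm would simply
never allocate the hash-specific part. The helper below and its call site
are assumptions for illustration, not part of the patch (the real
init_new_context() also allocates a context id, which is omitted here);
radix_enabled() and kzalloc() are the existing kernel helpers.

static int hash_init_context(struct mm_struct *mm)
{
        /* only hash translation pays for the ~8K of slice/subpage state */
        mm->context.hash_context = kzalloc(sizeof(struct hash_mm_context),
                                           GFP_KERNEL);
        if (!mm->context.hash_context)
                return -ENOMEM;
        return 0;
}

int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
{
        if (radix_enabled())
                return 0;               /* nothing extra to allocate */
        return hash_init_context(mm);
}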


 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  | 48 +++
 arch/powerpc/include/asm/book3s/64/slice.h|  6 +--
 arch/powerpc/kernel/paca.c|  9 ++--
 arch/powerpc/kernel/setup-common.c|  7 ++-
 arch/powerpc/mm/hash_utils_64.c   | 10 ++--
 arch/powerpc/mm/mmu_context_book3s64.c| 16 ++-
 arch/powerpc/mm/slb.c |  2 +-
 arch/powerpc/mm/slice.c   | 48 +--
 arch/powerpc/mm/subpage-prot.c|  8 ++--
 10 files changed, 91 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index a28a28079edb..d801be977623 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -657,7 +657,7 @@ extern void slb_set_size(u16 size);
 
 /* 4 bits per slice and we have one slice per 1TB */
 #define SLICE_ARRAY_SIZE   (H_PGTABLE_RANGE >> 41)
-#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41)
+#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 
41)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index a809bdd77322..07e76e304a3b 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -114,6 +114,33 @@ struct slice_mask {
DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
 };
 
+struct hash_mm_context {
+
+   u16 user_psize; /* page size index */
+
+#ifdef CONFIG_PPC_MM_SLICES
+   /* SLB page size encodings*/
+   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
+   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
+   unsigned long slb_addr_limit;
+#ifdef CONFIG_PPC_64K_PAGES
+   struct slice_mask mask_64k;
+#endif
+   struct slice_mask mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_16m;
+   struct slice_mask mask_16g;
+#endif
+#else
+   u16 sllp; /* SLB page size encoding */
+#endif
+
+#ifdef CONFIG_PPC_SUBPAGE_PROT
+   struct subpage_prot_table spt;
+#endif /* CONFIG_PPC_SUBPAGE_PROT */
+
+};
+
 typedef struct {
union {
/*
@@ -127,7 +154,6 @@ typedef struct {
mm_context_id_t id;
mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
};
-   u16 user_psize; /* page size index */
 
/* Number of bits in the mm_cpumask */
atomic_t active_cpus;
@@ -137,27 +163,9 @@ typedef struct {
 
/* NPU NMMU context */
struct npu_context *npu_context;
+   struct hash_mm_context *hash_context;
 
-#ifdef CONFIG_PPC_MM_SLICES
-/* SLB page size encodings*/
-   unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
-   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long slb_addr_limit;
-# ifdef CONFIG_PPC_64K_PAGES
-   struct slice_mask mask_64k;
-# endif
-   struct slice_mask mask_4k;
-# ifdef CONFIG_HUGETLB_PAGE
-   struct slice_mask mask_16m;
-   struct slice_mask mask_16g;
-# endif
-#else
-   u16 sllp;   /* SLB page size encoding */
-#endif
unsigned long vdso_base;
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-   struct subpage_prot_table spt;
-#endif /* CONFIG_PPC_SUBPAGE_PROT */
/*
 * pagetable fragment support
 */
diff --git a/arch/powerpc/include/asm/book3s/64/slice.h 
b/arch/powerpc/include/asm/book3s/64/slice.h
index db0dedab65ee..3ca1bebe258e 100644
--- a/arch/powerpc/include/asm/book3s/64/slice.h
+++ b/arch/powerpc/include/asm/book3s/64/slice.h
@@ -15,11 +15,11 @@
 
 #else /* CONFIG_PPC_MM_SLICES */
 
-#define get_slice_psize(mm, addr)  ((mm)->context.user_psize)
+#define get_slice_psize(mm, addr)  ((mm)->context.hash_context->user_psize)
 #define slice_set_user_psize(mm, psize)\
 do {   \
-   (mm)->context.user_psize = (psize); \
-   (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \
+   (mm)->context.hash_context->user_psize = (psize);   \
+   (mm)->context.hash_context->sllp = SLB_VSID_USER | 
mmu_psize_defs[(psize)].sllp; \
 } while (0)
 
 #endif /* CONFIG_PPC_MM_SLICES */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel

[PATCH 3.16 92/99] block/swim3: Fix -EBUSY error when re-opening device after unmount

2019-04-02 Thread Ben Hutchings
3.16.65-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Finn Thain 

commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 upstream.

When the block device is opened with FMODE_EXCL, ref_count is set to -1.
This value doesn't get reset when the device is closed, which means the
device cannot be opened again. Fix this by checking for ref_count <= 0
in the release method.

Reported-and-tested-by: Stan Johnson 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Finn Thain 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings 
---
 drivers/block/swim3.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/block/swim3.c
+++ b/drivers/block/swim3.c
@@ -1027,7 +1027,11 @@ static void floppy_release(struct gendis
struct swim3 __iomem *sw = fs->swim3;
 
mutex_lock(&swim3_mutex);
-   if (fs->ref_count > 0 && --fs->ref_count == 0) {
+   if (fs->ref_count > 0)
+   --fs->ref_count;
+   else if (fs->ref_count == -1)
+   fs->ref_count = 0;
+   if (fs->ref_count == 0) {
swim3_action(fs, MOTOR_OFF);
out_8(&sw->control_bic, 0xff);
swim3_select(fs, RELAX);



VLC doesn't play videos anymore since the PowerPC fixes 5.1-3

2019-04-02 Thread Christian Zigotzky

Hi All,

I figured out that the VLC player doesn't play videos anymore since the 
PowerPC fixes 5.1-3 [1]. With the RC1 of kernel 5.1, VLC plays videos 
without any problems.


VLC error messages:

[100ea580] ts demux warning: first packet for pid=1104 cc=0xe
[100ea580] ts demux warning: first packet for pid=1102 cc=0x4
[100ea580] ts demux warning: first packet for pid=1101 cc=0x8
[10109218] core decoder warning: can't get output picture
[10109218] avcodec decoder warning: disabling direct rendering
[10109218] core decoder warning: can't get output picture


dmesg: https://bugs.freedesktop.org/attachment.cgi?id=143840

Today I created a bug report about the VLC issue with kernel 5.1-rc2 
and higher [2], and I got an answer from Michel Dänzer.


Quote Michel Dänzer:

None of them directly affect the radeon driver.

It's quite likely that this is a PPC specific issue. Your best bet is 
bisecting between rc1 and rc2.


I haven't seen any other similar reports.



I was able to remove the PowerPC fixes 5.1-4 and 5.1-3 with the 
following commands:


git revert 6536c5f2c8cf79db0d37e79afcdb227dc854509c -m 1

Output: [master 4b4a8cf] Revert "Merge tag 'powerpc-5.1-4' of 
git://git.kernel.org/pub/scm/linux/kern ... erpc/linux"


git revert a5ed1e96cafde5ba48638f486bfca0685dc6ddc9 -m 1

Output: [master 0c70b7b] Revert "Merge tag 'powerpc-5.1-3' of 
git://git.kernel.org/pub/scm/linux/kern ... erpc/linux"


Removing the PowerPC fixes 5.1-4 and 5.1-3 solved the VLC issue.

The problematic code is definitely in the PowerPC fixes 5.1-3 [1].

Please check the PowerPC fixes 5.1-3 [1].

Thanks,

Christian

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.1-rc2&id=a5ed1e96cafde5ba48638f486bfca0685dc6ddc9

[2] https://bugs.freedesktop.org/show_bug.cgi?id=110304


Re: powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations

2019-04-02 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> Ben Hutchings  writes:
>> On Mon, 2019-03-25 at 01:03 +0100, Andreas Schwab wrote:
>>> On Mär 24 2019, Ben Hutchings  wrote:
>>> 
>>> > Presumably you have CONFIG_PPC_BOOK3S_64 enabled and
>>> > CONFIG_SPARSEMEM
>>> > disabled?  Was this configuration actually usable?
>>> 
>>> Why not?
>>
>> I assume that CONFIG_SPARSEMEM is the default for a good reason.
>> What I don't know is how strong that reason is (I am not a Power expert
>> at all).  Looking a bit further, it seems to be related to CONFIG_NUMA
>> in that you can enable CONFIG_FLATMEM if and only if that's disabled. 
>> So I suppose the configuration you used works for non-NUMA systems.
>
> Aneesh pointed out this fix would break FLATMEM after I'd merged it, but
> it didn't break any of our defconfigs so I wondered if anyone would
> notice.
>
> I checked today and a G5 will boot with FLATMEM, which I assume is what
> Andreas is using.
>
> I guess we should fix this build break for now.
>
> Even some G5's have discontiguous memory, so FLATMEM is not clearly a
> good choice even for all G5's, and actually a fresh g5_defconfig uses
> SPARSEMEM.
>
> So I'm inclined to just switch to always using SPARSEMEM on 64-bit
> Book3S, because that's what's well tested and we hardly need more code
> paths to test. Unless anyone has a strong objection, I haven't actually
> benchmarked FLATMEM vs SPARSEMEM on a G5.
>

How about

>From 207fb0036065d8db44853e63bb858c4fd9952106 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" 
Date: Mon, 1 Apr 2019 17:51:17 +0530
Subject: [PATCH] powerpc/mm: Fix build error 

The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs.
We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM and 32
bit configs never expected a value to be set for MAX_PHYSMEM_BITS.

Dependent code such as zsmalloc derived the right values based on other
fields. Instead of finding a value that works with different configs,
use new values only for book3s_64. For 64 bit booke, use the definition
of MAX_PHYSMEM_BITS as per commit a7df61a0e2b6 ("[PATCH] ppc64: Increase 
sparsemem defaults")
That change was done in 2005 and hopefully will work with book3e 64.

Fixes: 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 15 +++
 arch/powerpc/include/asm/mmu.h   | 15 ---
 arch/powerpc/include/asm/nohash/64/mmu.h |  2 ++
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1ceee000c18d..a809bdd77322 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -35,6 +35,21 @@ typedef pte_t *pgtable_t;
 
 #endif /* __ASSEMBLY__ */
 
+/*
+ * If we store section details in page->flags we can't increase the 
MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we will not store node details in page->flags 
and
+ * page_to_nid does a page->section->node lookup
+ * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME 
reduce
+ * memory requirements with large number of sections.
+ * 51 bits is the max physical real address on POWER9
+ */
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) &&  
\
+   defined(CONFIG_PPC_64K_PAGES)
+#define MAX_PHYSMEM_BITS 51
+#else
+#define MAX_PHYSMEM_BITS 46
+#endif
+
 /* 64-bit classic hash table MMU */
 #include 
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 598cdcdd1355..78d53c4396ac 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,21 +341,6 @@ static inline bool strict_kernel_rwx_enabled(void)
  */
 #define MMU_PAGE_COUNT 16
 
-/*
- * If we store section details in page->flags we can't increase the 
MAX_PHYSMEM_BITS
- * if we increase SECTIONS_WIDTH we will not store node details in page->flags 
and
- * page_to_nid does a page->section->node lookup
- * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME 
reduce
- * memory requirements with large number of sections.
- * 51 bits is the max physical real address on POWER9
- */
-#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) &&  
\
-   defined (CONFIG_PPC_64K_PAGES)
-#define MAX_PHYSMEM_BITS51
-#elif defined(CONFIG_SPARSEMEM)
-#define MAX_PHYSMEM_BITS46
-#endif
-
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
 #else /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index e6585480dfc4..81cf30c370e5 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
 #define _ASM_POWERPC_NOHASH_64_MMU_H_
 
+#define MAX_PHYSMEM_BITS44
+
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
-

[PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-02 Thread Aneesh Kumar K.V
With some architectures like ppc64, set_pmd_at() cannot cope with
a situation where there is already some (different) valid entry present.

Use pmdp_set_access_flags() instead to modify the pfn, since it is built to
deal with modifying existing PMD entries.

This is similar to
commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by 
insert_pfn()")

We also do a similar update w.r.t. insert_pfn_pud, even though ppc64 doesn't
support pud pfn entries now.

Without this patch we also see the below message in the kernel log:
"BUG: non-zero pgtables_bytes on freeing mm:"

CC: sta...@vger.kernel.org
Reported-by: Chandan Rajendra 
Signed-off-by: Aneesh Kumar K.V 
---
Changes from v1:
* Fix the pgtable leak 

 mm/huge_memory.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdcd0455..165ea46bf149 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -755,6 +755,21 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, 
unsigned long addr,
spinlock_t *ptl;
 
ptl = pmd_lock(mm, pmd);
+   if (!pmd_none(*pmd)) {
+   if (write) {
+   if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+   WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
+   goto out_unlock;
+   }
+   entry = pmd_mkyoung(*pmd);
+   entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+   if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
+   update_mmu_cache_pmd(vma, addr, pmd);
+   }
+
+   goto out_unlock;
+   }
+
entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pmd_mkdevmap(entry);
@@ -766,11 +781,16 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, 
unsigned long addr,
if (pgtable) {
pgtable_trans_huge_deposit(mm, pmd, pgtable);
mm_inc_nr_ptes(mm);
+   pgtable = NULL;
}
 
set_pmd_at(mm, addr, pmd, entry);
update_mmu_cache_pmd(vma, addr, pmd);
+
+out_unlock:
spin_unlock(ptl);
+   if (pgtable)
+   pte_free(mm, pgtable);
 }
 
 vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
@@ -821,6 +841,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, 
unsigned long addr,
spinlock_t *ptl;
 
ptl = pud_lock(mm, pud);
+   if (!pud_none(*pud)) {
+   if (write) {
+   if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
+   WARN_ON_ONCE(!is_huge_zero_pud(*pud));
+   goto out_unlock;
+   }
+   entry = pud_mkyoung(*pud);
+   entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+   if (pudp_set_access_flags(vma, addr, pud, entry, 1))
+   update_mmu_cache_pud(vma, addr, pud);
+   }
+   goto out_unlock;
+   }
+
entry = pud_mkhuge(pfn_t_pud(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pud_mkdevmap(entry);
@@ -830,6 +864,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, 
unsigned long addr,
}
set_pud_at(mm, addr, pud, entry);
update_mmu_cache_pud(vma, addr, pud);
+
+out_unlock:
spin_unlock(ptl);
 }
 
-- 
2.20.1



[PATCH v2 3/3] powernv/mce: print additional information about mce error.

2019-04-02 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Print more information about the mce error, i.e. whether it is a hardware
or software error.

Some of the mce errors can be easily categorized as hardware or software
errors, e.g. UEs are due to hardware errors, whereas an error triggered by
invalid usage of tlbie is a pure software bug. But not all mce errors can
be easily categorized as either software or hardware. There are errors
like multihit errors which are usually the result of a software bug, but in
some rare cases a hardware failure can cause a multihit error. In the past,
we have seen cases where, after replacing a faulty chip, multihit errors
stopped occurring. The same goes for parity errors, which are usually due to
faulty hardware, but there is a chance that a multihit can also cause a
parity error. For such errors it is difficult to determine what really
caused them. Hence this patch classifies mce errors into the following
four categories:
1. Hardware error:
UE and Link timeout failure errors.
2. Probable hardware error (some chance of software cause)
SLB/ERAT/TLB Parity errors.
3. Software error
Invalid tlbie form.
4. Probable software error (some chance of hardware cause)
SLB/ERAT/TLB Multihit errors.

Sample o/p:

[ 1289.447571] MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 
01001b6e0320 [Recovered]
[ 1289.447615] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: 
[7fffa309dc60]
[ 1289.447634] MCE: CPU80: Probable Software error (some chance of hardware 
cause)

Signed-off-by: Mahesh Salgaonkar 
---
Change in v2:
- Rephrase the wording for error class as suggested by Michael.
---
 arch/powerpc/include/asm/mce.h  |   10 
 arch/powerpc/kernel/mce.c   |   12 
 arch/powerpc/kernel/mce_power.c |  107 +++
 3 files changed, 86 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index b1f4bf863c95..8741f4c21a1a 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -56,6 +56,14 @@ enum MCE_ErrorType {
MCE_ERROR_TYPE_LINK = 7,
 };
 
+enum MCE_ErrorClass {
+   MCE_ECLASS_UNKNOWN = 0,
+   MCE_ECLASS_HARDWARE,
+   MCE_ECLASS_HARD_INDETERMINATE,
+   MCE_ECLASS_SOFTWARE,
+   MCE_ECLASS_SOFT_INDETERMINATE,
+};
+
 enum MCE_UeErrorType {
MCE_UE_ERROR_INDETERMINATE = 0,
MCE_UE_ERROR_IFETCH = 1,
@@ -115,6 +123,7 @@ struct machine_check_event {
enum MCE_Severity   severity:8;
enum MCE_Initiator  initiator:8;
enum MCE_ErrorType  error_type:8;
+   enum MCE_ErrorClass error_class:8;
enum MCE_Dispositiondisposition:8;
boolsync_error;
u16 cpu;
@@ -195,6 +204,7 @@ struct mce_error_info {
} u;
enum MCE_Severity   severity:8;
enum MCE_Initiator  initiator:8;
+   enum MCE_ErrorClass error_class:8;
boolsync_error;
 };
 
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 0f961583bd51..1d978c3477a0 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -123,6 +123,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
mce->initiator = mce_err->initiator;
mce->severity = mce_err->severity;
mce->sync_error = mce_err->sync_error;
+   mce->error_class = mce_err->error_class;
 
/*
 * Populate the mce error_type and type-specific error_type.
@@ -361,6 +362,13 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
"Store (timeout)",
"Page table walk Load/Store (timeout)",
};
+   static const char *mc_error_class[] = {
+   "Unknown",
+   "Hardware error",
+   "Probable Hardware error (some chance of software cause)",
+   "Software error",
+   "Probable Software error (some chance of hardware cause)",
+   };
 
/* Print things out */
if (evt->version != MCE_V1) {
@@ -478,6 +486,10 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
printk("%sMCE: CPU%d: NIP: [%016llx] %pS\n",
level, evt->cpu, evt->srr0, (void *)evt->srr0);
}
+
+   subtype = evt->error_class < ARRAY_SIZE(mc_error_class) ?
+   mc_error_class[evt->error_class] : "Unknown";
+   printk("%sMCE: CPU%d: %s\n", level, evt->cpu, subtype);
 }
 EXPORT_SYMBOL_GPL(machine_check_print_event_info);
 
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 606af87a4dda..3658af85e48a 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -131,6 +131,7 @@ struct mce_ierror_table {
bool nip_valid; /* nip is a valid indicator of faulting address */
unsigned int error_type;
  

[PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

2019-04-02 Thread Nicholas Piggin
Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.

Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.

Reported-by: Ravikumar Bangoria 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/watchdog.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3c6ab22a0c4e..59a0e5942f6b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic 
other CPUs */
 
 static u64 wd_timer_period_ms __read_mostly;  /* interval between heartbeat */
 
-static DEFINE_PER_CPU(struct timer_list, wd_timer);
+static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);
 static DEFINE_PER_CPU(u64, wd_timer_tb);
 
 /* SMP checker bits */
@@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs)
nmi_exit();
 }
 
-static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
-{
-   t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms);
-   if (wd_timer_period_ms > 1000)
-   t->expires = __round_jiffies_up(t->expires, cpu);
-   add_timer_on(t, cpu);
-}
-
-static void wd_timer_fn(struct timer_list *t)
+static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
 
+   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   return HRTIMER_NORESTART;
+
+   if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+   return HRTIMER_NORESTART;
+
watchdog_timer_interrupt(cpu);
 
-   wd_timer_reset(cpu, t);
+   hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
+
+   return HRTIMER_RESTART;
 }
 
 void arch_touch_nmi_watchdog(void)
@@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
 static void start_watchdog_timer_on(unsigned int cpu)
 {
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
 
per_cpu(wd_timer_tb, cpu) = get_tb();
 
-   timer_setup(t, wd_timer_fn, TIMER_PINNED);
-   wd_timer_reset(cpu, t);
+   hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer->function = watchdog_timer_fn;
+   hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
+ HRTIMER_MODE_REL_PINNED);
 }
 
 static void stop_watchdog_timer_on(unsigned int cpu)
 {
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
 
-   del_timer_sync(t);
+   hrtimer_cancel(hrtimer);
 }
 
 static int start_wd_on_cpu(unsigned int cpu)
-- 
2.20.1



[PATCH v2 1/3] powernv/mce: reduce mce console logs to lesser lines.

2019-04-02 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Also add the cpu number while displaying the mce log. This will help produce
cleaner logs when mce hits on multiple cpus simultaneously.

before the changes the mce o/p was:

[  127.223515] Severe Machine check interrupt [Recovered]
[  127.223530]   NIP [dba80280]: 
insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
[  127.223539]   Initiator: CPU
[  127.223544]   Error type: SLB [Multihit]
[  127.223550] Effective address: dba80280

After this patch series changes the mce o/p will be:

[  471.959843] MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered]
[  471.959870] MCE: CPU80: NIP: [db550280] 
insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
[  471.959892] MCE: CPU80: Probable software error (some chance of hardware 
cause)

and for MCE in Guest:

[ 1289.447571] MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 
01001b6e0320 [Recovered]
[ 1289.447615] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: 
[7fffa309dc60]
[ 1289.447634] MCE: CPU80: Probable software error (some chance of hardware 
cause)

Signed-off-by: Mahesh Salgaonkar 
---
Change in v2:
- Address comments from Michael.
---
 arch/powerpc/include/asm/mce.h |2 -
 arch/powerpc/kernel/mce.c  |   82 
 2 files changed, 41 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 17996bc9382b..8d0b1c24c636 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -116,7 +116,7 @@ struct machine_check_event {
enum MCE_Initiator  initiator:8;/* 0x03 */
enum MCE_ErrorType  error_type:8;   /* 0x04 */
enum MCE_Dispositiondisposition:8;  /* 0x05 */
-   uint8_t reserved_1[2];  /* 0x06 */
+   uint16_tcpu;/* 0x06 */
uint64_tgpr3;   /* 0x08 */
uint64_tsrr0;   /* 0x10 */
uint64_tsrr1;   /* 0x18 */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index b5fec1f9751a..d3ee099e0981 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -112,6 +112,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
mce->srr1 = regs->msr;
mce->gpr3 = regs->gpr[3];
mce->in_use = 1;
+   mce->cpu = get_paca()->paca_index;
 
/* Mark it recovered if we have handled it and MSR(RI=1). */
if (handled && (regs->msr & MSR_RI))
@@ -310,7 +311,9 @@ static void machine_check_process_queued_event(struct 
irq_work *work)
 void machine_check_print_event_info(struct machine_check_event *evt,
bool user_mode, bool in_guest)
 {
-   const char *level, *sevstr, *subtype;
+   const char *level, *sevstr, *subtype, *err_type;
+   uint64_t ea = 0;
+   char dar_str[50];
static const char *mc_ue_types[] = {
"Indeterminate",
"Instruction fetch",
@@ -384,101 +387,96 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
break;
}
 
-   printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
-  evt->disposition == MCE_DISPOSITION_RECOVERED ?
-  "Recovered" : "Not recovered");
-
-   if (in_guest) {
-   printk("%s  Guest NIP: %016llx\n", level, evt->srr0);
-   } else if (user_mode) {
-   printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
-   evt->srr0, current->pid, current->comm);
-   } else {
-   printk("%s  NIP [%016llx]: %pS\n", level, evt->srr0,
-  (void *)evt->srr0);
-   }
-
-   printk("%s  Initiator: %s\n", level,
-  evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
switch (evt->error_type) {
case MCE_ERROR_TYPE_UE:
+   err_type = "UE";
subtype = evt->u.ue_error.ue_error_type <
ARRAY_SIZE(mc_ue_types) ?
mc_ue_types[evt->u.ue_error.ue_error_type]
: "Unknown";
-   printk("%s  Error type: UE [%s]\n", level, subtype);
if (evt->u.ue_error.effective_address_provided)
-   printk("%sEffective address: %016llx\n",
-  level, evt->u.ue_error.effective_address);
-   if (evt->u.ue_error.physical_address_provided)
-   printk("%sPhysical address:  %016llx\n",
-  level, evt->u.ue_error.physical_address);
+   ea = evt->u.ue_error.effective_address;
break;
case MCE_ERROR_TYPE_SLB:
+   err_type = "SLB";
subtype = evt->u.slb_error.slb_error_type <
ARRAY_SIZE(mc_slb_types) ?
mc_slb_types[

[PATCH v2 2/3] powernv/mce: Print correct severity for mce error.

2019-04-02 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Currently all machine check errors are printed as severe errors, which isn't
correct. Print soft errors as warnings instead of severe errors.
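
A minimal sketch of the idea, assuming the level is picked while printing the
event; the helper below is illustrative only and not the exact code of the
patch, which does this selection inline in machine_check_print_event_info().

static const char *mce_severity_string(const struct machine_check_event *evt,
                                       const char **level)
{
        switch (evt->severity) {
        case MCE_SEV_NO_ERROR:
                *level = KERN_INFO;
                return "Harmless";
        case MCE_SEV_WARNING:
                *level = KERN_WARNING;      /* soft errors become warnings */
                return "Warning";
        case MCE_SEV_SEVERE:
                *level = KERN_ERR;
                return "Severe";
        case MCE_SEV_FATAL:
        default:
                *level = KERN_ERR;
                return "Fatal";
        }
}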

Signed-off-by: Mahesh Salgaonkar 
---
change in v2:
- Use kernel types i.e. u8, u64 etc.
- Define sync_error as bool.
---
 arch/powerpc/include/asm/mce.h|   86 ++--
 arch/powerpc/kernel/mce.c |5 +
 arch/powerpc/kernel/mce_power.c   |  144 +
 arch/powerpc/platforms/powernv/opal.c |2 
 4 files changed, 123 insertions(+), 114 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 8d0b1c24c636..b1f4bf863c95 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -31,7 +31,7 @@ enum MCE_Version {
 enum MCE_Severity {
MCE_SEV_NO_ERROR = 0,
MCE_SEV_WARNING = 1,
-   MCE_SEV_ERROR_SYNC = 2,
+   MCE_SEV_SEVERE = 2,
MCE_SEV_FATAL = 3,
 };
 
@@ -110,73 +110,74 @@ enum MCE_LinkErrorType {
 };
 
 struct machine_check_event {
-   enum MCE_Versionversion:8;  /* 0x00 */
-   uint8_t in_use; /* 0x01 */
-   enum MCE_Severity   severity:8; /* 0x02 */
-   enum MCE_Initiator  initiator:8;/* 0x03 */
-   enum MCE_ErrorType  error_type:8;   /* 0x04 */
-   enum MCE_Dispositiondisposition:8;  /* 0x05 */
-   uint16_tcpu;/* 0x06 */
-   uint64_tgpr3;   /* 0x08 */
-   uint64_tsrr0;   /* 0x10 */
-   uint64_tsrr1;   /* 0x18 */
-   union { /* 0x20 */
+   enum MCE_Versionversion:8;
+   u8  in_use;
+   enum MCE_Severity   severity:8;
+   enum MCE_Initiator  initiator:8;
+   enum MCE_ErrorType  error_type:8;
+   enum MCE_Dispositiondisposition:8;
+   boolsync_error;
+   u16 cpu;
+   u64 gpr3;
+   u64 srr0;
+   u64 srr1;
+   union {
struct {
enum MCE_UeErrorType ue_error_type:8;
-   uint8_t effective_address_provided;
-   uint8_t physical_address_provided;
-   uint8_t reserved_1[5];
-   uint64_teffective_address;
-   uint64_tphysical_address;
-   uint8_t reserved_2[8];
+   u8  effective_address_provided;
+   u8  physical_address_provided;
+   u8  reserved_1[5];
+   u64 effective_address;
+   u64 physical_address;
+   u8  reserved_2[8];
} ue_error;
 
struct {
enum MCE_SlbErrorType slb_error_type:8;
-   uint8_t effective_address_provided;
-   uint8_t reserved_1[6];
-   uint64_teffective_address;
-   uint8_t reserved_2[16];
+   u8  effective_address_provided;
+   u8  reserved_1[6];
+   u64 effective_address;
+   u8  reserved_2[16];
} slb_error;
 
struct {
enum MCE_EratErrorType erat_error_type:8;
-   uint8_t effective_address_provided;
-   uint8_t reserved_1[6];
-   uint64_teffective_address;
-   uint8_t reserved_2[16];
+   u8  effective_address_provided;
+   u8  reserved_1[6];
+   u64 effective_address;
+   u8  reserved_2[16];
} erat_error;
 
struct {
enum MCE_TlbErrorType tlb_error_type:8;
-   uint8_t effective_address_provided;
-   uint8_t reserved_1[6];
-   uint64_teffective_address;
-   uint8_t reserved_2[16];
+   u8  effective_address_provided;
+   u8  reserved_1[6];
+   u64 effective_address;
+   u8  reserved_2[16];
} tlb_error;
 
struct {
enum MCE_UserErrorType user_error_type:8;
-   uint8_t effective_address_provided;
-

RE: [PATCH] ASoC: fsl_esai: Support synchronous mode

2019-04-02 Thread S.j. Wang
Hi

> 
> Shengjiu,
> 
> On Mon, Apr 01, 2019 at 11:39:10AM +, S.j. Wang wrote:
> > In ESAI synchronous mode, the clock is generated by Tx, so we should
> > always set the Tx registers related to bit clock and frame
> > clock generation (TCCR, TCR, ECR), even when only Rx is working.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
> >  sound/soc/fsl/fsl_esai.c | 28 +++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index
> > 3623aa9a6f2e..d9fcddd55c02 100644
> > --- a/sound/soc/fsl/fsl_esai.c
> > +++ b/sound/soc/fsl/fsl_esai.c
> > @@ -230,6 +230,21 @@ static int fsl_esai_set_dai_sysclk(struct
> snd_soc_dai *dai, int clk_id,
> > return -EINVAL;
> > }
> >
> > +   if (esai_priv->synchronous && !tx) {
> > +   switch (clk_id) {
> > +   case ESAI_HCKR_FSYS:
> > +   fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_FSYS,
> > +   freq, dir);
> > +   break;
> > +   case ESAI_HCKR_EXTAL:
> > +   fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_EXTAL,
> > +   freq, dir);
> 
> Not sure why you call set_dai_sysclk inside set_dai_sysclk again. It feels very
> confusing to do so, especially without a comment.

For sync mode, when only Rx is enabled the Tx registers should still be set, so
set_dai_sysclk() is called again.
Set_dai_sysclk again.

> 
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +   }
> > +
> > /* Bypass divider settings if the requirement doesn't change */
> > if (freq == esai_priv->hck_rate[tx] && dir == esai_priv->hck_dir[tx])
> > return 0;
> > @@ -537,10 +552,21 @@ static int fsl_esai_hw_params(struct
> > snd_pcm_substream *substream,
> >
> > bclk = params_rate(params) * slot_width * esai_priv->slots;
> >
> > -   ret = fsl_esai_set_bclk(dai, tx, bclk);
> > +   ret = fsl_esai_set_bclk(dai, esai_priv->synchronous ? true : tx,
> > +bclk);
> > if (ret)
> > return ret;
> >
> > +   if (esai_priv->synchronous && !tx) {
> > +   /* Use Normal mode to support monaural audio */
> > +   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> > +  ESAI_xCR_xMOD_MASK,
> params_channels(params) > 1 ?
> > +  ESAI_xCR_xMOD_NETWORK : 0);
> > +
> > +   mask = ESAI_xCR_xSWS_MASK | ESAI_xCR_PADC;
> > +   val = ESAI_xCR_xSWS(slot_width, width) | ESAI_xCR_PADC;
> > +   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> mask, val);
> > +   }
> 
> Does synchronous mode require to set both TCR and RCR? or just TCR?
> The code behind this part is doing the same setting to RCR. If that is not
> needed any more for a synchronous recording, we should reuse it instead
> of inserting a piece of redundant code. Otherwise, if we need to set both,
> we should have two regmap_update_bits operations back-to-back for TCR
> and RCR (and other registers too).

Both TCR and RCR. RCR will be set in the normal flow.
> 
> > +
> > /* Use Normal mode to support monaural audio */
> > regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
> >ESAI_xCR_xMOD_MASK,
> params_channels(params) > 1 ?
> 
> In case that we only need to set TCR (more likely I feel), it would feel less
> confusing to me, if we changed REG_ESAI_xCR(tx) here, for example, to
> REG_ESAI_xCR(tx || sync). Yea, please add to the top a 'bool sync =
> esai_priv->synchronous;'.
> 
> Similarly, for ECR_ETO and ECR_ERO:
>   (tx || sync) ? ESAI_ECR_ETO : ESAI_ECR_ERO;

Both TCR and RCR should be set.
 


Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions

2019-04-02 Thread Christophe Leroy

Hi Dmitry, Andrey and others,

Do you have any comments on this series?

I'd like to know if this approach is ok or if it is better to keep doing 
as in https://patchwork.ozlabs.org/patch/1055788/


Thanks
Christophe

Le 28/03/2019 à 16:00, Christophe Leroy a écrit :

In the same spirit as commit 393f203f5fd5 ("x86_64: kasan: add
interceptors for memset/memmove/memcpy functions"), this patch
adds interceptors for string manipulation functions so that we
can compile lib/string.o without kasan support, hence allowing the
string functions to also be used from places where kasan has
to be disabled.

Signed-off-by: Christophe Leroy 
---
  v2: Fixed a few checkpatch stuff and added missing EXPORT_SYMBOL() and 
missing #undefs

  include/linux/string.h |  79 ++
  lib/Makefile   |   2 +
  lib/string.c   |   8 +
  mm/kasan/string.c  | 394 +
  4 files changed, 483 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index 7927b875f80c..3d2aff2ed402 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -19,54 +19,117 @@ extern void *memdup_user_nul(const void __user *, size_t);
   */
  #include 
  
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)

+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * should use not instrumented version of mem* functions.
+ */
+#define memset16   __memset16
+#define memset32   __memset32
+#define memset64   __memset64
+#define memzero_explicit   __memzero_explicit
+#define strcpy __strcpy
+#define strncpy__strncpy
+#define strlcpy__strlcpy
+#define strscpy__strscpy
+#define strcat __strcat
+#define strncat__strncat
+#define strlcat__strlcat
+#define strcmp __strcmp
+#define strncmp__strncmp
+#define strcasecmp __strcasecmp
+#define strncasecmp__strncasecmp
+#define strchr __strchr
+#define strchrnul  __strchrnul
+#define strrchr__strrchr
+#define strnchr__strnchr
+#define skip_spaces__skip_spaces
+#define strim  __strim
+#define strstr __strstr
+#define strnstr__strnstr
+#define strlen __strlen
+#define strnlen__strnlen
+#define strpbrk__strpbrk
+#define strsep __strsep
+#define strspn __strspn
+#define strcspn__strcspn
+#define memscan__memscan
+#define memcmp __memcmp
+#define memchr __memchr
+#define memchr_inv __memchr_inv
+#define strreplace __strreplace
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
  #ifndef __HAVE_ARCH_STRCPY
  extern char * strcpy(char *,const char *);
+char *__strcpy(char *, const char *);
  #endif
  #ifndef __HAVE_ARCH_STRNCPY
  extern char * strncpy(char *,const char *, __kernel_size_t);
+char *__strncpy(char *, const char *, __kernel_size_t);
  #endif
  #ifndef __HAVE_ARCH_STRLCPY
  size_t strlcpy(char *, const char *, size_t);
+size_t __strlcpy(char *, const char *, size_t);
  #endif
  #ifndef __HAVE_ARCH_STRSCPY
  ssize_t strscpy(char *, const char *, size_t);
+ssize_t __strscpy(char *, const char *, size_t);
  #endif
  #ifndef __HAVE_ARCH_STRCAT
  extern char * strcat(char *, const char *);
+char *__strcat(char *, const char *);
  #endif
  #ifndef __HAVE_ARCH_STRNCAT
  extern char * strncat(char *, const char *, __kernel_size_t);
+char *__strncat(char *, const char *, __kernel_size_t);
  #endif
  #ifndef __HAVE_ARCH_STRLCAT
  extern size_t strlcat(char *, const char *, __kernel_size_t);
+size_t __strlcat(char *, const char *, __kernel_size_t);
  #endif
  #ifndef __HAVE_ARCH_STRCMP
  extern int strcmp(const char *,const char *);
+int __strcmp(const char *, const char *);
  #endif
  #ifndef __HAVE_ARCH_STRNCMP
  extern int strncmp(const char *,const char *,__kernel_size_t);
+int __strncmp(const char *, const char *, __kernel_size_t);
  #endif
  #ifndef __HAVE_ARCH_STRCASECMP
  extern int strcasecmp(const char *s1, const char *s2);
+int __strcasecmp(const char *s1, const char *s2);
  #endif
  #ifndef __HAVE_ARCH_STRNCASECMP
  extern int strncasecmp(const char *s1, const char *s2, size_t n);
+int __strncasecmp(const char *s1, const char *s2, size_t n);
  #endif
  #ifndef __HAVE_ARCH_STRCHR
  extern char * strchr(const char *,int);
+char *__strchr(const char *, int);
  #endif
  #ifndef __HAVE_ARCH_STRCHRNUL
  extern char * strchrnul(const char *,int);
+char *__strchrnul(const char *, int);
  #endif
  #ifndef __HAVE_ARCH_STRNCHR
  extern char * strnchr(const char *, size_t, int);
+char *__strnchr(const char *, size_t, int);
  #endif
  #ifndef __HAVE_ARCH_STRRCHR
  extern char * strrchr(const char *,int);
+char *__strrchr(const char *, int);
  #endif
  extern char * __must_check skip_spaces(const char *);
+char * __must_check __

[PATCH 9/9] powerpc: use generic CMDLINE manipulations

2019-04-02 Thread Christophe Leroy
This patch moves powerpc to the centrally defined CMDLINE options.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 48 +++-
 1 file changed, 3 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22d6a48bd2ca..6a71d7c514cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -182,6 +182,7 @@ config PPC
select HAVE_CBPF_JITif !PPC64
select HAVE_STACKPROTECTOR  if PPC64 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
+   select HAVE_CMDLINE
select HAVE_CONTEXT_TRACKINGif PPC64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_STACKOVERFLOW
@@ -828,52 +829,9 @@ config PPC_DENORMALISATION
  Add support for handling denormalisation of single precision
  values.  Useful for bare metal only.  If unsure say Y here.
 
-config CMDLINE_BOOL
-   bool "Default bootloader kernel arguments"
-
-config CMDLINE
-   string "Initial kernel command string"
-   depends on CMDLINE_BOOL
+config DEFAULT_CMDLINE
+   string
default "console=ttyS0,9600 console=tty0 root=/dev/sda2"
-   help
- On some platforms, there is currently no way for the boot loader to
- pass arguments to the kernel. For these platforms, you can supply
- some command-line options at build time by entering them here.  In
- most cases you will need to specify the root device here.
-
-choice
-   prompt "Kernel command line type" if CMDLINE != ""
-   default CMDLINE_FROM_BOOTLOADER
-   help
- Selects the way you want to use the default kernel arguments.
-
-config CMDLINE_FROM_BOOTLOADER
-   bool "Use bootloader kernel arguments if available"
-   help
- Uses the command-line options passed by the boot loader. If
- the boot loader doesn't provide any, the default kernel command
- string provided in CMDLINE will be used.
-
-config CMDLINE_EXTEND
-   bool "Extend bootloader kernel arguments"
-   help
- The default kernel command string will be appended to the
- command-line arguments provided during boot.
-
-config CMDLINE_PREPEND
-   bool "Prepend bootloader kernel arguments"
-   help
- The default kernel command string will be prepend to the
- command-line arguments provided during boot.
-
-config CMDLINE_FORCE
-   bool "Always use the default kernel command string"
-   help
- Always use the default kernel command string, even if the boot
- loader passes other arguments to the kernel.
- This is useful if you cannot or don't want to change the
- command-line options your boot loader passes to the kernel.
-endchoice
 
 config EXTRA_TARGETS
string "Additional default image types"
-- 
2.13.3



[PATCH 8/9] Gives arches opportunity to use generically defined boot cmdline manipulation

2019-04-02 Thread Christophe Leroy
Most arches have similar boot command line manipulation options.
This patch adds their definition in init/Kconfig, gated by
CONFIG_HAVE_CMDLINE, which the arches can select in order to use them.

In order to use this, a few arches will have to change their
CONFIG options:
- riscv has to replace CMDLINE_FALLBACK by CMDLINE_FROM_BOOTLOADER
- arches using CONFIG_CMDLINE_OVERRIDE or CONFIG_CMDLINE_OVERWRITE
have to replace them by CONFIG_CMDLINE_FORCE

Arches also have to define CONFIG_DEFAULT_CMDLINE

Signed-off-by: Christophe Leroy 
---
 init/Kconfig | 56 
 1 file changed, 56 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 4592bf7997c0..83537603412c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -80,6 +80,62 @@ config INIT_ENV_ARG_LIMIT
  Maximum of each of the number of arguments and environment
  variables passed to init from the kernel command line.
 
+config HAVE_CMDLINE
+   bool
+
+config CMDLINE_BOOL
+   bool "Default bootloader kernel arguments"
+   depends on HAVE_CMDLINE
+   help
+ On some platforms, there is currently no way for the boot loader to
+ pass arguments to the kernel. For these platforms, you can supply
+ some command-line options at build time by entering them here.  In
+ most cases you will need to specify the root device here.
+
+config CMDLINE
+   string "Initial kernel command string"
+   depends on CMDLINE_BOOL
+   default DEFAULT_CMDLINE
+   help
+ On some platforms, there is currently no way for the boot loader to
+ pass arguments to the kernel. For these platforms, you can supply
+ some command-line options at build time by entering them here.  In
+ most cases you will need to specify the root device here.
+
+choice
+   prompt "Kernel command line type" if CMDLINE != ""
+   default CMDLINE_FROM_BOOTLOADER
+   help
+ Selects the way you want to use the default kernel arguments.
+
+config CMDLINE_FROM_BOOTLOADER
+   bool "Use bootloader kernel arguments if available"
+   help
+ Uses the command-line options passed by the boot loader. If
+ the boot loader doesn't provide any, the default kernel command
+ string provided in CMDLINE will be used.
+
+config CMDLINE_EXTEND
+   bool "Extend bootloader kernel arguments"
+   help
+ The default kernel command string will be appended to the
+ command-line arguments provided during boot.
+
+config CMDLINE_PREPEND
+   bool "Prepend bootloader kernel arguments"
+   help
+ The default kernel command string will be prepend to the
+ command-line arguments provided during boot.
+
+config CMDLINE_FORCE
+   bool "Always use the default kernel command string"
+   help
+ Always use the default kernel command string, even if the boot
+ loader passes other arguments to the kernel.
+ This is useful if you cannot or don't want to change the
+ command-line options your boot loader passes to the kernel.
+endchoice
+
 config COMPILE_TEST
bool "Compile also drivers which will not load"
depends on !UML
-- 
2.13.3



[PATCH 5/9] powerpc: convert to generic builtin command line

2019-04-02 Thread Christophe Leroy
This updates the powerpc code to use the new cmdline building function.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/prom_init.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index d4889ba04ddd..08f3db25b2f1 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -155,7 +156,7 @@ static struct prom_t __prombss prom;
 static unsigned long __prombss prom_entry;
 
 static char __prombss of_stdout_device[256];
-static char __prombss prom_scratch[256];
+static char __prombss prom_scratch[COMMAND_LINE_SIZE];
 
 static unsigned long __prombss dt_header_start;
 static unsigned long __prombss dt_struct_start, dt_struct_end;
@@ -627,18 +628,14 @@ static unsigned long prom_memparse(const char *ptr, const 
char **retptr)
 static void __init early_cmdline_parse(void)
 {
const char *opt;
+   int l = 0;
 
-   char *p;
-   int l __maybe_unused = 0;
-
-   prom_cmd_line[0] = 0;
-   p = prom_cmd_line;
if ((long)prom.chosen > 0)
-   l = prom_getprop(prom.chosen, "bootargs", p, 
COMMAND_LINE_SIZE-1);
-#ifdef CONFIG_CMDLINE
-   if (l <= 0 || p[0] == '\0' || IS_ENABLED(CONFIG_CMDLINE_EXTEND)) /* dbl 
check */
-   strlcat(prom_cmd_line, CONFIG_CMDLINE, sizeof(prom_cmd_line));
-#endif /* CONFIG_CMDLINE */
+   l = prom_getprop(prom.chosen, "bootargs", prom_scratch,
+COMMAND_LINE_SIZE - 1);
+
+   cmdline_build(prom_cmd_line, l > 0 ? prom_scratch : NULL, 
sizeof(prom_scratch));
+
prom_printf("command line: %s\n", prom_cmd_line);
 
 #ifdef CONFIG_PPC64
-- 
2.13.3



[PATCH 7/9] powerpc: add capability to prepend default command line

2019-04-02 Thread Christophe Leroy
This patch activates the capability to prepend default
arguments to the command line.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2972348e52be..22d6a48bd2ca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -860,6 +860,12 @@ config CMDLINE_EXTEND
  The default kernel command string will be appended to the
  command-line arguments provided during boot.
 
+config CMDLINE_PREPEND
+   bool "Prepend bootloader kernel arguments"
+   help
+ The default kernel command string will be prepend to the
+ command-line arguments provided during boot.
+
 config CMDLINE_FORCE
bool "Always use the default kernel command string"
help
-- 
2.13.3



[PATCH 6/9] Add capability to prepend the command line

2019-04-02 Thread Christophe Leroy
This patch adds an option to prepend a text to the command
line instead of appending it.

Signed-off-by: Christophe Leroy 
---
 include/linux/cmdline.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
index afcc00d7628d..5caf3724c1ab 100644
--- a/include/linux/cmdline.h
+++ b/include/linux/cmdline.h
@@ -3,7 +3,7 @@
 #define _LINUX_CMDLINE_H
 
 /*
- * This function will append a builtin command line to the command
+ * This function will append or prepend a builtin command line to the command
  * line provided by the bootloader. Kconfig options can be used to alter
  * the behavior of this builtin command line.
  * @dest: The destination of the final appended/prepended string.
@@ -22,6 +22,9 @@ static __always_inline void cmdline_build(char *dest, const 
char *src, size_t le
strlcat(dest, CONFIG_CMDLINE, length);
return;
}
+
+   if (IS_ENABLED(CONFIG_CMDLINE_PREPEND) && sizeof(CONFIG_CMDLINE) > 1)
+   strlcat(dest, CONFIG_CMDLINE " ", length);
 #endif
if (dest != src)
strlcat(dest, src, length);
-- 
2.13.3



[PATCH 4/9] powerpc/prom_init: get rid of PROM_SCRATCH_SIZE

2019-04-02 Thread Christophe Leroy
PROM_SCRATCH_SIZE is the same as sizeof(prom_scratch).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/prom_init.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index a6cd52240c58..d4889ba04ddd 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -154,10 +154,8 @@ static struct prom_t __prombss prom;
 
 static unsigned long __prombss prom_entry;
 
-#define PROM_SCRATCH_SIZE 256
-
 static char __prombss of_stdout_device[256];
-static char __prombss prom_scratch[PROM_SCRATCH_SIZE];
+static char __prombss prom_scratch[256];
 
 static unsigned long __prombss dt_header_start;
 static unsigned long __prombss dt_struct_start, dt_struct_end;
@@ -1486,8 +1484,8 @@ static void __init prom_init_mem(void)
endp = p + (plen / sizeof(cell_t));
 
 #ifdef DEBUG_PROM
-   memset(path, 0, PROM_SCRATCH_SIZE);
-   call_prom("package-to-path", 3, 1, node, path, 
PROM_SCRATCH_SIZE-1);
+   memset(path, 0, sizeof(prom_scratch));
+   call_prom("package-to-path", 3, 1, node, path, 
sizeof(prom_scratch) - 1);
prom_debug("  node %s :\n", path);
 #endif /* DEBUG_PROM */
 
@@ -1795,10 +1793,10 @@ static void __init prom_initialize_tce_table(void)
local_alloc_bottom = base;
 
/* It seems OF doesn't null-terminate the path :-( */
-   memset(path, 0, PROM_SCRATCH_SIZE);
+   memset(path, 0, sizeof(prom_scratch));
/* Call OF to setup the TCE hardware */
if (call_prom("package-to-path", 3, 1, node,
- path, PROM_SCRATCH_SIZE-1) == PROM_ERROR) {
+ path, sizeof(prom_scratch) - 1) == PROM_ERROR) {
prom_printf("package-to-path failed\n");
}
 
@@ -2159,14 +2157,14 @@ static void __init prom_check_displays(void)
 
/* It seems OF doesn't null-terminate the path :-( */
path = prom_scratch;
-   memset(path, 0, PROM_SCRATCH_SIZE);
+   memset(path, 0, sizeof(prom_scratch));
 
/*
 * leave some room at the end of the path for appending extra
 * arguments
 */
if (call_prom("package-to-path", 3, 1, node, path,
- PROM_SCRATCH_SIZE-10) == PROM_ERROR)
+ sizeof(prom_scratch) - 10) == PROM_ERROR)
continue;
prom_printf("found display   : %s, opening... ", path);

@@ -2362,8 +2360,8 @@ static void __init scan_dt_build_struct(phandle node, 
unsigned long *mem_start,
 
/* get it again for debugging */
path = prom_scratch;
-   memset(path, 0, PROM_SCRATCH_SIZE);
-   call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1);
+   memset(path, 0, sizeof(prom_scratch));
+   call_prom("package-to-path", 3, 1, node, path, sizeof(prom_scratch) - 
1);
 
/* get and store all properties */
prev_name = "";
-- 
2.13.3



[PATCH 3/9] drivers: of: use cmdline building function

2019-04-02 Thread Christophe Leroy
This patch uses the new cmdline building function to
concatenate the OF-provided cmdline with built-in parts,
based on compile-time options.

Signed-off-by: Christophe Leroy 
---
 drivers/of/fdt.c| 23 ---
 include/linux/cmdline.h |  2 +-
 2 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4734223ab702..c6d941785b37 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include   /* for COMMAND_LINE_SIZE */
 #include 
@@ -1090,26 +1091,10 @@ int __init early_init_dt_scan_chosen(unsigned long 
node, const char *uname,
 
/* Retrieve command line */
p = of_get_flat_dt_prop(node, "bootargs", &l);
-   if (p != NULL && l > 0)
-   strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
+   if (l <= 0)
+   p = NULL;
 
-   /*
-* CONFIG_CMDLINE is meant to be a default in case nothing else
-* managed to set the command line, unless CONFIG_CMDLINE_FORCE
-* is set in which case we override whatever was found earlier.
-*/
-#ifdef CONFIG_CMDLINE
-#if defined(CONFIG_CMDLINE_EXTEND)
-   strlcat(data, " ", COMMAND_LINE_SIZE);
-   strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#elif defined(CONFIG_CMDLINE_FORCE)
-   strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#else
-   /* No arguments from boot loader, use kernel's  cmdl*/
-   if (!((char *)data)[0])
-   strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#endif
-#endif /* CONFIG_CMDLINE */
+   cmdline_build(data, p, COMMAND_LINE_SIZE);
 
pr_debug("Command line is: %s\n", (char*)data);
 
diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
index 8610ddf813ff..afcc00d7628d 100644
--- a/include/linux/cmdline.h
+++ b/include/linux/cmdline.h
@@ -10,7 +10,7 @@
  * @src: The starting string or NULL if there isn't one. Must not equal dest.
  * @length: the length of dest buffer.
  */
-static __always_inline void cmdline_build(char *dest, char *src, size_t length)
+static __always_inline void cmdline_build(char *dest, const char *src, size_t 
length)
 {
if (length <= 0)
return;
-- 
2.13.3



[PATCH 1/9] powerpc: enable appending of CONFIG_CMDLINE to bootloader's cmdline.

2019-04-02 Thread Christophe Leroy
Today, powerpc defines CONFIG_CMDLINE for when the bootloader doesn't
provide a command line or for overriding it.

In the same way as ARM, this patch adds the option of appending
CONFIG_CMDLINE to the bootloader-provided command line.
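
As an illustration (the values are made up): with CONFIG_CMDLINE="quiet" and a
bootloader-provided "root=/dev/sda1", CMDLINE_FROM_BOOTLOADER keeps
"root=/dev/sda1", CMDLINE_EXTEND yields roughly "root=/dev/sda1 quiet", and
CMDLINE_FORCE uses only "quiet".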

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 21 -
 arch/powerpc/kernel/prom_init.c|  5 ++---
 arch/powerpc/kernel/prom_init_check.sh |  2 +-
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d0be82c3061..2972348e52be 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -841,14 +841,33 @@ config CMDLINE
  some command-line options at build time by entering them here.  In
  most cases you will need to specify the root device here.
 
+choice
+   prompt "Kernel command line type" if CMDLINE != ""
+   default CMDLINE_FROM_BOOTLOADER
+   help
+ Selects the way you want to use the default kernel arguments.
+
+config CMDLINE_FROM_BOOTLOADER
+   bool "Use bootloader kernel arguments if available"
+   help
+ Uses the command-line options passed by the boot loader. If
+ the boot loader doesn't provide any, the default kernel command
+ string provided in CMDLINE will be used.
+
+config CMDLINE_EXTEND
+   bool "Extend bootloader kernel arguments"
+   help
+ The default kernel command string will be appended to the
+ command-line arguments provided during boot.
+
 config CMDLINE_FORCE
bool "Always use the default kernel command string"
-   depends on CMDLINE_BOOL
help
  Always use the default kernel command string, even if the boot
  loader passes other arguments to the kernel.
  This is useful if you cannot or don't want to change the
  command-line options your boot loader passes to the kernel.
+endchoice
 
 config EXTRA_TARGETS
string "Additional default image types"
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff4163a51..a6cd52240c58 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -638,9 +638,8 @@ static void __init early_cmdline_parse(void)
if ((long)prom.chosen > 0)
l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);
 #ifdef CONFIG_CMDLINE
-   if (l <= 0 || p[0] == '\0') /* dbl check */
-   strlcpy(prom_cmd_line,
-   CONFIG_CMDLINE, sizeof(prom_cmd_line));
+   if (l <= 0 || p[0] == '\0' || IS_ENABLED(CONFIG_CMDLINE_EXTEND)) /* dbl check */
+   strlcat(prom_cmd_line, CONFIG_CMDLINE, sizeof(prom_cmd_line));
 #endif /* CONFIG_CMDLINE */
prom_printf("command line: %s\n", prom_cmd_line);
 
diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
index 667df97d2595..cbcf18846392 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -19,7 +19,7 @@
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
 _end enter_prom memcpy memset reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
-strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224
+strcmp strcpy strlcat strlen strncmp strstr kstrtobool logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
 __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
 
-- 
2.13.3



[PATCH 2/9] Add generic function to build command line.

2019-04-02 Thread Christophe Leroy
This code provides architectures with a way to build the command line
from what is built into the kernel and what is handed over by the
bootloader, depending on the selected compile-time options.
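
A minimal usage sketch (not part of this patch; the function and buffer names
below are made up for illustration) of how an architecture could call it from
its early boot code:

#include <linux/cmdline.h>
#include <linux/init.h>
#include <asm/setup.h>		/* COMMAND_LINE_SIZE */

static char built_cmdline[COMMAND_LINE_SIZE] __initdata;

static void __init example_build_boot_cmdline(char *from_bootloader)
{
	/* Combines the bootloader string with the built-in CONFIG_CMDLINE
	 * according to the selected CMDLINE_FORCE/CMDLINE_EXTEND options. */
	cmdline_build(built_cmdline, from_bootloader, sizeof(built_cmdline));
}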

Signed-off-by: Christophe Leroy 
---
 include/linux/cmdline.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 include/linux/cmdline.h

diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
new file mode 100644
index ..8610ddf813ff
--- /dev/null
+++ b/include/linux/cmdline.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CMDLINE_H
+#define _LINUX_CMDLINE_H
+
+/*
+ * This function will append a builtin command line to the command
+ * line provided by the bootloader. Kconfig options can be used to alter
+ * the behavior of this builtin command line.
+ * @dest: The destination of the final appended/prepended string.
+ * @src: The starting string or NULL if there isn't one. Must not equal dest.
+ * @length: the length of dest buffer.
+ */
+static __always_inline void cmdline_build(char *dest, char *src, size_t length)
+{
+   if (length <= 0)
+   return;
+
+   dest[0] = 0;
+
+#ifdef CONFIG_CMDLINE
+   if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !src || !src[0]) {
+   strlcat(dest, CONFIG_CMDLINE, length);
+   return;
+   }
+#endif
+   if (dest != src)
+   strlcat(dest, src, length);
+#ifdef CONFIG_CMDLINE
+   if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) && sizeof(CONFIG_CMDLINE) > 1)
+   strlcat(dest, " " CONFIG_CMDLINE, length);
+#endif
+}
+
+#endif /* _LINUX_CMDLINE_H */
-- 
2.13.3



[PATCH 0/9] Improve boot command line handling

2019-04-02 Thread Christophe Leroy
The purpose of this series is to improve and enhance the
handling of kernel boot arguments.

It is primarily focused on powerpc but also extends the capability
to other arches.

This is based on a suggestion from Daniel Walker

Christophe Leroy (9):
  powerpc: enable appending of CONFIG_CMDLINE to bootloader's cmdline.
  Add generic function to build command line.
  drivers: of: use cmdline building function
  powerpc/prom_init: get rid of PROM_SCRATCH_SIZE
  powerpc: convert to generic builtin command line
  Add capability to prepend the command line
  powerpc: add capability to prepend default command line
  Gives arches opportunity to use generically defined boot cmdline
manipulation
  powerpc: use generic CMDLINE manipulations

 arch/powerpc/Kconfig   | 23 ++
 arch/powerpc/kernel/prom_init.c| 38 ++-
 arch/powerpc/kernel/prom_init_check.sh |  2 +-
 drivers/of/fdt.c   | 23 +++---
 include/linux/cmdline.h| 37 ++
 init/Kconfig   | 56 ++
 6 files changed, 117 insertions(+), 62 deletions(-)
 create mode 100644 include/linux/cmdline.h

-- 
2.13.3



Re: [PATCH 2/5] powerpc: Fix vDSO clock_getres()

2019-04-02 Thread Vincenzo Frascino
On 02/04/2019 07:14, Christophe Leroy wrote:
> 
> 
> On 04/01/2019 11:51 AM, Vincenzo Frascino wrote:
>> clock_getres in the vDSO library has to preserve the same behaviour
>> of posix_get_hrtimer_res().
>>
>> In particular, posix_get_hrtimer_res() does:
>>  sec = 0;
>>  ns = hrtimer_resolution;
>> and hrtimer_resolution depends on the enablement of the high
>> resolution timers that can happen either at compile or at run time.
>>
>> Fix the powerpc vdso implementation of clock_getres keeping a copy of
>> hrtimer_resolution in vdso data and using that directly.
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Signed-off-by: Vincenzo Frascino 
>> ---
>>   arch/powerpc/include/asm/vdso_datapage.h  |  2 ++
>>   arch/powerpc/kernel/asm-offsets.c |  2 +-
>>   arch/powerpc/kernel/time.c|  1 +
>>   arch/powerpc/kernel/vdso32/gettimeofday.S | 22 +++---
>>   arch/powerpc/kernel/vdso64/gettimeofday.S | 22 +++---
>>   5 files changed, 34 insertions(+), 15 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
>> b/arch/powerpc/kernel/vdso32/gettimeofday.S
>> index 1e0bc5955a40..b21630079496 100644
>> --- a/arch/powerpc/kernel/vdso32/gettimeofday.S
>> +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
>> @@ -160,14 +160,21 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
>>  cror    cr0*4+eq,cr0*4+eq,cr1*4+eq
>>  bne cr0,99f
>>   
>> -li  r3,0
>> -cmpli   cr0,r4,0
>> +    mflr    r12
>> +  .cfi_register lr,r12
>> +mr  r11,r4
>> +bl  __get_datapage@local
>> +lwz r5,CLOCK_REALTIME_RES(r3)
>> +li  r4,0
>> +cmplwi  r11,0   /* check if res is NULL */
>> +beq 1f
>> +
>> +stw r4,TSPC32_TV_SEC(r11)
>> +stw r5,TSPC32_TV_NSEC(r11)
>> +
>> +1:  mtlr    r12
>>  crclr   cr0*4+so
>> -beqlr
>> -lis r5,CLOCK_REALTIME_RES@h
>> -ori r5,r5,CLOCK_REALTIME_RES@l
>> -stw r3,TSPC32_TV_SEC(r4)
>> -stw r5,TSPC32_TV_NSEC(r4)
>> +li  r3,0
>>  blr
> 
> The above can be done simpler, see below
> 
> @@ -160,12 +160,15 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
>   cror    cr0*4+eq,cr0*4+eq,cr1*4+eq
>   bne cr0,99f
> 
> + mflr    r12
> +  .cfi_register lr,r12
> + bl  __get_datapage@local
> + lwz r5,CLOCK_REALTIME_RES(r3)
> + mtlr    r12
>   li  r3,0
>   cmpli   cr0,r4,0
>   crclr   cr0*4+so
>   beqlr
> - lis r5,CLOCK_REALTIME_RES@h
> - ori r5,r5,CLOCK_REALTIME_RES@l
>   stw r3,TSPC32_TV_SEC(r4)
>   stw r5,TSPC32_TV_NSEC(r4)
>   blr
> 

Thank you for this, I will update my code accordingly before posting v2.

> Christophe
> 
>>   
>>  /*
>> @@ -175,6 +182,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
>>   */
>>   99:
>>  li  r0,__NR_clock_getres
>> +  .cfi_restore lr
>>  sc
>>  blr
>> .cfi_endproc
>> diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
>> b/arch/powerpc/kernel/vdso64/gettimeofday.S
>> index a4ed9edfd5f0..a7e49bddd475 100644
>> --- a/arch/powerpc/kernel/vdso64/gettimeofday.S
>> +++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
>> @@ -190,14 +190,21 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
>>  cror    cr0*4+eq,cr0*4+eq,cr1*4+eq
>>  bne cr0,99f
>>   
>> -li  r3,0
>> -cmpldi  cr0,r4,0
>> +    mflr    r12
>> +  .cfi_register lr,r12
>> +mr  r11, r4
>> +bl  V_LOCAL_FUNC(__get_datapage)
>> +lwz r5,CLOCK_REALTIME_RES(r3)
>> +li  r4,0
>> +cmpldi  r11,0   /* check if res is NULL */
>> +beq 1f
>> +
>> +std r4,TSPC64_TV_SEC(r11)
>> +std r5,TSPC64_TV_NSEC(r11)
>> +
>> +1:  mtlr    r12
>>  crclr   cr0*4+so
>> -beqlr
>> -lis r5,CLOCK_REALTIME_RES@h
>> -ori r5,r5,CLOCK_REALTIME_RES@l
>> -std r3,TSPC64_TV_SEC(r4)
>> -std r5,TSPC64_TV_NSEC(r4)
>> +li  r3,0
>>  blr
> 
> The same type of simplification applies here too.
> 
> Christophe
> 
> 
>>   
>>  /*
>> @@ -205,6 +212,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
>>   */
>>   99:
>>  li  r0,__NR_clock_getres
>> +  .cfi_restore lr
>>  sc
>>  blr
>> .cfi_endproc
>>

-- 
Regards,
Vincenzo


Re: [PATCH 2/5] powerpc: Fix vDSO clock_getres()

2019-04-02 Thread Vincenzo Frascino
Hi Christophe,

thank you for your review.

On 02/04/2019 06:54, Christophe Leroy wrote:
> 
> 
> On 04/01/2019 11:51 AM, Vincenzo Frascino wrote:
>> clock_getres in the vDSO library has to preserve the same behaviour
>> of posix_get_hrtimer_res().
>>
>> In particular, posix_get_hrtimer_res() does:
>>  sec = 0;
>>  ns = hrtimer_resolution;
>> and hrtimer_resolution depends on the enablement of the high
>> resolution timers that can happen either at compile or at run time.
>>
>> Fix the powerpc vdso implementation of clock_getres keeping a copy of
>> hrtimer_resolution in vdso data and using that directly.
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Signed-off-by: Vincenzo Frascino 
>> ---
>>   arch/powerpc/include/asm/vdso_datapage.h  |  2 ++
> 
> Conflicts with commit b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC 
> inconsistencies across Y2038")
> 

Thanks for pointing this out, I will rebase my code on top of the latest version
before reissuing v2.

...

-- 
Regards,
Vincenzo


Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E

2019-04-02 Thread Joakim Tjernlund
On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote:
> 
> Joakim Tjernlund  writes:
> > On Fri, 2019-03-29 at 22:26 +1100, Michael Ellerman wrote:
> > > From: Diana Craciun 
> > > 
> > > commit ebcd1bfc33c7a90df941df68a6e5d4018c022fba upstream.
> > > 
> > > Implement the barrier_nospec as a isync;sync instruction sequence.
> > > The implementation uses the infrastructure built for BOOK3S 64.
> > > 
> > > Signed-off-by: Diana Craciun 
> > > [mpe: Split out of larger patch]
> > > Signed-off-by: Michael Ellerman 
> > 
> > What is the performance impact of these spectre fixes?
> 
> I've not seen any numbers from anyone.

Thanks for getting back to me.

> 
> It will depend on the workload, it's copy to/from user that is most
> likely to show an impact.
> 
> We have a context switch benchmark in
> tools/testing/selftests/powerpc/benchmarks/context_switch.c.
> 
> Running that with "--no-vector --no-altivec --no-fp --test=pipe" shows
> about a 2.3% slow down vs booting with "nospectre_v1".
> 
> > Can I compile it away?
> 
> You can't actually, but you can disable it at runtime with
> "nospectre_v1" on the kernel command line.
> 
> We could make it a user selectable compile time option if you really
> want it to be.

I think yes. Considering that these patches are fairly untested and their
impact in the wild is unknown, requiring systems to change their boot
config overnight is too fast.

 Jocke



[PATCH v10 18/18] LS1021A: dtsi: add ftm quad decoder entries

2019-04-02 Thread William Breathitt Gray
From: Patrick Havelange 

Add the 4 Quadrature counters for this board.

Reviewed-by: Esben Haabendal 
Signed-off-by: Patrick Havelange 
Signed-off-by: William Breathitt Gray 
---
 arch/arm/boot/dts/ls1021a.dtsi | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index ed0941292172..0168fb62590a 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -433,6 +433,34 @@
status = "disabled";
};
 
+   counter0: counter@29d0000 {
+   compatible = "fsl,ftm-quaddec";
+   reg = <0x0 0x29d0000 0x0 0x10000>;
+   big-endian;
+   status = "disabled";
+   };
+
+   counter1: counter@29e0000 {
+   compatible = "fsl,ftm-quaddec";
+   reg = <0x0 0x29e0000 0x0 0x10000>;
+   big-endian;
+   status = "disabled";
+   };
+
+   counter2: counter@29f0000 {
+   compatible = "fsl,ftm-quaddec";
+   reg = <0x0 0x29f0000 0x0 0x10000>;
+   big-endian;
+   status = "disabled";
+   };
+
+   counter3: counter@2a00000 {
+   compatible = "fsl,ftm-quaddec";
+   reg = <0x0 0x2a00000 0x0 0x10000>;
+   big-endian;
+   status = "disabled";
+   };
+
gpio0: gpio@2300000 {
compatible = "fsl,ls1021a-gpio", "fsl,qoriq-gpio";
reg = <0x0 0x2300000 0x0 0x10000>;
-- 
2.21.0



[PATCH v10 17/18] counter: ftm-quaddec: Documentation: Add specific counter sysfs documentation

2019-04-02 Thread William Breathitt Gray
From: Patrick Havelange 

This adds documentation for the specific prescaler entry.

Signed-off-by: Patrick Havelange 
Signed-off-by: William Breathitt Gray 
---
 .../ABI/testing/sysfs-bus-counter-ftm-quaddec| 16 
 1 file changed, 16 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec

diff --git a/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec b/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec
new file mode 100644
index ..7d2e7b363467
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec
@@ -0,0 +1,16 @@
+What:  /sys/bus/counter/devices/counterX/countY/prescaler_available
+KernelVersion: 5.2
+Contact:   linux-...@vger.kernel.org
+Description:
+   Discrete set of available values for the respective Count Y
+   configuration are listed in this file. Values are delimited by
+   newline characters.
+
+What:  /sys/bus/counter/devices/counterX/countY/prescaler
+KernelVersion: 5.2
+Contact:   linux-...@vger.kernel.org
+Description:
+   Configure the prescaler value associated with Count Y.
+   On the FlexTimer, the counter clock source passes through a
+   prescaler (i.e. a counter). This acts like a clock
+   divider.
-- 
2.21.0
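
As an illustration of the attributes documented above (not part of the patch;
the counter/count indices and the value are hypothetical, and accepted values
should first be read from prescaler_available), a userspace program could set
the prescaler like this:

#include <stdio.h>

int main(void)
{
	/* Hypothetical device and count indices. */
	FILE *f = fopen("/sys/bus/counter/devices/counter0/count0/prescaler", "w");

	if (!f)
		return 1;
	/* Example value only; must be one of the values listed in
	 * prescaler_available. */
	fprintf(f, "%d\n", 2);
	return fclose(f) ? 1 : 0;
}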