[PATCH] powerpc/mm: Fix build error with FLATMEM book3s64 config
The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs. We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM and 32 bit configs never expected a value to be set for MAX_PHYSMEM_BITS. Dependent code such as zsmalloc derived the right values based on other fields. Instead of finding a value that works with different configs, use new values only for book3s_64. For 64 bit booke, use the definition of MAX_PHYSMEM_BITS as per commit a7df61a0e2b6 ("[PATCH] ppc64: Increase sparsemem defaults") That change was done in 2005 and hopefully will work with book3e 64. Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations") Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/mmu.h | 15 +++ arch/powerpc/include/asm/mmu.h | 15 --- arch/powerpc/include/asm/nohash/64/mmu.h | 2 ++ 3 files changed, 17 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index 1ceee000c18d..a809bdd77322 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -35,6 +35,21 @@ typedef pte_t *pgtable_t; #endif /* __ASSEMBLY__ */ +/* + * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS + * if we increase SECTIONS_WIDTH we will not store node details in page->flags and + * page_to_nid does a page->section->node lookup + * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce + * memory requirements with large number of sections. + * 51 bits is the max physical real address on POWER9 + */ +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \ + defined(CONFIG_PPC_64K_PAGES) +#define MAX_PHYSMEM_BITS 51 +#else +#define MAX_PHYSMEM_BITS 46 +#endif + /* 64-bit classic hash table MMU */ #include diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 598cdcdd1355..78d53c4396ac 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -341,21 +341,6 @@ static inline bool strict_kernel_rwx_enabled(void) */ #define MMU_PAGE_COUNT 16 -/* - * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS - * if we increase SECTIONS_WIDTH we will not store node details in page->flags and - * page_to_nid does a page->section->node lookup - * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce - * memory requirements with large number of sections. - * 51 bits is the max physical real address on POWER9 - */ -#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \ - defined (CONFIG_PPC_64K_PAGES) -#define MAX_PHYSMEM_BITS51 -#elif defined(CONFIG_SPARSEMEM) -#define MAX_PHYSMEM_BITS46 -#endif - #ifdef CONFIG_PPC_BOOK3S_64 #include #else /* CONFIG_PPC_BOOK3S_64 */ diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h index e6585480dfc4..81cf30c370e5 100644 --- a/arch/powerpc/include/asm/nohash/64/mmu.h +++ b/arch/powerpc/include/asm/nohash/64/mmu.h @@ -2,6 +2,8 @@ #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_ #define _ASM_POWERPC_NOHASH_64_MMU_H_ +#define MAX_PHYSMEM_BITS44 + /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ #include -- 2.20.1
Re: VLC doesn't play videos anymore since the PowerPC fixes 5.1-3
Le 03/04/2019 à 05:52, Christian Zigotzky a écrit : Please test VLC with the RC3 of kernel 5.1. The removing of the PowerPC fixes 5.1-3 has solved the VLC issue. Another user has already confirmed that [1]. This isn’t an April Fool‘s. ;-) Could you bisect to identify the guilty commit ? Thanks Christophe Thanks [1] http://forum.hyperion-entertainment.com/viewtopic.php?f=58&t=4256&start=20#p47561
Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
Le 02/04/2019 à 22:41, Daniel Jordan a écrit : With locked_vm now an atomic, there is no need to take mmap_sem as writer. Delete and refactor accordingly. Could you please detail the change ? It looks like this is not the only change. I'm wondering what the consequences are. Before we did: - lock - calculate future value - check the future value is acceptable - update value if future value acceptable - return error if future value non acceptable - unlock Now we do: - atomic update with future (possibly too high) value - check the new value is acceptable - atomic update back with older value if new value not acceptable and return error So if a concurrent action wants to increase locked_vm with an acceptable step while another one has temporarily set it too high, it will now fail. I think we should keep the previous approach and do a cmpxchg after validating the new value. Christophe Signed-off-by: Daniel Jordan Cc: Alexey Kardashevskiy Cc: Andrew Morton Cc: Benjamin Herrenschmidt Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Michael Ellerman Cc: Paul Mackerras Cc: Cc: Cc: --- arch/powerpc/mm/mmu_context_iommu.c | 27 +++ 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index 8038ac24a312..a4ef22b67c07 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t { static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, unsigned long npages, bool incr) { - long ret = 0, locked, lock_limit; + long ret = 0; + unsigned long lock_limit; s64 locked_vm; if (!npages) return 0; - down_write(&mm->mmap_sem); - locked_vm = atomic64_read(&mm->locked_vm); if (incr) { - locked = locked_vm + npages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) + locked_vm = atomic64_add_return(npages, &mm->locked_vm); + if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) { ret = -ENOMEM; - else - atomic64_add(npages, &mm->locked_vm); + atomic64_sub(npages, &mm->locked_vm); + } } else { - if (WARN_ON_ONCE(npages > locked_vm)) - npages = locked_vm; - atomic64_sub(npages, &mm->locked_vm); + locked_vm = atomic64_sub_return(npages, &mm->locked_vm); + WARN_ON_ONCE(locked_vm < 0); } - pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", - current ? current->pid : 0, - incr ? '+' : '-', - npages << PAGE_SHIFT, - atomic64_read(&mm->locked_vm) << PAGE_SHIFT, + pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n", + current ? current->pid : 0, incr ? '+' : '-', + npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK)); - up_write(&mm->mmap_sem); return ret; }
Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
Le 02/04/2019 à 22:41, Daniel Jordan a écrit : Taking and dropping mmap_sem to modify a single counter, locked_vm, is overkill when the counter could be synchronized separately. Make mmap_sem a little less coarse by changing locked_vm to an atomic, the 64-bit variety to avoid issues with overflow on 32-bit systems. Can you elaborate on the above ? Previously it was 'unsigned long', what were the issues ? If there was such issues, shouldn't there be a first patch moving it from unsigned long to u64 before this atomic64_t change ? Or at least it should be clearly explain here what the issues are and how switching to a 64 bit counter fixes them. Christophe Signed-off-by: Daniel Jordan Cc: Alan Tull Cc: Alexey Kardashevskiy Cc: Alex Williamson Cc: Andrew Morton Cc: Benjamin Herrenschmidt Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Michael Ellerman Cc: Moritz Fischer Cc: Paul Mackerras Cc: Wu Hao Cc: Cc: Cc: Cc: Cc: Cc: --- arch/powerpc/kvm/book3s_64_vio.c| 14 -- arch/powerpc/mm/mmu_context_iommu.c | 15 --- drivers/fpga/dfl-afu-dma-region.c | 18 ++ drivers/vfio/vfio_iommu_spapr_tce.c | 17 + drivers/vfio/vfio_iommu_type1.c | 10 ++ fs/proc/task_mmu.c | 2 +- include/linux/mm_types.h| 2 +- kernel/fork.c | 2 +- mm/debug.c | 5 +++-- mm/mlock.c | 4 ++-- mm/mmap.c | 18 +- mm/mremap.c | 6 +++--- 12 files changed, 61 insertions(+), 52 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index f02b04973710..e7fdb6d10eeb 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages) static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) { long ret = 0; + s64 locked_vm; if (!current || !current->mm) return ret; /* process exited */ down_write(¤t->mm->mmap_sem); + locked_vm = atomic64_read(¤t->mm->locked_vm); if (inc) { unsigned long locked, lock_limit; - locked = current->mm->locked_vm + stt_pages; + locked = locked_vm + stt_pages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) ret = -ENOMEM; else - current->mm->locked_vm += stt_pages; + atomic64_add(stt_pages, ¤t->mm->locked_vm); } else { - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm)) - stt_pages = current->mm->locked_vm; + if (WARN_ON_ONCE(stt_pages > locked_vm)) + stt_pages = locked_vm; - current->mm->locked_vm -= stt_pages; + atomic64_sub(stt_pages, ¤t->mm->locked_vm); } pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid, inc ? '+' : '-', stt_pages << PAGE_SHIFT, - current->mm->locked_vm << PAGE_SHIFT, + atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK), ret ? " - exceeded" : ""); diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index e7a9c4f6bfca..8038ac24a312 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, unsigned long npages, bool incr) { long ret = 0, locked, lock_limit; + s64 locked_vm; if (!npages) return 0; down_write(&mm->mmap_sem); - + locked_vm = atomic64_read(&mm->locked_vm); if (incr) { - locked = mm->locked_vm + npages; + locked = locked_vm + npages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) ret = -ENOMEM; else - mm->locked_vm += npages; + atomic64_add(npages, &mm->locked_vm); } else { - if (WARN_ON_ONCE(npages > mm->locked_vm)) - npages = mm->locked_vm; - mm->locked_vm -= npages; + if (WARN_ON_ONCE(npages > locked_vm)) + npages = locked_vm; + atomic64_sub(npages, &mm->locked_vm); } pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", current ? current->pid : 0, incr ? '+' : '-', npages << PAGE_SHIFT, -
Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
On Wed, 2019-04-03 at 11:53 +1100, Michael Ellerman wrote: > > Joakim Tjernlund writes: > > On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote: > > > Joakim Tjernlund writes: > ... > > > > Can I compile it away? > > > > > > You can't actually, but you can disable it at runtime with > > > "nospectre_v1" on the kernel command line. > > > > > > We could make it a user selectable compile time option if you really > > > want it to be. > > > > I think yes. Considering that these patches are fairly untested and the > > impact > > in the wild unknown. Requiring systems to change their boot config over > > night is > > too fast. > > OK. Just to be clear, you're actually using 4.14 on an NXP board and > would actually use this option? I don't want to add another option just > for a theoretical use case. Correct, we use 4.14 on several custom boards using NXP CPUs and would appreciate if I could control spectre with a build switch. Thanks a lot! Jocke
Re: [PATCH] powerpc/xmon: add read-only mode
Le 03/04/2019 à 05:38, Christopher M Riedl a écrit : On March 29, 2019 at 3:41 AM Christophe Leroy wrote: Le 29/03/2019 à 05:21, cmr a écrit : Operations which write to memory should be restricted on secure systems and optionally to avoid self-destructive behaviors. Add a config option, XMON_RO, to control default xmon behavior along with kernel cmdline options xmon=ro and xmon=rw for explicit control. The default is to enable read-only mode. The following xmon operations are affected: memops: disable memmove disable memset memex: no-op'd mwrite super_regs: no-op'd write_spr bpt_cmds: disable proc_call: disable Signed-off-by: cmr A Fully qualified name should be used. What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY? I mean it should be Signed-off-by: Christopher M Riedl instead of Signed-off-by: cmr --- arch/powerpc/Kconfig.debug | 7 +++ arch/powerpc/xmon/xmon.c | 24 2 files changed, 31 insertions(+) diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 4e00cb0a5464..33cc01adf4cb 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY to say Y here, unless you're building for a memory-constrained system. +config XMON_RO + bool "Set xmon read-only mode" + depends on XMON + default y Should it really be always default y ? I would set default 'y' only when some security options are also set. This is a good point, I based this on an internal Slack suggestion but giving this more thought, disabling read-only mode by default makes more sense. I'm not sure what security options could be set though? Maybe starting with CONFIG_STRICT_KERNEL_RWX Another point that may also be addressed by your patch is the definition of PAGE_KERNEL_TEXT: #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\ defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE) #define PAGE_KERNEL_TEXTPAGE_KERNEL_X #else #define PAGE_KERNEL_TEXTPAGE_KERNEL_ROX #endif The above let me think that it would be better if you add a config XMON_RW instead of XMON_RO, with default !STRICT_KERNEL_RWX Christophe
[PATCH kernel v2 1/2] powerpc/mm_iommu: Fix potential deadlock
Currently mm_iommu_do_alloc() is called in 2 cases: - VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory: this locks &mem_list_mutex and then locks mm::mmap_sem several times when adjusting locked_vm or pinning pages; - vfio_pci_nvgpu_regops::mmap() for GPU memory: this is called with mm::mmap_sem held already and it locks &mem_list_mutex. So one can craft a userspace program to do special ioctl and mmap in 2 threads concurrently and cause a deadlock which lockdep warns about (below). We did not hit this yet because QEMU constructs the machine in a single thread. This moves the overlap check next to where the new entry is added and reduces the amount of time spent with &mem_list_mutex held. This moves locked_vm adjustment from under &mem_list_mutex. This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0. This is one of the lockdep warnings: == WARNING: possible circular locking dependency detected 5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted -- qemu-system-ppc/8038 is trying to acquire lock: 2ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490 but task is already holding lock: fd7da97f (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xf0/0x160 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){}: lock_acquire+0xf8/0x260 down_write+0x44/0xa0 mm_iommu_adjust_locked_vm.part.1+0x4c/0x190 mm_iommu_do_alloc+0x310/0x490 tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce] vfio_fops_unl_ioctl+0x94/0x430 [vfio] do_vfs_ioctl+0xe4/0x930 ksys_ioctl+0xc4/0x110 sys_ioctl+0x28/0x80 system_call+0x5c/0x70 -> #0 (mem_list_mutex){+.+.}: __lock_acquire+0x1484/0x1900 lock_acquire+0xf8/0x260 __mutex_lock+0x88/0xa70 mm_iommu_do_alloc+0x70/0x490 vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci] vfio_pci_mmap+0x198/0x2a0 [vfio_pci] vfio_device_fops_mmap+0x44/0x70 [vfio] mmap_region+0x5d4/0x770 do_mmap+0x42c/0x650 vm_mmap_pgoff+0x124/0x160 ksys_mmap_pgoff+0xdc/0x2f0 sys_mmap+0x40/0x80 system_call+0x5c/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(&mm->mmap_sem); lock(mem_list_mutex); lock(&mm->mmap_sem); lock(mem_list_mutex); *** DEADLOCK *** 1 lock held by qemu-system-ppc/8038: #0: fd7da97f (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xf0/0x160 Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19) Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/mm/mmu_context_iommu.c | 75 +++-- 1 file changed, 39 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index e7a9c4f6bfca..9d9be850f8c2 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -95,28 +95,14 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, unsigned long entries, unsigned long dev_hpa, struct mm_iommu_table_group_mem_t **pmem) { - struct mm_iommu_table_group_mem_t *mem; - long i, ret, locked_entries = 0; + struct mm_iommu_table_group_mem_t *mem, *mem2; + long i, ret, locked_entries = 0, pinned = 0; unsigned int pageshift; - mutex_lock(&mem_list_mutex); - - list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, - next) { - /* Overlap? */ - if ((mem->ua < (ua + (entries << PAGE_SHIFT))) && - (ua < (mem->ua + - (mem->entries << PAGE_SHIFT { - ret = -EINVAL; - goto unlock_exit; - } - - } - if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) { ret = mm_iommu_adjust_locked_vm(mm, entries, true); if (ret) - goto unlock_exit; + return ret; locked_entries = entries; } @@ -150,15 +136,10 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, down_read(&mm->mmap_sem); ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL); up_read(&mm->mmap_sem); + pinned = ret > 0 ? ret : 0; if (ret != entries) { - /* free the reference taken */ - for (i = 0; i < ret; i++) - put_page(mem->hpages[i]); - - vfree(mem->hpas); - kfree(mem); ret
[PATCH kernel v2 2/2] powerpc/mm_iommu: Allow pinning large regions
When called with vmas_arg==NULL, get_user_pages_longterm() allocates an array of nr_pages*8 which can easily get greater that the max order, for example, registering memory for a 256GB guest does this and fails in __alloc_pages_nodemask(). This adds a loop over chunks of entries to fit the max order limit. Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05) Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/mm/mmu_context_iommu.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index 9d9be850f8c2..8330f135294f 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -98,6 +98,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, struct mm_iommu_table_group_mem_t *mem, *mem2; long i, ret, locked_entries = 0, pinned = 0; unsigned int pageshift; + unsigned long entry, chunk; if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) { ret = mm_iommu_adjust_locked_vm(mm, entries, true); @@ -134,11 +135,26 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, } down_read(&mm->mmap_sem); - ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL); + chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) / + sizeof(struct vm_area_struct *); + chunk = min(chunk, entries); + for (entry = 0; entry < entries; entry += chunk) { + unsigned long n = min(entries - entry, chunk); + + ret = get_user_pages_longterm(ua + (entry << PAGE_SHIFT), n, + FOLL_WRITE, mem->hpages + entry, NULL); + if (ret == n) { + pinned += n; + continue; + } + if (ret > 0) + pinned += ret; + break; + } up_read(&mm->mmap_sem); - pinned = ret > 0 ? ret : 0; - if (ret != entries) { - ret = -EFAULT; + if (pinned != entries) { + if (!ret) + ret = -EFAULT; goto free_exit; } -- 2.17.1
[PATCH kernel v2 0/2] powerpc/mm_iommu: Fixes
The patches do independent things but touch exact same code so the order in which they should apply matters. This supercedes: [PATCH kernel] powerpc/mm_iommu: Allow pinning large regions [PATCH kernel 1/2] powerpc/mm_iommu: Prepare for less locking [PATCH kernel 2/2] powerpc/mm_iommu: Fix potential deadlock This is based on sha1 5e7a8ca31926 Linus Torvalds "Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs". Please comment. Thanks. Alexey Kardashevskiy (2): powerpc/mm_iommu: Fix potential deadlock powerpc/mm_iommu: Allow pinning large regions arch/powerpc/mm/mmu_context_iommu.c | 97 + 1 file changed, 58 insertions(+), 39 deletions(-) -- 2.17.1
VLC doesn't play videos anymore since the PowerPC fixes 5.1-3
Please test VLC with the RC3 of kernel 5.1. The removing of the PowerPC fixes 5.1-3 has solved the VLC issue. Another user has already confirmed that [1]. This isn’t an April Fool‘s. ;-) Thanks [1] http://forum.hyperion-entertainment.com/viewtopic.php?f=58&t=4256&start=20#p47561
Re: [PATCH] powerpc/xmon: add read-only mode
> On March 29, 2019 at 3:41 AM Christophe Leroy wrote: > > > > > Le 29/03/2019 à 05:21, cmr a écrit : > > Operations which write to memory should be restricted on secure systems > > and optionally to avoid self-destructive behaviors. > > > > Add a config option, XMON_RO, to control default xmon behavior along > > with kernel cmdline options xmon=ro and xmon=rw for explicit control. > > The default is to enable read-only mode. > > > > The following xmon operations are affected: > > memops: > > disable memmove > > disable memset > > memex: > > no-op'd mwrite > > super_regs: > > no-op'd write_spr > > bpt_cmds: > > disable > > proc_call: > > disable > > > > Signed-off-by: cmr > > A Fully qualified name should be used. What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY? > > > --- > > arch/powerpc/Kconfig.debug | 7 +++ > > arch/powerpc/xmon/xmon.c | 24 > > 2 files changed, 31 insertions(+) > > > > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug > > index 4e00cb0a5464..33cc01adf4cb 100644 > > --- a/arch/powerpc/Kconfig.debug > > +++ b/arch/powerpc/Kconfig.debug > > @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY > > to say Y here, unless you're building for a memory-constrained > > system. > > > > +config XMON_RO > > + bool "Set xmon read-only mode" > > + depends on XMON > > + default y > > Should it really be always default y ? > I would set default 'y' only when some security options are also set. > This is a good point, I based this on an internal Slack suggestion but giving this more thought, disabling read-only mode by default makes more sense. I'm not sure what security options could be set though?
Re: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
On 4/2/19 4:55 PM, Nicholas Piggin wrote: > Using a jiffies timer creates a dependency on the tick_do_timer_cpu > incrementing jiffies. If that CPU has locked up and jiffies is not > incrementing, the watchdog heartbeat timer for all CPUs stops and > creates false positives and confusing warnings on local CPUs, and > also causes the SMP detector to stop, so the root cause is never > detected. > > Fix this by using hrtimer based timers for the watchdog heartbeat, > like the generic kernel hardlockup detector. > > Reported-by: Ravikumar Bangoria Reported-by: Ravi Bangoria Thanks, Ravi
Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere
Arnd Bergmann writes: > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl > b/arch/powerpc/kernel/syscalls/syscall.tbl > index b18abb0c3dae..00f5a63c8d9a 100644 > --- a/arch/powerpc/kernel/syscalls/syscall.tbl > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl > @@ -505,3 +505,7 @@ > 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait > compat_sys_rt_sigtimedwait_time64 > 422 32 futex_time64sys_futex > sys_futex > 423 32 sched_rr_get_interval_time64sys_sched_rr_get_interval > sys_sched_rr_get_interval > +424 common pidfd_send_signal sys_pidfd_send_signal > +425 common io_uring_setup sys_io_uring_setup > +426 common io_uring_enter sys_io_uring_enter > +427 common io_uring_register sys_io_uring_register Acked-by: Michael Ellerman (powerpc) Lightly tested. The pidfd_test selftest passes. Ran the io_uring example from fio, which prints lots of: IOPS=209952, IOS/call=32/32, inflight=117 (117), Cachehit=0.00% IOPS=209952, IOS/call=32/32, inflight=116 (116), Cachehit=0.00% IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00% IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00% IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00% IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=113 (113), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=112 (112), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=110 (110), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=104 (104), Cachehit=0.00% IOPS=210080, IOS/call=32/32, inflight=102 (102), Cachehit=0.00% IOPS=210112, IOS/call=32/32, inflight=100 (100), Cachehit=0.00% IOPS=210080, IOS/call=32/32, inflight=97 (97), Cachehit=0.00% IOPS=210112, IOS/call=32/32, inflight=97 (97), Cachehit=0.00% IOPS=210112, IOS/call=32/31, inflight=126 (126), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=126 (126), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=125 (125), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=119 (119), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=117 (117), Cachehit=0.00% IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=111 (111), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=108 (108), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=107 (107), Cachehit=0.00% IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00% Which is good I think? cheers
Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere
Arnd Bergmann writes: > On Sun, Mar 31, 2019 at 5:47 PM Michael Ellerman wrote: >> >> Arnd Bergmann writes: >> > Add the io_uring and pidfd_send_signal system calls to all architectures. >> > >> > These system calls are designed to handle both native and compat tasks, >> > so all entries are the same across architectures, only arm-compat and >> > the generic tale still use an old format. >> > >> > Signed-off-by: Arnd Bergmann >> > --- >> > arch/alpha/kernel/syscalls/syscall.tbl | 4 >> > arch/arm/tools/syscall.tbl | 4 >> > arch/arm64/include/asm/unistd.h | 2 +- >> > arch/arm64/include/asm/unistd32.h | 8 >> > arch/ia64/kernel/syscalls/syscall.tbl | 4 >> > arch/m68k/kernel/syscalls/syscall.tbl | 4 >> > arch/microblaze/kernel/syscalls/syscall.tbl | 4 >> > arch/mips/kernel/syscalls/syscall_n32.tbl | 4 >> > arch/mips/kernel/syscalls/syscall_n64.tbl | 4 >> > arch/mips/kernel/syscalls/syscall_o32.tbl | 4 >> > arch/parisc/kernel/syscalls/syscall.tbl | 4 >> > arch/powerpc/kernel/syscalls/syscall.tbl| 4 >> >> Have you done any testing? >> >> I'd rather not wire up syscalls that have never been tested at all on >> powerpc. > > No, I have not. I did review the system calls carefully and added the first > patch to fix the bug on x86 compat mode before adding the same bug > on the other compat architectures though ;-) > > Generally, my feeling is that adding system calls is not fundamentally > different from adding other ABIs, and we should really do it at > the same time across all architectures, rather than waiting for each > maintainer to get around to reviewing and testing the new calls > first. This is not a problem on powerpc, but a lot of other architectures > are less active, which is how we have always ended up with > different sets of system calls across architectures. Well it's still something of a problem on powerpc. No one has volunteered to test io_uring on powerpc, so at this stage it will go in completely untested. If there was a selftest in the tree I'd be a bit happier, because at least then our CI would start testing it as soon as the syscalls were wired up in linux-next. And yeah obviously I should test it, but I don't have infinite time unfortunately. > The problem here is that this makes it harder for the C library to > know when a system call is guaranteed to be available. glibc > still needs a feature test for newly added syscalls to see if they > are working (they might be backported to an older kernel, or > disabled), but whenever the minimum kernel version is increased, > it makes sense to drop those checks and assume non-optional > system calls will work if they were part of that minimum version. But that's the thing, if we just wire them up untested they may not actually work. And then you have the far worse situation where the syscall exists in kernel version x but does not actually work properly. See the mess we have with pkeys for example. > In the future, I'd hope that any new system calls get added > right away on all architectures when they land (it was a bit > tricky this time, because I still did a bunch of reworks that > conflicted with the new calls). Bugs will happen of course, but > I think adding them sooner makes it more likely to catch those > bugs early on so we have a chance to fix them properly, > and need fewer arch specific workarounds (ideally none) > for system calls. For syscalls that have a selftest in the tree, and don't rely on anything arch specific I agree. I'm a bit more wary of things that are not easily tested and have the potential to work differently across arches. cheers
Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
Joakim Tjernlund writes: > On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote: >> Joakim Tjernlund writes: ... >> >> > Can I compile it away? >> >> You can't actually, but you can disable it at runtime with >> "nospectre_v1" on the kernel command line. >> >> We could make it a user selectable compile time option if you really >> want it to be. > > I think yes. Considering that these patches are fairly untested and the impact > in the wild unknown. Requiring systems to change their boot config over night > is > too fast. OK. Just to be clear, you're actually using 4.14 on an NXP board and would actually use this option? I don't want to add another option just for a theoretical use case. cheers
[PATCH] powerpc: config: skiroot: Add (back) MLX5 ethernet support
It turns out that some defconfig changes and kernel config option changes meant we accidentally dropped Ethernet support for Mellanox CLX5 cards. Reported-by: Carol L Soto Suggested-by: Carol L Soto Signed-off-by: Stewart Smith Signed-off-by: Joel Stanley --- arch/powerpc/configs/skiroot_defconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig index 5ba131c30f6b..6038b9347d9e 100644 --- a/arch/powerpc/configs/skiroot_defconfig +++ b/arch/powerpc/configs/skiroot_defconfig @@ -163,6 +163,8 @@ CONFIG_S2IO=m CONFIG_MLX4_EN=m # CONFIG_MLX4_CORE_GEN2 is not set CONFIG_MLX5_CORE=m +CONFIG_MLX5_CORE_EN=y +# CONFIG_MLX5_EN_RXNFC is not set # CONFIG_NET_VENDOR_MICREL is not set # CONFIG_NET_VENDOR_MICROSEMI is not set CONFIG_MYRI10GE=m -- 2.20.1
Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
On Tue, 02 Apr 2019, Andrew Morton wrote: Also, we didn't remove any down_write(mmap_sem)s from core code so I'm thinking that the benefit of removing a few mmap_sem-takings from a few obscure drivers (sorry ;)) is pretty small. afaik porting the remaining incorrect users of locked_vm to pinned_vm was the next step before this one, which made converting locked_vm to atomic hardly worth it. Daniel? Thanks, Davidlohr
Re: [PATCH 0/4] Enabling secure boot on PowerNV systems
On 4/2/19 6:51 PM, Matthew Garrett wrote: > On Tue, Apr 2, 2019 at 2:11 PM Claudio Carvalho > wrote: >> We want to use the efivarfs for compatibility with existing userspace >> tools. We will track and match any EFI changes that affect us. > So you implement the full PK/KEK/db/dbx/dbt infrastructure, and > updates are signed in the same way? For the first version, our firmware will implement a simplistic PK, KEK and db infrastructure (without dbx and dbt) where only the Setup and User modes will be supported. PK, KEK and db updates will be signed the same way, that is, using userspace tooling like efitools in PowerNV. As for the authentication descriptors, only the EFI_VARIABLE_AUTHENTICATION_2 descriptor will be supported. >> Our use case is restricted to secure boot - this is not going to be a >> general purpose EFI variable implementation. > In that case we might be better off with a generic interface for this > purpose that we can expose on all platforms that implement a secure > boot key hierarchy. Having an efivarfs that doesn't allow the creation > of arbitrary attributes may break other existing userland > expectations. > For what it's worth, gsmi uses the efivars infrastructure for EFI-like variables. What might a generic interface look like? It would have to work for existing secure boot solutions - including EFI - which would seem to imply changes to userspace tools. Claudio
Re: [PATCH v1 2/4] soc/fsl/guts: Add definition for LX2160A
On Tue, Feb 26, 2019 at 4:12 AM Vabhav Sharma wrote: > > Adding compatible string "lx2160a-dcfg" to > initialize guts driver for lx2160 and SoC die > attribute definition for LX2160A Applied to branch next. Thanks. Regards, Leo > > Signed-off-by: Vabhav Sharma > Signed-off-by: Yinbo Zhu > Acked-by: Li Yang > --- > drivers/soc/fsl/guts.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c > index 302e0c8..bcab1ee 100644 > --- a/drivers/soc/fsl/guts.c > +++ b/drivers/soc/fsl/guts.c > @@ -100,6 +100,11 @@ static const struct fsl_soc_die_attr fsl_soc_die[] = { > .svr = 0x8700, > .mask = 0xfff7, > }, > + /* Die: LX2160A, SoC: LX2160A/LX2120A/LX2080A */ > + { .die = "LX2160A", > + .svr = 0x8736, > + .mask = 0xff3f, > + }, > { }, > }; > > @@ -222,6 +227,7 @@ static const struct of_device_id fsl_guts_of_match[] = { > { .compatible = "fsl,ls1088a-dcfg", }, > { .compatible = "fsl,ls1012a-dcfg", }, > { .compatible = "fsl,ls1046a-dcfg", }, > + { .compatible = "fsl,lx2160a-dcfg", }, > {} > }; > MODULE_DEVICE_TABLE(of, fsl_guts_of_match); > -- > 2.7.4 >
Re: [PATCH] soc/fsl/qe: Fix an error code in qe_pin_request()
On Thu, Mar 28, 2019 at 9:21 AM Dan Carpenter wrote: > > We forgot to set "err" on this error path. > > Fixes: 1a2d397a6eb5 ("gpio/powerpc: Eliminate duplication of > of_get_named_gpio_flags()") > Signed-off-by: Dan Carpenter Applied to fix branch. Thanks. Regards, Leo > --- > drivers/soc/fsl/qe/gpio.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/soc/fsl/qe/gpio.c b/drivers/soc/fsl/qe/gpio.c > index 819bed0f5667..51b3a47b5a55 100644 > --- a/drivers/soc/fsl/qe/gpio.c > +++ b/drivers/soc/fsl/qe/gpio.c > @@ -179,8 +179,10 @@ struct qe_pin *qe_pin_request(struct device_node *np, > int index) > if (err < 0) > goto err0; > gc = gpio_to_chip(err); > - if (WARN_ON(!gc)) > + if (WARN_ON(!gc)) { > + err = -ENODEV; > goto err0; > + } > > if (!of_device_is_compatible(gc->of_node, > "fsl,mpc8323-qe-pario-bank")) { > pr_debug("%s: tried to get a non-qe pin\n", __func__); > -- > 2.17.1 >
Re: [PATCH 0/4] Enabling secure boot on PowerNV systems
On Tue, Apr 2, 2019 at 2:11 PM Claudio Carvalho wrote: > We want to use the efivarfs for compatibility with existing userspace > tools. We will track and match any EFI changes that affect us. So you implement the full PK/KEK/db/dbx/dbt infrastructure, and updates are signed in the same way? > Our use case is restricted to secure boot - this is not going to be a > general purpose EFI variable implementation. In that case we might be better off with a generic interface for this purpose that we can expose on all platforms that implement a secure boot key hierarchy. Having an efivarfs that doesn't allow the creation of arbitrary attributes may break other existing userland expectations.
[PATCH 0/6] convert locked_vm from unsigned long to atomic64_t
Hi, >From patch 1: Taking and dropping mmap_sem to modify a single counter, locked_vm, is overkill when the counter could be synchronized separately. Make mmap_sem a little less coarse by changing locked_vm to an atomic, the 64-bit variety to avoid issues with overflow on 32-bit systems. This is a more conservative alternative to [1] with no user-visible effects. Thanks to Alexey Kardashevskiy for pointing out the racy atomics and to Alex Williamson, Christoph Lameter, Ira Weiny, and Jason Gunthorpe for their comments on [1]. Davidlohr Bueso recently did a similar conversion for pinned_vm[2]. Testing 1. passes LTP mlock[all], munlock[all], fork, mmap, and mremap tests in an x86 kvm guest 2. a VFIO-enabled x86 kvm guest shows the same VmLck in /proc/pid/status before and after this change 3. cross-compiles on powerpc The series is based on v5.1-rc3. Please consider for 5.2. Daniel [1] https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jor...@oracle.com/ [2] https://lore.kernel.org/linux-mm/20190206175920.31082-1-d...@stgolabs.net/ Daniel Jordan (6): mm: change locked_vm's type from unsigned long to atomic64_t vfio/type1: drop mmap_sem now that locked_vm is atomic vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic fpga/dlf/afu: drop mmap_sem now that locked_vm is atomic powerpc/mmu: drop mmap_sem now that locked_vm is atomic kvm/book3s: drop mmap_sem now that locked_vm is atomic arch/powerpc/kvm/book3s_64_vio.c| 34 ++-- arch/powerpc/mm/mmu_context_iommu.c | 28 +--- drivers/fpga/dfl-afu-dma-region.c | 40 - drivers/vfio/vfio_iommu_spapr_tce.c | 37 -- drivers/vfio/vfio_iommu_type1.c | 31 +- fs/proc/task_mmu.c | 2 +- include/linux/mm_types.h| 2 +- kernel/fork.c | 2 +- mm/debug.c | 5 ++-- mm/mlock.c | 4 +-- mm/mmap.c | 18 ++--- mm/mremap.c | 6 ++--- 12 files changed, 89 insertions(+), 120 deletions(-) base-commit: 79a3aaa7b82e3106be97842dedfd8429248896e6 -- 2.21.0
Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
On Tue, 2 Apr 2019 16:41:53 -0400 Daniel Jordan wrote: > Taking and dropping mmap_sem to modify a single counter, locked_vm, is > overkill when the counter could be synchronized separately. > > Make mmap_sem a little less coarse by changing locked_vm to an atomic, > the 64-bit variety to avoid issues with overflow on 32-bit systems. > > ... > > --- a/arch/powerpc/kvm/book3s_64_vio.c > +++ b/arch/powerpc/kvm/book3s_64_vio.c > @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long > tce_pages) > static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) > { > long ret = 0; > + s64 locked_vm; > > if (!current || !current->mm) > return ret; /* process exited */ > > down_write(¤t->mm->mmap_sem); > > + locked_vm = atomic64_read(¤t->mm->locked_vm); > if (inc) { > unsigned long locked, lock_limit; > > - locked = current->mm->locked_vm + stt_pages; > + locked = locked_vm + stt_pages; > lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > if (locked > lock_limit && !capable(CAP_IPC_LOCK)) > ret = -ENOMEM; > else > - current->mm->locked_vm += stt_pages; > + atomic64_add(stt_pages, ¤t->mm->locked_vm); > } else { > - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm)) > - stt_pages = current->mm->locked_vm; > + if (WARN_ON_ONCE(stt_pages > locked_vm)) > + stt_pages = locked_vm; > > - current->mm->locked_vm -= stt_pages; > + atomic64_sub(stt_pages, ¤t->mm->locked_vm); > } With the current code, current->mm->locked_vm cannot go negative. After the patch, it can go negative. If someone else decreased current->mm->locked_vm between this function's atomic64_read() and atomic64_sub(). I guess this is a can't-happen in this case because the racing code which performed the modification would have taken it negative anyway. But this all makes me rather queazy. Also, we didn't remove any down_write(mmap_sem)s from core code so I'm thinking that the benefit of removing a few mmap_sem-takings from a few obscure drivers (sorry ;)) is pretty small. Also, the argument for switching 32-bit arches to a 64-bit counter was suspiciously vague. What overflow issues? Or are we just being lazy?
[PATCH 6/6] kvm/book3s: drop mmap_sem now that locked_vm is atomic
With locked_vm now an atomic, there is no need to take mmap_sem as writer. Delete and refactor accordingly. Signed-off-by: Daniel Jordan Cc: Alexey Kardashevskiy Cc: Andrew Morton Cc: Benjamin Herrenschmidt Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Michael Ellerman Cc: Paul Mackerras Cc: Cc: Cc: Cc: --- arch/powerpc/kvm/book3s_64_vio.c | 34 +++- 1 file changed, 12 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index e7fdb6d10eeb..8e034c3a5d25 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -56,7 +56,7 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages) return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE; } -static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) +static long kvmppc_account_memlimit(unsigned long pages, bool inc) { long ret = 0; s64 locked_vm; @@ -64,33 +64,23 @@ static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) if (!current || !current->mm) return ret; /* process exited */ - down_write(¤t->mm->mmap_sem); - - locked_vm = atomic64_read(¤t->mm->locked_vm); if (inc) { - unsigned long locked, lock_limit; + unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - locked = locked_vm + stt_pages; - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) + locked_vm = atomic64_add_return(pages, ¤t->mm->locked_vm); + if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) { ret = -ENOMEM; - else - atomic64_add(stt_pages, ¤t->mm->locked_vm); + atomic64_sub(pages, ¤t->mm->locked_vm); + } } else { - if (WARN_ON_ONCE(stt_pages > locked_vm)) - stt_pages = locked_vm; - - atomic64_sub(stt_pages, ¤t->mm->locked_vm); + locked_vm = atomic64_sub_return(pages, ¤t->mm->locked_vm); + WARN_ON_ONCE(locked_vm < 0); } - pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid, - inc ? '+' : '-', - stt_pages << PAGE_SHIFT, - atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT, - rlimit(RLIMIT_MEMLOCK), - ret ? " - exceeded" : ""); - - up_write(¤t->mm->mmap_sem); + pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%lu %lld/%lu%s\n", current->pid, + inc ? '+' : '-', pages << PAGE_SHIFT, + locked_vm << PAGE_SHIFT, + rlimit(RLIMIT_MEMLOCK), ret ? " - exceeded" : ""); return ret; } -- 2.21.0
Re: [PATCH 0/4] Enabling secure boot on PowerNV systems
On 4/2/19 4:36 PM, Matthew Garrett wrote: > On Tue, Apr 2, 2019 at 11:15 AM Claudio Carvalho > wrote: >> 1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR >>introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can >>be used to manage the secure variables. > efivarfs has some pretty significant behavioural semantics that > directly reflect the EFI specification. Using it to expose non-EFI > variable data feels like it's going to increase fragility - there's a > risk that we'll change things in a way that makes sense for the EFI > spec but breaks your use case. Is the desire to use efivarfs to > maintain consistency with existing userland tooling, or just to avoid > having a separate filesystem? > We want to use the efivarfs for compatibility with existing userspace tools. We will track and match any EFI changes that affect us. Our use case is restricted to secure boot - this is not going to be a general purpose EFI variable implementation. Claudio
[PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
With locked_vm now an atomic, there is no need to take mmap_sem as writer. Delete and refactor accordingly. Signed-off-by: Daniel Jordan Cc: Alexey Kardashevskiy Cc: Andrew Morton Cc: Benjamin Herrenschmidt Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Michael Ellerman Cc: Paul Mackerras Cc: Cc: Cc: --- arch/powerpc/mm/mmu_context_iommu.c | 27 +++ 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index 8038ac24a312..a4ef22b67c07 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t { static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, unsigned long npages, bool incr) { - long ret = 0, locked, lock_limit; + long ret = 0; + unsigned long lock_limit; s64 locked_vm; if (!npages) return 0; - down_write(&mm->mmap_sem); - locked_vm = atomic64_read(&mm->locked_vm); if (incr) { - locked = locked_vm + npages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) + locked_vm = atomic64_add_return(npages, &mm->locked_vm); + if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) { ret = -ENOMEM; - else - atomic64_add(npages, &mm->locked_vm); + atomic64_sub(npages, &mm->locked_vm); + } } else { - if (WARN_ON_ONCE(npages > locked_vm)) - npages = locked_vm; - atomic64_sub(npages, &mm->locked_vm); + locked_vm = atomic64_sub_return(npages, &mm->locked_vm); + WARN_ON_ONCE(locked_vm < 0); } - pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", - current ? current->pid : 0, - incr ? '+' : '-', - npages << PAGE_SHIFT, - atomic64_read(&mm->locked_vm) << PAGE_SHIFT, + pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n", + current ? current->pid : 0, incr ? '+' : '-', + npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK)); - up_write(&mm->mmap_sem); return ret; } -- 2.21.0
[PATCH v3 5/5] Lib: sort.h: remove the size argument from the swap function
Removes size argument from the swap function because: 1) It wasn't used. 2) Custom swap function knows what kind of objects it swaps, so it already knows their sizes. Signed-off-by: Andrey Abramov Reviewed by: George Spelvin --- arch/x86/kernel/unwind_orc.c | 2 +- include/linux/sort.h | 2 +- kernel/jump_label.c | 2 +- lib/extable.c| 2 +- lib/sort.c | 7 +++ 5 files changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 89be1be1790c..dc410b567189 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -176,7 +176,7 @@ static struct orc_entry *orc_find(unsigned long ip) return orc_ftrace_find(ip); } -static void orc_sort_swap(void *_a, void *_b, int size) +static void orc_sort_swap(void *_a, void *_b) { struct orc_entry *orc_a, *orc_b; struct orc_entry orc_tmp; diff --git a/include/linux/sort.h b/include/linux/sort.h index 2b99a5dd073d..13bb4635b5f1 100644 --- a/include/linux/sort.h +++ b/include/linux/sort.h @@ -6,6 +6,6 @@ void sort(void *base, size_t num, size_t size, int (*cmp)(const void *, const void *), - void (*swap)(void *, void *, int)); + void (*swap)(void *, void *)); #endif diff --git a/kernel/jump_label.c b/kernel/jump_label.c index bad96b476eb6..6b1187b8a060 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -45,7 +45,7 @@ static int jump_label_cmp(const void *a, const void *b) return 0; } -static void jump_label_swap(void *a, void *b, int size) +static void jump_label_swap(void *a, void *b) { long delta = (unsigned long)a - (unsigned long)b; struct jump_entry *jea = a; diff --git a/lib/extable.c b/lib/extable.c index f54996fdd0b8..0515a94538ca 100644 --- a/lib/extable.c +++ b/lib/extable.c @@ -28,7 +28,7 @@ static inline unsigned long ex_to_insn(const struct exception_table_entry *x) #ifndef ARCH_HAS_RELATIVE_EXTABLE #define swap_exNULL #else -static void swap_ex(void *a, void *b, int size) +static void swap_ex(void *a, void *b) { struct exception_table_entry *x = a, *y = b, tmp; int delta = b - a; diff --git a/lib/sort.c b/lib/sort.c index 50855ea8c262..8704750e6bde 100644 --- a/lib/sort.c +++ b/lib/sort.c @@ -114,7 +114,7 @@ static void swap_bytes(void *a, void *b, size_t n) } while (n); } -typedef void (*swap_func_t)(void *a, void *b, int size); +typedef void (*swap_func_t)(void *a, void *b); /* * The values are arbitrary as long as they can't be confused with @@ -138,7 +138,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func) else if (swap_func == SWAP_BYTES) swap_bytes(a, b, size); else - swap_func(a, b, (int)size); + swap_func(a, b); } /** @@ -186,8 +186,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size) * it less suitable for kernel use. */ void sort(void *base, size_t num, size_t size, - int (*cmp_func)(const void *, const void *), - void (*swap_func)(void *, void *, int size)) + int (*cmp_func)(const void *, const void *), swap_func_t swap_func) { /* pre-scale counters for performance */ size_t n = num * size, a = (num/2) * size; -- 2.21.0
[PATCH v3 4/5] ubifs: find.c: replace swap function with built-in one
Replace swap_dirty_idx function with built-in one, because swap_dirty_idx does only a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov Reviewed by: George Spelvin --- v2->v3: nothing changed fs/ubifs/find.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c index f9646835b026..5deaae7fcead 100644 --- a/fs/ubifs/find.c +++ b/fs/ubifs/find.c @@ -747,12 +747,6 @@ static int cmp_dirty_idx(const struct ubifs_lprops **a, return lpa->dirty + lpa->free - lpb->dirty - lpb->free; } -static void swap_dirty_idx(struct ubifs_lprops **a, struct ubifs_lprops **b, - int size) -{ - swap(*a, *b); -} - /** * ubifs_save_dirty_idx_lnums - save an array of the most dirty index LEB nos. * @c: the UBIFS file-system description object @@ -772,8 +766,7 @@ int ubifs_save_dirty_idx_lnums(struct ubifs_info *c) sizeof(void *) * c->dirty_idx.cnt); /* Sort it so that the dirtiest is now at the end */ sort(c->dirty_idx.arr, c->dirty_idx.cnt, sizeof(void *), -(int (*)(const void *, const void *))cmp_dirty_idx, -(void (*)(void *, void *, int))swap_dirty_idx); +(int (*)(const void *, const void *))cmp_dirty_idx, NULL); dbg_find("found %d dirty index LEBs", c->dirty_idx.cnt); if (c->dirty_idx.cnt) dbg_find("dirtiest index LEB is %d with dirty %d and free %d", -- 2.21.0
[PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
Taking and dropping mmap_sem to modify a single counter, locked_vm, is overkill when the counter could be synchronized separately. Make mmap_sem a little less coarse by changing locked_vm to an atomic, the 64-bit variety to avoid issues with overflow on 32-bit systems. Signed-off-by: Daniel Jordan Cc: Alan Tull Cc: Alexey Kardashevskiy Cc: Alex Williamson Cc: Andrew Morton Cc: Benjamin Herrenschmidt Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Michael Ellerman Cc: Moritz Fischer Cc: Paul Mackerras Cc: Wu Hao Cc: Cc: Cc: Cc: Cc: Cc: --- arch/powerpc/kvm/book3s_64_vio.c| 14 -- arch/powerpc/mm/mmu_context_iommu.c | 15 --- drivers/fpga/dfl-afu-dma-region.c | 18 ++ drivers/vfio/vfio_iommu_spapr_tce.c | 17 + drivers/vfio/vfio_iommu_type1.c | 10 ++ fs/proc/task_mmu.c | 2 +- include/linux/mm_types.h| 2 +- kernel/fork.c | 2 +- mm/debug.c | 5 +++-- mm/mlock.c | 4 ++-- mm/mmap.c | 18 +- mm/mremap.c | 6 +++--- 12 files changed, 61 insertions(+), 52 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index f02b04973710..e7fdb6d10eeb 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages) static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) { long ret = 0; + s64 locked_vm; if (!current || !current->mm) return ret; /* process exited */ down_write(¤t->mm->mmap_sem); + locked_vm = atomic64_read(¤t->mm->locked_vm); if (inc) { unsigned long locked, lock_limit; - locked = current->mm->locked_vm + stt_pages; + locked = locked_vm + stt_pages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) ret = -ENOMEM; else - current->mm->locked_vm += stt_pages; + atomic64_add(stt_pages, ¤t->mm->locked_vm); } else { - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm)) - stt_pages = current->mm->locked_vm; + if (WARN_ON_ONCE(stt_pages > locked_vm)) + stt_pages = locked_vm; - current->mm->locked_vm -= stt_pages; + atomic64_sub(stt_pages, ¤t->mm->locked_vm); } pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid, inc ? '+' : '-', stt_pages << PAGE_SHIFT, - current->mm->locked_vm << PAGE_SHIFT, + atomic64_read(¤t->mm->locked_vm) << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK), ret ? " - exceeded" : ""); diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c index e7a9c4f6bfca..8038ac24a312 100644 --- a/arch/powerpc/mm/mmu_context_iommu.c +++ b/arch/powerpc/mm/mmu_context_iommu.c @@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, unsigned long npages, bool incr) { long ret = 0, locked, lock_limit; + s64 locked_vm; if (!npages) return 0; down_write(&mm->mmap_sem); - + locked_vm = atomic64_read(&mm->locked_vm); if (incr) { - locked = mm->locked_vm + npages; + locked = locked_vm + npages; lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) ret = -ENOMEM; else - mm->locked_vm += npages; + atomic64_add(npages, &mm->locked_vm); } else { - if (WARN_ON_ONCE(npages > mm->locked_vm)) - npages = mm->locked_vm; - mm->locked_vm -= npages; + if (WARN_ON_ONCE(npages > locked_vm)) + npages = locked_vm; + atomic64_sub(npages, &mm->locked_vm); } pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", current ? current->pid : 0, incr ? '+' : '-', npages << PAGE_SHIFT, - mm->locked_vm << PAGE_SHIFT, + atomic64_read(&mm->locked_vm) << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK)); up_write(&mm->mmap_sem); diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c index e18a786fc943..08132fd9b6b7 100644 --- a/drivers/fpga/dfl-afu-dma-region.c +++ b/drivers/f
[PATCH v3 3/5] ocfs2: dir, refcounttree, xattr: replace swap functions with built-in one
Replace dx_leaf_sort_swap, swap_refcount_rec and swap_xe functions with built-in one, because they do only a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov Reviewed by: George Spelvin --- v2->v3: nothing changed fs/ocfs2/dir.c | 13 + fs/ocfs2/refcounttree.c | 13 +++-- fs/ocfs2/xattr.c| 15 +++ 3 files changed, 7 insertions(+), 34 deletions(-) diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c index c121abbdfc7d..4b86b181df0a 100644 --- a/fs/ocfs2/dir.c +++ b/fs/ocfs2/dir.c @@ -3529,16 +3529,6 @@ static int dx_leaf_sort_cmp(const void *a, const void *b) return 0; } -static void dx_leaf_sort_swap(void *a, void *b, int size) -{ - struct ocfs2_dx_entry *entry1 = a; - struct ocfs2_dx_entry *entry2 = b; - - BUG_ON(size != sizeof(*entry1)); - - swap(*entry1, *entry2); -} - static int ocfs2_dx_leaf_same_major(struct ocfs2_dx_leaf *dx_leaf) { struct ocfs2_dx_entry_list *dl_list = &dx_leaf->dl_list; @@ -3799,8 +3789,7 @@ static int ocfs2_dx_dir_rebalance(struct ocfs2_super *osb, struct inode *dir, * This block is changing anyway, so we can sort it in place. */ sort(dx_leaf->dl_list.de_entries, num_used, -sizeof(struct ocfs2_dx_entry), dx_leaf_sort_cmp, -dx_leaf_sort_swap); +sizeof(struct ocfs2_dx_entry), dx_leaf_sort_cmp, NULL); ocfs2_journal_dirty(handle, dx_leaf_bh); diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 1dc9a08e8bdc..7bbc94d23a0c 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -1400,13 +1400,6 @@ static int cmp_refcount_rec_by_cpos(const void *a, const void *b) return 0; } -static void swap_refcount_rec(void *a, void *b, int size) -{ - struct ocfs2_refcount_rec *l = a, *r = b; - - swap(*l, *r); -} - /* * The refcount cpos are ordered by their 64bit cpos, * But we will use the low 32 bit to be the e_cpos in the b-tree. @@ -1482,7 +1475,7 @@ static int ocfs2_divide_leaf_refcount_block(struct buffer_head *ref_leaf_bh, */ sort(&rl->rl_recs, le16_to_cpu(rl->rl_used), sizeof(struct ocfs2_refcount_rec), -cmp_refcount_rec_by_low_cpos, swap_refcount_rec); +cmp_refcount_rec_by_low_cpos, NULL); ret = ocfs2_find_refcount_split_pos(rl, &cpos, &split_index); if (ret) { @@ -1507,11 +1500,11 @@ static int ocfs2_divide_leaf_refcount_block(struct buffer_head *ref_leaf_bh, sort(&rl->rl_recs, le16_to_cpu(rl->rl_used), sizeof(struct ocfs2_refcount_rec), -cmp_refcount_rec_by_cpos, swap_refcount_rec); +cmp_refcount_rec_by_cpos, NULL); sort(&new_rl->rl_recs, le16_to_cpu(new_rl->rl_used), sizeof(struct ocfs2_refcount_rec), -cmp_refcount_rec_by_cpos, swap_refcount_rec); +cmp_refcount_rec_by_cpos, NULL); *split_cpos = cpos; return 0; diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c index 3a24ce3deb01..b3e6f42baf78 100644 --- a/fs/ocfs2/xattr.c +++ b/fs/ocfs2/xattr.c @@ -4175,15 +4175,6 @@ static int cmp_xe(const void *a, const void *b) return 0; } -static void swap_xe(void *a, void *b, int size) -{ - struct ocfs2_xattr_entry *l = a, *r = b, tmp; - - tmp = *l; - memcpy(l, r, sizeof(struct ocfs2_xattr_entry)); - memcpy(r, &tmp, sizeof(struct ocfs2_xattr_entry)); -} - /* * When the ocfs2_xattr_block is filled up, new bucket will be created * and all the xattr entries will be moved to the new bucket. @@ -4249,7 +4240,7 @@ static void ocfs2_cp_xattr_block_to_bucket(struct inode *inode, trace_ocfs2_cp_xattr_block_to_bucket_end(offset, size, off_change); sort(target + offset, count, sizeof(struct ocfs2_xattr_entry), -cmp_xe, swap_xe); +cmp_xe, NULL); } /* @@ -,7 +4435,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode, */ sort(entries, le16_to_cpu(xh->xh_count), sizeof(struct ocfs2_xattr_entry), -cmp_xe_offset, swap_xe); +cmp_xe_offset, NULL); /* Move all name/values to the end of the bucket. */ xe = xh->xh_entries; @@ -4486,7 +4477,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode, /* sort the entries by their name_hash. */ sort(entries, le16_to_cpu(xh->xh_count), sizeof(struct ocfs2_xattr_entry), -cmp_xe, swap_xe); +cmp_xe, NULL); buf = bucket_buf; for (i = 0; i < bucket->bu_blocks; i++, buf += blocksize) -- 2.21.0
[PATCH v3 2/5] powerpc: module_[32|64].c: replace swap function with built-in one
Replace relaswap with built-in one, because relaswap does a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov Reviewed by: George Spelvin Acked-by: Michael Ellerman (powerpc) --- v2->v3: nothing changed arch/powerpc/kernel/module_32.c | 17 + arch/powerpc/kernel/module_64.c | 17 + 2 files changed, 2 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c index 88d83771f462..c311e8575d10 100644 --- a/arch/powerpc/kernel/module_32.c +++ b/arch/powerpc/kernel/module_32.c @@ -79,21 +79,6 @@ static int relacmp(const void *_x, const void *_y) return 0; } -static void relaswap(void *_x, void *_y, int size) -{ - uint32_t *x, *y, tmp; - int i; - - y = (uint32_t *)_x; - x = (uint32_t *)_y; - - for (i = 0; i < sizeof(Elf32_Rela) / sizeof(uint32_t); i++) { - tmp = x[i]; - x[i] = y[i]; - y[i] = tmp; - } -} - /* Get the potential trampolines size required of the init and non-init sections */ static unsigned long get_plt_size(const Elf32_Ehdr *hdr, @@ -130,7 +115,7 @@ static unsigned long get_plt_size(const Elf32_Ehdr *hdr, */ sort((void *)hdr + sechdrs[i].sh_offset, sechdrs[i].sh_size / sizeof(Elf32_Rela), -sizeof(Elf32_Rela), relacmp, relaswap); +sizeof(Elf32_Rela), relacmp, NULL); ret += count_relocs((void *)hdr + sechdrs[i].sh_offset, diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 8661eea78503..0c833d7f36f1 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -231,21 +231,6 @@ static int relacmp(const void *_x, const void *_y) return 0; } -static void relaswap(void *_x, void *_y, int size) -{ - uint64_t *x, *y, tmp; - int i; - - y = (uint64_t *)_x; - x = (uint64_t *)_y; - - for (i = 0; i < sizeof(Elf64_Rela) / sizeof(uint64_t); i++) { - tmp = x[i]; - x[i] = y[i]; - y[i] = tmp; - } -} - /* Get size of potential trampolines required. */ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr, const Elf64_Shdr *sechdrs) @@ -269,7 +254,7 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr, */ sort((void *)sechdrs[i].sh_addr, sechdrs[i].sh_size / sizeof(Elf64_Rela), -sizeof(Elf64_Rela), relacmp, relaswap); +sizeof(Elf64_Rela), relacmp, NULL); relocs += count_relocs((void *)sechdrs[i].sh_addr, sechdrs[i].sh_size -- 2.21.0
[PATCH v3 1/5] arch/arc: unwind.c: replace swap function with built-in one
Replace swap_eh_frame_hdr_table_entries with built-in one, because swap_eh_frame_hdr_table_entries does a simple byte to byte swap. Since Spectre mitigations have made indirect function calls more expensive, and the default simple byte copies swap is implemented without them, an "optimized" custom swap function is now a waste of time as well as code. Signed-off-by: Andrey Abramov Reviewed by: George Spelvin Acked-by: Vineet Gupta --- v2->v3: nothing changed arch/arc/kernel/unwind.c | 20 ++-- 1 file changed, 2 insertions(+), 18 deletions(-) diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c index 271e9fafa479..7610fe84afea 100644 --- a/arch/arc/kernel/unwind.c +++ b/arch/arc/kernel/unwind.c @@ -248,20 +248,6 @@ static int cmp_eh_frame_hdr_table_entries(const void *p1, const void *p2) return (e1->start > e2->start) - (e1->start < e2->start); } -static void swap_eh_frame_hdr_table_entries(void *p1, void *p2, int size) -{ - struct eh_frame_hdr_table_entry *e1 = p1; - struct eh_frame_hdr_table_entry *e2 = p2; - unsigned long v; - - v = e1->start; - e1->start = e2->start; - e2->start = v; - v = e1->fde; - e1->fde = e2->fde; - e2->fde = v; -} - static void init_unwind_hdr(struct unwind_table *table, void *(*alloc) (unsigned long)) { @@ -354,10 +340,8 @@ static void init_unwind_hdr(struct unwind_table *table, } WARN_ON(n != header->fde_count); - sort(header->table, -n, -sizeof(*header->table), -cmp_eh_frame_hdr_table_entries, swap_eh_frame_hdr_table_entries); + sort(header->table, n, +sizeof(*header->table), cmp_eh_frame_hdr_table_entries, NULL); table->hdrsz = hdrSize; smp_wmb(); -- 2.21.0
[PATCH v3 0/5] simple sort swap function improvements
This is the logical continuation of the "lib/sort & lib/list_sort: faster and smaller" series by George Spelvin (added to linux-next recently). Since Spectre mitigations have made indirect function calls more expensive, and the previous patch series implements the default simple byte copies without them, an "optimized" custom swap function is now a waste of time as well as code. Patches 1 to 4 replace trivial swap functions with the built-in (which is now much faster) and are grouped by subsystem. Being pure code deletion patches, they are sure to bring joy to Linus's heart. Having reviewed all call sites, only three non-trivial swap functions remain: arch/x86/kernel/unwind_orc.c, kernel/jump_label.c and lib/extable.c. Patch #5 removes size argument from the swap function because: 1) It wasn't used. 2) Custom swap function knows what kind of objects it swaps, so it already knows their sizes. v1->v2: Only commit messages have changed to better explain the purpose of commits. (Thanks to George Spelvin and Greg KH) v2->v3: Patch #5 now completely removes the size argument Andrey Abramov (5): arch/arc: unwind.c: replace swap function with built-in one powerpc: module_[32|64].c: replace swap function with built-in one ocfs2: dir,refcounttree,xattr: replace swap functions with built-in one ubifs: find.c: replace swap function with built-in one Lib: sort.h: remove the size argument from the swap function arch/arc/kernel/unwind.c| 20 ++-- arch/powerpc/kernel/module_32.c | 17 + arch/powerpc/kernel/module_64.c | 17 + arch/x86/kernel/unwind_orc.c| 2 +- fs/ocfs2/dir.c | 13 + fs/ocfs2/refcounttree.c | 13 +++-- fs/ocfs2/xattr.c| 15 +++ fs/ubifs/find.c | 9 + include/linux/sort.h| 2 +- kernel/jump_label.c | 2 +- lib/extable.c | 2 +- lib/sort.c | 7 +++ 12 files changed, 19 insertions(+), 100 deletions(-) -- 2.21.0
Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions
Le 02/04/2019 à 18:14, Andrey Ryabinin a écrit : On 4/2/19 12:43 PM, Christophe Leroy wrote: Hi Dmitry, Andrey and others, Do you have any comments to this series ? I don't see justification for adding all these non-instrumented functions. We need only some subset of these functions and only on powerpc so far. Arches that don't use str*() that early simply doesn't need not-instrumented __str*() variant. Also I don't think that auto-replace str* to __str* for all not instrumented files is a good idea, as this will reduce KASAN coverage. E.g. we don't instrument slub.c but there is no reason to use non-instrumented __str*() functions there. Ok, I didn't see it that way. In fact I was seeing the opposite and was considering it as an opportunity to increase KASAN coverage. E.g.: at the time being things like the above (from arch/xtensa/include/asm/string.h) are not covered at all I believe: #define __HAVE_ARCH_STRCPY static inline char *strcpy(char *__dest, const char *__src) { register char *__xdest = __dest; unsigned long __dummy; __asm__ __volatile__("1:\n\t" "l8ui %2, %1, 0\n\t" "s8i %2, %0, 0\n\t" "addi %1, %1, 1\n\t" "addi %0, %0, 1\n\t" "bnez %2, 1b\n\t" : "=r" (__dest), "=r" (__src), "=&r" (__dummy) : "0" (__dest), "1" (__src) : "memory"); return __xdest; } In my series, I have deactivated optimised string functions when KASAN is selected like arm64 do. See https://patchwork.ozlabs.org/patch/1055780/ But not every arch does that, meaning that some string functions remains not instrumented at all. Also, I was seeing it as a way to reduce impact on performance with KASAN. Because instrumenting each byte access of the non-optimised string functions is a performance genocide. And finally, this series make bug reporting slightly worse. E.g. let's look at strcpy(): +char *strcpy(char *dest, const char *src) +{ + size_t len = __strlen(src) + 1; + + check_memory_region((unsigned long)src, len, false, _RET_IP_); + check_memory_region((unsigned long)dest, len, true, _RET_IP_); + + return __strcpy(dest, src); +} If src is not-null terminated string we might not see proper out-of-bounds report from KASAN only a crash in __strlen(). Which might make harder to identify where 'src' comes from, where it was allocated and what's the size of allocated area. I'd like to know if this approach is ok or if it is better to keep doing as in https://patchwork.ozlabs.org/patch/1055788/ I think the patch from link is a better solution to the problem. Ok, I'll stick with it then. Thanks for your feedback Christophe
Re: [PATCH] ASoC: fsl_esai: Support synchronous mode
> > On Mon, Apr 01, 2019 at 11:39:10AM +, S.j. Wang wrote: > > > In ESAI synchronous mode, the clock is generated by Tx, So we should > > > always set registers of Tx which relate with the bit clock and frame > > > clock generation (TCCR, TCR, ECR), even there is only Rx is working. > > > > > > Signed-off-by: Shengjiu Wang > > > --- > > > sound/soc/fsl/fsl_esai.c | 28 +++- > > > 1 file changed, 27 insertions(+), 1 deletion(-) > > > > > > diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index > > > 3623aa9a6f2e..d9fcddd55c02 100644 > > > --- a/sound/soc/fsl/fsl_esai.c > > > +++ b/sound/soc/fsl/fsl_esai.c > > > @@ -230,6 +230,21 @@ static int fsl_esai_set_dai_sysclk(struct > > snd_soc_dai *dai, int clk_id, > > > return -EINVAL; > > > } > > > > > > + if (esai_priv->synchronous && !tx) { > > > + switch (clk_id) { > > > + case ESAI_HCKR_FSYS: > > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_FSYS, > > > + freq, dir); > > > + break; > > > + case ESAI_HCKR_EXTAL: > > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_EXTAL, > > > + freq, dir); > > > > Not sure why you call set_dai_sysclk inside set_dai_sysclk again. It feels > > very > > confusing to do so, especially without a comments. > > For sync mode, only RX is enabled, the register of tx should be set, so call > the > Set_dai_sysclk again. Yea, I understood that. But why not just replace RX with TX on the register-writing level? Do we need to set both TCCR and RCCR? Your change in hw_params() only sets TCCR inside fsl_esai_set_bclk(), so we probably only need to change TCCR for recordings running in sync mode, right? >From the commit message, it feels like that only the clock-related fields in the TX registers need to be set. Things like calculation and setting the direction of HCKx pin don't need to run again. > > > @@ -537,10 +552,21 @@ static int fsl_esai_hw_params(struct > > > snd_pcm_substream *substream, > > > > > > bclk = params_rate(params) * slot_width * esai_priv->slots; > > > > > > - ret = fsl_esai_set_bclk(dai, tx, bclk); > > > + ret = fsl_esai_set_bclk(dai, esai_priv->synchronous ? true : tx, > > > +bclk); > > > if (ret) > > > return ret; > > > > > > + if (esai_priv->synchronous && !tx) { > > > + /* Use Normal mode to support monaural audio */ > > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, > > > +ESAI_xCR_xMOD_MASK, > > params_channels(params) > 1 ? > > > +ESAI_xCR_xMOD_NETWORK : 0); > > > + > > > + mask = ESAI_xCR_xSWS_MASK | ESAI_xCR_PADC; > > > + val = ESAI_xCR_xSWS(slot_width, width) | ESAI_xCR_PADC; > > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, > > mask, val); > > > + } > > > > Does synchronous mode require to set both TCR and RCR? or just TCR? > Both TCR and RCR. RCR will be set in normal flow. OK. Settings both xCRs makes sense. Would you please try this: === @@ -537,14 +552,20 @@ static int fsl_esai_hw_params(struct snd_pcm_substream *substream, bclk = params_rate(params) * slot_width * esai_priv->slots; - ret = fsl_esai_set_bclk(dai, tx, bclk); + /* Synchronous mode uses TX clock generator */ + ret = fsl_esai_set_bclk(dai, esai_priv->synchronous || tx, bclk); if (ret) return ret; + mask = ESAI_xCR_xMOD_MASK | ESAI_xCR_xSWS_MASK; + val = ESAI_xCR_xSWS(slot_width, width); /* Use Normal mode to support monaural audio */ - regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), - ESAI_xCR_xMOD_MASK, params_channels(params) > 1 ? - ESAI_xCR_xMOD_NETWORK : 0); + val |= params_channels(params) > 1 ? ESAI_xCR_xMOD_NETWORK : 0; + + regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val); + /* Recording in synchronous mode needs to set TCR also */ + if (!tx && esai_priv->synchronous) + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, mask, val); regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), ESAI_xFCR_xFR_MASK, ESAI_xFCR_xFR); @@ -556,10 +577,10 @@ static int fsl_esai_hw_params(struct snd_pcm_substream *substream, regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), mask, val); - mask = ESAI_xCR_xSWS_MASK | (tx ? ESAI_xCR_PADC : 0); - val = ESAI_xCR_xSWS(slot_width, width) | (tx ? ESAI_xCR_PADC : 0); - - regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val); + /* Only TCR has padding bit and needs to be set for synchronous mode */ + if (tx || esai_priv->synchronous) + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, + ESAI_xCR_PADC, ESAI_xCR_PA
Re: [PATCH 2/5] powerpc: module_[32|64].c: replace swap function with built-in one
01.04.2019, 13:11, "Michael Ellerman" : > This looks OK. It's a bit of a pity to replace the 8-byte-at-a-time copy > with a byte-at-a-time copy, but I suspect it's insignificant compared to > the overhead of calling the comparison and swap functions. > > And we could always add a generic 8-byte-at-a-time swap function if it's > a bottleneck. I am sorry, I forgot to quickly comment on your letter. Now (after George Spelvin's patches) the generic swap is able to use u64 or u32 if the alignment and size are divisible by 4 or 8, so we lose nothing here.
Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
Le 02/04/2019 à 20:31, Christophe Leroy a écrit : Le 02/04/2019 à 16:34, Aneesh Kumar K.V a écrit : Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/book3s/64/mmu.h | 48 +++ arch/powerpc/include/asm/book3s/64/slice.h | 6 +-- arch/powerpc/kernel/paca.c | 9 ++-- arch/powerpc/kernel/setup-common.c | 7 ++- arch/powerpc/mm/hash_utils_64.c | 10 ++-- arch/powerpc/mm/mmu_context_book3s64.c | 16 ++- arch/powerpc/mm/slb.c | 2 +- arch/powerpc/mm/slice.c | 48 +-- arch/powerpc/mm/subpage-prot.c | 8 ++-- 10 files changed, 91 insertions(+), 65 deletions(-) [...] @@ -253,7 +253,7 @@ static void slice_convert(struct mm_struct *mm, */ spin_lock_irqsave(&slice_convert_lock, flags); - lpsizes = mm->context.low_slices_psize; + lpsizes = mm->context.hash_context->low_slices_psize; A help to get ->low_slices_psize would help, something like: In nohash/32/mmu-8xx: unsigned char *slice_low_slices_psize(context_t *ctx) { return mm->context.low_slices_psize; Of course here I meant: unsigned char *slice_low_slices_psize(mm_context_t *ctx) { return ctx->low_slices_psize; } } And in book3s/64/mmu.h: unsigned char *slice_low_slices_psize(context_t *ctx) { return mm->context.hash_context->low_slices_psize; and unsigned char *slice_low_slices_psize(mm_context_t *ctx) { return ctx->hash_context->low_slices_psize; } Christophe
Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
Le 02/04/2019 à 17:42, Aneesh Kumar K.V a écrit : On 4/2/19 9:06 PM, Christophe Leroy wrote: Le 02/04/2019 à 16:34, Aneesh Kumar K.V a écrit : Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e Did you have a look at my series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ? It tries to reduce as much as feasible the #ifdefs and stuff. Not yet. But a cursory look tell me introducing hash_mm_context complicates this further unless I introduce something similar for nohash 32? Are you ok with that? Have a look at my review in the other mail, I think we can limit the changes and avoid introducing the hash_mm_context for 8xx. Otherwise, we should call it something else, for instance extended_mm_context, but that looks unnecessary from my point of view. Christophe
Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
Le 02/04/2019 à 16:34, Aneesh Kumar K.V a écrit : Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/book3s/64/mmu.h | 48 +++ arch/powerpc/include/asm/book3s/64/slice.h| 6 +-- arch/powerpc/kernel/paca.c| 9 ++-- arch/powerpc/kernel/setup-common.c| 7 ++- arch/powerpc/mm/hash_utils_64.c | 10 ++-- arch/powerpc/mm/mmu_context_book3s64.c| 16 ++- arch/powerpc/mm/slb.c | 2 +- arch/powerpc/mm/slice.c | 48 +-- arch/powerpc/mm/subpage-prot.c| 8 ++-- 10 files changed, 91 insertions(+), 65 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index a28a28079edb..d801be977623 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -657,7 +657,7 @@ extern void slb_set_size(u16 size); /* 4 bits per slice and we have one slice per 1TB */ #define SLICE_ARRAY_SIZE (H_PGTABLE_RANGE >> 41) -#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41) +#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 41) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index a809bdd77322..07e76e304a3b 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -114,6 +114,33 @@ struct slice_mask { DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); }; +struct hash_mm_context { + + u16 user_psize; /* page size index */ Could we keep that in mm_context_t ? + +#ifdef CONFIG_PPC_MM_SLICES CONFIG_PPC_MM_SLICES is always selected on book3s64 so this #ifdef is useless. + /* SLB page size encodings*/ + unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; + unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; + unsigned long slb_addr_limit; Could we keep slb_addr_limit in mm_context_t too ? +#ifdef CONFIG_PPC_64K_PAGES + struct slice_mask mask_64k; +#endif + struct slice_mask mask_4k; +#ifdef CONFIG_HUGETLB_PAGE + struct slice_mask mask_16m; + struct slice_mask mask_16g; +#endif +#else + u16 sllp; /* SLB page size encoding */ This can get away as CONFIG_PPC_MM_SLICES is always set. +#endif + +#ifdef CONFIG_PPC_SUBPAGE_PROT + struct subpage_prot_table spt; +#endif /* CONFIG_PPC_SUBPAGE_PROT */ + +}; + typedef struct { union { /* @@ -127,7 +154,6 @@ typedef struct { mm_context_id_t id; mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE]; }; - u16 user_psize; /* page size index */ /* Number of bits in the mm_cpumask */ atomic_t active_cpus; @@ -137,27 +163,9 @@ typedef struct { /* NPU NMMU context */ struct npu_context *npu_context; + struct hash_mm_context *hash_context; -#ifdef CONFIG_PPC_MM_SLICES -/* SLB page size encodings*/ - unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; - unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; - unsigned long slb_addr_limit; -# ifdef CONFIG_PPC_64K_PAGES - struct slice_mask mask_64k; -# endif - struct slice_mask mask_4k; -# ifdef CONFIG_HUGETLB_PAGE - struct slice_mask mask_16m; - struct slice_mask mask_16g; -# endif -#else - u16 sllp; /* SLB page size encoding */ -#endif unsigned long vdso_base; -#ifdef CONFIG_PPC_SUBPAGE_PROT - struct subpage_prot_table spt; -#endif /* CONFIG_PPC_SUBPAGE_PROT */ /* * pagetable fragment support */ diff --git a/arch/powerpc/include/asm/book3s/64/slice.h b/arch/powerpc/include/asm/book3s/64/slice.h index db0dedab65ee..3ca1bebe258e 100644 --- a/arch/powerpc/include/asm/book3s/64/slice.h +++ b/arch/powerpc/include/asm/book3s/64/slice.h @@ -15,11 +15,11 @@ #else /* CONFIG_PPC_MM_SLICES */ That never happens since book3s/64 always selects CONFIG_PPC_MM_SLICES -#define get_slice_psize(mm, addr) ((mm)->context.user_psize) +#define get_slice_psize(mm, addr) ((mm)->context.hash_context->user_psize) #define slice_set_user_psize(mm, psize) \ do { \ -
[PATCH 4/4] powerpc: Add support to initialize ima policy rules
From: Nayna Jain PowerNV secure boot relies on the kernel IMA security subsystem to perform the OS kernel image signature verification. Since each secure boot mode has different IMA policy requirements, dynamic definition of the policy rules based on the runtime secure boot mode of the system is required. On systems that support secure boot, but have it disabled, only measurement policy rules of the kernel image and modules are defined. This patch defines the arch-specific implementation to retrieve the secure boot mode of the system and accordingly configures the IMA policy rules. This patch will provide arch-specific IMA policies if PPC_SECURE_BOOT config is enabled. Signed-off-by: Nayna Jain --- arch/powerpc/Kconfig | 12 arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/ima_arch.c | 54 ++ include/linux/ima.h| 3 +- 4 files changed, 69 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/ima_arch.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2d0be82c3061..e0ba9a9114b3 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -901,6 +901,18 @@ config PPC_MEM_KEYS If unsure, say y. +config PPC_SECURE_BOOT + prompt "Enable PowerPC Secure Boot" + bool + default n + depends on IMA + depends on IMA_ARCH_POLICY + help + Linux on POWER with firmware secure boot enabled needs to define + security policies to extend secure boot to the OS. + This config allows user to enable OS Secure Boot on PowerPC systems + that have firmware secure boot support. + endmenu config ISA_DMA_API diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index cddadccf551d..0f08ed7dfd1b 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -119,6 +119,7 @@ ifdef CONFIG_IMA obj-y += ima_kexec.o endif endif +obj-$(CONFIG_IMA) += ima_arch.o obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c new file mode 100644 index ..871b321656fb --- /dev/null +++ b/arch/powerpc/kernel/ima_arch.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + * ima_arch.c + * - initialize ima policies for PowerPC Secure Boot + */ + +#include +#include + +bool arch_ima_get_secureboot(void) +{ + bool sb_mode; + + sb_mode = get_powerpc_sb_mode(); + if (sb_mode) + return true; + else + return false; +} + +/* + * File signature verification is not needed, include only measurements + */ +static const char *const default_arch_rules[] = { + "measure func=KEXEC_KERNEL_CHECK", + "measure func=MODULE_CHECK", + NULL +}; + +/* Both file signature verification and measurements are needed */ +static const char *const sb_arch_rules[] = { + "measure func=KEXEC_KERNEL_CHECK", + "measure func=MODULE_CHECK", + "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig", +#if !IS_ENABLED(CONFIG_MODULE_SIG) + "appraise func=MODULE_CHECK appraise_type=imasig", +#endif + NULL +}; + +/* + * On PowerPC, file measurements are to be added to the IMA measurement list + * irrespective of the secure boot state of the system. Signature verification + * is conditionally enabled based on the secure boot state. + */ +const char *const *arch_get_ima_policy(void) +{ + if (IS_ENABLED(CONFIG_IMA_ARCH_POLICY) && arch_ima_get_secureboot()) + return sb_arch_rules; + return default_arch_rules; +} diff --git a/include/linux/ima.h b/include/linux/ima.h index dc12fbcf484c..32f46d69ebd7 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -31,7 +31,8 @@ extern void ima_post_path_mknod(struct dentry *dentry); extern void ima_add_kexec_buffer(struct kimage *image); #endif -#if defined(CONFIG_X86) && defined(CONFIG_EFI) +#if defined(CONFIG_X86) && defined(CONFIG_EFI) \ + || defined(CONFIG_PPC_SECURE_BOOT) extern bool arch_ima_get_secureboot(void); extern const char * const *arch_get_ima_policy(void); #else -- 2.20.1
[PATCH 3/4] powerpc/powernv: Detect the secure boot mode of the system
From: Nayna Jain PowerNV secure boot defines different IMA policies based on the secure boot state of the system. This patch defines a function to detect the secure boot state of the system. Signed-off-by: Nayna Jain --- arch/powerpc/include/asm/secboot.h | 21 + arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/secboot.c | 54 3 files changed, 76 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/secboot.h create mode 100644 arch/powerpc/platforms/powernv/secboot.c diff --git a/arch/powerpc/include/asm/secboot.h b/arch/powerpc/include/asm/secboot.h new file mode 100644 index ..1904fb4a3352 --- /dev/null +++ b/arch/powerpc/include/asm/secboot.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * PowerPC secure boot definitions + * + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + */ +#ifndef POWERPC_SECBOOT_H +#define POWERPC_SECBOOT_H + +#if defined(CONFIG_OPAL_SECVAR) +extern bool get_powerpc_sb_mode(void); +#else +static inline bool get_powerpc_sb_mode(void) +{ + return false; +} +#endif + +#endif diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 1511d836fd19..a36e22f8ecf8 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -16,4 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o obj-$(CONFIG_OCXL_BASE)+= ocxl.o -obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o +obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o secboot.o diff --git a/arch/powerpc/platforms/powernv/secboot.c b/arch/powerpc/platforms/powernv/secboot.c new file mode 100644 index ..afb1552636c5 --- /dev/null +++ b/arch/powerpc/platforms/powernv/secboot.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 IBM Corporation + * Author: Nayna Jain + * + * secboot.c + * - util functions to get powerpc secboot state + * + */ +#include +#include + +bool get_powerpc_sb_mode(void) +{ + efi_char16_t efi_SecureBoot_name[] = L"SecureBoot"; + efi_char16_t efi_SetupMode_name[] = L"SetupMode"; + efi_guid_t efi_variable_guid = EFI_GLOBAL_VARIABLE_GUID; + efi_status_t status; + u8 secboot, setupmode; + unsigned long size = sizeof(secboot); + + status = efi.get_variable(efi_SecureBoot_name, &efi_variable_guid, + NULL, &size, &secboot); + + /* +* For now assume all failures reading the SecureBoot variable implies +* secure boot is not enabled. Later differentiate failure types. +*/ + if (status != EFI_SUCCESS) { + secboot = 0; + setupmode = 0; + goto out; + } + + size = sizeof(setupmode); + status = efi.get_variable(efi_SetupMode_name, &efi_variable_guid, + NULL, &size, &setupmode); + + /* +* Failure to read the SetupMode variable does not prevent +* secure boot mode +*/ + if (status != EFI_SUCCESS) + setupmode = 0; + +out: + if ((secboot == 0) || (setupmode == 1)) { + pr_info("ima: secureboot mode disabled\n"); + return false; + } + + pr_info("ima: secureboot mode enabled\n"); + return true; +} -- 2.20.1
[PATCH 2/4] powerpc/powernv: Add support for OPAL secure variables
The X.509 certificates trusted by the platform and other information required to secure boot the host OS kernel are wrapped in secure variables, which are controlled by OPAL. The OPAL secure variables can be handled through the following OPAL calls. OPAL_SECVAR_GET: Returns the data for a given secure variable name and vendor GUID. OPAL_SECVAR_GET_NEXT: For a given secure variable, it returns the name and vendor GUID of the next variable. OPAL_SECVAR_ENQUEUE: Enqueue the supplied secure variable update so that it can be processed by OPAL in the next boot. Variable updates cannot be be processed right away because the variable storage is write locked at runtime. OPAL_SECVAR_INFO: Returns size information about the variable. This patch adds support for OPAL secure variables by setting up the EFI runtime variable services to make OPAL calls. This patch also introduces CONFIG_OPAL_SECVAR for enabling the OPAL secure variables support in the kernel. Since CONFIG_OPAL_SECVAR selects CONFIG_EFI, it also allow us to manage the OPAL secure variables from userspace via efivarfs. Signed-off-by: Claudio Carvalho --- This patch depends on new OPAL calls that are being added to skiboot. The patch set that implements the new calls has been posted to https://patchwork.ozlabs.org/project/skiboot/list/?series=99805 --- arch/powerpc/include/asm/opal-api.h | 6 +- arch/powerpc/include/asm/opal.h | 10 ++ arch/powerpc/platforms/Kconfig | 3 + arch/powerpc/platforms/powernv/Kconfig | 9 + arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/opal-call.c | 4 + arch/powerpc/platforms/powernv/opal-secvar.c | 179 +++ 7 files changed, 211 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 870fb7b239ea..d3066f29cb7a 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -210,7 +210,11 @@ #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165 #defineOPAL_NX_COPROC_INIT 167 -#define OPAL_LAST 167 +#define OPAL_SECVAR_GET170 +#define OPAL_SECVAR_GET_NEXT 171 +#define OPAL_SECVAR_ENQUEUE172 +#define OPAL_SECVAR_INFO 173 +#define OPAL_LAST 173 #define QUIESCE_HOLD 1 /* Spin all calls at entry */ #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index a55b01c90bb1..fdfd8dd7b326 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -385,6 +385,16 @@ void opal_powercap_init(void); void opal_psr_init(void); void opal_sensor_groups_init(void); +extern int opal_secvar_get(uint64_t name, uint64_t vendor, uint64_t attr, + uint64_t data_size, uint64_t data); +extern int opal_secvar_get_next(uint64_t name_size, uint64_t name, + uint64_t vendor); +extern int opal_secvar_enqueue(uint64_t name, uint64_t vendor, uint64_t attr, + uint64_t data_size, uint64_t data); +extern int opal_secvar_info(uint64_t attr, uint64_t storage_space, + uint64_t remaining_space, + uint64_t max_variable_size); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_OPAL_H */ diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index f3fb79fccc72..8e30510bc0c1 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -326,4 +326,7 @@ config XILINX_PCI bool "Xilinx PCI host bridge support" depends on PCI && XILINX_VIRTEX +config EFI + bool + endmenu diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig index 850eee860cf2..879f8e766098 100644 --- a/arch/powerpc/platforms/powernv/Kconfig +++ b/arch/powerpc/platforms/powernv/Kconfig @@ -47,3 +47,12 @@ config PPC_VAS VAS adapters are found in POWER9 based systems. If unsure, say N. + +config OPAL_SECVAR + bool "OPAL Secure Variables" + depends on PPC_POWERNV && !CPU_BIG_ENDIAN + select UCS2_STRING + select EFI + help + This enables the kernel to access OPAL secure variables via EFI + runtime variable services. diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index da2e99efbd04..1511d836fd19 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o obj-$(CONFIG_
[PATCH 1/4] powerpc/include: Override unneeded early ioremap functions
When CONFIG_EFI is enabled, the EFI driver includes the generic early_ioremap header, which assumes that architectures may want to provide their own early ioremap functions. This patch overrides the ioremap functions in powerpc because they are not required for secure boot on powerpc systems. Signed-off-by: Claudio Carvalho --- arch/powerpc/include/asm/early_ioremap.h | 41 1 file changed, 41 insertions(+) create mode 100644 arch/powerpc/include/asm/early_ioremap.h diff --git a/arch/powerpc/include/asm/early_ioremap.h b/arch/powerpc/include/asm/early_ioremap.h new file mode 100644 index ..a86a06e9f3b9 --- /dev/null +++ b/arch/powerpc/include/asm/early_ioremap.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Early ioremap definitions + * + * Copyright (C) 2019 IBM Corporation + * Author: Claudio Carvalho + * + */ +#ifndef _ASM_POWERPC_EARLY_IOREMAP_H +#define _ASM_POWERPC_EARLY_IOREMAP_H + +static inline void __iomem *early_ioremap(resource_size_t phys_addr, + unsigned long size) +{ + return NULL; +} + +static inline void *early_memremap(resource_size_t phys_addr, + unsigned long size) +{ + return NULL; +} + +static inline void *early_memremap_ro(resource_size_t phys_addr, + unsigned long size) +{ + return NULL; +} + +static inline void *early_memremap_prot(resource_size_t phys_addr, + unsigned long size, + unsigned long prot_val) +{ + return NULL; +} + +static inline void early_iounmap(void __iomem *addr, unsigned long size) { } +static inline void early_memunmap(void *addr, unsigned long size) { } +static inline void early_ioremap_shutdown(void) { } + +#endif -- 2.20.1
[PATCH 0/4] Enabling secure boot on PowerNV systems
This patch set is part of a series that implements secure boot on PowerNV systems. In order to verify the OS kernel on PowerNV, secure boot requires X.509 certificates trusted by the platform, the secure boot modes, and several other pieces of information. These are stored in secure variables controlled by OPAL, also known as OPAL secure variables. This patch set adds the following features: 1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can be used to manage the secure variables. 2. Add support for OPAL secure variables by overwriting the EFI hooks (get_variable, get_next_variable, set_variable and query_variable_info) with OPAL call wrappers. There is probably a better way to add this support, for example, we are investigating if we could register the efivar_operations rather than overwriting the EFI hooks. In this patch set, CONFIG_OPAL_SECVAR selects CONFIG_EFI. If, instead, we registered efivar_operations, CONFIG_EFIVAR_FS would need to depend on CONFIG_EFI|| CONFIG_OPAL_SECVAR. Comments or suggestions on the preferred technique would be greatly appreciated. 3. Define IMA arch-specific policies based on the secure boot state and mode of the system. On secure boot enabled powernv systems, the host OS kernel signature will be verified by IMA appraisal. Claudio Carvalho (2): powerpc/include: Override unneeded early ioremap functions powerpc/powernv: Add support for OPAL secure variables Nayna Jain (2): powerpc/powernv: Detect the secure boot mode of the system powerpc: Add support to initialize ima policy rules arch/powerpc/Kconfig | 12 ++ arch/powerpc/include/asm/early_ioremap.h | 41 + arch/powerpc/include/asm/opal-api.h | 6 +- arch/powerpc/include/asm/opal.h | 10 ++ arch/powerpc/include/asm/secboot.h | 21 +++ arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/ima_arch.c | 54 ++ arch/powerpc/platforms/Kconfig | 3 + arch/powerpc/platforms/powernv/Kconfig | 9 + arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/opal-call.c | 4 + arch/powerpc/platforms/powernv/opal-secvar.c | 179 +++ arch/powerpc/platforms/powernv/secboot.c | 54 ++ include/linux/ima.h | 3 +- 14 files changed, 396 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/include/asm/early_ioremap.h create mode 100644 arch/powerpc/include/asm/secboot.h create mode 100644 arch/powerpc/kernel/ima_arch.c create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c create mode 100644 arch/powerpc/platforms/powernv/secboot.c -- 2.20.1
Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions
On 4/2/19 12:43 PM, Christophe Leroy wrote: > Hi Dmitry, Andrey and others, > > Do you have any comments to this series ? > I don't see justification for adding all these non-instrumented functions. We need only some subset of these functions and only on powerpc so far. Arches that don't use str*() that early simply doesn't need not-instrumented __str*() variant. Also I don't think that auto-replace str* to __str* for all not instrumented files is a good idea, as this will reduce KASAN coverage. E.g. we don't instrument slub.c but there is no reason to use non-instrumented __str*() functions there. And finally, this series make bug reporting slightly worse. E.g. let's look at strcpy(): +char *strcpy(char *dest, const char *src) +{ + size_t len = __strlen(src) + 1; + + check_memory_region((unsigned long)src, len, false, _RET_IP_); + check_memory_region((unsigned long)dest, len, true, _RET_IP_); + + return __strcpy(dest, src); +} If src is not-null terminated string we might not see proper out-of-bounds report from KASAN only a crash in __strlen(). Which might make harder to identify where 'src' comes from, where it was allocated and what's the size of allocated area. > I'd like to know if this approach is ok or if it is better to keep doing as > in https://patchwork.ozlabs.org/patch/1055788/ > I think the patch from link is a better solution to the problem.
Re: [PATCH stable v4.14 00/32] powerpc spectre backports for 4.14
On Tue, Apr 02, 2019 at 03:21:09PM +, Diana Madalina Craciun wrote: > On 3/31/2019 12:53 PM, Michael Ellerman wrote: > > Greg KH writes: > >> On Fri, Mar 29, 2019 at 03:51:16PM +0100, Greg KH wrote: > >>> On Fri, Mar 29, 2019 at 10:25:48PM +1100, Michael Ellerman wrote: > -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Greg, Please queue > up these powerpc patches for 4.14 if you have no objections. > >>> Some of these also need to go to 4.19, right? Want me to add them > >>> there, or are you going to provide a backported series? > > Yes some of them do, but I wasn't sure if they'd go cleanly. > >> Nevermind, I've queued up the missing ones to 4.19.y, and one missing > >> one to 5.0.y. If I've missed anything, please let me know. > > Thanks. I'll check everything's working as expected. > > I have validated on NXP PowerPC and worked as expected on both kernel > 4.14 and kernel 4.19. Great, thanks for testing! greg k-h
Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
On 4/2/19 9:06 PM, Christophe Leroy wrote: Le 02/04/2019 à 16:34, Aneesh Kumar K.V a écrit : Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e Did you have a look at my series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ? It tries to reduce as much as feasible the #ifdefs and stuff. Not yet. But a cursory look tell me introducing hash_mm_context complicates this further unless I introduce something similar for nohash 32? Are you ok with that? -aneesh
Re: [RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
Le 02/04/2019 à 16:34, Aneesh Kumar K.V a écrit : Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e Did you have a look at my series https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170 ? It tries to reduce as much as feasible the #ifdefs and stuff. Christophe arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/book3s/64/mmu.h | 48 +++ arch/powerpc/include/asm/book3s/64/slice.h| 6 +-- arch/powerpc/kernel/paca.c| 9 ++-- arch/powerpc/kernel/setup-common.c| 7 ++- arch/powerpc/mm/hash_utils_64.c | 10 ++-- arch/powerpc/mm/mmu_context_book3s64.c| 16 ++- arch/powerpc/mm/slb.c | 2 +- arch/powerpc/mm/slice.c | 48 +-- arch/powerpc/mm/subpage-prot.c| 8 ++-- 10 files changed, 91 insertions(+), 65 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index a28a28079edb..d801be977623 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -657,7 +657,7 @@ extern void slb_set_size(u16 size); /* 4 bits per slice and we have one slice per 1TB */ #define SLICE_ARRAY_SIZE (H_PGTABLE_RANGE >> 41) -#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41) +#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 41) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index a809bdd77322..07e76e304a3b 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -114,6 +114,33 @@ struct slice_mask { DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); }; +struct hash_mm_context { + + u16 user_psize; /* page size index */ + +#ifdef CONFIG_PPC_MM_SLICES + /* SLB page size encodings*/ + unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; + unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; + unsigned long slb_addr_limit; +#ifdef CONFIG_PPC_64K_PAGES + struct slice_mask mask_64k; +#endif + struct slice_mask mask_4k; +#ifdef CONFIG_HUGETLB_PAGE + struct slice_mask mask_16m; + struct slice_mask mask_16g; +#endif +#else + u16 sllp; /* SLB page size encoding */ +#endif + +#ifdef CONFIG_PPC_SUBPAGE_PROT + struct subpage_prot_table spt; +#endif /* CONFIG_PPC_SUBPAGE_PROT */ + +}; + typedef struct { union { /* @@ -127,7 +154,6 @@ typedef struct { mm_context_id_t id; mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE]; }; - u16 user_psize; /* page size index */ /* Number of bits in the mm_cpumask */ atomic_t active_cpus; @@ -137,27 +163,9 @@ typedef struct { /* NPU NMMU context */ struct npu_context *npu_context; + struct hash_mm_context *hash_context; -#ifdef CONFIG_PPC_MM_SLICES -/* SLB page size encodings*/ - unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; - unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; - unsigned long slb_addr_limit; -# ifdef CONFIG_PPC_64K_PAGES - struct slice_mask mask_64k; -# endif - struct slice_mask mask_4k; -# ifdef CONFIG_HUGETLB_PAGE - struct slice_mask mask_16m; - struct slice_mask mask_16g; -# endif -#else - u16 sllp; /* SLB page size encoding */ -#endif unsigned long vdso_base; -#ifdef CONFIG_PPC_SUBPAGE_PROT - struct subpage_prot_table spt; -#endif /* CONFIG_PPC_SUBPAGE_PROT */ /* * pagetable fragment support */ diff --git a/arch/powerpc/include/asm/book3s/64/slice.h b/arch/powerpc/include/asm/book3s/64/slice.h index db0dedab65ee..3ca1bebe258e 100644 --- a/arch/powerpc/include/asm/book3s/64/slice.h +++ b/arch/powerpc/include/asm/book3s/64/slice.h @@ -15,11 +15,11 @@ #else /* CONFIG_PPC_MM_SLICES */ -#define get_slice_psize(mm, addr) ((mm)->context.user_psize) +#define get_slice_psize(mm, addr) ((mm)->context.hash_context->user_psize) #define slice_set_user_psize(mm, psize) \ do { \ - (mm)->context.user_psize = (psize); \ - (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \ + (mm)->co
Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On Tue 02-04-19 17:21:25, Aneesh Kumar K.V wrote: > With some architectures like ppc64, set_pmd_at() cannot cope with > a situation where there is already some (different) valid entry present. > > Use pmdp_set_access_flags() instead to modify the pfn which is built to > deal with modifying existing PMD entries. > > This is similar to > commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by > insert_pfn()") > > We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support > pud pfn entries now. > > Without this patch we also see the below message in kernel log > "BUG: non-zero pgtables_bytes on freeing mm:" > > CC: sta...@vger.kernel.org > Reported-by: Chandan Rajendra > Signed-off-by: Aneesh Kumar K.V Looks good to me. You can add: Reviewed-by: Jan Kara Honza > --- > Changes from v1: > * Fix the pgtable leak > > mm/huge_memory.c | 36 > 1 file changed, 36 insertions(+) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 404acdcd0455..165ea46bf149 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -755,6 +755,21 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, > unsigned long addr, > spinlock_t *ptl; > > ptl = pmd_lock(mm, pmd); > + if (!pmd_none(*pmd)) { > + if (write) { > + if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) { > + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd)); > + goto out_unlock; > + } > + entry = pmd_mkyoung(*pmd); > + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); > + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1)) > + update_mmu_cache_pmd(vma, addr, pmd); > + } > + > + goto out_unlock; > + } > + > entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); > if (pfn_t_devmap(pfn)) > entry = pmd_mkdevmap(entry); > @@ -766,11 +781,16 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, > unsigned long addr, > if (pgtable) { > pgtable_trans_huge_deposit(mm, pmd, pgtable); > mm_inc_nr_ptes(mm); > + pgtable = NULL; > } > > set_pmd_at(mm, addr, pmd, entry); > update_mmu_cache_pmd(vma, addr, pmd); > + > +out_unlock: > spin_unlock(ptl); > + if (pgtable) > + pte_free(mm, pgtable); > } > > vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, > @@ -821,6 +841,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, > unsigned long addr, > spinlock_t *ptl; > > ptl = pud_lock(mm, pud); > + if (!pud_none(*pud)) { > + if (write) { > + if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) { > + WARN_ON_ONCE(!is_huge_zero_pud(*pud)); > + goto out_unlock; > + } > + entry = pud_mkyoung(*pud); > + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); > + if (pudp_set_access_flags(vma, addr, pud, entry, 1)) > + update_mmu_cache_pud(vma, addr, pud); > + } > + goto out_unlock; > + } > + > entry = pud_mkhuge(pfn_t_pud(pfn, prot)); > if (pfn_t_devmap(pfn)) > entry = pud_mkdevmap(entry); > @@ -830,6 +864,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, > unsigned long addr, > } > set_pud_at(mm, addr, pud, entry); > update_mmu_cache_pud(vma, addr, pud); > + > +out_unlock: > spin_unlock(ptl); > } > > -- > 2.20.1 > -- Jan Kara SUSE Labs, CR
Re: [PATCH stable v4.14 00/32] powerpc spectre backports for 4.14
On 3/31/2019 12:53 PM, Michael Ellerman wrote: > Greg KH writes: >> On Fri, Mar 29, 2019 at 03:51:16PM +0100, Greg KH wrote: >>> On Fri, Mar 29, 2019 at 10:25:48PM +1100, Michael Ellerman wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Greg, Please queue up these powerpc patches for 4.14 if you have no objections. >>> Some of these also need to go to 4.19, right? Want me to add them >>> there, or are you going to provide a backported series? > Yes some of them do, but I wasn't sure if they'd go cleanly. >> Nevermind, I've queued up the missing ones to 4.19.y, and one missing >> one to 5.0.y. If I've missed anything, please let me know. > Thanks. I'll check everything's working as expected. I have validated on NXP PowerPC and worked as expected on both kernel 4.14 and kernel 4.19. Thanks, Diana
Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions
Le 02/04/2019 à 14:58, Dmitry Vyukov a écrit : On Tue, Apr 2, 2019 at 11:43 AM Christophe Leroy wrote: Hi Dmitry, Andrey and others, Do you have any comments to this series ? I'd like to know if this approach is ok or if it is better to keep doing as in https://patchwork.ozlabs.org/patch/1055788/ Hi Christophe, Forking every kernel function does not look like a scalable approach to me. There is not much special about str* functions. There is something a bit special about memset/memcpy as compiler emits them for struct set/copy. Could powerpc do the same as x86 and map some shadow early enough (before "prom")? Then we would not need anything of this? Sorry if we already discussed this, I am losing context quickly. Hi Dmitry, I'm afraid we can't map shadow ram that early. This code gets run by third party BIOS SW which manages the MMU and provides a 1:1 mapping, so there is no way we can map shadow memory. If you feel providing interceptors for the string functions is not a good idea, I'm ok with it, I'll keep the necessary string functions in prom_init.c I was proposing the interceptor's approach because behind the specific need for handling early prom_init code, I thought it was also a way to limit KASAN performance impact on string functions, and it was also a way to handle all the optimised string functions provided by architectures. In my series I have a patch that disables powerpc's optimised string functions (https://patchwork.ozlabs.org/patch/1055780/). The interceptor's approach was a way to avoid that. As far as I can see, at the time being the other arches don't disable their optimised string functions, meaning the KASAN checks are skipped. Thanks Christophe Thanks Christophe Le 28/03/2019 à 16:00, Christophe Leroy a écrit : In the same spirit as commit 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions"), this patch adds interceptors for string manipulation functions so that we can compile lib/string.o without kasan support hence allow the string functions to also be used from places where kasan has to be disabled. Signed-off-by: Christophe Leroy --- v2: Fixed a few checkpatch stuff and added missing EXPORT_SYMBOL() and missing #undefs include/linux/string.h | 79 ++ lib/Makefile | 2 + lib/string.c | 8 + mm/kasan/string.c | 394 + 4 files changed, 483 insertions(+) diff --git a/include/linux/string.h b/include/linux/string.h index 7927b875f80c..3d2aff2ed402 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -19,54 +19,117 @@ extern void *memdup_user_nul(const void __user *, size_t); */ #include +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) +/* + * For files that are not instrumented (e.g. mm/slub.c) we + * should use not instrumented version of mem* functions. + */ +#define memset16 __memset16 +#define memset32 __memset32 +#define memset64 __memset64 +#define memzero_explicit __memzero_explicit +#define strcpy __strcpy +#define strncpy __strncpy +#define strlcpy __strlcpy +#define strscpy __strscpy +#define strcat __strcat +#define strncat __strncat +#define strlcat __strlcat +#define strcmp __strcmp +#define strncmp __strncmp +#define strcasecmp __strcasecmp +#define strncasecmp __strncasecmp +#define strchr __strchr +#define strchrnul__strchrnul +#define strrchr __strrchr +#define strnchr __strnchr +#define skip_spaces __skip_spaces +#define strim__strim +#define strstr __strstr +#define strnstr __strnstr +#define strlen __strlen +#define strnlen __strnlen +#define strpbrk __strpbrk +#define strsep __strsep +#define strspn __strspn +#define strcspn __strcspn +#define memscan __memscan +#define memcmp __memcmp +#define memchr __memchr +#define memchr_inv __memchr_inv +#define strreplace __strreplace + +#ifndef __NO_FORTIFY +#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */ +#endif + +#endif + #ifndef __HAVE_ARCH_STRCPY extern char * strcpy(char *,const char *); +char *__strcpy(char *, const char *); #endif #ifndef __HAVE_ARCH_STRNCPY extern char * strncpy(char *,const char *, __kernel_size_t); +char *__strncpy(char *, const char *, __kernel_size_t); #endif #ifndef __HAVE_ARCH_STRLCPY size_t strlcpy(char *, const char *, size_t); +size_t __strlcpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRSCPY ssize_t strscpy(char *, const char *, size_t); +ssize_t __strscpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRCAT extern char * strcat(char *, c
[RFC PATCH] powerpc/mm: Reduce memory usage for mm_context_t for radix
Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask, subpage protection details. We can skip allocating those on radix. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V --- NOTE: If we want to do this, I am still trying to figure out how best we can do this without all the #ifdef and other overhead for 8xx book3e arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/include/asm/book3s/64/mmu.h | 48 +++ arch/powerpc/include/asm/book3s/64/slice.h| 6 +-- arch/powerpc/kernel/paca.c| 9 ++-- arch/powerpc/kernel/setup-common.c| 7 ++- arch/powerpc/mm/hash_utils_64.c | 10 ++-- arch/powerpc/mm/mmu_context_book3s64.c| 16 ++- arch/powerpc/mm/slb.c | 2 +- arch/powerpc/mm/slice.c | 48 +-- arch/powerpc/mm/subpage-prot.c| 8 ++-- 10 files changed, 91 insertions(+), 65 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index a28a28079edb..d801be977623 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -657,7 +657,7 @@ extern void slb_set_size(u16 size); /* 4 bits per slice and we have one slice per 1TB */ #define SLICE_ARRAY_SIZE (H_PGTABLE_RANGE >> 41) -#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.slb_addr_limit >> 41) +#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.hash_context->slb_addr_limit >> 41) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index a809bdd77322..07e76e304a3b 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -114,6 +114,33 @@ struct slice_mask { DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); }; +struct hash_mm_context { + + u16 user_psize; /* page size index */ + +#ifdef CONFIG_PPC_MM_SLICES + /* SLB page size encodings*/ + unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; + unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; + unsigned long slb_addr_limit; +#ifdef CONFIG_PPC_64K_PAGES + struct slice_mask mask_64k; +#endif + struct slice_mask mask_4k; +#ifdef CONFIG_HUGETLB_PAGE + struct slice_mask mask_16m; + struct slice_mask mask_16g; +#endif +#else + u16 sllp; /* SLB page size encoding */ +#endif + +#ifdef CONFIG_PPC_SUBPAGE_PROT + struct subpage_prot_table spt; +#endif /* CONFIG_PPC_SUBPAGE_PROT */ + +}; + typedef struct { union { /* @@ -127,7 +154,6 @@ typedef struct { mm_context_id_t id; mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE]; }; - u16 user_psize; /* page size index */ /* Number of bits in the mm_cpumask */ atomic_t active_cpus; @@ -137,27 +163,9 @@ typedef struct { /* NPU NMMU context */ struct npu_context *npu_context; + struct hash_mm_context *hash_context; -#ifdef CONFIG_PPC_MM_SLICES -/* SLB page size encodings*/ - unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; - unsigned char high_slices_psize[SLICE_ARRAY_SIZE]; - unsigned long slb_addr_limit; -# ifdef CONFIG_PPC_64K_PAGES - struct slice_mask mask_64k; -# endif - struct slice_mask mask_4k; -# ifdef CONFIG_HUGETLB_PAGE - struct slice_mask mask_16m; - struct slice_mask mask_16g; -# endif -#else - u16 sllp; /* SLB page size encoding */ -#endif unsigned long vdso_base; -#ifdef CONFIG_PPC_SUBPAGE_PROT - struct subpage_prot_table spt; -#endif /* CONFIG_PPC_SUBPAGE_PROT */ /* * pagetable fragment support */ diff --git a/arch/powerpc/include/asm/book3s/64/slice.h b/arch/powerpc/include/asm/book3s/64/slice.h index db0dedab65ee..3ca1bebe258e 100644 --- a/arch/powerpc/include/asm/book3s/64/slice.h +++ b/arch/powerpc/include/asm/book3s/64/slice.h @@ -15,11 +15,11 @@ #else /* CONFIG_PPC_MM_SLICES */ -#define get_slice_psize(mm, addr) ((mm)->context.user_psize) +#define get_slice_psize(mm, addr) ((mm)->context.hash_context->user_psize) #define slice_set_user_psize(mm, psize)\ do { \ - (mm)->context.user_psize = (psize); \ - (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \ + (mm)->context.hash_context->user_psize = (psize); \ + (mm)->context.hash_context->sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \ } while (0) #endif /* CONFIG_PPC_MM_SLICES */ diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel
[PATCH 3.16 92/99] block/swim3: Fix -EBUSY error when re-opening device after unmount
3.16.65-rc1 review patch. If anyone has any objections, please let me know. -- From: Finn Thain commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 upstream. When the block device is opened with FMODE_EXCL, ref_count is set to -1. This value doesn't get reset when the device is closed which means the device cannot be opened again. Fix this by checking for refcount <= 0 in the release method. Reported-and-tested-by: Stan Johnson Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain Signed-off-by: Jens Axboe Signed-off-by: Ben Hutchings --- drivers/block/swim3.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/drivers/block/swim3.c +++ b/drivers/block/swim3.c @@ -1027,7 +1027,11 @@ static void floppy_release(struct gendis struct swim3 __iomem *sw = fs->swim3; mutex_lock(&swim3_mutex); - if (fs->ref_count > 0 && --fs->ref_count == 0) { + if (fs->ref_count > 0) + --fs->ref_count; + else if (fs->ref_count == -1) + fs->ref_count = 0; + if (fs->ref_count == 0) { swim3_action(fs, MOTOR_OFF); out_8(&sw->control_bic, 0xff); swim3_select(fs, RELAX);
VLC doesn't play videos anymore since the PowerPC fixes 5.1-3
Hi All, I figured out, that the VLC player doesn't play videos anymore since the PowerPC fixes 5.1-3 [1]. VLC plays videos with the RC1 of kernel 5.1 without any problems. VLC error messages: [100ea580] ts demux warning: first packet for pid=1104 cc=0xe [100ea580] ts demux warning: first packet for pid=1102 cc=0x4 [100ea580] ts demux warning: first packet for pid=1101 cc=0x8 [10109218] core decoder warning: can't get output picture [10109218] avcodec decoder warning: disabling direct rendering [10109218] core decoder warning: can't get output picture dmesg: https://bugs.freedesktop.org/attachment.cgi?id=143840 I created a bug report because of the VLC issue with the kernel 5.1-rc2 and higher today [2]. I got an answer from Michel Dänzer today. Quote Michel Dänzer: None of them directly affect the radeon driver. It's quite likely that this is a PPC specific issue. Your best bet is bisecting between rc1 and rc2. I haven't seen any other similar reports. I was able to remove the PowerPC fixes 5.1-4 and 5.1-3 with the following commands: git revert 6536c5f2c8cf79db0d37e79afcdb227dc854509c -m 1 Output: [master 4b4a8cf] Revert "Merge tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kern ... erpc/linux" git revert a5ed1e96cafde5ba48638f486bfca0685dc6ddc9 -m 1 Output: [master 0c70b7b] Revert "Merge tag 'powerpc-5.1-3' of git://git.kernel.org/pub/scm/linux/kern ... erpc/linux" The removing of the PowerPC fixes 5.1-4 and 5.1-3 has solved the VLC issue. The problematic code is definitely in the PowerPC fixes 5.1-3 [1]. Please check the PowerPC fixes 5.1-3 [1]. Thanks, Christian [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.1-rc2&id=a5ed1e96cafde5ba48638f486bfca0685dc6ddc9 [2] https://bugs.freedesktop.org/show_bug.cgi?id=110304
Re: powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations
Michael Ellerman writes: > Ben Hutchings writes: >> On Mon, 2019-03-25 at 01:03 +0100, Andreas Schwab wrote: >>> On Mär 24 2019, Ben Hutchings wrote: >>> >>> > Presumably you have CONFIG_PPC_BOOK3S_64 enabled and >>> > CONFIG_SPARSEMEM >>> > disabled? Was this configuration actually usable? >>> >>> Why not? >> >> I assume that CONFIG_SPARSEMEM is the default for a good reason. >> What I don't know is how strong that reason is (I am not a Power expert >> at all). Looking a bit further, it seems to be related to CONFIG_NUMA >> in that you can enable CONFIG_FLATMEM if and only if that's disabled. >> So I suppose the configuration you used works for non-NUMA systems. > > Aneesh pointed out this fix would break FLATMEM after I'd merged it, but > it didn't break any of our defconfigs so I wondered if anyone would > notice. > > I checked today and a G5 will boot with FLATMEM, which I assume is what > Andreas is using. > > I guess we should fix this build break for now. > > Even some G5's have discontiguous memory, so FLATMEM is not clearly a > good choice even for all G5's, and actually a fresh g5_defconfig uses > SPARSEMEM. > > So I'm inclined to just switch to always using SPARSEMEM on 64-bit > Book3S, because that's what's well tested and we hardly need more code > paths to test. Unless anyone has a strong objection, I haven't actually > benchmarked FLATMEM vs SPARSEMEM on a G5. > How about >From 207fb0036065d8db44853e63bb858c4fd9952106 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Mon, 1 Apr 2019 17:51:17 +0530 Subject: [PATCH] powerpc/mm: Fix build error The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs. We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM and 32 bit configs never expected a value to be set for MAX_PHYSMEM_BITS. Dependent code such as zsmalloc derived the right values based on other fields. Instead of finding a value that works with different configs, use new values only for book3s_64. For 64 bit booke, use the definition of MAX_PHYSMEM_BITS as per commit a7df61a0e2b6 ("[PATCH] ppc64: Increase sparsemem defaults") That change was done in 2005 and hopefully will work with book3e 64. Fixes: 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB") Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/mmu.h | 15 +++ arch/powerpc/include/asm/mmu.h | 15 --- arch/powerpc/include/asm/nohash/64/mmu.h | 2 ++ 3 files changed, 17 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index 1ceee000c18d..a809bdd77322 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -35,6 +35,21 @@ typedef pte_t *pgtable_t; #endif /* __ASSEMBLY__ */ +/* + * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS + * if we increase SECTIONS_WIDTH we will not store node details in page->flags and + * page_to_nid does a page->section->node lookup + * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce + * memory requirements with large number of sections. + * 51 bits is the max physical real address on POWER9 + */ +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \ + defined(CONFIG_PPC_64K_PAGES) +#define MAX_PHYSMEM_BITS 51 +#else +#define MAX_PHYSMEM_BITS 46 +#endif + /* 64-bit classic hash table MMU */ #include diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 598cdcdd1355..78d53c4396ac 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -341,21 +341,6 @@ static inline bool strict_kernel_rwx_enabled(void) */ #define MMU_PAGE_COUNT 16 -/* - * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS - * if we increase SECTIONS_WIDTH we will not store node details in page->flags and - * page_to_nid does a page->section->node lookup - * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce - * memory requirements with large number of sections. - * 51 bits is the max physical real address on POWER9 - */ -#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \ - defined (CONFIG_PPC_64K_PAGES) -#define MAX_PHYSMEM_BITS51 -#elif defined(CONFIG_SPARSEMEM) -#define MAX_PHYSMEM_BITS46 -#endif - #ifdef CONFIG_PPC_BOOK3S_64 #include #else /* CONFIG_PPC_BOOK3S_64 */ diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h index e6585480dfc4..81cf30c370e5 100644 --- a/arch/powerpc/include/asm/nohash/64/mmu.h +++ b/arch/powerpc/include/asm/nohash/64/mmu.h @@ -2,6 +2,8 @@ #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_ #define _ASM_POWERPC_NOHASH_64_MMU_H_ +#define MAX_PHYSMEM_BITS44 + /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */ #include -
[PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
With some architectures like ppc64, set_pmd_at() cannot cope with a situation where there is already some (different) valid entry present. Use pmdp_set_access_flags() instead to modify the pfn which is built to deal with modifying existing PMD entries. This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()") We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support pud pfn entries now. Without this patch we also see the below message in kernel log "BUG: non-zero pgtables_bytes on freeing mm:" CC: sta...@vger.kernel.org Reported-by: Chandan Rajendra Signed-off-by: Aneesh Kumar K.V --- Changes from v1: * Fix the pgtable leak mm/huge_memory.c | 36 1 file changed, 36 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 404acdcd0455..165ea46bf149 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -755,6 +755,21 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, spinlock_t *ptl; ptl = pmd_lock(mm, pmd); + if (!pmd_none(*pmd)) { + if (write) { + if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd)); + goto out_unlock; + } + entry = pmd_mkyoung(*pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1)) + update_mmu_cache_pmd(vma, addr, pmd); + } + + goto out_unlock; + } + entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); @@ -766,11 +781,16 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, if (pgtable) { pgtable_trans_huge_deposit(mm, pmd, pgtable); mm_inc_nr_ptes(mm); + pgtable = NULL; } set_pmd_at(mm, addr, pmd, entry); update_mmu_cache_pmd(vma, addr, pmd); + +out_unlock: spin_unlock(ptl); + if (pgtable) + pte_free(mm, pgtable); } vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, @@ -821,6 +841,20 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, spinlock_t *ptl; ptl = pud_lock(mm, pud); + if (!pud_none(*pud)) { + if (write) { + if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pud(*pud)); + goto out_unlock; + } + entry = pud_mkyoung(*pud); + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); + if (pudp_set_access_flags(vma, addr, pud, entry, 1)) + update_mmu_cache_pud(vma, addr, pud); + } + goto out_unlock; + } + entry = pud_mkhuge(pfn_t_pud(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pud_mkdevmap(entry); @@ -830,6 +864,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, } set_pud_at(mm, addr, pud, entry); update_mmu_cache_pud(vma, addr, pud); + +out_unlock: spin_unlock(ptl); } -- 2.20.1
[PATCH v2 3/3] powernv/mce: print additional information about mce error.
From: Mahesh Salgaonkar Print more information about mce error whether it is an hardware or software error. Some of the mce errors can be easily categorized as hardware or software errors e.g. UEs are due to hardware error, where as error triggered due to invalid usage of tlbie is a pure software bug. But not all the mce errors can be easily categorize into either software or hardware. There are errors like multihit errors which are usually result of a software bug, but in some rare cases a hardware failure can cause a multihit error. In past, we have seen case where after replacing faulty chip, multihit errors stopped occurring. Same with parity errors, which are usually due to faulty hardware but there are chances where multihit can also cause an parity error. Such errors are difficult to determine what really caused it. Hence this patch classifies mce errors into following four categorize: 1. Hardware error: UE and Link timeout failure errors. 2. Probable hardware error (some chance of software cause) SLB/ERAT/TLB Parity errors. 3. Software error Invalid tlbie form. 4. Probable software error (some chance of hardware cause) SLB/ERAT/TLB Multihit errors. Sample o/p: [ 1289.447571] MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 01001b6e0320 [Recovered] [ 1289.447615] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [7fffa309dc60] [ 1289.447634] MCE: CPU80: Probable Software error (some chance of hardware cause) Signed-off-by: Mahesh Salgaonkar --- Change in v2: - Rephrase the wording for error class as suggested by Michael. --- arch/powerpc/include/asm/mce.h | 10 arch/powerpc/kernel/mce.c | 12 arch/powerpc/kernel/mce_power.c | 107 +++ 3 files changed, 86 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index b1f4bf863c95..8741f4c21a1a 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -56,6 +56,14 @@ enum MCE_ErrorType { MCE_ERROR_TYPE_LINK = 7, }; +enum MCE_ErrorClass { + MCE_ECLASS_UNKNOWN = 0, + MCE_ECLASS_HARDWARE, + MCE_ECLASS_HARD_INDETERMINATE, + MCE_ECLASS_SOFTWARE, + MCE_ECLASS_SOFT_INDETERMINATE, +}; + enum MCE_UeErrorType { MCE_UE_ERROR_INDETERMINATE = 0, MCE_UE_ERROR_IFETCH = 1, @@ -115,6 +123,7 @@ struct machine_check_event { enum MCE_Severity severity:8; enum MCE_Initiator initiator:8; enum MCE_ErrorType error_type:8; + enum MCE_ErrorClass error_class:8; enum MCE_Dispositiondisposition:8; boolsync_error; u16 cpu; @@ -195,6 +204,7 @@ struct mce_error_info { } u; enum MCE_Severity severity:8; enum MCE_Initiator initiator:8; + enum MCE_ErrorClass error_class:8; boolsync_error; }; diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 0f961583bd51..1d978c3477a0 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -123,6 +123,7 @@ void save_mce_event(struct pt_regs *regs, long handled, mce->initiator = mce_err->initiator; mce->severity = mce_err->severity; mce->sync_error = mce_err->sync_error; + mce->error_class = mce_err->error_class; /* * Populate the mce error_type and type-specific error_type. @@ -361,6 +362,13 @@ void machine_check_print_event_info(struct machine_check_event *evt, "Store (timeout)", "Page table walk Load/Store (timeout)", }; + static const char *mc_error_class[] = { + "Unknown", + "Hardware error", + "Probable Hardware error (some chance of software cause)", + "Software error", + "Probable Software error (some chance of hardware cause)", + }; /* Print things out */ if (evt->version != MCE_V1) { @@ -478,6 +486,10 @@ void machine_check_print_event_info(struct machine_check_event *evt, printk("%sMCE: CPU%d: NIP: [%016llx] %pS\n", level, evt->cpu, evt->srr0, (void *)evt->srr0); } + + subtype = evt->error_class < ARRAY_SIZE(mc_error_class) ? + mc_error_class[evt->error_class] : "Unknown"; + printk("%sMCE: CPU%d: %s\n", level, evt->cpu, subtype); } EXPORT_SYMBOL_GPL(machine_check_print_event_info); diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index 606af87a4dda..3658af85e48a 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -131,6 +131,7 @@ struct mce_ierror_table { bool nip_valid; /* nip is a valid indicator of faulting address */ unsigned int error_type;
[PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
Using a jiffies timer creates a dependency on the tick_do_timer_cpu incrementing jiffies. If that CPU has locked up and jiffies is not incrementing, the watchdog heartbeat timer for all CPUs stops and creates false positives and confusing warnings on local CPUs, and also causes the SMP detector to stop, so the root cause is never detected. Fix this by using hrtimer based timers for the watchdog heartbeat, like the generic kernel hardlockup detector. Reported-by: Ravikumar Bangoria Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/watchdog.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 3c6ab22a0c4e..59a0e5942f6b 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic other CPUs */ static u64 wd_timer_period_ms __read_mostly; /* interval between heartbeat */ -static DEFINE_PER_CPU(struct timer_list, wd_timer); +static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer); static DEFINE_PER_CPU(u64, wd_timer_tb); /* SMP checker bits */ @@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs) nmi_exit(); } -static void wd_timer_reset(unsigned int cpu, struct timer_list *t) -{ - t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms); - if (wd_timer_period_ms > 1000) - t->expires = __round_jiffies_up(t->expires, cpu); - add_timer_on(t, cpu); -} - -static void wd_timer_fn(struct timer_list *t) +static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) { int cpu = smp_processor_id(); + if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) + return HRTIMER_NORESTART; + + if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) + return HRTIMER_NORESTART; + watchdog_timer_interrupt(cpu); - wd_timer_reset(cpu, t); + hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms)); + + return HRTIMER_RESTART; } void arch_touch_nmi_watchdog(void) @@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog); static void start_watchdog_timer_on(unsigned int cpu) { - struct timer_list *t = per_cpu_ptr(&wd_timer, cpu); + struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer); per_cpu(wd_timer_tb, cpu) = get_tb(); - timer_setup(t, wd_timer_fn, TIMER_PINNED); - wd_timer_reset(cpu, t); + hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + hrtimer->function = watchdog_timer_fn; + hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms), + HRTIMER_MODE_REL_PINNED); } static void stop_watchdog_timer_on(unsigned int cpu) { - struct timer_list *t = per_cpu_ptr(&wd_timer, cpu); + struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer); - del_timer_sync(t); + hrtimer_cancel(hrtimer); } static int start_wd_on_cpu(unsigned int cpu) -- 2.20.1
[PATCH v2 1/3] powernv/mce: reduce mce console logs to lesser lines.
From: Mahesh Salgaonkar Also add cpu number while displaying mce log. This will help cleaner logs when mce hits on multiple cpus simultaneously. before the changes the mce o/p was: [ 127.223515] Severe Machine check interrupt [Recovered] [ 127.223530] NIP [dba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb] [ 127.223539] Initiator: CPU [ 127.223544] Error type: SLB [Multihit] [ 127.223550] Effective address: dba80280 After this patch series changes the mce o/p will be: [ 471.959843] MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered] [ 471.959870] MCE: CPU80: NIP: [db550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb] [ 471.959892] MCE: CPU80: Probable software error (some chance of hardware cause) and for MCE in Guest: [ 1289.447571] MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 01001b6e0320 [Recovered] [ 1289.447615] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [7fffa309dc60] [ 1289.447634] MCE: CPU80: Probable software error (some chance of hardware cause) Signed-off-by: Mahesh Salgaonkar --- Change in v2: - Address comments from Michael. --- arch/powerpc/include/asm/mce.h |2 - arch/powerpc/kernel/mce.c | 82 2 files changed, 41 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 17996bc9382b..8d0b1c24c636 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -116,7 +116,7 @@ struct machine_check_event { enum MCE_Initiator initiator:8;/* 0x03 */ enum MCE_ErrorType error_type:8; /* 0x04 */ enum MCE_Dispositiondisposition:8; /* 0x05 */ - uint8_t reserved_1[2]; /* 0x06 */ + uint16_tcpu;/* 0x06 */ uint64_tgpr3; /* 0x08 */ uint64_tsrr0; /* 0x10 */ uint64_tsrr1; /* 0x18 */ diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index b5fec1f9751a..d3ee099e0981 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -112,6 +112,7 @@ void save_mce_event(struct pt_regs *regs, long handled, mce->srr1 = regs->msr; mce->gpr3 = regs->gpr[3]; mce->in_use = 1; + mce->cpu = get_paca()->paca_index; /* Mark it recovered if we have handled it and MSR(RI=1). */ if (handled && (regs->msr & MSR_RI)) @@ -310,7 +311,9 @@ static void machine_check_process_queued_event(struct irq_work *work) void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode, bool in_guest) { - const char *level, *sevstr, *subtype; + const char *level, *sevstr, *subtype, *err_type; + uint64_t ea = 0; + char dar_str[50]; static const char *mc_ue_types[] = { "Indeterminate", "Instruction fetch", @@ -384,101 +387,96 @@ void machine_check_print_event_info(struct machine_check_event *evt, break; } - printk("%s%s Machine check interrupt [%s]\n", level, sevstr, - evt->disposition == MCE_DISPOSITION_RECOVERED ? - "Recovered" : "Not recovered"); - - if (in_guest) { - printk("%s Guest NIP: %016llx\n", level, evt->srr0); - } else if (user_mode) { - printk("%s NIP: [%016llx] PID: %d Comm: %s\n", level, - evt->srr0, current->pid, current->comm); - } else { - printk("%s NIP [%016llx]: %pS\n", level, evt->srr0, - (void *)evt->srr0); - } - - printk("%s Initiator: %s\n", level, - evt->initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown"); switch (evt->error_type) { case MCE_ERROR_TYPE_UE: + err_type = "UE"; subtype = evt->u.ue_error.ue_error_type < ARRAY_SIZE(mc_ue_types) ? mc_ue_types[evt->u.ue_error.ue_error_type] : "Unknown"; - printk("%s Error type: UE [%s]\n", level, subtype); if (evt->u.ue_error.effective_address_provided) - printk("%sEffective address: %016llx\n", - level, evt->u.ue_error.effective_address); - if (evt->u.ue_error.physical_address_provided) - printk("%sPhysical address: %016llx\n", - level, evt->u.ue_error.physical_address); + ea = evt->u.ue_error.effective_address; break; case MCE_ERROR_TYPE_SLB: + err_type = "SLB"; subtype = evt->u.slb_error.slb_error_type < ARRAY_SIZE(mc_slb_types) ? mc_slb_types[
[PATCH v2 2/3] powernv/mce: Print correct severity for mce error.
From: Mahesh Salgaonkar Currently all machine check errors are printed as severe errors which isn't correct. Print soft errors as warning instead of severe errors. Signed-off-by: Mahesh Salgaonkar --- change in v2: - Use kernel types i.e. u8, u64 etc. - Define sync_error as bool. --- arch/powerpc/include/asm/mce.h| 86 ++-- arch/powerpc/kernel/mce.c |5 + arch/powerpc/kernel/mce_power.c | 144 + arch/powerpc/platforms/powernv/opal.c |2 4 files changed, 123 insertions(+), 114 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 8d0b1c24c636..b1f4bf863c95 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -31,7 +31,7 @@ enum MCE_Version { enum MCE_Severity { MCE_SEV_NO_ERROR = 0, MCE_SEV_WARNING = 1, - MCE_SEV_ERROR_SYNC = 2, + MCE_SEV_SEVERE = 2, MCE_SEV_FATAL = 3, }; @@ -110,73 +110,74 @@ enum MCE_LinkErrorType { }; struct machine_check_event { - enum MCE_Versionversion:8; /* 0x00 */ - uint8_t in_use; /* 0x01 */ - enum MCE_Severity severity:8; /* 0x02 */ - enum MCE_Initiator initiator:8;/* 0x03 */ - enum MCE_ErrorType error_type:8; /* 0x04 */ - enum MCE_Dispositiondisposition:8; /* 0x05 */ - uint16_tcpu;/* 0x06 */ - uint64_tgpr3; /* 0x08 */ - uint64_tsrr0; /* 0x10 */ - uint64_tsrr1; /* 0x18 */ - union { /* 0x20 */ + enum MCE_Versionversion:8; + u8 in_use; + enum MCE_Severity severity:8; + enum MCE_Initiator initiator:8; + enum MCE_ErrorType error_type:8; + enum MCE_Dispositiondisposition:8; + boolsync_error; + u16 cpu; + u64 gpr3; + u64 srr0; + u64 srr1; + union { struct { enum MCE_UeErrorType ue_error_type:8; - uint8_t effective_address_provided; - uint8_t physical_address_provided; - uint8_t reserved_1[5]; - uint64_teffective_address; - uint64_tphysical_address; - uint8_t reserved_2[8]; + u8 effective_address_provided; + u8 physical_address_provided; + u8 reserved_1[5]; + u64 effective_address; + u64 physical_address; + u8 reserved_2[8]; } ue_error; struct { enum MCE_SlbErrorType slb_error_type:8; - uint8_t effective_address_provided; - uint8_t reserved_1[6]; - uint64_teffective_address; - uint8_t reserved_2[16]; + u8 effective_address_provided; + u8 reserved_1[6]; + u64 effective_address; + u8 reserved_2[16]; } slb_error; struct { enum MCE_EratErrorType erat_error_type:8; - uint8_t effective_address_provided; - uint8_t reserved_1[6]; - uint64_teffective_address; - uint8_t reserved_2[16]; + u8 effective_address_provided; + u8 reserved_1[6]; + u64 effective_address; + u8 reserved_2[16]; } erat_error; struct { enum MCE_TlbErrorType tlb_error_type:8; - uint8_t effective_address_provided; - uint8_t reserved_1[6]; - uint64_teffective_address; - uint8_t reserved_2[16]; + u8 effective_address_provided; + u8 reserved_1[6]; + u64 effective_address; + u8 reserved_2[16]; } tlb_error; struct { enum MCE_UserErrorType user_error_type:8; - uint8_t effective_address_provided; -
RE: [PATCH] ASoC: fsl_esai: Support synchronous mode
Hi > > Shengjiu, > > On Mon, Apr 01, 2019 at 11:39:10AM +, S.j. Wang wrote: > > In ESAI synchronous mode, the clock is generated by Tx, So we should > > always set registers of Tx which relate with the bit clock and frame > > clock generation (TCCR, TCR, ECR), even there is only Rx is working. > > > > Signed-off-by: Shengjiu Wang > > --- > > sound/soc/fsl/fsl_esai.c | 28 +++- > > 1 file changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index > > 3623aa9a6f2e..d9fcddd55c02 100644 > > --- a/sound/soc/fsl/fsl_esai.c > > +++ b/sound/soc/fsl/fsl_esai.c > > @@ -230,6 +230,21 @@ static int fsl_esai_set_dai_sysclk(struct > snd_soc_dai *dai, int clk_id, > > return -EINVAL; > > } > > > > + if (esai_priv->synchronous && !tx) { > > + switch (clk_id) { > > + case ESAI_HCKR_FSYS: > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_FSYS, > > + freq, dir); > > + break; > > + case ESAI_HCKR_EXTAL: > > + fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_EXTAL, > > + freq, dir); > > Not sure why you call set_dai_sysclk inside set_dai_sysclk again. It feels > very > confusing to do so, especially without a comments. For sync mode, only RX is enabled, the register of tx should be set, so call the Set_dai_sysclk again. > > > + break; > > + default: > > + return -EINVAL; > > + } > > + } > > + > > /* Bypass divider settings if the requirement doesn't change */ > > if (freq == esai_priv->hck_rate[tx] && dir == esai_priv->hck_dir[tx]) > > return 0; > > @@ -537,10 +552,21 @@ static int fsl_esai_hw_params(struct > > snd_pcm_substream *substream, > > > > bclk = params_rate(params) * slot_width * esai_priv->slots; > > > > - ret = fsl_esai_set_bclk(dai, tx, bclk); > > + ret = fsl_esai_set_bclk(dai, esai_priv->synchronous ? true : tx, > > +bclk); > > if (ret) > > return ret; > > > > + if (esai_priv->synchronous && !tx) { > > + /* Use Normal mode to support monaural audio */ > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, > > + ESAI_xCR_xMOD_MASK, > params_channels(params) > 1 ? > > + ESAI_xCR_xMOD_NETWORK : 0); > > + > > + mask = ESAI_xCR_xSWS_MASK | ESAI_xCR_PADC; > > + val = ESAI_xCR_xSWS(slot_width, width) | ESAI_xCR_PADC; > > + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, > mask, val); > > + } > > Does synchronous mode require to set both TCR and RCR? or just TCR? > The code behind this part is doing the same setting to RCR. If that is not > needed any more for a synchronous recording, we should reuse it instead > of inserting a piece of redundant code. Otherwise, if we need to set both, > we should have two regmap_update_bits operations back-to-back for TCR > and RCR (and other registers too). Both TCR and RCR. RCR will be set in normal flow. > > > + > > /* Use Normal mode to support monaural audio */ > > regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), > >ESAI_xCR_xMOD_MASK, > params_channels(params) > 1 ? > > In case that we only need to set TCR (more likely I feel), it would feel less > confusing to me, if we changed REG_ESAI_xCR(tx) here, for example, to > REG_ESAI_xCR(tx || sync). Yea, please add to the top a 'bool sync = > esai_priv->synchronous;'. > > Similarly, for ECR_ETO and ECR_ERO: > (tx || sync) ? ESAI_ECR_ETO : ESAI_ECR_ERO; Both TCR and RCR should be set.
Re: [RFC PATCH v2 3/3] kasan: add interceptors for all string functions
Hi Dmitry, Andrey and others, Do you have any comments to this series ? I'd like to know if this approach is ok or if it is better to keep doing as in https://patchwork.ozlabs.org/patch/1055788/ Thanks Christophe Le 28/03/2019 à 16:00, Christophe Leroy a écrit : In the same spirit as commit 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions"), this patch adds interceptors for string manipulation functions so that we can compile lib/string.o without kasan support hence allow the string functions to also be used from places where kasan has to be disabled. Signed-off-by: Christophe Leroy --- v2: Fixed a few checkpatch stuff and added missing EXPORT_SYMBOL() and missing #undefs include/linux/string.h | 79 ++ lib/Makefile | 2 + lib/string.c | 8 + mm/kasan/string.c | 394 + 4 files changed, 483 insertions(+) diff --git a/include/linux/string.h b/include/linux/string.h index 7927b875f80c..3d2aff2ed402 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -19,54 +19,117 @@ extern void *memdup_user_nul(const void __user *, size_t); */ #include +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) +/* + * For files that are not instrumented (e.g. mm/slub.c) we + * should use not instrumented version of mem* functions. + */ +#define memset16 __memset16 +#define memset32 __memset32 +#define memset64 __memset64 +#define memzero_explicit __memzero_explicit +#define strcpy __strcpy +#define strncpy__strncpy +#define strlcpy__strlcpy +#define strscpy__strscpy +#define strcat __strcat +#define strncat__strncat +#define strlcat__strlcat +#define strcmp __strcmp +#define strncmp__strncmp +#define strcasecmp __strcasecmp +#define strncasecmp__strncasecmp +#define strchr __strchr +#define strchrnul __strchrnul +#define strrchr__strrchr +#define strnchr__strnchr +#define skip_spaces__skip_spaces +#define strim __strim +#define strstr __strstr +#define strnstr__strnstr +#define strlen __strlen +#define strnlen__strnlen +#define strpbrk__strpbrk +#define strsep __strsep +#define strspn __strspn +#define strcspn__strcspn +#define memscan__memscan +#define memcmp __memcmp +#define memchr __memchr +#define memchr_inv __memchr_inv +#define strreplace __strreplace + +#ifndef __NO_FORTIFY +#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */ +#endif + +#endif + #ifndef __HAVE_ARCH_STRCPY extern char * strcpy(char *,const char *); +char *__strcpy(char *, const char *); #endif #ifndef __HAVE_ARCH_STRNCPY extern char * strncpy(char *,const char *, __kernel_size_t); +char *__strncpy(char *, const char *, __kernel_size_t); #endif #ifndef __HAVE_ARCH_STRLCPY size_t strlcpy(char *, const char *, size_t); +size_t __strlcpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRSCPY ssize_t strscpy(char *, const char *, size_t); +ssize_t __strscpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRCAT extern char * strcat(char *, const char *); +char *__strcat(char *, const char *); #endif #ifndef __HAVE_ARCH_STRNCAT extern char * strncat(char *, const char *, __kernel_size_t); +char *__strncat(char *, const char *, __kernel_size_t); #endif #ifndef __HAVE_ARCH_STRLCAT extern size_t strlcat(char *, const char *, __kernel_size_t); +size_t __strlcat(char *, const char *, __kernel_size_t); #endif #ifndef __HAVE_ARCH_STRCMP extern int strcmp(const char *,const char *); +int __strcmp(const char *, const char *); #endif #ifndef __HAVE_ARCH_STRNCMP extern int strncmp(const char *,const char *,__kernel_size_t); +int __strncmp(const char *, const char *, __kernel_size_t); #endif #ifndef __HAVE_ARCH_STRCASECMP extern int strcasecmp(const char *s1, const char *s2); +int __strcasecmp(const char *s1, const char *s2); #endif #ifndef __HAVE_ARCH_STRNCASECMP extern int strncasecmp(const char *s1, const char *s2, size_t n); +int __strncasecmp(const char *s1, const char *s2, size_t n); #endif #ifndef __HAVE_ARCH_STRCHR extern char * strchr(const char *,int); +char *__strchr(const char *, int); #endif #ifndef __HAVE_ARCH_STRCHRNUL extern char * strchrnul(const char *,int); +char *__strchrnul(const char *, int); #endif #ifndef __HAVE_ARCH_STRNCHR extern char * strnchr(const char *, size_t, int); +char *__strnchr(const char *, size_t, int); #endif #ifndef __HAVE_ARCH_STRRCHR extern char * strrchr(const char *,int); +char *__strrchr(const char *, int); #endif extern char * __must_check skip_spaces(const char *); +char * __must_check __
[PATCH 9/9] powerpc: use generic CMDLINE manipulations
This patch moves powerpc to the centraly defined CMDLINE options. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 48 +++- 1 file changed, 3 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 22d6a48bd2ca..6a71d7c514cc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -182,6 +182,7 @@ config PPC select HAVE_CBPF_JITif !PPC64 select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13) select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2) + select HAVE_CMDLINE select HAVE_CONTEXT_TRACKINGif PPC64 select HAVE_DEBUG_KMEMLEAK select HAVE_DEBUG_STACKOVERFLOW @@ -828,52 +829,9 @@ config PPC_DENORMALISATION Add support for handling denormalisation of single precision values. Useful for bare metal only. If unsure say Y here. -config CMDLINE_BOOL - bool "Default bootloader kernel arguments" - -config CMDLINE - string "Initial kernel command string" - depends on CMDLINE_BOOL +config DEFAULT_CMDLINE + string default "console=ttyS0,9600 console=tty0 root=/dev/sda2" - help - On some platforms, there is currently no way for the boot loader to - pass arguments to the kernel. For these platforms, you can supply - some command-line options at build time by entering them here. In - most cases you will need to specify the root device here. - -choice - prompt "Kernel command line type" if CMDLINE != "" - default CMDLINE_FROM_BOOTLOADER - help - Selects the way you want to use the default kernel arguments. - -config CMDLINE_FROM_BOOTLOADER - bool "Use bootloader kernel arguments if available" - help - Uses the command-line options passed by the boot loader. If - the boot loader doesn't provide any, the default kernel command - string provided in CMDLINE will be used. - -config CMDLINE_EXTEND - bool "Extend bootloader kernel arguments" - help - The default kernel command string will be appended to the - command-line arguments provided during boot. - -config CMDLINE_PREPEND - bool "Prepend bootloader kernel arguments" - help - The default kernel command string will be prepend to the - command-line arguments provided during boot. - -config CMDLINE_FORCE - bool "Always use the default kernel command string" - help - Always use the default kernel command string, even if the boot - loader passes other arguments to the kernel. - This is useful if you cannot or don't want to change the - command-line options your boot loader passes to the kernel. -endchoice config EXTRA_TARGETS string "Additional default image types" -- 2.13.3
[PATCH 8/9] Gives arches opportunity to use generically defined boot cmdline manipulation
Most arches have similar boot command line manipulation options. This patchs adds the definition in init/Kconfig, gated by CONFIG_HAVE_CMDLINE that the arches can select to use them. In order to use this, a few arches will have to change their CONFIG options: - riscv has to replace CMDLINE_FALLBACK by CMDLINE_FROM_BOOTLOADER - arches using CONFIG_CMDLINE_OVERRIDE or CONFIG_CMDLINE_OVERWRITE have to replace them by CONFIG_CMDLINE_FORCE Arches also have to define CONFIG_DEFAULT_CMDLINE Signed-off-by: Christophe Leroy --- init/Kconfig | 56 1 file changed, 56 insertions(+) diff --git a/init/Kconfig b/init/Kconfig index 4592bf7997c0..83537603412c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -80,6 +80,62 @@ config INIT_ENV_ARG_LIMIT Maximum of each of the number of arguments and environment variables passed to init from the kernel command line. +config HAVE_CMDLINE + bool + +config CMDLINE_BOOL + bool "Default bootloader kernel arguments" + depends on HAVE_CMDLINE + help + On some platforms, there is currently no way for the boot loader to + pass arguments to the kernel. For these platforms, you can supply + some command-line options at build time by entering them here. In + most cases you will need to specify the root device here. + +config CMDLINE + string "Initial kernel command string" + depends on CMDLINE_BOOL + default DEFAULT_CMDLINE + help + On some platforms, there is currently no way for the boot loader to + pass arguments to the kernel. For these platforms, you can supply + some command-line options at build time by entering them here. In + most cases you will need to specify the root device here. + +choice + prompt "Kernel command line type" if CMDLINE != "" + default CMDLINE_FROM_BOOTLOADER + help + Selects the way you want to use the default kernel arguments. + +config CMDLINE_FROM_BOOTLOADER + bool "Use bootloader kernel arguments if available" + help + Uses the command-line options passed by the boot loader. If + the boot loader doesn't provide any, the default kernel command + string provided in CMDLINE will be used. + +config CMDLINE_EXTEND + bool "Extend bootloader kernel arguments" + help + The default kernel command string will be appended to the + command-line arguments provided during boot. + +config CMDLINE_PREPEND + bool "Prepend bootloader kernel arguments" + help + The default kernel command string will be prepend to the + command-line arguments provided during boot. + +config CMDLINE_FORCE + bool "Always use the default kernel command string" + help + Always use the default kernel command string, even if the boot + loader passes other arguments to the kernel. + This is useful if you cannot or don't want to change the + command-line options your boot loader passes to the kernel. +endchoice + config COMPILE_TEST bool "Compile also drivers which will not load" depends on !UML -- 2.13.3
[PATCH 5/9] powerpc: convert to generic builtin command line
This updates the powerpc code to use the new cmdline building function. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/prom_init.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index d4889ba04ddd..08f3db25b2f1 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -155,7 +156,7 @@ static struct prom_t __prombss prom; static unsigned long __prombss prom_entry; static char __prombss of_stdout_device[256]; -static char __prombss prom_scratch[256]; +static char __prombss prom_scratch[COMMAND_LINE_SIZE]; static unsigned long __prombss dt_header_start; static unsigned long __prombss dt_struct_start, dt_struct_end; @@ -627,18 +628,14 @@ static unsigned long prom_memparse(const char *ptr, const char **retptr) static void __init early_cmdline_parse(void) { const char *opt; + int l = 0; - char *p; - int l __maybe_unused = 0; - - prom_cmd_line[0] = 0; - p = prom_cmd_line; if ((long)prom.chosen > 0) - l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1); -#ifdef CONFIG_CMDLINE - if (l <= 0 || p[0] == '\0' || IS_ENABLED(CONFIG_CMDLINE_EXTEND)) /* dbl check */ - strlcat(prom_cmd_line, CONFIG_CMDLINE, sizeof(prom_cmd_line)); -#endif /* CONFIG_CMDLINE */ + l = prom_getprop(prom.chosen, "bootargs", prom_scratch, +COMMAND_LINE_SIZE - 1); + + cmdline_build(prom_cmd_line, l > 0 ? prom_scratch : NULL, sizeof(prom_scratch)); + prom_printf("command line: %s\n", prom_cmd_line); #ifdef CONFIG_PPC64 -- 2.13.3
[PATCH 7/9] powerpc: add capability to prepend default command line
This patch activates the capability to prepend default arguments to the command line. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2972348e52be..22d6a48bd2ca 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -860,6 +860,12 @@ config CMDLINE_EXTEND The default kernel command string will be appended to the command-line arguments provided during boot. +config CMDLINE_PREPEND + bool "Prepend bootloader kernel arguments" + help + The default kernel command string will be prepend to the + command-line arguments provided during boot. + config CMDLINE_FORCE bool "Always use the default kernel command string" help -- 2.13.3
[PATCH 6/9] Add capability to prepend the command line
This patchs adds an option of prepend a text to the command line instead of appending it. Signed-off-by: Christophe Leroy --- include/linux/cmdline.h | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h index afcc00d7628d..5caf3724c1ab 100644 --- a/include/linux/cmdline.h +++ b/include/linux/cmdline.h @@ -3,7 +3,7 @@ #define _LINUX_CMDLINE_H /* - * This function will append a builtin command line to the command + * This function will append or prepend a builtin command line to the command * line provided by the bootloader. Kconfig options can be used to alter * the behavior of this builtin command line. * @dest: The destination of the final appended/prepended string. @@ -22,6 +22,9 @@ static __always_inline void cmdline_build(char *dest, const char *src, size_t le strlcat(dest, CONFIG_CMDLINE, length); return; } + + if (IS_ENABLED(CONFIG_CMDLINE_PREPEND) && sizeof(CONFIG_CMDLINE) > 1) + strlcat(dest, CONFIG_CMDLINE " ", length); #endif if (dest != src) strlcat(dest, src, length); -- 2.13.3
[PATCH 4/9] powerpc/prom_init: get rid of PROM_SCRATCH_SIZE
PROM_SCRATCH_SIZE is same as sizeof(prom_scratch) Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/prom_init.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index a6cd52240c58..d4889ba04ddd 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -154,10 +154,8 @@ static struct prom_t __prombss prom; static unsigned long __prombss prom_entry; -#define PROM_SCRATCH_SIZE 256 - static char __prombss of_stdout_device[256]; -static char __prombss prom_scratch[PROM_SCRATCH_SIZE]; +static char __prombss prom_scratch[256]; static unsigned long __prombss dt_header_start; static unsigned long __prombss dt_struct_start, dt_struct_end; @@ -1486,8 +1484,8 @@ static void __init prom_init_mem(void) endp = p + (plen / sizeof(cell_t)); #ifdef DEBUG_PROM - memset(path, 0, PROM_SCRATCH_SIZE); - call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1); + memset(path, 0, sizeof(prom_scratch)); + call_prom("package-to-path", 3, 1, node, path, sizeof(prom_scratch) - 1); prom_debug(" node %s :\n", path); #endif /* DEBUG_PROM */ @@ -1795,10 +1793,10 @@ static void __init prom_initialize_tce_table(void) local_alloc_bottom = base; /* It seems OF doesn't null-terminate the path :-( */ - memset(path, 0, PROM_SCRATCH_SIZE); + memset(path, 0, sizeof(prom_scratch)); /* Call OF to setup the TCE hardware */ if (call_prom("package-to-path", 3, 1, node, - path, PROM_SCRATCH_SIZE-1) == PROM_ERROR) { + path, sizeof(prom_scratch) - 1) == PROM_ERROR) { prom_printf("package-to-path failed\n"); } @@ -2159,14 +2157,14 @@ static void __init prom_check_displays(void) /* It seems OF doesn't null-terminate the path :-( */ path = prom_scratch; - memset(path, 0, PROM_SCRATCH_SIZE); + memset(path, 0, sizeof(prom_scratch)); /* * leave some room at the end of the path for appending extra * arguments */ if (call_prom("package-to-path", 3, 1, node, path, - PROM_SCRATCH_SIZE-10) == PROM_ERROR) + sizeof(prom_scratch) - 10) == PROM_ERROR) continue; prom_printf("found display : %s, opening... ", path); @@ -2362,8 +2360,8 @@ static void __init scan_dt_build_struct(phandle node, unsigned long *mem_start, /* get it again for debugging */ path = prom_scratch; - memset(path, 0, PROM_SCRATCH_SIZE); - call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1); + memset(path, 0, sizeof(prom_scratch)); + call_prom("package-to-path", 3, 1, node, path, sizeof(prom_scratch) - 1); /* get and store all properties */ prev_name = ""; -- 2.13.3
[PATCH 3/9] drivers: of: use cmdline building function
This patch uses the new cmdline building function to concatenate the of provided cmdline with built-in parts based on compile-time options. Signed-off-by: Christophe Leroy --- drivers/of/fdt.c| 23 --- include/linux/cmdline.h | 2 +- 2 files changed, 5 insertions(+), 20 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 4734223ab702..c6d941785b37 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -24,6 +24,7 @@ #include #include #include +#include #include /* for COMMAND_LINE_SIZE */ #include @@ -1090,26 +1091,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, /* Retrieve command line */ p = of_get_flat_dt_prop(node, "bootargs", &l); - if (p != NULL && l > 0) - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); + if (l <= 0) + p = NULL; - /* -* CONFIG_CMDLINE is meant to be a default in case nothing else -* managed to set the command line, unless CONFIG_CMDLINE_FORCE -* is set in which case we override whatever was found earlier. -*/ -#ifdef CONFIG_CMDLINE -#if defined(CONFIG_CMDLINE_EXTEND) - strlcat(data, " ", COMMAND_LINE_SIZE); - strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#elif defined(CONFIG_CMDLINE_FORCE) - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#else - /* No arguments from boot loader, use kernel's cmdl*/ - if (!((char *)data)[0]) - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#endif -#endif /* CONFIG_CMDLINE */ + cmdline_build(data, p, COMMAND_LINE_SIZE); pr_debug("Command line is: %s\n", (char*)data); diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h index 8610ddf813ff..afcc00d7628d 100644 --- a/include/linux/cmdline.h +++ b/include/linux/cmdline.h @@ -10,7 +10,7 @@ * @src: The starting string or NULL if there isn't one. Must not equal dest. * @length: the length of dest buffer. */ -static __always_inline void cmdline_build(char *dest, char *src, size_t length) +static __always_inline void cmdline_build(char *dest, const char *src, size_t length) { if (length <= 0) return; -- 2.13.3
[PATCH 1/9] powerpc: enable appending of CONFIG_CMDLINE to bootloader's cmdline.
Today, powerpc defined CONFIG_CMDLINE for when bootloader doesn't provide a command line or for overriding it. On same way as ARM, this patch adds the option of appending the CONFIG_CMDLINE to bootloader's provided command line. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 21 - arch/powerpc/kernel/prom_init.c| 5 ++--- arch/powerpc/kernel/prom_init_check.sh | 2 +- 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2d0be82c3061..2972348e52be 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -841,14 +841,33 @@ config CMDLINE some command-line options at build time by entering them here. In most cases you will need to specify the root device here. +choice + prompt "Kernel command line type" if CMDLINE != "" + default CMDLINE_FROM_BOOTLOADER + help + Selects the way you want to use the default kernel arguments. + +config CMDLINE_FROM_BOOTLOADER + bool "Use bootloader kernel arguments if available" + help + Uses the command-line options passed by the boot loader. If + the boot loader doesn't provide any, the default kernel command + string provided in CMDLINE will be used. + +config CMDLINE_EXTEND + bool "Extend bootloader kernel arguments" + help + The default kernel command string will be appended to the + command-line arguments provided during boot. + config CMDLINE_FORCE bool "Always use the default kernel command string" - depends on CMDLINE_BOOL help Always use the default kernel command string, even if the boot loader passes other arguments to the kernel. This is useful if you cannot or don't want to change the command-line options your boot loader passes to the kernel. +endchoice config EXTRA_TARGETS string "Additional default image types" diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index f33ff4163a51..a6cd52240c58 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -638,9 +638,8 @@ static void __init early_cmdline_parse(void) if ((long)prom.chosen > 0) l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1); #ifdef CONFIG_CMDLINE - if (l <= 0 || p[0] == '\0') /* dbl check */ - strlcpy(prom_cmd_line, - CONFIG_CMDLINE, sizeof(prom_cmd_line)); + if (l <= 0 || p[0] == '\0' || IS_ENABLED(CONFIG_CMDLINE_EXTEND)) /* dbl check */ + strlcat(prom_cmd_line, CONFIG_CMDLINE, sizeof(prom_cmd_line)); #endif /* CONFIG_CMDLINE */ prom_printf("command line: %s\n", prom_cmd_line); diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh index 667df97d2595..cbcf18846392 100644 --- a/arch/powerpc/kernel/prom_init_check.sh +++ b/arch/powerpc/kernel/prom_init_check.sh @@ -19,7 +19,7 @@ WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush _end enter_prom memcpy memset reloc_offset __secondary_hold __secondary_hold_acknowledge __secondary_hold_spinloop __start -strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224 +strcmp strcpy strlcat strlen strncmp strstr kstrtobool logo_linux_clut224 reloc_got2 kernstart_addr memstart_addr linux_banner _stext __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC." -- 2.13.3
[PATCH 2/9] Add generic function to build command line.
This code provides architectures with a way to build command line based on what is built in the kernel and what is handed over by the bootloader, based on selected compile-time options. Signed-off-by: Christophe Leroy --- include/linux/cmdline.h | 34 ++ 1 file changed, 34 insertions(+) create mode 100644 include/linux/cmdline.h diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h new file mode 100644 index ..8610ddf813ff --- /dev/null +++ b/include/linux/cmdline.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CMDLINE_H +#define _LINUX_CMDLINE_H + +/* + * This function will append a builtin command line to the command + * line provided by the bootloader. Kconfig options can be used to alter + * the behavior of this builtin command line. + * @dest: The destination of the final appended/prepended string. + * @src: The starting string or NULL if there isn't one. Must not equal dest. + * @length: the length of dest buffer. + */ +static __always_inline void cmdline_build(char *dest, char *src, size_t length) +{ + if (length <= 0) + return; + + dest[0] = 0; + +#ifdef CONFIG_CMDLINE + if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !src || !src[0]) { + strlcat(dest, CONFIG_CMDLINE, length); + return; + } +#endif + if (dest != src) + strlcat(dest, src, length); +#ifdef CONFIG_CMDLINE + if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) && sizeof(CONFIG_CMDLINE) > 1) + strlcat(dest, " " CONFIG_CMDLINE, length); +#endif +} + +#endif /* _LINUX_CMDLINE_H */ -- 2.13.3
[PATCH 0/9] Improve boot command line handling
The purpose of this series is to improve and enhance the handling of kernel boot arguments. It is first focussed on powerpc but also extends the capability for other arches. This is based on suggestion from Daniel Walker Christophe Leroy (9): powerpc: enable appending of CONFIG_CMDLINE to bootloader's cmdline. Add generic function to build command line. drivers: of: use cmdline building function powerpc/prom_init: get rid of PROM_SCRATCH_SIZE powerpc: convert to generic builtin command line Add capability to prepend the command line powerpc: add capability to prepend default command line Gives arches opportunity to use generically defined boot cmdline manipulation powerpc: use generic CMDLINE manipulations arch/powerpc/Kconfig | 23 ++ arch/powerpc/kernel/prom_init.c| 38 ++- arch/powerpc/kernel/prom_init_check.sh | 2 +- drivers/of/fdt.c | 23 +++--- include/linux/cmdline.h| 37 ++ init/Kconfig | 56 ++ 6 files changed, 117 insertions(+), 62 deletions(-) create mode 100644 include/linux/cmdline.h -- 2.13.3
Re: [PATCH 2/5] powerpc: Fix vDSO clock_getres()
On 02/04/2019 07:14, Christophe Leroy wrote: > > > On 04/01/2019 11:51 AM, Vincenzo Frascino wrote: >> clock_getres in the vDSO library has to preserve the same behaviour >> of posix_get_hrtimer_res(). >> >> In particular, posix_get_hrtimer_res() does: >> sec = 0; >> ns = hrtimer_resolution; >> and hrtimer_resolution depends on the enablement of the high >> resolution timers that can happen either at compile or at run time. >> >> Fix the powerpc vdso implementation of clock_getres keeping a copy of >> hrtimer_resolution in vdso data and using that directly. >> >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Michael Ellerman >> Signed-off-by: Vincenzo Frascino >> --- >> arch/powerpc/include/asm/vdso_datapage.h | 2 ++ >> arch/powerpc/kernel/asm-offsets.c | 2 +- >> arch/powerpc/kernel/time.c| 1 + >> arch/powerpc/kernel/vdso32/gettimeofday.S | 22 +++--- >> arch/powerpc/kernel/vdso64/gettimeofday.S | 22 +++--- >> 5 files changed, 34 insertions(+), 15 deletions(-) >> > > [...] > >> diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S >> b/arch/powerpc/kernel/vdso32/gettimeofday.S >> index 1e0bc5955a40..b21630079496 100644 >> --- a/arch/powerpc/kernel/vdso32/gettimeofday.S >> +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S >> @@ -160,14 +160,21 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) >> crorcr0*4+eq,cr0*4+eq,cr1*4+eq >> bne cr0,99f >> >> -li r3,0 >> -cmpli cr0,r4,0 >> +mflrr12 >> + .cfi_register lr,r12 >> +mr r11,r4 >> +bl __get_datapage@local >> +lwz r5,CLOCK_REALTIME_RES(r3) >> +li r4,0 >> +cmplwi r11,0 /* check if res is NULL */ >> +beq 1f >> + >> +stw r4,TSPC32_TV_SEC(r11) >> +stw r5,TSPC32_TV_NSEC(r11) >> + >> +1: mtlrr12 >> crclr cr0*4+so >> -beqlr >> -lis r5,CLOCK_REALTIME_RES@h >> -ori r5,r5,CLOCK_REALTIME_RES@l >> -stw r3,TSPC32_TV_SEC(r4) >> -stw r5,TSPC32_TV_NSEC(r4) >> +li r3,0 >> blr > > The above can be done simpler, see below > > @@ -160,12 +160,15 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) > crorcr0*4+eq,cr0*4+eq,cr1*4+eq > bne cr0,99f > > + mflrr12 > + .cfi_register lr,r12 > + bl __get_datapage@local > + lwz r5,CLOCK_REALTIME_RES(r3) > + mtlrr12 > li r3,0 > cmpli cr0,r4,0 > crclr cr0*4+so > beqlr > - lis r5,CLOCK_REALTIME_RES@h > - ori r5,r5,CLOCK_REALTIME_RES@l > stw r3,TSPC32_TV_SEC(r4) > stw r5,TSPC32_TV_NSEC(r4) > blr > Thank you for this, I will update my code accordingly before posting v2. > Christophe > >> >> /* >> @@ -175,6 +182,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) >> */ >> 99: >> li r0,__NR_clock_getres >> + .cfi_restore lr >> sc >> blr >> .cfi_endproc >> diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S >> b/arch/powerpc/kernel/vdso64/gettimeofday.S >> index a4ed9edfd5f0..a7e49bddd475 100644 >> --- a/arch/powerpc/kernel/vdso64/gettimeofday.S >> +++ b/arch/powerpc/kernel/vdso64/gettimeofday.S >> @@ -190,14 +190,21 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) >> crorcr0*4+eq,cr0*4+eq,cr1*4+eq >> bne cr0,99f >> >> -li r3,0 >> -cmpldi cr0,r4,0 >> +mflrr12 >> + .cfi_register lr,r12 >> +mr r11, r4 >> +bl V_LOCAL_FUNC(__get_datapage) >> +lwz r5,CLOCK_REALTIME_RES(r3) >> +li r4,0 >> +cmpldi r11,0 /* check if res is NULL */ >> +beq 1f >> + >> +std r4,TSPC64_TV_SEC(r11) >> +std r5,TSPC64_TV_NSEC(r11) >> + >> +1: mtlrr12 >> crclr cr0*4+so >> -beqlr >> -lis r5,CLOCK_REALTIME_RES@h >> -ori r5,r5,CLOCK_REALTIME_RES@l >> -std r3,TSPC64_TV_SEC(r4) >> -std r5,TSPC64_TV_NSEC(r4) >> +li r3,0 >> blr > > The same type of simplification applies here too. > > Christophe > > >> >> /* >> @@ -205,6 +212,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) >> */ >> 99: >> li r0,__NR_clock_getres >> + .cfi_restore lr >> sc >> blr >> .cfi_endproc >> -- Regards, Vincenzo
Re: [PATCH 2/5] powerpc: Fix vDSO clock_getres()
Hi Christophe, thank you for your review. On 02/04/2019 06:54, Christophe Leroy wrote: > > > On 04/01/2019 11:51 AM, Vincenzo Frascino wrote: >> clock_getres in the vDSO library has to preserve the same behaviour >> of posix_get_hrtimer_res(). >> >> In particular, posix_get_hrtimer_res() does: >> sec = 0; >> ns = hrtimer_resolution; >> and hrtimer_resolution depends on the enablement of the high >> resolution timers that can happen either at compile or at run time. >> >> Fix the powerpc vdso implementation of clock_getres keeping a copy of >> hrtimer_resolution in vdso data and using that directly. >> >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Michael Ellerman >> Signed-off-by: Vincenzo Frascino >> --- >> arch/powerpc/include/asm/vdso_datapage.h | 2 ++ > > Conflicts with commit b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC > inconsistencies across Y2038") > Thanks for pointing this out, I will rebase my code on top of the latest version before reissuing v2. ... -- Regards, Vincenzo
Re: [PATCH stable v4.14 13/32] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
On Tue, 2019-04-02 at 17:19 +1100, Michael Ellerman wrote: > > Joakim Tjernlund writes: > > On Fri, 2019-03-29 at 22:26 +1100, Michael Ellerman wrote: > > > From: Diana Craciun > > > > > > commit ebcd1bfc33c7a90df941df68a6e5d4018c022fba upstream. > > > > > > Implement the barrier_nospec as a isync;sync instruction sequence. > > > The implementation uses the infrastructure built for BOOK3S 64. > > > > > > Signed-off-by: Diana Craciun > > > [mpe: Split out of larger patch] > > > Signed-off-by: Michael Ellerman > > > > What is the performanc impact of these spectre fixes? > > I've not seen any numbers from anyone. Thanks for getting back to me. > > It will depend on the workload, it's copy to/from user that is most > likely to show an impact. > > We have a context switch benchmark in > tools/testing/selftests/powerpc/benchmarks/context_switch.c. > > Running that with "--no-vector --no-altivec --no-fp --test=pipe" shows > about a 2.3% slow down vs booting with "nospectre_v1". > > > Can I compile it away? > > You can't actually, but you can disable it at runtime with > "nospectre_v1" on the kernel command line. > > We could make it a user selectable compile time option if you really > want it to be. I think yes. Considering that these patches are fairly untested and the impact in the wild unknown. Requiring systems to change their boot config over night is too fast. Jocke
[PATCH v10 18/18] LS1021A: dtsi: add ftm quad decoder entries
From: Patrick Havelange Add the 4 Quadrature counters for this board. Reviewed-by: Esben Haabendal Signed-off-by: Patrick Havelange Signed-off-by: William Breathitt Gray --- arch/arm/boot/dts/ls1021a.dtsi | 28 1 file changed, 28 insertions(+) diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi index ed0941292172..0168fb62590a 100644 --- a/arch/arm/boot/dts/ls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a.dtsi @@ -433,6 +433,34 @@ status = "disabled"; }; + counter0: counter@29d { + compatible = "fsl,ftm-quaddec"; + reg = <0x0 0x29d 0x0 0x1>; + big-endian; + status = "disabled"; + }; + + counter1: counter@29e { + compatible = "fsl,ftm-quaddec"; + reg = <0x0 0x29e 0x0 0x1>; + big-endian; + status = "disabled"; + }; + + counter2: counter@29f { + compatible = "fsl,ftm-quaddec"; + reg = <0x0 0x29f 0x0 0x1>; + big-endian; + status = "disabled"; + }; + + counter3: counter@2a0 { + compatible = "fsl,ftm-quaddec"; + reg = <0x0 0x2a0 0x0 0x1>; + big-endian; + status = "disabled"; + }; + gpio0: gpio@230 { compatible = "fsl,ls1021a-gpio", "fsl,qoriq-gpio"; reg = <0x0 0x230 0x0 0x1>; -- 2.21.0
[PATCH v10 17/18] counter: ftm-quaddec: Documentation: Add specific counter sysfs documentation
From: Patrick Havelange This adds documentation for the specific prescaler entry. Signed-off-by: Patrick Havelange Signed-off-by: William Breathitt Gray --- .../ABI/testing/sysfs-bus-counter-ftm-quaddec| 16 1 file changed, 16 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec diff --git a/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec b/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec new file mode 100644 index ..7d2e7b363467 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-counter-ftm-quaddec @@ -0,0 +1,16 @@ +What: /sys/bus/counter/devices/counterX/countY/prescaler_available +KernelVersion: 5.2 +Contact: linux-...@vger.kernel.org +Description: + Discrete set of available values for the respective Count Y + configuration are listed in this file. Values are delimited by + newline characters. + +What: /sys/bus/counter/devices/counterX/countY/prescaler +KernelVersion: 5.2 +Contact: linux-...@vger.kernel.org +Description: + Configure the prescaler value associated with Count Y. + On the FlexTimer, the counter clock source passes through a + prescaler (i.e. a counter). This acts like a clock + divider. -- 2.21.0