Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation
On 29/01/2020 at 08:14, Thomas Gleixner wrote:
> Andy Lutomirski writes:
>> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner wrote:
>>> Andy Lutomirski writes:
>>>> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>>>>
>>>> Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>>>> assume the shift is less than 32? That function exists to benefit
>>>> 32-bit arches.
>>>
>>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>> That's what I meant to type...
>
> Just that it does not work. The math is:
>
>     ns = d->nsecs;   // That's the nsec value shifted left by d->shift
>     ns += ((cur - d->last) & d->mask) * mult;
>     ns >>= d->shift;
>
> So we cannot use mul_u64_u32_shr() because we need the addition there
> before shifting. And no, we can't drop the fractional part of d->nsecs.
> Been there, done that, got sporadic time going backwards problems as a
> reward. Need to look at that again as stuff has changed over time.
>
> On x86 we enforce that mask is 64bit, so the & operation is not there,
> but due to the nasties of TSC we have that conditional
>
>     if (cur > last)
>         return (cur - last) * mult;
>     return 0;
>
> Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
> well, so you can spare that & operation there too.

Yes, I did it already. It spares reading d->mask, that's the main
advantage.

Christophe
Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation
Andy Lutomirski writes:
> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner wrote:
>>
>> Andy Lutomirski writes:
>> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>> >
>> > Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>> > assume the shift is less than 32? That function exists to benefit
>> > 32-bit arches.
>>
>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>
> That's what I meant to type...

Just that it does not work. The math is:

    ns = d->nsecs;   // That's the nsec value shifted left by d->shift
    ns += ((cur - d->last) & d->mask) * mult;
    ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of d->nsecs.
Been there, done that, got sporadic time going backwards problems as a
reward. Need to look at that again as stuff has changed over time.

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

    if (cur > last)
        return (cur - last) * mult;
    return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.

Thanks,

        tglx
Re: [PATCH] powerpc/xmon: Fix compile error in print_insn* functions
On Thu, 2020-01-23 at 01:04:55 UTC, Sukadev Bhattiprolu wrote: > >From 72a7497a8673c93a4b80aa4fc38b88a8e90aa650 Mon Sep 17 00:00:00 2001 > From: Sukadev Bhattiprolu > Date: Wed, 22 Jan 2020 18:57:18 -0600 > Subject: [PATCH 1/1] powerpc/xmon: Fix compile error in print_insn* functions > > Fix a couple of compile errors I stumbled upon with CONFIG_XMON=y and > CONFIG_XMON_DISASSEMBLY=n > > Signed-off-by: Sukadev Bhattiprolu Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/493a55f1e7724dee5e71bc726d5b819292094587 cheers
Re: [PATCH v4] powerpc: use probe_user_read() and probe_user_write()
On Thu, 2020-01-23 at 17:30:47 UTC, Christophe Leroy wrote: > Instead of opencoding, use probe_user_read() to safely read > a user location and probe_user_write() for writing to user. > > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/def0bfdbd6039e96a9eb2baaa4470b079daab0d4 cheers
Re: [PATCH] powerpc/papr_scm: Fix leaking 'bus_desc.provider_name' in some paths
On Wed, 2020-01-22 at 15:51:40 UTC, Vaibhav Jain wrote: > String 'bus_desc.provider_name' allocated inside > papr_scm_nvdimm_init() will leaks in case call to > nvdimm_bus_register() fails or when papr_scm_remove() is called. > > This minor patch ensures that 'bus_desc.provider_name' is freed in > error path for nvdimm_bus_register() as well as in papr_scm_remove(). > > Signed-off-by: Vaibhav Jain Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/5649607a8d0b0e019a4db14aab3de1e16c3a2b4f cheers
Re: [PATCH] powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning
On Mon, 2020-01-20 at 22:10:02 UTC, Tyrel Datwyler wrote:
> From: Tyrel Datwyler
>
> Commit e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to
> iommu_table") missed an iommu_table allocation in the pseries vio code.
> The iommu_table is allocated with kzalloc and as a result the associated
> kref gets a value of zero. This has the side effect that during a DLPAR
> remove of the associated virtual IOA the iommu_tce_table_put() triggers
> a use-after-free underflow warning.
>
> Call Trace:
> [c002879e39f0] [c071ecb4] refcount_warn_saturate+0x184/0x190 (unreliable)
> [c002879e3a50] [c00500ac] iommu_tce_table_put+0x9c/0xb0
> [c002879e3a70] [c00f54e4] vio_dev_release+0x34/0x70
> [c002879e3aa0] [c087cfa4] device_release+0x54/0xf0
> [c002879e3b10] [c0d64c84] kobject_cleanup+0xa4/0x240
> [c002879e3b90] [c087d358] put_device+0x28/0x40
> [c002879e3bb0] [c07a328c] dlpar_remove_slot+0x15c/0x250
> [c002879e3c50] [c07a348c] remove_slot_store+0xac/0xf0
> [c002879e3cd0] [c0d64220] kobj_attr_store+0x30/0x60
> [c002879e3cf0] [c04ff13c] sysfs_kf_write+0x6c/0xa0
> [c002879e3d10] [c04fde4c] kernfs_fop_write+0x18c/0x260
> [c002879e3d60] [c0410f3c] __vfs_write+0x3c/0x70
> [c002879e3d80] [c0415408] vfs_write+0xc8/0x250
> [c002879e3dd0] [c04157dc] ksys_write+0x7c/0x120
> [c002879e3e20] [c000b278] system_call+0x5c/0x68
>
> Further, since the refcount was always zero the iommu_tce_table_put()
> fails to call the iommu_table release function resulting in a leak.
>
> Fix this issue by initializing the iommu_table kref immediately after
> allocation.
>
> Fixes: e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to
> iommu_table")
> Signed-off-by: Tyrel Datwyler

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aff8c8242bc638ba57247ae1ec5f272ac3ed3b92

cheers
Re: [PATCH] selftests/eeh: Bump EEH wait time to 60s
On Wed, 2020-01-22 at 03:11:25 UTC, Oliver O'Halloran wrote: > Some newer cards supported by aacraid can take up to 40s to recover > after an EEH event. This causes spurious failures in the basic EEH > self-test since the current maximum timeout is only 30s. > > Fix the immediate issue by bumping the timeout to a default of 60s, > and allow the wait time to be specified via an environment variable > (EEH_MAX_WAIT). > > Reported-by: Steve Best > Suggested-by: Douglas Miller > Signed-off-by: Oliver O'Halloran Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/414f50434aa2463202a5b35e844f4125dd1a7101 cheers
Re: [PATCH 1/5] powerpc/32: add support of KASAN_VMALLOC
On Tue, 2020-01-14 at 17:54:00 UTC, Christophe Leroy wrote: > Add support of KASAN_VMALLOC on PPC32. > > To allow this, the early shadow covering the VMALLOC space > needs to be removed once the high_memory var is set and before > freeing memblock. > > And the VMALLOC area needs to be aligned such that boundaries > are covered by a full shadow page. > > Signed-off-by: Christophe Leroy Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/3d4247fcc938d0ab5cf6fdb752dae07fdeab9736 cheers
Re: [PATCH -next] powerpc/maple: fix comparing pointer to 0
On Tue, 2020-01-21 at 01:31:53 UTC, Chen Zhou wrote: > Fixes coccicheck warning: > ./arch/powerpc/platforms/maple/setup.c:232:15-16: > WARNING comparing pointer to 0 > > Compare pointer-typed values to NULL rather than 0. > > Signed-off-by: Chen Zhou Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1e3531982ee70adf1880715a968d9c3365f321ed cheers
Re: [PATCH v2] Fix display of Maximum Memory
On Wed, 2020-01-15 at 14:53:59 UTC, Michael Bringmann wrote: > Correct overflow problem in calculation+display of Maximum Memory > value to syscfg where 32bits is insufficient. > > Signed-off-by: Michael Bringmann Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/f1dbc1c5c70d0d4c60b5d467ba941fba167c12f6 cheers
Re: [PATCH] MAINTAINERS: Add myself as maintainer of ehv_bytechan tty driver
On Tue, 2020-01-14 at 11:00:25 UTC, Laurentiu Tudor wrote: > Michael Ellerman made a call for volunteers from NXP to maintain > this driver and I offered myself. > > Signed-off-by: Laurentiu Tudor Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/173bf44bdfc768af3c07cd0aeeb6ad8d1331b77d cheers
Re: [PATCH v2] powerpc/ptdump: don't entirely rebuild kernel when selecting CONFIG_PPC_DEBUG_WX
On Tue, 2020-01-14 at 07:14:40 UTC, Christophe Leroy wrote: > Selecting CONFIG_PPC_DEBUG_WX only impacts ptdump and pgtable_32/64 > init calls. Declaring related functions in asm/pgtable.h implies > rebuilding almost everything. > > Move ptdump_check_wx() declaration in mm/mmu_decl.h > > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1e1c8b2cc37afb333c1829e8e0360321813bf220 cheers
Re: [PATCH] powerpc/ptdump: only enable PPC_CHECK_WX with STRICT_KERNEL_RWX
On Tue, 2020-01-14 at 08:13:10 UTC, Christophe Leroy wrote: > ptdump_check_wx() is called from mark_rodata_ro() which only exists > when CONFIG_STRICT_KERNEL_RWX is selected. > > Signed-off-by: Christophe Leroy > Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/f509247b08f2dcf7754d9ed85ad69a7972aa132b cheers
Re: [PATCH] powerpc/ptdump: fix W+X verification
On Tue, 2020-01-14 at 08:13:09 UTC, Christophe Leroy wrote: > Verification cannot rely on simple bit checking because on some > platforms PAGE_RW is 0, checking that a page is not W means > checking that PAGE_RO is set instead of checking that PAGE_RW > is not set. > > Use pte helpers instead of checking bits. > > Signed-off-by: Christophe Leroy > Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/d80ae83f1f932ab7af47b54d0d3bef4f4dba489f cheers
Re: [PATCH] powerpc/ptdump: fix W+X verification call in mark_rodata_ro()
On Tue, 2020-01-14 at 08:13:08 UTC, Christophe Leroy wrote: > ptdump_check_wx() also has to be called when pages are mapped > by blocks. > > Signed-off-by: Christophe Leroy > Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/e26ad936dd89d79f66c2b567f700e0c2a7103070 cheers
Re: [PATCH 1/5] powerpc/pci: Fold pcibios_setup_device() into pcibios_bus_add_device()
On Fri, 2020-01-10 at 07:02:03 UTC, Oliver O'Halloran wrote: > pcibios_bus_add_device() is the only caller of pcibios_setup_device(). > Fold them together since there's no real reason to keep them separate. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/3ab3f3c9df348324029e3fbdf381f551b1df8f1e cheers
Re: [PATCH] powerpc/papr_scm: Don't enable direct map for a region by default
On Wed, 2020-01-08 at 06:46:47 UTC, "Aneesh Kumar K.V" wrote: > Setting ND_REGION_PAGEMAP flag implies namespace mode defaults to fsdax mode. > This also means kernel ends up creating struct page backing for these namespace > ranges. With large namespaces that is not the right thing to do. We > should let the user select the mode he/she wants the namespace to be created > with. > > Hence disable ND_REGION_PAGEMAP for papr_scm regions. We still keep the flag > for > of_pmem because it supports only small persistent memory regions. > > This is similar to what is done for x86 with commit > 004f1afbe199 ("libnvdimm, pmem: direct map legacy pmem by default") > > Signed-off-by: Aneesh Kumar K.V Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/7e6f8cbc5e10cf7601c762db267b795273d53078 cheers
Re: [PATCH] powerpc/pseries: in lmb_is_removable(), advance pfn if section is not present
On Fri, 2020-01-10 at 04:54:02 UTC, Pingfan Liu wrote: > In lmb_is_removable(), if a section is not present, it should continue to > test the rest of the sections in the block. But the current code fails to > do so. > > Signed-off-by: Pingfan Liu > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > To: linuxppc-dev@lists.ozlabs.org Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/fbee6ba2dca30d302efe6bddb3a886f5e964a257 cheers
Re: powerpc/xmon: don't access ASDR in VMs
On Tue, 2020-01-07 at 02:16:33 UTC, Sukadev Bhattiprolu wrote: > >From 91a77dbea3c909ff15c66cded37f1334304a293d Mon Sep 17 00:00:00 2001 > From: Sukadev Bhattiprolu > Date: Mon, 6 Jan 2020 13:50:02 -0600 > Subject: [PATCH 1/1] powerpc/xmon: don't access ASDR in VMs > > ASDR is HV-privileged and must only be accessed in HV-mode. > Fixes a Program Check (0x700) when xmon in a VM dumps SPRs. > > Signed-off-by: Sukadev Bhattiprolu Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/c2a20711fc181e7f22ee5c16c28cb9578af84729 cheers
Re: [PATCH v3 1/2] powerpc32/booke: consistently return phys_addr_t in __pa()
On Mon, 2020-01-06 at 04:29:53 UTC, yingjie_...@126.com wrote: > From: Bai Yingjie > > When CONFIG_RELOCATABLE=y is set, VIRT_PHYS_OFFSET is a 64bit variable, > thus __pa() returns a 64bit value. > But when CONFIG_RELOCATABLE=n, __pa() returns a 32bit value. > > When CONFIG_PHYS_64BIT is set, __pa() should consistently return a > 64bit value regardless of CONFIG_RELOCATABLE. > So we'd make __pa() consistently return phys_addr_t, which is 64bit > when CONFIG_PHYS_64BIT is set. > > Signed-off-by: Bai Yingjie Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/6ad4afc97bc6c5cca9786030492ddfab871ce79e cheers
Re: [PATCH 10/10] powerpc/powernv: use resource_size
On Wed, 2020-01-01 at 17:49:50 UTC, Julia Lawall wrote:
> Use resource_size rather than a verbose computation on
> the end and start fields.
>
> The semantic patch that makes these changes is as follows:
> (http://coccinelle.lip6.fr/)
>
> @@
> struct resource ptr;
> @@
> - (ptr.end - ptr.start + 1)
> + resource_size(&ptr)
>
> Signed-off-by: Julia Lawall

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/552aa086944a9aeabd599892007c2c7faedb894e

cheers
Re: [PATCH 05/10] powerpc/83xx: use resource_size
On Wed, 2020-01-01 at 17:49:45 UTC, Julia Lawall wrote:
> Use resource_size rather than a verbose computation on
> the end and start fields.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
>
> @@
> struct resource ptr;
> @@
> - (ptr.end - ptr.start + 1)
> + resource_size(&ptr)
>
> Signed-off-by: Julia Lawall

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bfbe37f0ce994e7a9945653d7624fadc5c500a9f

cheers
Re: [PATCH 1/4] misc: cxl: use mmgrab
On Sun, 2019-12-29 at 15:42:55 UTC, Julia Lawall wrote:
> Mmgrab was introduced in commit f1f1007644ff ("mm: add new mmgrab()
> helper") and most of the kernel was updated to use it. Update a
> remaining file.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
>
> @@
> expression e;
> @@
> - atomic_inc(&e->mm_count);
> + mmgrab(e);
>
> Signed-off-by: Julia Lawall

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/30e813cf46ccaeea6508607632e49b4a1d743d2a

cheers
Re: [PATCH 09/16] powerpc/mpic: constify copied structure
On Wed, 2020-01-01 at 07:43:27 UTC, Julia Lawall wrote: > The mpic_ipi_chip and mpic_irq_ht_chip structures are only copied > into other structures, so make them const. > > The opportunity for this change was found using Coccinelle. > > Signed-off-by: Julia Lawall Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/5084ff33cac0988c1b979814501dcc2e1ecbf9c0 cheers
Re: [PATCH] powerpc/mm: don't log user reads to 0xffffffff
On Mon, 2019-12-23 at 07:54:22 UTC, Christophe Leroy wrote: > Running vdsotest leaves the following log message many times: > > [ 79.629901] vdsotest[396]: User access of kernel address () - > exploit attempt? (uid: 0) > > A pointer set to (-1) is likely a programming error similar to > a NULL pointer and is not worth logging as an exploit attempt. > > Don't log user accesses to 0x. > > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/0f9aee0cb9da7db7d96f63cfa2dc5e4f1bffeb87 cheers
Re: [PATCH 1/2] powerpc/book3s64/hash: Disable 16M linear mapping size if not aligned
On Tue, 2019-12-24 at 06:41:25 UTC, Russell Currey wrote: > With STRICT_KERNEL_RWX on in a relocatable kernel under the hash MMU, if > the position the kernel is loaded at is not 16M aligned, the kernel > miscalculates its ALIGN*()s and things go horribly wrong. > > We can easily avoid this when selecting the linear mapping size, so do > so and print a warning. I tested this for various alignments and as > long as the position is 64K aligned it's fine (the base requirement for > powerpc). > > Signed-off-by: Russell Currey Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/970d54f99ceac5bbf27929cb5ebfe18338ba1543 cheers
Re: [PATCH v5 01/17] powerpc/32: replace MTMSRD() by mtmsr
On Sat, 2019-12-21 at 08:32:22 UTC, Christophe Leroy wrote: > On PPC32, MTMSRD() is simply defined as mtmsr. > > Replace MTMSRD(reg) by mtmsr reg in files dedicated to PPC32, > this makes the code less obscure. > > Signed-off-by: Christophe Leroy Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/39bccfd164970557c5cfc60b2db029f70542549f cheers
Re: [PATCH v3] powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2
On Fri, 2019-12-06 at 03:17:22 UTC, Jordan Niethe wrote: > Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with > KVM") introduced a number of workarounds as coming out of a guest with > the mmu enabled would make the cpu start running in hypervisor > state with the PID value from the guest. The cpu will then start > prefetching for the hypervisor with that PID value. > > In Power9 DD2.2 the cpu behaviour was modified to fix this. When > accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will > not be performed. This means that we can get rid of the workarounds for > Power9 DD2.2 and later revisions. Add a new cpu feature > CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed. > > Signed-off-by: Jordan Niethe Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/736bcdd3a9fc672af33fb83230ecd0570ec38ec6 cheers
Re: [PATCH v4 1/8] powerpc/32: Add VDSO version of getcpu on non SMP
On Mon, 2019-12-02 at 07:57:27 UTC, Christophe Leroy wrote:
> Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
> getcpu() for PPC64 only, by making use of a user readable general
> purpose SPR.
>
> PPC32 doesn't have any such SPR.
>
> For non SMP, just return CPU id 0 from the VDSO directly.
> PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.
>
> Before the patch, vdsotest reported:
>
>   getcpu: syscall: 1572 nsec/call
>   getcpu:    libc: 1787 nsec/call
>   getcpu:    vdso: not tested
>
> Now, vdsotest reports:
>
>   getcpu: syscall: 1582 nsec/call
>   getcpu:    libc:  502 nsec/call
>   getcpu:    vdso:  187 nsec/call
>
> Signed-off-by: Christophe Leroy

Patches 1, 2 and 4-8, applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/902137ba8e469ed07c7f120a390161937a6288fb

cheers
Re: [PATCH] powerpc/devicetrees: Change 'gpios' to 'cs-gpios' on fsl,spi nodes
On Thu, 2019-11-28 at 12:16:35 UTC, Christophe Leroy wrote: > Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO > descriptors"), the preferred way to define chipselect GPIOs is using the > 'cs-gpios' property instead of the legacy 'gpios' property. > > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/8c452a889821ca0cd2a5f2e3e87fbc01e56408cb cheers
Re: [PATCH v2] powerpc/8xx: Fix permanently mapped IMMR region.
On Tue, 2019-11-26 at 13:16:50 UTC, Christophe Leroy wrote: > When not using large TLBs, the IMMR region is still > mapped as a whole block in the FIXMAP area. > > Properly report that the IMMR region is block-mapped even > when not using large TLBs. > > Signed-off-by: Christophe Leroy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/991d656d722dbc783481f408d6e4cbcce2e8bb78 cheers
Re: [PATCH v2 1/2] powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.
On Tue, 2019-11-26 at 17:43:29 UTC, Christophe Leroy wrote: > Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but > it has a breakpoint support based on a set of comparators which > allow more flexibility. > > Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint") > implemented breakpoints by emulating the DABR behaviour. It did > this by setting one comparator to match 4 bytes at the breakpoint address > and the other comparator to match 4 bytes at breakpoint address + 4. > > Rewrite 8xx hw_breakpoint to make breakpoints match all addresses > defined by the breakpoint address and length by making full use of > comparators. > > Now, comparator E is set to match any address greater than breakpoint > address minus one. Comparator F is set to match any address lower than > breakpoint address plus breakpoint length. Addresses are aligned > to 32 bits. > > When the breakpoint range starts at address 0, the breakpoint is set > to match comparator F only. When the breakpoint range ends at address > 0x, the breakpoint is set to match comparator E only. > Otherwise the breakpoint is set to match comparators E and F. > > At the same time, use registers bit names instead of hardcoded values. > > Signed-off-by: Christophe Leroy > Cc: Ravi Bangoria Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/39413ae009674c6ba745850515b551bbb9d6374b cheers
Re: [PATCH] macintosh: Fix Kconfig indentation
On Wed, 2019-11-20 at 13:41:15 UTC, Krzysztof Kozlowski wrote: > Adjust indentation from spaces to tab (+optional two spaces) as in > coding style with command like: > $ sed -e 's/^/\t/' -i */Kconfig > > Signed-off-by: Krzysztof Kozlowski Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/25dd118f4b2773490df12b9d190c1956cf3250c3 cheers
Re: [PATCH v2 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE
On Thu, 2019-11-21 at 13:49:08 UTC, Frederic Barrat wrote: > The pci_dn structure used to store a pointer to the struct pci_dev, so > taking a reference on the device was required. However, the pci_dev > pointer was later removed from the pci_dn structure, but the reference > was kept for the npu device. > See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary > pcidev from pci_dn"). > > We don't need to take a reference on the device when assigning the PE > as the struct pnv_ioda_pe is cleaned up at the same time as > the (physical) device is released. Doing so prevents the device from > being released, which is a problem for opencapi devices, since we want > to be able to remove them through PCI hotplug. > > Now the ugly part: nvlink npu devices are not meant to be > released. Because of the above, we've always leaked a reference and > simply removing it now is dangerous and would likely require more > work. There's currently no release device callback for nvlink devices > for example. So to be safe, this patch leaks a reference on the npu > device, but only for nvlink and not opencapi. > > CC: a...@ozlabs.ru > CC: ooh...@gmail.com > Signed-off-by: Frederic Barrat Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/05dd7da76986937fb288b4213b1fa10dbe0d1b33 cheers
Re: [PATCH] powerpc/xive: Drop extern qualifiers from header function prototypes
On Fri, 2019-11-15 at 18:10:58 UTC, Greg Kurz wrote: > As reported by ./scripts/checkpatch.pl --strict: > > CHECK: extern prototypes should be avoided in .h files > > Signed-off-by: Greg Kurz Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/b059c63620fbba8a5da60f01d99d003681447e3c cheers
Re: [PATCH v2 1/2] powerpc/powernv: Rework exports to support subnodes
On Fri, 2019-11-01 at 06:26:10 UTC, Oliver O'Halloran wrote: > Originally we only had a handful of exported memory ranges, but we'd like to > export the per-core trace buffers. This results in a lot of files in the > exports directory which is a bit unfortunate. We can clean things up a bit > by turning subnodes into subdirectories of the exports directory. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/db93361260e2411f854514e8b54b6c31a5b2d5bd cheers
Re: [PATCH 1/2] powerpc/xmon: Allow passing an argument to ppc_md.restart()
On Fri, 2019-11-01 at 08:55:21 UTC, Oliver O'Halloran wrote: > On PowerNV a few different kinds of reboot are supported. We'd like to be > able to exercise these from xmon so allow 'zr' to take an argument, and > pass that to the ppc_md.restart() function. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/2d9b332d99b24f27f77e9ba38ce3c8beb11a81c0 cheers
Re: [PATCH 1/3] powernv/pci: Use pnv_phb as the private data for debugfs entries
On Thu, 2019-09-12 at 05:29:43 UTC, Oliver O'Halloran wrote: > Use the pnv_phb structure as the private data pointer for the debugfs > files. This lets us delete some code and an open-coded use of > hose->private_data. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/22ba7289079be12c85895fee41602139e9553c93 cheers
Re: [PATCH] powerpc/eeh: Only dump stack once if an MMIO loop is detected
On Wed, 2019-10-16 at 01:25:36 UTC, Oliver O'Halloran wrote: > Many drivers don't check for errors when they get a 0xFFs response from an > MMIO load. As a result after an EEH event occurs a driver can get stuck in > a polling loop unless it has some kind of internal timeout logic. > > Currently EEH tries to detect and report stuck drivers by dumping a stack > trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an > already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump > would occur every few seconds if the driver was spinning in a loop. This > results in a lot of spurious stack traces in the kernel log. > > Fix this by limiting it to printing one stack trace for each PE freeze. If > the driver is truly stuck the kernel's hung task detector is better suited > to reporting the problem anyway. > > Cc: Sam Bobroff > Signed-off-by: Oliver O'Halloran Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/4e0942c0302b5ad76b228b1a7b8c09f658a1d58a cheers
Re: [PATCH 1/3] powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
On Wed, 2019-08-21 at 06:26:53 UTC, Oliver O'Halloran wrote: > When disabling virtual functions on an SR-IOV adapter we currently do not > correctly remove the EEH state for the now-dead virtual functions. When > removing the pci_dn that was created for the VF when SR-IOV was enabled > we free the corresponding eeh_dev without removing it from the child device > list of the eeh_pe that contained it. This can result in crashes due to the > use-after-free. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1fb4124ca9d456656a324f1ee29b7bf942f59ac8 cheers
Re: [PATCH 1/2] powerpc/64s: reimplement power4_idle code in C
On Thu, 2019-07-11 at 02:24:03 UTC, Nicholas Piggin wrote: > This implements the tricky tracing and soft irq handling bits in C, > leaving the low level bit to asm. > > A functional difference is that this redirects the interrupt exit to > a return stub to execute blr, rather than the lr address itself. This > is probably barely measurable on real hardware, but it keeps the link > stack balanced. > > Tested with QEMU. > > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/ed0bc98f8cbe4f8254759d333a47aedc816ff8c5 cheers
Re: [PATCH 1/5] powerpc/eeh_cache: Don't use pci_dn when inserting new ranges
On Mon, 2019-07-15 at 08:56:08 UTC, Oliver O'Halloran wrote: > At the point where we start inserting ranges into the EEH address cache the > binding between pci_dev and eeh_dev has already been set up. Instead of > consulting the pci_dn tree we can retrieve the eeh_dev directly using > pci_dev_to_eeh_dev(). > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/b1268f4cdba71c0cd40b533778812340d36de8ae cheers
Re: [PATCH] KVM: PPC: Book3S HV: XIVE: Fix typo in comment
On Wed, 2019-07-03 at 22:03:19 UTC, Greg Kurz wrote: > Cc: triv...@kernel.org > Signed-off-by: Greg Kurz Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/6a3163212f311daaf2ca3b676db2e11cfd81c6b3 cheers
Re: powerpc Linux scv support and scv system call ABI proposal
Adhemerval Zanella's on January 29, 2020 3:26 am:
> On 28/01/2020 11:05, Nicholas Piggin wrote:
>> Florian Weimer's on January 28, 2020 11:09 pm:
>>> * Nicholas Piggin:
>>>
>>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all
>>>>   other vectors will return -ENOSYS, and the decision for how to add
>>>>   support for a new vector deferred until we see the next user.
>>>
>>> Seems reasonable. We don't have to decide this today.
>>>
>>>> * Proposal is for scv 0 to provide the standard Linux system call ABI
>>>>   with some differences:
>>>>
>>>> - LR is volatile across scv calls. This is necessary for support
>>>>   because the scv instruction clobbers LR.
>>>
>>> I think we can express this in the glibc system call assembler wrapper
>>> generators. The mcount profiling wrappers already have this property.
>>>
>>> But I don't think we are so lucky for the inline system calls. GCC
>>> recognizes an "lr" clobber with inline asm (even though it is not
>>> documented), but it generates rather strange assembler output as a
>>> result:
>>>
>>> long
>>> f (long x)
>>> {
>>>   long y;
>>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>>   return y;
>>> }
>>>
>>>         .abiversion 2
>>>         .section        ".text"
>>>         .align 2
>>>         .p2align 4,,15
>>>         .globl f
>>>         .type   f, @function
>>> f:
>>> .LFB0:
>>>         .cfi_startproc
>>>         mflr 0
>>>         .cfi_register 65, 0
>>> #APP
>>>  # 5 "t.c" 1
>>>         #
>>>  # 0 "" 2
>>> #NO_APP
>>>         std 0,16(1)
>>>         .cfi_offset 65, 16
>>>         ori 2,2,0
>>>         ld 0,16(1)
>>>         mtlr 0
>>>         .cfi_restore 65
>>>         blr
>>>         .long 0
>>>         .byte 0,0,0,1,0,0,0,0
>>>         .cfi_endproc
>>> .LFE0:
>>>         .size   f,.-f
>>>
>>> That's with GCC 8.3 at -O2. I don't understand what the ori is about.
>>
>> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
>> which had dispatch grouping rules.
>
> It's worth noting that it aims to mitigate a load-hit-store cpu stall
> on some powerpc chips.
>
>>> I don't think we can save LR in a regular register around the system
>>> call, explicitly in the inline asm statement, because we still have to
>>> generate proper unwinding information using CFI directives, something
>>> that you cannot do from within the asm statement.
>>>
>>> Supporting this in GCC should not be impossible, but someone who
>>> actually knows this stuff needs to look at it.
>>
>> The generated assembler actually seems okay to me. If we compile
>> something like a syscall and with -mcpu=power9:
>>
>> long
>> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
>> {
>>         register long r0 asm ("r0") = _r0;
>>         register long r3 asm ("r3") = _r3;
>>         register long r4 asm ("r4") = _r4;
>>         register long r5 asm ("r5") = _r5;
>>         register long r6 asm ("r6") = _r6;
>>         register long r7 asm ("r7") = _r7;
>>         register long r8 asm ("r8") = _r8;
>>
>>         asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6),
>>              "r"(r7), "r"(r8) : "lr", "ctr", "cc", "xer");
>>
>>         return r3;
>> }
>>
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         mflr 0
>>         std 0,16(1)
>>         .cfi_offset 65, 16
>>         mr 0,9
>> #APP
>>  # 12 "a.c" 1
>>         # scv
>>  # 0 "" 2
>> #NO_APP
>>         ld 0,16(1)
>>         mtlr 0
>>         .cfi_restore 65
>>         blr
>>         .long 0
>>         .byte 0,0,0,1,0,0,0,0
>>         .cfi_endproc
>>
>> That gets the LR save/restore right when we're also using r0.
>>
>>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would
>>>>   allow the system call exit to avoid restoring the CR register.
>>>
>>> This sounds reasonable, but I don't know what kind of knock-on effects
>>> this has. The inline system call wrappers can handle this with minor
>>> tweaks.
>>
>> Okay, good. In the end we would have to check code trace through the
>> kernel and libc of course, but I think there's little to no opportunity
>> to take advantage of current extra non-volatile cr regs.
>>
>> mtcr has to write 8 independently renamed registers so it's cracked into
>> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
>> much in the scheme of a system call, but while we can tweak the ABI...
>
> We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
> powerpc, we can use a 'bns+' plus a neg instead as:
>
> --
> #define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5,  \
>                           arg6)                                         \
>   ({                                                                    \
>     register long int r0 __asm__ ("r0") = (long int) (name);            \
>     register long int r3 __asm__ ("r3") = (long int) (arg1);            \
>     register long int r4 __asm__ ("r4") = (long int) (arg2);            \
>     register long int r5 __asm__ ("r5") = (long int) (arg3);            \
>     register long int r6 __asm__ ("r6") = (long int) (arg4);
Re: powerpc Linux scv support and scv system call ABI proposal
Florian Weimer's on January 29, 2020 1:58 am: > * Nicholas Piggin: > >> That gets the LR save/restore right when we're also using r0. > > Yes, I agree it looks good. Nice. > >>> But the kernel uses the -errno convention internally, so I think it >>> would make sense to pass this to userspace and not convert back and >>> forth. This would align with what most of the architectures do, and >>> also avoids the GCC oddity. >> >> Yes I would be interested in opinions for this option. It seems like >> matching other architectures is a good idea. Maybe there are some >> reasons not to. > > If there were a POWER-specific system call that uses all result bits and > doesn't have room for the 4096 error states (or an error number that's > outside that range), that would be a blocker. I can't find such a > system call wrapped in the glibc sources. Nothing apparent in the kernel sources either. > musl's inline syscalls always > convert the errno state to -errno, so it's not possible to use such a > system call there. > - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit calls if there was interest in developing an ABI for 32-bit programs. Marginal benefit in avoiding compat syscall selection. >>> >>> We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to >>> provide an ELFv1 port for this given that it's POWER9-specific. >> >> Okay. There's no reason not to enable this for BE, at least for the >> kernel it's no additional work so it probably remains enabled (unless >> there is something really good we could do with the ABI if we exclude >> ELFv1 but I don't see anything). >> >> But if glibc only builds for ELFv2 support that's probably reasonable. > > To be clear, we still support ELFv1 for POWER, but given that this > feature is POWER9 and later, I expect the number of users benefiting > from 32-bit support (or ELFv1 and thus big-endian support) to be quite > small. 
> > Especially if we go the conditional branch route, I would restrict this > to ELFv2 in glibc, at least for default builds. > >>> From the glibc perspective, the major question is how we handle run-time >>> selection of the system call instruction sequence. On i386, we use a >>> function pointer in the TCB to call an instruction sequence in the vDSO. >>> That's problematic from a security perspective. I expect that on >>> POWER9, using a pointer in read-only memory would be equally >>> non-attractive due to a similar lack of PC-relative addressing. We >>> could use the HWCAP bit in the TCB, but that would add another (easy to >>> predict) conditional branch to every system call. >> >> I would have to defer to glibc devs on this. Conditional branch >> should be acceptable I think, scv improves speed as much as several >> mispredicted branches (about 90 cycles). > > But we'd have to pay for that branch (and likely the LR clobber) on > legacy POWER, and that's something to consider, too. We would, that's true. > Is there an additional performance hit if a process uses both the old > and new system call sequence? No state or logic required to switch between them or run them concurrently. Just the extra instruction footprint. Thanks, Nick
[PATCH RESEND] powerpc: indent to improve Kconfig readability
From: Randy Dunlap Indent a Kconfig continuation line to improve readability. Signed-off-by: Randy Dunlap Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-next-20200128.orig/arch/powerpc/Kconfig +++ linux-next-20200128/arch/powerpc/Kconfig @@ -478,7 +478,7 @@ config MPROFILE_KERNEL config HOTPLUG_CPU bool "Support for enabling/disabling CPUs" depends on SMP && (PPC_PSERIES || \ - PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE) + PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE) help Say Y here to be able to disable and re-enable individual CPUs at runtime on SMP machines.
Re: [PATCH v2 07/10] powerpc/configs/skiroot: Enable security features
Joel Stanley writes: > On Tue, 21 Jan 2020 at 04:30, Michael Ellerman wrote: >> >> From: Joel Stanley >> >> This turns on HARDENED_USERCOPY with HARDENED_USERCOPY_PAGESPAN, and >> FORTIFY_SOURCE. >> >> It also enables SECURITY_LOCKDOWN_LSM with _EARLY and >> LOCK_DOWN_KERNEL_FORCE_INTEGRITY options enabled. This still allows >> xmon to be used in read-only mode. >> >> MODULE_SIG is selected by lockdown, so it is still enabled. >> >> Signed-off-by: Joel Stanley >> [mpe: Switch to lockdown integrity mode per oohal] >> Signed-off-by: Michael Ellerman > > I did some testing and with this change we break kexec. As it's critical > for this kernel to be able to kexec we need to set KEXEC_FILE=y if > we're setting FORCE_INTEGRITY=y. > > I've tested your series with that modification made and userspace was > once again able to kexec (with -s). Have the changes that enable this landed in kexec-lite and petitboot yet? I had to manually patch them when I was experimenting with it recently... Regards, Daniel > > Cheers, > > Joel > >> --- >> arch/powerpc/configs/skiroot_defconfig | 11 ++- >> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> v2: Switch to lockdown integrity mode rather than confidentiality as noticed >> by >> dja and discussed with jms and oohal.
>> >> diff --git a/arch/powerpc/configs/skiroot_defconfig >> b/arch/powerpc/configs/skiroot_defconfig >> index 24a210fe0049..93b478436a2b 100644 >> --- a/arch/powerpc/configs/skiroot_defconfig >> +++ b/arch/powerpc/configs/skiroot_defconfig >> @@ -49,7 +49,6 @@ CONFIG_JUMP_LABEL=y >> CONFIG_STRICT_KERNEL_RWX=y >> CONFIG_MODULES=y >> CONFIG_MODULE_UNLOAD=y >> -CONFIG_MODULE_SIG=y >> CONFIG_MODULE_SIG_FORCE=y >> CONFIG_MODULE_SIG_SHA512=y >> CONFIG_PARTITION_ADVANCED=y >> @@ -272,6 +271,16 @@ CONFIG_NLS_ASCII=y >> CONFIG_NLS_ISO8859_1=y >> CONFIG_NLS_UTF8=y >> CONFIG_ENCRYPTED_KEYS=y >> +CONFIG_SECURITY=y >> +CONFIG_HARDENED_USERCOPY=y >> +# CONFIG_HARDENED_USERCOPY_FALLBACK is not set >> +CONFIG_HARDENED_USERCOPY_PAGESPAN=y >> +CONFIG_FORTIFY_SOURCE=y >> +CONFIG_SECURITY_LOCKDOWN_LSM=y >> +CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y >> +CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y >> +# CONFIG_INTEGRITY is not set >> +CONFIG_LSM="yama,loadpin,safesetid,integrity" >> # CONFIG_CRYPTO_HW is not set >> CONFIG_CRC16=y >> CONFIG_CRC_ITU_T=y >> -- >> 2.21.1 >>
Re: [PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup
Scott Cheloha writes: > LMB lookup is currently an O(n) linear search. This scales poorly when > there are many LMBs. > > If we cache each LMB by both its base address and its DRC index > in an xarray we can cut lookups to O(log n), greatly accelerating > drmem initialization and memory hotplug. > > This patch introduces two xarrays of LMBs and fills them during > drmem initialization. The patch also adds two interfaces for LMB > lookup. Good but can you replace the array of LMBs altogether (drmem_info->lmbs)? xarray allows iteration over the members if needed.
[PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup
LMB lookup is currently an O(n) linear search. This scales poorly when there are many LMBs. If we cache each LMB by both its base address and its DRC index in an xarray we can cut lookups to O(log n), greatly accelerating drmem initialization and memory hotplug. This patch introduces two xarrays of LMBs and fills them during drmem initialization. The patch also adds two interfaces for LMB lookup. The first interface, drmem_find_lmb_by_base_addr(), is employed in hot_add_drconf_scn_to_nid() to replace a linear search. This speeds up memory_add_physaddr_to_nid(), which is called by lmb_set_nid(), an interface used during drmem initialization and memory hotplug. The second interface, drmem_find_lmb_by_drc_index(), is employed in get_lmb_range() to replace a linear search. This speeds up dlpar_memory_add_by_ic() and dlpar_memory_remove_by_ic(), interfaces used during memory hotplug. These substitutions yield significant improvements: 1. A POWER9 VM with a maximum memory of 10TB and 256MB LMBs has 40960 LMBs. With this patch it completes drmem_init() ~1138ms faster. Before: [0.542244] drmem: initializing drmem v1 [1.768787] drmem: initialized 40960 LMBs After: [0.543611] drmem: initializing drmem v1 [0.631386] drmem: initialized 40960 LMBs 2. A POWER9 VM with a maximum memory of 4TB and 256MB LMBs has 16384 LMBs. Via the qemu monitor we can hot-add memory as virtual DIMMs. Each DIMM is 256 LMBs. With this patch we hot-add every possible LMB about 60 seconds faster. Before: [ 17.422177] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 8100 [...] [ 167.285563] pseries-hotplug-mem: Memory at 3fff000 (drc index 80003fff) was hot-added After: [ 14.753480] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 8100 [...] [ 103.934092] pseries-hotplug-mem: Memory at 3fff000 (drc index 80003fff) was hot-added Signed-off-by: Scott Cheloha --- These linear searches become a serious bottleneck as the machine approaches 64TB.
There are just too many LMBs to use a linear search. On a 60TB machine we recently saw the following soft lockup during drmem_init(): [ 60.602386] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [swapper/0:1] [ 60.602414] Modules linked in: [ 60.602417] Supported: No, Unreleased kernel [ 60.602423] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.3.18-2-default #1 SLE15-SP2 (unreleased) [ 60.602426] NIP: c0095c0c LR: c0095bb0 CTR: [ 60.602430] REGS: c00022c7fc497830 TRAP: 0901 Not tainted (5.3.18-2-default) [ 60.602432] MSR: 82009033 CR: 44000244 XER: [ 60.602442] CFAR: c0095c18 IRQMASK: 0 GPR00: c0095bb0 c00022c7fc497ac0 c162cc00 c0003bff5e08 GPR04: c0ea539a 002f 1000 GPR08: c7fc5f59ffb8 14bc5000 c7fc5f1f1a30 c14f2fb8 GPR12: c0001e980600 [ 60.602464] NIP [c0095c0c] hot_add_scn_to_nid+0xbc/0x400 [ 60.602467] LR [c0095bb0] hot_add_scn_to_nid+0x60/0x400 [ 60.602470] Call Trace: [ 60.602473] [c00022c7fc497ac0] [c0095bb0] hot_add_scn_to_nid+0x60/0x400 (unreliable) [ 60.602478] [c00022c7fc497b20] [c007a6a0] memory_add_physaddr_to_nid+0x20/0x60 [ 60.602483] [c00022c7fc497b40] [c10235a4] drmem_init+0x258/0x2d8 [ 60.602485] [c00022c7fc497c10] [c0010694] do_one_initcall+0x64/0x300 [ 60.602489] [c00022c7fc497ce0] [c10144f8] kernel_init_freeable+0x2e8/0x3fc [ 60.602491] [c00022c7fc497db0] [c0010b0c] kernel_init+0x2c/0x160 [ 60.602497] [c00022c7fc497e20] [c000b960] ret_from_kernel_thread+0x5c/0x7c [ 60.602498] Instruction dump: [ 60.602501] 7d0a4214 7faa4040 419d0328 e92a0010 71290088 2fa90008 409e001c e92a [ 60.602506] 7fbe4840 419c0010 7d274a14 7fbe4840 <419c00e4> 394a0018 7faa4040 409dffd0 This patch should eliminate the drmem_init() bottleneck during boot. One other important thing to note is that this only addresses part of the slowdown during memory hotplug when there are many LMBs. A far larger part of it is caused by the linear memblock search in find_memory_block(). 
That problem is addressed with this patch: https://lore.kernel.org/lkml/20200121231028.13699-1-chel...@linux.ibm.com/ which is in linux-next. The numbers I quote here in the commit message for time improvements during hotplug are taken with that patch applied. Without it, hotplug is even slower. arch/powerpc/include/asm/drmem.h | 3 ++ arch/powerpc/mm/drmem.c | 33 +++ arch/powerpc/mm/numa.c| 30 +++-- .../platforms/pseries/hotplug-memory.c| 11 ++- 4 files changed, 50 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/incl
Re: powerpc Linux scv support and scv system call ABI proposal
On Tue, 28 Jan 2020, Florian Weimer wrote: > I don't think we can save LR in a regular register around the system > call, explicitly in the inline asm statement, because we still have to > generate proper unwinding information using CFI directives, something > that you cannot do from within the asm statement. What other architectures in glibc have done for code sequences for syscalls that are problematic for compiler-generated CFI is made the C syscall macros call separate functions defined in a .S file (see sysdeps/unix/sysv/linux/arm/libc-do-syscall.S, sysdeps/unix/sysv/linux/i386/libc-do-syscall.S, sysdeps/unix/sysv/linux/mips/mips32/mips-syscall[567].S). I don't know if you can do that in this case and still get the performance benefits of the new instruction. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH v6 03/10] perf/core: open access to probes for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Open access to monitoring via kprobes and uprobes and eBPF tracing for > CAP_PERFMON privileged process. Providing the access under CAP_PERFMON > capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes > chances to misuse the credentials and makes operation more secure. > > perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses > ftrace to define new kprobe events, and those events are treated as > tracepoint events. eBPF defines new probes via perf_event_open interface > and then the probes are used in eBPF tracing. > > CAP_PERFMON implements the principal of least privilege for performance > monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle > of least privilege: A security design principle that states that a process or > program be granted only those privileges (e.g., capabilities) necessary to > accomplish its legitimate function, and only for the time that such privileges > are actually required) > > For backward compatibility reasons access to perf_events subsystem remains > open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for > secure perf_events monitoring is discouraged with respect to CAP_PERFMON > capability. > > Signed-off-by: Alexey Budankov > --- > kernel/events/core.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Acked-by: James Morris -- James Morris
Re: [PATCH v6 07/10] powerpc/perf: open access for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > Signed-off-by: Alexey Budankov > --- > arch/powerpc/perf/imc-pmu.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c > index cb50a9e1fd2d..e837717492e4 100644 > --- a/arch/powerpc/perf/imc-pmu.c > +++ b/arch/powerpc/perf/imc-pmu.c > @@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event) > if (event->attr.type != event->pmu->type) > return -ENOENT; > > - if (!capable(CAP_SYS_ADMIN)) > + if (!perfmon_capable()) > return -EACCES; > > /* Sampling not supported */ > @@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event > *event) > if (event->attr.type != event->pmu->type) > return -ENOENT; > > - if (!capable(CAP_SYS_ADMIN)) > + if (!perfmon_capable()) > return -EACCES; > > /* Return if this is a couting event */ > Acked-by: James Morris -- James Morris
Re: [PATCH v6 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Signed-off-by: Alexey Budankov > --- > kernel/trace/bpf_trace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index e5ef4ae9edb5..334f1d71ebb1 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event > *event, void __user *info) > u32 *ids, prog_cnt, ids_len; > int ret; > > - if (!capable(CAP_SYS_ADMIN)) > + if (!perfmon_capable()) > return -EPERM; > if (event->attr.type != PERF_TYPE_TRACEPOINT) > return -EINVAL; > Acked-by: James Morris -- James Morris
Re: [PATCH v6 01/10] capabilities: introduce CAP_PERFMON to kernel and user space
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Signed-off-by: Alexey Budankov > --- > include/linux/capability.h | 4 > include/uapi/linux/capability.h | 8 +++- > security/selinux/include/classmap.h | 4 ++-- > 3 files changed, 13 insertions(+), 3 deletions(-) Acked-by: James Morris -- James Morris
Re: [PATCH v6 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Open access to monitoring for CAP_PERFMON privileged process. Providing > the access under CAP_PERFMON capability singly, without the rest of > CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and > makes operation more secure. > > CAP_PERFMON implements the principal of least privilege for performance > monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle > of least privilege: A security design principle that states that a process > or program be granted only those privileges (e.g., capabilities) necessary > to accomplish its legitimate function, and only for the time that such > privileges are actually required) > > For backward compatibility reasons access to the monitoring remains open > for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure > monitoring is discouraged with respect to CAP_PERFMON capability. > > Signed-off-by: Alexey Budankov > --- > drivers/oprofile/event_buffer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Acked-by: James Morris > > diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c > index 12ea4a4ad607..6c9edc8bbc95 100644 > --- a/drivers/oprofile/event_buffer.c > +++ b/drivers/oprofile/event_buffer.c > @@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct > file *file) > { > int err = -EPERM; > > - if (!capable(CAP_SYS_ADMIN)) > + if (!perfmon_capable()) > return -EPERM; > > if (test_and_set_bit_lock(0, &buffer_opened)) > -- James Morris
Re: [PATCH v6 09/10] drivers/perf: open access for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Open access to monitoring for CAP_PERFMON privileged process. > Providing the access under CAP_PERFMON capability singly, without the > rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the > credentials and makes operation more secure. > > CAP_PERFMON implements the principal of least privilege for performance > monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle > of least privilege: A security design principle that states that a process > or program be granted only those privileges (e.g., capabilities) necessary > to accomplish its legitimate function, and only for the time that such > privileges are actually required) > > For backward compatibility reasons access to the monitoring remains open > for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure > monitoring is discouraged with respect to CAP_PERFMON capability. > > Signed-off-by: Alexey Budankov > --- > drivers/perf/arm_spe_pmu.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > Acked-by: James Morris > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index 4e4984a55cd1..5dff81bc3324 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event > *event) > if (!attr->exclude_kernel) > reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT); > > - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN)) > + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) > reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); > > return reg; > @@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event > *event) > return -EOPNOTSUPP; > > reg = arm_spe_event_to_pmscr(event); > - if (!capable(CAP_SYS_ADMIN) && > + if (!perfmon_capable() && > (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) | > BIT(SYS_PMSCR_EL1_CX_SHIFT) | > BIT(SYS_PMSCR_EL1_PCT_SHIFT > -- James Morris
Re: [PATCH v6 08/10] parisc/perf: open access for CAP_PERFMON privileged process
On Tue, 28 Jan 2020, Alexey Budankov wrote: > > Open access to monitoring for CAP_PERFMON privileged process. > Providing the access under CAP_PERFMON capability singly, without the > rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the > credentials and makes operation more secure. > > CAP_PERFMON implements the principal of least privilege for performance > monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle > of least privilege: A security design principle that states that a process > or program be granted only those privileges (e.g., capabilities) necessary > to accomplish its legitimate function, and only for the time that such > privileges are actually required) > > For backward compatibility reasons access to the monitoring remains open > for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure > monitoring is discouraged with respect to CAP_PERFMON capability. > > Signed-off-by: Alexey Budankov > --- > arch/parisc/kernel/perf.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c > index 676683641d00..c4208d027794 100644 > --- a/arch/parisc/kernel/perf.c > +++ b/arch/parisc/kernel/perf.c > @@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char > __user *buf, > else > return -EFAULT; > > - if (!capable(CAP_SYS_ADMIN)) > + if (!perfmon_capable()) > return -EACCES; > > if (count != sizeof(uint32_t)) > Acked-by: James Morris -- James Morris
Re: powerpc Linux scv support and scv system call ABI proposal
On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote: > * Segher Boessenkool: > > >> > I don't think we can save LR in a regular register around the system > >> > call, explicitly in the inline asm statement, because we still have to > >> > generate proper unwinding information using CFI directives, something > >> > that you cannot do from within the asm statement. > > > > Why not? > > As far as I know, there isn't a CFI directive that allows us to restore > the CFI state at the end of the inline assembly. If we say that LR is > stored in a different register than what the rest of the function uses, > that would lead to incorrect CFI after the exit of the inline assembler > fragment. > > At least that's what I think. Compilers aren't really my thing. .cfi_restore? Or .cfi_remember_state / .cfi_restore_state, that is probably easiest in inline assembler. > >> > GCC does not model the condition registers, > > > > Huh? It does model the condition register, as 8 registers in GCC's > > internal model (one each for CR0..CR7). > > But GCC doesn't expose them as integers to C code, so you can't do much > without them. Sure, it doesn't expose any other registers directly, either. > >> > We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to > >> > provide an ELFv1 port for this given that it's POWER9-specific. > > > > We *do* have a 32-bit LE ABI. And ELFv1 is not 32-bit either. Please > > don't confuse these things :-) > > > > The 64-bit LE kernel does not really support 32-bit userland (or BE > > userland), *that* is what you want to say. > > Sorry for the confusion. Is POWER9 running kernels which are not 64-bit > LE really a thing in practice, though? Linux only really supports 64-bit LE userland on p9. Anything else is not supported. > >> > From the glibc perspective, the major question is how we handle run-time > >> > selection of the system call instruction sequence. > > > > Well, if it is inlined you don't have this problem either!
:-) > > How so? We would have to put the conditional sequence into all inline > system calls, of course. Ah, if you support older systems in your program as well, gotcha. That is not the usual case (just like people use -mcpu=power9 frequently, which means the resulting program will not run on any older CPU). Segher
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
> On Jan 28, 2020, at 12:47 PM, Catalin Marinas wrote: > > The primary goal here is not finding regressions but having clearly > defined semantics of the page table accessors across architectures. x86 > and arm64 are a good starting point and other architectures will be > enabled as they are aligned to the same semantics. This still does not answer the fundamental question. If this test is simply ineffective at finding bugs, who will spend time running it regularly? If it is just a one-off test that may get run once in a few years (when introducing a new arch), how does that justify the ongoing cost of maintaining it? I do agree there could be a need to clearly define these semantics, but that belongs in documentation rather than a test. It is confusing to mix this with other config options that have a somewhat different purpose; it will be a waste of time for people who mistakenly enable this for regular automated testing and never find any bug from it.
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Mon, Jan 27, 2020 at 09:11:53PM -0500, Qian Cai wrote: > On Jan 27, 2020, at 8:28 PM, Anshuman Khandual > wrote: > > This adds tests which will validate architecture page table helpers and > > other accessors in their compliance with expected generic MM semantics. > > This will help various architectures in validating changes to existing > > page table helpers or addition of new ones. [...] > What’s the value of this block of new code? It only supports x86 and > arm64 which are supposed to be good now. Did those tests ever find any > regression or this is almost only useful for new architectures which > only happened once in a few years? The primary goal here is not finding regressions but having clearly defined semantics of the page table accessors across architectures. x86 and arm64 are a good starting point and other architectures will be enabled as they are aligned to the same semantics. See for example this past discussion: https://lore.kernel.org/linux-mm/20190628102003.ga56...@arrakis.emea.arm.com/ These tests should act as the 'contract' between the generic mm code and the architecture port. Without clear semantics, some bugs may be a lot subtler than a boot failure. FTR, I fully support this patch (and I should get around to review it properly; thanks for the reminder ;)). -- Catalin
Re: powerpc Linux scv support and scv system call ABI proposal
On 28/01/2020 11:05, Nicholas Piggin wrote: > Florian Weimer's on January 28, 2020 11:09 pm: >> * Nicholas Piggin: >> >>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other >>> vectors will return -ENOSYS, and the decision for how to add support for >>> a new vector deferred until we see the next user. >> >> Seems reasonable. We don't have to decide this today. >> >>> * Proposal is for scv 0 to provide the standard Linux system call ABI with >>> some >>> differences: >>> >>> - LR is volatile across scv calls. This is necessary for support because the >>> scv instruction clobbers LR. >> >> I think we can express this in the glibc system call assembler wrapper >> generators. The mcount profiling wrappers already have this property. >> >> But I don't think we are so lucky for the inline system calls. GCC >> recognizes an "lr" clobber with inline asm (even though it is not >> documented), but it generates rather strange assembler output as a >> result: >> >> long >> f (long x) >> { >> long y; >> asm ("#" : "=r" (y) : "r" (x) : "lr"); >> return y; >> } >> >> .abiversion 2 >> .section".text" >> .align 2 >> .p2align 4,,15 >> .globl f >> .type f, @function >> f: >> .LFB0: >> .cfi_startproc >> mflr 0 >> .cfi_register 65, 0 >> #APP >> # 5 "t.c" 1 >> # >> # 0 "" 2 >> #NO_APP >> std 0,16(1) >> .cfi_offset 65, 16 >> ori 2,2,0 >> ld 0,16(1) >> mtlr 0 >> .cfi_restore 65 >> blr >> .long 0 >> .byte 0,0,0,1,0,0,0,0 >> .cfi_endproc >> .LFE0: >> .size f,.-f >> >> >> That's with GCC 8.3 at -O2. I don't understand what the ori is about. > > ori 2,2,0 is the group terminating nop hint for POWER8 type cores > which had dispatch grouping rules. It worth to note that it aims to mitigate a load-hit-store cpu stall on some powerpc chips. 
> >> >> I don't think we can save LR in a regular register around the system >> call, explicitly in the inline asm statement, because we still have to >> generate proper unwinding information using CFI directives, something >> that you cannot do from within the asm statement. >> >> Supporting this in GCC should not be impossible, but someone who >> actually knows this stuff needs to look at it. > > The generated assembler actually seems okay to me. If we compile > something like a syscall and with -mcpu=power9: > > long > f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0) > { > register long r0 asm ("r0") = _r0; > register long r3 asm ("r3") = _r3; > register long r4 asm ("r4") = _r4; > register long r5 asm ("r5") = _r5; > register long r6 asm ("r6") = _r6; > register long r7 asm ("r7") = _r7; > register long r8 asm ("r8") = _r8; > > asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), > "r"(r8) : "lr", "ctr", "cc", "xer"); > > return r3; > } > > > f: > .LFB0: > .cfi_startproc > mflr 0 > std 0,16(1) > .cfi_offset 65, 16 > mr 0,9 > #APP > # 12 "a.c" 1 > # scv > # 0 "" 2 > #NO_APP > ld 0,16(1) > mtlr 0 > .cfi_restore 65 > blr > .long 0 > .byte 0,0,0,1,0,0,0,0 > .cfi_endproc > > That gets the LR save/restore right when we're also using r0. > >> >>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the >>> system call exit to avoid restoring the CR register. >> >> This sounds reasonable, but I don't know what kind of knock-on effects >> this has. The inline system call wrappers can handle this with minor >> tweaks. > > Okay, good. In the end we would have to check code trace through the > kernel and libc of course, but I think there's little to no opportunity > to take advantage of current extra non-volatile cr regs. > > mtcr has to write 8 independently renamed registers so it's cracked into > 2 insns on POWER9 (and likely to always be a bit troublesome). 
It's not > much in the scheme of a system call, but while we can tweak the ABI... We don't really need a mfcr/mfocr to implement the Linux syscall ABI on powerpc, we can use a 'bns+' plus a neg instead as: -- #define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5, \ arg6) \ ({\ register long int r0 __asm__ ("r0") = (long int) (name); \ register long int r3 __asm__ ("r3") = (long int) (arg1); \ register long int r4 __asm__ ("r4") = (long int) (arg2); \ register long int r5 __asm__ ("r5") = (long int) (arg3); \ register long int r6 __asm__ ("r6") = (long int) (arg4); \ register long int r7 __asm__ ("r7") = (long int) (arg5); \ register long int r8 __asm__ ("r8") = (long int) (arg6); \ __asm__ __volatile__
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On 1/27/20 7:33 PM, Qian Cai wrote: > >>> What’s the value of this block of new code? It only supports x86 and arm64 >>> which are supposed to be good now. >> We have been over the usefulness of this code many times before as the patch >> is >> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc >> and >> ppc32. There are build time or runtime problems with other archs which >> prevent > I am not sure if I care too much about arc and ppc32 which are pretty much > legacy > platforms. You really need to brush up on your definition and knowledge of what "legacy" means. ARC is actively maintained and used by several customers, some in arch/arc/plat* and some not in there. It is present in broadband routers used by major ISPs, massively multicore deep packet inspection systems from EZChip, and many more. Sure, you may not care about them, but the maintainers for the platforms do. It would have been better if you had spent the time and energy improving the code over 12 revisions rather than bike shedding. -Vineet
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :

This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
levels, along with populating intermediate entries with the next page
table page and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
right after page_alloc_init_late().

This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along
with CONFIG_VM_DEBUG. Architectures willing to subscribe to this test also
need to select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE, which for now is limited
to x86 and arm64. Going forward, other architectures too can enable this
after fixing build or runtime problems (if any) with their page table
helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config,
which will just trigger this test during boot. Any non-conformity here
will be reported as a warning which would need to be fixed. This test will
help catch any changes to the agreed upon semantics expected from generic
MM and enable platforms to accommodate it thereafter.

[...]

Tested-by: Christophe Leroy #PPC32

Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k pages
and book3e/64

Reviewed-by: Ingo Molnar
Suggested-by: Catalin Marinas
Signed-off-by: Andrew Morton
Signed-off-by: Christophe Leroy
Signed-off-by: Anshuman Khandual
---
[...]
diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index ..f3f8111edbe3
--- /dev/null
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -0,0 +1,35 @@
+#
+# Feature name:  debug-vm-pgtable
+# Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
+# description:   arch supports pgtable tests for semantics compliance
+#
+    ---
+    |        arch | status |
+    ---
+    |       alpha: | TODO |
+    |         arc: | ok   |
+    |         arm: | TODO |
+    |       arm64: | ok   |
+    |         c6x: | TODO |
+    |        csky: | TODO |
+    |       h8300: | TODO |
+    |     hexagon: | TODO |
+    |        ia64: | TODO |
+    |        m68k: | TODO |
+    |  microblaze: | TODO |
+    |        mips: | TODO |
+    |       nds32: | TODO |
+    |       nios2: | TODO |
+    |    openrisc: | TODO |
+    |      parisc: | TODO |
+    |  powerpc/32: | ok   |
+    |  powerpc/64: | TODO |

You can change the two above lines by:

    powerpc: ok

+    |       riscv: | TODO |
+    |        s390: | TODO |
+    |          sh: | TODO |
+    |       sparc: | TODO |
+    |          um: | TODO |
+    |   unicore32: | TODO |
+    |         x86: | ok   |
+    |      xtensa: | TODO |
+    ---

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1ec34e16ed65..253dcab0bebc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -120,6 +120,7 @@ config PPC
 	#
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32

Remove the 'if PPC32' as we now know it also works on PPC64.

 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 0b6c4042942a..fb0e76d254b3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
 struct mm_struct;

+#define mm_p4d_folded mm_p4d_folded
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return !pgtable_l5_enabled();
+}
+

For me this should be part of another patch, it is not directly linked to
the tests.
 void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
 void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 798ea36a0549..e0b04787e789 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
 # define PAGE_KERNEL_EXEC PAGE_KERNEL
 #endif

+#ifdef CONFIG_DEBUG_VM_PGTABLE

Not sure it is a good idea to put that in include/asm-generic/pgtable.h.
By doing this you are forcing a reb
Re: powerpc Linux scv support and scv system call ABI proposal
* Segher Boessenkool:

>> > I don't think we can save LR in a regular register around the system
>> > call, explicitly in the inline asm statement, because we still have to
>> > generate proper unwinding information using CFI directives, something
>> > that you cannot do from within the asm statement.
>
> Why not?

As far as I know, there isn't a CFI directive that allows us to restore
the CFI state at the end of the inline assembly. If we say that LR is
stored in a different register than what the rest of the function uses,
that would lead to incorrect CFI after the exit of the inline assembler
fragment.

At least that's what I think. Compilers aren't really my thing.

>> >> - Error handling: use of CR0[SO] to indicate error requires a mtcr /
>> >>   mtocr instruction on the kernel side, and it is currently not
>> >>   implemented well in glibc, requiring a mfcr (mfocr should be possible
>> >>   and asm goto support would allow a better implementation). Is it
>> >>   worth continuing this style of error handling? Or just move to -ve
>> >>   return means error? Using a different bit would allow the kernel to
>> >>   piggy back the CR return code setting with a test for the error case
>> >>   exit.
>> >
>> > GCC does not model the condition registers,
>
> Huh? It does model the condition register, as 8 registers in GCC's
> internal model (one each for CR0..CR7).

But GCC doesn't expose them as integers to C code, so you can't do much
with them.

>> >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>> >>   calls if there was interest in developing an ABI for 32-bit programs.
>> >>   Marginal benefit in avoiding compat syscall selection.
>> >
>> > We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to
>> > provide an ELFv1 port for this given that it's POWER9-specific.
>
> We *do* have a 32-bit LE ABI. And ELFv1 is not 32-bit either.
> Please don't confuse these things :-)
>
> The 64-bit LE kernel does not really support 32-bit userland (or BE
> userland), *that* is what you want to say.

Sorry for the confusion. Is POWER9 running kernels which are not 64-bit LE
really a thing in practice, though?

>> > From the glibc perspective, the major question is how we handle run-time
>> > selection of the system call instruction sequence.
>
> Well, if it is inlined you don't have this problem either! :-)

How so? We would have to put the conditional sequence into all inline
system calls, of course.

Thanks,
Florian
Re: powerpc Linux scv support and scv system call ABI proposal
* Nicholas Piggin: > That gets the LR save/restore right when we're also using r0. Yes, I agree it looks good. Nice. >> But the kernel uses the -errno convention internally, so I think it >> would make sense to pass this to userspace and not convert back and >> forth. This would align with what most of the architectures do, and >> also avoids the GCC oddity. > > Yes I would be interested in opinions for this option. It seems like > matching other architectures is a good idea. Maybe there are some > reasons not to. If there were a POWER-specific system call that uses all result bits and doesn't have room for the 4096 error states (or an error number that's outside that range), that would be a blocker. I can't find such a system call wrapped in the glibc sources. musl's inline syscalls always convert the errno state to -errno, so it's not possible to use such a system call there. >>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit >>> calls if there was interest in developing an ABI for 32-bit programs. >>> Marginal benefit in avoiding compat syscall selection. >> >> We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to >> provide an ELFv1 port for this given that it's POWER9-specific. > > Okay. There's no reason not to enable this for BE, at least for the > kernel it's no additional work so it probably remains enabled (unless > there is something really good we could do with the ABI if we exclude > ELFv1 but I don't see anything). > > But if glibc only builds for ELFv2 support that's probably reasonable. To be clear, we still support ELFv1 for POWER, but given that this feature is POWER9 and later, I expect the number of users benefiting from 32-bit support (or ELFv1 and thus big-endian support) to be quite small. Especially if we go the conditional branch route, I would restrict this to ELFv2 in glibc, at least for default builds. 
>> From the glibc perspective, the major question is how we handle run-time >> selection of the system call instruction sequence. On i386, we use a >> function pointer in the TCB to call an instruction sequence in the vDSO. >> That's problematic from a security perspective. I expect that on >> POWER9, using a pointer in read-only memory would be equally >> non-attractive due to a similar lack of PC-relative addressing. We >> could use the HWCAP bit in the TCB, but that would add another (easy to >> predict) conditional branch to every system call. > > I would have to defer to glibc devs on this. Conditional branch > should be acceptable I think, scv improves speed as much as several > mispredicted branches (about 90 cycles). But we'd have to pay for that branch (and likely the LR clobber) on legacy POWER, and that's something to consider, too. Is there an additional performance hit if a process uses both the old and new system call sequence? Thanks, Florian
Re: powerpc Linux scv support and scv system call ABI proposal
On Wed, Jan 29, 2020 at 12:05:40AM +1000, Nicholas Piggin wrote: > Florian Weimer's on January 28, 2020 11:09 pm: > > But I don't think we are so lucky for the inline system calls. GCC > > recognizes an "lr" clobber with inline asm (even though it is not > > documented), but it generates rather strange assembler output as a > > result: > > std 0,16(1) > > ori 2,2,0 > > ld 0,16(1) > > That's with GCC 8.3 at -O2. I don't understand what the ori is about. > > ori 2,2,0 is the group terminating nop hint for POWER8 type cores > which had dispatch grouping rules. Yup. GCC generates that here to force the load into a different scheduling group than the corresponding store is, because that otherwise would cause very expensive pipeline flushes. It does that if it knows it is the same address (like here). > > I don't think we can save LR in a regular register around the system > > call, explicitly in the inline asm statement, because we still have to > > generate proper unwinding information using CFI directives, something > > that you cannot do from within the asm statement. Why not? > >> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr > >> instruction on the kernel side, and it is currently not implemented well > >> in glibc, requiring a mfcr (mfocr should be possible and asm goto support > >> would allow a better implementation). Is it worth continuing this style > >> of > >> error handling? Or just move to -ve return means error? Using a different > >> bit would allow the kernel to piggy back the CR return code setting with > >> a test for the error case exit. > > > > GCC does not model the condition registers, Huh? It does model the condition register, as 8 registers in GCC's internal model (one each for CR0..CR7). There is no way to use CR0 across function calls, with our ABIs: it is a volatile register. GCC does not model the SO bits in the CR fields. 
If the calling convention would only use registers GCC *does* know about, we can have a builtin for this, so that you can get better inlining etc., no need for an assembler wrapper. > > But the kernel uses the -errno convention internally, so I think it > > would make sense to pass this to userspace and not convert back and > > forth. This would align with what most of the architectures do, and > > also avoids the GCC oddity. > > Yes I would be interested in opinions for this option. It seems like > matching other architectures is a good idea. Maybe there are some > reasons not to. Agreed with you both here. > >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit > >> calls if there was interest in developing an ABI for 32-bit programs. > >> Marginal benefit in avoiding compat syscall selection. > > > > We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to > > provide an ELFv1 port for this given that it's POWER9-specific. We *do* have a 32-bit LE ABI. And ELFv1 is not 32-bit either. Please don't confuse these things :-) The 64-bit LE kernel does not really support 32-bit userland (or BE userland), *that* is what you want to say. > > From the glibc perspective, the major question is how we handle run-time > > selection of the system call instruction sequence. Well, if it is inlined you don't have this problem either! :-) Segher
Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards
On 28 January 2020 at 3:16 pm, Rob Herring wrote: On Tue, Jan 28, 2020 at 2:01 AM Christian Zigotzky wrote: Hi All, Which mailing list is responsible for the pata_pcmcia driver? We are using new SanDisk High (>8G) CF cards with this driver [1] and we need the following line in the file "drivers/ata/pata_pcmcia.c". +PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),/* SanDisk High (>8G) CFA */ Run get_maintainers.pl and it will answer that for you: $ scripts/get_maintainer.pl -f drivers/ata/pata_pcmcia.c Bartlomiej Zolnierkiewicz (maintainer:LIBATA PATA DRIVERS) Jens Axboe (maintainer:LIBATA PATA DRIVERS) linux-...@vger.kernel.org (open list:LIBATA PATA DRIVERS) linux-ker...@vger.kernel.org (open list) Thank you!
Re: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
On Sun, 26 Jan 2020 22:52:47 +1100, Michael Ellerman wrote: > There's an OF helper called of_dma_is_coherent(), which checks if a > device has a "dma-coherent" property to see if the device is coherent > for DMA. > > But on some platforms devices are coherent by default, and on some > platforms it's not possible to update existing device trees to add the > "dma-coherent" property. > > So add a Kconfig symbol to allow arch code to tell > of_dma_is_coherent() that devices are coherent by default, regardless > of the presence of the property. > > Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. > when the system has a coherent cache. > > Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper") > Cc: sta...@vger.kernel.org # v3.16+ > Reported-by: Christian Zigotzky > Tested-by: Christian Zigotzky > Signed-off-by: Michael Ellerman > --- > arch/powerpc/Kconfig | 1 + > drivers/of/Kconfig | 4 > drivers/of/address.c | 6 +- > 3 files changed, 10 insertions(+), 1 deletion(-) > Applied, thanks. Rob
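The logic the patch adds is small but easy to get backwards; a userspace model of it (hypothetical names, not the kernel's actual of_dma_is_coherent() signature) might look like:

```c
#include <stdbool.h>

/* Userspace sketch of the patch's logic: when the platform selects
 * OF_DMA_DEFAULT_COHERENT, of_dma_is_coherent() reports "coherent"
 * even without a "dma-coherent" DT property; otherwise the property
 * decides.  `default_coherent` stands in for
 * IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT) and
 * `has_dma_coherent_prop` for the device-tree property lookup. */
static bool dma_is_coherent(bool default_coherent, bool has_dma_coherent_prop)
{
    if (default_coherent)
        return true;
    return has_dma_coherent_prop;
}
```

The point of the Kconfig symbol is exactly this short-circuit: platforms whose device trees cannot be updated still get the correct answer.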
[PATCH 4.9 160/271] perf/ioctl: Add check for the sample_period value
From: Ravi Bangoria

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, a negative sample_period could cause recursive PMIs
leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bbf7537a6121..64ace5e9af2a0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4624,6 +4624,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	if (perf_event_check_period(event, value))
 		return -EINVAL;
 
+	if (!event->attr.freq && (value & (1ULL << 63)))
+		return -EINVAL;
+
 	event_function_call(event, __perf_event_period, &value);
 
 	return 0;
-- 
2.20.1
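The added check is a one-liner; as a standalone sketch (hypothetical helper name, modelling only the new test, not the rest of perf_event_period()) it amounts to:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the backported check: for a sampling
 * (non-freq) event, reject any period with bit 63 set, i.e. require
 * the period to fit in 63 bits, matching the limit perf_event_open()
 * already enforces. */
static bool period_ioctl_allowed(bool freq, uint64_t value)
{
    if (!freq && (value & (1ULL << 63)))
        return false;   /* the kernel would return -EINVAL here */
    return true;
}
```

With this shape, a period of `1ULL << 63` (which the PMU would interpret as a huge negative value) is rejected, while `(1ULL << 63) - 1` remains the largest accepted period.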
Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards
On Tue, Jan 28, 2020 at 2:01 AM Christian Zigotzky wrote: > > Hi All, > > Which mailing list is responsible for the pata_pcmcia driver? We are > using new SanDisk High (>8G) CF cards with this driver [1] and we need > the following line in the file "drivers/ata/pata_pcmcia.c". > > +PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),/* SanDisk High > (>8G) CFA */ Run get_maintainers.pl and it will answer that for you: $ scripts/get_maintainer.pl -f drivers/ata/pata_pcmcia.c Bartlomiej Zolnierkiewicz (maintainer:LIBATA PATA DRIVERS) Jens Axboe (maintainer:LIBATA PATA DRIVERS) linux-...@vger.kernel.org (open list:LIBATA PATA DRIVERS) linux-ker...@vger.kernel.org (open list)
Re: powerpc Linux scv support and scv system call ABI proposal
Florian Weimer's on January 28, 2020 11:09 pm:
> * Nicholas Piggin:
>
>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>   vectors will return -ENOSYS, and the decision for how to add support for
>>   a new vector deferred until we see the next user.
>
> Seems reasonable. We don't have to decide this today.
>
>> * Proposal is for scv 0 to provide the standard Linux system call ABI
>>   with some differences:
>>
>> - LR is volatile across scv calls. This is necessary for support because
>>   the scv instruction clobbers LR.
>
> I think we can express this in the glibc system call assembler wrapper
> generators. The mcount profiling wrappers already have this property.
>
> But I don't think we are so lucky for the inline system calls. GCC
> recognizes an "lr" clobber with inline asm (even though it is not
> documented), but it generates rather strange assembler output as a
> result:
>
> long
> f (long x)
> {
>   long y;
>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>   return y;
> }
>
>         .abiversion 2
>         .section ".text"
>         .align 2
>         .p2align 4,,15
>         .globl f
>         .type f, @function
> f:
> .LFB0:
>         .cfi_startproc
>         mflr 0
>         .cfi_register 65, 0
> #APP
>  # 5 "t.c" 1
>         #
>  # 0 "" 2
> #NO_APP
>         std 0,16(1)
>         .cfi_offset 65, 16
>         ori 2,2,0
>         ld 0,16(1)
>         mtlr 0
>         .cfi_restore 65
>         blr
>         .long 0
>         .byte 0,0,0,1,0,0,0,0
>         .cfi_endproc
> .LFE0:
>         .size f,.-f
>
> That's with GCC 8.3 at -O2. I don't understand what the ori is about.

ori 2,2,0 is the group terminating nop hint for POWER8 type cores
which had dispatch grouping rules.

> I don't think we can save LR in a regular register around the system
> call, explicitly in the inline asm statement, because we still have to
> generate proper unwinding information using CFI directives, something
> that you cannot do from within the asm statement.
>
> Supporting this in GCC should not be impossible, but someone who
> actually knows this stuff needs to look at it.

The generated assembler actually seems okay to me.
If we compile something like a syscall and with -mcpu=power9:

long
f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
{
        register long r0 asm ("r0") = _r0;
        register long r3 asm ("r3") = _r3;
        register long r4 asm ("r4") = _r4;
        register long r5 asm ("r5") = _r5;
        register long r6 asm ("r6") = _r6;
        register long r7 asm ("r7") = _r7;
        register long r8 asm ("r8") = _r8;

        asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7),
             "r"(r8) : "lr", "ctr", "cc", "xer");

        return r3;
}

f:
.LFB0:
        .cfi_startproc
        mflr 0
        std 0,16(1)
        .cfi_offset 65, 16
        mr 0,9
#APP
 # 12 "a.c" 1
        # scv
 # 0 "" 2
#NO_APP
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
        .long 0
        .byte 0,0,0,1,0,0,0,0
        .cfi_endproc

That gets the LR save/restore right when we're also using r0.

>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow
>>   the system call exit to avoid restoring the CR register.
>
> This sounds reasonable, but I don't know what kind of knock-on effects
> this has. The inline system call wrappers can handle this with minor
> tweaks.

Okay, good. In the end we would have to check code trace through the
kernel and libc of course, but I think there's little to no opportunity
to take advantage of current extra non-volatile cr regs.

mtcr has to write 8 independently renamed registers so it's cracked into
2 insns on POWER9 (and likely to always be a bit troublesome). It's not
much in the scheme of a system call, but while we can tweak the ABI...
> GCC does not model the condition registers, so for inline system calls,
> we have to produce a value anyway that the subsequent C code can check.
> The assembler syscall wrappers do not need to do this, of course, but
> I'm not sure which category of interfaces is more important.

Right. asm goto can improve this kind of pattern: if it's inlined into
the C code which tests the result, it can branch using the flags to the
C error handling label, rather than move the flags into a GPR, test the
GPR, and branch. However...

> But the kernel uses the -errno convention internally, so I think it
> would make sense to pass this to userspace and not convert back and
> forth. This would align with what most of the arch
Re: powerpc Linux scv support and scv system call ABI proposal
* Nicholas Piggin:

> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>   vectors will return -ENOSYS, and the decision for how to add support for
>   a new vector deferred until we see the next user.

Seems reasonable. We don't have to decide this today.

> * Proposal is for scv 0 to provide the standard Linux system call ABI
>   with some differences:
>
> - LR is volatile across scv calls. This is necessary for support because
>   the scv instruction clobbers LR.

I think we can express this in the glibc system call assembler wrapper
generators. The mcount profiling wrappers already have this property.

But I don't think we are so lucky for the inline system calls. GCC
recognizes an "lr" clobber with inline asm (even though it is not
documented), but it generates rather strange assembler output as a
result:

long
f (long x)
{
  long y;
  asm ("#" : "=r" (y) : "r" (x) : "lr");
  return y;
}

        .abiversion 2
        .section ".text"
        .align 2
        .p2align 4,,15
        .globl f
        .type f, @function
f:
.LFB0:
        .cfi_startproc
        mflr 0
        .cfi_register 65, 0
#APP
 # 5 "t.c" 1
        #
 # 0 "" 2
#NO_APP
        std 0,16(1)
        .cfi_offset 65, 16
        ori 2,2,0
        ld 0,16(1)
        mtlr 0
        .cfi_restore 65
        blr
        .long 0
        .byte 0,0,0,1,0,0,0,0
        .cfi_endproc
.LFE0:
        .size f,.-f

That's with GCC 8.3 at -O2. I don't understand what the ori is about.

I don't think we can save LR in a regular register around the system
call, explicitly in the inline asm statement, because we still have to
generate proper unwinding information using CFI directives, something
that you cannot do from within the asm statement.

Supporting this in GCC should not be impossible, but someone who
actually knows this stuff needs to look at it.

> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow
>   the system call exit to avoid restoring the CR register.

This sounds reasonable, but I don't know what kind of knock-on effects
this has. The inline system call wrappers can handle this with minor
tweaks.
> - Error handling: use of CR0[SO] to indicate error requires a mtcr /
>   mtocr instruction on the kernel side, and it is currently not
>   implemented well in glibc, requiring a mfcr (mfocr should be possible
>   and asm goto support would allow a better implementation). Is it worth
>   continuing this style of error handling? Or just move to -ve return
>   means error? Using a different bit would allow the kernel to piggy
>   back the CR return code setting with a test for the error case exit.

GCC does not model the condition registers, so for inline system calls,
we have to produce a value anyway that the subsequent C code can check.
The assembler syscall wrappers do not need to do this, of course, but
I'm not sure which category of interfaces is more important.

But the kernel uses the -errno convention internally, so I think it
would make sense to pass this to userspace and not convert back and
forth. This would align with what most of the architectures do, and
also avoids the GCC oddity.

> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>   calls if there was interest in developing an ABI for 32-bit programs.
>   Marginal benefit in avoiding compat syscall selection.

We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to
provide an ELFv1 port for this given that it's POWER9-specific.

From the glibc perspective, the major question is how we handle run-time
selection of the system call instruction sequence. On i386, we use a
function pointer in the TCB to call an instruction sequence in the vDSO.
That's problematic from a security perspective. I expect that on
POWER9, using a pointer in read-only memory would be equally
non-attractive due to a similar lack of PC-relative addressing. We
could use the HWCAP bit in the TCB, but that would add another (easy to
predict) conditional branch to every system call.
I don't think it matters whether both system call variants use the same error convention because we could have different error code extraction code on the two branches. Thanks, Florian
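For reference, the -errno convention discussed in this thread is usually unwrapped in userspace with a single range check; a minimal sketch (hypothetical helper name, using the conventional [-4095, -1] error window) could be:

```c
#include <errno.h>

/* Hypothetical helper: convert the kernel's raw "-errno" return value
 * into the C-level convention (errno set, -1 returned).  Raw values in
 * [-4095, -1] are treated as error codes -- the same cutoff glibc and
 * musl use -- and everything else is a successful result. */
static long syscall_ret(unsigned long raw)
{
    if (raw > -4096UL) {            /* raw is in [-4095, -1] */
        errno = (int)-(long)raw;
        return -1;
    }
    return (long)raw;
}
```

This single unsigned comparison is why returning -errno directly from the kernel is cheap for libc: no condition-register bit has to be extracted at all.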
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
> On Jan 28, 2020, at 7:10 AM, Mike Rapoport wrote:
>
> Aren't x86 and arm64 not decent enough?
> Even if this test could be used to detect regressions only on these two
> platforms, the test is valuable.

The question is: does it detect regressions well enough? Where is the
list of past bugs that it has found? It is the usual deal for unproven
debugging features to remain out of tree at first, keep gathering unique
bugs they have found, and then justify mainline inclusion with enough
data.
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
Hello Qian, On Mon, Jan 27, 2020 at 10:33:08PM -0500, Qian Cai wrote: > > > On Jan 27, 2020, at 10:06 PM, Anshuman Khandual > > wrote: > > > > enablement of this test (for the moment) but then the goal is to integrate > > all > > of them going forward. The test not only validates platform's adherence to > > the > > expected semantics from generic MM but also helps in keeping it that way > > during > > code changes in future as well. > > Another option maybe to get some decent arches on board first before merging > this > thing, so it have more changes to catch regressions for developers who might > run this. Aren't x86 and arm64 not decent enough? Even if this test could be used to detect regressions only on these two platforms, the test is valuable. -- Sincerely yours, Mike.
Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
On Tue, Jan 28, 2020 at 02:12:56AM -0500, Qian Cai wrote:
> > On Jan 28, 2020, at 1:13 AM, Christophe Leroy wrote:
> > ppc32 an indecent / legacy platform ? Are you kidding ?
> > Powerquicc II PRO for instance is fully supported by the
> > manufacturer and widely used in many small networking devices.

> Of course I forgot about embedded devices. The problem is that how
> many developers are actually going to run this debug option on
> embedded devices?

Much fewer if the code isn't upstream than if it is. This isn't
something that every developer is going to enable all the time, but that
doesn't mean it's not useful; it's more for people doing work on the
architectures or on memory management (or who suspect they're running
into a relevant problem), and I'm sure some of the automated testing
people will enable it. The more barriers there are in place to getting
the testsuite up and running, the less likely it is that any of these
groups will run it regularly.
Re: [FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates
Christian Zigotzky writes: > On 24 January 2020 at 12:42 pm, Michael Ellerman wrote: >> Ulf Hansson writes: >>> On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky >>> wrote: Hi All, We still need the attached patch for our onboard SD card interface [1,2]. Could you please add this patch to the tree? >>> No, because according to previous discussion that isn't the correct >>> solution and more importantly it will break other archs (if I recall >>> correctly). >>> >>> Looks like someone from the ppc community needs to pick up the ball. >> That's a pretty small community these days :) :/ >> >> Christian can you test this please? I think I got the polarity of all >> the tests right, but it's Friday night so maybe I'm wrong :) >> >> cheers > Michael, > > Thanks a lot for the new patch! I compiled the RC7 of kernel 5.5 with > your patch again yesterday and the kernel works without any problems > with our onboard SD cards. [1] Thanks for testing. cheers
powerpc Linux scv support and scv system call ABI proposal
I would like to enable support for the scv instruction to provide the
Linux system calls. This requires two things to be defined: firstly, how
to advertise support for scv and how to allocate and advertise support
for individual scv vectors; secondly, how to define a Linux system call
ABI with this new instruction.

I have put together a rough proposal along with some options and
questions. Any thoughts or input would be welcome; I have probably missed
some things so please point them out.

(I will be on vacation for two weeks from the end of the week, may not
get to replying immediately)

Thanks,
Nick

System Call Vectored (scv) ABI

The scv instruction is introduced with POWER9 / ISA3, and it comes with
an rfscv counter-part. The benefit of these instructions is performance
(trading slower SRR0/1 for faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can avoid one mtmsrd
instruction). Another benefit is that the ABI can be changed if there is
a good reason to.

The scv instruction has 128 interrupt entry points (not enough to cover
the Linux system call space). The proposal is to assign scv numbers
conservatively. 'scv 0' could be used for the regular Linux system call
ABI initially. Examples of other assignments could be 32-bit compat
system calls, and firmware service calls.

Linux has not enabled FSCR[SCV] yet, so the instruction will trap with
illegal instruction on current environments. Linux has defined a HWCAP2
bit PPC_FEATURE2_SCV for SCV support, but does not set it.

One option is for PPC_FEATURE2_SCV to indicate 'scv 0' support, and a
new HWCAP bit assigned for each new scv vector supported for userspace.
This is the most regular and flexible approach. It requires the most
HWCAP space, but vector usage is not expected to grow quickly.
Another option is for PPC_FEATURE2_SCV to indicate 'scv 0', and other
vectors will each return -ENOSYS; then, when they are assigned to a new
ABI, it will define a particular way they can be queried for support
(which would return something other than -ENOSYS if supported). This
will not require more HWCAP bits, but it's less regular and more
complicated to determine.

* Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all
  other vectors will return -ENOSYS, and the decision for how to add
  support for a new vector deferred until we see the next user.

* Proposal is for scv 0 to provide the standard Linux system call ABI
  with some differences:

- LR is volatile across scv calls. This is necessary for support because
  the scv instruction clobbers LR.

- CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow
  the system call exit to avoid restoring the CR register.

- Error handling: use of CR0[SO] to indicate error requires a mtcr /
  mtocr instruction on the kernel side, and it is currently not
  implemented well in glibc, requiring a mfcr (mfocr should be possible,
  and asm goto support would allow a better implementation). Is it worth
  continuing this style of error handling? Or just move to -ve return
  means error? Using a different bit would allow the kernel to piggy
  back the CR return code setting with a test for the error case exit.

- R2 could be volatile as though it's an external function call, which
  would avoid one store in the system call entry path. However it would
  require the caller to load R2 after the system call returns, where the
  latency of the load can not be overlapped with the costly system call
  exit sequence. On balance, it may be better to keep R2 as
  non-volatile.

- Number of volatile registers available seems sufficient. Linux's 'sc'
  handler is badly constrained here, but that is because it is shared
  between both hypercall and syscall handlers, which have different call
  conventions that share no volatile GPR registers!
r9-r12 should be quite enough.

- Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit calls if there were interest in developing an ABI for 32-bit programs. There would be a marginal benefit in avoiding compat syscall selection.
Re: [PATCH] powerpc: drmem: avoid NULL pointer dereference when drmem is unavailable
Hello Nathan,

On Thu 23-01-20 09:56:10, Nathan Lynch wrote:
> Libor Pechacek writes:
> > In KVM guests drmem structure is only zero initialized. Trying to
> > manipulate DLPAR parameters results in a crash in this environment.
>
> I think this statement needs qualification. Unless I'm mistaken, this
> happens only when you boot a guest without any hotpluggable memory
> configured, and then try to add or remove memory.

Thanks for the review. The introductory statement can indeed be clearer.

[...]

> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index c126b94d1943..4ea6af002e27 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -236,9 +236,9 @@ static int get_lmb_range(u32 drc_index, int n_lmbs,
> >  	if (!start)
> >  		return -EINVAL;
> >
> > -	end = &start[n_lmbs - 1];
> > +	end = &start[n_lmbs];
> >
> > -	last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs - 1];
> > +	last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs];
> >  	if (end > last_lmb)
> >  		return -EINVAL;
>
> Is this not undefined behavior? I'd rather do this in a way that does
> not involve forming out-of-bounds pointers.

Well, this is a tough question for the case when drmem_info->lmbs was not allocated. Given that the array does not exist, what bounds are we talking about? My patch builds on the fact that &NULL[0] is NULL and NULL < NULL is false. Talking about a pointer to one past the last element of a non-existent array is too much philosophy for me.

For the case when drmem_info->lmbs is allocated, last_lmb is a pointer to one past the last element of the array, as Michal mentioned.

> Even if it's safe, naming that pointer "last_lmb" now actively hinders
> understanding of the code; it should be named "limit" or something.

Good catch.

[...]
> arch/powerpc/include/asm/drmem.h | 43 +---
> 1 file changed, 36 insertions(+), 7 deletions(-)
>
> modified arch/powerpc/include/asm/drmem.h
> @@ -20,19 +20,48 @@ struct drmem_lmb {
>
>  struct drmem_lmb_info {
>  	struct drmem_lmb *lmbs;
> -	int n_lmbs;
> +	unsigned int n_lmbs;
>  	u32 lmb_size;
>  };
>
>  extern struct drmem_lmb_info *drmem_info;
>
> -#define for_each_drmem_lmb_in_range(lmb, start, end) \
> -	for ((lmb) = (start); (lmb) <= (end); (lmb)++)
> +static inline bool drmem_present(void)
> +{
> +	return drmem_info->lmbs != NULL;
> +}

Yes, use of this test was also my first idea about the fix.

> +static inline struct drmem_lmb *drmem_lmb_index(unsigned int index)
> +{
> +	if (!drmem_present())
> +		return NULL;
>
> -#define for_each_drmem_lmb(lmb) \
> -	for_each_drmem_lmb_in_range((lmb), \
> -		&drmem_info->lmbs[0], \
> -		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
> +	if (WARN_ON(index >= drmem_info->n_lmbs))
> +		return NULL;

Why is this WARN_ON needed?

> +
> +	return &drmem_info->lmbs[index];
> +}
> +
> +static inline struct drmem_lmb *drmem_first_lmb(void)
> +{
> +	return drmem_lmb_index(0);
> +}
> +
> +static inline struct drmem_lmb *drmem_last_lmb(void)
> +{
> +	if (!drmem_present())
> +		return NULL;
> +
> +	return drmem_lmb_index(drmem_info->n_lmbs - 1);

Is the unsigned integer wraparound intended in the drmem_info->n_lmbs == 0 case?

> +}
> +
> +#define for_each_drmem_lmb(lmb)					\
> +	for ((lmb) = drmem_first_lmb();				\

drmem_first_lmb() is essentially a call to drmem_lmb_index(0). What happens if drmem_info->n_lmbs is zero and drmem_info->lmbs is not NULL?

> +	     (lmb) != NULL && (lmb) <= drmem_last_lmb();	\
> +	     (lmb)++)
> +
> +#define for_each_drmem_lmb_in_range(lmb, start, end)		\
> +	for ((lmb) = (start); (lmb) <= (end); (lmb)++)
>
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data

After all, I don't mind how the bug will be fixed. As you can see, my preference is towards simpler solutions.
In my opinion your solution special-cased drmem_info->lmbs == NULL and opened the doorway to the combination of drmem_info->lmbs != NULL && !drmem_info->n_lmbs. Maybe the condition can never become true, but the code should IMHO be robust enough to handle it.

Thanks!

Libor

-- 
Libor Pechacek
SUSE Labs                                   Remember to have fun...
Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix
On Tue, Jan 28, 2020 at 10:41:02AM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on January 28, 2020 4:08 am:
> > On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
> >> This incremental patch fixes several soft-mask debug and unsafe
> >> smp_processor_id messages due to tracing and false positives in
> >> "unreconciled" code.
> >>
> >> It also fixes a bug with syscall tracing functions that set registers
> >> (e.g., PTRACE_SETREG) not setting GPRs properly.
> >>
> >> There was a bug reported with the TM selftests; I haven't been able
> >> to reproduce that one.
> >>
> >> I can squash this into the main patch and resend the series if it
> >> helps, but the incremental helps to see the bug fixes.
> >
> > There are some whitespace differences between this and the series I have
> > applied locally. What does it apply to?
> >
> > Is there some revision of the patchset I missed?
>
> No, I may have just missed some of your whitespace cleanups, or maybe I got
> some that Michael made which you don't have in his next-test branch.

Looks like the latter. I will pick the patches from next-test.

Thanks

Michal
Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys
Hi Dave,

On 27/01/20 9:12 pm, Dave Hansen wrote:
>
> How have you tested this patch (and the whole series for that matter)?
>

I replaced the second patch with this one and did a build test. Up until v16, I had tested the whole series (build + run) on both a POWER8 system (with 4K and 64K page sizes) and a Skylake SP system, but for x86_64 only. Following that, I could only do a build test locally on my laptop for i386 and x86_64, as I did not have access to the Skylake system anymore.

This is how I tested the build process:

$ cd linux
$ make -C tools/testing/selftests
...
make[1]: Entering directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
...
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...
$ make -C tools/testing/selftests clean
$ make -C tools/testing/selftests/vm
make: Entering directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install
make[1]: Entering directory '/home/sandipan/.devel/linux'
  INSTALL ./usr/include
make[1]: Leaving directory '/home/sandipan/.devel/linux'
...
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...
$ make -C tools/testing/selftests/vm clean
$ make -C tools/testing/selftests/vm protection_keys
make: Entering directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
$ make -C tools/testing/selftests/vm clean
$ make -C tools/testing/selftests/vm protection_keys_32
make: Entering directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
$ make -C tools/testing/selftests/vm protection_keys_64
make: Entering directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'
$ make -C tools/testing/selftests/vm clean
$ cd tools/testing/selftests/vm
$ make
make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install
make[1]: Entering directory '/home/sandipan/.devel/linux'
  INSTALL ./usr/include
make[1]: Leaving directory '/home/sandipan/.devel/linux'
...
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...
$ make clean
$ make protection_keys
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
$ make clean
$ make protection_keys_32
gcc -Wall -I ../../../../usr/include -no-pie -m32 protection_keys.c -lrt -lrt -ldl -lm -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
$ make protection_keys_64
gcc -Wall -I ../../../../usr/include -no-pie -m64 protection_keys.c -lrt -lrt -ldl -o /home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64

- Sandipan
Re: vmlinux ELF header sometimes corrupt
On 22/01/2020 18.52, Rasmus Villemoes wrote:
> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
> very hard to debug problem that maybe someone else has encountered. This
> doesn't happen always, perhaps 1 in 8 times or something like that.
>
> The issue is that when the build gets to do "${CROSS}objcopy -O binary
> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
> that fails with
>
>   powerpc-oe-linux-objcopy: vmlinux: file format not recognized
>
> So I hacked link-vmlinux.sh to stash copies of vmlinux before and after
> "sortextable vmlinux". Both of those are proper ELF files, and comparing
> the corrupted vmlinux to vmlinux.after_sort, they are identical after the
> first 52 bytes; in vmlinux, those first 52 bytes are all 0.
>
> I also saved stat(1) info to see if vmlinux is being replaced or
> modified in-place.
>
> $ cat vmlinux.stat.after_sort
>   File: 'vmlinux'
>   Size: 8608456   Blocks: 16696   IO Block: 4096   regular file
> Device: 811h/2065d   Inode: 21919132   Links: 1
> Access: (0755/-rwxr-xr-x)   Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 10:52:38.946703081 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> $ stat vmlinux
>   File: 'vmlinux'
>   Size: 8608456   Blocks: 16688   IO Block: 4096   regular file
> Device: 811h/2065d   Inode: 21919132   Links: 1
> Access: (0755/-rwxr-xr-x)   Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 17:20:00.650379057 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
>
> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.

So, I think I take that last part back.
I just hit a case where I built the kernel manually, made a copy of vmlinux to vmlinux.copy, and file(1) said both were fine (and cmp(1) agreed they were identical). Then I went off and did work elsewhere with a lot of I/O. When I came back to the linux build dir, vmlinux was broken, exactly as before. So I now suspect it to be some kind of "while the file is in the pagecache, everything is fine, but when it's read back from disk it's broken" problem. My ext4 fs does have inline_data enabled, which could explain why the corruption happens at the beginning of the file. It's just very odd that it only ever seems to trigger for vmlinux and not other files, but perhaps the I/O patterns that ld and/or sortextable produce are exactly what is needed to trigger the bug. I've done a long-overdue kernel update, and there are quite a few fs/ext4/ -stable patches in there, so now I'll see if it still happens. And if anything more comes of this, I'll remove the kbuild and ppc lists from cc; sorry for the noise.

Rasmus
Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards
On Tue, Jan 28, 2020 at 08:58:29AM +0100, Christian Zigotzky wrote:
> Hi All,
>
> Which mailing list is responsible for the pata_pcmcia driver? We are using
> new SanDisk High (>8G) CF cards with this driver [1] and we need the
> following line in the file "drivers/ata/pata_pcmcia.c".
>
> +	PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),	/* SanDisk High (>8G) CFA */

Please send a formal patch for it to linux-...@vger.kernel.org
[PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards
Hi All,

Which mailing list is responsible for the pata_pcmcia driver? We are using new SanDisk High (>8G) CF cards with this driver [1] and we need the following line in the file "drivers/ata/pata_pcmcia.c".

+	PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),	/* SanDisk High (>8G) CFA */

Thanks,
Christian

[1] https://forum.hyperion-entertainment.com/viewtopic.php?f=35&t=4282