[PATCH v2] selftests/powerpc: Fix L1D flushing tests for Power10
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:

- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054

Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.

This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on. The only potential downside is that referring to a
specific perf event requires PMU support implemented in the kernel for
that platform.

Signed-off-by: Russell Currey
---
v2: Move away from raw events as suggested by mpe

 tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
 tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
 tools/testing/selftests/powerpc/security/rfi_flush.c   | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
 
 	entry_flush = entry_flush_orig;
 
-	fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+	fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
 	FAIL_IF(fd < 0);
 
 	p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
 
 #define CACHELINE_SIZE 128
 
+#define PERF_L1D_READ_MISS_CONFIG	((PERF_COUNT_HW_CACHE_L1D) | \
+					(PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+					(PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
 void syscall_loop(char *p, unsigned long iterations,
 		  unsigned long zero_size);
 
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
 
 	rfi_flush = rfi_flush_orig;
 
-	fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+	fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
 	FAIL_IF(fd < 0);
 
 	p = (char *)memalign(zero_size, CACHELINE_SIZE);
-- 
2.30.1
[PATCH] powerpc/perf: Fix handling of privilege level checks in perf interrupt context
Running "perf mem record" on powerpc platforms with selinux enabled
resulted in soft lockups. The below call trace was seen in the logs:

CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
NIP: c0dff3d4 LR: c0dff3d0 CTR:
REGS: c07fffab7d60 TRAP: 0100 Not tainted (5.11.0-rc7+)
<<>>
NIP [c0dff3d4] _raw_spin_lock_irqsave+0x94/0x120
LR [c0dff3d0] _raw_spin_lock_irqsave+0x90/0x120
Call Trace:
[cfd471a0] [cfd47260] 0xcfd47260 (unreliable)
[cfd471e0] [c0b5fbbc] skb_queue_tail+0x3c/0x90
[cfd47220] [c0296edc] audit_log_end+0x6c/0x180
[cfd47260] [c06a3f20] common_lsm_audit+0xb0/0xe0
[cfd472a0] [c066c664] slow_avc_audit+0xa4/0x110
[cfd47320] [c066cff4] avc_has_perm+0x1c4/0x260
[cfd47430] [c066e064] selinux_perf_event_open+0x74/0xd0
[cfd47450] [c0669888] security_perf_event_open+0x68/0xc0
[cfd47490] [c013d788] record_and_restart+0x6e8/0x7f0
[cfd476c0] [c013dabc] perf_event_interrupt+0x22c/0x560
[cfd477d0] [c002d0fc] performance_monitor_exception+0x4c/0x60
[cfd477f0] [c000b378] performance_monitor_common_virt+0x1c8/0x1d0
interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
NIP: c0dff378 LR: c0b5fbbc CTR: c07d47f0
REGS: cfd47860 TRAP: 0f00 Not tainted (5.11.0-rc7+)
<<>>
NIP [c0dff378] _raw_spin_lock_irqsave+0x38/0x120
LR [c0b5fbbc] skb_queue_tail+0x3c/0x90
interrupt: f00
[cfd47b00] [0038] 0x38 (unreliable)
[cfd47b40] [caae6200] 0xcaae6200
[cfd47b80] [c0296edc] audit_log_end+0x6c/0x180
[cfd47bc0] [c029f494] audit_log_exit+0x344/0xf80
[cfd47d10] [c02a2b00] __audit_syscall_exit+0x2c0/0x320
[cfd47d60] [c0032878] do_syscall_trace_leave+0x148/0x200
[cfd47da0] [c003d5b4] syscall_exit_prepare+0x324/0x390
[cfd47e10] [c000d76c] system_call_common+0xfc/0x27c

The above trace shows that while the CPU was handling a performance
monitor exception, there was a call to the "security_perf_event_open"
function. In powerpc core-book3s, this function is called from the
'perf_allow_kernel' check during recording of the data address in the
sample via perf_get_data_addr().
Commit da97e18458fb ("perf_event: Add support for LSM and SELinux
checks") introduced security enhancements to perf. As part of this
commit, the new security hook for perf_event_open was added in all
places where the perf paranoid check was previously used. The powerpc
core-book3s code originally had paranoid checks in 'perf_get_data_addr'
and 'power_pmu_bhrb_read', so the 'perf_paranoid_kernel' checks were
replaced with 'perf_allow_kernel' in these PMU helper functions as well.

The intention of the paranoid checks in core-book3s is to verify
privilege access before capturing some of the sample data. Along with
the paranoid checks, 'perf_allow_kernel' also calls
'security_perf_event_open'. Since these helper functions are invoked
while recording a sample, we end up calling selinux_perf_event_open in
PMI context. Some of the security functions use spinlocks, for example
sidtab_sid2str_put(). If a perf interrupt hits while such a spinlock is
held and we end up calling selinux hook functions in the PMI handler,
this could cause a deadlock.

Since the purpose of this security hook is to control access to
perf_event_open, it is not right to call it in interrupt context. But in
the powerpc PMU, we do need the privilege checks for specific samples
from the branch history ring buffer and for sampling register values.

Reference commits:
Commit cd1231d7035f ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")

As a fix, this patch caches the 'perf_allow_kernel' value at event_init
time in the 'pmu_private' field of the perf_event. The cached value is
then used in the PMI code path.
Suggested-by: Michael Ellerman
Signed-off-by: Athira Rajeev
---
 arch/powerpc/perf/core-book3s.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4b4319d8..9e9f67f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -189,6 +189,11 @@ static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
 	return 0;
 }
 
+static bool event_allow_kernel(struct perf_event *event)
+{
+	return (bool)event->pmu_private;
+}
+
 /*
  * The user wants a data address recorded.
  * If we're not doing instruction sampling, give them the SDAR
@@ -222,7 +227,7 @@ static inline void perf_get_data_addr(struct perf_event *event, struct pt_regs *
 	if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
 		*addrp = mfspr(SPRN_SDAR);
 
-	if (is_kernel_addr(mfspr(SPRN_SDAR)) &&
Re: [PATCH kernel] powerpc/iommu: Annotate nested lock for lockdep
On 18/02/2021 23:59, Frederic Barrat wrote:
> On 16/02/2021 04:20, Alexey Kardashevskiy wrote:
>> The IOMMU table is divided into pools for concurrent mappings and
>> each pool has a separate spinlock. When taking the ownership of an
>> IOMMU group to pass through a device to a VM, we lock these spinlocks
>> which triggers a false negative warning in lockdep (below).
>>
>> This fixes it by annotating the large pool's spinlock as a nest lock.
>>
>> ===
>> WARNING: possible recursive locking detected
>> 5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
>> qemu-system-ppc/4129 is trying to acquire lock:
>> c000119bddb0 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0
>>
>> but task is already holding lock:
>> c000119bdd30 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        lock(&(p->lock)/1);
>>        lock(&(p->lock)/1);
>> ===
>>
>> Signed-off-by: Alexey Kardashevskiy
>> ---
>>  arch/powerpc/kernel/iommu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>> index 557a09dd5b2f..2ee642a6731a 100644
>> --- a/arch/powerpc/kernel/iommu.c
>> +++ b/arch/powerpc/kernel/iommu.c
>> @@ -1089,7 +1089,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
>>  	spin_lock_irqsave(&tbl->large_pool.lock, flags);
>>  	for (i = 0; i < tbl->nr_pools; i++)
>> -		spin_lock(&tbl->pools[i].lock);
>> +		spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
>
> We have the same pattern and therefore should have the same problem in
> iommu_release_ownership().
>
> But as I understand, we're hacking our way around lockdep here, since
> conceptually, those locks are independent. I was wondering why it seems
> to fix it by worrying only about the large pool lock.

This is the other way around - we are telling lockdep not to worry about
the small pool locks if the nest lock (== the large pool lock) is held.
The warning is printed when a nested lock is detected and lockdep checks
if there is a nest for this nested lock at check_deadlock(). That loop
can take many locks (up to 4 with the current config).

> However, if the dma window is less than 1GB, we would only have one,
> so it would make sense for lockdep to stop complaining.
>
> Why would it stop if the large pool is always there? Is it what
> happened? In which case, this patch doesn't really fix it. Or I'm
> missing something :-)

I tried with 1 or 2 small pools, no difference at all. I might also be
missing something here too :)

>   Fred
>
>>  	iommu_table_release_pages(tbl);

--
Alexey
Re: [PATCH v4 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
On Mon, Feb 22, 2021 at 12:16:08PM +0530, Bharata B Rao wrote: > On Wed, Feb 17, 2021 at 11:38:07AM +1100, David Gibson wrote: > > On Mon, Feb 15, 2021 at 12:05:41PM +0530, Bharata B Rao wrote: > > > Implement H_RPT_INVALIDATE hcall and add KVM capability > > > KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same. > > > > > > This hcall does two types of TLB invalidations: > > > > > > 1. Process-scoped invalidations for guests with LPCR[GTSE]=0. > > >This is currently not used in KVM as GTSE is not usually > > >disabled in KVM. > > > 2. Partition-scoped invalidations that an L1 hypervisor does on > > >behalf of an L2 guest. This replaces the uses of the existing > > >hcall H_TLB_INVALIDATE. > > > > > > In order to handle process scoped invalidations of L2, we > > > intercept the nested exit handling code in L0 only to handle > > > H_TLB_INVALIDATE hcall. > > > > > > Signed-off-by: Bharata B Rao > > > --- > > > Documentation/virt/kvm/api.rst | 17 + > > > arch/powerpc/include/asm/kvm_book3s.h | 3 + > > > arch/powerpc/include/asm/mmu_context.h | 11 +++ > > > arch/powerpc/kvm/book3s_hv.c | 91 > > > arch/powerpc/kvm/book3s_hv_nested.c| 96 ++ > > > arch/powerpc/kvm/powerpc.c | 3 + > > > arch/powerpc/mm/book3s64/radix_tlb.c | 25 +++ > > > include/uapi/linux/kvm.h | 1 + > > > 8 files changed, 247 insertions(+) > > > > > > diff --git a/Documentation/virt/kvm/api.rst > > > b/Documentation/virt/kvm/api.rst > > > index 99ceb978c8b0..416c36aa35d4 100644 > > > --- a/Documentation/virt/kvm/api.rst > > > +++ b/Documentation/virt/kvm/api.rst > > > @@ -6038,6 +6038,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit > > > notifications which user space > > > can then handle to implement model specific MSR handling and/or user > > > notifications > > > to inform a user that an MSR was not handled. 
> > > > > > +7.22 KVM_CAP_PPC_RPT_INVALIDATE > > > +-- > > > + > > > +:Capability: KVM_CAP_PPC_RPT_INVALIDATE > > > +:Architectures: ppc > > > +:Type: vm > > > + > > > +This capability indicates that the kernel is capable of handling > > > +H_RPT_INVALIDATE hcall. > > > + > > > +In order to enable the use of H_RPT_INVALIDATE in the guest, > > > +user space might have to advertise it for the guest. For example, > > > +IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is > > > +present in the "ibm,hypertas-functions" device-tree property. > > > + > > > +This capability is always enabled. > > > > I guess that means it's always enabled when it's available - I'm > > pretty sure it won't be enabled on POWER8 or on PR KVM. > > Correct, will reword this and restrict this to POWER9, radix etc > > > > > > + > > > 8. Other capabilities. > > > == > > > > > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > > > b/arch/powerpc/include/asm/kvm_book3s.h > > > index d32ec9ae73bd..0f1c5fa6e8ce 100644 > > > --- a/arch/powerpc/include/asm/kvm_book3s.h > > > +++ b/arch/powerpc/include/asm/kvm_book3s.h > > > @@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, > > > u64 dw1); > > > void kvmhv_release_all_nested(struct kvm *kvm); > > > long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); > > > long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu); > > > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid, > > > + unsigned long type, unsigned long pg_sizes, > > > + unsigned long start, unsigned long end); > > > int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, > > > u64 time_limit, unsigned long lpcr); > > > void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state > > > *hr); > > > diff --git a/arch/powerpc/include/asm/mmu_context.h > > > b/arch/powerpc/include/asm/mmu_context.h > > > index d5821834dba9..fbf3b5b45fe9 100644 > > > --- a/arch/powerpc/include/asm/mmu_context.h > > > +++ 
b/arch/powerpc/include/asm/mmu_context.h > > > @@ -124,8 +124,19 @@ static inline bool need_extra_context(struct > > > mm_struct *mm, unsigned long ea) > > > > > > #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && > > > defined(CONFIG_PPC_RADIX_MMU) > > > extern void radix_kvm_prefetch_workaround(struct mm_struct *mm); > > > +void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid, > > > + unsigned long type, unsigned long page_size, > > > + unsigned long psize, unsigned long start, > > > + unsigned long end); > > > #else > > > static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { > > > } > > > +static inline void do_h_rpt_invalidate(unsigned long pid, > > > +unsigned long lpid, > > > +unsigned long type, > > > +unsigned long page_size, > > > +
Re: [PATCH RFC v1 5/6] xen-swiotlb: convert variables to arrays
On Fri, 19 Feb 2021, Konrad Rzeszutek Wilk wrote:
> On Sun, Feb 07, 2021 at 04:56:01PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 04, 2021 at 09:40:23AM +0100, Christoph Hellwig wrote:
> > > So one thing that has been on my mind for a while: I'd really like
> > > to kill the separate dma ops in Xen swiotlb. If we compare xen-swiotlb
> > > to swiotlb the main difference seems to be:
> > >
> > > - additional reasons to bounce I/O vs the plain DMA capable
> > > - the possibility to do a hypercall on arm/arm64
> > > - an extra translation layer before doing the phys_to_dma and vice
> > >   versa
> > > - an special memory allocator
> > >
> > > I wonder if inbetween a few jump labels or other no overhead enablement
> > > options and possibly better use of the dma_range_map we could kill
> > > off most of swiotlb-xen instead of maintaining all this code duplication?
> >
> > So I looked at this a bit more.
> >
> > For x86 with XENFEAT_auto_translated_physmap (how common is that?)
>
> Juergen, Boris please correct me if I am wrong, but that
> XENFEAT_auto_translated_physmap only works for PVH guests?

ARM is always XENFEAT_auto_translated_physmap

> > pfn_to_gfn is a nop, so plain phys_to_dma/dma_to_phys do work as-is.
> >
> > xen_arch_need_swiotlb always returns true for x86, and
> > range_straddles_page_boundary should never be true for the
> > XENFEAT_auto_translated_physmap case.
>
> Correct. The kernel should have no clue of what the real MFNs are
> for PFNs.

On ARM, Linux knows the MFNs because for local pages MFN == PFN and for
foreign pages it keeps track in arch/arm/xen/p2m.c. More on this below.

xen_arch_need_swiotlb only returns true on ARM in rare situations where
bouncing on swiotlb buffers is required. Today it only happens on old
versions of Xen that don't support the cache flushing hypercall but
there could be more cases in the future.
> > So as far as I can tell the mapping fast path for the
> > XENFEAT_auto_translated_physmap can be trivially reused from swiotlb.
> >
> > That leaves us with the next more complicated case, x86 or fully cache
> > coherent arm{,64} without XENFEAT_auto_translated_physmap. In that case
> > we need to patch in a phys_to_dma/dma_to_phys that performs the MFN
> > lookup, which could be done using alternatives or jump labels.
> > I think if that is done right we should also be able to let that cover
> > the foreign pages in is_xen_swiotlb_buffer/is_swiotlb_buffer, but
> > in that worst case that would need another alternative / jump label.
> >
> > For non-coherent arm{,64} we'd also need to use alternatives or jump
> > labels to for the cache maintainance ops, but that isn't a hard problem
> > either.

With the caveat that ARM is always XENFEAT_auto_translated_physmap, what
you wrote looks correct. I am writing down a brief explanation on how
swiotlb-xen is used on ARM.

pfn: address as seen by the guest, pseudo-physical address in ARM terminology
mfn (or bfn): real address, physical address in ARM terminology

On ARM dom0 is auto_translated (so Xen sets up the stage2 translation in
the MMU) and the translation is 1:1. So pfn == mfn for Dom0.

However, when another domain shares a page with Dom0, that page is not
1:1. Swiotlb-xen is used to retrieve the mfn for the foreign page at
xen_swiotlb_map_page. It does that with xen_phys_to_bus -> pfn_to_bfn.
It is implemented with a rbtree in arch/arm/xen/p2m.c.

In addition, swiotlb-xen is also used to cache-flush the page via
hypercall at xen_swiotlb_unmap_page. That is done because dev_addr is
really the mfn at unmap_page and we don't know the pfn for it. We can do
pfn-to-mfn but we cannot do mfn-to-pfn (there are good reasons for it
unfortunately). The only way to cache-flush by mfn is by issuing a
hypercall. The hypercall is implemented in arch/arm/xen/mm.c.
The pfn != bfn and pfn_valid() checks are used to detect if the page is local (of dom0) or foreign; they work thanks to the fact that Dom0 is 1:1 mapped. Getting back to what you wrote, yes if we had a way to do MFN lookups in phys_to_dma, and a way to call the hypercall at unmap_page if the page is foreign (e.g. if it fails a pfn_valid check) then I think we would be good from an ARM perspective. The only exception is when xen_arch_need_swiotlb returns true, in which case we need to actually bounce on swiotlb buffers.
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
Rob Herring writes:
> On Mon, Feb 22, 2021 at 6:05 AM Michael Ellerman wrote:
>>
>> Hi Linus,
>>
>> Please pull powerpc updates for 5.12.
>>
>> There will be a conflict with the devicetree tree. It's OK to just take
>> their side of the conflict, we'll fix up the minor behaviour change
>> that causes in a follow-up patch.
>
> The issues turned out to be worse than just this, so I've dropped the
> conflicting change for 5.12.

OK, no worries.

cheers
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
"Oliver O'Halloran" writes: > On Tue, Feb 23, 2021 at 9:44 AM Linus Torvalds > wrote: >> >> On Mon, Feb 22, 2021 at 4:06 AM Michael Ellerman wrote: >> > >> > Please pull powerpc updates for 5.12. >> >> Pulled. However: >> >> > mode change 100755 => 100644 >> > tools/testing/selftests/powerpc/eeh/eeh-functions.sh >> > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh >> > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-unaware.sh >> >> Somebody is being confused. >> >> Why create two new shell scripts with the proper executable bit, and >> then remove the executable bit from an existing one? >> >> That just seems very inconsistent. > > eeh-function.sh just provides some helper functions for the other > scripts and doesn't do anything when executed directly. I thought > making it non-executable made sense. Yeah I think it does make sense. It just looks a bit odd in the diffstat like this. Maybe if we called it lib.sh it would be more obvious? cheers
Re: linux-next: manual merge of the spi tree with the powerpc tree
Hi Stephen,

On Fri, 12 Feb 2021 15:31:42 +1100 Stephen Rothwell wrote:
>
> Hi all,
>
> Today's linux-next merge of the spi tree got a conflict in:
>
>   drivers/spi/spi-mpc52xx.c
>
> between commit:
>
>   e10656114d32 ("spi: mpc52xx: Avoid using get_tbl()")
>
> from the powerpc tree and commit:
>
>   258ea99fe25a ("spi: spi-mpc52xx: Use new structure for SPI transfer delays")
>
> from the spi tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> diff --cc drivers/spi/spi-mpc52xx.c
> index e6a30f232370,36f941500676..
> --- a/drivers/spi/spi-mpc52xx.c
> +++ b/drivers/spi/spi-mpc52xx.c
> @@@ -247,8 -247,10 +247,10 @@@ static int mpc52xx_spi_fsmstate_transfe
>   	/* Is the transfer complete? */
>   	ms->len--;
>   	if (ms->len == 0) {
>  -		ms->timestamp = get_tbl();
>  +		ms->timestamp = mftb();
> -		ms->timestamp += ms->transfer->delay_usecs * tb_ticks_per_usec;
> +		if (ms->transfer->delay.unit == SPI_DELAY_UNIT_USECS)
> +			ms->timestamp += ms->transfer->delay.value *
> +					 tb_ticks_per_usec;
>   		ms->state = mpc52xx_spi_fsmstate_wait;
>   		return FSM_CONTINUE;
>   	}

This is now a conflict between the powerpc tree and Linus' tree.

--
Cheers,
Stephen Rothwell
[powerpc:fixes-test] BUILD SUCCESS a5c2f7d40511976f30de38b4374b8da2b39a073c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: a5c2f7d40511976f30de38b4374b8da2b39a073c powerpc/4xx: Fix build errors from mfdcr() elapsed time: 724m configs tested: 101 configs skipped: 88 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig mips mtx1_defconfig mipsmalta_qemu_32r6_defconfig powerpcfsp2_defconfig h8300alldefconfig powerpcicon_defconfig sh se7343_defconfig ia64zx1_defconfig arm shannon_defconfig armmvebu_v5_defconfig powerpc kmeter1_defconfig sh ecovec24_defconfig sh polaris_defconfig powerpc g5_defconfig mips malta_defconfig arcnsim_700_defconfig powerpc akebono_defconfig powerpc ppa8548_defconfig powerpc mpc5200_defconfig arm davinci_all_defconfig mips tb0219_defconfig armkeystone_defconfig sh sh7785lcr_32bit_defconfig powerpc makalu_defconfig armrealview_defconfig powerpc taishan_defconfig arm pxa168_defconfig arm simpad_defconfig mips ci20_defconfig powerpc mpc8560_ads_defconfig powerpc lite5200b_defconfig powerpc ppc6xx_defconfig ia64 allmodconfig ia64defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig c6x allyesconfig nds32 defconfig nios2allyesconfig cskydefconfig alpha defconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig i386 allyesconfig sparcallyesconfig sparc defconfig i386 tinyconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a001-20210222 x86_64 randconfig-a002-20210222 x86_64 randconfig-a003-20210222 x86_64 randconfig-a005-20210222 x86_64 randconfig-a006-20210222 x86_64 randconfig-a004-20210222 i386 randconfig-a013-20210222 i386 
randconfig-a012-20210222 i386 randconfig-a011-20210222 i386 randconfig-a014-20210222 i386 randconfig-a016-20210222 i386 randconfig-a015-20210222 riscvnommu_k210_defconfig riscvallyesconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defconfig riscv rv32_defconfig riscvallmodconfig x86_64 allyesconfig x86_64rhel-7.6-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec clang tested configs: x86_64 randconfig-a015-20210222 x86_64 randconfig-a011-20210222 x86_64 randconfig-a012-20210222 x86_64 randconfig-a016-20210222 x86_64 randconfig-a014-20210222
[powerpc:merge] BUILD SUCCESS b267c8c58643460da9159ee69f46b3945cfd9de6
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge branch HEAD: b267c8c58643460da9159ee69f46b3945cfd9de6 Automatic merge of 'master' into merge (2021-02-22 21:30) elapsed time: 723m configs tested: 123 configs skipped: 2 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig mips mtx1_defconfig mipsmalta_qemu_32r6_defconfig powerpcfsp2_defconfig h8300alldefconfig powerpcicon_defconfig sh se7343_defconfig ia64zx1_defconfig arm shannon_defconfig armmvebu_v5_defconfig powerpc kmeter1_defconfig sh ecovec24_defconfig powerpc redwood_defconfig sh apsh4a3a_defconfig powerpc ppc44x_defconfig sh polaris_defconfig powerpc g5_defconfig mips malta_defconfig arcnsim_700_defconfig sh se7705_defconfig powerpc chrp32_defconfig ia64generic_defconfig powerpc bamboo_defconfig arm simpad_defconfig arc nsimosci_hs_defconfig arm netwinder_defconfig arm spitz_defconfig m68k sun3_defconfig powerpc mpc85xx_cds_defconfig powerpc akebono_defconfig mipsar7_defconfig sparc sparc32_defconfig ia64 gensparse_defconfig powerpc ppa8548_defconfig powerpc mpc5200_defconfig arm davinci_all_defconfig mips tb0219_defconfig armkeystone_defconfig sh sh7785lcr_32bit_defconfig powerpc makalu_defconfig powerpc walnut_defconfig s390 allyesconfig mipsjmr3927_defconfig mips maltasmvp_defconfig m68k multi_defconfig armrealview_defconfig powerpc taishan_defconfig arm pxa168_defconfig mips ci20_defconfig ia64 allmodconfig ia64defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig c6x allyesconfig nds32 defconfig nios2allyesconfig cskydefconfig alpha defconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allmodconfig parisc allyesconfig s390defconfig i386 allyesconfig sparcallyesconfig sparc defconfig 
i386 tinyconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a001-20210222 x86_64 randconfig-a002-20210222 x86_64 randconfig-a003-20210222 x86_64 randconfig-a005-20210222 x86_64 randconfig-a006-20210222 x86_64 randconfig-a004-20210222 i386 randconfig-a005-20210222 i386 randconfig-a006-20210222 i386 randconfig-a004-20210222 i386 randconfig-a003-20210222 i386 randconfig-a001-20210222 i386 randconfig-a002-20210222 i386 randconfig-a013-20210222 i386 randconfig-a012-202
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
On Tue, Feb 23, 2021 at 9:44 AM Linus Torvalds wrote: > > On Mon, Feb 22, 2021 at 4:06 AM Michael Ellerman wrote: > > > > Please pull powerpc updates for 5.12. > > Pulled. However: > > > mode change 100755 => 100644 > > tools/testing/selftests/powerpc/eeh/eeh-functions.sh > > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh > > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-unaware.sh > > Somebody is being confused. > > Why create two new shell scripts with the proper executable bit, and > then remove the executable bit from an existing one? > > That just seems very inconsistent. eeh-function.sh just provides some helper functions for the other scripts and doesn't do anything when executed directly. I thought making it non-executable made sense. > > Linus
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
On Mon, Feb 22, 2021 at 4:06 AM Michael Ellerman wrote: > > Please pull powerpc updates for 5.12. Pulled. However: > mode change 100755 => 100644 > tools/testing/selftests/powerpc/eeh/eeh-functions.sh > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-aware.sh > create mode 100755 tools/testing/selftests/powerpc/eeh/eeh-vf-unaware.sh Somebody is being confused. Why create two new shell scripts with the proper executable bit, and then remove the executable bit from an existing one? That just seems very inconsistent. Linus
Re: [PATCH 06/13] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM
Nicholas Piggin writes:

> Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM
> internal detail that has no real need to be in common handlers.
>
> Also add a comment explaining why this this thing exists.

this this

> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 60 --
>  arch/powerpc/kvm/book3s_64_entry.S   | 64
>  2 files changed, 56 insertions(+), 68 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a1640d6ea65d..96f22c582213 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -133,7 +133,6 @@ name:
>  #define IBRANCH_TO_COMMON	.L_IBRANCH_TO_COMMON_\name\()	/* ENTRY branch to common */
>  #define IREALMODE_COMMON	.L_IREALMODE_COMMON_\name\()	/* Common runs in realmode */
>  #define IMASK		.L_IMASK_\name\()	/* IRQ soft-mask bit */
> -#define IKVM_SKIP	.L_IKVM_SKIP_\name\()	/* Generate KVM skip handler */
>  #define IKVM_REAL	.L_IKVM_REAL_\name\()	/* Real entry tests KVM */
>  #define __IKVM_REAL(name)	.L_IKVM_REAL_ ## name
>  #define IKVM_VIRT	.L_IKVM_VIRT_\name\()	/* Virt entry tests KVM */
> @@ -191,9 +190,6 @@ do_define_int n
>  	.ifndef IMASK
>  		IMASK=0
>  	.endif
> -	.ifndef IKVM_SKIP
> -		IKVM_SKIP=0
> -	.endif
>  	.ifndef IKVM_REAL
>  		IKVM_REAL=0
>  	.endif
> @@ -254,15 +250,10 @@ do_define_int n
>  	.balign IFETCH_ALIGN_BYTES
>  \name\()_kvm:
>
> -	.if IKVM_SKIP
> -	cmpwi	r10,KVM_GUEST_MODE_SKIP
> -	beq	89f
> -	.else
>  BEGIN_FTR_SECTION
>  	ld	r10,IAREA+EX_CFAR(r13)
>  	std	r10,HSTATE_CFAR(r13)
>  END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> -	.endif
>
>  	ld	r10,IAREA+EX_CTR(r13)
>  	mtctr	r10
> @@ -289,27 +280,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>  	ori	r12,r12,(IVEC)
>  	.endif
>  	b	kvmppc_interrupt
> -
> -	.if IKVM_SKIP
> -89:	mtocrf	0x80,r9
> -	ld	r10,IAREA+EX_CTR(r13)
> -	mtctr	r10
> -	ld	r9,IAREA+EX_R9(r13)
> -	ld	r10,IAREA+EX_R10(r13)
> -	ld	r11,IAREA+EX_R11(r13)
> -	ld	r12,IAREA+EX_R12(r13)
> -	.if IHSRR_IF_HVMODE
> -	BEGIN_FTR_SECTION
> -	b	kvmppc_skip_Hinterrupt
> -	FTR_SECTION_ELSE
> -	b	kvmppc_skip_interrupt
> -	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
> -	.elseif IHSRR
> -	b	kvmppc_skip_Hinterrupt
> -	.else
> -	b	kvmppc_skip_interrupt
> -	.endif
> -	.endif
>  .endm
>
>  #else
> @@ -1128,7 +1098,6 @@ INT_DEFINE_BEGIN(machine_check)
>  	ISET_RI=0
>  	IDAR=1
>  	IDSISR=1
> -	IKVM_SKIP=1
>  	IKVM_REAL=1
>  INT_DEFINE_END(machine_check)
>
> @@ -1419,7 +1388,6 @@ INT_DEFINE_BEGIN(data_access)
>  	IVEC=0x300
>  	IDAR=1
>  	IDSISR=1
> -	IKVM_SKIP=1
>  	IKVM_REAL=1
>  INT_DEFINE_END(data_access)
>
> @@ -1465,7 +1433,6 @@ INT_DEFINE_BEGIN(data_access_slb)
>  	IVEC=0x380
>  	IRECONCILE=0
>  	IDAR=1
> -	IKVM_SKIP=1
>  	IKVM_REAL=1
>  INT_DEFINE_END(data_access_slb)
>
> @@ -2111,7 +2078,6 @@ INT_DEFINE_BEGIN(h_data_storage)
>  	IHSRR=1
>  	IDAR=1
>  	IDSISR=1
> -	IKVM_SKIP=1
>  	IKVM_REAL=1
>  	IKVM_VIRT=1
>  INT_DEFINE_END(h_data_storage)
> @@ -3088,32 +3054,6 @@ EXPORT_SYMBOL(do_uaccess_flush)
>  MASKED_INTERRUPT
>  MASKED_INTERRUPT hsrr=1
>
> -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> -kvmppc_skip_interrupt:
> -	/*
> -	 * Here all GPRs are unchanged from when the interrupt happened
> -	 * except for r13, which is saved in SPRG_SCRATCH0.
> -	 */
> -	mfspr	r13, SPRN_SRR0
> -	addi	r13, r13, 4
> -	mtspr	SPRN_SRR0, r13
> -	GET_SCRATCH0(r13)
> -	RFI_TO_KERNEL
> -	b	.
> -
> -kvmppc_skip_Hinterrupt:
> -	/*
> -	 * Here all GPRs are unchanged from when the interrupt happened
> -	 * except for r13, which is saved in SPRG_SCRATCH0.
> -	 */
> -	mfspr	r13, SPRN_HSRR0
> -	addi	r13, r13, 4
> -	mtspr	SPRN_HSRR0, r13
> -	GET_SCRATCH0(r13)
> -	HRFI_TO_KERNEL
> -	b	.
> -#endif
> -
>  /*
>   * Relocation-on interrupts: A subset of the interrupts can be delivered
>   * with IR=1/DR=1, if AIL==2 and MSR.HV won't be changed by delivering
> diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
> index 147ebf1c3c1f..820d103e5f50 100644
> --- a/arch/powerpc/kvm/book3s_64_entry.S
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -1,9 +1,10 @@
> +#include
>  #include
> -#include
> +#include
>  #include
> -#include
> -#include
>  #include
> +#include
> +#include
>
>  /*
>   * This is branched to from interrupt handlers in exception-64s.S which set
> @@ -19,17 +20,64 @@ kvmppc_interrupt:
>
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
The pull request you sent on Mon, 22 Feb 2021 23:05:37 +1100: > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git > tags/powerpc-5.12-1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/b12b47249688915e987a9a2a393b522f86f6b7ab Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH 03/13] KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
Nicholas Piggin writes:

> Rather than add the ME bit to the MSR when the guest is entered, make
> it clear that the hypervisor does not allow the guest to clear the bit.
>
> The ME addition is kept in the code for now, but a future patch will
> warn if it's not present.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas

> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index dad118760a4e..ae8f291c5c48 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -661,6 +661,13 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
>
>  void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
>  {
> +	/*
> +	 * Guest must always run with machine check interrupt
> +	 * enabled.
> +	 */
> +	if (!(msr & MSR_ME))
> +		msr |= MSR_ME;
> +
>  	/*
>  	 * Check for illegal transactional state bit combination
>  	 * and if we find it, force the TS field to a safe state.
Re: [PATCH 02/13] powerpc/64s: remove KVM SKIP test from instruction breakpoint handler
Nicholas Piggin writes:

> The code being executed in KVM_GUEST_MODE_SKIP is hypervisor code with
> MSR[IR]=0, so the faults of concern are the d-side ones caused by access
> to guest context by the hypervisor.
>
> Instruction breakpoint interrupts are not a concern here. It's unlikely
> any good would come of causing breaks in this code, but skipping the
> instruction that caused it won't help matters (e.g., skip the mtmsr that
> sets MSR[DR]=0 or clears KVM_GUEST_MODE_SKIP).
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 5d0ad3b38e90..5bc689a546ae 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -2597,7 +2597,6 @@ EXC_VIRT_NONE(0x5200, 0x100)
>  INT_DEFINE_BEGIN(instruction_breakpoint)
>  	IVEC=0x1300
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> -	IKVM_SKIP=1
>  	IKVM_REAL=1
>  #endif
>  INT_DEFINE_END(instruction_breakpoint)
Re: [PATCH 01/13] powerpc/64s: Remove KVM handler support from CBE_RAS interrupts
Nicholas Piggin writes:

> Cell does not support KVM.
>
> Signed-off-by: Nicholas Piggin

Reviewed-by: Fabiano Rosas

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 39cbea495154..5d0ad3b38e90 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -2574,8 +2574,6 @@ EXC_VIRT_NONE(0x5100, 0x100)
>  INT_DEFINE_BEGIN(cbe_system_error)
>  	IVEC=0x1200
>  	IHSRR=1
> -	IKVM_SKIP=1
> -	IKVM_REAL=1
>  INT_DEFINE_END(cbe_system_error)
>
>  EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100)
> @@ -2745,8 +2743,6 @@ EXC_COMMON_BEGIN(denorm_exception_common)
>  INT_DEFINE_BEGIN(cbe_maintenance)
>  	IVEC=0x1600
>  	IHSRR=1
> -	IKVM_SKIP=1
> -	IKVM_REAL=1
>  INT_DEFINE_END(cbe_maintenance)
>
>  EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100)
> @@ -2798,8 +2794,6 @@ EXC_COMMON_BEGIN(altivec_assist_common)
>  INT_DEFINE_BEGIN(cbe_thermal)
>  	IVEC=0x1800
>  	IHSRR=1
> -	IKVM_SKIP=1
> -	IKVM_REAL=1
>  INT_DEFINE_END(cbe_thermal)
>
>  EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100)
Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
On Mon, 2021-02-22 at 16:24 +1100, Alexey Kardashevskiy wrote:
>
> On 18/02/2021 06:32, Leonardo Bras wrote:
> > On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> > > Most platforms allocate IOMMU table structures (specifically it_map)
> > > at the boot time and when this fails - it is a valid reason for panic().
> > >
> > > However the powernv platform allocates it_map after a device is returned
> > > to the host OS after being passed through and this happens long after
> > > the host OS booted. It is quite possible to trigger the it_map allocation
> > > panic() and kill the host even though it is not necessary - the host OS
> > > can still use the DMA bypass mode (requires a tiny fraction of it_map's
> > > memory) and even if that fails, the host OS is runnable as it was without
> > > the device for which allocating it_map causes the panic.
> > >
> > > Instead of immediately crashing in a powernv/ioda2 system, this prints
> > > an error and continues. All other platforms still call panic().
> > >
> > > Signed-off-by: Alexey Kardashevskiy
> >
> > Hello Alexey,
> >
> > This looks like a good change, that passes panic() decision to platform
> > code. Everything looks pretty straightforward, but I have a question
> > regarding this:
> >
> > > @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
> > >  		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
> > >  		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
> > >  	}
> > > -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> > > -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > >
> > > +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> > > +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > +	else
> > > +		rc = -ENOMEM;
> > >  	if (rc) {
> > > -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> > > -		       rc);
> > > +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
> > >  		iommu_tce_table_put(tbl);
> > > -		return rc;
> > > +		tbl = NULL; /* This clears iommu_table_base below */
> > >  	}
> > > -
> > >  	if (!pnv_iommu_bypass_disabled)
> > >  		pnv_pci_ioda2_set_bypass(pe, true);
> > >
> >
> > If I could understand correctly, previously if iommu_init_table() did
> > not panic(), and pnv_pci_ioda2_set_window() returned something other
> > than 0, it would return rc in the if (rc) clause, but now it does not
> > happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> >
> > Is that desired?
>
> Yes. A PE (==device, pretty much) has 2 DMA windows:
> - the default one which requires some RAM to operate
> - a bypass mode which tells the hardware that PCI addresses are
> statically mapped to RAM 1:1.
>
> This bypass mode does not require extra memory to work and is used in
> the most cases on the bare metal as long as the device supports 64bit
> DMA which is everything except GPUs. Since it is cheap to enable and
> this is what we prefer anyway, no urge to fail.
>
> > As far as I could see, returning rc there seems a good procedure after
> > iommu_init_table returning -ENOMEM.
>
> This change is intentional and yes it could be done by a separate patch
> but I figured there is not that much value in splitting.

Ok then, thanks for clarifying.

FWIW:
Reviewed-by: Leonardo Bras
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
On Mon, Feb 22, 2021 at 6:05 AM Michael Ellerman wrote:
>
> Hi Linus,
>
> Please pull powerpc updates for 5.12.
>
> There will be a conflict with the devicetree tree. It's OK to just take their
> side of the conflict, we'll fix up the minor behaviour change that causes in a
> follow-up patch.

The issues turned out to be worse than just this, so I've dropped the
conflicting change for 5.12.

Rob
[GIT PULL] Please pull powerpc/linux.git powerpc-5.12-1 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull powerpc updates for 5.12.

There will be a conflict with the devicetree tree. It's OK to just take their
side of the conflict, we'll fix up the minor behaviour change that causes in a
follow-up patch.

There's also a trivial conflict with the spi tree.

cheers

The following changes since commit e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62:

  Linux 5.11-rc2 (2021-01-03 15:55:30 -0800)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.12-1

for you to fetch changes up to 82d2c16b350f72aa21ac2a6860c542aa4b43a51e:

  powerpc/perf: Adds support for programming of Thresholding in P10 (2021-02-11 23:35:36 +1100)

----------------------------------------------------------------
powerpc updates for 5.12

A large series adding wrappers for our interrupt handlers, so that
irq/nmi/user tracking can be isolated in the wrappers rather than spread
in each handler.

Conversion of the 32-bit syscall handling into C.

A series from Nick to streamline our TLB flushing when using the Radix MMU.

Switch to using queued spinlocks by default for 64-bit server CPUs.

A rework of our PCI probing so that it happens later in boot, when more
generic infrastructure is available.

Two small fixes to allow 32-bit little-endian processes to run on 64-bit
kernels.

Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar
K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan,
Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli,
Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J
Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen
N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian
Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel
Datwyler, Will Springer, Yury Norov, Zheng Yongjun.
----------------------------------------------------------------
Alexey Kardashevskiy (3):
      powerpc/iommu/debug: Add debugfs entries for IOMMU tables
      powerpc/uaccess: Avoid might_fault() when user access is enabled
      powerpc/kuap: Restore AMR after replaying soft interrupts

Ananth N Mavinakayanahalli (2):
      powerpc/sstep: Check instruction validity against ISA version before emulation
      powerpc/sstep: Fix incorrect return from analyze_instr()

Aneesh Kumar K.V (3):
      powerpc/mm: Enable compound page check for both THP and HugeTLB
      powerpc/mm: Add PG_dcache_clean to indicate dcache clean state
      powerpc/mm: Remove dcache flush from memory remove.

Athira Rajeev (3):
      powerpc/perf: Include PMCs as part of per-cpu cpuhw_events struct
      powerpc/perf: Expose Performance Monitor Counter SPR's as part of extended regs
      powerpc/perf: Record counter overflow always if SAMPLE_IP is unset

Bhaskar Chowdhury (1):
      powerpc/44x: Fix a spelling mismach to mismatch in head_44x.S

Chengyang Fan (1):
      powerpc: remove unneeded semicolons

Christophe Leroy (38):
      powerpc/kvm: Force selection of CONFIG_PPC_FPU
      powerpc/47x: Disable 256k page size
      powerpc/44x: Remove STDBINUTILS kconfig option
      powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected
      powerpc/xmon: Enable breakpoints on 8xx
      powerpc/xmon: Select CONSOLE_POLL for the 8xx
      powerpc/32s: move DABR match out of handle_page_fault
      powerpc/8xx: Fix software emulation interrupt
      powerpc/uaccess: Perform barrier_nospec() in KUAP allowance helpers
      powerpc/32s: Change mfsrin() into a static inline function
      powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr()
      powerpc/32s: Allow constant folding in mtsr()/mfsr()
      powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
      powerpc/32s: Add missing call to kuep_lock on syscall entry
      powerpc/32: Always enable data translation on syscall entry
      powerpc/32: On syscall entry, enable instruction translation at the same time as data
      powerpc/32: Reorder instructions to avoid using CTR in syscall entry
      powerpc/irq: Add helper to set regs->softe
      powerpc/irq: Rework helpers that manipulate MSR[EE/RI]
      powerpc/irq: Add stub irq_soft_mask_return() for PPC32
      powerpc/syscall: Rename syscall_64.c into interrupt.c
      powerpc/syscall: Make interrupt.c buildable on PPC32
      powerpc/syscall: Use is_compat_task()
      powerpc/syscall: Save r3 in regs->orig_r3
      powerpc/syscall: Change condition to check MSR_RI
      powerpc/32: Always save non volatile GPRs at syscall entry
      powerpc/syscall: implement system call entry/exit logic in C for PPC32
      powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
      powerpc/32: Remove the counter in global_dbcr0
      powerpc/syscall: Do not check unsupported scv vector on PPC32
Re: [PATCH kernel] powerpc/iommu: Annotate nested lock for lockdep
On 20/02/2021 14:49, Alexey Kardashevskiy wrote:
> On 18/02/2021 23:59, Frederic Barrat wrote:
>> On 16/02/2021 04:20, Alexey Kardashevskiy wrote:
>>> The IOMMU table is divided into pools for concurrent mappings and each
>>> pool has a separate spinlock. When taking the ownership of an IOMMU
>>> group to pass through a device to a VM, we lock these spinlocks which
>>> triggers a false negative warning in lockdep (below).
>>>
>>> This fixes it by annotating the large pool's spinlock as a nest lock.
>>>
>>> ===
>>> WARNING: possible recursive locking detected
>>> 5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
>>> qemu-system-ppc/4129 is trying to acquire lock:
>>> c000119bddb0 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0
>>>
>>> but task is already holding lock:
>>> c000119bdd30 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0
>>>
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>>
>>>       CPU0
>>>       lock(&(p->lock)/1);
>>>       lock(&(p->lock)/1);
>>> ===
>>>
>>> Signed-off-by: Alexey Kardashevskiy
>>> ---
>>>  arch/powerpc/kernel/iommu.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index 557a09dd5b2f..2ee642a6731a 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -1089,7 +1089,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
>>>  	spin_lock_irqsave(&tbl->large_pool.lock, flags);
>>>  	for (i = 0; i < tbl->nr_pools; i++)
>>> -		spin_lock(&tbl->pools[i].lock);
>>> +		spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
>>
>> We have the same pattern and therefore should have the same problem in
>> iommu_release_ownership().
>>
>> But as I understand, we're hacking our way around lockdep here, since
>> conceptually, those locks are independent. I was wondering why it seems
>> to fix it by worrying only about the large pool lock. That loop can take
>> many locks (up to 4 with current config). However, if the dma window is
>> less than 1GB, we would only have one, so it would make sense for lockdep
>> to stop complaining. Is it what happened? In which case, this patch
>> doesn't really fix it. Or I'm missing something :-)
>
> My rough understanding is that when spin_lock_nest_lock is called the
> first time, it does some magic with lockdep classes somewhere in
> __lock_acquire()/register_lock_class() and right after that the nested
> lock is not the same as before and it is annotated so we cannot lock
> nested locks without locking the nest lock first and no (re)annotation
> is needed.
>
> I'll try to poke this code once again and see, it was just easier with
> p9/nested which is gone for now because of little snow in one of the
> southern states :)

Turns out I have good imagination and in fact it does print this huge
warning in the release hook as well so v2 is coming.

>> Thanks,
>> Fred
>>
>>>  	iommu_table_release_pages(tbl);

--
Alexey
[PATCH] ibmveth: Switch to using the new API kobj_to_dev()
Fix the following coccicheck warning:

./drivers/net/ethernet/ibm/ibmveth.c:1805:51-52: WARNING opportunity for kobj_to_dev()

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 drivers/net/ethernet/ibm/ibmveth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index c3ec9ce..6e9572c 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1801,8 +1801,7 @@ static ssize_t veth_pool_store(struct kobject *kobj, struct attribute *attr,
 	struct ibmveth_buff_pool *pool = container_of(kobj,
 						      struct ibmveth_buff_pool,
 						      kobj);
-	struct net_device *netdev = dev_get_drvdata(
-	    container_of(kobj->parent, struct device, kobj));
+	struct net_device *netdev = dev_get_drvdata(kobj_to_dev(kobj->parent));
 	struct ibmveth_adapter *adapter = netdev_priv(netdev);
 	long value = simple_strtol(buf, NULL, 10);
 	long rc;
--
1.8.3.1