Re: [PATCH -next] irq-chip/gic-v3-its: Fixed an issue where the ITS executes the residual commands in the queue again when the ITS wakes up from sleep mode.
[dropping Jason, whose email address has been bouncing for weeks now] On 2020-11-07 10:42, Xu Qiang wrote: On my platform, ITS_FLAGS_SAVE_SUSPEND_STATE is not set,thus do nothing Which platform? in its suspend and resuse function.On the other hand,firmware stores GITS_CTRL,GITS_CBASER,GITS_CWRITER and GITS_BASER in the suspend, and restores these registers in the resume. As a result, the ITS executes the residual commands in the queue. Which firmware are you using? I just had a look at the trusted firmware source code, and while it definitely does something that *looks* like what you are describing, it doesn't re-enable the ITS on resume. So what are you running? Memory corruption may occur in the following scenarios: The kernel sends three commands in the following sequence: 1.mapd(deviceA, ITT_addr1, valid:1) 2.mapti(deviceA):ITS write ITT_addr1 memory; 3.mapd(deviceA, ITT_addr1, valid:0) and kfree(ITT_addr1); The ITS doesn't 'kfree' stuff. 4.mapd(deviceA, ITT_addr2, valid:1); 5.mapti(deviceA):ITS write ITT_addr2 memory; I don't think this example is relevant. The core of the problem is that the ITS gets re-enabled by your firmware. What are the affected systems? To solve this problem,dropping the checks for ITS_FLAGS_SAVE_SUSPEND_STATE. Signed-off-by: Xu Qiang --- drivers/irqchip/irq-gic-v3-its.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 0fec31931e11..06f2c1c252b9 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -42,7 +42,6 @@ #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0) #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1) #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2) -#define ITS_FLAGS_SAVE_SUSPEND_STATE (1ULL << 3) #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING(1 << 0) #define RDIST_FLAGS_RD_TABLES_PREALLOCATED (1 << 1) @@ -4741,9 +4740,6 @@ static int its_save_disable(void) list_for_each_entry(its, _nodes, entry) { void __iomem *base; - if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE)) - continue; - base = its->base; its->ctlr_save = readl_relaxed(base + GITS_CTLR); err = its_force_quiescent(base); @@ -4762,9 +4758,6 @@ static int its_save_disable(void) list_for_each_entry_continue_reverse(its, _nodes, entry) { void __iomem *base; - if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE)) - continue; - base = its->base; writel_relaxed(its->ctlr_save, base + GITS_CTLR); } @@ -4784,9 +4777,6 @@ static void its_restore_enable(void) void __iomem *base; int i; - if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE)) - continue; - base = its->base; /* @@ -5074,9 +5064,6 @@ static int __init its_probe_one(struct resource *res, ctlr |= GITS_CTLR_ImDe; writel_relaxed(ctlr, its->base + GITS_CTLR); - if (GITS_TYPER_HCC(typer)) - its->flags |= ITS_FLAGS_SAVE_SUSPEND_STATE; - err = its_init_domain(handle, its); if (err) goto out_free_tables; I'm OK with the patch itself, but I don't want to paper over broken firmware. I'll get TF-A fixed one way or another, but I want to be sure yours is too. If firmware does its job correctly, we shouldn't have to do all of this. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: arm64: Fix build error in user_mem_abort()
On Tue, 3 Nov 2020 11:30:09 +1100, Gavin Shan wrote: > The PUD and PMD are folded into PGD when the following options are > enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail > to build with the indicated errors: > >CONFIG_ARM64_VA_BITS_42=y >CONFIG_ARM64_PAGE_SHIFT=16 >CONFIG_PGTABLE_LEVELS=3 > > [...] Applied to next, thanks! [1/1] KVM: arm64: Fix build error in user_mem_abort() commit: faf000397e7f103df9953a312e1df21df1dc797f Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 3/5] ARM: implement support for SMCCC TRNG entropy source
On 2020-11-06 15:30, Ard Biesheuvel wrote: On Fri, 6 Nov 2020 at 16:30, Marc Zyngier wrote: [...] I don't think this cast is safe. At least not on 64bit. True, but this is arch/arm I think the glasses theme becomes recurrent. Apologies for the noise. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2 3/5] ARM: implement support for SMCCC TRNG entropy source
On 2020-11-05 12:56, Andre Przywara wrote: From: Ard Biesheuvel Implement arch_get_random_seed_*() for ARM based on the firmware or hypervisor provided entropy source described in ARM DEN0098. This will make the kernel's random number generator consume entropy provided by this interface, at early boot, and periodically at runtime when reseeding. Cc: Linus Walleij Cc: Russell King Signed-off-by: Ard Biesheuvel [Andre: rework to be initialised by the SMCCC firmware driver] Signed-off-by: Andre Przywara --- arch/arm/Kconfig | 4 ++ arch/arm/include/asm/archrandom.h | 64 +++ 2 files changed, 68 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index fe2f17eb2b50..06fda4f954fd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1667,6 +1667,10 @@ config STACKPROTECTOR_PER_TASK Enable this option to switch to a different method that uses a different canary value for each task. +config ARCH_RANDOM + def_bool y + depends on HAVE_ARM_SMCCC + endmenu menu "Boot options" diff --git a/arch/arm/include/asm/archrandom.h b/arch/arm/include/asm/archrandom.h index a8e84ca5c2ee..f3e96a5b65f8 100644 --- a/arch/arm/include/asm/archrandom.h +++ b/arch/arm/include/asm/archrandom.h @@ -2,9 +2,73 @@ #ifndef _ASM_ARCHRANDOM_H #define _ASM_ARCHRANDOM_H +#ifdef CONFIG_ARCH_RANDOM + +#include +#include + +#define ARM_SMCCC_TRNG_MIN_VERSION 0x1UL + +extern bool smccc_trng_available; + +static inline bool __init smccc_probe_trng(void) +{ + struct arm_smccc_res res; + + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, ); + if ((s32)res.a0 < 0) + return false; + if (res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION) { + /* double check that the 32-bit flavor is available */ + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_FEATURES, +ARM_SMCCC_TRNG_RND32, +); + if ((s32)res.a0 >= 0) + return true; + } + + return false; +} + +static inline bool __must_check arch_get_random_long(unsigned long *v) +{ + return false; +} + +static inline bool __must_check arch_get_random_int(unsigned int *v) +{ + return false; +} + +static inline bool __must_check arch_get_random_seed_long(unsigned long *v) +{ + struct arm_smccc_res res; + + if (smccc_trng_available) { + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND32, 8 * sizeof(*v), ); + + if (res.a0 != 0) + return false; + + *v = res.a3; + return true; + } + + return false; +} + +static inline bool __must_check arch_get_random_seed_int(unsigned int *v) +{ + return arch_get_random_seed_long((unsigned long *)v); I don't think this cast is safe. At least not on 64bit. M. -- Who you jivin' with that Cosmik Debris?
Re: [RFC PATCH 00/26] kvm: arm64: Always-on nVHE hypervisor
On 2020-11-04 18:36, David Brazdil wrote: As we progress towards being able to keep guest state private to the host running nVHE hypervisor, this series allows the hypervisor to install itself on newly booted CPUs before the host is allowed to run on them. To this end, the hypervisor starts trapping host SMCs and intercepting host's PSCI CPU_ON/OFF/SUSPEND calls. It replaces the host's entry point with its own, initializes the EL2 state of the new CPU and installs the nVHE hyp vector before ERETing to the host's entry point. Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded to EL3. Future changes will need to ensure the safety of all SMCs wrt. private guests. The host is still allowed to reset EL2 back to the stub vector, eg. for hibernation or kexec, but will not disable nVHE when there are no VMs. Tested on Rock Pi 4b. Sending this as an RFC to get feedback on the following decisions: 1) The kernel checks new cores' features against the finalized system capabilities. To avoid the need to move this code/data to EL2, the implementation only allows to boot cores that were online at the time of KVM initialization. 2) Trapping and forwarding SMCs cannot be switched off. This could cause issues eg. if EL3 always returned to EL1. A kernel command line flag may be needed to turn the feature off on such platforms. I'd rather have it the other way around (buy-in rather than turn off). On top of the potential issue with stupid EL3s, there is the issue that PSCI is optional, and that protected VMs won't be able to work without it. Another related thing is that EL3 itself is optional. Note that this flag shouldn't be specific to PSCI proxying. It should also control Stage-2 wrapping, and the whole pKVM. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: Aw: Re: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-05 23:00, Thomas Gleixner wrote: On Thu, Nov 05 2020 at 09:20, Marc Zyngier wrote: On 2020-11-04 23:14, Thomas Gleixner wrote: /* Resource alignment requirements */ resource_size_t (*align_resource)(struct pci_dev *dev, If that's the direction of travel, we also need something like this for configuration where the host bridge relies on an external MSI block that uses MSI domains (boot-tested in a GICv3 guest). Some more context would be helpful. Brain fails to decode the logic here. OK, let me try again. The MSI controller, which is the thing that deals with MSIs in the system (GICv2m, GICv3-ITS, and a number of others), is optional, is not part of the host bridge (it has nothing to do with PCI at all), and the bridge driver has absolutely no idea whether: - there is something that provides MSI or not - that something has successfully been initialised or not (which translates into an MSI domain being present or not) This is the case for most ARM systems, and all KVM/arm guests. Booting a VM without MSIs is absolutely trivial, and actually makes sense for some of the smaller guests. In these conditions, your no_msi attribute doesn't work as is: we can't decide on its value at probe time without extracting all of the OF/ACPI logic that deals with MSI domains from the core code, and making it available to the host bridge drivers for systems that follow that model. Using the flow you insist on requires parsing the topology twice: - once to find out whether there is actually a MSI provider registered for the host bridge in order to set the no_msi flag - once to actually be able to store the domain into the pci_bus structure, as it isn't available at probe time. My last suggestion is to indicate to the core code that there is a *possible* MSI controller available in the form of a MSI domain. This is still suboptimal compared to checking the presence an MSI domain in core code (my initial suggestion), but the fallback stuff gets in the way (though I still think it can be made to work). Anyway, this was my last attempt at addressing the problem. Most people won't see it. The couple of drivers that require the fallback hack are usually selected in distro kernels, and do a good job hiding the error. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2 3/5] gpio: msc313: MStar MSC313 GPIO driver
On 2020-11-05 15:23, Daniel Palmer wrote: Hi Marc, On Thu, 5 Nov 2020 at 21:08, Marc Zyngier wrote: On 2020-11-05 09:40, Linus Walleij wrote: > On Mon, Oct 19, 2020 at 4:10 PM Daniel Palmer wrote: [...] >> +/* The parent interrupt controller needs the GIC interrupt type set >> to GIC_SPI >> + * so we need to provide the fwspec. Essentially >> gpiochip_populate_parent_fwspec_twocell >> + * that puts GIC_SPI into the first cell. >> + */ nit: comment style. I've fixed these and some other bits for the v3. I've held off on pushing that until the rest of it seemed right. >> +static void *msc313_gpio_populate_parent_fwspec(struct gpio_chip *gc, >> +unsigned int >> parent_hwirq, >> +unsigned int parent_type) >> +{ >> + struct irq_fwspec *fwspec; >> + >> + fwspec = kmalloc(sizeof(*fwspec), GFP_KERNEL); >> + if (!fwspec) >> + return NULL; >> + >> + fwspec->fwnode = gc->irq.parent_domain->fwnode; >> + fwspec->param_count = 3; >> + fwspec->param[0] = GIC_SPI; >> + fwspec->param[1] = parent_hwirq; >> + fwspec->param[2] = parent_type; >> + >> + return fwspec; >> +} > > Clever. Looping in Marc Z so he can say if this looks allright to him. Yup, this looks correct. However, looking at the bit of the patch that isn't quoted here, I see that msc313_gpio_irqchip doesn't have a .irq_set_affinity callback. Is this system UP only? What is in mainline right now is UP only but there are chips with a second cortex A7 that I have working in my tree. So I will add that in for v3 if I can work out what I should actually do there. :) Probably nothing more than setting the callback to irq_chip_set_affinity_parent, I'd expect. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2 5/5] KVM: arm64: implement the TRNG hypervisor call
On 2020-11-05 12:56, Andre Przywara wrote: From: Ard Biesheuvel Provide a hypervisor implementation of the ARM architected TRNG firmware interface described in ARM spec DEN0098. All function IDs are implemented, including both 32-bit and 64-bit versions of the TRNG_RND service, which is the centerpiece of the API. The API is backed by arch_get_unsigned_seed_long(), which is implemented in terms of RNDRRS currently, and will be alternatively backed by a SMC call to the secure firmware using same interface after a future patch. If neither are available, the kernel's entropy pool is used instead. Signed-off-by: Ard Biesheuvel Signed-off-by: Andre Przywara --- arch/arm64/include/asm/kvm_host.h | 2 + arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/hypercalls.c | 6 ++ arch/arm64/kvm/trng.c | 91 +++ 4 files changed, 100 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kvm/trng.c diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 781d029b8aa8..615932bacf76 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -652,4 +652,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu); #define kvm_arm_vcpu_sve_finalized(vcpu) \ ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED) +int kvm_trng_call(struct kvm_vcpu *vcpu); + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 1504c81fbf5d..a510037e3270 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ inject_fault.o regmap.o va_layout.o handle_exit.o \ guest.o debug.o reset.o sys_regs.o \ vgic-sys-reg-v3.o fpsimd.o pmu.o \ -aarch32.o arch_timer.o \ +aarch32.o arch_timer.o trng.o \ vgic/vgic.o vgic/vgic-init.o \ vgic/vgic-irqfd.o vgic/vgic-v2.o \ vgic/vgic-v3.o vgic/vgic-v4.o \ diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 25ea4ecb6449..ead21b98b620 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -71,6 +71,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) if (gpa != GPA_INVALID) val = gpa; break; + case ARM_SMCCC_TRNG_VERSION: + case ARM_SMCCC_TRNG_FEATURES: + case ARM_SMCCC_TRNG_GET_UUID: + case ARM_SMCCC_TRNG_RND32: + case ARM_SMCCC_TRNG_RND64: + return kvm_trng_call(vcpu); default: return kvm_psci_call(vcpu); } diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c new file mode 100644 index ..5a27b2d99977 --- /dev/null +++ b/arch/arm64/kvm/trng.c @@ -0,0 +1,91 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2020 Arm Ltd. + +#include +#include + +#include + +#include + +#define ARM_SMCCC_TRNG_VERSION_1_0 0x1UL + +#define TRNG_SUCCESS 0UL SMCCC_RET_SUCCESS +#define TRNG_NOT_SUPPORTED ((unsigned long)-1) SMCCC_RET_NOT_SUPPORTED +#define TRNG_INVALID_PARAMETER ((unsigned long)-2) *crap*. Why isn't that the same value as SMCCC_RET_INVALID_PARAMETER? Is it too late to fix the spec? +#define TRNG_NO_ENTROPY((unsigned long)-3) + +#define MAX_BITS32 96 +#define MAX_BITS64 192 Nothing seems to be using these definitions. + +static const uuid_t arm_smc_trng_uuid __aligned(4) = UUID_INIT( + 0x023534a2, 0xe0bc, 0x45ec, 0x95, 0xdd, 0x33, 0x34, 0xc1, 0xcc, 0x31, 0x89); I object to the lack of Easter egg in this UUID. Or at least one I can understand. ;-) + +static int kvm_trng_do_rnd(struct kvm_vcpu *vcpu, int size) +{ + u32 num_bits = smccc_get_arg1(vcpu); + u64 bits[3]; + int i; + + if (num_bits > 3 * size) { + smccc_set_retval(vcpu, TRNG_INVALID_PARAMETER, 0, 0, 0); + return 1; + } + + /* get as many bits as we need to fulfil the request */ + for (i = 0; i < DIV_ROUND_UP(num_bits, 64); i++) + /* use the arm64 specific backend directly if one exists */ + if (!arch_get_random_seed_long((unsigned long *)[i])) + bits[i] = get_random_long(); + + if (num_bits % 64) + bits[i - 1] &= U64_MAX >> (64 - (num_bits % 64)); + + while (i < ARRAY_SIZE(bits)) + bits[i++] = 0; I just wasted 3 minutes trying to understand what this was doing, only to realise this is clearing the [MAX_BITS:num_bits] range. How about using a bitmap instead? It would result in the exact same data structure, only much more readable (and no u64->unsigned long casts). + + if (size == 32) + smccc_set_retval(vcpu, TRNG_SUCCESS, lower_32_bits(bits[1]), +
Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source
On 2020-11-05 14:34, Ard Biesheuvel wrote: On Thu, 5 Nov 2020 at 15:30, Mark Rutland wrote: On Thu, Nov 05, 2020 at 03:04:57PM +0100, Ard Biesheuvel wrote: > On Thu, 5 Nov 2020 at 15:03, Mark Rutland wrote: > > On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote: > > > On Thu, Nov 05, 2020 at 12:56:55PM +, Andre Przywara wrote: > > That said, I'm not sure it's great to plumb this under the > > arch_get_random*() interfaces, e.g. given this measn that > > add_interrupt_randomness() will end up trapping to the host all the time > > when it calls arch_get_random_seed_long(). > > As it turns out, add_interrupt_randomness() isn't actually used on ARM. It's certainly called on arm64, per a warning I just hacked in: [1.083802] [ cut here ] [1.084802] add_interrupt_randomness called [1.085685] WARNING: CPU: 1 PID: 0 at drivers/char/random.c:1267 add_interrupt_randomness+0x2e8/0x318 [1.087599] Modules linked in: [1.088258] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc2-dirty #13 [1.089672] Hardware name: linux,dummy-virt (DT) [1.090659] pstate: 60400085 (nZCv daIf +PAN -UAO -TCO BTYPE=--) [1.091910] pc : add_interrupt_randomness+0x2e8/0x318 [1.092965] lr : add_interrupt_randomness+0x2e8/0x318 [1.094021] sp : 80001000be80 [1.094732] x29: 80001000be80 x28: 2d0c80209840 [1.095859] x27: 137c3e3a x26: 8000100abdd0 [1.096978] x25: 0035 x24: 67918bda8000 [1.098100] x23: c57c31923fe8 x22: fffedc14 [1.099224] x21: 2d0dbef796a0 x20: c57c331d16a0 [1.100339] x19: c57c33720a48 x18: 0010 [1.101459] x17: x16: 0002 [1.102578] x15: 00e7 x14: 80001000bb20 [1.103706] x13: ffea x12: c57c337b56e8 [1.104821] x11: 0003 x10: c57c3379d6a8 [1.105944] x9 : c57c3379d700 x8 : 00017fe8 [1.107073] x7 : c000efff x6 : 0001 [1.108186] x5 : 00057fa8 x4 : [1.109305] x3 : x2 : c57c337455d0 [1.110428] x1 : db8dc9c2a1e0f600 x0 : [1.111552] Call trace: [1.112083] add_interrupt_randomness+0x2e8/0x318 [1.113074] handle_irq_event_percpu+0x48/0x90 [1.114016] handle_irq_event+0x48/0xf8 [1.114826] handle_fasteoi_irq+0xa4/0x130 [1.115689] generic_handle_irq+0x30/0x48 [1.116528] __handle_domain_irq+0x64/0xc0 [1.117392] gic_handle_irq+0xc0/0x138 [1.118194] el1_irq+0xbc/0x180 [1.118870] arch_cpu_idle+0x20/0x30 [1.119630] default_idle_call+0x8c/0x350 [1.120479] do_idle+0x224/0x298 [1.121163] cpu_startup_entry+0x28/0x70 [1.121994] secondary_start_kernel+0x184/0x198 ... and I couldn't immediately spot why 32-bit arm would be different. Hmm, I actually meant both arm64 and ARM. Marc looked into this at my request a while ago, and I had a look myself as well at the time, and IIRC, we both concluded that we don't hit that code path. Darn. Yes, I remember checking this. Obviously, I need a new pair of glasses... M. -- Jazz is not dead. It just smells funny...
Re: [PATCH -next] irq-chip/gic-v3-its: Fixed an issue where the ITS executes the residual commands in the queue again when the ITS wakes up from sleep mode.
On 2020-11-05 14:06, xuqiang (M) wrote: 在 2020/11/5 21:12, Marc Zyngier 写道: Please don't top-post. On 2020-11-05 11:54, xuqiang (M) wrote: The kernel sends three commands in the following sequence: 1.mapd(deviceA, ITT_addr1, valid:1) 2.mapti(deviceA):ITS write ITT_addr1 memory; 3.mapd(deviceA, ITT_addr1, valid:0) and kfree(ITT_addr1); 4.mapd(deviceA, ITT_addr2, valid:1); 5.mapti(deviceA):ITS write ITT_addr2 memory; In this case, the processor enters the sleep mode. After the kernel performs the suspend operation, the firmware performs the store operation and saves GITS_CBASER and GITS_CWRITER registers. Then, the processor is woken up, and the firmware restores GITS_CBASER and GITS_CWRITER registers. Because GITS_CWRITER register is not 0, ITS will read the above command sequence execution from the command queue, causing ITT_addr1 memory to be trampled. This cannot work. By doing a memset on the command queue, you are only feeding crap to the ITS (command 0 simply does not exist). Consider yourself lucky that it doesn't just lock-up. What needs to happen is the restore sequence that is already in the driver, so that the command queue is in a sane state before re-enabling the ITS. M. On my platform, ITS_FLAGS_SAVE_SUSPEND_STATE is not set, thus the first if condition in its_save_disable under list_for_each_entry goes to the continue, however, I want to set the GITS_CWRITER to 0 at the end of list_for_each_entry. Do you have any suggestions about how to do this? That's pretty much dropping the checks for ITS_FLAGS_SAVE_SUSPEND_STATE, isn't it? But I assume your ITS is already enabled by the time you reenter the kernel? If so, I bet your firmware is doing more than just writing to CBASER and CWRITER... M. -- Jazz is not dead. It just smells funny...
Re: [PATCH -next] irq-chip/gic-v3-its: Fixed an issue where the ITS executes the residual commands in the queue again when the ITS wakes up from sleep mode.
Please don't top-post. On 2020-11-05 11:54, xuqiang (M) wrote: The kernel sends three commands in the following sequence: 1.mapd(deviceA, ITT_addr1, valid:1) 2.mapti(deviceA):ITS write ITT_addr1 memory; 3.mapd(deviceA, ITT_addr1, valid:0) and kfree(ITT_addr1); 4.mapd(deviceA, ITT_addr2, valid:1); 5.mapti(deviceA):ITS write ITT_addr2 memory; In this case, the processor enters the sleep mode. After the kernel performs the suspend operation, the firmware performs the store operation and saves GITS_CBASER and GITS_CWRITER registers. Then, the processor is woken up, and the firmware restores GITS_CBASER and GITS_CWRITER registers. Because GITS_CWRITER register is not 0, ITS will read the above command sequence execution from the command queue, causing ITT_addr1 memory to be trampled. This cannot work. By doing a memset on the command queue, you are only feeding crap to the ITS (command 0 simply does not exist). Consider yourself lucky that it doesn't just lock-up. What needs to happen is the restore sequence that is already in the driver, so that the command queue is in a sane state before re-enabling the ITS. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2 3/5] gpio: msc313: MStar MSC313 GPIO driver
On 2020-11-05 09:40, Linus Walleij wrote: On Mon, Oct 19, 2020 at 4:10 PM Daniel Palmer wrote: [...] +/* The parent interrupt controller needs the GIC interrupt type set to GIC_SPI + * so we need to provide the fwspec. Essentially gpiochip_populate_parent_fwspec_twocell + * that puts GIC_SPI into the first cell. + */ nit: comment style. +static void *msc313_gpio_populate_parent_fwspec(struct gpio_chip *gc, +unsigned int parent_hwirq, +unsigned int parent_type) +{ + struct irq_fwspec *fwspec; + + fwspec = kmalloc(sizeof(*fwspec), GFP_KERNEL); + if (!fwspec) + return NULL; + + fwspec->fwnode = gc->irq.parent_domain->fwnode; + fwspec->param_count = 3; + fwspec->param[0] = GIC_SPI; + fwspec->param[1] = parent_hwirq; + fwspec->param[2] = parent_type; + + return fwspec; +} Clever. Looping in Marc Z so he can say if this looks allright to him. Yup, this looks correct. However, looking at the bit of the patch that isn't quoted here, I see that msc313_gpio_irqchip doesn't have a .irq_set_affinity callback. Is this system UP only? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 18/26] kvm: arm64: Intercept PSCI_CPU_OFF host SMC calls
On 2020-11-04 18:36, David Brazdil wrote: Add a handler of the CPU_OFF PSCI host SMC trapped in KVM nVHE hyp code. When invoked, it changes the recorded state of the core to OFF before forwarding the call to EL3. If the call fails, it changes the state back to ON and returns the error to the host. Signed-off-by: David Brazdil --- arch/arm64/kvm/hyp/nvhe/psci.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c index c3d0a6246c66..00dc0cab860c 100644 --- a/arch/arm64/kvm/hyp/nvhe/psci.c +++ b/arch/arm64/kvm/hyp/nvhe/psci.c @@ -13,6 +13,8 @@ #include #include +#include + /* Config options set by the host. */ u32 kvm_host_psci_version = PSCI_VERSION(0, 0); u32 kvm_host_psci_function_id[PSCI_FN_MAX]; @@ -20,6 +22,7 @@ s64 hyp_physvirt_offset; #define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset) +static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock); DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state); static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt) @@ -76,9 +79,32 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho hyp_panic(); /* unreachable */ } +static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt) +{ + hyp_spinlock_t *cpu_lock = this_cpu_ptr(_cpu_lock); + enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(_cpu_state); + u32 power_state = (u32)host_ctxt->regs.regs[1]; + int ret; + + /* Change the recorded state to OFF before forwarding the call. */ + hyp_spin_lock(cpu_lock); + *cpu_power = KVM_NVHE_PSCI_CPU_OFF; + hyp_spin_unlock(cpu_lock); So at this point, another CPU can observe the vcpu being "off", and issue a PSCI_ON, which may result in an "already on". I'm not sure this is an actual issue, but it is worth documenting. What is definitely missing is a rational about *why* we need to track the state of the vcpus. I naively imagined that we could directly proxy the PSCI calls to EL3, only repainting PC for PSCI_ON. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 12/26] kvm: arm64: Add SMC handler in nVHE EL2
On 2020-11-04 18:36, David Brazdil wrote: Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to EL3 and propagate the result back to EL1. This is done in preparation for validating host SMCs. Signed-off-by: David Brazdil --- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 36 ++ 1 file changed, 36 insertions(+) diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index 19332c20fcde..fffc2dc09a1f 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -106,6 +106,38 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt) host_ctxt->regs.regs[1] = ret; } +static void skip_host_instruction(void) +{ + write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR); +} + +static void forward_host_smc(struct kvm_cpu_context *host_ctxt) +{ + struct arm_smccc_res res; + + arm_smccc_1_1_smc(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1], + host_ctxt->regs.regs[2], host_ctxt->regs.regs[3], + host_ctxt->regs.regs[4], host_ctxt->regs.regs[5], + host_ctxt->regs.regs[6], host_ctxt->regs.regs[7], + ); + host_ctxt->regs.regs[0] = res.a0; + host_ctxt->regs.regs[1] = res.a1; + host_ctxt->regs.regs[2] = res.a2; + host_ctxt->regs.regs[3] = res.a3; +} + +static void handle_host_smc(struct kvm_cpu_context *host_ctxt) +{ + /* +* Unlike HVC, the return address of an SMC is the instruction's PC. +* Move the return address past the instruction. +*/ + skip_host_instruction(); + + /* Forward SMC not handled in EL2 to EL3. */ + forward_host_smc(host_ctxt); +} + void handle_trap(struct kvm_cpu_context *host_ctxt) { u64 esr = read_sysreg_el2(SYS_ESR); @@ -114,6 +146,10 @@ void handle_trap(struct kvm_cpu_context *host_ctxt) case ESR_ELx_EC_HVC64: handle_host_hcall(host_ctxt); break; + case ESR_ELx_EC_SMC32: How is that even possible? Host EL1 is strictly 64bit, so SMC32 cannot occur. + case ESR_ELx_EC_SMC64: + handle_host_smc(host_ctxt); + break; default: hyp_panic(); } Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 02/26] psci: Export configured PSCI function IDs
On 2020-11-04 18:36, David Brazdil wrote: Function IDs used by PSCI are configurable for v0.1 via DT/APCI. If the host is using PSCI v0.1, KVM's PSCI proxy needs to use the same IDs. Expose the array holding the information. Signed-off-by: David Brazdil --- drivers/firmware/psci/psci.c | 10 +- include/linux/psci.h | 10 ++ 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index ff523bdbfe3f..ffcb88f60e21 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -60,15 +60,7 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned long, unsigned long, unsigned long); static psci_fn *invoke_psci_fn; -enum psci_function { - PSCI_FN_CPU_SUSPEND, - PSCI_FN_CPU_ON, - PSCI_FN_CPU_OFF, - PSCI_FN_MIGRATE, - PSCI_FN_MAX, -}; - -static u32 psci_function_id[PSCI_FN_MAX]; +u32 psci_function_id[PSCI_FN_MAX]; #define PSCI_0_2_POWER_STATE_MASK \ (PSCI_0_2_POWER_STATE_ID_MASK | \ diff --git a/include/linux/psci.h b/include/linux/psci.h index cb35b90d1746..877d844ee6d9 100644 --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -29,6 +29,16 @@ bool psci_has_osi_support(void); */ extern int psci_driver_version; +enum psci_function { + PSCI_FN_CPU_SUSPEND, + PSCI_FN_CPU_ON, + PSCI_FN_CPU_OFF, + PSCI_FN_MIGRATE, + PSCI_FN_MAX, +}; + +extern u32 psci_function_id[PSCI_FN_MAX]; I am very reluctant to expose this array to the rest of the kernel. The temptation becomes huge for people to start writing to it from random drivers in order to route PSCI calls somewhere else. Consider exporting an accessor instead. Same thing for the following patch (there already are a couple of accessors for psci_cpu_suspend_feature, which you could make visible). Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 01/26] psci: Export configured PSCI version
On 2020-11-04 18:36, David Brazdil wrote: The version of PSCI that the kernel should use to communicate with firmware is typically obtained from probing PSCI_VERSION. However, that doesn't work for PSCI v0.1 where the host gets the information from DT/ACPI, or if PSCI is not supported / was disabled. KVM's PSCI proxy for the host needs to be configured with the same version used by the host driver. Expose the PSCI version used by the host. Signed-off-by: David Brazdil --- drivers/firmware/psci/psci.c | 6 ++ include/linux/psci.h | 8 2 files changed, 14 insertions(+) diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index 00af99b6f97c..ff523bdbfe3f 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -49,6 +49,8 @@ static int resident_cpu = -1; struct psci_operations psci_ops; static enum arm_smccc_conduit psci_conduit = SMCCC_CONDUIT_NONE; +int psci_driver_version = PSCI_VERSION(0, 0); + bool psci_tos_resident_on(int cpu) { return cpu == resident_cpu; @@ -461,6 +463,8 @@ static int __init psci_probe(void) return -EINVAL; } + psci_driver_version = ver; + psci_0_2_set_functions(); psci_init_migrate(); @@ -514,6 +518,8 @@ static int __init psci_0_1_init(struct device_node *np) pr_info("Using PSCI v0.1 Function IDs from DT\n"); + psci_driver_version = PSCI_VERSION(0, 1); + if (!of_property_read_u32(np, "cpu_suspend", )) { psci_function_id[PSCI_FN_CPU_SUSPEND] = id; psci_ops.cpu_suspend = psci_cpu_suspend; diff --git a/include/linux/psci.h b/include/linux/psci.h index 2a1bfb890e58..cb35b90d1746 100644 --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -21,6 +21,14 @@ bool psci_power_state_is_valid(u32 state); int psci_set_osi_mode(bool enable); bool psci_has_osi_support(void); +/** + * The version of the PSCI specification followed by the driver. + * This is equivalent to calling PSCI_VERSION except: + * (a) it also works for PSCI v0.1, which does not support PSCI_VERSION, and + * (b) it is set to v0.0 if the PSCI driver was not initialized. + */ +extern int psci_driver_version; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); How about providing a get_version callback for pre-0.2 implementations instead? This would avoid exposing more symbols (psci_ops is already global). Thanks, M. diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index 00af99b6f97c..b84454e12d92 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -500,6 +500,11 @@ static int __init psci_0_2_init(struct device_node *np) return psci_probe(); } +static u32 psci_0_1_get_version(void) +{ + return PSCI_VERSION(0, 1); +} + /* * PSCI < v0.2 get PSCI Function IDs via DT. */ @@ -514,6 +519,8 @@ static int __init psci_0_1_init(struct device_node *np) pr_info("Using PSCI v0.1 Function IDs from DT\n"); + psci_ops.get_version = psci_0_1_get_version; + if (!of_property_read_u32(np, "cpu_suspend", )) { psci_function_id[PSCI_FN_CPU_SUSPEND] = id; psci_ops.cpu_suspend = psci_cpu_suspend; -- Jazz is not dead. It just smells funny...
Re: [PATCH 4/4] arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
On 2020-11-04 23:22, Konrad Dybcio wrote: QCOM KRYO2XX Silver cores are Cortex-A53 based and are susceptible to the 845719 erratum. Add them to the lookup list to apply the erratum. Signed-off-by: Konrad Dybcio --- arch/arm64/kernel/cpu_errata.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 61314fd70f13..cafaf0da05b7 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -299,6 +299,8 @@ static const struct midr_range erratum_845719_list[] = { MIDR_REV_RANGE(MIDR_CORTEX_A53, 0, 0, 4), /* Brahma-B53 r0p[0] */ MIDR_REV(MIDR_BRAHMA_B53, 0, 0), + /* Kryo2XX Silver rAp4 */ + MIDR_REV(MIDR_QCOM_KRYO_2XX_SILVER, 0xa, 0x4), Is this the only affected version? If this is actually an A53, how do the revisions map between Kryo and Cortex cores? M. -- Jazz is not dead. It just smells funny...
Re: Aw: Re: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-04 23:14, Thomas Gleixner wrote: [...] TBH, that's butt ugly. So after staring long enough into the PCI code I came up with a way to transport that information to the probe code. That allows a particular device to say 'I can't do MSI' and at the same time keeps the warning machinery intact which tells us that a particular host controller driver is broken. Uncompiled and untested as usual :) Thanks, tglx --- drivers/pci/controller/pcie-mediatek.c |4 drivers/pci/probe.c|3 +++ include/linux/pci.h|1 + 3 files changed, 8 insertions(+) --- a/drivers/pci/controller/pcie-mediatek.c +++ b/drivers/pci/controller/pcie-mediatek.c @@ -143,6 +143,7 @@ struct mtk_pcie_port; * struct mtk_pcie_soc - differentiate between host generations * @need_fix_class_id: whether this host's class ID needed to be fixed or not * @need_fix_device_id: whether this host's device ID needed to be fixed or not + * @no_msi: Bridge has no MSI support * @device_id: device ID which this host need to be fixed * @ops: pointer to configuration access functions * @startup: pointer to controller setting functions @@ -151,6 +152,7 @@ struct mtk_pcie_port; struct mtk_pcie_soc { bool need_fix_class_id; bool need_fix_device_id; + bool no_msi; unsigned int device_id; struct pci_ops *ops; int (*startup)(struct mtk_pcie_port *port); @@ -1084,6 +1086,7 @@ static int mtk_pcie_probe(struct platfor host->ops = pcie->soc->ops; host->sysdata = pcie; + host->no_msi = pcie->soc->no_msi; err = pci_host_probe(host); if (err) @@ -1173,6 +1176,7 @@ static const struct dev_pm_ops mtk_pcie_ }; static const struct mtk_pcie_soc mtk_pcie_soc_v1 = { + .no_msi = true, .ops = _pcie_ops, .startup = mtk_pcie_startup_port, }; --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -889,6 +889,9 @@ static int pci_register_host_bridge(stru if (!bus) return -ENOMEM; + if (bridge->no_msi) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; + bridge->bus = bus; /* Temporarily move resources off the list */ --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -545,6 +545,7 @@ struct pci_host_bridge { unsigned intnative_dpc:1; /* OS may use PCIe DPC */ unsigned intpreserve_config:1; /* Preserve FW resource setup */ unsigned intsize_windows:1; /* Enable root bus sizing */ + unsigned intno_msi:1; /* Bridge has no MSI support */ /* Resource alignment requirements */ resource_size_t (*align_resource)(struct pci_dev *dev, If that's the direction of travel, we also need something like this for configuration where the host bridge relies on an external MSI block that uses MSI domains (boot-tested in a GICv3 guest). M. diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c index 6ce34a1deecb..603f6fbbe68a 100644 --- a/drivers/pci/controller/pci-host-common.c +++ b/drivers/pci/controller/pci-host-common.c @@ -77,6 +77,7 @@ int pci_host_common_probe(struct platform_device *pdev) bridge->sysdata = cfg; bridge->ops = (struct pci_ops *)>pci_ops; + bridge->msi_domain = true; platform_set_drvdata(pdev, bridge); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 16fb150fbb8d..f421b2869bca 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -889,9 +889,6 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge) if (!bus) return -ENOMEM; - if (bridge->no_msi) - bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; - bridge->bus = bus; /* Temporarily move resources off the list */ @@ -928,6 +925,9 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge) device_enable_async_suspend(bus->bridge); pci_set_bus_of_node(bus); pci_set_bus_msi_domain(bus); + if (bridge->no_msi || + (bridge->msi_domain && !bus->dev.msi_domain)) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; if (!parent) set_dev_node(bus->bridge, pcibus_to_node(bus)); diff --git a/include/linux/pci.h b/include/linux/pci.h index c2a0c1d471d6..81f72fd46e06 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -546,6 +546,7 @@ struct pci_host_bridge { unsigned intpreserve_config:1; /* Preserve FW resource setup */ unsigned intsize_windows:1; /* Enable root bus sizing */ unsigned intno_msi:1; /* Bridge has no MSI support */ + unsigned intmsi_domain:1; /* Bridge wants MSI domain */ /* Resource alignment requirements */ resource_size_t (*align_resource)(struct pci_dev *dev, -- Jazz is not dead. It just smells funny...
Re: [PATCH -next] irq-chip/gic-v3-its: Fixed an issue where the ITS executes the residual commands in the queue again when the ITS wakes up from sleep mode.
On Tue, 03 Nov 2020 08:11:23 +, Xu Qiang wrote: > > During wakeup, the ATF restore interface restores the values of > the cbaser and cwriter registers. As a result, the ITS executes > the residual commands in the queue, which may cause memory corruption. > > To solve this problem, clear all data in the command queue > in the suspend interface of the ITS driver. > > Signed-off-by: Xu Qiang > --- > drivers/irqchip/irq-gic-v3-its.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/irqchip/irq-gic-v3-its.c > b/drivers/irqchip/irq-gic-v3-its.c > index 0fec31931e11..b8487f78ac21 100644 > --- a/drivers/irqchip/irq-gic-v3-its.c > +++ b/drivers/irqchip/irq-gic-v3-its.c > @@ -4741,6 +4741,14 @@ static int its_save_disable(void) > list_for_each_entry(its, _nodes, entry) { > void __iomem *base; > > + /* > + * Clear the command queue so that the ITS will not re-execute > + * the remaining commands in the command queue when > + * the cwriter and cbaser registers are restored > + * in the restore interface of the firmware. > + */ > + memset(its->cmd_base, 0, ITS_CMD_QUEUE_SZ); > + > if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE)) > continue; You are wiping the ITS queue before even stopping the ITS. How well is that going to work? What if there is something in flight? I don't understand what you are trying to do here, nor how ATF is involved. So please describe the whole sequence of events, and we'll decide whether that's something we need to fix. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: Aw: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-03 10:31, Thomas Gleixner wrote: On Tue, Nov 03 2020 at 09:54, Marc Zyngier wrote: On 2020-11-02 22:18, Thomas Gleixner wrote: So we really need some other solution and removing the warning is not an option. If MSI is enabled then we want to get a warning when a PCI device has no MSI domain associated. Explicitly expressing the PCIE brigde misfeature of not supporting MSI is way better than silently returning an error code which is swallowed anyway. I don't disagree here, though the PCI_MSI_ARCH_FALLBACKS mechanism makes it more difficult to establish. Only for the few leftovers which implement msi_controller, i.e. drivers/pci/controller/pci-hyperv.c drivers/pci/controller/pci-tegra.c drivers/pci/controller/pcie-rcar-host.c drivers/pci/controller/pcie-xilinx.c The architectures which select PCI_MSI_ARCH_FALLBACKS are: arch/ia64/Kconfig: select PCI_MSI_ARCH_FALLBACKS if PCI_MSI arch/mips/Kconfig: select PCI_MSI_ARCH_FALLBACKS if PCI_MSI arch/powerpc/Kconfig: select PCI_MSI_ARCH_FALLBACKS if PCI_MSI arch/s390/Kconfig: select PCI_MSI_ARCH_FALLBACKS if PCI_MSI arch/sparc/Kconfig: select PCI_MSI_ARCH_FALLBACKS if PCI_MSI implement arch_setup_msi_irq() which makes it magically work :) Whatever the preferred way is via flags at host probe time or flagging it post probe I don't care much as long as it is consistent. Host probe time is going to require some changes in the core PCI api, as everything that checks for a MSI domain is based on the pci_bus structure, which is only allocated much later. Yeah, it's nasty. One possible solution is to add flags or a callback to pci_ops, but it's not pretty either. I think we should go with the 'mark it after pci_host_probe()' hack for 5.10-rc. The real fix will be larger and go into 5.11. Thoughts? We can do that, although I worried that it isn't 100% reliable: pci_host_probe() ends up calling pci_add_device(), and will start probing devices if the endpoint drivers have already registered with the core code, long before the flag gets set. Here's what I've hacked together for a guest that doesn't have any MSI capability: diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c index 6ce34a1deecb..7dd5145cd38d 100644 --- a/drivers/pci/controller/pci-host-common.c +++ b/drivers/pci/controller/pci-host-common.c @@ -55,6 +55,7 @@ int pci_host_common_probe(struct platform_device *pdev) struct pci_host_bridge *bridge; struct pci_config_window *cfg; const struct pci_ecam_ops *ops; + int ret; ops = of_device_get_match_data(>dev); if (!ops) @@ -80,7 +81,10 @@ int pci_host_common_probe(struct platform_device *pdev) platform_set_drvdata(pdev, bridge); - return pci_host_probe(bridge); + ret = pci_host_probe(bridge); + bridge->bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; + + return ret; } EXPORT_SYMBOL_GPL(pci_host_common_probe); (plus another hack to get the host controller to initialise a bit later, though building it as a module will achieve the same thing): [0.369114] 9pnet: Installing 9P2000 support [0.369807] mpls_gso: MPLS GSO support [0.370512] registered taskstats version 1 [0.371204] Loading compiled-in X.509 certificates [0.371988] zswap: loaded using pool lzo/zbud [0.373041] pci-host-generic 4000.pci: host bridge /pci ranges: [0.374045] pci-host-generic 4000.pci: IO 0x00..0x00 -> 0x00 [0.375458] pci-host-generic 4000.pci: MEM 0x004100..0x007fff -> 0x004100 [0.376848] pci-host-generic 4000.pci: ECAM at [mem 0x4000-0x40ff] for [bus 00] [0.378204] pci-host-generic 4000.pci: PCI host bridge to bus :00 [0.379316] pci_bus :00: root bus resource [bus 00] [0.380146] pci_bus :00: root bus resource [io 0x-0x] [0.381131] pci_bus :00: root bus resource [mem 0x4100-0x7fff] [0.382267] pci :00:00.0: [1af4:1009] type 00 class 0xff [0.383369] pci :00:00.0: reg 0x10: [io 0x6200-0x62ff] [0.384286] pci :00:00.0: reg 0x14: [mem 0x4100-0x41ff] [0.385324] pci :00:00.0: reg 0x18: [mem 0x41000200-0x410003ff] [0.386680] pci :00:01.0: [1af4:1009] type 00 class 0xff [0.387778] pci :00:01.0: reg 0x10: [io 0x6300-0x63ff] [0.388696] pci :00:01.0: reg 0x14: [mem 0x41000400-0x410004ff] [0.389730] pci :00:01.0: reg 0x18: [mem 0x41000600-0x410007ff] [0.391070] pci :00:02.0: [1af4:1000] type 00 class 0x02 [0.392212] pci :00:02.0: reg 0x10: [io 0x6400-0x64ff] [0.393137] pci :00:02.0: reg 0x14: [mem 0x41000800-0x410008ff] [0.394163] pci :00:02.0: reg 0x18: [mem 0x41000a00-0x41000bff] [0.395678] pci :00:00.0: BAR 2: assigned [mem 0x4100-0x410001ff] [0.396762] pci :00:01.0: BAR 2: assigned [mem 0x4100
Re: Aw: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-03 10:16, Thomas Gleixner wrote: On Tue, Nov 03 2020 at 09:54, Marc Zyngier wrote: On 2020-11-02 22:18, Thomas Gleixner wrote: On Mon, Nov 02 2020 at 17:16, Thomas Gleixner wrote: On Mon, Nov 02 2020 at 11:30, Marc Zyngier wrote: --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -871,6 +871,8 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus) d = pci_host_bridge_msi_domain(b); dev_set_msi_domain(>dev, d); + if (!d) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; Hrm, that might break legacy setups (no irqdomain support). I'd rather prefer to explicitly tell the pci core at host registration time. s/might break/ breaks / Just validated :) For my own edification, can you point me to the failing case? Any architecture which selects PCI_MSI_ARCH_FALLBACKS and does not have irqdomain support runs into: if (!d) bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; which in turn makes pci_msi_supported() return 0 and consequently makes pci_enable_msi[x]() fail. I pointer that out in [1], together with a potential fix. Not sure if anything else breaks though. Thanks, M. [1] https://lore.kernel.org/r/336d6588567949029c52ecfbb8766...@kernel.org/ -- Jazz is not dead. It just smells funny...
Re: Aw: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-02 22:18, Thomas Gleixner wrote: On Mon, Nov 02 2020 at 17:16, Thomas Gleixner wrote: On Mon, Nov 02 2020 at 11:30, Marc Zyngier wrote: --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -871,6 +871,8 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus) d = pci_host_bridge_msi_domain(b); dev_set_msi_domain(>dev, d); + if (!d) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; Hrm, that might break legacy setups (no irqdomain support). I'd rather prefer to explicitly tell the pci core at host registration time. s/might break/ breaks / Just validated :) For my own edification, can you point me to the failing case? So we really need some other solution and removing the warning is not an option. If MSI is enabled then we want to get a warning when a PCI device has no MSI domain associated. Explicitly expressing the PCIE brigde misfeature of not supporting MSI is way better than silently returning an error code which is swallowed anyway. I don't disagree here, though the PCI_MSI_ARCH_FALLBACKS mechanism makes it more difficult to establish. Whatever the preferred way is via flags at host probe time or flagging it post probe I don't care much as long as it is consistent. Host probe time is going to require some changes in the core PCI api, as everything that checks for a MSI domain is based on the pci_bus structure, which is only allocated much later. I'll have a think. M. -- Jazz is not dead. It just smells funny...
Re: Using fixed LPI number for some Device ID
On 2020-11-03 05:22, Dongjiu Geng wrote: On 2020/10/31 17:55, Marc Zyngier wrote: Dongjiu, On Sat, 31 Oct 2020 02:19:19 +, Dongjiu Geng wrote: Hi Marc, Sorry to disturb you, Currently the LPI number is not fixed for the device. The LPI number is dynamically allocated start from 8092. For two OS which shares the ITS, One OS needs to configure the device interrupt required by another OS, and the other OS uses a fixed interrupt ID to respond the interrupt. Therefore, the LPI IRQ number of the device needed be fixed. I want to upstream this feature that allocate fixed LPI number for the device that is specified through the DTS. What is your meaning? Thanks I think you are starting from the wrong premises. You can't "share" an ITS directly between two operating systems. The ITS can only be controlled by a single operating system, because its function goes way beyond allocating an LPI. How would you deal with simple things such as masking an interrupt, which requires: - Access to memory (configuration table) - Access to the command queue (to insert an invalidation command) - Access to MMIO registers (to kick the command queue into action) all of which needs to be exclusive of concurrent modifications. How do you propose this is implemented in a safe manner by two operating systems which, by nature, distrust each other? Allocating LPIs is the least of your problems, really. Yes, I agree with you it . But in my HW platform, using virtualization, the performance deteriorates greatly. So I distributed the I/O devices to different operation systems. During the startup of one OS, interrupts are bound to different OS in one OS, which can be exclusive of concurrent modifications. In fact it has some limitations as you said, such mask/enable/route Interrupts, If want to mask interrupts, need to mask interrupts on the source device. If you think it is not a common feature, I will used it as a local customization function and not upstream. I don't think this makes sense for Linux, at least not in a way that limits the way the kernel deals with simple things such as LPI allocation. We have systems in the tree where Linux route interrupts on behalf of other agents in the system (see what the TI PRUSS subsystem does, for example), and even direct interrupt injection is, to an extent, doing that. This requires a standardised way for describing the routing, the allocation, and potentially the life cycle of the interrupt. But hardcoding the allocation based on some non-standard scheme is not something I'm considering. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: Aw: Re: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-02 11:56, Frank Wunderlich wrote: looks good on bananapi-r2, no warning, pcie-card and hdd recognized Thanks for giving it a shot. Still needs a bit of tweaking, as I expect it to break configurations that select CONFIG_PCI_MSI_ARCH_FALLBACKS (we have to assume that MSIs can be handled until we hit the arch-specific stuff). There is also a small nit in the way we allow userspace to mess with this flag via sysfs, and similar restrictions should probably apply. Updated patch below. M. diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index d15c881e2e7e..5bb1306162c7 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -387,10 +387,20 @@ static ssize_t msi_bus_store(struct device *dev, struct device_attribute *attr, return count; } - if (val) + if (val) { + /* +* If there is no possibility for this bus to deal with +* MSIs, then allowing them to be requested would lead to +* the kernel complaining loudly. In this situation, don't +* let userspace mess things up. +*/ + if (!pci_bus_is_msi_capable(subordinate)) + return -EINVAL; + subordinate->bus_flags &= ~PCI_BUS_FLAGS_NO_MSI; - else + } else { subordinate->bus_flags |= PCI_BUS_FLAGS_NO_MSI; + } dev_info(>dev, "MSI/MSI-X %s for future drivers of devices on this bus\n", val ? "allowed" : "disallowed"); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 4289030b0fff..28861cc6435a 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -871,6 +871,8 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus) d = pci_host_bridge_msi_domain(b); dev_set_msi_domain(>dev, d); + if (!pci_bus_is_msi_capable(bus)) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; } static int pci_register_host_bridge(struct pci_host_bridge *bridge) diff --git a/include/linux/pci.h b/include/linux/pci.h index 22207a79762c..6aadb863dff4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2333,6 +2333,12 @@ pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; } static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; } #endif +static inline bool pci_bus_is_msi_capable(struct pci_bus *bus) +{ + return (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS) || + dev_get_msi_domain(>dev)); +} + #ifdef CONFIG_EEH static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) { -- Jazz is not dead. It just smells funny...
Re: Aw: Re: [PATCH] pci: mediatek: fix warning in msi.h
On 2020-11-01 22:27, Thomas Gleixner wrote: On Sun, Nov 01 2020 at 21:47, Marc Zyngier wrote: On Sun, 01 Nov 2020 18:27:13 +, Frank Wunderlich wrote: Thinking of it a bit more, I think this is the wrong solution. PCI MSIs are optional, and not a requirement. I can trivially spin a VM with PCI devices and yet no MSI capability (yes, it is more difficult with real HW), and this results in a bunch of warning, none of which are actually indicative of anything being wrong. Well. No. The problem is that the device enumerates MSI capability, but the host bridge is not proving support for MSI. The host bridge fails to mark the bus with PCI_BUS_FLAGS_NO_MSI. That's the reason why this runs into this issue. Right, that's the piece I was missing, thanks for that. However, that doesn't really address the issue when the host bridge and the MSI widget are two separate entities, oblivious of each other (which is a pretty common thing on the ARM side). In this configuration, you can't really decide whether you have a MSI domain in the host bridge driver (the association is done in the code PCI code, after you have registered it with the core code), and by the time you get a pointer to the bus, the endpoints have already been probed. The following patch makes it work for me (GICv3 guest without an ITS)by checking for the presence of an MSI domain at the point where we actually perform this association, and before starting to scan for endpoints. I *think* this should work for the MTK thingy, but someone needs to go and check. Thanks, M. diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 4289030b0fff..bb363eb103a2 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -871,6 +871,8 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus) d = pci_host_bridge_msi_domain(b); dev_set_msi_domain(>dev, d); + if (!d) + bus->bus_flags |= PCI_BUS_FLAGS_NO_MSI; } static int pci_register_host_bridge(struct pci_host_bridge *bridge) -- Jazz is not dead. It just smells funny...
Re: Aw: Re: [PATCH] pci: mediatek: fix warning in msi.h
On Sun, 01 Nov 2020 18:27:13 +, Frank Wunderlich wrote: > > > Gesendet: Sonntag, 01. November 2020 um 18:54 Uhr > > Von: "Ryder Lee" > > > Yea, mt7623 (mtk_pcie_soc_v1) does not support MSI, so that's a way to > > handle it. > > > > @Frank, could you help to test it? > > > > Ryder > > compiles clean for mt7623/armhf and mt7622/aarch64 so far > > at least bananapi-r2/mt7623 booting is clean now - no warning pcie > and sata/ahci seems still working as expected. I have a mt7615 card > in pcie-slot (firmware-load and init without errors) and hdd > connected to outer sata port (can access partitions witout errors) > > booted r64 too, still see no warning, but have not yet connected > hdd/pcie-device, but i guess this should not break anything here > > so Marc, if you post the patch separately, you can add my tested-by > ;) thank you for this Thinking of it a bit more, I think this is the wrong solution. PCI MSIs are optional, and not a requirement. I can trivially spin a VM with PCI devices and yet no MSI capability (yes, it is more difficult with real HW), and this results in a bunch of warning, none of which are actually indicative of anything being wrong. I think the real fix is to get rid of the warnings altogether. If we could detect that there should be an MSI domain associated with the device and that it wasn't there, that'd be a good reason to scream. But on its own, the absence of a MSI domain is not an indication of anything being amiss. M. -- Without deviation from the norm, progress is not possible.
[tip: irq/urgent] irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY
The following commit has been merged into the irq/urgent branch of tip: Commit-ID: d26dd4131d0d6ad7aa294a7f8d18782b47c27c93 Gitweb: https://git.kernel.org/tip/d26dd4131d0d6ad7aa294a7f8d18782b47c27c93 Author:Marc Zyngier AuthorDate:Fri, 16 Oct 2020 09:28:23 +01:00 Committer: Marc Zyngier CommitterDate: Fri, 16 Oct 2020 10:51:12 +01:00 irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY Now that GENERIC_IRQ_IPI selects IRQ_DOMAIN_HIERARCHY, there is no need to have this conditional select for IRQ_MIPS_CPU. Similarily, MIPS_GIC only needs selecting GENERIC_IRQ_IPI. Suggested-by: Thomas Gleixner Signed-off-by: Marc Zyngier --- drivers/irqchip/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index cd734df..38785a0 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -180,7 +180,6 @@ config IRQ_MIPS_CPU select GENERIC_IRQ_CHIP select GENERIC_IRQ_IPI if SYS_SUPPORTS_MULTITHREADING select IRQ_DOMAIN - select IRQ_DOMAIN_HIERARCHY if GENERIC_IRQ_IPI select GENERIC_IRQ_EFFECTIVE_AFF_MASK config CLPS711X_IRQCHIP @@ -307,7 +306,6 @@ config KEYSTONE_IRQ config MIPS_GIC bool select GENERIC_IRQ_IPI - select IRQ_DOMAIN_HIERARCHY select MIPS_CM config INGENIC_IRQ
[tip: irq/urgent] irqchip/mst: Make mst_intc_of_init static
The following commit has been merged into the irq/urgent branch of tip: Commit-ID: 893a7cfb6b0bea650fafa43838d7f7f8f0f076bc Gitweb: https://git.kernel.org/tip/893a7cfb6b0bea650fafa43838d7f7f8f0f076bc Author:Marc Zyngier AuthorDate:Thu, 15 Oct 2020 22:26:26 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 15 Oct 2020 22:32:31 +01:00 irqchip/mst: Make mst_intc_of_init static mst_intc_of_init has no external caller, so let's make it static. Reported-by: kernel test robot Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-mst-intc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-mst-intc.c b/drivers/irqchip/irq-mst-intc.c index 4be0775..143657b 100644 --- a/drivers/irqchip/irq-mst-intc.c +++ b/drivers/irqchip/irq-mst-intc.c @@ -154,8 +154,8 @@ static const struct irq_domain_ops mst_intc_domain_ops = { .free = irq_domain_free_irqs_common, }; -int __init -mst_intc_of_init(struct device_node *dn, struct device_node *parent) +static int __init mst_intc_of_init(struct device_node *dn, + struct device_node *parent) { struct irq_domain *domain, *domain_parent; struct mst_intc_chip_data *cd;
[tip: irq/urgent] irqchip/bcm2836: Fix missing __init annotation
The following commit has been merged into the irq/urgent branch of tip: Commit-ID: 57733e009f0c7e0526e10a18be12f56996c5460e Gitweb: https://git.kernel.org/tip/57733e009f0c7e0526e10a18be12f56996c5460e Author:Marc Zyngier AuthorDate:Sun, 25 Oct 2020 11:10:29 Committer: Marc Zyngier CommitterDate: Sun, 25 Oct 2020 11:10:29 irqchip/bcm2836: Fix missing __init annotation bcm2836_arm_irqchip_smp_init() calls set_smp_ipi_range(), which has an __init annotation. Make sure the caller has the same annotation. Reported-by: kernel test robot Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-bcm2836.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c index 97838eb..cbc7c74 100644 --- a/drivers/irqchip/irq-bcm2836.c +++ b/drivers/irqchip/irq-bcm2836.c @@ -244,7 +244,7 @@ static int bcm2836_cpu_dying(unsigned int cpu) #define BITS_PER_MBOX 32 -static void bcm2836_arm_irqchip_smp_init(void) +static void __init bcm2836_arm_irqchip_smp_init(void) { struct irq_fwspec ipi_fwspec = { .fwnode = intc.domain->fwnode,
[tip: irq/urgent] genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
The following commit has been merged into the irq/urgent branch of tip: Commit-ID: 151a535171be6ff824a0a3875553ea38570f4c05 Gitweb: https://git.kernel.org/tip/151a535171be6ff824a0a3875553ea38570f4c05 Author:Marc Zyngier AuthorDate:Thu, 15 Oct 2020 21:41:44 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 15 Oct 2020 21:41:44 +01:00 genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY kernel/irq/ipi.c otherwise fails to compile if nothing else selects it. Fixes: 379b656446a3 ("genirq: Add GENERIC_IRQ_IPI Kconfig symbol") Reported-by: Pavel Machek Tested-by: Pavel Machek Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201015101222.GA32747@amd --- kernel/irq/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 10a5aff..164a031 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -82,6 +82,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS # Generic IRQ IPI support config GENERIC_IRQ_IPI bool + select IRQ_DOMAIN_HIERARCHY # Generic MSI interrupt support config GENERIC_MSI_IRQ
[PATCH 1/2] genirq: Allow an interrupt to be marked as 'naked'
Some interrupts (such as the rescheduling IPI) rely on not going through the irq_enter()/irq_exit() calls. To distinguish such interrupts, add a new IRQ flag that allows the low-level handling code to sidestep the enter()/exit() calls. Only the architecture code is expected to use this. It will do the wrong thing on normal interrupts. Signed-off-by: Marc Zyngier --- include/linux/irq.h | 4 +++- kernel/irq/debugfs.c | 1 + kernel/irq/irqdesc.c | 17 - kernel/irq/settings.h | 7 +++ 4 files changed, 23 insertions(+), 6 deletions(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index c54365309e97..af5ba7336925 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -72,6 +72,7 @@ enum irqchip_irq_state; * mechanism and from core side polling. * IRQ_DISABLE_UNLAZY - Disable lazy irq disable * IRQ_HIDDEN - Don't show up in /proc/interrupts + * IRQ_NAKED - Bypass irq_enter()/irq_exit() */ enum { IRQ_TYPE_NONE = 0x, @@ -99,13 +100,14 @@ enum { IRQ_IS_POLLED = (1 << 18), IRQ_DISABLE_UNLAZY = (1 << 19), IRQ_HIDDEN = (1 << 20), + IRQ_NAKED = (1 << 21), }; #define IRQF_MODIFY_MASK \ (IRQ_TYPE_SENSE_MASK | IRQ_NOPROBE | IRQ_NOREQUEST | \ IRQ_NOAUTOEN | IRQ_MOVE_PCNTXT | IRQ_LEVEL | IRQ_NO_BALANCING | \ IRQ_PER_CPU | IRQ_NESTED_THREAD | IRQ_NOTHREAD | IRQ_PER_CPU_DEVID | \ -IRQ_IS_POLLED | IRQ_DISABLE_UNLAZY | IRQ_HIDDEN) +IRQ_IS_POLLED | IRQ_DISABLE_UNLAZY | IRQ_HIDDEN | IRQ_NAKED) #define IRQ_NO_BALANCING_MASK (IRQ_PER_CPU | IRQ_NO_BALANCING) diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c index e4cff358b437..e031d6afc0f8 100644 --- a/kernel/irq/debugfs.c +++ b/kernel/irq/debugfs.c @@ -140,6 +140,7 @@ static const struct irq_bit_descr irqdesc_states[] = { BIT_MASK_DESCR(_IRQ_IS_POLLED), BIT_MASK_DESCR(_IRQ_DISABLE_UNLAZY), BIT_MASK_DESCR(_IRQ_HIDDEN), + BIT_MASK_DESCR(_IRQ_NAKED), }; static const struct irq_bit_descr irqdesc_istates[] = { diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 1a7723604399..c08a1c19d061 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -667,10 +667,9 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, { struct pt_regs *old_regs = set_irq_regs(regs); unsigned int irq = hwirq; + struct irq_desc *desc; int ret = 0; - irq_enter(); - #ifdef CONFIG_IRQ_DOMAIN if (lookup) irq = irq_find_mapping(domain, hwirq); @@ -680,14 +679,22 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, * Some hardware gives randomly wrong interrupts. Rather * than crashing, do something sensible. */ - if (unlikely(!irq || irq >= nr_irqs)) { + desc = irq_to_desc(irq); + if (unlikely(!desc || irq >= nr_irqs)) { ack_bad_irq(irq); ret = -EINVAL; + goto out; + } + + if (unlikely(irq_settings_is_naked(desc))) { + generic_handle_irq_desc(desc); } else { - generic_handle_irq(irq); + irq_enter(); + generic_handle_irq_desc(desc); + irq_exit(); } - irq_exit(); +out: set_irq_regs(old_regs); return ret; } diff --git a/kernel/irq/settings.h b/kernel/irq/settings.h index 403378b9947b..587e67f9c302 100644 --- a/kernel/irq/settings.h +++ b/kernel/irq/settings.h @@ -18,6 +18,7 @@ enum { _IRQ_IS_POLLED = IRQ_IS_POLLED, _IRQ_DISABLE_UNLAZY = IRQ_DISABLE_UNLAZY, _IRQ_HIDDEN = IRQ_HIDDEN, + _IRQ_NAKED = IRQ_NAKED, _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK, }; @@ -33,6 +34,7 @@ enum { #define IRQ_IS_POLLED GOT_YOU_MORON #define IRQ_DISABLE_UNLAZY GOT_YOU_MORON #define IRQ_HIDDEN GOT_YOU_MORON +#define IRQ_NAKED GOT_YOU_MORON #undef IRQF_MODIFY_MASK #define IRQF_MODIFY_MASK GOT_YOU_MORON @@ -174,3 +176,8 @@ static inline bool irq_settings_is_hidden(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_HIDDEN; } + +static inline bool irq_settings_is_naked(struct irq_desc *desc) +{ + return desc->status_use_accessors & _IRQ_NAKED; +} -- 2.28.0
[PATCH 0/2] arm64: Allow the rescheduling IPI to bypass irq_enter/exit
Vincent recently reported [1] that 5.10-rc1 showed a significant regression when running "perf bench sched pipe" on arm64, and pinpointed it to the recent move to handling IPIs as normal interrupts. The culprit is the use of irq_enter/irq_exit around the handling of the rescheduling IPI, meaning that we enter the scheduler right after the handling of the IPI instead of deferring it to the next preemption event. This accounts for most of the overhead introduced. On architectures that have architected IPIs at the CPU level (x86 being the obvious one), the absence of irq_enter/exit is natural. ARM (both 32 and 64bits) mimicked this behaviour by having some arch-specific handling for the interrupts that are used to implement IPIs. Moving IPIs on the normal interrupt path introduced the regression. This couple of patches try to acknowledge the fact that some IPIs are "special", in the sense that they don't need to follow the standard interrupt handling flow. The good news is that it cures the regression on arm64, and could be similarly beneficial to both 32bit ARM, MIPS, or any other architecture that uses a unique IRQ to represent the scheduler IPI. The bad news is that these patches are ugly as sin, and I really don't like them. I specially hate that they can give driver authors the idea that they can make random interrupts "faster". Comments, suggestions and hate mails appreciated, as always. M. [1] https://lore.kernel.org/r/cakftptdjppri5gt6klefp_b_zjuz5dyxeqtj+0vkohu-y9b...@mail.gmail.com Marc Zyngier (2): genirq: Allow an interrupt to be marked as 'naked' arm64: Mark the recheduling IPI as naked interrupt arch/arm64/kernel/smp.c | 4 include/linux/irq.h | 4 +++- kernel/irq/debugfs.c| 1 + kernel/irq/irqdesc.c| 17 - kernel/irq/settings.h | 7 +++ 5 files changed, 27 insertions(+), 6 deletions(-) -- 2.28.0
[PATCH 2/2] arm64: Mark the recheduling IPI as naked interrupt
Flag the rescheduling IPI as 'naked', making sure such interrupt doesn't trigger a rescheduling event by itself. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/smp.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 82e75fc2c903..6c11be3e74d3 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -993,6 +993,10 @@ void __init set_smp_ipi_range(int ipi_base, int n) ipi_desc[i] = irq_to_desc(ipi_base + i); irq_set_status_flags(ipi_base + i, IRQ_HIDDEN); + + /* The recheduling IPI is special... */ + if (i == IPI_RESCHEDULE) + irq_set_status_flags(ipi_base + i, IRQ_NAKED); } ipi_irq_base = ipi_base; -- 2.28.0
[GIT PULL] irqchip updates for 5.10, take #1
Hi Thomas, Here's a smallish set of fixes for 5.10. Some fixes after the IPI-as-IRQ (I expect a couple more next week), two significant bug fixes for the SiFive PLIC, and a TI update to handle their "unmapped events". The rest is the usual set of cleanups and tidying up. Please pull, M. The following changes since commit 63ea38a402213d8c9c16e58ee4901ff51bc8fe3c: Merge branch 'irq/mstar' into irq/irqchip-next (2020-10-10 12:46:54 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git tags/irqchip-fixes-5.10-1 for you to fetch changes up to d95bdca75b3fb41bf185efe164e05aed820081a5: irqchip/ti-sci-inta: Add support for unmapped event handling (2020-11-01 12:00:50 +) irqchip fixes for Linux 5.10, take #1 - A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836) - Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling) - "unmapped events" handling for the ti-sci-inta controller - Tidying up for the irq-mst driver (static functions, Kconfig) - Small cleanup in the Renesas irqpin driver - STM32 exti can now handle LP timer events Fabrice Gasnier (1): irqchip/stm32-exti: Add all LP timer exti direct events support Geert Uytterhoeven (2): irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7 irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm Greentime Hu (2): irqchip/sifive-plic: Fix broken irq_set_affinity() callback irqchip/sifive-plic: Fix chip_data access within a hierarchy Marc Zyngier (4): genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY irqchip/mst: Make mst_intc_of_init static irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY irqchip/bcm2836: Fix missing __init annotation Peter Ujfalusi (2): dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling irqchip/ti-sci-inta: Add support for unmapped event handling .../bindings/interrupt-controller/ti,sci-inta.yaml | 10 +++ drivers/irqchip/Kconfig| 3 +- drivers/irqchip/irq-bcm2836.c | 2 +- drivers/irqchip/irq-mst-intc.c | 4 +- drivers/irqchip/irq-renesas-intc-irqpin.c | 8 +-- drivers/irqchip/irq-sifive-plic.c | 10 +-- drivers/irqchip/irq-stm32-exti.c | 4 ++ drivers/irqchip/irq-ti-sci-inta.c | 83 +- kernel/irq/Kconfig | 1 + 9 files changed, 107 insertions(+), 18 deletions(-)
Re: [RFC PATCH] irqchip/sifive-plic: Fix getting wrong chip_data when interrupt is hierarchy
On Thu, 29 Oct 2020 10:37:38 +0800, Greentime Hu wrote: > This oops is caused by a wrong chip_data and it is because plic_irq_unmask > uses irq_get_chip_data(irq_data->irq) to get the chip_data. However it may > get another irq_data with the same irq_data->irq if it is hierarchy. > > In this case, it will get irq_data of sifive_gpio_irqchip instead of > plic_chip so that it will get a wrong chip_data and then the wrong lmask > of it to cause this oops. > > [...] Applied to irq/irqchip-next, thanks! [1/1] irqchip/sifive-plic: Fix chip_data access within a hierarchy commit: f9ac7bbd6e4540dcc6df621b9c9b6eb2e26ded1d Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
On Wed, 28 Oct 2020 16:39:55 +0100, Geert Uytterhoeven wrote: > Get rid of the separate flag to indicate if the IRLM bit is present in > the INTC/Interrupt Control Register 0, by considering -1 an invalid > irlm_bit value. Applied to irq/irqchip-next, thanks! [1/1] irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm commit: b388bdf2bac7aedac9bde5ab63eaf7646f29fc00 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 0/2] irqchip/ti-sci-inta: Support for unmapped events
On Tue, 20 Oct 2020 10:32:41 +0300, Peter Ujfalusi wrote: > Changes since v2: > - Extended the block diagram of INTA in the DT documentation > - Use less creative variable names for unmapped events in the driver > - Short comment section to describe the unmapped event handling in driver > - Use u16 array to store the TI-SCI device identifiers instead of u32 > - Use printk format specifier instead of_node_full_name > > [...] Applied to irq/irqchip-next, thanks! [1/2] dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling commit: bb2bd7c7f3d0946acc2104db31df228d10f7b598 [2/2] irqchip/ti-sci-inta: Add support for unmapped event handling commit: d95bdca75b3fb41bf185efe164e05aed820081a5 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] pci: mediatek: fix warning in msi.h
On Sun, 01 Nov 2020 09:25:04 +, Frank Wunderlich wrote: > > Am 31. Oktober 2020 22:49:14 MEZ schrieb Thomas Gleixner : > > >That's not a fix. It's just supressing the warning. > > Ok sorry > > >So it needs to be figured out why the domain association is not there. > > It looks like for mt7623 there is no msi domain setup (done via > mtk_pcie_setup_irq callback + mtk_pcie_init_irq_domain) in mtk pcie > driver. Does this mean that this SoC never handled MSIs the first place? Which would explain the warning, as there is no MSI domain registered for the device, and we end-up falling back to arch_setup_msi_irqs(). If this system truly is unable to handle MSIs, one potential workaround would be to register a PCI-MSI domain that would always fail its allocation with -ENOSPC. It is really ugly, but would keep the horror localised. See the patchlet below, which I can't test. If this situation is more common than we expect, we may need something in core code instead. M. diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c index cf4c18f0c25a..52758b546d40 100644 --- a/drivers/pci/controller/pcie-mediatek.c +++ b/drivers/pci/controller/pcie-mediatek.c @@ -151,6 +151,7 @@ struct mtk_pcie_port; struct mtk_pcie_soc { bool need_fix_class_id; bool need_fix_device_id; + bool no_msi; unsigned int device_id; struct pci_ops *ops; int (*startup)(struct mtk_pcie_port *port); @@ -435,6 +436,9 @@ static int mtk_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int vir struct mtk_pcie_port *port = domain->host_data; unsigned long bit; + if (port->pcie->soc->no_msi) + return -ENOSPC; + WARN_ON(nr_irqs != 1); mutex_lock(>lock); @@ -966,11 +970,13 @@ static int mtk_pcie_parse_port(struct mtk_pcie *pcie, port->slot = slot; port->pcie = pcie; - if (pcie->soc->setup_irq) { + if (pcie->soc->setup_irq) err = pcie->soc->setup_irq(port, node); - if (err) - return err; - } + else + err = mtk_pcie_allocate_msi_domains(port); + + if (err) + return err; INIT_LIST_HEAD(>list); list_add_tail(>list, >ports); @@ -1173,6 +1179,7 @@ static const struct dev_pm_ops mtk_pcie_pm_ops = { }; static const struct mtk_pcie_soc mtk_pcie_soc_v1 = { + .no_msi = true, .ops = _pcie_ops, .startup = mtk_pcie_startup_port, }; -- Without deviation from the norm, progress is not possible.
Re: Using fixed LPI number for some Device ID
On Sat, 31 Oct 2020 03:10:24 +, Dongjiu Geng wrote: [...] > Sorry for the noise, Because Marc rarely uses the ARM email address, > so I replace to use Marc's kernel.org address instead of ARM email address. Rarely is quite the understatement. I left ARM over a year ago, so the likelihood of me answering at this address in vanishingly small. Maybe in a parallel universe? ;-) M. -- Without deviation from the norm, progress is not possible.
Re: Using fixed LPI number for some Device ID
Dongjiu, On Sat, 31 Oct 2020 02:19:19 +, Dongjiu Geng wrote: > > Hi Marc, > Sorry to disturb you, Currently the LPI number is not fixed for the > device. The LPI number is dynamically allocated start from 8092. > For two OS which shares the ITS, One OS needs to configure the > device interrupt required by another OS, and the other OS uses a > fixed interrupt ID to respond the interrupt. Therefore, the LPI IRQ > number of the device needed be fixed. I want to upstream this > feature that allocate fixed LPI number for the device that is > specified through the DTS. What is your meaning? Thanks I think you are starting from the wrong premises. You can't "share" an ITS directly between two operating systems. The ITS can only be controlled by a single operating system, because its function goes way beyond allocating an LPI. How would you deal with simple things such as masking an interrupt, which requires: - Access to memory (configuration table) - Access to the command queue (to insert an invalidation command) - Access to MMIO registers (to kick the command queue into action) all of which needs to be exclusive of concurrent modifications. How do you propose this is implemented in a safe manner by two operating systems which, by nature, distrust each other? Allocating LPIs is the least of your problems, really. If you need two concurrent OSs taking interrupts, use virtualisation. That is its purpose. On your HW, you'll even get direct injection. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
On Mon, 26 Oct 2020 14:44:23 +, Will Deacon wrote: > For consistency with the rest of the stage-2 page-table page allocations > (performing using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is > included in the GFP flags for the PGD pages. Applied to next, thanks! [1/1] KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT commit: 7efe8ef274024ef1d5c495c79dfcbbff38c5f366 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 0/1] KVM: arm64: fix the mmio faulting
On Mon, 26 Oct 2020 16:54:06 +0530, Santosh Shukla wrote: > Description of the Reproducer scenario as asked in the thread [1]. > > Tried to create the reproducer scenario with vfio-pci driver using > nvidia GPU in PT mode, As because vfio-pci driver now supports > vma faulting (/vfio_pci_mmap_fault) so could create a crude reproducer > situation with that. > > [...] Applied to next, thanks! [1/1] KVM: arm64: Force PTE mapping on fault resulting in a device mapping commit: 91a2c34b7d6fadc9c5d9433c620ea4c32ee7cae8 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: arm64: Fix masks in stage2_pte_cacheable()
On Thu, 29 Oct 2020 14:47:16 +, Will Deacon wrote: > stage2_pte_cacheable() tries to figure out whether the mapping installed > in its 'pte' parameter is cacheable or not. Unfortunately, it fails > miserably because it extracts the memory attributes from the entry using > FIELD_GET(), which returns the attributes shifted down to bit 0, but then > compares this with the unshifted value generated by the PAGE_S2_MEMATTR() > macro. > > [...] Applied to next, thanks! [1/1] KVM: arm64: Fix masks in stage2_pte_cacheable() commit: e2fc6a9f686d037cbd9b08b9fb657685b4a722d3 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2] KVM: arm64: Failback on unsupported huge page sizes
On Mon, 26 Oct 2020 10:06:26 +1100, Gavin Shan wrote: > The huge page could be mapped through multiple contiguous PMDs or PTEs. > The corresponding huge page sizes aren't supported by the page table > walker currently. > > This fails the unsupported huge page sizes to the near one. Otherwise, > the guest can't boot successfully: CONT_PMD_SHIFT and CONT_PTE_SHIFT > fail back to PMD_SHIFT and PAGE_SHIFT separately. Applied to next, thanks! [1/1] KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes commit: 2f40c46021bbb3ecd5c5f05764ecccbc276bc690 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3] KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED
On 2020-10-26 13:25, Will Deacon wrote: On Fri, Oct 23, 2020 at 08:47:50AM -0700, Stephen Boyd wrote: According to the SMCCC spec[1](7.5.2 Discovery) the ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and SMCCC_RET_NOT_SUPPORTED. 0 is "workaround required and safe to call this function" 1 is "workaround not required but safe to call this function" SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I give up!" SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except calling this function may not work because it isn't implemented in some cases". Wonderful. We map this SMC call to 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE For KVM hypercalls (hvc), we've implemented this function id to return SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those isn't supposed to be there. Per the code we call arm64_get_spectre_v2_state() to figure out what to return for this feature discovery call. 0 is SPECTRE_MITIGATED SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Let's clean this up so that KVM tells the guest this mapping: 0 is SPECTRE_MITIGATED 1 is SPECTRE_UNAFFECTED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE Note: SMCCC_RET_NOT_AFFECTED is 1 but isn't part of the SMCCC spec Cc: Andre Przywara Cc: Steven Price Cc: Marc Zyngier Cc: sta...@vger.kernel.org Link: https://developer.arm.com/documentation/den0028/latest [1] Fixes: c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests") Signed-off-by: Stephen Boyd --- I see that before commit c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests") we had this mapping: 0 is SPECTRE_MITIGATED SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE so the return value '1' wasn't there then. Once the commit was merged we introduced the notion of NOT_REQUIRED here when it shouldn't have been introduced. Changes from v2: * Moved define to header file and used it Changes from v1: * Way longer commit text, more background (sorry) * Dropped proton-pack part because it was wrong * Rebased onto other patch accepted upstream arch/arm64/kernel/proton-pack.c | 2 -- arch/arm64/kvm/hypercalls.c | 2 +- include/linux/arm-smccc.h | 2 ++ 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c index 25f3c80b5ffe..c18eb7d41274 100644 --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -135,8 +135,6 @@ static enum mitigation_state spectre_v2_get_cpu_hw_mitigation_state(void) return SPECTRE_VULNERABLE; } -#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED (1) - static enum mitigation_state spectre_v2_get_cpu_fw_mitigation_state(void) { int ret; diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 9824025ccc5c..25ea4ecb6449 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -31,7 +31,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) val = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val = SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED; break; } break; diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index 15c706fb0a37..0e50ba3e88d7 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -86,6 +86,8 @@ ARM_SMCCC_SMC_32,\ 0, 0x7fff) +#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED 1 I thought we'd stick this in asm/spectre.h, but here is also good: Acked-by: Will Deacon Acked-by: Marc Zyngier Will, if you're about to send fixes to Linus, can you please pick this one up? M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH kernel 1/2] irq: Add reference counting to IRQ mappings
Hi Alexey, On 2020-10-27 09:06, Alexey Kardashevskiy wrote: PCI devices share 4 legacy INTx interrupts from the same PCI host bridge. Device drivers map/unmap hardware interrupts via irq_create_mapping()/ irq_dispose_mapping(). The problem with that these interrupts are shared and when performing hot unplug, we need to unmap the interrupt only when the last device is released. This reuses already existing irq_desc::kobj for this purpose. The refcounter is naturally 1 when the descriptor is allocated already; this adds kobject_get() in places where already existing mapped virq is returned. That's quite interesting, as I was about to revive a patch series that rework the irqdomain subsystem to directly cache irq_desc instead of raw interrupt numbers. And for that, I needed some form of refcounting... This reorganizes irq_dispose_mapping() to release the kobj and let the release callback do the cleanup. If some driver or platform does its own reference counting, this expects those parties to call irq_find_mapping() and call irq_dispose_mapping() for every irq_create_fwspec_mapping()/irq_create_mapping(). Signed-off-by: Alexey Kardashevskiy --- kernel/irq/irqdesc.c | 35 +++ kernel/irq/irqdomain.c | 27 +-- 2 files changed, 36 insertions(+), 26 deletions(-) diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 1a7723604399..dae096238500 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -419,20 +419,39 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags, return NULL; } +static void delayed_free_desc(struct rcu_head *rhp); static void irq_kobj_release(struct kobject *kobj) { struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj); + struct irq_domain *domain; + unsigned int virq = desc->irq_data.irq; - free_masks(desc); - free_percpu(desc->kstat_irqs); - kfree(desc); + domain = desc->irq_data.domain; + if (domain) { + if (irq_domain_is_hierarchy(domain)) { + irq_domain_free_irqs(virq, 1); How does this work with hierarchical domains? Each domain should contribute as a reference on the irq_desc. But if you got here, it means the refcount has already dropped to 0. So either there is nothing to free here, or you don't track the references implied by the hierarchy. I suspect the latter. + } else { + irq_domain_disassociate(domain, virq); + irq_free_desc(virq); + } + } + + /* +* We free the descriptor, masks and stat fields via RCU. That +* allows demultiplex interrupts to do rcu based management of +* the child interrupts. +* This also allows us to use rcu in kstat_irqs_usr(). +*/ + call_rcu(>rcu, delayed_free_desc); } static void delayed_free_desc(struct rcu_head *rhp) { struct irq_desc *desc = container_of(rhp, struct irq_desc, rcu); - kobject_put(>kobj); + free_masks(desc); + free_percpu(desc->kstat_irqs); + kfree(desc); } static void free_desc(unsigned int irq) @@ -453,14 +472,6 @@ static void free_desc(unsigned int irq) */ irq_sysfs_del(desc); delete_irq_desc(irq); - - /* -* We free the descriptor, masks and stat fields via RCU. That -* allows demultiplex interrupts to do rcu based management of -* the child interrupts. -* This also allows us to use rcu in kstat_irqs_usr(). -*/ - call_rcu(>rcu, delayed_free_desc); } static int alloc_descs(unsigned int start, unsigned int cnt, int node, diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index cf8b374b892d..02733ddc321f 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -638,6 +638,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain, { struct device_node *of_node; int virq; + struct irq_desc *desc; pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq); @@ -655,7 +656,9 @@ unsigned int irq_create_mapping(struct irq_domain *domain, /* Check if mapping already exists */ virq = irq_find_mapping(domain, hwirq); if (virq) { + desc = irq_to_desc(virq); pr_debug("-> existing mapping on virq %d\n", virq); + kobject_get(>kobj); My worry with this is that there is probably a significant amount of code out there that relies on multiple calls to irq_create_mapping() with the same parameters not to have any side effects. They would expect a subsequent irq_dispose_mapping() to drop the translation altogether, and that's obviously not the case here. Have you audited the various call sites to see what could break? return virq; } @@ -751,6 +754,7 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
Re: [PATCH v3 03/16] arm64: Allow IPIs to be handled as normal interrupts
On 2020-10-27 11:21, Vincent Guittot wrote: On Tue, 27 Oct 2020 at 11:50, Vincent Guittot wrote: On Tue, 27 Oct 2020 at 11:37, Marc Zyngier wrote: > > On 2020-10-27 10:12, Vincent Guittot wrote: > > HI Marc, > > > > On Mon, 19 Oct 2020 at 17:43, Vincent Guittot > > wrote: > >> > >> On Mon, 19 Oct 2020 at 15:04, Marc Zyngier wrote: > >> > > > > > ... > > > >> > >> > >> > >> One of the major difference is that we end up, in some cases > >> > >> (such as when performing IRQ time accounting on the scheduler > >> > >> IPI), end up with nested irq_enter()/irq_exit() pairs. > >> > >> Other than the (relatively small) overhead, there should be > >> > >> no consequences to it (these pairs are designed to nest > >> > >> correctly, and the accounting shouldn't be off). > >> > > > >> > > While rebasing on mainline, I have faced a performance regression for > >> > > the benchmark: > >> > > perf bench sched pipe > >> > > on my arm64 dual quad core (hikey) and my 2 nodes x 112 CPUS (thx2) > >> > > > >> > > The regression comes from: > >> > > commit: d3afc7f12987 ("arm64: Allow IPIs to be handled as normal > >> > > interrupts") > >> > > >> > That's interesting, as this patch doesn't really change anything (most > >> > of the potential overhead comes in later). The only potential overhead > >> > I can see is that the scheduler_ipi() call is now wrapped around > >> > irq_enter()/irq_exit(). > >> > > >> > > > >> > > v5.9 + this patch > >> > > hikey : 48818(+/- 0.31) 37503(+/- 0.15%) -23.2% > >> > > thx2 : 132410(+/- 1.72) 122646(+/- 1.92%) -7.4% > >> > > > >> > > By + this patch, I mean merging branch from this patch. Whereas > >> > > merging the previous: > >> > > commit: 83cfac95c018 ("genirq: Allow interrupts to be excluded from > >> > > /proc/interrupts") > >> > > It doesn't show any regression > >> > > >> > Since you are running perf, can you spot where the overhead occurs? > > > > Any idea about the root cause of the regression ? > > I have faced it on more arm64 platforms in the meantime > > two possible causes: > > (1) irq_enter/exit on the rescheduling IPI means we reschedule much more > often > (2) irq_domain lookups add some overhead. > > For (1), I have this series[1] which is ugly as sin and needs much more > testing. Ok, I'm going to test this series to see if it fixes the perf regression You have spotted the root cause of the regression. We are back to ~1% performance diff on the hikey Yeah. Only thing is that I can't look at this hack without vomiting... M. -- Jazz is not dead. It just smells funny...
Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 2020-10-27 10:35, Biwen Li (OSS) wrote: On 2020-10-27 04:46, Biwen Li wrote: > From: Hou Zhiqiang > > Add an new IRQ chip declaration for LS1043A and LS1088A > - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A. > SCFG_INTPCR[31:0] > of these SoCs is stored/read as SCFG_INTPCR[0:31] defaultly(bit > reverse) > - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA > > Signed-off-by: Hou Zhiqiang > Signed-off-by: Biwen Li You clearly couldn't be bothered to read what I wrote in my earlier replies. I'm thus ignoring this series... Okay, got it. > --- > Change in v2: >- add despcription of bit reverse >- update copyright > > drivers/irqchip/irq-ls-extirq.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/irqchip/irq-ls-extirq.c > b/drivers/irqchip/irq-ls-extirq.c index 4d1179fed77c..9587bc2607fc > 100644 > --- a/drivers/irqchip/irq-ls-extirq.c > +++ b/drivers/irqchip/irq-ls-extirq.c > @@ -1,5 +1,8 @@ > // SPDX-License-Identifier: GPL-2.0 > - > +/* > + * Author: Rasmus Villemoes > + * Copyright 2020 NXP ... specially when you keep attributing someone else's copyright to NXP. Then I don't know how to add the copyright, any suggestions? Simple. You don't add anything. NXP's copyright doesn't apply to this file before this patch, and your changes are so trivial that they don't really warrant a mention. Furthermore, the git history already keeps track of who did what. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v3 03/16] arm64: Allow IPIs to be handled as normal interrupts
On 2020-10-27 10:12, Vincent Guittot wrote: HI Marc, On Mon, 19 Oct 2020 at 17:43, Vincent Guittot wrote: On Mon, 19 Oct 2020 at 15:04, Marc Zyngier wrote: > ... > >> > >> One of the major difference is that we end up, in some cases > >> (such as when performing IRQ time accounting on the scheduler > >> IPI), end up with nested irq_enter()/irq_exit() pairs. > >> Other than the (relatively small) overhead, there should be > >> no consequences to it (these pairs are designed to nest > >> correctly, and the accounting shouldn't be off). > > > > While rebasing on mainline, I have faced a performance regression for > > the benchmark: > > perf bench sched pipe > > on my arm64 dual quad core (hikey) and my 2 nodes x 112 CPUS (thx2) > > > > The regression comes from: > > commit: d3afc7f12987 ("arm64: Allow IPIs to be handled as normal > > interrupts") > > That's interesting, as this patch doesn't really change anything (most > of the potential overhead comes in later). The only potential overhead > I can see is that the scheduler_ipi() call is now wrapped around > irq_enter()/irq_exit(). > > > > > v5.9 + this patch > > hikey : 48818(+/- 0.31) 37503(+/- 0.15%) -23.2% > > thx2 : 132410(+/- 1.72) 122646(+/- 1.92%) -7.4% > > > > By + this patch, I mean merging branch from this patch. Whereas > > merging the previous: > > commit: 83cfac95c018 ("genirq: Allow interrupts to be excluded from > > /proc/interrupts") > > It doesn't show any regression > > Since you are running perf, can you spot where the overhead occurs? Any idea about the root cause of the regression ? I have faced it on more arm64 platforms in the meantime two possible causes: (1) irq_enter/exit on the rescheduling IPI means we reschedule much more often (2) irq_domain lookups add some overhead. For (1), I have this series[1] which is ugly as sin and needs much more testing. For (2), I have some ideas which need more work (let the irq domain resolve to an irq_desc instead of an interrupt number, avoiding another radix-tree lookup). M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/ipi-fixes -- Jazz is not dead. It just smells funny...
Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 2020-10-27 04:46, Biwen Li wrote: From: Hou Zhiqiang Add an new IRQ chip declaration for LS1043A and LS1088A - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A. SCFG_INTPCR[31:0] of these SoCs is stored/read as SCFG_INTPCR[0:31] defaultly(bit reverse) - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA Signed-off-by: Hou Zhiqiang Signed-off-by: Biwen Li You clearly couldn't be bothered to read what I wrote in my earlier replies. I'm thus ignoring this series... --- Change in v2: - add despcription of bit reverse - update copyright drivers/irqchip/irq-ls-extirq.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c index 4d1179fed77c..9587bc2607fc 100644 --- a/drivers/irqchip/irq-ls-extirq.c +++ b/drivers/irqchip/irq-ls-extirq.c @@ -1,5 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 - +/* + * Author: Rasmus Villemoes + * Copyright 2020 NXP ... specially when you keep attributing someone else's copyright to NXP. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 2/2] irqchip: bcm2836: fix section mismatch warning
On 2020-10-27 08:51, ba...@kernel.org wrote: From: Felipe Balbi Fix the following warning: WARNING: modpost: vmlinux.o(.text.unlikely+0x17b2c): Section mismatch in reference from the function bcm2836_arm_irqchip_smp_init() to the function .init.text:set_smp_ipi_range() The function bcm2836_arm_irqchip_smp_init() references the function __init set_smp_ipi_range(). This is often because bcm2836_arm_irqchip_smp_init lacks a __init annotation or the annotation of set_smp_ipi_range is wrong. Signed-off-by: Felipe Balbi --- drivers/irqchip/irq-bcm2836.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c index 97838eb705f9..cbc7c740e4dc 100644 --- a/drivers/irqchip/irq-bcm2836.c +++ b/drivers/irqchip/irq-bcm2836.c @@ -244,7 +244,7 @@ static int bcm2836_cpu_dying(unsigned int cpu) #define BITS_PER_MBOX 32 -static void bcm2836_arm_irqchip_smp_init(void) +static void __init bcm2836_arm_irqchip_smp_init(void) { struct irq_fwspec ipi_fwspec = { .fwnode = intc.domain->fwnode, I already have a fix for this one[1], which should be in -next. Thanks, M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next=57733e009f0c7e0526e10a18be12f56996c5460e -- Jazz is not dead. It just smells funny...
Re: [PATCH AUTOSEL 5.9 087/147] genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
On 2020-10-26 23:48, Sasha Levin wrote: From: Zhen Lei [ Upstream commit ea0c80d1764449acf2f70fdb25aec33800cd0348 ] In order to avoid compilation errors when a driver references set_handle_irq(), but that the architecture doesn't select GENERIC_IRQ_MULTI_HANDLER, add a stub function that will just WARN_ON_ONCE() if ever used. Signed-off-by: Zhen Lei [maz: commit message] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20200924071754.4509-2-thunder.leiz...@huawei.com Signed-off-by: Sasha Levin --- include/linux/irq.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/include/linux/irq.h b/include/linux/irq.h index 1b7f4dfee35b3..b167baef88c0b 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -1252,6 +1252,12 @@ int __init set_handle_irq(void (*handle_irq)(struct pt_regs *)); * top-level IRQ handler. */ extern void (*handle_arch_irq)(struct pt_regs *) __ro_after_init; +#else +#define set_handle_irq(handle_irq) \ + do {\ + (void)handle_irq; \ + WARN_ON(1); \ + } while (0) #endif #endif /* _LINUX_IRQ_H */ What is the reason for this backport? The only user is a driver that isn't getting backported (d59f7d159891 and following patches). Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 0/2] irq-meson-gpio: make it possible to build as a module
On 2020-10-26 17:28, Kevin Hilman wrote: Marc Zyngier writes: On 2020-10-26 16:18, Kevin Hilman wrote: Marc Zyngier writes: On Tue, 20 Oct 2020 08:25:30 +0100, Neil Armstrong wrote: In order to reduce the kernel Image size on multi-platform distributions, make it possible to build the Amlogic GPIO IRQ controller as a module by switching it to a platform driver. The second patch removes MESON_IRQ_GPIO selection from ARCH_MESON to allow building the driver as module. Neil Armstrong (2): irqchip: irq-meson-gpio: make it possible to build as a module arm64: meson: remove MESON_IRQ_GPIO selection arch/arm64/Kconfig.platforms | 1 - drivers/irqchip/Kconfig | 5 +- drivers/irqchip/irq-meson-gpio.c | 89 3 files changed, 59 insertions(+), 36 deletions(-) I've tried this series on my vim3l with the this driver compiled as a module, and lost the Ethernet interface in the process, as the phy wasn't able to resolve its interrupt and things fail later on: [ 72.238291] meson8b-dwmac ff3f.ethernet eth1: no phy at addr -1 [ 72.238917] meson8b-dwmac ff3f.ethernet eth1: stmmac_open: Cannot attach to PHY (error: -19) This is a generic problem with making DT-based interrupt controllers modular when not *all* the drivers can deal with probing deferral. Yes, but this series still keeps the default as built-in. If you build as a module, and you add `fw_devlink=on` to the kernel command-line, device-links will be created based on DT dependencies which will ensure the right module load order. It doesn't work here. I get the exact same error (well, with eth0 instead of eth1). In my experience, fw_devlink isn't reliable yet. Config on request. Other than CONFIG_MESON_IRQ_GPIO=m, are you using default upstream defconfig? I use something that is much closer to a Debian configuration, given that the same kernel as to run on *all* the systems I have access to. I just double-checked with this series on top of v5.10-rc1, and I have eth0 working with interrupts without needing fw_devlink=on. With the default upstream defconfig all the networking for these devices is already configured as modules. dmesg: http://www.loen.fr/tmp/dmesg config: http://www.loen.fr/tmp/Config.full-arm64 root@tiger-roach:~# lsmod Module Size Used by macvtap16384 0 macvlan32768 1 macvtap tap32768 1 macvtap nls_ascii 16384 1 nls_cp437 20480 1 vfat 28672 1 fat81920 1 vfat aes_ce_blk 36864 0 crypto_simd24576 1 aes_ce_blk cryptd 28672 1 crypto_simd aes_ce_cipher 20480 1 aes_ce_blk ghash_ce 24576 0 gf128mul 16384 1 ghash_ce sha2_ce20480 0 sha256_arm64 28672 1 sha2_ce sha1_ce20480 0 panfrost 69632 0 gpu_sched 45056 1 panfrost meson_saradc 24576 0 industrialio 90112 1 meson_saradc irq_meson_gpio 20480 0 pwm_meson 20480 1 meson_dw_hdmi 24576 0 meson_drm 61440 1 meson_dw_hdmi meson_canvas 16384 1 meson_drm dw_hdmi53248 1 meson_dw_hdmi cec57344 1 dw_hdmi drm_kms_helper258048 4 meson_dw_hdmi,meson_drm,dw_hdmi meson_rng 16384 0 rng_core 24576 1 meson_rng cpufreq_dt 20480 0 leds_gpio 16384 0 drm 606208 7 gpu_sched,meson_dw_hdmi,meson_drm,drm_kms_helper,dw_hdmi,panfrost ip_tables 32768 0 x_tables 45056 1 ip_tables autofs449152 2 xhci_plat_hcd 20480 0 dwc2 249856 0 dwc3 151552 0 ulpi 20480 1 dwc3 udc_core 69632 2 dwc2,dwc3 rtc_hym856320480 0 meson_gxl 20480 0 realtek24576 0 dwmac_generic 16384 0 dwc3_meson_g12a24576 0 meson_gx_mmc 24576 0 xhci_pci 24576 0 igb 237568 0 xhci_hcd 290816 2 xhci_pci,xhci_plat_hcd i2c_meson 20480 0 mdio_mux_meson_g12a16384 0 mdio_mux 16384 1 mdio_mux_meson_g12a nvme 45056 2 nvme_core 110592 4 nvme i2c_algo_bit 16384 1 igb t10_pi 16384 1 nvme_core usbcore 311296 4 xhci_hcd,dwc2,xhci_pci,xhci_plat_hcd dwmac_meson8b 16384 0 stmmac_platform24576 2 dwmac_meson8b,dwmac_generic stmmac204800 3 dwmac_meson8b,stmmac_platform,dwmac_generic pcs_xpcs 20480 1 stmmac phylink45056 1 stmmac of_mdio20480 4 stmmac_platform,mdio_mux,stmmac,phylink fixed_phy 16384 1 of_mdio pwm_regulator 20480 1 libphy
Re: [PATCH 0/2] irq-meson-gpio: make it possible to build as a module
On 2020-10-26 16:18, Kevin Hilman wrote: Marc Zyngier writes: On Tue, 20 Oct 2020 08:25:30 +0100, Neil Armstrong wrote: In order to reduce the kernel Image size on multi-platform distributions, make it possible to build the Amlogic GPIO IRQ controller as a module by switching it to a platform driver. The second patch removes MESON_IRQ_GPIO selection from ARCH_MESON to allow building the driver as module. Neil Armstrong (2): irqchip: irq-meson-gpio: make it possible to build as a module arm64: meson: remove MESON_IRQ_GPIO selection arch/arm64/Kconfig.platforms | 1 - drivers/irqchip/Kconfig | 5 +- drivers/irqchip/irq-meson-gpio.c | 89 3 files changed, 59 insertions(+), 36 deletions(-) I've tried this series on my vim3l with the this driver compiled as a module, and lost the Ethernet interface in the process, as the phy wasn't able to resolve its interrupt and things fail later on: [ 72.238291] meson8b-dwmac ff3f.ethernet eth1: no phy at addr -1 [ 72.238917] meson8b-dwmac ff3f.ethernet eth1: stmmac_open: Cannot attach to PHY (error: -19) This is a generic problem with making DT-based interrupt controllers modular when not *all* the drivers can deal with probing deferral. Yes, but this series still keeps the default as built-in. If you build as a module, and you add `fw_devlink=on` to the kernel command-line, device-links will be created based on DT dependencies which will ensure the right module load order. It doesn't work here. I get the exact same error (well, with eth0 instead of eth1). In my experience, fw_devlink isn't reliable yet. Config on request. I've tested this series with `fw_devlink=on` on several Amlogic platforms and it works just fine, but since it requires the extra cmdline option, I think the default should remain built-in. So, I'd still like to see this series merged so that at least it's an option to enable this as a module. I have taken similar patches in 5.9 for other SoC families (qcomm, mtk), and ended up reverting them in -rc2, because there is simply too much breakage. Even keeping it as built in changes the init order, which tons of drivers depend on. I proposed a middle-of-the-road approach (modules can break, built-in stays the same) which Rob pushed back on. So either we fix fw_devlink to work for everything and be on by default, or we keep the current setup. Also, another reason to make it optional is that not all platforms need this feature at all, but right now we select it for all Amlogic SoCs. I understand that, but I don't want another episode of widespread breakages, and this series definitely breaks things. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 2020-10-26 15:06, Leo Li wrote: -Original Message- From: Marc Zyngier Sent: Monday, October 26, 2020 4:23 AM To: Rasmus Villemoes Cc: Biwen Li (OSS) ; shawn...@kernel.org; robh...@kernel.org; mark.rutl...@arm.com; Leo Li ; Z.q. Hou ; t...@linutronix.de; ja...@lakedaemon.net; devicet...@vger.kernel.org; linux- ker...@vger.kernel.org; Jiafei Pan ; Xiaobo Xie ; linux-arm-ker...@lists.infradead.org; Biwen Li Subject: Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt On 2020-10-26 09:06, Rasmus Villemoes wrote: > On 26/10/2020 09.44, Marc Zyngier wrote: >> On 2020-10-26 08:01, Biwen Li wrote: >>> From: Hou Zhiqiang >>> >>> Add an new IRQ chip declaration for LS1043A and LS1088A >>> - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A >>> - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA >> >> Three things: >> - This commit message doesn't describe the bit_reverse change > > Yeah, please elaborate on that, as the RM for 1043 or 1046 doesn't > mention anything about bit reversal for the scfg registers - they > don't seem to have the utter nonsense that is SCFG_SCFGREVCR, but > perhaps, instead of removing it, that has just become a hard-coded > part of the IP. > > Also, IANAL etc., but > >>> +// Copyright 2019-2020 NXP > > really? Seems to be a bit of a stretch. > > At the very least, cc'ing the original author and only person to ever > touch that file would have been appreciated. Huh. Well spotted. That's definitely not on. NXP people, please talk to your legal department. We do have an internal policy to require developer adding/updating NXP copyright on non-trivial changes. I'm not sure if this change should be considered trivial, but adding copyright claim on a file without prior copyright claims could causing confusion like in this case. The copyright exists implicitly, and doesn't require a copyright claim in the file itself. Please talk to your legal department. One potential solution is to add a more specific description on the NXP change together with the copyright claim. But maybe an easier solution is to add Rasmus your Copyright claim first if you are ok with it. That's for Rasmus to decide whether he wants to add such a claim, given that it exists implicitly. Adding copyright claims for any odd change you make isn't acceptable either (your changes are already unambiguously identified in git). For the time being, I'm not taking any NXP patch carrying additional copyright update until this is clarified. M. -- Jazz is not dead. It just smells funny...
Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 2020-10-26 09:06, Rasmus Villemoes wrote: On 26/10/2020 09.44, Marc Zyngier wrote: On 2020-10-26 08:01, Biwen Li wrote: From: Hou Zhiqiang Add an new IRQ chip declaration for LS1043A and LS1088A - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA Three things: - This commit message doesn't describe the bit_reverse change Yeah, please elaborate on that, as the RM for 1043 or 1046 doesn't mention anything about bit reversal for the scfg registers - they don't seem to have the utter nonsense that is SCFG_SCFGREVCR, but perhaps, instead of removing it, that has just become a hard-coded part of the IP. Also, IANAL etc., but +// Copyright 2019-2020 NXP really? Seems to be a bit of a stretch. At the very least, cc'ing the original author and only person to ever touch that file would have been appreciated. Huh. Well spotted. That's definitely not on. NXP people, please talk to your legal department. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 3/3] KVM: arm64: Failback on unsupported huge page sizes
On 2020-10-25 23:04, Gavin Shan wrote: Hi Marc, On 10/25/20 9:48 PM, Marc Zyngier wrote: On Sun, 25 Oct 2020 01:27:39 +0100, Gavin Shan wrote: The huge page could be mapped through multiple contiguous PMDs or PTEs. The corresponding huge page sizes aren't supported by the page table walker currently. This fails the unsupported huge page sizes to the near one. Otherwise, the guest can't boot successfully: CONT_PMD_SHIFT and CONT_PTE_SHIFT fail back to PMD_SHIFT and PAGE_SHIFT separately. Signed-off-by: Gavin Shan --- arch/arm64/kvm/mmu.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 0f51585adc04..81cbdc368246 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -793,12 +793,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, vma_shift = PMD_SHIFT; #endif + if (vma_shift == CONT_PMD_SHIFT) + vma_shift = PMD_SHIFT; + if (vma_shift == PMD_SHIFT && !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { force_pte = true; vma_shift = PAGE_SHIFT; } + if (vma_shift == CONT_PTE_SHIFT) { + force_pte = true; + vma_shift = PAGE_SHIFT; + } + vma_pagesize = 1UL << vma_shift; if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) fault_ipa &= ~(vma_pagesize - 1); Yup, nice catch. However, I think we should take this opportunity to rationalise the logic here, and catch future discrepancies (should someone add contiguous PUD or something similarly silly). How about something like this (untested): Yeah, I started the work to support contiguous PMDs/PTEs, but I'm not sure when I can post the patches for review as my time becomes a bit fragmented recently. At least, I need focus on "async page fault" in the coming weeks :) Thanks for the suggested code and it worked for me. I'll post v2 to integrate them. However, I would like to drop PATCH[1] and PATCH[2] as I really don't have strong reasons to have them. Yes, please drop these patches, and focus on the actual bug fix. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 1/3] KVM: arm64: Check if 52-bits PA is enabled
On 2020-10-25 22:23, Gavin Shan wrote: Hi Marc, On 10/25/20 8:52 PM, Marc Zyngier wrote: On Sun, 25 Oct 2020 01:27:37 +0100, Gavin Shan wrote: The 52-bits physical address is disabled until CONFIG_ARM64_PA_BITS_52 is chosen. This uses option for that check, to avoid the unconditional check on PAGE_SHIFT in the hot path and thus save some CPU cycles. PAGE_SHIFT is known at compile time, and this code is dropped by the compiler if the selected page size is not 64K. This patch really only makes the code slightly less readable and the "CPU cycles" argument doesn't hold at all. So what are you trying to solve exactly? There are two points covered by the patch: (1) The 52-bits physical address is visible only when CONFIG_ARM64_PA_BITS_52 is enabled in arch/arm64 code. The code looks consistent with this option used here. (2) I had the assumption that gcc doesn't optimize the code and PAGE_SHIFT is always checked in order to get higher 4 physical address bits, but you said gcc should optimize the code accordingly. However, it would be still nice to make the code explicit. Conditional compilation only results in more breakages, specially for configs that hardly anyone uses (big-endian and 64K pages are the two that bitrot very quickly). So if anything can build without #ifdef, I'll take that anytime. If the compiler doesn't optimize it away, let's fix the compiler. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 2020-10-26 08:01, Biwen Li wrote: From: Hou Zhiqiang Add an new IRQ chip declaration for LS1043A and LS1088A - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA Three things: - This commit message doesn't describe the bit_reverse change - Please add a cover letter - Sending the same series again after 4 days is not OK, specially when the initial one was during the merge window. Thanks, M. Signed-off-by: Hou Zhiqiang Signed-off-by: Biwen Li --- drivers/irqchip/irq-ls-extirq.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c index 4d1179fed77c..564e6de0bd8e 100644 --- a/drivers/irqchip/irq-ls-extirq.c +++ b/drivers/irqchip/irq-ls-extirq.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 +// Copyright 2019-2020 NXP #define pr_fmt(fmt) "irq-ls-extirq: " fmt @@ -183,6 +184,9 @@ ls_extirq_of_init(struct device_node *node, struct device_node *parent) priv->bit_reverse = (revcr != 0); } + if (of_device_is_compatible(node, "fsl,ls1043a-extirq")) + priv->bit_reverse = true; + domain = irq_domain_add_hierarchy(parent_domain, 0, priv->nirq, node, _domain_ops, priv); if (!domain) @@ -195,3 +199,5 @@ ls_extirq_of_init(struct device_node *node, struct device_node *parent) } IRQCHIP_DECLARE(ls1021a_extirq, "fsl,ls1021a-extirq", ls_extirq_of_init); +IRQCHIP_DECLARE(ls1043a_extirq, "fsl,ls1043a-extirq", ls_extirq_of_init); +IRQCHIP_DECLARE(ls1088a_extirq, "fsl,ls1088a-extirq", ls_extirq_of_init); -- Jazz is not dead. It just smells funny...
Re: [PATCH] irqchip/sifive-plic: Fix broken irq_set_affinity() callback
On Tue, 20 Oct 2020 16:15:32 +0800, Greentime Hu wrote: > It will always enable the interrupt after calling plic_set_affinity() > however it should set to it previous setting. Staying disabled or enabled. > > This patch can also fix this pwm hang issue in Unleashed board. > > [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > [ 919.020922] rcu: 0-...0: (0 ticks this GP) > idle=7d2/1/0x4002 softirq=1424/1424 fqs=105807 > [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496) > [ 919.036109] Task dump for CPU 0: > [ 919.039321] kworker/0:1 R running task030 2 > 0x0008 > [ 919.046359] Workqueue: events set_brightness_delayed > [ 919.051302] Call Trace: > [ 919.053738] [] __schedule+0x194/0x4de > [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > [ 982.040923] rcu: 0-...0: (0 ticks this GP) > idle=7d2/1/0x4002 softirq=1424/1424 fqs=113325 > [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509) > [ 982.056108] Task dump for CPU 0: > [ 982.059321] kworker/0:1 R running task030 2 > 0x0008 > [ 982.066359] Workqueue: events set_brightness_delayed > [ 982.071302] Call Trace: > [ 982.073739] [] __schedule+0x194/0x4de > [..] Applied to irq/irqchip-next, with some commit message adjustments. [1/1] irqchip/sifive-plic: Fix broken irq_set_affinity() callback commit: a7480c5d725c4ecfc627e70960f249c34f5d13e8 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 0/3] Add STM32 LP timer EXTI interrupts
On Fri, 16 Oct 2020 16:40:16 +0200, Fabrice Gasnier wrote: > STM32 LP timer that's available on STM32MP15x can wakeup the platform > using EXTI interrupts. > > This series add: > - LP timer EXTI - GIC interrupt events to EXTI driver and device-tree > - LP timer wakeup-source to device-tree > > [...] Applied to irq/irqchip-next, thanks! [1/3] irqchip/stm32-exti: Add all LP timer exti direct events support commit: a00e85b581fd5ee47e770b6b8d2038dbebbe81f9 Please route the last two patches via arm-soc. Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 0/2] irq-meson-gpio: make it possible to build as a module
On Tue, 20 Oct 2020 08:25:30 +0100, Neil Armstrong wrote: > > In order to reduce the kernel Image size on multi-platform distributions, > make it possible to build the Amlogic GPIO IRQ controller as a module > by switching it to a platform driver. > > The second patch removes MESON_IRQ_GPIO selection from ARCH_MESON to allow > building the driver as module. > > Neil Armstrong (2): > irqchip: irq-meson-gpio: make it possible to build as a module > arm64: meson: remove MESON_IRQ_GPIO selection > > arch/arm64/Kconfig.platforms | 1 - > drivers/irqchip/Kconfig | 5 +- > drivers/irqchip/irq-meson-gpio.c | 89 > 3 files changed, 59 insertions(+), 36 deletions(-) I've tried this series on my vim3l with the this driver compiled as a module, and lost the Ethernet interface in the process, as the phy wasn't able to resolve its interrupt and things fail later on: [ 72.238291] meson8b-dwmac ff3f.ethernet eth1: no phy at addr -1 [ 72.238917] meson8b-dwmac ff3f.ethernet eth1: stmmac_open: Cannot attach to PHY (error: -19) This is a generic problem with making DT-based interrupt controllers modular when not *all* the drivers can deal with probing deferral. M. -- Without deviation from the norm, progress is not possible.
Re: WARNING: modpost: vmlinux.o(.text.unlikely+0x11494): Section mismatch in reference from the function bcm2836_arm_irqchip_smp_init() to the function .init.text:set_smp_ipi_range()
On Sun, 25 Oct 2020 01:09:21 +0100, kernel test robot wrote: > > [1 ] > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master > head: d76913908102044f14381df865bb74df17a538cb > commit: 0809ae724904c3c5dbdddf4169d48aac9c6fcdc8 irqchip/bcm2836: Configure > mailbox interrupts as standard interrupts > date: 5 weeks ago > config: arm64-randconfig-p001-20201025 (attached as .config) > compiler: aarch64-linux-gcc (GCC) 9.3.0 > reproduce (this is a W=1 build): > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0809ae724904c3c5dbdddf4169d48aac9c6fcdc8 > git remote add linus > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > git fetch --no-tags linus master > git checkout 0809ae724904c3c5dbdddf4169d48aac9c6fcdc8 > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross > ARCH=arm64 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot Fix queued, thanks for the report. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 3/3] KVM: arm64: Failback on unsupported huge page sizes
On Sun, 25 Oct 2020 01:27:39 +0100, Gavin Shan wrote: > > The huge page could be mapped through multiple contiguous PMDs or PTEs. > The corresponding huge page sizes aren't supported by the page table > walker currently. > > This fails the unsupported huge page sizes to the near one. Otherwise, > the guest can't boot successfully: CONT_PMD_SHIFT and CONT_PTE_SHIFT > fail back to PMD_SHIFT and PAGE_SHIFT separately. > > Signed-off-by: Gavin Shan > --- > arch/arm64/kvm/mmu.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 0f51585adc04..81cbdc368246 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -793,12 +793,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > vma_shift = PMD_SHIFT; > #endif > > + if (vma_shift == CONT_PMD_SHIFT) > + vma_shift = PMD_SHIFT; > + > if (vma_shift == PMD_SHIFT && > !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { > force_pte = true; > vma_shift = PAGE_SHIFT; > } > > + if (vma_shift == CONT_PTE_SHIFT) { > + force_pte = true; > + vma_shift = PAGE_SHIFT; > + } > + > vma_pagesize = 1UL << vma_shift; > if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) > fault_ipa &= ~(vma_pagesize - 1); Yup, nice catch. However, I think we should take this opportunity to rationalise the logic here, and catch future discrepancies (should someone add contiguous PUD or something similarly silly). How about something like this (untested): diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index cc323d96c9d4..d9a13a8a82e0 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -787,14 +787,31 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, vma_shift = PAGE_SHIFT; } - if (vma_shift == PUD_SHIFT && - !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE)) - vma_shift = PMD_SHIFT; + switch (vma_shift) { + case PUD_SHIFT: + if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE)) + break; + fallthrough; - if (vma_shift == PMD_SHIFT && - !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { - force_pte = true; + case CONT_PMD_SHIFT: + vma_shift = PMD_SHIFT; + fallthrough; + + case PMD_SHIFT: + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) + break; + fallthrough; + + case CONT_PTE_SHIFT: vma_shift = PAGE_SHIFT; + force_pte = true; + fallthrough; + + case PAGE_SHIFT: + break; + + default: + WARN_ONCE(1, "Unknown vma_shift %d", vma_shift); } vma_pagesize = 1UL << vma_shift; Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 2/3] KVM: arm64: Don't map PUD huge page if it's not available
On Sun, 25 Oct 2020 01:27:38 +0100, Gavin Shan wrote: > > PUD huge page isn't available when CONFIG_ARM64_4K_PAGES is disabled. > In this case, we needn't try to map the memory through PUD huge pages > to save some CPU cycles in the hot path. > > This also corrects the code style issue, which was introduced by > commit <523b3999e5f6> ("KVM: arm64: Try PMD block mappings if PUD mappings > are not supported"). > > Signed-off-by: Gavin Shan > --- > arch/arm64/kvm/mmu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index a816cb8e619b..0f51585adc04 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -787,9 +787,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > vma_shift = PAGE_SHIFT; > } > > +#ifdef CONFIG_ARM64_4K_PAGES > if (vma_shift == PUD_SHIFT && > !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE)) > -vma_shift = PMD_SHIFT; > + vma_shift = PMD_SHIFT; > +#endif > > if (vma_shift == PMD_SHIFT && > !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { I really don't buy the "CPU cycles" argument here either. Can you actually measure any difference here? You have taken a fault, gone through a full guest exit, triaged it, and are about to into a page mapping operation which may result in a TLBI, and reenter the guest. It only happen a handful of times per page over the lifetime of the guest unless you start swapping. Hot path? I don't think so. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 1/3] KVM: arm64: Check if 52-bits PA is enabled
On Sun, 25 Oct 2020 01:27:37 +0100, Gavin Shan wrote: > > The 52-bits physical address is disabled until CONFIG_ARM64_PA_BITS_52 > is chosen. This uses option for that check, to avoid the unconditional > check on PAGE_SHIFT in the hot path and thus save some CPU cycles. PAGE_SHIFT is known at compile time, and this code is dropped by the compiler if the selected page size is not 64K. This patch really only makes the code slightly less readable and the "CPU cycles" argument doesn't hold at all. So what are you trying to solve exactly? M. > > Signed-off-by: Gavin Shan > --- > arch/arm64/kvm/hyp/pgtable.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index 0cdf6e461cbd..fd850353ee89 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -132,8 +132,9 @@ static u64 kvm_pte_to_phys(kvm_pte_t pte) > { > u64 pa = pte & KVM_PTE_ADDR_MASK; > > - if (PAGE_SHIFT == 16) > - pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48; > +#ifdef CONFIG_ARM64_PA_BITS_52 > + pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48; > +#endif > > return pa; > } > @@ -142,8 +143,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa) > { > kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK; > > - if (PAGE_SHIFT == 16) > - pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48); > +#ifdef CONFIG_ARM64_PA_BITS_52 > + pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48); > +#endif > > return pte; > } > -- > 2.23.0 > > -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
Hi David, nit: please use my kernel.org address for kernel related stuff. On Sat, 24 Oct 2020 22:35:22 +0100, David Woodhouse wrote: > > From: David Woodhouse > > Prerequesite to make x86 more irqdomain compliant. Prerequisite? > > Signed-off-by: David Woodhouse > Signed-off-by: Thomas Gleixner > --- > kernel/irq/irqdomain.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c > index cf8b374b892d..673fa64c1c44 100644 > --- a/kernel/irq/irqdomain.c > +++ b/kernel/irq/irqdomain.c > @@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct > irq_domain *d) { } > static inline void debugfs_remove_domain_dir(struct irq_domain *d) { } > #endif > > -const struct fwnode_operations irqchip_fwnode_ops; > +static const char *irqchip_fwnode_get_name(const struct fwnode_handle > *fwnode) > +{ > + struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, > fwnode); > + > + return fwid->name; > +} > + > +const struct fwnode_operations irqchip_fwnode_ops = { > + .get_name = irqchip_fwnode_get_name, > +}; > EXPORT_SYMBOL_GPL(irqchip_fwnode_ops); > > /** Acked-by: Marc Zyngier -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Hi Santosh, Thanks for this. On 2020-10-21 17:16, Santosh Shukla wrote: The Commit:6d674e28 introduces a notion to detect and handle the device mapping. The commit checks for the VM_PFNMAP flag is set in vma->flags and if set then marks force_pte to true such that if force_pte is true then ignore the THP function check (/transparent_hugepage_adjust()). There could be an issue with the VM_PFNMAP flag setting and checking. For example consider a case where the mdev vendor driver register's the vma_fault handler named vma_mmio_fault(), which maps the host MMIO region in-turn calls remap_pfn_range() and maps the MMIO's vma space. Where, remap_pfn_range implicitly sets the VM_PFNMAP flag into vma->flags. Now lets assume a mmio fault handing flow where guest first access the MMIO region whose 2nd stage translation is not present. So that results to arm64-kvm hypervisor executing guest abort handler, like below: kvm_handle_guest_abort() --> user_mem_abort()--> { ... 0. checks the vma->flags for the VM_PFNMAP. 1. Since VM_PFNMAP flag is not yet set so force_pte _is_ false; 2. gfn_to_pfn_prot() --> __gfn_to_pfn_memslot() --> fixup_user_fault() --> handle_mm_fault()--> __do_fault() --> vma_mmio_fault() --> // vendor's mdev fault handler remap_pfn_range()--> // Here sets the VM_PFNMAP flag into vma->flags. 3. Now that force_pte is set to false in step-2), will execute transparent_hugepage_adjust() func and that lead to Oops [4]. } Hmmm. Nice. Any chance you could provide us with an actual reproducer? The proposition is to check is_iomap flag before executing the THP function transparent_hugepage_adjust(). [4] THP Oops: pc: kvm_is_transparent_hugepage+0x18/0xb0 ... ... user_mem_abort+0x340/0x9b8 kvm_handle_guest_abort+0x248/0x468 handle_exit+0x150/0x1b0 kvm_arch_vcpu_ioctl_run+0x4d4/0x778 kvm_vcpu_ioctl+0x3c0/0x858 ksys_ioctl+0x84/0xb8 __arm64_sys_ioctl+0x28/0x38 Tested on Huawei Kunpeng Taishan-200 arm64 server, Using VFIO-mdev device. Linux tip: 583090b1 Fixes: 6d674e28 ("KVM: arm/arm64: Properly handle faulting of device mappings") Signed-off-by: Santosh Shukla --- arch/arm64/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 3d26b47..ff15357 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1947,7 +1947,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * If we are not forced to use page mapping, check if we are * backed by a THP and thus use block mapping if possible. */ - if (vma_pagesize == PAGE_SIZE && !force_pte) + if (vma_pagesize == PAGE_SIZE && !force_pte && !is_iomap(flags)) vma_pagesize = transparent_hugepage_adjust(memslot, hva, , _ipa); if (writable) Why don't you directly set force_pte to true at the point where we update the flags? It certainly would be a bit more readable: diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 3d26b47a1343..7a4ad984d54e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1920,6 +1920,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (kvm_is_device_pfn(pfn)) { mem_type = PAGE_S2_DEVICE; flags |= KVM_S2PTE_FLAG_IS_IOMAP; + force_pte = true; } else if (logging_active) { /* * Faults on pages in a memslot with logging enabled and almost directly applies to what we have queued for 5.10. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 5/5] arm64: ipi_nmi: Add support for NMI backtrace
On 2020-10-20 10:13, Sumit Garg wrote: On Mon, 19 Oct 2020 at 17:50, Marc Zyngier wrote: On 2020-10-14 12:12, Sumit Garg wrote: > Enable NMI backtrace support on arm64 using IPI turned as an NMI > leveraging pseudo NMIs support. It is now possible for users to get a > backtrace of a CPU stuck in hard-lockup using magic SYSRQ. > > Signed-off-by: Sumit Garg > --- > arch/arm64/include/asm/irq.h | 6 ++ > arch/arm64/kernel/ipi_nmi.c | 12 +++- > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/include/asm/irq.h > b/arch/arm64/include/asm/irq.h > index b2b0c64..e840bf1 100644 > --- a/arch/arm64/include/asm/irq.h > +++ b/arch/arm64/include/asm/irq.h > @@ -6,6 +6,12 @@ > > #include > > +#ifdef CONFIG_SMP > +extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask, > +bool exclude_self); > +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace > +#endif > + > struct pt_regs; > > static inline int nr_legacy_irqs(void) > diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c > index e0a9e03..e1dc19d 100644 > --- a/arch/arm64/kernel/ipi_nmi.c > +++ b/arch/arm64/kernel/ipi_nmi.c > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -25,12 +26,21 @@ void arch_send_call_nmi_func_ipi_mask(cpumask_t > *mask) > __ipi_send_mask(ipi_desc, mask); > } > > +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool > exclude_self) > +{ > + nmi_trigger_cpumask_backtrace(mask, exclude_self, > + arch_send_call_nmi_func_ipi_mask); > +} > + > static irqreturn_t ipi_nmi_handler(int irq, void *data) > { > unsigned int cpu = smp_processor_id(); > > - ipi_kgdb_nmicallback(cpu, get_irq_regs()); > + if (nmi_cpu_backtrace(get_irq_regs())) > + goto out; > > + ipi_kgdb_nmicallback(cpu, get_irq_regs()); > +out: > return IRQ_HANDLED; > } Can't you have *both* a request for a backtrace and a KGDB call? It really shouldn't be either/or. It also outlines how well shared interrupts work with edge triggered signalling... Unfortunately, NMIs doesn't seem to support shared mode [1]. You are totally missing the point: shared interrupts *cannot* work reliably with edge signalling. What I am saying here is that you need implement the sharing yourself in this function. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 1/5] arm64: Add framework to turn IPI as NMI
On 2020-10-20 12:22, Sumit Garg wrote: On Tue, 20 Oct 2020 at 15:38, Marc Zyngier wrote: On 2020-10-20 07:43, Sumit Garg wrote: > On Mon, 19 Oct 2020 at 17:07, Marc Zyngier wrote: [...] >> > +{ >> > + if (!ipi_desc) >> > + return; >> > + >> > + if (is_nmi) { >> > + if (!prepare_percpu_nmi(ipi_id)) >> > + enable_percpu_nmi(ipi_id, IRQ_TYPE_NONE); >> > + } else { >> > + enable_percpu_irq(ipi_id, IRQ_TYPE_NONE); >> >> I'm not keen on this. Normal IRQs can't reliably work, so why do you >> even bother with this? > > Yeah I agree but we need to support existing functionality for kgdb > roundup and sysrq backtrace using normal IRQs as well. When has this become a requirement? I don't really see the point in implementing something that is known not to work. For kgdb: Default implementation [1] uses smp_call_function_single_async() which in turn will invoke IPI as a normal IRQ to roundup CPUs. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/debug/debug_core.c#n244 For sysrq backtrace: Default implementation [2] fallbacks to smp_call_function() (IPI as a normal IRQ) to print backtrace in case architecture doesn't provide arch_trigger_cpumask_backtrace() hook. [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tty/sysrq.c#n250 So in general, IPI as a normal IRQ is still useful for debugging but it can't debug a core which is stuck in deadlock with interrupts disabled. And that's not something we implement today for good reasons: it *cannot* work reliably. What changed that we all of a sudden need it? And since we choose override default implementations for pseudo NMI support, we need to be backwards compatible for platforms which don't possess pseudo NMI support. No. There is nothing to be "backward compatible" with, because - this isn't a userspace visible feature - *it doesn't work* So please drop this non-feature from this series. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 1/2] arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED
On 2020-10-21 08:57, Will Deacon wrote: On Tue, Oct 20, 2020 at 02:45:43PM -0700, Stephen Boyd wrote: According to the SMCCC spec (7.5.2 Discovery) the ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and SMCCC_RET_NOT_SUPPORTED corresponding to "workaround required", "workaround not required but implemented", and "who knows, you're on your own" respectively. For kvm hypercalls (hvc), we've implemented this function id to return SMCCC_RET_NOT_SUPPORTED, 1, and SMCCC_RET_NOT_REQUIRED. The SMCCC_RET_NOT_REQUIRED return value is not a thing for this function id, and is probably copy/pasted from the SMCCC_ARCH_WORKAROUND_2 function id that does support it. Clean this up by returning 0, 1, and SMCCC_RET_NOT_SUPPORTED appropriately. Changing this exposes the problem that spectre_v2_get_cpu_fw_mitigation_state() assumes a SMCCC_RET_NOT_SUPPORTED return value means we are vulnerable, but really it means we have no idea and should assume we can't do anything about mitigation. Put another way, it better be unaffected because it can't be mitigated in the firmware (in this case kvm) as the call isn't implemented! Cc: Andre Przywara Cc: Steven Price Cc: Marc Zyngier Cc: sta...@vger.kernel.org Fixes: c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests") Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") Signed-off-by: Stephen Boyd --- This will require a slightly different backport to stable kernels, but at least it looks like this is a problem given that this return value isn't valid per the spec and we've been going around it by returning something invalid for some time. arch/arm64/kernel/proton-pack.c | 3 +-- arch/arm64/kvm/hypercalls.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c index 68b710f1b43f..00bd54f63f4f 100644 --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -149,10 +149,9 @@ static enum mitigation_state spectre_v2_get_cpu_fw_mitigation_state(void) case SMCCC_RET_SUCCESS: return SPECTRE_MITIGATED; case SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED: + case SMCCC_RET_NOT_SUPPORTED: /* Good luck w/ the Gatekeeper of Gozer */ return SPECTRE_UNAFFECTED; Hmm, I'm not sure this is correct. The SMCCC spec is terrifically unhelpful: NOT_SUPPORTED: Either: * None of the PEs in the system require firmware mitigation for CVE-2017-5715. * The system contains at least 1 PE affected by CVE-2017-5715 that has no firmware mitigation available. * The firmware does not provide any information about whether firmware mitigation is required. so we can't tell whether the thing is vulnerable or not in this case, and have to assume that it is. default: - fallthrough; - case SMCCC_RET_NOT_SUPPORTED: return SPECTRE_VULNERABLE; } } diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 9824025ccc5c..868486957808 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -31,7 +31,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) val = SMCCC_RET_SUCCESS; break; case SPECTRE_UNAFFECTED: - val = SMCCC_RET_NOT_REQUIRED; + val = SMCCC_RET_NOT_SUPPORTED; Which means we need to return SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED here, I suppose? Gahh, I keep mixing Spectre-v2 and WA2. Not good. I *think* the patch below is enough, but I need to give it a go... M. diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c index 68b710f1b43f..3f417d6305ef 100644 --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -134,8 +134,6 @@ static enum mitigation_state spectre_v2_get_cpu_hw_mitigation_state(void) return SPECTRE_VULNERABLE; } -#define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED (1) - static enum mitigation_state spectre_v2_get_cpu_fw_mitigation_state(void) { int ret; @@ -148,7 +146,7 @@ static enum mitigation_state spectre_v2_get_cpu_fw_mitigation_state(void) switch (ret) { case SMCCC_RET_SUCCESS: return SPECTRE_MITIGATED; - case SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED: + case SMCCC_RET_UNAFFECTED: return SPECTRE_UNAFFECTED; default: fallthrough; @@ -474,7 +472,7 @@ static enum mitigation_state spectre_v4_get_cpu_fw_mitigation_state(void) switch (ret) { case SMCCC_RET_SUCCESS: return SPECTRE_MITIGATED; - case SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED: + case SMCCC_RET_UNAFFECTED: fallthrough; case SMCCC_RET_NOT_REQUIRED:
Re: [PATCH v5 1/5] arm64: Add framework to turn IPI as NMI
On 2020-10-20 13:25, Daniel Thompson wrote: On Tue, Oct 20, 2020 at 04:52:43PM +0530, Sumit Garg wrote: [...] So in general, IPI as a normal IRQ is still useful for debugging but it can't debug a core which is stuck in deadlock with interrupts disabled. And since we choose override default implementations for pseudo NMI support, we need to be backwards compatible for platforms which don't possess pseudo NMI support. Do the fallback implementations require a new IPI? The fallbacks could rely on existing mechanisms such as the smp_call_function family. Indeed. I'd be worried of using that mechanism for NMIs, but normal IPIs should stick to the normal cross-call stuff. The jury is still out on why this is a good idea, given that it doesn't work in the only interesting case (deadlocked CPUs). M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 1/5] arm64: Add framework to turn IPI as NMI
On 2020-10-20 07:43, Sumit Garg wrote: On Mon, 19 Oct 2020 at 17:07, Marc Zyngier wrote: [...] > +{ > + if (!ipi_desc) > + return; > + > + if (is_nmi) { > + if (!prepare_percpu_nmi(ipi_id)) > + enable_percpu_nmi(ipi_id, IRQ_TYPE_NONE); > + } else { > + enable_percpu_irq(ipi_id, IRQ_TYPE_NONE); I'm not keen on this. Normal IRQs can't reliably work, so why do you even bother with this? Yeah I agree but we need to support existing functionality for kgdb roundup and sysrq backtrace using normal IRQs as well. When has this become a requirement? I don't really see the point in implementing something that is known not to work. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v3 03/16] arm64: Allow IPIs to be handled as normal interrupts
Hi Vincent, On 2020-10-19 13:42, Vincent Guittot wrote: Hi Marc, On Tue, 1 Sep 2020 at 16:44, Marc Zyngier wrote: In order to deal with IPIs as normal interrupts, let's add a new way to register them with the architecture code. set_smp_ipi_range() takes a range of interrupts, and allows the arch code to request them as if the were normal interrupts. A standard handler is then called by the core IRQ code to deal with the IPI. This means that we don't need to call irq_enter/irq_exit, and that we don't need to deal with set_irq_regs either. So let's move the dispatcher into its own function, and leave handle_IPI() as a compatibility function. On the sending side, let's make use of ipi_send_mask, which already exists for this purpose. One of the major difference is that we end up, in some cases (such as when performing IRQ time accounting on the scheduler IPI), end up with nested irq_enter()/irq_exit() pairs. Other than the (relatively small) overhead, there should be no consequences to it (these pairs are designed to nest correctly, and the accounting shouldn't be off). While rebasing on mainline, I have faced a performance regression for the benchmark: perf bench sched pipe on my arm64 dual quad core (hikey) and my 2 nodes x 112 CPUS (thx2) The regression comes from: commit: d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts") That's interesting, as this patch doesn't really change anything (most of the potential overhead comes in later). The only potential overhead I can see is that the scheduler_ipi() call is now wrapped around irq_enter()/irq_exit(). v5.9 + this patch hikey : 48818(+/- 0.31) 37503(+/- 0.15%) -23.2% thx2 : 132410(+/- 1.72) 122646(+/- 1.92%) -7.4% By + this patch, I mean merging branch from this patch. Whereas merging the previous: commit: 83cfac95c018 ("genirq: Allow interrupts to be excluded from /proc/interrupts") It doesn't show any regression Since you are running perf, can you spot where the overhead occurs? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] perf: arm_spe: Use Inner Shareable DSB when draining the buffer
On 2020-10-19 13:24, Mark Rutland wrote: On Tue, Oct 06, 2020 at 05:13:31PM +0100, Alexandru Elisei wrote: Hi Marc, Thank you for having a look at the patch! On 10/6/20 4:32 PM, Marc Zyngier wrote: > Hi Alex, > > On Tue, 06 Oct 2020 16:05:20 +0100, > Alexandru Elisei wrote: >> From ARM DDI 0487F.b, page D9-2807: >> >> "Although the Statistical Profiling Extension acts as another observer in >> the system, for determining the Shareability domain of the DSB >> instructions, the writes of sample records are treated as coming from the >> PE that is being profiled." >> >> Similarly, on page D9-2801: >> >> "The memory type and attributes that are used for a write by the >> Statistical Profiling Extension to the Profiling Buffer is taken from the >> translation table entries for the virtual address being written to. That >> is: >> - The writes are treated as coming from an observer that is coherent with >> all observers in the Shareability domain that is defined by the >> translation tables." >> >> All the PEs are in the Inner Shareable domain, use a DSB ISH to make sure >> writes to the profiling buffer have completed. > I'm a bit sceptical of this change. The SPE writes are per-CPU, and > all we are trying to ensure is that the CPU we are running on has > drained its own queue of accesses. > > The accesses being made within the IS domain doesn't invalidate the > fact that they are still per-CPU, because "the writes of sample > records are treated as coming from the PE that is being profiled.". > > So why should we have an IS-wide synchronisation for accesses that are > purely local? I think I might have misunderstood how perf spe works. Below is my original train of thought. In the buffer management event interrupt we drain the buffer, and if the buffer is full, we call arm_spe_perf_aux_output_end() -> perf_aux_output_end(). The comment for perf_aux_output_end() says "Commit the data written by hardware into the ring buffer by adjusting aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the pmu driver's responsibility to observe ordering rules of the hardware, so that all the data is externally visible before this is called." My conclusion was that after we drain the buffer, the data must be visible to all CPUs. FWIW, this reasoning sounds correct to me. The DSB NSH will be sufficient to drain the buffer, but we need the DSB ISH to ensure that it's visbile to other CPUs at the instant we call perf_aux_output_end(). Right. I think I missed that last bit (and Alex's email at the same time). Otherwise, if CPU x is reading the ring-buffer written by CPU y, it might see the aux buffer pointers updated before the samples are viisble, and hence read junk from the buffer. We can add a comment to that effect (or rework perf_aux_output_end() somehow to handle that ordering). I'd rather this is done in perf_aux_output_end(), as a full blown DSB ISH on guest entry is pretty harsh... It would also nicely split the responsibilities: - KVM stops SPE and make sure the output is drained - Perf makes the data visible to all CPUs Thoughts? M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 5/5] arm64: ipi_nmi: Add support for NMI backtrace
On 2020-10-14 12:12, Sumit Garg wrote: Enable NMI backtrace support on arm64 using IPI turned as an NMI leveraging pseudo NMIs support. It is now possible for users to get a backtrace of a CPU stuck in hard-lockup using magic SYSRQ. Signed-off-by: Sumit Garg --- arch/arm64/include/asm/irq.h | 6 ++ arch/arm64/kernel/ipi_nmi.c | 12 +++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h index b2b0c64..e840bf1 100644 --- a/arch/arm64/include/asm/irq.h +++ b/arch/arm64/include/asm/irq.h @@ -6,6 +6,12 @@ #include +#ifdef CONFIG_SMP +extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask, + bool exclude_self); +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace +#endif + struct pt_regs; static inline int nr_legacy_irqs(void) diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c index e0a9e03..e1dc19d 100644 --- a/arch/arm64/kernel/ipi_nmi.c +++ b/arch/arm64/kernel/ipi_nmi.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -25,12 +26,21 @@ void arch_send_call_nmi_func_ipi_mask(cpumask_t *mask) __ipi_send_mask(ipi_desc, mask); } +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) +{ + nmi_trigger_cpumask_backtrace(mask, exclude_self, + arch_send_call_nmi_func_ipi_mask); +} + static irqreturn_t ipi_nmi_handler(int irq, void *data) { unsigned int cpu = smp_processor_id(); - ipi_kgdb_nmicallback(cpu, get_irq_regs()); + if (nmi_cpu_backtrace(get_irq_regs())) + goto out; + ipi_kgdb_nmicallback(cpu, get_irq_regs()); +out: return IRQ_HANDLED; } Can't you have *both* a request for a backtrace and a KGDB call? It really shouldn't be either/or. It also outlines how well shared interrupts work with edge triggered signalling... M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 4/5] arm64: kgdb: Round up cpus using IPI as NMI
On 2020-10-14 12:12, Sumit Garg wrote: arm64 platforms with GICv3 or later supports pseudo NMIs which can be leveraged to round up CPUs which are stuck in hard lockup state with interrupts disabled that wouldn't be possible with a normal IPI. So instead switch to round up CPUs using IPI turned as NMI. And in case a particular arm64 platform doesn't supports pseudo NMIs, this IPI will act as a normal IPI which maintains existing kgdb functionality. Signed-off-by: Sumit Garg --- arch/arm64/include/asm/kgdb.h | 8 arch/arm64/kernel/ipi_nmi.c | 5 - arch/arm64/kernel/kgdb.c | 21 + 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kgdb.h b/arch/arm64/include/asm/kgdb.h index 21fc85e..6f3d3af 100644 --- a/arch/arm64/include/asm/kgdb.h +++ b/arch/arm64/include/asm/kgdb.h @@ -24,6 +24,14 @@ static inline void arch_kgdb_breakpoint(void) extern void kgdb_handle_bus_error(void); extern int kgdb_fault_expected; +#ifdef CONFIG_KGDB +extern void ipi_kgdb_nmicallback(int cpu, void *regs); +#else +static inline void ipi_kgdb_nmicallback(int cpu, void *regs) +{ +} +#endif + #endif /* !__ASSEMBLY__ */ /* diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c index a959256..e0a9e03 100644 --- a/arch/arm64/kernel/ipi_nmi.c +++ b/arch/arm64/kernel/ipi_nmi.c @@ -8,6 +8,7 @@ #include #include +#include #include #include @@ -26,7 +27,9 @@ void arch_send_call_nmi_func_ipi_mask(cpumask_t *mask) static irqreturn_t ipi_nmi_handler(int irq, void *data) { - /* nop, NMI handlers for special features can be added here. */ + unsigned int cpu = smp_processor_id(); + + ipi_kgdb_nmicallback(cpu, get_irq_regs()); Please add a return value to ipi_kgdb_nmicallback(), and check it before returning IRQ_HANDLED. Thinking a bit more about the whole thing, you should have a way to avoid requesting the NMI if there is no user for it (there is nothing worse than an enabled interrupt without handlers...). return IRQ_HANDLED; } diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index 1a157ca3..0991275 100644 --- a/arch/arm64/kernel/kgdb.c +++ b/arch/arm64/kernel/kgdb.c @@ -17,6 +17,7 @@ #include #include +#include #include struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { @@ -353,3 +354,23 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) return aarch64_insn_write((void *)bpt->bpt_addr, *(u32 *)bpt->saved_instr); } + +void ipi_kgdb_nmicallback(int cpu, void *regs) +{ + if (atomic_read(_active) != -1) + kgdb_nmicallback(cpu, regs); +} + +#ifdef CONFIG_SMP There is no such thing as an arm64 UP kernel. +void kgdb_roundup_cpus(void) +{ + struct cpumask mask; + + cpumask_copy(, cpu_online_mask); + cpumask_clear_cpu(raw_smp_processor_id(), ); + if (cpumask_empty()) + return; + + arch_send_call_nmi_func_ipi_mask(); Surely you can come up with a less convoluted name for this function. arm64_send_nmi() would be plenty in my opinion. +} +#endif Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 2/5] irqchip/gic-v3: Enable support for SGIs to act as NMIs
On 2020-10-14 12:12, Sumit Garg wrote: Add support to handle SGIs as regular NMIs. As SGIs or IPIs defaults to a There is nothing "regular" about NMIs. Drop "or IPIs". s/defaults/default/ special flow handler: handle_percpu_devid_fasteoi_ipi(), so skip NMI handler update in case of SGIs. Also, enable NMI support prior to gic_smp_init() as allocation of SGIs as IRQs/NMIs happen as part of this routine. Signed-off-by: Sumit Garg --- drivers/irqchip/irq-gic-v3.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 16fecc0..5efc865 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -477,6 +477,11 @@ static int gic_irq_nmi_setup(struct irq_data *d) if (WARN_ON(gic_irq(d) >= 8192)) return -EINVAL; + if (get_intid_range(d) == SGI_RANGE) { + gic_irq_set_prio(d, GICD_INT_NMI_PRI); + return 0; + } + Please follow the existing control flow, or rework it to be organised by range. /* desc lock should already be held */ if (gic_irq_in_rdist(d)) { u32 idx = gic_get_ppi_index(d); @@ -514,6 +519,11 @@ static void gic_irq_nmi_teardown(struct irq_data *d) if (WARN_ON(gic_irq(d) >= 8192)) return; + if (get_intid_range(d) == SGI_RANGE) { + gic_irq_set_prio(d, GICD_INT_DEF_PRI); + return; + } Same here. + /* desc lock should already be held */ if (gic_irq_in_rdist(d)) { u32 idx = gic_get_ppi_index(d); @@ -1708,6 +1718,7 @@ static int __init gic_init_bases(void __iomem *dist_base, gic_dist_init(); gic_cpu_init(); + gic_enable_nmi_support(); gic_smp_init(); gic_cpu_pm_init(); @@ -1719,8 +1730,6 @@ static int __init gic_init_bases(void __iomem *dist_base, gicv2m_init(handle, gic_data.domain); } - gic_enable_nmi_support(); - return 0; out_free: Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 3/5] arm64: smp: Allocate and setup IPI as NMI
On 2020-10-14 12:12, Sumit Garg wrote: Allocate an unused IPI that can be turned as NMI using ipi_nmi framework. This doesn't do any allocation, as far as I can see. It relies on the initial grant from the interrupt controller to be larger than what the kernel currently uses. Also, invoke corresponding NMI setup/teardown APIs. Signed-off-by: Sumit Garg --- arch/arm64/kernel/smp.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 82e75fc..129ebfb 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include #include @@ -962,6 +963,8 @@ static void ipi_setup(int cpu) for (i = 0; i < nr_ipi; i++) enable_percpu_irq(ipi_irq_base + i, 0); + + ipi_nmi_setup(cpu); } #ifdef CONFIG_HOTPLUG_CPU @@ -974,6 +977,8 @@ static void ipi_teardown(int cpu) for (i = 0; i < nr_ipi; i++) disable_percpu_irq(ipi_irq_base + i); + + ipi_nmi_teardown(cpu); } #endif @@ -995,6 +1000,9 @@ void __init set_smp_ipi_range(int ipi_base, int n) irq_set_status_flags(ipi_base + i, IRQ_HIDDEN); } + if (n > nr_ipi) + set_smp_ipi_nmi(ipi_base + nr_ipi); + ipi_irq_base = ipi_base; /* Setup the boot CPU immediately */ Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 1/5] arm64: Add framework to turn IPI as NMI
On 2020-10-14 12:12, Sumit Garg wrote: Introduce framework to turn an IPI as NMI using pseudo NMIs. In case a particular platform doesn't support pseudo NMIs, then request IPI as a regular IRQ. The main motivation for this feature is to have an IPI that can be leveraged to invoke NMI functions on other CPUs. And current prospective users are NMI backtrace and KGDB CPUs round-up whose support is added via future patches. Signed-off-by: Sumit Garg --- arch/arm64/include/asm/nmi.h | 16 + arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/ipi_nmi.c | 77 3 files changed, 94 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/nmi.h create mode 100644 arch/arm64/kernel/ipi_nmi.c [...] + irq_set_status_flags(ipi, IRQ_HIDDEN); Another thing is this. Why are you hiding this from /proc/interrupts? The only reason the other IPIs are hidden is that displaying them as "normal" interrupts would be a change in userspace ABI. In your case, this is something new that can perfectly appear as a standard interrupt (and I don't see how you'd display the statistics otherwise). M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 1/5] arm64: Add framework to turn IPI as NMI
On 2020-10-14 12:12, Sumit Garg wrote: Introduce framework to turn an IPI as NMI using pseudo NMIs. In case a particular platform doesn't support pseudo NMIs, then request IPI as a regular IRQ. The main motivation for this feature is to have an IPI that can be leveraged to invoke NMI functions on other CPUs. And current prospective users are NMI backtrace and KGDB CPUs round-up whose support is added via future patches. Signed-off-by: Sumit Garg --- arch/arm64/include/asm/nmi.h | 16 + arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/ipi_nmi.c | 77 3 files changed, 94 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/nmi.h create mode 100644 arch/arm64/kernel/ipi_nmi.c diff --git a/arch/arm64/include/asm/nmi.h b/arch/arm64/include/asm/nmi.h new file mode 100644 index 000..3433c55 --- /dev/null +++ b/arch/arm64/include/asm/nmi.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_NMI_H +#define __ASM_NMI_H + +#ifndef __ASSEMBLER__ + +#include + +extern void arch_send_call_nmi_func_ipi_mask(cpumask_t *mask); + +void set_smp_ipi_nmi(int ipi); +void ipi_nmi_setup(int cpu); +void ipi_nmi_teardown(int cpu); + +#endif /* !__ASSEMBLER__ */ +#endif diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index bbaf0bc..525a1e0 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -17,7 +17,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \ return_address.o cpuinfo.o cpu_errata.o \ cpufeature.o alternative.o cacheinfo.o \ smp.o smp_spin_table.o topology.o smccc-call.o \ - syscall.o proton-pack.o + syscall.o proton-pack.o ipi_nmi.o targets+= efi-entry.o diff --git a/arch/arm64/kernel/ipi_nmi.c b/arch/arm64/kernel/ipi_nmi.c new file mode 100644 index 000..a959256 --- /dev/null +++ b/arch/arm64/kernel/ipi_nmi.c @@ -0,0 +1,77 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * NMI support for IPIs + * + * Copyright (C) 2020 Linaro Limited + * Author: Sumit Garg + */ + +#include +#include +#include + +#include + +static struct irq_desc *ipi_desc __read_mostly; +static int ipi_id __read_mostly; +static bool is_nmi __read_mostly; + +void arch_send_call_nmi_func_ipi_mask(cpumask_t *mask) +{ + if (WARN_ON_ONCE(!ipi_desc)) + return; + + __ipi_send_mask(ipi_desc, mask); +} + +static irqreturn_t ipi_nmi_handler(int irq, void *data) +{ + /* nop, NMI handlers for special features can be added here. */ + + return IRQ_HANDLED; This definitely is the *wrong* default. If nothing is explicitly handling the interrupt, it should be reported as such to the core code to be disabled if this happens too often. +} + +void ipi_nmi_setup(int cpu) The naming is awful. "ipi" is nowhere specific enough (we already have another 7 of them), and this doesn't necessarily uses pseudo-NMIs, since you are requesting IRQs. +{ + if (!ipi_desc) + return; + + if (is_nmi) { + if (!prepare_percpu_nmi(ipi_id)) + enable_percpu_nmi(ipi_id, IRQ_TYPE_NONE); + } else { + enable_percpu_irq(ipi_id, IRQ_TYPE_NONE); I'm not keen on this. Normal IRQs can't reliably work, so why do you even bother with this? + } +} + +void ipi_nmi_teardown(int cpu) +{ + if (!ipi_desc) + return; + + if (is_nmi) { + disable_percpu_nmi(ipi_id); + teardown_percpu_nmi(ipi_id); + } else { + disable_percpu_irq(ipi_id); + } +} + +void __init set_smp_ipi_nmi(int ipi) +{ + int err; + + err = request_percpu_nmi(ipi, ipi_nmi_handler, "IPI", _number); + if (err) { + err = request_percpu_irq(ipi, ipi_nmi_handler, "IPI", +_number); + WARN_ON(err); + is_nmi = false; + } else { + is_nmi = true; + } + + ipi_desc = irq_to_desc(ipi); + irq_set_status_flags(ipi, IRQ_HIDDEN); + ipi_id = ipi; Updating global state without checking for errors doesn't seem like a great move. M. -- Jazz is not dead. It just smells funny...
Re: 5.10-rc0: build error in ipi.c
On 2020-10-16 00:24, Thomas Gleixner wrote: On Thu, Oct 15 2020 at 20:41, Marc Zyngier wrote: On 2020-10-15 18:18, Pavel Machek wrote: diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 10a5aff4eecc..db923e0da162 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -81,6 +81,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS # Generic IRQ IPI support config GENERIC_IRQ_IPI + select IRQ_DOMAIN_HIERARCHY bool which makes some of the MIPS GENERIC_IRQ_IPI/IRQ_DOMAIN_HIERARCHY Kconfig magic in drivers/irqchip/Kconfig obsolete. Good point. I'll queue this on top: diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index cd734df57c42..d2a651372e15 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -180,7 +180,6 @@ config IRQ_MIPS_CPU select GENERIC_IRQ_CHIP select GENERIC_IRQ_IPI if SYS_SUPPORTS_MULTITHREADING select IRQ_DOMAIN - select IRQ_DOMAIN_HIERARCHY if GENERIC_IRQ_IPI select GENERIC_IRQ_EFFECTIVE_AFF_MASK config CLPS711X_IRQCHIP Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] irqchip: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7
On Wed, 14 Oct 2020 15:17:03 +0200, Geert Uytterhoeven wrote: > The MStar interrupt controller is only found on MStar, SigmaStar, and > Mediatek SoCs. Hence add dependencies on ARCH_MEDIATEK and > ARCH_MSTARV7, to prevent asking the user about the MStar interrupt > controller driver when configuring a kernel without support for MStar, > SigmaStar, and Mediatek SoCs. Applied to irq/irqchip-next, thanks! [1/1] irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7 commit: 61b0648d569aca932eab87a67f7ca0ffd3ea2b68 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 2/2] irqchip/sifive-plic: Fix the interrupt was enabled accidentally issue.
On 2020-10-12 14:57, Greentime Hu wrote: In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if activated opt-in"), it added irqd_affinity_on_activate() checking in the function irq_set_affinity_deactivated() so it will return false here. In that case, it will call irq_try_set_affinity() -> plic_irq_toggle() which will enable the interrupt to cause the CPU hang. if (irq_set_affinity_deactivated()) return 0; ... irq_try_set_affinity(data, mask, force); [ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 919.020922] rcu: 0-...0: (0 ticks this GP) idle=7d2/1/0x4002 softirq=1424/1424 fqs=105807 [ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496) [ 919.036109] Task dump for CPU 0: [ 919.039321] kworker/0:1 R running task030 2 0x0008 [ 919.046359] Workqueue: events set_brightness_delayed [ 919.051302] Call Trace: [ 919.053738] [] __schedule+0x194/0x4de [ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 982.040923] rcu: 0-...0: (0 ticks this GP) idle=7d2/1/0x4002 softirq=1424/1424 fqs=113325 [ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509) [ 982.056108] Task dump for CPU 0: [ 982.059321] kworker/0:1 R running task030 2 0x0008 [ 982.066359] Workqueue: events set_brightness_delayed [ 982.071302] Call Trace: [ 982.073739] [] __schedule+0x194/0x4de [..] After applying this patch, the CPU hang issue can be fixed. Signed-off-by: Greentime Hu --- drivers/irqchip/irq-sifive-plic.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c index 4cc8a2657a6d..8a673d9cff69 100644 --- a/drivers/irqchip/irq-sifive-plic.c +++ b/drivers/irqchip/irq-sifive-plic.c @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hwirq) { struct plic_priv *priv = d->host_data; + struct irq_data *irqd; irq_domain_set_info(d, irq, hwirq, _chip, d->host_data, handle_fasteoi_irq, NULL, NULL); irq_set_noprobe(irq); + irqd = irq_get_irq_data(irq); + irqd_set_single_target(irqd); + irqd_set_affinity_on_activate(irqd); irq_set_affinity(irq, >lmask); return 0; } How does this fix anything? The plic driver doesn't have an activate callback, so how does it make any difference? I get the feeling this papers over another issue. M. -- Jazz is not dead. It just smells funny...
Re: 5.10-rc0: build error in ipi.c
On 2020-10-15 18:18, Pavel Machek wrote: Hi! > > I'm getting build problems in 5.10-rc0 in config for n900. ARM board. > > > > CONFIG_SMP=y > > CONFIG_SMP_ON_UP=y On its own, this doesn't break anything with multi_v7_defconfig. I sent config off-list. Let me know if it does not arrive or if you need more info. Try this for size: diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 10a5aff4eecc..db923e0da162 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -81,6 +81,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS # Generic IRQ IPI support config GENERIC_IRQ_IPI + select IRQ_DOMAIN_HIERARCHY bool # Generic MSI interrupt support N, -- Jazz is not dead. It just smells funny...
Re: 5.10-rc0: build error in ipi.c
On 2020-10-15 15:23, Thomas Gleixner wrote: On Thu, Oct 15 2020 at 12:12, Pavel Machek wrote: Cc+ Marc Thanks Thomas. I'm getting build problems in 5.10-rc0 in config for n900. ARM board. CONFIG_SMP=y CONFIG_SMP_ON_UP=y On its own, this doesn't break anything with multi_v7_defconfig. CC net/devres.o kernel/irq/ipi.c: In function ‘irq_reserve_ipi’: kernel/irq/ipi.c:84:9: error: implicit declaration of function ‘__irq_domain_alloc_irqs’; did you mean ‘irq_domain_alloc_irqs’? [-Werror=implicit-function-declaration] virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE, ^~~ irq_domain_alloc_irqs cc1: some warnings being treated as errors make[2]: *** [scripts/Makefile.build:283: kernel/irq/ipi.o] Error 1 That probably comes from the ipi as irq rework for arm/arm64. Most probably. Pawel, can you please stash your config somewhere where I can get it? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] arm64: KVM: marking pages as XN in Stage-2 does not care about CTR_EL0.DIC
On 2020-10-13 13:56, limingwang (A) wrote: Hi Li, On 2020-10-12 02:08, l00484210 wrote: From: MingWang Li When testing the ARMv8.2-TTS2UXN feature, setting bits of XN is unavailable. Because the control bit CTR_EL0.DIC is set by default on system. But when CTR_EL0.DIC is set, software does not need to flush icache actively, instead of clearing XN bits.The patch, the commit id of which is 6ae4b6e0578886eb36cedbf99f04031d93f9e315, has implemented the function of CTR_EL0.DIC. Signed-off-by: MingWang Li Signed-off-by: Henglong Fan --- arch/arm64/include/asm/pgtable-prot.h | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 4d867c6446c4..5feb94882bf7 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -79,17 +79,7 @@ extern bool arm64_use_ng_mappings; __val; \ }) -#define PAGE_S2_XN \ - ({ \ - u64 __val; \ - if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) \ - __val = 0; \ - else\ - __val = PTE_S2_XN; \ - __val; \ - }) - -#define PAGE_S2__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PAGE_S2_XN) +#define PAGE_S2__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PTE_S2_XN) #define PAGE_S2_DEVICE __pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN) #define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN) I don't understand what you are trying to achieve here. This whole point of not setting XN in the page tables when DIC is present is to avoid a pointless permission fault at run time. At you noticed above, no icache invalidation is necessary. So why would you ever want to take a fault on exec the first place? M. -- Jazz is not dead. It just smells funny... Hi Marc, According to ARMv8.2-TTS2UXN feature, which extends the stage 2 translation table access permissions to provide control of whether memory is executable at EL0 independent of whether it is executable at EL1. Testing this feature in some security scenario, for example, if I want to grant execute permission to some memory only for EL0, but it will failed. Because KVM clears XN bits at first, this means that the memory can be executable in both EL0 and El1. KVM currently offers no support for this, and the only use we have for XN so far is to to ensure Data/Instruction coherency when the CPU doesn't offer it in HW. So the execute permission is not granted when the page table is created for the first time, then grant the execute permission by setting xn, based on the actual requirements. And according to spec: DIC, bit [29] Instruction cache invalidation requirements for data to instruction coherence. 0b0 Instruction cache invalidation to the Point of Unification is required for data to instruction coherence. 0b1 Instruction cache invalidation to the Point of Unification is not required for data to instruction coherence. So when DIC is set, if the memory is changed to executable, the hardware will flush icache. No. The Icache *snoops* the Dcache at all times. Which is why we don't need to trap on execution, and we can leave the guest run without any intervention. If as you said, I feel that DIC conflicts with ARMv8.2-TTS2UXN feature. There is no conflict. KVM doesn't make use of all the bells and whistle in the architecture, which is probably a good thing. If you feel that there is a need for S2UXN as a security feature, we can discuss how to expose this to the guest (because it definitely needs to know about that). But setting XN when DIC is present for no other reason than "it may be useful one day" doesn't make sense, and results in a massive performance drop on the platforms that have DIC (and I really wish they all had it). M. -- Jazz is not dead. It just smells funny...
[PATCH] arm64: tegra186: Add missing CPU PMUs
Add the description of CPU PMUs for both the Denver and A57 clusters, which enables the perf subsystem. Signed-off-by: Marc Zyngier --- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 28 +++- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi index fd44545e124d..6bb03668a8d3 100644 --- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi @@ -1321,7 +1321,7 @@ cpus { #address-cells = <1>; #size-cells = <0>; - cpu@0 { + denver_0: cpu@0 { compatible = "nvidia,tegra186-denver"; device_type = "cpu"; i-cache-size = <0x2>; @@ -1334,7 +1334,7 @@ cpu@0 { reg = <0x000>; }; - cpu@1 { + denver_1: cpu@1 { compatible = "nvidia,tegra186-denver"; device_type = "cpu"; i-cache-size = <0x2>; @@ -1347,7 +1347,7 @@ cpu@1 { reg = <0x001>; }; - cpu@2 { + ca57_0: cpu@2 { compatible = "arm,cortex-a57"; device_type = "cpu"; i-cache-size = <0xC000>; @@ -1360,7 +1360,7 @@ cpu@2 { reg = <0x100>; }; - cpu@3 { + ca57_1: cpu@3 { compatible = "arm,cortex-a57"; device_type = "cpu"; i-cache-size = <0xC000>; @@ -1373,7 +1373,7 @@ cpu@3 { reg = <0x101>; }; - cpu@4 { + ca57_2: cpu@4 { compatible = "arm,cortex-a57"; device_type = "cpu"; i-cache-size = <0xC000>; @@ -1386,7 +1386,7 @@ cpu@4 { reg = <0x102>; }; - cpu@5 { + ca57_3: cpu@5 { compatible = "arm,cortex-a57"; device_type = "cpu"; i-cache-size = <0xC000>; @@ -1418,6 +1418,22 @@ L2_A57: l2-cache1 { }; }; + pmu_denver { + compatible = "nvidia,denver-pmu", "arm,armv8-pmuv3"; + interrupts = , +; + interrupt-affinity = <_0 _1>; + }; + + pmu_a57 { + compatible = "arm,cortex-a57-pmu", "arm,armv8-pmuv3"; + interrupts = , +, +, +; + interrupt-affinity = <_0 _1 _2 _3>; + }; + thermal-zones { a57 { polling-delay = <0>; -- 2.28.0
[PATCH] phy: tegra: xusb: Fix dangling pointer on probe failure
If, for some reason, the xusb PHY fails to probe, it leaves a dangling pointer attached to the platform device structure. This would normally be harmless, but the Tegra XHCI driver then goes and extract that pointer from the PHY device. Things go downhill from there: 8.752082] [004d554e5145533c] address between user and kernel address ranges [8.752085] Internal error: Oops: 9604 [#1] PREEMPT SMP [8.752088] Modules linked in: max77620_regulator(E+) xhci_tegra(E+) sdhci_tegra(E+) xhci_hcd(E) sdhci_pltfm(E) cqhci(E) fixed(E) usbcore(E) scsi_mod(E) sdhci(E) host1x(E+) [8.752103] CPU: 4 PID: 158 Comm: systemd-udevd Tainted: G S W E 5.9.0-rc7-00298-gf6337624c4fe #1980 [8.752105] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT) [8.752108] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--) [8.752115] pc : kobject_put+0x1c/0x21c [8.752120] lr : put_device+0x20/0x30 [8.752121] sp : ffc012eb3840 [8.752122] x29: ffc012eb3840 x28: ffc010e82638 [8.752125] x27: ffc008d56440 x26: [8.752128] x25: ff81eb508200 x24: [8.752130] x23: ff81eb538800 x22: [8.752132] x21: fdfb x20: ff81eb538810 [8.752134] x19: 3d4d554e51455300 x18: 0020 [8.752136] x17: ffc008d00270 x16: ffc008d00c94 [8.752138] x15: 0004 x14: ff81ebd4ae90 [8.752140] x13: x12: ff81eb86a4e8 [8.752142] x11: ff81eb86a480 x10: ff81eb862fea [8.752144] x9 : ffc01055fb28 x8 : ff81eb86a4a8 [8.752146] x7 : 0001 x6 : 0001 [8.752148] x5 : ff81dff8bc38 x4 : [8.752150] x3 : 0001 x2 : 0001 [8.752152] x1 : 0002 x0 : 3d4d554e51455300 [8.752155] Call trace: [8.752157] kobject_put+0x1c/0x21c [8.752160] put_device+0x20/0x30 [8.752164] tegra_xusb_padctl_put+0x24/0x3c [8.752170] tegra_xusb_probe+0x8b0/0xd10 [xhci_tegra] [8.752174] platform_drv_probe+0x60/0xb4 [8.752176] really_probe+0xf0/0x504 [8.752179] driver_probe_device+0x100/0x170 [8.752181] device_driver_attach+0xcc/0xd4 [8.752183] __driver_attach+0xb0/0x17c [8.752185] bus_for_each_dev+0x7c/0xd4 [8.752187] driver_attach+0x30/0x3c [8.752189] bus_add_driver+0x154/0x250 [8.752191] driver_register+0x84/0x140 [8.752193] __platform_driver_register+0x54/0x60 [8.752197] tegra_xusb_init+0x40/0x1000 [xhci_tegra] [8.752201] do_one_initcall+0x54/0x2d0 [8.752205] do_init_module+0x68/0x29c [8.752207] load_module+0x2178/0x26c0 [8.752209] __do_sys_finit_module+0xb0/0x120 [8.752211] __arm64_sys_finit_module+0x2c/0x40 [8.752215] el0_svc_common.constprop.0+0x80/0x240 [8.752218] do_el0_svc+0x30/0xa0 [8.752220] el0_svc+0x18/0x50 [8.752223] el0_sync_handler+0x90/0x318 [8.752225] el0_sync+0x158/0x180 [8.752230] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (3940f000) [8.752232] ---[ end trace 90f6c89d62d85ff5 ]--- Reset the pointer on probe failure fixes the issue. Fixes: 53d2a715c2403 ("phy: Add Tegra XUSB pad controller support") Signed-off-by: Marc Zyngier --- drivers/phy/tegra/xusb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c index de4a46fe1763..ad88d74c1884 100644 --- a/drivers/phy/tegra/xusb.c +++ b/drivers/phy/tegra/xusb.c @@ -1242,6 +1242,7 @@ static int tegra_xusb_padctl_probe(struct platform_device *pdev) reset: reset_control_assert(padctl->rst); remove: + platform_set_drvdata(pdev, NULL); soc->ops->remove(padctl); return err; } -- 2.28.0
[PATCH] drm/tegra: sor: Ensure regulators are disabled on teardown
The Tegra SOR driver uses the devm infrastructure to request regulators, but enables them without registering them with the infrastructure. This results in the following splat if probing fails for any odd resaon (such as dependencies not being available): [8.974187] tegra-sor 1558.sor: cannot get HDMI supply: -517 [9.414403] tegra-sor 1558.sor: failed to probe HDMI: -517 [9.421240] [ cut here ] [9.425879] WARNING: CPU: 1 PID: 164 at drivers/regulator/core.c:2089 _regulator_put.part.0+0x16c/0x174 [9.435259] Modules linked in: tegra_drm(E+) cec(E) ahci_tegra(E) drm_kms_helper(E) drm(E) libahci_platform(E) libahci(E) max77620_regulator(E) xhci_tegra(E+) sdhci_tegra(E) xhci_hcd(E) libata(E) sdhci_pltfm(E) cqhci(E) fixed(E) usbcore(E) scsi_mod(E) sdhci(E) host1x(E) [9.459211] CPU: 1 PID: 164 Comm: systemd-udevd Tainted: G SD W E 5.9.0-rc7-00298-gf6337624c4fe #1980 [9.469285] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT) [9.475202] pstate: 8005 (Nzcv daif -PAN -UAO BTYPE=--) [9.480784] pc : _regulator_put.part.0+0x16c/0x174 [9.485581] lr : regulator_put+0x44/0x60 [9.489501] sp : ffc011d837b0 [9.492814] x29: ffc011d837b0 x28: ff81dd085900 [9.498141] x27: ff81de1c8ec0 x26: ff81de1c8c10 [9.503464] x25: ff81dd085800 x24: ffc008f2c6b0 [9.508790] x23: ffc0117373f0 x22: 0005 [9.514101] x21: ff81dd085900 x20: ffc01172b098 [9.515822] ata1: SATA link down (SStatus 0 SControl 300) [9.519426] x19: ff81dd085100 x18: 0030 [9.530122] x17: x16: [9.535453] x15: x14: 038f [9.540777] x13: 0003 x12: 0040 [9.546105] x11: ff81eb80 x10: 0ae0 [9.551417] x9 : ffc0106fea24 x8 : ff81de83e6c0 [9.556728] x7 : 0018 x6 : 03c3 [9.562064] x5 : 5660 x4 : [9.567392] x3 : ffc01172b388 x2 : ff81de83db80 [9.572702] x1 : x0 : 0001 [9.578034] Call trace: [9.580494] _regulator_put.part.0+0x16c/0x174 [9.584940] regulator_put+0x44/0x60 [9.588522] devm_regulator_release+0x20/0x2c [9.592885] release_nodes+0x1c8/0x2c0 [9.596636] devres_release_all+0x44/0x6c [9.600649] really_probe+0x1ec/0x504 [9.604316] driver_probe_device+0x100/0x170 [9.608589] device_driver_attach+0xcc/0xd4 [9.612774] __driver_attach+0xb0/0x17c [9.616614] bus_for_each_dev+0x7c/0xd4 [9.620450] driver_attach+0x30/0x3c [9.624027] bus_add_driver+0x154/0x250 [9.627867] driver_register+0x84/0x140 [9.631719] __platform_register_drivers+0xa0/0x180 [9.636660] host1x_drm_init+0x60/0x1000 [tegra_drm] [9.641629] do_one_initcall+0x54/0x2d0 [9.645490] do_init_module+0x68/0x29c [9.649244] load_module+0x2178/0x26c0 [9.652997] __do_sys_finit_module+0xb0/0x120 [9.657356] __arm64_sys_finit_module+0x2c/0x40 [9.661902] el0_svc_common.constprop.0+0x80/0x240 [9.666701] do_el0_svc+0x30/0xa0 [9.670022] el0_svc+0x18/0x50 [9.673081] el0_sync_handler+0x90/0x318 [9.677006] el0_sync+0x158/0x180 [9.680324] ---[ end trace 90f6c89d62d85ff6 ]--- Instead, let's register a callback that will disable the regulators on teardown. This allows for the removal of the .remove callbacks, which are not needed anymore. Signed-off-by: Marc Zyngier --- drivers/gpu/drm/tegra/sor.c | 59 +++-- 1 file changed, 24 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c index 45b5258c77a2..39e6b32f6c10 100644 --- a/drivers/gpu/drm/tegra/sor.c +++ b/drivers/gpu/drm/tegra/sor.c @@ -397,7 +397,6 @@ struct tegra_sor; struct tegra_sor_ops { const char *name; int (*probe)(struct tegra_sor *sor); - int (*remove)(struct tegra_sor *sor); void (*audio_enable)(struct tegra_sor *sor); void (*audio_disable)(struct tegra_sor *sor); }; @@ -2942,6 +2941,24 @@ static const struct drm_encoder_helper_funcs tegra_sor_dp_helpers = { .atomic_check = tegra_sor_encoder_atomic_check, }; +static void tegra_sor_disable_regulator(void *data) +{ + struct regulator *reg = data; + + regulator_disable(reg); +} + +static int tegra_sor_enable_regulator(struct tegra_sor *sor, struct regulator *reg) +{ + int err; + + err = regulator_enable(reg); + if (err) + return err; + + return devm_add_action_or_reset(sor->dev, tegra_sor_disable_regulator, reg); +} + static int tegra_sor_hdmi_probe(struct tegra_sor *sor) { int err; @@ -2953,7 +2970,7 @@ static int tegra_sor_hdmi_probe(struct tegra_sor *sor) return PTR_ERR(sor->avdd_io_supply); } - err = regulator_enable(sor->avdd_
Re: [PATCH v2 2/2] irqchip/ti-sci-inta: Add support for unmapped event handling
On 2020-10-09 09:58, Peter Ujfalusi wrote: Marc, [...] The design of irqchip/irq-ti-sci-inta.c, soc/ti/ti_sci_inta_msi.c and irqchip/irq-ti-sci-intr.c created to handle the interrupt needs present in K3 devices with NAVSS. DMSS of newer K3 devices extends and simplifies the NAVSS components and a big part of that change was done with the INTA and DMAs. System Firmware also changed to adopt to these changes. As an example, let's assume that we want an interrupt from a ring. The high level of the events in this case are: NAVSS: 1.1 ring generates an internal signal (up or down) 1.2 the ringacc will send a (mapped) Global Event to INTA 1.3 When INTA receives the global event, it will signal it's outgoing VINT to INTR 1.4 INTR will trigger a host interrupt. DMSS 1.1 ring generates an internal signal (up or down) 1.2 the DMA (ring is now part of the DMA) will send an unmapped event to INTA 1.3 When INTA receives the unmapped event, it will send a (mapped) Global Event to itself 1.4 When INTA receives the global event, it will signal it's outgoing VINT to INTR 1.5 INTR will trigger a host interrupt. The API from sysfw is the same to configure the global events and VINT, but we need to use the INTA's tisci device identification number to let sysfw know that the Global event number can be programmed into INTA's unmapped event steering register. The DMA no longer have this register, it sends unmapped event to INTA. The unmapped event number is fixed per sources, they will arrive at the specific unmapped event configuration register of INTA. INTA itself does not know which source are allocated to be used by Linux, the allocation is for the DMA resources and only the DMA driver knows which channels, rings, flows can be used and can ask the INTA MSI domain to create interrupts for those. By handling the ti,unmapped-event-sources the INTA driver can make decision on the correct tisci dev_id to be used for the sysfw API to where the global event must be configured and the client drivers does not need to know how things are working under the hood. There are components in DMSS which use is exactly how they worked within NAVSS, they are not using unmapped events. Ringacc comes to mind first. I can add a comment block to explain the nature of unmapped events and the reason why we need to do what we do. Would this be acceptable? That'd be useful, as long as it is shorter than the above. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] arm64: KVM: marking pages as XN in Stage-2 does not care about CTR_EL0.DIC
Hi Li, On 2020-10-12 02:08, l00484210 wrote: From: MingWang Li When testing the ARMv8.2-TTS2UXN feature, setting bits of XN is unavailable. Because the control bit CTR_EL0.DIC is set by default on system. But when CTR_EL0.DIC is set, software does not need to flush icache actively, instead of clearing XN bits.The patch, the commit id of which is 6ae4b6e0578886eb36cedbf99f04031d93f9e315, has implemented the function of CTR_EL0.DIC. Signed-off-by: MingWang Li Signed-off-by: Henglong Fan --- arch/arm64/include/asm/pgtable-prot.h | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 4d867c6446c4..5feb94882bf7 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -79,17 +79,7 @@ extern bool arm64_use_ng_mappings; __val; \ }) -#define PAGE_S2_XN \ - ({ \ - u64 __val; \ - if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) \ - __val = 0; \ - else\ - __val = PTE_S2_XN; \ - __val; \ - }) - -#define PAGE_S2__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PAGE_S2_XN) +#define PAGE_S2__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PTE_S2_XN) #define PAGE_S2_DEVICE __pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN) #define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN) I don't understand what you are trying to achieve here. This whole point of not setting XN in the page tables when DIC is present is to avoid a pointless permission fault at run time. At you noticed above, no icache invalidation is necessary. So why would you ever want to take a fault on exec the first place? M. -- Jazz is not dead. It just smells funny...
[tip: irq/core] arm: Move ipi_teardown() to a CONFIG_HOTPLUG_CPU section
The following commit has been merged into the irq/core branch of tip: Commit-ID: ac15a54e03d13686d2fc016a88311801b0734046 Gitweb: https://git.kernel.org/tip/ac15a54e03d13686d2fc016a88311801b0734046 Author:Marc Zyngier AuthorDate:Fri, 18 Sep 2020 17:19:46 +01:00 Committer: Marc Zyngier CommitterDate: Fri, 18 Sep 2020 17:40:48 +01:00 arm: Move ipi_teardown() to a CONFIG_HOTPLUG_CPU section ipi_teardown() is only used when CONFIG_HOTPLUG_CPU is enabled. Move the function to a location guarded by this config option. Signed-off-by: Marc Zyngier --- arch/arm/kernel/smp.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 00327fa..8425da5 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -85,7 +85,6 @@ static int nr_ipi __read_mostly = NR_IPI; static struct irq_desc *ipi_desc[MAX_IPI] __read_mostly; static void ipi_setup(int cpu); -static void ipi_teardown(int cpu); static DECLARE_COMPLETION(cpu_running); @@ -236,6 +235,17 @@ int platform_can_hotplug_cpu(unsigned int cpu) return cpu != 0; } +static void ipi_teardown(int cpu) +{ + int i; + + if (WARN_ON_ONCE(!ipi_irq_base)) + return; + + for (i = 0; i < nr_ipi; i++) + disable_percpu_irq(ipi_irq_base + i); +} + /* * __cpu_disable runs on the processor to be shutdown. */ @@ -707,17 +717,6 @@ static void ipi_setup(int cpu) enable_percpu_irq(ipi_irq_base + i, 0); } -static void ipi_teardown(int cpu) -{ - int i; - - if (WARN_ON_ONCE(!ipi_irq_base)) - return; - - for (i = 0; i < nr_ipi; i++) - disable_percpu_irq(ipi_irq_base + i); -} - void __init set_smp_ipi_range(int ipi_base, int n) { int i;
[tip: irq/core] gpio: tegra186: Allow optional irq parent callbacks
The following commit has been merged into the irq/core branch of tip: Commit-ID: 986ec63d4482292570b579ac98b151acf8bdd1de Gitweb: https://git.kernel.org/tip/986ec63d4482292570b579ac98b151acf8bdd1de Author:Marc Zyngier AuthorDate:Mon, 05 Oct 2020 10:27:27 +01:00 Committer: Marc Zyngier CommitterDate: Sat, 10 Oct 2020 12:12:10 +01:00 gpio: tegra186: Allow optional irq parent callbacks Make the tegra186 GPIO driver resistent to variable depth interrupt hierarchy, which we are about to introduce. No functionnal change yet. Signed-off-by: Marc Zyngier --- drivers/gpio/gpio-tegra186.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpio-tegra186.c b/drivers/gpio/gpio-tegra186.c index 178e912..9500074 100644 --- a/drivers/gpio/gpio-tegra186.c +++ b/drivers/gpio/gpio-tegra186.c @@ -430,7 +430,18 @@ static int tegra186_irq_set_type(struct irq_data *data, unsigned int type) else irq_set_handler_locked(data, handle_edge_irq); - return irq_chip_set_type_parent(data, type); + if (data->parent_data) + return irq_chip_set_type_parent(data, type); + + return 0; +} + +static int tegra186_irq_set_wake(struct irq_data *data, unsigned int on) +{ + if (data->parent_data) + return irq_chip_set_wake_parent(data, on); + + return 0; } static void tegra186_gpio_irq(struct irq_desc *desc) @@ -678,7 +689,7 @@ static int tegra186_gpio_probe(struct platform_device *pdev) gpio->intc.irq_mask = tegra186_irq_mask; gpio->intc.irq_unmask = tegra186_irq_unmask; gpio->intc.irq_set_type = tegra186_irq_set_type; - gpio->intc.irq_set_wake = irq_chip_set_wake_parent; + gpio->intc.irq_set_wake = tegra186_irq_set_wake; irq = >gpio.irq; irq->chip = >intc;
[tip: irq/core] genirq/irqdomain: Allow partial trimming of irq_data hierarchy
The following commit has been merged into the irq/core branch of tip: Commit-ID: 55567976629e58fde28fb70612ca73228271eef2 Gitweb: https://git.kernel.org/tip/55567976629e58fde28fb70612ca73228271eef2 Author:Marc Zyngier AuthorDate:Tue, 06 Oct 2020 10:10:20 +01:00 Committer: Marc Zyngier CommitterDate: Sat, 10 Oct 2020 12:12:10 +01:00 genirq/irqdomain: Allow partial trimming of irq_data hierarchy It appears that some HW is ugly enough that not all the interrupts connected to a particular interrupt controller end up with the same hierarchy depth (some of them are terminated early). This leaves the irqchip hacker with only two choices, both equally bad: - create discrete domain chains, one for each "hierarchy depth", which is very hard to maintain - create fake hierarchy levels for the shallow paths, leading to all kind of problems (what are the safe hwirq values for these fake levels?) Implement the ability to cut short a single interrupt hierarchy from a level marked as being disconnected by using the new irq_domain_disconnect_hierarchy() helper. The irqdomain allocation code will then perform the trimming Signed-off-by: Marc Zyngier --- include/linux/irqdomain.h | 3 +- kernel/irq/irqdomain.c| 99 +++-- 2 files changed, 98 insertions(+), 4 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index b37350c..a52b095 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -509,6 +509,9 @@ extern void irq_domain_free_irqs_parent(struct irq_domain *domain, unsigned int irq_base, unsigned int nr_irqs); +extern int irq_domain_disconnect_hierarchy(struct irq_domain *domain, + unsigned int virq); + static inline bool irq_domain_is_hierarchy(struct irq_domain *domain) { return domain->flags & IRQ_DOMAIN_FLAG_HIERARCHY; diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 76cd7eb..cf8b374 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -1136,6 +1136,17 @@ static struct irq_data *irq_domain_insert_irq_data(struct irq_domain *domain, return irq_data; } +static void __irq_domain_free_hierarchy(struct irq_data *irq_data) +{ + struct irq_data *tmp; + + while (irq_data) { + tmp = irq_data; + irq_data = irq_data->parent_data; + kfree(tmp); + } +} + static void irq_domain_free_irq_data(unsigned int virq, unsigned int nr_irqs) { struct irq_data *irq_data, *tmp; @@ -1147,12 +1158,83 @@ static void irq_domain_free_irq_data(unsigned int virq, unsigned int nr_irqs) irq_data->parent_data = NULL; irq_data->domain = NULL; - while (tmp) { - irq_data = tmp; - tmp = tmp->parent_data; - kfree(irq_data); + __irq_domain_free_hierarchy(tmp); + } +} + +/** + * irq_domain_disconnect_hierarchy - Mark the first unused level of a hierarchy + * @domain:IRQ domain from which the hierarchy is to be disconnected + * @virq: IRQ number where the hierarchy is to be trimmed + * + * Marks the @virq level belonging to @domain as disconnected. + * Returns -EINVAL if @virq doesn't have a valid irq_data pointing + * to @domain. + * + * Its only use is to be able to trim levels of hierarchy that do not + * have any real meaning for this interrupt, and that the driver marks + * as such from its .alloc() callback. + */ +int irq_domain_disconnect_hierarchy(struct irq_domain *domain, + unsigned int virq) +{ + struct irq_data *irqd; + + irqd = irq_domain_get_irq_data(domain, virq); + if (!irqd) + return -EINVAL; + + irqd->chip = ERR_PTR(-ENOTCONN); + return 0; +} + +static int irq_domain_trim_hierarchy(unsigned int virq) +{ + struct irq_data *tail, *irqd, *irq_data; + + irq_data = irq_get_irq_data(virq); + tail = NULL; + + /* The first entry must have a valid irqchip */ + if (!irq_data->chip || IS_ERR(irq_data->chip)) + return -EINVAL; + + /* +* Validate that the irq_data chain is sane in the presence of +* a hierarchy trimming marker. +*/ + for (irqd = irq_data->parent_data; irqd; irq_data = irqd, irqd = irqd->parent_data) { + /* Can't have a valid irqchip after a trim marker */ + if (irqd->chip && tail) + return -EINVAL; + + /* Can't have an empty irqchip before a trim marker */ + if (!irqd->chip && !tail) + return -EINVAL; + + if (IS_ERR(irqd->chip)) { + /* Only -ENOTCONN is a valid trim m
[tip: irq/core] irqchip/bcm2836: Configure mailbox interrupts as standard interrupts
The following commit has been merged into the irq/core branch of tip: Commit-ID: 0809ae724904c3c5dbdddf4169d48aac9c6fcdc8 Gitweb: https://git.kernel.org/tip/0809ae724904c3c5dbdddf4169d48aac9c6fcdc8 Author:Marc Zyngier AuthorDate:Tue, 05 May 2020 12:59:04 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 17 Sep 2020 16:37:27 +01:00 irqchip/bcm2836: Configure mailbox interrupts as standard interrupts In order to switch the bcm2836 driver to privide standard interrupts for IPIs, it first needs to stop lying about the way things work. The mailbox interrupt is actually a multiplexer, with enough bits to store 32 pending interrupts per CPU. So let's turn it into a chained irqchip. Once this is done, we can instanciate the corresponding IPIs, and pass them to the architecture code. Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-bcm2836.c | 151 +++-- 1 file changed, 125 insertions(+), 26 deletions(-) diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c index 2038693..85df6dd 100644 --- a/drivers/irqchip/irq-bcm2836.c +++ b/drivers/irqchip/irq-bcm2836.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -89,12 +90,24 @@ static struct irq_chip bcm2836_arm_irqchip_gpu = { .irq_unmask = bcm2836_arm_irqchip_unmask_gpu_irq, }; +static void bcm2836_arm_irqchip_dummy_op(struct irq_data *d) +{ +} + +static struct irq_chip bcm2836_arm_irqchip_dummy = { + .name = "bcm2836-dummy", + .irq_eoi= bcm2836_arm_irqchip_dummy_op, +}; + static int bcm2836_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) { struct irq_chip *chip; switch (hw) { + case LOCAL_IRQ_MAILBOX0: + chip = _arm_irqchip_dummy; + break; case LOCAL_IRQ_CNTPSIRQ: case LOCAL_IRQ_CNTPNSIRQ: case LOCAL_IRQ_CNTHPIRQ: @@ -127,17 +140,7 @@ __exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs) u32 stat; stat = readl_relaxed(intc.base + LOCAL_IRQ_PENDING0 + 4 * cpu); - if (stat & BIT(LOCAL_IRQ_MAILBOX0)) { -#ifdef CONFIG_SMP - void __iomem *mailbox0 = (intc.base + - LOCAL_MAILBOX0_CLR0 + 16 * cpu); - u32 mbox_val = readl(mailbox0); - u32 ipi = ffs(mbox_val) - 1; - - writel(1 << ipi, mailbox0); - handle_IPI(ipi, regs); -#endif - } else if (stat) { + if (stat) { u32 hwirq = ffs(stat) - 1; handle_domain_irq(intc.domain, hwirq, regs); @@ -145,8 +148,35 @@ __exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs) } #ifdef CONFIG_SMP -static void bcm2836_arm_irqchip_send_ipi(const struct cpumask *mask, -unsigned int ipi) +static struct irq_domain *ipi_domain; + +static void bcm2836_arm_irqchip_handle_ipi(struct irq_desc *desc) +{ + struct irq_chip *chip = irq_desc_get_chip(desc); + int cpu = smp_processor_id(); + u32 mbox_val; + + chained_irq_enter(chip, desc); + + mbox_val = readl_relaxed(intc.base + LOCAL_MAILBOX0_CLR0 + 16 * cpu); + if (mbox_val) { + int hwirq = ffs(mbox_val) - 1; + generic_handle_irq(irq_find_mapping(ipi_domain, hwirq)); + } + + chained_irq_exit(chip, desc); +} + +static void bcm2836_arm_irqchip_ipi_eoi(struct irq_data *d) +{ + int cpu = smp_processor_id(); + + writel_relaxed(BIT(d->hwirq), + intc.base + LOCAL_MAILBOX0_CLR0 + 16 * cpu); +} + +static void bcm2836_arm_irqchip_ipi_send_mask(struct irq_data *d, + const struct cpumask *mask) { int cpu; void __iomem *mailbox0_base = intc.base + LOCAL_MAILBOX0_SET0; @@ -157,11 +187,45 @@ static void bcm2836_arm_irqchip_send_ipi(const struct cpumask *mask, */ smp_wmb(); - for_each_cpu(cpu, mask) { - writel(1 << ipi, mailbox0_base + 16 * cpu); + for_each_cpu(cpu, mask) + writel_relaxed(BIT(d->hwirq), mailbox0_base + 16 * cpu); +} + +static struct irq_chip bcm2836_arm_irqchip_ipi = { + .name = "IPI", + .irq_eoi= bcm2836_arm_irqchip_ipi_eoi, + .ipi_send_mask = bcm2836_arm_irqchip_ipi_send_mask, +}; + +static int bcm2836_arm_irqchip_ipi_alloc(struct irq_domain *d, +unsigned int virq, +unsigned int nr_irqs, void *args) +{ + int i; + + for (i = 0; i < nr_irqs; i++) { + irq_set_percpu_devid(virq + i); + irq_domain_set_info(d, virq + i, i, _arm_irqchip_ipi, + d->host_data, +
[tip: irq/core] irqchip/gic-v3: Configure SGIs as standard interrupts
The following commit has been merged into the irq/core branch of tip: Commit-ID: 64b499d8df40dadb1818ad9f74c4546951b37a8f Gitweb: https://git.kernel.org/tip/64b499d8df40dadb1818ad9f74c4546951b37a8f Author:Marc Zyngier AuthorDate:Sat, 25 Apr 2020 15:24:01 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 17 Sep 2020 16:37:26 +01:00 irqchip/gic-v3: Configure SGIs as standard interrupts Change the way we deal with GICv3 SGIs by turning them into proper IRQs, and calling into the arch code to register the interrupt range instead of a callback. Reviewed-by: Valentin Schneider Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-gic-v3.c | 94 ++- 1 file changed, 51 insertions(+), 43 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index f7c99a3..84ceb63 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -36,6 +36,8 @@ #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996(1ULL << 0) #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539 (1ULL << 1) +#define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1) + struct redist_region { void __iomem*redist_base; phys_addr_t phys_base; @@ -383,7 +385,7 @@ static int gic_irq_set_irqchip_state(struct irq_data *d, { u32 reg; - if (d->hwirq >= 8192) /* PPI/SPI only */ + if (d->hwirq >= 8192) /* SGI/PPI/SPI only */ return -EINVAL; switch (which) { @@ -550,12 +552,12 @@ static int gic_set_type(struct irq_data *d, unsigned int type) u32 offset, index; int ret; - /* Interrupt configuration for SGIs can't be changed */ - if (irq < 16) - return -EINVAL; - range = get_intid_range(d); + /* Interrupt configuration for SGIs can't be changed */ + if (range == SGI_RANGE) + return type != IRQ_TYPE_EDGE_RISING ? -EINVAL : 0; + /* SPIs have restrictions on the supported types */ if ((range == SPI_RANGE || range == ESPI_RANGE) && type != IRQ_TYPE_LEVEL_HIGH && type != IRQ_TYPE_EDGE_RISING) @@ -583,6 +585,9 @@ static int gic_set_type(struct irq_data *d, unsigned int type) static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) { + if (get_intid_range(d) == SGI_RANGE) + return -EINVAL; + if (vcpu) irqd_set_forwarded_to_vcpu(d); else @@ -657,38 +662,14 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs if ((irqnr >= 1020 && irqnr <= 1023)) return; - /* Treat anything but SGIs in a uniform way */ - if (likely(irqnr > 15)) { - int err; - - if (static_branch_likely(_deactivate_key)) - gic_write_eoir(irqnr); - else - isb(); - - err = handle_domain_irq(gic_data.domain, irqnr, regs); - if (err) { - WARN_ONCE(true, "Unexpected interrupt received!\n"); - gic_deactivate_unhandled(irqnr); - } - return; - } - if (irqnr < 16) { + if (static_branch_likely(_deactivate_key)) gic_write_eoir(irqnr); - if (static_branch_likely(_deactivate_key)) - gic_write_dir(irqnr); -#ifdef CONFIG_SMP - /* -* Unlike GICv2, we don't need an smp_rmb() here. -* The control dependency from gic_read_iar to -* the ISB in gic_write_eoir is enough to ensure -* that any shared data read by handle_IPI will -* be read after the ACK. -*/ - handle_IPI(irqnr, regs); -#else - WARN_ONCE(true, "Unexpected SGI received!\n"); -#endif + else + isb(); + + if (handle_domain_irq(gic_data.domain, irqnr, regs)) { + WARN_ONCE(true, "Unexpected interrupt received!\n"); + gic_deactivate_unhandled(irqnr); } } @@ -1136,11 +1117,11 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq) gic_write_sgi1r(val); } -static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq) +static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) { int cpu; - if (WARN_ON(irq >= 16)) + if (WARN_ON(d->hwirq >= 16)) return; /* @@ -1154,7 +1135,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq) u16 tlist; tlist = gic_compute_target_list(, mask, cluster_id); - gic_send_sgi(cluster_id, tlist, irq); + gic_send_sgi(cluster_id, tlist, d->hwirq); }
[tip: irq/core] irqchip/gic: Cleanup Franken-GIC handling
The following commit has been merged into the irq/core branch of tip: Commit-ID: 8594c3b85171b6f68e34e07b533ec2f1bf7fb065 Gitweb: https://git.kernel.org/tip/8594c3b85171b6f68e34e07b533ec2f1bf7fb065 Author:Marc Zyngier AuthorDate:Tue, 15 Sep 2020 14:03:51 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 17 Sep 2020 16:37:29 +01:00 irqchip/gic: Cleanup Franken-GIC handling Introduce a static key identifying Samsung's unique creation, allowing to replace the indirect call to compute the base addresses with a simple test on the static key. Faster, cheaper, negative diffstat. Tested-by: Marek Szyprowski Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-gic.c | 41 +++--- 1 file changed, 12 insertions(+), 29 deletions(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 66671e1..30edcca 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -83,9 +83,6 @@ struct gic_chip_data { #endif struct irq_domain *domain; unsigned int gic_irqs; -#ifdef CONFIG_GIC_NON_BANKED - void __iomem *(*get_base)(union gic_base *); -#endif }; #ifdef CONFIG_BL_SWITCHER @@ -127,35 +124,27 @@ static struct gic_kvm_info gic_v2_kvm_info; static DEFINE_PER_CPU(u32, sgi_intid); #ifdef CONFIG_GIC_NON_BANKED -static void __iomem *gic_get_percpu_base(union gic_base *base) -{ - return raw_cpu_read(*base->percpu_base); -} +static DEFINE_STATIC_KEY_FALSE(frankengic_key); -static void __iomem *gic_get_common_base(union gic_base *base) +static void enable_frankengic(void) { - return base->common_base; + static_branch_enable(_key); } -static inline void __iomem *gic_data_dist_base(struct gic_chip_data *data) +static inline void __iomem *__get_base(union gic_base *base) { - return data->get_base(>dist_base); -} + if (static_branch_unlikely(_key)) + return raw_cpu_read(*base->percpu_base); -static inline void __iomem *gic_data_cpu_base(struct gic_chip_data *data) -{ - return data->get_base(>cpu_base); + return base->common_base; } -static inline void gic_set_base_accessor(struct gic_chip_data *data, -void __iomem *(*f)(union gic_base *)) -{ - data->get_base = f; -} +#define gic_data_dist_base(d) __get_base(&(d)->dist_base) +#define gic_data_cpu_base(d) __get_base(&(d)->cpu_base) #else #define gic_data_dist_base(d) ((d)->dist_base.common_base) #define gic_data_cpu_base(d) ((d)->cpu_base.common_base) -#define gic_set_base_accessor(d, f) +#define enable_frankengic()do { } while(0) #endif static inline void __iomem *gic_dist_base(struct irq_data *d) @@ -307,7 +296,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type) /* Interrupt configuration for SGIs can't be changed */ if (gicirq < 16) - return type == IRQ_TYPE_EDGE_RISING ? 0 : -EINVAL; + return type != IRQ_TYPE_EDGE_RISING ? -EINVAL : 0; /* SPIs have restrictions on the supported types */ if (gicirq >= 32 && type != IRQ_TYPE_LEVEL_HIGH && @@ -720,11 +709,6 @@ static int gic_notifier(struct notifier_block *self, unsigned long cmd,void *v) int i; for (i = 0; i < CONFIG_ARM_GIC_MAX_NR; i++) { -#ifdef CONFIG_GIC_NON_BANKED - /* Skip over unused GICs */ - if (!gic_data[i].get_base) - continue; -#endif switch (cmd) { case CPU_PM_ENTER: gic_cpu_save(_data[i]); @@ -1165,7 +1149,7 @@ static int gic_init_bases(struct gic_chip_data *gic, gic->raw_cpu_base + offset; } - gic_set_base_accessor(gic, gic_get_percpu_base); + enable_frankengic(); } else { /* Normal, sane GIC... */ WARN(gic->percpu_offset, @@ -1173,7 +1157,6 @@ static int gic_init_bases(struct gic_chip_data *gic, gic->percpu_offset); gic->dist_base.common_base = gic->raw_dist_base; gic->cpu_base.common_base = gic->raw_cpu_base; - gic_set_base_accessor(gic, gic_get_common_base); } /*
[tip: irq/core] arm64: Kill __smp_cross_call and co
The following commit has been merged into the irq/core branch of tip: Commit-ID: 5cebfd2d47c214f69d918e3d34ad183c061eddb2 Gitweb: https://git.kernel.org/tip/5cebfd2d47c214f69d918e3d34ad183c061eddb2 Author:Marc Zyngier AuthorDate:Sat, 09 May 2020 14:00:23 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 17 Sep 2020 16:37:28 +01:00 arm64: Kill __smp_cross_call and co The old IPI registration interface is now unused on arm64, so let's get rid of it. Reviewed-by: Valentin Schneider Acked-by: Catalin Marinas Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/irq_work.h | 4 +--- arch/arm64/include/asm/smp.h | 12 +- arch/arm64/kernel/smp.c | 38 +- 3 files changed, 8 insertions(+), 46 deletions(-) diff --git a/arch/arm64/include/asm/irq_work.h b/arch/arm64/include/asm/irq_work.h index 8a1ef19..a102028 100644 --- a/arch/arm64/include/asm/irq_work.h +++ b/arch/arm64/include/asm/irq_work.h @@ -2,11 +2,9 @@ #ifndef __ASM_IRQ_WORK_H #define __ASM_IRQ_WORK_H -#include - static inline bool arch_irq_work_has_interrupt(void) { - return !!__smp_cross_call; + return true; } #endif /* __ASM_IRQ_WORK_H */ diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h index 57c5db1..c298ad0 100644 --- a/arch/arm64/include/asm/smp.h +++ b/arch/arm64/include/asm/smp.h @@ -61,24 +61,12 @@ struct seq_file; extern void show_ipi_list(struct seq_file *p, int prec); /* - * Called from C code, this handles an IPI. - */ -extern void handle_IPI(int ipinr, struct pt_regs *regs); - -/* * Discover the set of possible CPUs and determine their * SMP operations. */ extern void smp_init_cpus(void); /* - * Provide a function to raise an IPI cross call on CPUs in callmap. - */ -extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int)); - -extern void (*__smp_cross_call)(const struct cpumask *, unsigned int); - -/* * Register IPI interrupts with the arch SMP code */ extern void set_smp_ipi_range(int ipi_base, int nr_ipi); diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 00c9db1..58fb155 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -782,13 +782,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus) } } -void (*__smp_cross_call)(const struct cpumask *, unsigned int); - -void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int)) -{ - __smp_cross_call = fn; -} - static const char *ipi_types[NR_IPI] __tracepoint_string = { #define S(x,s) [x] = s S(IPI_RESCHEDULE, "Rescheduling interrupts"), @@ -800,11 +793,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = { S(IPI_WAKEUP, "CPU wake-up interrupts"), }; -static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) -{ - trace_ipi_raise(target, ipi_types[ipinr]); - __smp_cross_call(target, ipinr); -} +static void smp_cross_call(const struct cpumask *target, unsigned int ipinr); void show_ipi_list(struct seq_file *p, int prec) { @@ -851,8 +840,7 @@ void arch_send_wakeup_ipi_mask(const struct cpumask *mask) #ifdef CONFIG_IRQ_WORK void arch_irq_work_raise(void) { - if (__smp_cross_call) - smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK); + smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK); } #endif @@ -959,34 +947,23 @@ static void do_handle_IPI(int ipinr) trace_ipi_exit_rcuidle(ipi_types[ipinr]); } -/* Legacy version, should go away once all irqchips have been converted */ -void handle_IPI(int ipinr, struct pt_regs *regs) -{ - struct pt_regs *old_regs = set_irq_regs(regs); - - irq_enter(); - do_handle_IPI(ipinr); - irq_exit(); - - set_irq_regs(old_regs); -} - static irqreturn_t ipi_handler(int irq, void *data) { do_handle_IPI(irq - ipi_irq_base); return IRQ_HANDLED; } -static void ipi_send(const struct cpumask *target, unsigned int ipi) +static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) { - __ipi_send_mask(ipi_desc[ipi], target); + trace_ipi_raise(target, ipi_types[ipinr]); + __ipi_send_mask(ipi_desc[ipinr], target); } static void ipi_setup(int cpu) { int i; - if (!ipi_irq_base) + if (WARN_ON_ONCE(!ipi_irq_base)) return; for (i = 0; i < nr_ipi; i++) @@ -997,7 +974,7 @@ static void ipi_teardown(int cpu) { int i; - if (!ipi_irq_base) + if (WARN_ON_ONCE(!ipi_irq_base)) return; for (i = 0; i < nr_ipi; i++) @@ -1023,7 +1000,6 @@ void __init set_smp_ipi_range(int ipi_base, int n) } ipi_irq_base = ipi_base; - __smp_cross_call = ipi_send; /* Setup the boot CPU immediately */ ipi_setup(smp_processor_id());
[tip: irq/core] ARM: Kill __smp_cross_call and co
The following commit has been merged into the irq/core branch of tip: Commit-ID: 8aa837cb7a032884c787b15de81f7d9de8af0869 Gitweb: https://git.kernel.org/tip/8aa837cb7a032884c787b15de81f7d9de8af0869 Author:Marc Zyngier AuthorDate:Mon, 22 Jun 2020 22:15:54 +01:00 Committer: Marc Zyngier CommitterDate: Thu, 17 Sep 2020 16:37:28 +01:00 ARM: Kill __smp_cross_call and co The old IPI registration interface is now unused on arm, so let's get rid of it. Reviewed-by: Valentin Schneider Signed-off-by: Marc Zyngier --- arch/arm/include/asm/smp.h | 6 -- arch/arm/kernel/smp.c | 26 +++--- 2 files changed, 7 insertions(+), 25 deletions(-) diff --git a/arch/arm/include/asm/smp.h b/arch/arm/include/asm/smp.h index 0e29730..0ca55a6 100644 --- a/arch/arm/include/asm/smp.h +++ b/arch/arm/include/asm/smp.h @@ -39,12 +39,6 @@ void handle_IPI(int ipinr, struct pt_regs *regs); */ extern void smp_init_cpus(void); - -/* - * Provide a function to raise an IPI cross call on CPUs in callmap. - */ -extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int)); - /* * Register IPI interrupts with the arch SMP code */ diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index f21f784..d51e649 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -511,14 +511,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus) } } -static void (*__smp_cross_call)(const struct cpumask *, unsigned int); - -void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int)) -{ - if (!__smp_cross_call) - __smp_cross_call = fn; -} - static const char *ipi_types[NR_IPI] __tracepoint_string = { #define S(x,s) [x] = s S(IPI_WAKEUP, "CPU wakeup interrupts"), @@ -530,11 +522,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = { S(IPI_COMPLETION, "completion interrupts"), }; -static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) -{ - trace_ipi_raise_rcuidle(target, ipi_types[ipinr]); - __smp_cross_call(target, ipinr); -} +static void smp_cross_call(const struct cpumask *target, unsigned int ipinr); void show_ipi_list(struct seq_file *p, int prec) { @@ -713,16 +701,17 @@ static irqreturn_t ipi_handler(int irq, void *data) return IRQ_HANDLED; } -static void ipi_send(const struct cpumask *target, unsigned int ipi) +static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) { - __ipi_send_mask(ipi_desc[ipi], target); + trace_ipi_raise_rcuidle(target, ipi_types[ipinr]); + __ipi_send_mask(ipi_desc[ipinr], target); } static void ipi_setup(int cpu) { int i; - if (!ipi_irq_base) + if (WARN_ON_ONCE(!ipi_irq_base)) return; for (i = 0; i < nr_ipi; i++) @@ -733,7 +722,7 @@ static void ipi_teardown(int cpu) { int i; - if (!ipi_irq_base) + if (WARN_ON_ONCE(!ipi_irq_base)) return; for (i = 0; i < nr_ipi; i++) @@ -759,7 +748,6 @@ void __init set_smp_ipi_range(int ipi_base, int n) } ipi_irq_base = ipi_base; - set_smp_cross_call(ipi_send); /* Setup the boot CPU immediately */ ipi_setup(smp_processor_id()); @@ -872,7 +860,7 @@ core_initcall(register_cpufreq_notifier); static void raise_nmi(cpumask_t *mask) { - __smp_cross_call(mask, IPI_CPU_BACKTRACE); + __ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask); } void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)