Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED
On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote: > > From: Baolu Lu > > > > > --- a/include/linux/dmar.h > > > +++ b/include/linux/dmar.h > > > @@ -19,7 +19,7 @@ > > > struct acpi_dmar_header; > > > > > > #ifdef CONFIG_X86 > > > -# define DMAR_UNITS_SUPPORTEDMAX_IO_APICS > > > +# define DMAR_UNITS_SUPPORTED640 > > > #else > > > # defineDMAR_UNITS_SUPPORTED64 > > > #endif > > ... is it necessary to permanently do 10x increase which wastes memory > on most platforms which won't have such need. I was just looking at that. It mostly adds about 3½ KiB to each struct dmar_domain. I think the only actual static array is the dmar_seq_ids bitmap which grows to 640 *bits* which is fairly negligible, and the main growth is that it adds about 3½ KiB to each struct dmar_domain for the iommu_refcnt[] and iommu_did[] arrays. > Does it make more sense to have a configurable approach similar to > CONFIG_NR_CPUS? or even better can we just replace those static > arrays with dynamic allocation so removing this restriction > completely? Hotplug makes that fun, but I suppose you only need to grow the array in a given struct dmar_domain if you actually add a device to it that's behind a newly added IOMMU. I don't know if the complexity of making it fully dynamic is worth it though. We could make it a config option, and/or a command line option (perhaps automatically derived from CONFIG_NR_CPUS). If it wasn't for hotplug, I think we'd know the right number by the time we actually need it anyway, wouldn't we? Can we have a heuristic for how many DMAR units are likely to be hotplugged? Is it as simple as the ratio of present to not-yet-present CPUs in MADT? > another nit: dmar is intel specific thus CONFIG_X86 is always true. DMAR exists on IA64 too. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled
On 17 March 2021 13:32:35 GMT, Joerg Roedel wrote: >On Wed, Mar 17, 2021 at 11:47:11AM +0000, David Woodhouse wrote: >> If you've already moved the Stoney Ridge check out of the way, >there's >> no real reason why you can't just set >init_state=IOMMU_CMDLINE_DISABLED >> directly from parse_amd_iommu_options(), is there? Then you don't >need >> the condition here at all? > >True, there is even more room for optimization. The amd_iommu_disabled >variable can go away entirely, including its checks in >early_amd_iommu_init(). I will do a patch-set on-top of this for v5.13 >which does more cleanups. If we can get to the point where we don't even need to check amd_iommu_irq_remap in the ...select() function because the IRQ domain is never even registered in the case where the flag ends up false, all the better :) -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled
On Wed, 2021-03-17 at 10:10 +0100, Joerg Roedel wrote: > diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c > index 3280e6f5b720..61dae1800b7f 100644 > --- a/drivers/iommu/amd/init.c > +++ b/drivers/iommu/amd/init.c > @@ -2919,12 +2919,12 @@ static int __init state_next(void) > } > break; > case IOMMU_IVRS_DETECTED: > - ret = early_amd_iommu_init(); > - init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED; > - if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) { > - pr_info("AMD IOMMU disabled\n"); > + if (amd_iommu_disabled) { > init_state = IOMMU_CMDLINE_DISABLED; > ret = -EINVAL; > + } else { > + ret = early_amd_iommu_init(); > + init_state = ret ? IOMMU_INIT_ERROR : > IOMMU_ACPI_FINISHED; > } > break; > case IOMMU_ACPI_FINISHED: > -- If you've already moved the Stoney Ridge check out of the way, there's no real reason why you can't just set init_state=IOMMU_CMDLINE_DISABLED directly from parse_amd_iommu_options(), is there? Then you don't need the condition here at all? smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/amd: Don't initialise remapping irqdomain if IOMMU is disabled
From: David Woodhouse When the IOMMU is disabled, the driver still enumerates and initialises the hardware in order to turn it off. Because IRQ remapping setup is done early, the irqdomain is set up opportunistically. In commit b34f10c2dc59 ("iommu/amd: Stop irq_remapping_select() matching when remapping is disabled") I already make the irq_remapping_select() function check the amd_iommu_irq_setup flag because that might get cleared only after the irqdomain setup is done, when the IVRS is parsed. However, in the case where 'amd_iommu=off' is passed on the command line, the IRQ remapping setup isn't done but the amd_iommu_irq_setup flag is still set by the early IRQ remap init code. Stop it doing that, by bailing out of amd_iommu_prepare() early when it's disabled. This avoids the crash in irq_remapping_select() as it dereferences the NULL amd_iommu_rlookup_table[]: [0.243659] Switched APIC routing to physical x2apic. [0.262206] BUG: kernel NULL pointer dereference, address: 0500 [0.262927] #PF: supervisor read access in kernel mode [0.263390] #PF: error_code(0x) - not-present page [0.263844] PGD 0 P4D 0 [0.264135] Oops: [#1] SMP PTI [0.264460] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3 #831 [0.265069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014 [0.265825] RIP: 0010:irq_remapping_select+0x57/0xb0 [0.266327] Code: 4b 0c 48 3d 30 e0 a7 9e 75 0d eb 35 48 8b 00 48 3d 30 e0 a7 9e 74 2a 0f b6 50 10 39 d1 75 ed 0f b7 40 12 48 8b 15 69 e3 d2 01 <48> 8b 14 c2 48 85 d2 74 0e b8 01 00 00 00 48 3b aa 90 04 00 00 74 [0.268412] RSP: :9e803db0 EFLAGS: 00010246 [0.268919] RAX: 00a0 RBX: 9e803df8 RCX: [0.269550] RDX: RSI: RDI: 98120112fe79 [0.270245] RBP: 9812011c8218 R08: 0001 R09: 000a [0.270922] R10: 000a R11: f000 R12: 9812011c8218 [0.271549] R13: 98120181ed88 R14: R15: [0.272221] FS: () GS:98127dc0() knlGS: [0.272997] CS: 0010 DS: ES: CR0: 80050033 [0.273508] CR2: 0500 CR3: 3081 CR4: 06b0 [0.274178] Call Trace: [0.274416] irq_find_matching_fwspec+0x41/0xc0 [0.274812] mp_irqdomain_create+0x65/0x150 [0.275251] setup_IO_APIC+0x70/0x811 Fixes: a1a785b57242 ("iommu/amd: Implement select() method on remapping irqdomain") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212017 Signed-off-by: David Woodhouse --- drivers/iommu/amd/init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 9126efcbaf2c..07edd837b076 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -2998,6 +2998,9 @@ int __init amd_iommu_prepare(void) { int ret; + if (amd_iommu_disabled) + return -ENODEV; + amd_iommu_irq_remap = true; ret = iommu_go_to_state(IOMMU_ACPI_FINISHED); -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths
On Wed, 2021-01-20 at 18:04 +0100, Greg KH wrote: > I tried applying these to 5.4, 4.19, and 4.14, and they all fail to > build: > > drivers/iommu/dmar.c: In function ‘free_iommu’: > drivers/iommu/dmar.c:1140:35: error: ‘struct intel_iommu’ has no member named > ‘drhd’ > 1140 | if (intel_iommu_enabled && !iommu->drhd->ignored) { > | ^~ > > So if you could provide a working set of patches backported, I will be > glad to queue them up. Thanks. I'm just heckling at Sam's backport of those, and we'll post tested patches as soon as we're done. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths
On Wed, 2021-01-20 at 13:06 +0100, Greg KH wrote: > On Wed, Jan 20, 2021 at 09:42:43AM +0000, David Woodhouse wrote: > > On Thu, 2020-09-24 at 15:08 +0100, David Woodhouse wrote: > > > From: David Woodhouse > > > > > > Instead of bailing out completely, such a unit can still be used for > > > interrupt remapping. > > > > > > Signed-off-by: David Woodhouse > > > > Could we have this for stable too please, along with the trivial > > subsequent fixup. They are: > > > > c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported > > address widths") > > 9def3b1a07c4 ("iommu/vt-d: Don't dereference iommu_device if IOMMU_API is > > not built") > > > > They apply fairly straightforwardly when backported; let me know if you > > want us to send patches. > > What stable kernel(s) do you want this in? The above patches are > already in 5.10. It's a fairly simple bug fix, to still use a given IOMMU for interrupt remapping even if it can't be used for DMA mapping. Those features are somewhat orthogonal, and it was wrong for the kernel to bail out on the IOMMU hardware completely. The interrupt remapping support is what's required for Intel boxes (or VMs) to run with more than 255 CPUs. It should be fairly simple to fix the same bug at least as far back as 4.14. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths
On Thu, 2020-09-24 at 15:08 +0100, David Woodhouse wrote: > From: David Woodhouse > > Instead of bailing out completely, such a unit can still be used for > interrupt remapping. > > Signed-off-by: David Woodhouse Could we have this for stable too please, along with the trivial subsequent fixup. They are: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") 9def3b1a07c4 ("iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built") They apply fairly straightforwardly when backported; let me know if you want us to send patches. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/amd: Stop irq_remapping_select() matching when remapping is disabled
From: David Woodhouse The AMD IOMMU initialisation registers the IRQ remapping domain for each IOMMU before doing the final sanity check that every I/OAPIC is covered. This means that the AMD irq_remapping_select() function gets invoked even when IRQ remapping has been disabled, eventually leading to a NULL pointer dereference in alloc_irq_table(). Unfortunately, the IVRS isn't fully parsed early enough that the sanity check can be done in time to registering the IRQ domain altogether. Doing that would be nice, but is a larger and more error-prone task. The simple fix is just for irq_remapping_select() to refuse to report a match when IRQ remapping has disabled. Link: https://lore.kernel.org/lkml/ed4be9b4-24ac-7128-c522-7ef359e81...@gmx.at Fixes: a1a785b57242 ("iommu/amd: Implement select() method on remapping irqdomain") Reported-by: Johnathan Smithinovic Signed-off-by: David Woodhouse --- drivers/iommu/amd/iommu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 7e2c445a1fae..f0adbc48fd17 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3854,6 +3854,9 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, struct amd_iommu *iommu; int devid = -1; + if (!amd_iommu_irq_remap) + return 0; + if (x86_fwspec_is_ioapic(fwspec)) devid = get_ioapic_devid(fwspec->param[0]); else if (x86_fwspec_is_hpet(fwspec)) -- 2.29.2 smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/amd: Set iommu->int_enabled consistently when interrupts are set up
From: David Woodhouse When I made the INTCAPXT support stop gratuitously pretending to be MSI, I missed the fact that iommu_setup_msi() also sets the ->int_enabled flag. I missed this in the iommu_setup_intcapxt() code path, which means that a resume from suspend will try to allocate the IRQ domains again, accidentally re-enabling interrupts as it does, resulting in much sadness. Lift out the bit which sets iommu->int_enabled into the iommu_init_irq() function which is also where it gets checked. Link: https://lore.kernel.org/r/20210104132250.ge32...@zn.tnic/ Fixes: d1adcfbb520c ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode") Reported-by: Borislav Petkov Signed-off-by: David Woodhouse --- There's a possibility we also need to ensure that the actual MMIO_INTCAPXT_xxx_OFFSET registers are restored too. Unless you actually trigger something to generate faults, you'll never know. I don't see offhand how that was working in the pretend-to-be-MSI case either. drivers/iommu/amd/init.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index f54cd79b43e4..6a1f7048dacc 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1973,8 +1973,6 @@ static int iommu_setup_msi(struct amd_iommu *iommu) return r; } - iommu->int_enabled = true; - return 0; } @@ -2169,6 +2167,7 @@ static int iommu_init_irq(struct amd_iommu *iommu) if (ret) return ret; + iommu->int_enabled = true; enable_faults: iommu_feature_enable(iommu, CONTROL_EVT_INT_EN); -- 2.29.2 smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [EXTERNAL] PROBLEM: commit f36a74b9345a leads to not booting system with AMD 2990WX
On Tue, 2021-01-05 at 00:05 +0100, Johnathan Smithinovic wrote: > commit f36a74b9345a leads to not booting system with AMD 2990WX > > > When trying to boot 5.11-rc2 as usual the messages of the bootloader stay on > my > screen and not much appears to happen (fans run a bit slower than in GRUB, > devices don't seem to get accessed). Without this commit everything seems to > work as usual. > > (In the hope that it is helpful I appended messages mentioning "IO APIC" > from 5.11-rc2 with the mentioned commit reverted (and the kernel parameter > apic=debug added). > I'm sorry that I can't provide more details: I'm unsure how I can gather > more information since my Motherboard (ASUS ROG ZENITH EXTREME) does not have > any documented serial ports and USB devices don't seem to get turned on.) No problem, that was enough to reproduce it in qemu just by simulating the same BIOS bug which causes it to *start* enabling, then abort, the IRQ remapping. Thanks for the report. Does this fix it? diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 7e2c445a1fae..f0adbc48fd17 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3854,6 +3854,9 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, struct amd_iommu *iommu; int devid = -1; + if (!amd_iommu_irq_remap) + return 0; + if (x86_fwspec_is_ioapic(fwspec)) devid = get_ioapic_devid(fwspec->param[0]); else if (x86_fwspec_is_hpet(fwspec)) smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Interrupts enabled after amd_iommu_resume+0x0/0x40
On Tue, 2021-01-05 at 00:23 +0100, Borislav Petkov wrote: > On Mon, Jan 04, 2021 at 02:22:50PM +0100, Borislav Petkov wrote: > > Hi folks, > > > > syscore_resume() doesn't like when the AMD iommu driver enables > > interrupts in its ->resume hook when I resume the box from suspend to > > RAM. > > > > All kinds of warnings get triggered too: > > > > [ 17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:824 > > ktime_get+0x8d/0xa0 > > [ 17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:867 > > ktime_get_with_offset+0xda/0xf0 > > [ 17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:824 > > ktime_get+0x8d/0xa0 > > [ 17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:867 > > ktime_get_with_offset+0xda/0xf0 > > [ 17.386830] WARNING: CPU: 0 PID: 1576 at drivers/base/syscore.c:103 > > syscore_resume+0x12d/0x160 > > > > but when I comment out the > > > >amd_iommu_enable_interrupts() > > > > call in the resume hook, all is fine and quiet and box resumes. > > > > I'll try to bisect later to try to pinpoint it better because I don't > > see what recent change would've caused this. But someone might have a > > better idea so CC the usual suspects. > > Ok, bisected to: > > # first bad commit: [d1adcfbb520c43c10fc22fcdccdd4204e014fb53] iommu/amd: Fix > IOMMU interrupt generation in X2APIC mode > > That patch reverts cleanly ontop of -rc2 and with it reverted, the > machine resumes fine. diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index f54cd79b43e4..6a1f7048dacc 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1973,8 +1973,6 @@ static int iommu_setup_msi(struct amd_iommu *iommu) return r; } - iommu->int_enabled = true; - return 0; } @@ -2169,6 +2167,7 @@ static int iommu_init_irq(struct amd_iommu *iommu) if (ret) return ret; + iommu->int_enabled = true; enable_faults: iommu_feature_enable(iommu, CONTROL_EVT_INT_EN); ? smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/vt-d: Don't read VCCAP register unless it exists
From: David Woodhouse My virtual IOMMU implementation is whining that the guest is reading a register that doesn't exist. Only read the VCCAP_REG if the corresponding capability is set in ECAP_REG to indicate that it actually exists. Fixes: 3375303e8287 ("iommu/vt-d: Add custom allocator for IOASID") Signed-off-by: David Woodhouse Reviewed-by: Liu Yi L Cc: sta...@vger.kernel.org # v5.7+ --- drivers/iommu/intel/dmar.c | 3 ++- drivers/iommu/intel/iommu.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index 404b40af31cb..38d1d40cfe34 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -967,7 +967,8 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr) warn_invalid_dmar(phys_addr, " returns all ones"); goto unmap; } - iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); + if (ecap_vcs(iommu->ecap)) + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); /* the registers might be more than one page */ map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap), diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 8651f6d4dfa0..0823761f3a7c 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1833,7 +1833,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu) if (ecap_prs(iommu->ecap)) intel_svm_finish_prq(iommu); } - if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) + if (vccap_pasid(iommu->vccap)) ioasid_unregister_allocator(&iommu->pasid_allocator); #endif @@ -3209,7 +3209,7 @@ static void register_pasid_allocator(struct intel_iommu *iommu) * is active. All vIOMMU allocators will eventually be calling the same * host allocator. */ - if (!ecap_vcs(iommu->ecap) || !vccap_pasid(iommu->vccap)) + if (!vccap_pasid(iommu->vccap)) return; pr_info("Register custom PASID allocator\n"); smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
On Tue, 2020-11-10 at 01:31 -0500, Qian Cai wrote: > On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote: > > From: Thomas Gleixner > > > > 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling > > the trigger type (edge/level) and the active low/high configuration. While > > there are defines for initializing these variables and struct members, they > > are not used consequently and the meaning of 'trigger' and 'polarity' is > > opaque and confusing at best. > > > > Rename them to 'is_level' and 'active_low' and make them boolean in various > > structs so it's entirely clear what the meaning is. > > > > Signed-off-by: Thomas Gleixner > > Signed-off-by: David Woodhouse > > --- > > arch/x86/include/asm/hw_irq.h | 6 +- > > arch/x86/kernel/apic/io_apic.c | 244 +--- > > arch/x86/pci/intel_mid_pci.c| 8 +- > > drivers/iommu/amd/iommu.c | 10 +- > > drivers/iommu/intel/irq_remapping.c | 9 +- > > 5 files changed, 130 insertions(+), 147 deletions(-) > > Reverting the rest of patchset up to this commit on next-20201109 fixed an > endless soft-lockups issue booting an AMD server below. I noticed that the > failed boots always has this IOMMU IO_PAGE_FAULT before those soft-lockups: Hm, attempting to reproduce this shows something else. Ever since commit be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma- iommu api") in 5.5 the following stops working for me: $ qemu-system-x86_64 -serial mon:stdio -kernel bzImage -machine q35,accel=kvm,kernel-irqchip=split -m 2G -device amd-iommu,intremap=off -append "console=ttyS0 apic=verbose debug" -display none It hasn't got a hard drive but I can watch the SATA interrupts fail as it probes the CD-ROM: [7.403327] ata3.00: qc timeout (cmd 0xa1) [7.405980] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4) Adding 'iommu=off' to the kernel command line makes it work again, in that it correctly panics at the lack of a root file system, quickly. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
On Sun, 2020-10-25 at 09:49 +, David Laight wrote: > Just looking at a random one of these patches... > > Does the compiler manage to optimise that reasonably? > Or does it generate a lot of shifts and masks as each > bitfield is set? > > The code generation for bitfields is often a lot worse > that that for |= setting bits in a word. Indeed, it appears to be utterly appalling. That was one of my motivations for doing it with masks and shifts in the first place. Because in ioapic_setup_msg_from_msi(), for example, these two are consecutive in both source and destination: entry->vector = msg.arch_data.vector; entry->delivery_mode= msg.arch_data.delivery_mode; And so are those: entry->ir_format= msg.arch_addr_lo.dmar_format; entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14; But GCC 7.5.0 here is doing them all separately... 11e0 : { 11e0: 53 push %rbx 11e1: 48 89 f3mov%rsi,%rbx 11e4: 48 83 ec 18 sub$0x18,%rsp irq_chip_compose_msi_msg(irq_data, &msg); 11e8: 48 89 e6mov%rsp,%rsi { 11eb: 65 48 8b 04 25 28 00mov%gs:0x28,%rax 11f2: 00 00 11f4: 48 89 44 24 10 mov%rax,0x10(%rsp) 11f9: 31 c0 xor%eax,%eax irq_chip_compose_msi_msg(irq_data, &msg); 11fb: e8 00 00 00 00 callq 1200 entry->vector = msg.arch_data.vector; 1200: 0f b6 44 24 08 movzbl 0x8(%rsp),%eax entry->delivery_mode= msg.arch_data.delivery_mode; 1205: 0f b6 53 01 movzbl 0x1(%rbx),%edx 1209: 0f b6 74 24 09 movzbl 0x9(%rsp),%esi entry->vector = msg.arch_data.vector; 120e: 88 03 mov%al,(%rbx) entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical; 1210: 0f b6 04 24 movzbl (%rsp),%eax entry->delivery_mode= msg.arch_data.delivery_mode; 1214: 83 e2 f0and$0xfff0,%edx 1217: 83 e6 07and$0x7,%esi entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical; 121a: 09 f2 or %esi,%edx 121c: 8d 0c 00lea(%rax,%rax,1),%ecx entry->ir_format= msg.arch_addr_lo.dmar_format; 121f: c0 e8 04shr$0x4,%al entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical; 1222: 83 e1 08and$0x8,%ecx 1225: 09 ca or %ecx,%edx 1227: 88 53 01mov%dl,0x1(%rbx) entry->ir_format= msg.arch_addr_lo.dmar_format; 122a: 89 c2 mov%eax,%edx entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14; 122c: 8b 04 24mov(%rsp),%eax 122f: 83 e2 01and$0x1,%edx 1232: c1 e8 05shr$0x5,%eax 1235: 66 25 ff 7f and$0x7fff,%ax 1239: 8d 0c 00lea(%rax,%rax,1),%ecx 123c: 66 c1 e8 07 shr$0x7,%ax 1240: 88 43 07mov%al,0x7(%rbx) 1243: 09 ca or %ecx,%edx } 1245: 48 8b 44 24 10 mov0x10(%rsp),%rax 124a: 65 48 33 04 25 28 00xor%gs:0x28,%rax 1251: 00 00 entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14; 1253: 88 53 06mov%dl,0x6(%rbx) } 1256: 75 06 jne125e 1258: 48 83 c4 18 add$0x18,%rsp 125c: 5b pop%rbx 125d: c3 retq 125e: e8 00 00 00 00 callq 1263 1263: 0f 1f 00nopl (%rax) 1266: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 126d: 00 00 00 Still, this isn't really a fast path so I'm content to file the GCC PR for the really *obvious* misses and let it take its course. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields
On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote: > Fix the conditions for enabling x2apic on guests without interrupt > remapping, and support 15-bit Extended Destination ID to allow 32768 > CPUs without IR on hypervisors that support it. > > Make the I/OAPIC code generate its RTE directly from the MSI message > created by the parent irqchip, and fix up a bunch of magic mask/shift > macros to use bitfields for MSI messages and I/OAPIC RTEs while we're > at it. Forgot to mention (since I thought I'd posted it in a previous series) that v3 also ditches irq_remapping_get_irq_domain() and some icky special cases of hard-coding "x86_vector_domain", and makes HPET and I/OAPIC use irq_find_matching_fwspeC() to find their parent irqdomain. > v3: > • Lots of bitfield cleanups from Thomas. > • Disable hyperv-iommu if 15-bit extension is present. > • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c > • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream >through KVM tree (and the other half needs to wait, or have an #ifdef) so >is left at the top of the tree. > > v2: > • Minor cleanups. > • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static. > • Generate I/OAPIC RTE directly from parent irqchip's MSI messages. > • Clean up HPET MSI support into hpet.c now that we can. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 26/35] iommu/hyper-v: Implement select() method on remapping irqdomain
From: David Woodhouse Preparatory for removing irq_remapping_get_irq_domain() Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- drivers/iommu/hyperv-iommu.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 37dd485a5640..78a264ad9405 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -101,7 +101,16 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain, irq_domain_free_irqs_common(domain, virq, nr_irqs); } +static int hyperv_irq_remapping_select(struct irq_domain *d, + struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + /* Claim only the first (and only) I/OAPIC */ + return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0; +} + static const struct irq_domain_ops hyperv_ir_domain_ops = { + .select = hyperv_irq_remapping_select, .alloc = hyperv_irq_remapping_alloc, .free = hyperv_irq_remapping_free, }; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c
From: David Woodhouse This isn't really dependent on PCI MSI; it's just generic MSI which is now supported by the generic x86_vector_domain. Move the HPET MSI support back into hpet.c with the rest of the HPET support. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hpet.h | 11 arch/x86/kernel/apic/msi.c | 111 - arch/x86/kernel/hpet.c | 118 ++-- 3 files changed, 112 insertions(+), 128 deletions(-) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index 6352dee37cda..ab9f3dd87c80 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -74,17 +74,6 @@ extern void hpet_disable(void); extern unsigned int hpet_readl(unsigned int a); extern void force_hpet_resume(void); -struct irq_data; -struct hpet_channel; -struct irq_domain; - -extern void hpet_msi_unmask(struct irq_data *data); -extern void hpet_msi_mask(struct irq_data *data); -extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg); -extern struct irq_domain *hpet_create_irq_domain(int hpet_id); -extern int hpet_assign_irq(struct irq_domain *domain, - struct hpet_channel *hc, int dev_num); - #ifdef CONFIG_HPET_EMULATE_RTC #include diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 4eda617eda1e..44ebe25e7703 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -340,114 +340,3 @@ void dmar_free_hwirq(int irq) irq_domain_free_irqs(irq, 1); } #endif - -/* - * MSI message composition - */ -#ifdef CONFIG_HPET_TIMER -static inline int hpet_dev_id(struct irq_domain *domain) -{ - struct msi_domain_info *info = msi_get_domain_info(domain); - - return (int)(long)info->data; -} - -static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg) -{ - hpet_msi_write(irq_data_get_irq_handler_data(data), msg); -} - -static struct irq_chip hpet_msi_controller __ro_after_init = { - .name = "HPET-MSI", - .irq_unmask = hpet_msi_unmask, - .irq_mask = hpet_msi_mask, - .irq_ack = irq_chip_ack_parent, - .irq_set_affinity = msi_domain_set_affinity, - .irq_retrigger = irq_chip_retrigger_hierarchy, - .irq_write_msi_msg = hpet_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, -}; - -static int hpet_msi_init(struct irq_domain *domain, -struct msi_domain_info *info, unsigned int virq, -irq_hw_number_t hwirq, msi_alloc_info_t *arg) -{ - irq_set_status_flags(virq, IRQ_MOVE_PCNTXT); - irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL, - handle_edge_irq, arg->data, "edge"); - - return 0; -} - -static void hpet_msi_free(struct irq_domain *domain, - struct msi_domain_info *info, unsigned int virq) -{ - irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT); -} - -static struct msi_domain_ops hpet_msi_domain_ops = { - .msi_init = hpet_msi_init, - .msi_free = hpet_msi_free, -}; - -static struct msi_domain_info hpet_msi_domain_info = { - .ops= &hpet_msi_domain_ops, - .chip = &hpet_msi_controller, - .flags = MSI_FLAG_USE_DEF_DOM_OPS, -}; - -struct irq_domain *hpet_create_irq_domain(int hpet_id) -{ - struct msi_domain_info *domain_info; - struct irq_domain *parent, *d; - struct irq_alloc_info info; - struct fwnode_handle *fn; - - if (x86_vector_domain == NULL) - return NULL; - - domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL); - if (!domain_info) - return NULL; - - *domain_info = hpet_msi_domain_info; - domain_info->data = (void *)(long)hpet_id; - - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; - info.devid = hpet_id; - parent = irq_remapping_get_irq_domain(&info); - if (parent == NULL) - parent = x86_vector_domain; - else - hpet_msi_controller.name = "IR-HPET-MSI"; - - fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name, - hpet_id); - if (!fn) { - kfree(domain_info); - return NULL; - } - - d = msi_create_irq_domain(fn, domain_info, parent); - if (!d) { - irq_domain_free_fwnode(fn); - kfree(domain_info); - } - return d; -} - -int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc, - int dev_num) -{ - struct irq_alloc_info info; - - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_HPET; - info.data = hc; - info.devid = hpet_dev_id(domain); - info.hwirq = dev
[PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message
From: David Woodhouse The I/O-APIC generates an MSI cycle with address/data bits taken from its Redirection Table Entry in some combination which used to make sense, but now is just a bunch of bits which get passed through in some seemingly arbitrary order. Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let them just do their job and generate an MSI message. The bit swizzling to turn that MSI message into the I/O-APIC's RTE is the same in all cases, since it's a function of the I/O-APIC hardware. The IRQ remappers have no real need to get involved with that. The only slight caveat is that the I/OAPIC is interpreting some of those fields too, and it does want the 'vector' field to be unique to make EOI work. The AMD IOMMU happens to put its IRTE index in the bits that the I/O-APIC thinks are the vector field, and accommodates this requirement by reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't actually use the bits that the I/O-APIC thinks are the vector field, so it fills in the 'pin' value there instead. [ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg bitfields and added commentry to explain the mapping magic ] Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hw_irq.h | 11 ++- arch/x86/kernel/apic/io_apic.c | 103 +++- drivers/iommu/amd/iommu.c | 12 drivers/iommu/hyperv-iommu.c| 31 - drivers/iommu/intel/irq_remapping.c | 31 ++--- 5 files changed, 83 insertions(+), 105 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 517847a94dbe..83a69f62637e 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -45,12 +45,11 @@ enum irq_alloc_type { }; struct ioapic_alloc_info { - int pin; - int node; - u32 is_level: 1; - u32 active_low : 1; - u32 valid : 1; - struct IO_APIC_route_entry *entry; + int pin; + int node; + u32 is_level: 1; + u32 active_low : 1; + u32 valid : 1; }; struct uv_alloc_info { diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 07e754131854..ea983da1a57f 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -48,6 +48,7 @@ #include /* time_after() */ #include #include +#include #include #include @@ -63,7 +64,6 @@ #include #include #include - #include #definefor_each_ioapic(idx)\ @@ -1848,21 +1848,58 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data) eoi_ioapic_pin(data->entry.vector, data); } +/* + * The I/OAPIC is just a device for generating MSI messages from legacy + * interrupt pins. Various fields of the RTE translate into bits of the + * resulting MSI which had a historical meaning. + * + * With interrupt remapping, many of those bits have different meanings + * in the underlying MSI, but the way that the I/OAPIC transforms them + * from its RTE to the MSI message is the same. This function allows + * the parent IRQ domain to compose the MSI message, then takes the + * relevant bits to put them in the appropriate places in the RTE in + * order to generate that message when the IRQ happens. + * + * The setup here relies on a preconfigured route entry (is_level, + * active_low, masked) because the parent domain is merely composing the + * generic message routing information which is used for the MSI. + */ +static void ioapic_setup_msg_from_msi(struct irq_data *irq_data, + struct IO_APIC_route_entry *entry) +{ + struct msi_msg msg; + + /* Let the parent domain compose the MSI message */ + irq_chip_compose_msi_msg(irq_data, &msg); + + /* +* - Real vector +* - DMAR/IR: 8bit subhandle (ioapic.pin) +* - AMD/IR: 8bit IRTE index +*/ + entry->vector = msg.arch_data.vector; + /* Delivery mode (for DMAR/IR all 0) */ + entry->delivery_mode= msg.arch_data.delivery_mode; + /* Destination mode or DMAR/IR index bit 15 */ + entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical; + /* DMAR/IR: 1, 0 for all other modes */ + entry->ir_format= msg.arch_addr_lo.dmar_format; + /* +* DMAR/IR: index bit 0-14. +* +* All other modes have bit 0-6 of dmar_index_0_14 cleared and the +* topmost 8 bits are destination id bit 0-7 (entry::destid_0_7). +*/ + entry->ir_index_0_14
[PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain()
From: David Woodhouse All users are converted to use the fwspec based parent domain lookup. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hw_irq.h| 2 -- arch/x86/include/asm/irq_remapping.h | 9 drivers/iommu/amd/iommu.c| 34 drivers/iommu/hyperv-iommu.c | 9 drivers/iommu/intel/irq_remapping.c | 17 -- drivers/iommu/irq_remapping.c| 14 drivers/iommu/irq_remapping.h| 3 --- 7 files changed, 88 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 83a69f62637e..458f5a676402 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -40,8 +40,6 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_PCI_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, - X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT, - X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, }; struct ioapic_alloc_info { diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h index af4a151d70b3..7cc49432187f 100644 --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int); extern int irq_remap_enable_fault_handling(void); extern void panic_if_irq_remap(const char *msg); -extern struct irq_domain * -irq_remapping_get_irq_domain(struct irq_alloc_info *info); - /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */ extern struct irq_domain * arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id); @@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg) { } -static inline struct irq_domain * -irq_remapping_get_irq_domain(struct irq_alloc_info *info) -{ - return NULL; -} - #endif /* CONFIG_IRQ_REMAP */ #endif /* __X86_IRQ_REMAPPING_H */ diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 31b22244e9c2..463d322a4f3b 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3601,10 +3601,8 @@ static int get_devid(struct irq_alloc_info *info) { switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: - case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: return get_ioapic_devid(info->devid); case X86_IRQ_ALLOC_TYPE_HPET: - case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: return get_hpet_devid(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: @@ -3615,44 +3613,12 @@ static int get_devid(struct irq_alloc_info *info) } } -static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info, - int devid) -{ - struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; - - if (!iommu) - return NULL; - - switch (info->type) { - case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: - case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: - return iommu->ir_domain; - default: - WARN_ON_ONCE(1); - return NULL; - } -} - -static struct irq_domain *get_irq_domain(struct irq_alloc_info *info) -{ - int devid; - - if (!info) - return NULL; - - devid = get_devid(info); - if (devid < 0) - return NULL; - return get_irq_domain_for_devid(info, devid); -} - struct irq_remap_ops amd_iommu_irq_ops = { .prepare= amd_iommu_prepare, .enable = amd_iommu_enable, .disable= amd_iommu_disable, .reenable = amd_iommu_reenable, .enable_faulting= amd_iommu_enable_faulting, - .get_irq_domain = get_irq_domain, }; static void fill_msi_msg(struct msi_msg *msg, u32 index) diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 78a264ad9405..a629a6be65c7 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void) return IRQ_REMAP_X2APIC_MODE; } -static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info) -{ - if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT) - return ioapic_ir_domain; - else - return NULL; -} - struct irq_remap_ops hyperv_irq_remap_ops = { .prepare= hyperv_prepare_irq_remapping, .enable = hyperv_enable_irq_remapping, - .get_irq_domain = hyperv_get_irq_domain, }; #endif diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index b3b079c0b51e..bca44015bc1d 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1127,29 +1127,12 @@ static void prepare_i
[PATCH v3 13/35] iommu/intel: Use msi_msg shadow structs
From: Thomas Gleixner Use the bitfields in the x86 shadow struct to compose the MSI message. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- drivers/iommu/intel/irq_remapping.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 5628d43b795e..30269b738fa5 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -20,7 +20,6 @@ #include #include #include -#include #include "../irq_remapping.h" @@ -1260,6 +1259,21 @@ static struct irq_chip intel_ir_chip = { .irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity, }; +static void fill_msi_msg(struct msi_msg *msg, u32 index, u32 subhandle) +{ + memset(msg, 0, sizeof(*msg)); + + msg->arch_addr_lo.dmar_base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->arch_addr_lo.dmar_subhandle_valid = true; + msg->arch_addr_lo.dmar_format = true; + msg->arch_addr_lo.dmar_index_0_14 = index & 0x7FFF; + msg->arch_addr_lo.dmar_index_15 = !!(index & 0x8000); + + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; + + msg->arch_data.dmar_subhandle = subhandle; +} + static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, struct irq_cfg *irq_cfg, struct irq_alloc_info *info, @@ -1267,7 +1281,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, { struct IR_IO_APIC_route_entry *entry; struct irte *irte = &data->irte_entry; - struct msi_msg *msg = &data->msi_entry; prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid); switch (info->type) { @@ -1308,12 +1321,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, else set_msi_sid(irte, msi_desc_to_pci_dev(info->desc)); - msg->address_hi = MSI_ADDR_BASE_HI; - msg->data = sub_handle; - msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT | - MSI_ADDR_IR_SHV | - MSI_ADDR_IR_INDEX1(index) | - MSI_ADDR_IR_INDEX2(index); + fill_msi_msg(&data->msi_entry, index, sub_handle); break; default: -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs
From: Thomas Gleixner Having two seperate structs for the I/O-APIC RTE entries (non-remapped and DMAR remapped) requires type casts and makes it hard to map. Combine them in IO_APIC_routing_entry by defining a union of two 64bit bitfields. Use naming which reflects which bits are shared and which bits are actually different for the operating modes. [dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for register access into the same union and eliminating a few more places where bits were accessed through masks and shifts.] Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/io_apic.h | 78 ++- arch/x86/kernel/apic/io_apic.c | 144 +--- drivers/iommu/amd/iommu.c | 8 +- drivers/iommu/hyperv-iommu.c| 4 +- drivers/iommu/intel/irq_remapping.c | 19 ++-- 5 files changed, 108 insertions(+), 145 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index a1a26f6d3aa4..73da644b2f0d 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -13,15 +13,6 @@ * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar */ -/* I/O Unit Redirection Table */ -#define IO_APIC_REDIR_VECTOR_MASK 0x000FF -#define IO_APIC_REDIR_DEST_LOGICAL 0x00800 -#define IO_APIC_REDIR_DEST_PHYSICAL0x0 -#define IO_APIC_REDIR_SEND_PENDING (1 << 12) -#define IO_APIC_REDIR_REMOTE_IRR (1 << 14) -#define IO_APIC_REDIR_LEVEL_TRIGGER(1 << 15) -#define IO_APIC_REDIR_MASKED (1 << 16) - /* * The structure of the IO-APIC: */ @@ -65,52 +56,39 @@ union IO_APIC_reg_03 { }; struct IO_APIC_route_entry { - __u32 vector : 8, - delivery_mode : 3, /* 000: FIXED -* 001: lowest prio -* 111: ExtINT -*/ - dest_mode : 1, /* 0: physical, 1: logical */ - delivery_status : 1, - polarity: 1, - irr : 1, - trigger : 1, /* 0: edge, 1: level */ - mask: 1, /* 0: enabled, 1: disabled */ - __reserved_2: 15; - - __u32 __reserved_3: 24, - dest: 8; -} __attribute__ ((packed)); - -struct IR_IO_APIC_route_entry { - __u64 vector : 8, - zero: 3, - index2 : 1, - delivery_status : 1, - polarity: 1, - irr : 1, - trigger : 1, - mask: 1, - reserved: 31, - format : 1, - index : 15; + union { + struct { + u64 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + delivery_status : 1, + active_low : 1, + irr : 1, + is_level: 1, + masked : 1, + reserved_0 : 15, + reserved_1 : 24, + destid_0_7 : 8; + }; + struct { + u64 ir_shared_0 : 8, + ir_zero : 3, + ir_index_15 : 1, + ir_shared_1 : 5, + ir_reserved_0 : 31, + ir_format : 1, + ir_index_0_14 : 15; + }; + struct { + u64 w1 : 32, + w2 : 32; + }; + }; } __attribute__ ((packed)); struct irq_alloc_info; struct ioapic_domain_cfg; -#define IOAPIC_EDGE0 -#define IOAPIC_LEVEL 1 - -#define IOAPIC_MASKED 1 -#define IOAPIC_UNMASKED0 - -#define IOAPIC_POL_HIGH0 -#define IOAPIC_POL_LOW 1 - -#define IOAPIC_DEST_MODE_PHYSICAL 0 -#define IOAPIC_DEST_MODE_LOGICAL 1 - #defineIOAPIC_MAP_ALLOC0x1 #defineIOAPIC_MAP_CHECK0x2 diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 24a7bba7cbf4..07e754131854 100644 --- a/arch/x
[PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member
From: Thomas Gleixner Architectures like x86 have their MSI messages in various bits of the data, address_lo and address_hi field. Composing or decomposing these messages with bitmasks and shifts is possible, but unreadable gunk. Allow architectures to provide an architecture specific representation for each member of msi_msg. Provide empty defaults for each and stick them into an union. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- include/asm-generic/msi.h | 4 include/linux/msi.h | 46 +++ 2 files changed, 46 insertions(+), 4 deletions(-) diff --git a/include/asm-generic/msi.h b/include/asm-generic/msi.h index e6795f088bdd..25344de0e8f9 100644 --- a/include/asm-generic/msi.h +++ b/include/asm-generic/msi.h @@ -4,6 +4,8 @@ #include +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN + #ifndef NUM_MSI_ALLOC_SCRATCHPAD_REGS # define NUM_MSI_ALLOC_SCRATCHPAD_REGS 2 #endif @@ -30,4 +32,6 @@ typedef struct msi_alloc_info { #define GENERIC_MSI_DOMAIN_OPS 1 +#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ + #endif diff --git a/include/linux/msi.h b/include/linux/msi.h index 6b584cc4757c..360a0a7e7341 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -4,11 +4,50 @@ #include #include +#include + +/* Dummy shadow structures if an architecture does not define them */ +#ifndef arch_msi_msg_addr_lo +typedef struct arch_msi_msg_addr_lo { + u32 address_lo; +} __attribute__ ((packed)) arch_msi_msg_addr_lo_t; +#endif + +#ifndef arch_msi_msg_addr_hi +typedef struct arch_msi_msg_addr_hi { + u32 address_hi; +} __attribute__ ((packed)) arch_msi_msg_addr_hi_t; +#endif + +#ifndef arch_msi_msg_data +typedef struct arch_msi_msg_data { + u32 data; +} __attribute__ ((packed)) arch_msi_msg_data_t; +#endif +/** + * msi_msg - Representation of a MSI message + * @address_lo:Low 32 bits of msi message address + * @arch_addrlo: Architecture specific shadow of @address_lo + * @address_hi:High 32 bits of msi message address + * (only used when device supports it) + * @arch_addrhi: Architecture specific shadow of @address_hi + * @data: MSI message data (usually 16 bits) + * @arch_data: Architecture specific shadow of @data + */ struct msi_msg { - u32 address_lo; /* low 32 bits of msi message address */ - u32 address_hi; /* high 32 bits of msi message address */ - u32 data; /* 16 bits of msi message data */ + union { + u32 address_lo; + arch_msi_msg_addr_lo_t arch_addr_lo; + }; + union { + u32 address_hi; + arch_msi_msg_addr_hi_t arch_addr_hi; + }; + union { + u32 data; + arch_msi_msg_data_t arch_data; + }; }; extern int pci_msi_ignore_mask; @@ -243,7 +282,6 @@ struct msi_controller { #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN #include -#include struct irq_domain; struct irq_domain_ops; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain
From: David Woodhouse Preparatory change to remove irq_remapping_get_irq_domain(). Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- drivers/iommu/amd/iommu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 9744cdbefd88..31b22244e9c2 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3882,7 +3882,26 @@ static void irq_remapping_deactivate(struct irq_domain *domain, irte_info->index); } +static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + struct amd_iommu *iommu; + int devid = -1; + + if (x86_fwspec_is_ioapic(fwspec)) + devid = get_ioapic_devid(fwspec->param[0]); + else if (x86_fwspec_is_hpet(fwspec)) + devid = get_hpet_devid(fwspec->param[0]); + + if (devid < 0) + return 0; + + iommu = amd_iommu_rlookup_table[devid]; + return iommu && iommu->ir_domain == d; +} + static const struct irq_domain_ops amd_ir_domain_ops = { + .select = irq_remapping_select, .alloc = irq_remapping_alloc, .free = irq_remapping_free, .activate = irq_remapping_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers
From: Thomas Gleixner 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling the trigger type (edge/level) and the active low/high configuration. While there are defines for initializing these variables and struct members, they are not used consequently and the meaning of 'trigger' and 'polarity' is opaque and confusing at best. Rename them to 'is_level' and 'active_low' and make them boolean in various structs so it's entirely clear what the meaning is. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/hw_irq.h | 6 +- arch/x86/kernel/apic/io_apic.c | 244 +--- arch/x86/pci/intel_mid_pci.c| 8 +- drivers/iommu/amd/iommu.c | 10 +- drivers/iommu/intel/irq_remapping.c | 9 +- 5 files changed, 130 insertions(+), 147 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index a4aeeaace040..517847a94dbe 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -47,9 +47,9 @@ enum irq_alloc_type { struct ioapic_alloc_info { int pin; int node; - u32 trigger : 1; - u32 polarity : 1; - u32 valid : 1; + u32 is_level: 1; + u32 active_low : 1; + u32 valid : 1; struct IO_APIC_route_entry *entry; }; diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index c6d92d2570d0..24a7bba7cbf4 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -89,12 +89,12 @@ struct irq_pin_list { }; struct mp_chip_data { - struct list_head irq_2_pin; - struct IO_APIC_route_entry entry; - int trigger; - int polarity; + struct list_headirq_2_pin; + struct IO_APIC_route_entry entry; + boolis_level; + boolactive_low; + boolisa_irq; u32 count; - bool isa_irq; }; struct mp_ioapic_gsi { @@ -745,44 +745,7 @@ static int __init find_isa_irq_apic(int irq, int type) return -1; } -#ifdef CONFIG_EISA -/* - * EISA Edge/Level control register, ELCR - */ -static int EISA_ELCR(unsigned int irq) -{ - if (irq < nr_legacy_irqs()) { - unsigned int port = 0x4d0 + (irq >> 3); - return (inb(port) >> (irq & 7)) & 1; - } - apic_printk(APIC_VERBOSE, KERN_INFO - "Broken MPtable reports ISA irq %d\n", irq); - return 0; -} - -#endif - -/* ISA interrupts are always active high edge triggered, - * when listed as conforming in the MP table. */ - -#define default_ISA_trigger(idx) (IOAPIC_EDGE) -#define default_ISA_polarity(idx) (IOAPIC_POL_HIGH) - -/* EISA interrupts are always polarity zero and can be edge or level - * trigger depending on the ELCR value. If an interrupt is listed as - * EISA conforming in the MP table, that means its trigger type must - * be read in from the ELCR */ - -#define default_EISA_trigger(idx) (EISA_ELCR(mp_irqs[idx].srcbusirq)) -#define default_EISA_polarity(idx) default_ISA_polarity(idx) - -/* PCI interrupts are always active low level triggered, - * when listed as conforming in the MP table. */ - -#define default_PCI_trigger(idx) (IOAPIC_LEVEL) -#define default_PCI_polarity(idx) (IOAPIC_POL_LOW) - -static int irq_polarity(int idx) +static bool irq_active_low(int idx) { int bus = mp_irqs[idx].srcbus; @@ -791,90 +754,139 @@ static int irq_polarity(int idx) */ switch (mp_irqs[idx].irqflag & MP_IRQPOL_MASK) { case MP_IRQPOL_DEFAULT: - /* conforms to spec, ie. bus-type dependent polarity */ - if (test_bit(bus, mp_bus_not_pci)) - return default_ISA_polarity(idx); - else - return default_PCI_polarity(idx); + /* +* Conforms to spec, ie. bus-type dependent polarity. PCI +* defaults to low active. [E]ISA defaults to high active. +*/ + return !test_bit(bus, mp_bus_not_pci); case MP_IRQPOL_ACTIVE_HIGH: - return IOAPIC_POL_HIGH; + return false; case MP_IRQPOL_RESERVED: pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n"); fallthrough; case MP_IRQPOL_ACTIVE_LOW: default: /* Pointless default required due to do gcc stupidity */ - return IOAPIC_POL_LOW; + return true; } } #ifdef
[PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
From: David Woodhouse All possible parent domains have a select method now. Make use of it. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/kernel/hpet.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 3b8b12769f3b..08651a4e6aa0 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) { struct msi_domain_info *domain_info; struct irq_domain *parent, *d; - struct irq_alloc_info info; struct fwnode_handle *fn; + struct irq_fwspec fwspec; if (x86_vector_domain == NULL) return NULL; @@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) *domain_info = hpet_msi_domain_info; domain_info->data = (void *)(long)hpet_id; - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; - info.devid = hpet_id; - parent = irq_remapping_get_irq_domain(&info); - if (parent == NULL) - parent = x86_vector_domain; - else - hpet_msi_controller.name = "IR-HPET-MSI"; - fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name, hpet_id); if (!fn) { @@ -572,6 +563,19 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) return NULL; } + fwspec.fwnode = fn; + fwspec.param_count = 1; + fwspec.param[0] = hpet_id; + + parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY); + if (!parent) { + irq_domain_free_fwnode(fn); + kfree(domain_info); + return NULL; + } + if (parent != x86_vector_domain) + hpet_msi_controller.name = "IR-HPET-MSI"; + d = msi_create_irq_domain(fn, domain_info, parent); if (!d) { irq_domain_free_fwnode(fn); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 08/35] x86/apic: Cleanup destination mode
From: Thomas Gleixner apic::irq_dest_mode is actually a boolean, but defined as u32 and named in a way which does not explain what it means. Make it a boolean and rename it to 'dest_mode_logical' Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h | 2 +- arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/apic/apic_flat_64.c | 4 ++-- arch/x86/kernel/apic/apic_noop.c | 4 +--- arch/x86/kernel/apic/apic_numachip.c | 4 ++-- arch/x86/kernel/apic/bigsmp_32.c | 3 +-- arch/x86/kernel/apic/io_apic.c| 2 +- arch/x86/kernel/apic/msi.c| 6 +++--- arch/x86/kernel/apic/probe_32.c | 3 +-- arch/x86/kernel/apic/x2apic_cluster.c | 2 +- arch/x86/kernel/apic/x2apic_phys.c| 2 +- arch/x86/kernel/apic/x2apic_uv_x.c| 2 +- arch/x86/kernel/smpboot.c | 7 ++- arch/x86/platform/uv/uv_irq.c | 2 +- arch/x86/xen/apic.c | 3 +-- drivers/iommu/amd/amd_iommu_types.h | 2 +- drivers/iommu/amd/iommu.c | 8 drivers/iommu/intel/irq_remapping.c | 2 +- 18 files changed, 26 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 019d7ac3b16e..652f62252349 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -309,7 +309,7 @@ struct apic { u32 disable_esr; enum apic_delivery_modes delivery_mode; - u32 irq_dest_mode; + booldest_mode_logical; u32 (*calc_dest_apicid)(unsigned int cpu); diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 29d28b34cb2f..54f04355aaa2 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1591,7 +1591,7 @@ static void setup_local_APIC(void) apic->init_apic_ldr(); #ifdef CONFIG_X86_32 - if (apic->irq_dest_mode == 1) { + if (apic->dest_mode_logical) { int logical_apicid, ldr_apicid; /* diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c index bbb1b89fe711..8f72b4351c9f 100644 --- a/arch/x86/kernel/apic/apic_flat_64.c +++ b/arch/x86/kernel/apic/apic_flat_64.c @@ -114,7 +114,7 @@ static struct apic apic_flat __ro_after_init = { .apic_id_registered = flat_apic_id_registered, .delivery_mode = APIC_DELIVERY_MODE_FIXED, - .irq_dest_mode = 1, /* logical */ + .dest_mode_logical = true, .disable_esr= 0, @@ -205,7 +205,7 @@ static struct apic apic_physflat __ro_after_init = { .apic_id_registered = flat_apic_id_registered, .delivery_mode = APIC_DELIVERY_MODE_FIXED, - .irq_dest_mode = 0, /* physical */ + .dest_mode_logical = false, .disable_esr= 0, diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c index 38f167ce5031..fe78319e0f7a 100644 --- a/arch/x86/kernel/apic/apic_noop.c +++ b/arch/x86/kernel/apic/apic_noop.c @@ -96,8 +96,7 @@ struct apic apic_noop __ro_after_init = { .apic_id_registered = noop_apic_id_registered, .delivery_mode = APIC_DELIVERY_MODE_FIXED, - /* logical delivery broadcast to all CPUs: */ - .irq_dest_mode = 1, + .dest_mode_logical = true, .disable_esr= 0, @@ -105,7 +104,6 @@ struct apic apic_noop __ro_after_init = { .init_apic_ldr = noop_init_apic_ldr, .ioapic_phys_id_map = default_ioapic_phys_id_map, .setup_apic_routing = NULL, - .cpu_present_to_apicid = default_cpu_present_to_apicid, .apicid_to_cpu_present = physid_set_mask_of_physid, diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c index 4ebf9fe2c95d..a54d817eb4b6 100644 --- a/arch/x86/kernel/apic/apic_numachip.c +++ b/arch/x86/kernel/apic/apic_numachip.c @@ -247,7 +247,7 @@ static const struct apic apic_numachip1 __refconst = { .apic_id_registered = numachip_apic_id_registered, .delivery_mode = APIC_DELIVERY_MODE_FIXED, - .irq_dest_mode = 0, /* physical */ + .dest_mode_logical = false, .disable_esr= 0, @@ -294,7 +294,7 @@ static const struct apic apic_numachip2 __refconst = { .apic_id_registered = numachip_apic_id_registered, .delivery_mode = APIC_DELIVERY_MODE_FIXED, - .irq_dest_mode = 0, /* physical */ + .dest_mode_logical = false, .disable_esr= 0, diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kern
[PATCH v3 16/35] x86/kvm: Use msi_msg shadow structs
From: Thomas Gleixner Use the bitfields in the x86 shadow structs instead of decomposing the 32bit value with macros. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/kvm/irq_comm.c | 31 +-- 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c index 4aa1c2e00e2a..8a4de3f12820 100644 --- a/arch/x86/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -16,8 +16,6 @@ #include -#include - #include "irq.h" #include "ioapic.h" @@ -104,22 +102,19 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, struct kvm_lapic_irq *irq) { - trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_format ? -(u64)e->msi.address_hi << 32 : 0), - e->msi.data); - - irq->dest_id = (e->msi.address_lo & - MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; - if (kvm->arch.x2apic_format) - irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi); - irq->vector = (e->msi.data & - MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT; - irq->dest_mode = kvm_lapic_irq_dest_mode( - !!((1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo)); - irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data; - irq->delivery_mode = e->msi.data & 0x700; - irq->msi_redir_hint = ((e->msi.address_lo - & MSI_ADDR_REDIRECTION_LOWPRI) > 0); + struct msi_msg msg = { .address_lo = e->msi.address_lo, + .address_hi = e->msi.address_hi, + .data = e->msi.data }; + + trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ? + (u64)msg.address_hi << 32 : 0), msg.data); + + irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format); + irq->vector = msg.arch_data.vector; + irq->dest_mode = kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical); + irq->trig_mode = msg.arch_data.is_level; + irq->delivery_mode = msg.arch_data.delivery_mode << 8; + irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint; irq->level = 1; irq->shorthand = APIC_DEST_NOSHORT; } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select()
From: David Woodhouse Now that the old get_irq_domain() method has gone, consolidate on just the map_XXX_to_iommu() functions. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- drivers/iommu/intel/irq_remapping.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index bca44015bc1d..aeffda92b10b 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -203,13 +203,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu, return rc; } -static struct irq_domain *map_hpet_to_ir(u8 hpet_id) +static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id) { int i; for (i = 0; i < MAX_HPET_TBS; i++) { if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu) - return ir_hpet[i].iommu->ir_domain; + return ir_hpet[i].iommu; } return NULL; } @@ -225,13 +225,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic) return NULL; } -static struct irq_domain *map_ioapic_to_ir(int apic) -{ - struct intel_iommu *iommu = map_ioapic_to_iommu(apic); - - return iommu ? iommu->ir_domain : NULL; -} - static struct irq_domain *map_dev_to_ir(struct pci_dev *dev) { struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev); @@ -1418,12 +1411,14 @@ static int intel_irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, enum irq_domain_bus_token bus_token) { + struct intel_iommu *iommu = NULL; + if (x86_fwspec_is_ioapic(fwspec)) - return d == map_ioapic_to_ir(fwspec->param[0]); + iommu = map_ioapic_to_iommu(fwspec->param[0]); else if (x86_fwspec_is_hpet(fwspec)) - return d == map_hpet_to_ir(fwspec->param[0]); + iommu = map_hpet_to_iommu(fwspec->param[0]); - return 0; + return iommu && d == iommu->ir_domain; } static const struct irq_domain_ops intel_ir_domain_ops = { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
From: David Woodhouse No functional change; just reserve the feature bit for now so that VMMs can start to implement it. This will allow the host to indicate that MSI emulation supports 15-bit destination IDs, allowing up to 32768 CPUs without interrupt remapping. cf. https://patchwork.kernel.org/patch/11816693/ for qemu Signed-off-by: David Woodhouse Acked-by: Paolo Bonzini --- Documentation/virt/kvm/cpuid.rst | 4 arch/x86/include/uapi/asm/kvm_para.h | 1 + 2 files changed, 5 insertions(+) diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst index 9150e9d1c39b..f70b655821d5 100644 --- a/Documentation/virt/kvm/cpuid.rst +++ b/Documentation/virt/kvm/cpuid.rst @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT 14 guest checks this feature bit async pf acknowledgment msr 0x4b564d07. +KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit + before using extended destination + ID bits in MSI address bits 11-5. + KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side per-cpu warps are expeced in kvmclock diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 812e9b4c1114..950afebfba88 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -32,6 +32,7 @@ #define KVM_FEATURE_POLL_CONTROL 12 #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 +#define KVM_FEATURE_MSI_EXT_DEST_ID15 #define KVM_HINTS_REALTIME 0 -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 25/35] iommu/vt-d: Implement select() method on remapping irqdomain
From: David Woodhouse Preparatory for removing irq_remapping_get_irq_domain() Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- drivers/iommu/intel/irq_remapping.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 96c84b19940e..b3b079c0b51e 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1431,7 +1431,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain, modify_irte(&data->irq_2_iommu, &entry); } +static int intel_irq_remapping_select(struct irq_domain *d, + struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + if (x86_fwspec_is_ioapic(fwspec)) + return d == map_ioapic_to_ir(fwspec->param[0]); + else if (x86_fwspec_is_hpet(fwspec)) + return d == map_hpet_to_ir(fwspec->param[0]); + + return 0; +} + static const struct irq_domain_ops intel_ir_domain_ops = { + .select = intel_irq_remapping_select, .alloc = intel_irq_remapping_alloc, .free = intel_irq_remapping_free, .activate = intel_irq_remapping_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit
From: David Woodhouse The Intel IOMMU has an MSI-like configuration for its interrupt, but it isn't really MSI. So it gets to abuse the high 32 bits of the address, and puts the high 24 bits of the extended APIC ID there. This isn't something that can be used in the general case for real MSIs, since external devices using the high bits of the address would be performing writes to actual memory space above 4GiB, not targeted at the APIC. Factor the hack out and allow it only to be used when appropriate, adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That should never happen since the compatibility MSI messages are not used when Interrupt Remapping is enabled. The x2apic_enabled() check isn't needed because Linux won't bring up CPUs with higher APIC IDs unless IR and x2apic are enabled anyway. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/kernel/apic/msi.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 6313f0a05db7..516df47bde73 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -23,13 +23,11 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; -static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) +static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar) { msg->address_hi = MSI_ADDR_BASE_HI; - if (x2apic_enabled()) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); - msg->address_lo = MSI_ADDR_BASE_LO | ((apic->irq_dest_mode == 0) ? @@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) MSI_DATA_LEVEL_ASSERT | MSI_DATA_DELIVERY_FIXED | MSI_DATA_VECTOR(cfg->vector); + + /* +* Only the IOMMU itself can use the trick of putting destination +* APIC ID into the high bits of the address. Anything else would +* just be writing to memory if it tried that, and needs IR to +* address higher APIC IDs. +*/ + if (dmar) + msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else + WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); } void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) { - __irq_msi_compose_msg(irqd_cfg(data), msg); + __irq_msi_compose_msg(irqd_cfg(data), msg, false); } static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg) { struct msi_msg msg[2] = { [1] = { }, }; - __irq_msi_compose_msg(cfg, msg); + __irq_msi_compose_msg(cfg, msg, false); irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg); } @@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent, #endif #ifdef CONFIG_DMAR_TABLE +/* + * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the + * high bits of the destination APIC ID. This can't be done in the general + * case for MSIs as it would be targeting real memory above 4GiB not the + * APIC. + */ +static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) +{ + __irq_msi_compose_msg(irqd_cfg(data), msg, true); +} + static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg) { dmar_msi_write(data->irq, msg); @@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = { .irq_ack= irq_chip_ack_parent, .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_compose_msi_msg= dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, .flags = IRQCHIP_SKIP_SET_WAKE, }; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode
From: Thomas Gleixner The UV x2apic is strictly using physical destination mode, but apic::dest_logical is initialized with APIC_DEST_LOGICAL. This does not matter much because UV does not use any of the generic functions which use apic::dest_logical, but is still inconsistent. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/x2apic_uv_x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 714233cee0b5..9ade9e6a95ff 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -811,7 +811,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = { .irq_dest_mode = 0, /* Physical */ .disable_esr= 0, - .dest_logical = APIC_DEST_LOGICAL, + .dest_logical = APIC_DEST_PHYSICAL, .check_apicid_used = NULL, .init_apic_ldr = uv_init_apic_ldr, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage
From: Thomas Gleixner All these functions are only used for logical destination mode. So reading the destination mode mask from the apic structure is a pointless exercise. Just hand in the proper constant: APIC_DEST_LOGICAL. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/apic_flat_64.c | 2 +- arch/x86/kernel/apic/ipi.c| 6 +++--- arch/x86/kernel/apic/x2apic_cluster.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c index fdd38a17f835..6df837fd5081 100644 --- a/arch/x86/kernel/apic/apic_flat_64.c +++ b/arch/x86/kernel/apic/apic_flat_64.c @@ -53,7 +53,7 @@ static void _flat_send_IPI_mask(unsigned long mask, int vector) unsigned long flags; local_irq_save(flags); - __default_send_IPI_dest_field(mask, vector, apic->dest_logical); + __default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL); local_irq_restore(flags); } diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c index 387154e39e08..d1fb874fbe64 100644 --- a/arch/x86/kernel/apic/ipi.c +++ b/arch/x86/kernel/apic/ipi.c @@ -260,7 +260,7 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, for_each_cpu(query_cpu, mask) __default_send_IPI_dest_field( early_per_cpu(x86_cpu_to_logical_apicid, query_cpu), - vector, apic->dest_logical); + vector, APIC_DEST_LOGICAL); local_irq_restore(flags); } @@ -279,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask, continue; __default_send_IPI_dest_field( early_per_cpu(x86_cpu_to_logical_apicid, query_cpu), - vector, apic->dest_logical); + vector, APIC_DEST_LOGICAL); } local_irq_restore(flags); } @@ -297,7 +297,7 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector) local_irq_save(flags); WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]); - __default_send_IPI_dest_field(mask, vector, apic->dest_logical); + __default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL); local_irq_restore(flags); } diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index 82fb43d807f7..f77e9fb7aac1 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -61,7 +61,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) if (!dest) continue; - __x2apic_send_IPI_dest(dest, vector, apic->dest_logical); + __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); /* Remove cluster CPUs from tmpmask */ cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask); } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs
From: Thomas Gleixner Use the msi_msg shadow structs and compose the message with named bitfields instead of the unreadable macro maze. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/pci/xen.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index c552cd2d0632..3d41a09c2c14 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi, #if defined(CONFIG_PCI_MSI) #include -#include struct xen_pci_frontend_ops *xen_pci_frontend; EXPORT_SYMBOL_GPL(xen_pci_frontend); @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) return ret; } -#define XEN_PIRQ_MSI_DATA (MSI_DATA_TRIGGER_EDGE | \ - MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0)) - static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq, struct msi_msg *msg) { - /* We set vector == 0 to tell the hypervisor we don't care about it, -* but we want a pirq setup instead. -* We use the dest_id field to pass the pirq that we want. */ - msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq); - msg->address_lo = - MSI_ADDR_BASE_LO | - MSI_ADDR_DEST_MODE_PHYSICAL | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DEST_ID(pirq); - - msg->data = XEN_PIRQ_MSI_DATA; + /* +* We set vector == 0 to tell the hypervisor we don't care about +* it, but we want a pirq setup instead. We use the dest_id fields +* to pass the pirq that we want. +*/ + memset(msg, 0, sizeof(*msg)); + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; + msg->arch_addr_hi.destid_8_31 = pirq >> 8; + msg->arch_addr_lo.destid_0_7 = pirq & 0xFF; + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT; } static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 28/35] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
From: David Woodhouse All possible parent domains have a select method now. Make use of it. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/kernel/apic/io_apic.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index ea983da1a57f..443d2c9086b9 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2320,36 +2320,37 @@ static inline void __init check_timer(void) static int mp_irqdomain_create(int ioapic) { - struct irq_alloc_info info; struct irq_domain *parent; int hwirqs = mp_ioapic_pin_count(ioapic); struct ioapic *ip = &ioapics[ioapic]; struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg; struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic); struct fwnode_handle *fn; - char *name = "IO-APIC"; + struct irq_fwspec fwspec; if (cfg->type == IOAPIC_DOMAIN_INVALID) return 0; - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; - info.devid = mpc_ioapic_id(ioapic); - parent = irq_remapping_get_irq_domain(&info); - if (!parent) - parent = x86_vector_domain; - else - name = "IO-APIC-IR"; - /* Handle device tree enumerated APICs proper */ if (cfg->dev) { fn = of_node_to_fwnode(cfg->dev); } else { - fn = irq_domain_alloc_named_id_fwnode(name, ioapic); + fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic); if (!fn) return -ENOMEM; } + fwspec.fwnode = fn; + fwspec.param_count = 1; + fwspec.param[0] = ioapic; + + parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY); + if (!parent) { + if (!cfg->dev) + irq_domain_free_fwnode(fn); + return -ENODEV; + } + ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops, (void *)(long)ioapic); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE
From: David Woodhouse Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4 of the address used in the resulting MSI cycle. Historically, the x86 MSI format only used the top 8 of those 16 bits as the destination APIC ID, and the "Extended Destination ID" in the lower 8 bits was unused. With interrupt remapping, the lowest bit of the Extended Destination ID (bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable format MSI. A hypervisor can use the other 7 bits of the Extended Destination ID to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest. No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR I/OAPIC domain. [ tglx: Converted it to the cleaned up entry/msi_msg format and added commentry ] Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/io_apic.h | 3 ++- arch/x86/kernel/apic/io_apic.c | 20 +++- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index 73da644b2f0d..437aa8d00e53 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -67,7 +67,8 @@ struct IO_APIC_route_entry { is_level: 1, masked : 1, reserved_0 : 15, - reserved_1 : 24, + reserved_1 : 17, + virt_destid_8_14: 7, destid_0_7 : 8; }; struct { diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 443d2c9086b9..1cfd65ef295b 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1238,9 +1238,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries) (entry.ir_index_15 << 15) | entry.ir_index_0_14, entry.ir_zero); } else { - printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf, + printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf, entry.dest_mode_logical ? "logical " : "physical", - entry.destid_0_7, entry.delivery_mode); + entry.virt_destid_8_14, entry.destid_0_7, + entry.delivery_mode); } } } @@ -1409,6 +1410,7 @@ void native_restore_boot_irq_mode(void) */ if (ioapic_i8259.pin != -1) { struct IO_APIC_route_entry entry; + u32 apic_id = read_apic_id(); memset(&entry, 0, sizeof(entry)); entry.masked= false; @@ -1416,7 +1418,8 @@ void native_restore_boot_irq_mode(void) entry.active_low= false; entry.dest_mode_logical = false; entry.delivery_mode = APIC_DELIVERY_MODE_EXTINT; - entry.destid_0_7= read_apic_id(); + entry.destid_0_7= apic_id & 0xFF; + entry.virt_destid_8_14 = apic_id >> 8; /* * Add it to the IO-APIC irq-routing table: @@ -1885,7 +1888,11 @@ static void ioapic_setup_msg_from_msi(struct irq_data *irq_data, /* DMAR/IR: 1, 0 for all other modes */ entry->ir_format= msg.arch_addr_lo.dmar_format; /* -* DMAR/IR: index bit 0-14. +* - DMAR/IR: index bit 0-14. +* +* - Virt: If the host supports x2apic without a virtualized IR +* unit then bit 0-6 of dmar_index_0_14 are providing bit +* 8-14 of the destination id. * * All other modes have bit 0-6 of dmar_index_0_14 cleared and the * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7). @@ -2063,6 +2070,7 @@ static inline void __init unlock_ExtINT_logic(void) int apic, pin, i; struct IO_APIC_route_entry entry0, entry1; unsigned char save_control, save_freq_select; + u32 apic_id; pin = find_isa_irq_pin(8, mp_INT); if (pin == -1) { @@ -2078,11 +2086,13 @@ static inline void __init unlock_ExtINT_logic(void) entry0 = ioapic_read_entry(apic, pin); clear_IO_APIC_pin(apic, pin); + apic_id = hard_smp_processor_id(); memset(&entry1, 0, sizeof(entry1)); entry1.dest_mode_logical= true; entry1.masked = false; - entry1.destid_0_7 = hard_smp_processor_i
[PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
From: David Woodhouse If the 15-bit APIC ID support is present in emulated MSI then there's no need for the pseudo-remapping support. Signed-off-by: David Woodhouse --- drivers/iommu/hyperv-iommu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index a629a6be65c7..9438daa24fdb 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -121,6 +121,7 @@ static int __init hyperv_prepare_irq_remapping(void) int i; if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) || + x86_init.hyper.msi_ext_dest_id() || !x2apic_supported()) return -ENODEV; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 05/35] x86/apic: Cleanup delivery mode defines
From: Thomas Gleixner The enum ioapic_irq_destination_types and the enumerated constants starting with 'dest_' are gross misnomers because they describe the delivery mode. Rename then enum and the constants so they actually make sense. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h | 3 ++- arch/x86/include/asm/apicdef.h| 16 +++- arch/x86/kernel/apic/apic_flat_64.c | 4 ++-- arch/x86/kernel/apic/apic_noop.c | 2 +- arch/x86/kernel/apic/apic_numachip.c | 4 ++-- arch/x86/kernel/apic/bigsmp_32.c | 2 +- arch/x86/kernel/apic/io_apic.c| 11 ++- arch/x86/kernel/apic/probe_32.c | 2 +- arch/x86/kernel/apic/x2apic_cluster.c | 2 +- arch/x86/kernel/apic/x2apic_phys.c| 2 +- arch/x86/kernel/apic/x2apic_uv_x.c| 6 +++--- arch/x86/platform/uv/uv_irq.c | 2 +- drivers/iommu/amd/iommu.c | 4 ++-- drivers/iommu/intel/irq_remapping.c | 2 +- drivers/pci/controller/pci-hyperv.c | 6 +++--- 15 files changed, 34 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index b0fd204e0023..53bb62e0fdd5 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -309,7 +309,8 @@ struct apic { /* dest_logical is used by the IPI functions */ u32 dest_logical; u32 disable_esr; - u32 irq_delivery_mode; + + enum apic_delivery_modes delivery_mode; u32 irq_dest_mode; u32 (*calc_dest_apicid)(unsigned int cpu); diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h index 05e694ed8386..5716f22f81ac 100644 --- a/arch/x86/include/asm/apicdef.h +++ b/arch/x86/include/asm/apicdef.h @@ -432,15 +432,13 @@ struct local_apic { #define BAD_APICID 0xu #endif -enum ioapic_irq_destination_types { - dest_Fixed = 0, - dest_LowestPrio = 1, - dest_SMI= 2, - dest__reserved_1= 3, - dest_NMI= 4, - dest_INIT = 5, - dest__reserved_2= 6, - dest_ExtINT = 7 +enum apic_delivery_modes { + APIC_DELIVERY_MODE_FIXED= 0, + APIC_DELIVERY_MODE_LOWESTPRIO = 1, + APIC_DELIVERY_MODE_SMI = 2, + APIC_DELIVERY_MODE_NMI = 4, + APIC_DELIVERY_MODE_INIT = 5, + APIC_DELIVERY_MODE_EXTINT = 7, }; #endif /* _ASM_X86_APICDEF_H */ diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c index 7862b152a052..fdd38a17f835 100644 --- a/arch/x86/kernel/apic/apic_flat_64.c +++ b/arch/x86/kernel/apic/apic_flat_64.c @@ -113,7 +113,7 @@ static struct apic apic_flat __ro_after_init = { .apic_id_valid = default_apic_id_valid, .apic_id_registered = flat_apic_id_registered, - .irq_delivery_mode = dest_Fixed, + .delivery_mode = APIC_DELIVERY_MODE_FIXED, .irq_dest_mode = 1, /* logical */ .disable_esr= 0, @@ -206,7 +206,7 @@ static struct apic apic_physflat __ro_after_init = { .apic_id_valid = default_apic_id_valid, .apic_id_registered = flat_apic_id_registered, - .irq_delivery_mode = dest_Fixed, + .delivery_mode = APIC_DELIVERY_MODE_FIXED, .irq_dest_mode = 0, /* physical */ .disable_esr= 0, diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c index 780c702969b7..4fc934b11851 100644 --- a/arch/x86/kernel/apic/apic_noop.c +++ b/arch/x86/kernel/apic/apic_noop.c @@ -95,7 +95,7 @@ struct apic apic_noop __ro_after_init = { .apic_id_valid = default_apic_id_valid, .apic_id_registered = noop_apic_id_registered, - .irq_delivery_mode = dest_Fixed, + .delivery_mode = APIC_DELIVERY_MODE_FIXED, /* logical delivery broadcast to all CPUs: */ .irq_dest_mode = 1, diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c index 35edd57f064a..db715d082ec9 100644 --- a/arch/x86/kernel/apic/apic_numachip.c +++ b/arch/x86/kernel/apic/apic_numachip.c @@ -246,7 +246,7 @@ static const struct apic apic_numachip1 __refconst = { .apic_id_valid = numachip_apic_id_valid, .apic_id_registered = numachip_apic_id_registered, - .irq_delivery_mode = dest_Fixed, + .delivery_mode = APIC_DELIVERY_MODE_FIXED, .irq_dest_mode = 0, /* physical */ .disable_esr= 0, @@ -295,7 +295,7 @@ static const struct apic apic_numachip2 __refconst = { .api
[PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
From: David Woodhouse Prerequesite to make x86 more irqdomain compliant. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- kernel/irq/irqdomain.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index cf8b374b892d..673fa64c1c44 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { } static inline void debugfs_remove_domain_dir(struct irq_domain *d) { } #endif -const struct fwnode_operations irqchip_fwnode_ops; +static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode) +{ + struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode); + + return fwid->name; +} + +const struct fwnode_operations irqchip_fwnode_ops = { + .get_name = irqchip_fwnode_get_name, +}; EXPORT_SYMBOL_GPL(irqchip_fwnode_ops); /** -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 18/35] x86/msi: Remove msidef.h
From: Thomas Gleixner Nothing uses the macro maze anymore. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/msidef.h | 57 --- 1 file changed, 57 deletions(-) delete mode 100644 arch/x86/include/asm/msidef.h diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h deleted file mode 100644 index ee2f8ccc32d0.. --- a/arch/x86/include/asm/msidef.h +++ /dev/null @@ -1,57 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_X86_MSIDEF_H -#define _ASM_X86_MSIDEF_H - -/* - * Constants for Intel APIC based MSI messages. - */ - -/* - * Shifts for MSI data - */ - -#define MSI_DATA_VECTOR_SHIFT 0 -#define MSI_DATA_VECTOR_MASK 0x00ff -#define MSI_DATA_VECTOR(v) (((v) << MSI_DATA_VECTOR_SHIFT) & \ -MSI_DATA_VECTOR_MASK) - -#define MSI_DATA_DELIVERY_MODE_SHIFT 8 -#define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) -#define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) - -#define MSI_DATA_LEVEL_SHIFT 14 -#define MSI_DATA_LEVEL_DEASSERT(0 << MSI_DATA_LEVEL_SHIFT) -#define MSI_DATA_LEVEL_ASSERT (1 << MSI_DATA_LEVEL_SHIFT) - -#define MSI_DATA_TRIGGER_SHIFT 15 -#define MSI_DATA_TRIGGER_EDGE (0 << MSI_DATA_TRIGGER_SHIFT) -#define MSI_DATA_TRIGGER_LEVEL(1 << MSI_DATA_TRIGGER_SHIFT) - -/* - * Shift/mask fields for msi address - */ - -#define MSI_ADDR_BASE_HI 0 -#define MSI_ADDR_BASE_LO 0xfee0 - -#define MSI_ADDR_DEST_MODE_SHIFT 2 -#define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) -#define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) - -#define MSI_ADDR_REDIRECTION_SHIFT 3 -#define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) - /* dedicated cpu */ -#define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT) - /* lowest priority */ - -#define MSI_ADDR_DEST_ID_SHIFT 12 -#define MSI_ADDR_DEST_ID_MASK 0x000 -#define MSI_ADDR_DEST_ID(dest)(((dest) << MSI_ADDR_DEST_ID_SHIFT) & \ -MSI_ADDR_DEST_ID_MASK) -#define MSI_ADDR_EXT_DEST_ID(dest) ((dest) & 0xff00) - -#define MSI_ADDR_IR_EXT_INT(1 << 4) -#define MSI_ADDR_IR_SHV(1 << 3) -#define MSI_ADDR_IR_INDEX1(index) ((index & 0x8000) >> 13) -#define MSI_ADDR_IR_INDEX2(index) ((index & 0x7fff) << 5) -#endif /* _ASM_X86_MSIDEF_H */ -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields
Fix the conditions for enabling x2apic on guests without interrupt remapping, and support 15-bit Extended Destination ID to allow 32768 CPUs without IR on hypervisors that support it. Make the I/OAPIC code generate its RTE directly from the MSI message created by the parent irqchip, and fix up a bunch of magic mask/shift macros to use bitfields for MSI messages and I/OAPIC RTEs while we're at it. v3: • Lots of bitfield cleanups from Thomas. • Disable hyperv-iommu if 15-bit extension is present. • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream through KVM tree (and the other half needs to wait, or have an #ifdef) so is left at the top of the tree. v2: • Minor cleanups. • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static. • Generate I/OAPIC RTE directly from parent irqchip's MSI messages. • Clean up HPET MSI support into hpet.c now that we can. David Woodhouse (19): x86/apic: Fix x2apic enablement without interrupt remapping x86/msi: Only use high bits of MSI address for DMAR unit x86/apic: Always provide irq_compose_msi_msg() method for vector domain x86/hpet: Move MSI support into hpet.c x86/ioapic: Generate RTE directly from parent irqchip's MSI message genirq/irqdomain: Implement get_name() method on irqchip fwnodes x86/apic: Add select() method on vector irqdomain iommu/amd: Implement select() method on remapping irqdomain iommu/vt-d: Implement select() method on remapping irqdomain iommu/hyper-v: Implement select() method on remapping irqdomain x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain x86: Kill all traces of irq_remapping_get_irq_domain() iommu/vt-d: Simplify intel_irq_remapping_select() x86/ioapic: Handle Extended Destination ID field in RTE x86/apic: Support 15 bits of APIC ID in MSI where available iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected Thomas Gleixner (16): x86/apic/uv: Fix inconsistent destination mode x86/devicetree: Fix the ioapic interrupt type table x86/apic: Cleanup delivery mode defines x86/apic: Replace pointless apic::dest_logical usage x86/apic: Get rid of apic::dest_logical x86/apic: Cleanup destination mode genirq/msi: Allow shadow declarations of msi_msg::$member x86/msi: Provide msi message shadow structs iommu/intel: Use msi_msg shadow structs iommu/amd: Use msi_msg shadow structs PCI: vmd: Use msi_msg shadow structs x86/kvm: Use msi_msg shadow structs x86/pci/xen: Use msi_msg shadow structs x86/msi: Remove msidef.h x86/io_apic: Cleanup trigger/polarity helpers x86/ioapic: Cleanup IO/APIC route entry structs Documentation/virt/kvm/cpuid.rst | 4 + arch/x86/include/asm/apic.h | 16 +- arch/x86/include/asm/apicdef.h| 16 +- arch/x86/include/asm/hpet.h | 11 - arch/x86/include/asm/hw_irq.h | 13 +- arch/x86/include/asm/io_apic.h| 79 ++ arch/x86/include/asm/irq_remapping.h | 9 - arch/x86/include/asm/irqdomain.h | 3 + arch/x86/include/asm/msi.h| 50 arch/x86/include/asm/msidef.h | 57 arch/x86/include/asm/x86_init.h | 2 + arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/apic/apic.c | 73 - arch/x86/kernel/apic/apic_flat_64.c | 18 +- arch/x86/kernel/apic/apic_noop.c | 10 +- arch/x86/kernel/apic/apic_numachip.c | 16 +- arch/x86/kernel/apic/bigsmp_32.c | 9 +- arch/x86/kernel/apic/io_apic.c| 503 ++ arch/x86/kernel/apic/ipi.c| 6 +- arch/x86/kernel/apic/msi.c| 153 +-- arch/x86/kernel/apic/probe_32.c | 9 +- arch/x86/kernel/apic/vector.c | 49 arch/x86/kernel/apic/x2apic_cluster.c | 10 +- arch/x86/kernel/apic/x2apic_phys.c| 17 +- arch/x86/kernel/apic/x2apic_uv_x.c| 12 +- arch/x86/kernel/devicetree.c | 30 +- arch/x86/kernel/hpet.c| 122 - arch/x86/kernel/kvm.c | 6 + arch/x86/kernel/smpboot.c | 8 +- arch/x86/kernel/x86_init.c| 1 + arch/x86/kvm/irq_comm.c | 31 +-- arch/x86/pci/intel_mid_pci.c | 8 +- arch/x86/pci/xen.c| 26 +- arch/x86/platform/uv/uv_irq.c | 4 +- arch/x86/xen/apic.c | 7 +- drivers/iommu/amd/amd_iommu_types.h | 2 +- drivers/iommu/amd/init.c | 46 ++-- drivers/iommu/amd/iommu.c | 93 +++ drivers/iommu
[PATCH v3 12/35] x86/msi: Provide msi message shadow structs
From: Thomas Gleixner Create shadow structs with named bitfields for msi_msg data, address_lo and address_hi and use them in the MSI message composer. Provide a function to retrieve the destination ID. This could be inline, but that'd create a circular header dependency. [dwmw2: fix bitfields not all to be a union] Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/msi.h | 49 + arch/x86/kernel/apic/apic.c | 35 ++ 2 files changed, 68 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h index cd30013d15d3..322fd905da9c 100644 --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -9,4 +9,53 @@ typedef struct irq_alloc_info msi_alloc_info_t; int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg); +/* Structs and defines for the X86 specific MSI message format */ + +typedef struct x86_msi_data { + u32 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + reserved: 2, + active_low : 1, + is_level: 1; + + u32 dmar_subhandle; +} __attribute__ ((packed)) arch_msi_msg_data_t; +#define arch_msi_msg_data x86_msi_data + +typedef struct x86_msi_addr_lo { + union { + struct { + u32 reserved_0 : 2, + dest_mode_logical : 1, + redirect_hint : 1, + reserved_1 : 8, + destid_0_7 : 8, + base_address: 12; + }; + struct { + u32 dmar_reserved_0 : 2, + dmar_index_15 : 1, + dmar_subhandle_valid: 1, + dmar_format : 1, + dmar_index_0_14 : 15, + dmar_base_address : 12; + }; + }; +} __attribute__ ((packed)) arch_msi_msg_addr_lo_t; +#define arch_msi_msg_addr_lo x86_msi_addr_lo + +#define X86_MSI_BASE_ADDRESS_LOW (0xfee0 >> 20) + +typedef struct x86_msi_addr_hi { + u32 reserved: 8, + destid_8_31 : 24; +} __attribute__ ((packed)) arch_msi_msg_addr_hi_t; +#define arch_msi_msg_addr_hi x86_msi_addr_hi + +#define X86_MSI_BASE_ADDRESS_HIGH (0) + +struct msi_msg; +u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid); + #endif /* _ASM_X86_MSI_H */ diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 4c15bf29ea2c..f7196ee0f005 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -50,7 +50,6 @@ #include #include #include -#include #include #include #include @@ -2484,22 +2483,16 @@ int hard_smp_processor_id(void) void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, bool dmar) { - msg->address_hi = MSI_ADDR_BASE_HI; + memset(msg, 0, sizeof(*msg)); - msg->address_lo = - MSI_ADDR_BASE_LO | - (apic->dest_mode_logical ? - MSI_ADDR_DEST_MODE_LOGICAL : - MSI_ADDR_DEST_MODE_PHYSICAL) | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DEST_ID(cfg->dest_apicid); + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical; + msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF; - msg->data = - MSI_DATA_TRIGGER_EDGE | - MSI_DATA_LEVEL_ASSERT | - MSI_DATA_DELIVERY_FIXED | - MSI_DATA_VECTOR(cfg->vector); + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_FIXED; + msg->arch_data.vector = cfg->vector; + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; /* * Only the IOMMU itself can use the trick of putting destination * APIC ID into the high bits of the address. Anything else would @@ -2507,11 +2500,21 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, * address higher APIC IDs. */ if (dmar) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8; else - WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); + WARN_ON_ONCE(cfg->dest_apicid > 0xFF); } +u32 x86_msi_msg_get_des
[PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available
From: David Woodhouse Some hypervisors can allow the guest to use the Extended Destination ID field in the MSI address to address up to 32768 CPUs. This applies to all downstream devices which generate MSI cycles, including HPET, I/OAPIC and PCI MSI. HPET and PCI MSI use the same __irq_msi_compose_msg() function, while I/OAPIC generates its own and had support for the extended bits added in a previous commit. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20201009104616.1314746-6-dw...@infradead.org --- arch/x86/include/asm/msi.h | 3 ++- arch/x86/include/asm/x86_init.h | 2 ++ arch/x86/kernel/apic/apic.c | 26 -- arch/x86/kernel/x86_init.c | 1 + 4 files changed, 29 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h index 322fd905da9c..b85147d75626 100644 --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -29,7 +29,8 @@ typedef struct x86_msi_addr_lo { u32 reserved_0 : 2, dest_mode_logical : 1, redirect_hint : 1, - reserved_1 : 8, + reserved_1 : 1, + virt_destid_8_14: 7, destid_0_7 : 8, base_address: 12; }; diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index dde5b3f1e7cd..5c69f7eb5d47 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -116,6 +116,7 @@ struct x86_init_pci { * @init_platform: platform setup * @guest_late_init: guest late init * @x2apic_available: X2APIC detection + * @msi_ext_dest_id: MSI supports 15-bit APIC IDs * @init_mem_mapping: setup early mappings during init_mem_mapping() * @init_after_bootmem:guest init after boot allocator is finished */ @@ -123,6 +124,7 @@ struct x86_hyper_init { void (*init_platform)(void); void (*guest_late_init)(void); bool (*x2apic_available)(void); + bool (*msi_ext_dest_id)(void); void (*init_mem_mapping)(void); void (*init_after_bootmem)(void); }; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index f7196ee0f005..6bd20c0de8bc 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -93,6 +93,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID; */ static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP; +/* + * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID + */ +static bool virt_ext_dest_id __ro_after_init; + /* * Map cpu index to physical APIC ID */ @@ -1841,6 +1846,8 @@ static __init void try_to_enable_x2apic(int remap_mode) return; if (remap_mode != IRQ_REMAP_X2APIC_MODE) { + u32 apic_limit = 255; + /* * Using X2APIC without IR is not architecturally supported * on bare metal but may be supported in guests. @@ -1851,12 +1858,22 @@ static __init void try_to_enable_x2apic(int remap_mode) return; } + /* +* If the hypervisor supports extended destination ID in +* MSI, that increases the maximum APIC ID that can be +* used for non-remapped IRQ domains. +*/ + if (x86_init.hyper.msi_ext_dest_id()) { + virt_ext_dest_id = 1; + apic_limit = 32767; + } + /* * Without IR, all CPUs can be addressed by IOAPIC/MSI only * in physical mode, and CPUs with an APIC ID that cannnot * be addressed must not be brought online. */ - x2apic_set_max_apicid(255); + x2apic_set_max_apicid(apic_limit); x2apic_phys = 1; } x2apic_enable(); @@ -2497,10 +2514,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, * Only the IOMMU itself can use the trick of putting destination * APIC ID into the high bits of the address. Anything else would * just be writing to memory if it tried that, and needs IR to -* address higher APIC IDs. +* address APICs which can't be addressed in the normal 32-bit +* address range at 0xFFEx. That is typically just 8 bits, but +* some hypervisors allow the extended destination ID field in bits +* 5-11 to be used, giving support for 15 bits of APIC IDs in total. */ if (dmar) msg->arch_addr_hi.destid_8_
[PATCH v3 14/35] iommu/amd: Use msi_msg shadow structs
From: Thomas Gleixner Get rid of the macro mess and use the shadow structs for the x86 specific MSI message format. Convert the intcapxt setup to use named bitfields as well while touching it anyway. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- drivers/iommu/amd/init.c | 46 ++- drivers/iommu/amd/iommu.c | 14 +++- 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 82e4af8f09bb..263670d36fed 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -1966,10 +1965,16 @@ static int iommu_setup_msi(struct amd_iommu *iommu) return 0; } -#define XT_INT_DEST_MODE(x)(((x) & 0x1ULL) << 2) -#define XT_INT_DEST_LO(x) (((x) & 0xFFULL) << 8) -#define XT_INT_VEC(x) (((x) & 0xFFULL) << 32) -#define XT_INT_DEST_HI(x) x) >> 24) & 0xFFULL) << 56) +union intcapxt { + u64 capxt; + u64 reserved_0 : 2, + dest_mode_logical : 1, + reserved_1 : 5, + destid_0_23 : 24, + vector : 8, + reserved_2 : 16, + destid_24_31: 8; +} __attribute__ ((packed)); /* * Setup the IntCapXT registers with interrupt routing information @@ -1978,28 +1983,29 @@ static int iommu_setup_msi(struct amd_iommu *iommu) */ static void iommu_update_intcapxt(struct amd_iommu *iommu) { - u64 val; - u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET); - u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET); - u32 data= readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET); - bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1; - u32 dest= ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF); + struct msi_msg msg; + union intcapxt xt; + u32 destid; - if (x2apic_enabled()) - dest |= MSI_ADDR_EXT_DEST_ID(addr_hi); + msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET); + msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET); + msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET); - val = XT_INT_VEC(data & 0xFF) | - XT_INT_DEST_MODE(dm) | - XT_INT_DEST_LO(dest) | - XT_INT_DEST_HI(dest); + destid = x86_msi_msg_get_destid(&msg, x2apic_enabled()); + + xt.capxt = 0ULL; + xt.dest_mode_logical = msg.arch_data.dest_mode_logical; + xt.vector = msg.arch_data.vector; + xt.destid_0_23 = destid & GENMASK(23, 0); + xt.destid_24_31 = destid >> 24; /** * Current IOMMU implemtation uses the same IRQ for all * 3 IOMMU interrupts. */ - writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET); - writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET); - writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET); + writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET); + writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET); + writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET); } static void _irq_notifier_notify(struct irq_affinity_notify *notify, diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index d7f0c8908602..473de0920b64 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -35,7 +35,6 @@ #include #include #include -#include #include #include #include @@ -3656,13 +3655,20 @@ struct irq_remap_ops amd_iommu_irq_ops = { .get_irq_domain = get_irq_domain, }; +static void fill_msi_msg(struct msi_msg *msg, u32 index) +{ + msg->data = index; + msg->address_lo = 0; + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; +} + static void irq_remapping_prepare_irte(struct amd_ir_data *data, struct irq_cfg *irq_cfg, struct irq_alloc_info *info, int devid, int index, int sub_handle) { struct irq_2_irte *irte_info = &data->irq_2_irte; - struct msi_msg *msg = &data->msi_entry; struct IO_APIC_route_entry *entry; struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; @@ -3693,9 +3699,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - msg->address_hi = MSI_ADDR_BASE_HI; -
[PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical
From: Thomas Gleixner struct apic has two members which store information about the destination mode: dest_logical and irq_dest_mode. dest_logical contains a mask which was historically used to set the destination mode in IPI messages. Over time the usage was reduced and the logical/physical functions were seperated. There are only a few places which still use 'dest_logical' but they can ues 'irq_dest_mode' instead. irq_dest_mode is actually a boolean where 0 means physical destination mode and 1 means logical destination mode. Of course the name does not reflect the functionality. This will be cleaned up in a subsequent change. Remove apic::dest_logical and fixup the remaining users. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h | 2 -- arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/apic/apic_flat_64.c | 8 ++-- arch/x86/kernel/apic/apic_noop.c | 4 +--- arch/x86/kernel/apic/apic_numachip.c | 8 ++-- arch/x86/kernel/apic/bigsmp_32.c | 4 +--- arch/x86/kernel/apic/probe_32.c | 4 +--- arch/x86/kernel/apic/x2apic_cluster.c | 4 +--- arch/x86/kernel/apic/x2apic_phys.c| 4 +--- arch/x86/kernel/apic/x2apic_uv_x.c| 4 +--- arch/x86/kernel/smpboot.c | 5 +++-- arch/x86/xen/apic.c | 4 +--- 12 files changed, 15 insertions(+), 38 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 53bb62e0fdd5..019d7ac3b16e 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -306,8 +306,6 @@ struct apic { void(*send_IPI_all)(int vector); void(*send_IPI_self)(int vector); - /* dest_logical is used by the IPI functions */ - u32 dest_logical; u32 disable_esr; enum apic_delivery_modes delivery_mode; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 113f6ca7b828..29d28b34cb2f 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1591,7 +1591,7 @@ static void setup_local_APIC(void) apic->init_apic_ldr(); #ifdef CONFIG_X86_32 - if (apic->dest_logical) { + if (apic->irq_dest_mode == 1) { int logical_apicid, ldr_apicid; /* diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c index 6df837fd5081..bbb1b89fe711 100644 --- a/arch/x86/kernel/apic/apic_flat_64.c +++ b/arch/x86/kernel/apic/apic_flat_64.c @@ -117,11 +117,9 @@ static struct apic apic_flat __ro_after_init = { .irq_dest_mode = 1, /* logical */ .disable_esr= 0, - .dest_logical = APIC_DEST_LOGICAL, - .check_apicid_used = NULL, + .check_apicid_used = NULL, .init_apic_ldr = flat_init_apic_ldr, - .ioapic_phys_id_map = NULL, .setup_apic_routing = NULL, .cpu_present_to_apicid = default_cpu_present_to_apicid, @@ -210,11 +208,9 @@ static struct apic apic_physflat __ro_after_init = { .irq_dest_mode = 0, /* physical */ .disable_esr= 0, - .dest_logical = 0, - .check_apicid_used = NULL, + .check_apicid_used = NULL, .init_apic_ldr = physflat_init_apic_ldr, - .ioapic_phys_id_map = NULL, .setup_apic_routing = NULL, .cpu_present_to_apicid = default_cpu_present_to_apicid, diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c index 4fc934b11851..38f167ce5031 100644 --- a/arch/x86/kernel/apic/apic_noop.c +++ b/arch/x86/kernel/apic/apic_noop.c @@ -100,11 +100,9 @@ struct apic apic_noop __ro_after_init = { .irq_dest_mode = 1, .disable_esr= 0, - .dest_logical = APIC_DEST_LOGICAL, - .check_apicid_used = default_check_apicid_used, + .check_apicid_used = default_check_apicid_used, .init_apic_ldr = noop_init_apic_ldr, - .ioapic_phys_id_map = default_ioapic_phys_id_map, .setup_apic_routing = NULL, diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c index db715d082ec9..4ebf9fe2c95d 100644 --- a/arch/x86/kernel/apic/apic_numachip.c +++ b/arch/x86/kernel/apic/apic_numachip.c @@ -250,11 +250,9 @@ static const struct apic apic_numachip1 __refconst = { .irq_dest_mode = 0, /* physical */ .disable_esr= 0, - .dest_logical = 0, - .check_apicid_used = NULL, + .check_apicid_used = NULL, .init_apic_ldr
[PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table
From: Thomas Gleixner The ioapic interrupt type table is wrong as it assumes that polarity in IO/APIC context means active high when set. But the IO/APIC polarity is working the other way round. This works because the ordering of the entries is consistent with the device tree and the type information is not used by the IO/APIC interrupt chip. The whole trigger and polarity business of IO/APIC is misleading and the corresponding constants which are defined as 0/1 are not used consistently and are going to be removed. Rename the type table members to 'is_level' and 'active_low' and adjust the type information for consistency sake. No functional change. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/kernel/devicetree.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c index ddffd80f5c52..6a4cb71c2498 100644 --- a/arch/x86/kernel/devicetree.c +++ b/arch/x86/kernel/devicetree.c @@ -184,31 +184,31 @@ static unsigned int ioapic_id; struct of_ioapic_type { u32 out_type; - u32 trigger; - u32 polarity; + u32 is_level; + u32 active_low; }; static struct of_ioapic_type of_ioapic_type[] = { { - .out_type = IRQ_TYPE_EDGE_RISING, - .trigger= IOAPIC_EDGE, - .polarity = 1, + .out_type = IRQ_TYPE_EDGE_FALLING, + .is_level = 0, + .active_low = 1, }, { - .out_type = IRQ_TYPE_LEVEL_LOW, - .trigger= IOAPIC_LEVEL, - .polarity = 0, + .out_type = IRQ_TYPE_LEVEL_HIGH, + .is_level = 1, + .active_low = 0, }, { - .out_type = IRQ_TYPE_LEVEL_HIGH, - .trigger= IOAPIC_LEVEL, - .polarity = 1, + .out_type = IRQ_TYPE_LEVEL_LOW, + .is_level = 1, + .active_low = 1, }, { - .out_type = IRQ_TYPE_EDGE_FALLING, - .trigger= IOAPIC_EDGE, - .polarity = 0, + .out_type = IRQ_TYPE_EDGE_RISING, + .is_level = 0, + .active_low = 0, }, }; @@ -228,7 +228,7 @@ static int dt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, return -EINVAL; it = &of_ioapic_type[type_index]; - ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity); + ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->is_level, it->active_low); tmp.devid = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain)); tmp.ioapic.pin = fwspec->param[0]; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping
From: David Woodhouse Currently, Linux as a hypervisor guest will enable x2apic only if there are no CPUs present at boot time with an APIC ID above 255. Hotplugging a CPU later with a higher APIC ID would result in a CPU which cannot be targeted by external interrupts. Add a filter in x2apic_apic_id_valid() which can be used to prevent such CPUs from coming online, and allow x2apic to be enabled even if they are present at boot time. Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM") Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/apic.h| 1 + arch/x86/kernel/apic/apic.c| 14 -- arch/x86/kernel/apic/x2apic_phys.c | 9 + 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 1c129abb7f09..b0fd204e0023 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void) extern int x2apic_mode; extern int x2apic_phys; +extern void __init x2apic_set_max_apicid(u32 apicid); extern void __init check_x2apic(void); extern void x2apic_setup(void); static inline int x2apic_enabled(void) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b3eef1d5c903..113f6ca7b828 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode) return; if (remap_mode != IRQ_REMAP_X2APIC_MODE) { - /* IR is required if there is APIC ID > 255 even when running -* under KVM + /* +* Using X2APIC without IR is not architecturally supported +* on bare metal but may be supported in guests. */ - if (max_physical_apicid > 255 || - !x86_init.hyper.x2apic_available()) { + if (!x86_init.hyper.x2apic_available()) { pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); x2apic_disable(); return; } /* -* without IR all CPUs can be addressed by IOAPIC/MSI -* only in physical mode +* Without IR, all CPUs can be addressed by IOAPIC/MSI only +* in physical mode, and CPUs with an APIC ID that cannnot +* be addressed must not be brought online. */ + x2apic_set_max_apicid(255); x2apic_phys = 1; } x2apic_enable(); diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index bc9693841353..e14eae6d6ea7 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -8,6 +8,12 @@ int x2apic_phys; static struct apic apic_x2apic_phys; +static u32 x2apic_max_apicid __ro_after_init; + +void __init x2apic_set_max_apicid(u32 apicid) +{ + x2apic_max_apicid = apicid; +} static int __init set_x2apic_phys_mode(char *arg) { @@ -98,6 +104,9 @@ static int x2apic_phys_probe(void) /* Common x2apic functions, also used by x2apic_cluster */ int x2apic_apic_id_valid(u32 apicid) { + if (x2apic_max_apicid && apicid > x2apic_max_apicid) + return 0; + return 1; } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs
From: Thomas Gleixner Use the x86 shadow structs in msi_msg instead of the macros. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- drivers/pci/controller/vmd.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index aa1b12bac9a1..72de3c6f644e 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -18,7 +18,6 @@ #include #include #include -#include #define VMD_CFGBAR 0 #define VMD_MEMBAR12 @@ -131,10 +130,10 @@ static void vmd_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) struct vmd_irq_list *irq = vmdirq->irq; struct vmd_dev *vmd = irq_data_get_irq_handler_data(data); - msg->address_hi = MSI_ADDR_BASE_HI; - msg->address_lo = MSI_ADDR_BASE_LO | - MSI_ADDR_DEST_ID(index_from_irqs(vmd, irq)); - msg->data = 0; + memset(&msg, 0, sizeof(*msg); + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH; + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; + msg->arch_addr_lo.destid_0_7 = index_from_irqs(vmd, irq); } /* -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain
From: David Woodhouse This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver interrupts through MSI without having any PCI in the system at all. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/apic.h | 8 +++- arch/x86/kernel/apic/apic.c | 32 ++ arch/x86/kernel/apic/msi.c| 37 --- arch/x86/kernel/apic/vector.c | 6 ++ 4 files changed, 41 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 652f62252349..b489bb9894ae 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -520,12 +520,10 @@ static inline void apic_smt_update(void) { } #endif struct msi_msg; +struct irq_cfg; -#ifdef CONFIG_PCI_MSI -void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg); -#else -# define x86_vector_msi_compose_msg NULL -#endif +extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar); extern void ioapic_zap_locks(void); diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 54f04355aaa2..4c15bf29ea2c 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void) return read_apic_id(); } +void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar) +{ + msg->address_hi = MSI_ADDR_BASE_HI; + + msg->address_lo = + MSI_ADDR_BASE_LO | + (apic->dest_mode_logical ? + MSI_ADDR_DEST_MODE_LOGICAL : + MSI_ADDR_DEST_MODE_PHYSICAL) | + MSI_ADDR_REDIRECTION_CPU | + MSI_ADDR_DEST_ID(cfg->dest_apicid); + + msg->data = + MSI_DATA_TRIGGER_EDGE | + MSI_DATA_LEVEL_ASSERT | + MSI_DATA_DELIVERY_FIXED | + MSI_DATA_VECTOR(cfg->vector); + + /* +* Only the IOMMU itself can use the trick of putting destination +* APIC ID into the high bits of the address. Anything else would +* just be writing to memory if it tried that, and needs IR to +* address higher APIC IDs. +*/ + if (dmar) + msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else + WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); +} + /* * Override the generic EOI implementation with an optimized version. * Only called during early boot when only one CPU is active and with diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 46ffd41a4238..4eda617eda1e 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -23,42 +22,6 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; -static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, - bool dmar) -{ - msg->address_hi = MSI_ADDR_BASE_HI; - - msg->address_lo = - MSI_ADDR_BASE_LO | - (apic->dest_mode_logical ? - MSI_ADDR_DEST_MODE_LOGICAL : - MSI_ADDR_DEST_MODE_PHYSICAL) | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DEST_ID(cfg->dest_apicid); - - msg->data = - MSI_DATA_TRIGGER_EDGE | - MSI_DATA_LEVEL_ASSERT | - MSI_DATA_DELIVERY_FIXED | - MSI_DATA_VECTOR(cfg->vector); - - /* -* Only the IOMMU itself can use the trick of putting destination -* APIC ID into the high bits of the address. Anything else would -* just be writing to memory if it tried that, and needs IR to -* address higher APIC IDs. -*/ - if (dmar) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); - else - WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); -} - -void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) -{ - __irq_msi_compose_msg(irqd_cfg(data), msg, false); -} - static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg) { struct msi_msg msg[2] = { [1] = { }, }; diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 1eac53632786..bb2e2a2488a5 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd) apic_ack_irq(irqd); } +static void x86_vector_msi_compose_msg(struct irq_data *data, + struct msi_msg *msg) +{ + __irq_msi_c
[PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain
From: David Woodhouse This will be used to select the irqdomain for I/O-APIC and HPET. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/irqdomain.h | 3 +++ arch/x86/kernel/apic/vector.c| 43 2 files changed, 46 insertions(+) diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h index cd684d45cb5f..125c23b7bad3 100644 --- a/arch/x86/include/asm/irqdomain.h +++ b/arch/x86/include/asm/irqdomain.h @@ -12,6 +12,9 @@ enum { X86_IRQ_ALLOC_LEGACY= 0x2, }; +extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec); +extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec); + extern struct irq_domain *x86_vector_domain; extern void init_irq_alloc_info(struct irq_alloc_info *info, diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index bb2e2a2488a5..b9b05caa28a4 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d, } #endif +int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec) +{ + if (fwspec->param_count != 1) + return 0; + + if (is_fwnode_irqchip(fwspec->fwnode)) { + const char *fwname = fwnode_get_name(fwspec->fwnode); + return fwname && !strncmp(fwname, "IO-APIC-", 8) && + simple_strtol(fwname+8, NULL, 10) == fwspec->param[0]; + } + return to_of_node(fwspec->fwnode) && + of_device_is_compatible(to_of_node(fwspec->fwnode), + "intel,ce4100-ioapic"); +} + +int x86_fwspec_is_hpet(struct irq_fwspec *fwspec) +{ + if (fwspec->param_count != 1) + return 0; + + if (is_fwnode_irqchip(fwspec->fwnode)) { + const char *fwname = fwnode_get_name(fwspec->fwnode); + return fwname && !strncmp(fwname, "HPET-MSI-", 9) && + simple_strtol(fwname+9, NULL, 10) == fwspec->param[0]; + } + return 0; +} + +static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec, +enum irq_domain_bus_token bus_token) +{ + /* +* HPET and I/OAPIC cannot be parented in the vector domain +* if IRQ remapping is enabled. APIC IDs above 15 bits are +* only permitted if IRQ remapping is enabled, so check that. +*/ + if (apic->apic_id_valid(32768)) + return 0; + + return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec); +} + static const struct irq_domain_ops x86_vector_domain_ops = { + .select = x86_vector_select, .alloc = x86_vector_alloc_irqs, .free = x86_vector_free_irqs, .activate = x86_vector_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
From: David Woodhouse This allows the host to indicate that MSI emulation supports 15-bit destination IDs, allowing up to 32768 CPUs without interrupt remapping. cf. https://patchwork.kernel.org/patch/11816693/ for qemu Signed-off-by: David Woodhouse Acked-by: Paolo Bonzini --- arch/x86/kernel/kvm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1c0f2560a41c..b82de2843814 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -740,6 +740,11 @@ static void __init kvm_apic_init(void) #endif } +static bool __init kvm_msi_ext_dest_id(void) +{ + return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID); +} + static void __init kvm_init_platform(void) { kvmclock_init(); @@ -769,6 +774,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = { .type = X86_HYPER_KVM, .init.guest_late_init = kvm_guest_init, .init.x2apic_available = kvm_para_available, + .init.msi_ext_dest_id = kvm_msi_ext_dest_id, .init.init_platform = kvm_init_platform, #if defined(CONFIG_AMD_MEM_ENCRYPT) .runtime.sev_es_hcall_prepare = kvm_sev_es_hcall_prepare, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built
On Wed, 2020-10-14 at 14:57 +0200, Joerg Roedel wrote: > On Wed, Oct 14, 2020 at 03:25:08PM +0800, Lu Baolu wrote: > > I suppose Joerg will pick this up. I guess you don't need to resend it > > unless Joerg asks you to do. > > Yes, will pick this up soon, no need to re-send. Please could it, and the commit it fixes, both go to stable@ smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built
On Tue, 2020-10-13 at 09:30 +0200, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski > > Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units > with no supported address widths") dmar.c needs struct iommu_device to > be selected. We can drop this dependency by not dereferencing struct > iommu_device if IOMMU_API is not selected and by reusing the information > stored in iommu->drhd->ignored instead. > > This fixes the following build error when IOMMU_API is not selected: > > drivers/iommu/intel/dmar.c: In function ‘free_iommu’: > drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no > member named ‘ops’ > 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { > ^ > > Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no > supported address widths") > Signed-off-by: Bartosz Golaszewski Oops, apologies for that. Thanks for fixing it. Acked-by: David Woodhouse smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
From: David Woodhouse Signed-off-by: David Woodhouse --- arch/x86/kernel/hpet.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 3b8b12769f3b..fb7736ca7b5b 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) { struct msi_domain_info *domain_info; struct irq_domain *parent, *d; - struct irq_alloc_info info; struct fwnode_handle *fn; + struct irq_fwspec fwspec; if (x86_vector_domain == NULL) return NULL; @@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) *domain_info = hpet_msi_domain_info; domain_info->data = (void *)(long)hpet_id; - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; - info.devid = hpet_id; - parent = irq_remapping_get_irq_domain(&info); - if (parent == NULL) - parent = x86_vector_domain; - else - hpet_msi_controller.name = "IR-HPET-MSI"; - fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name, hpet_id); if (!fn) { @@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int hpet_id) return NULL; } + fwspec.fwnode = fn; + fwspec.param_count = 1; + fwspec.param[0] = hpet_id; + parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY); + if (!parent) { + irq_domain_free_fwnode(fn); + kfree(domain_info); + return NULL; + } + if (parent != x86_vector_domain) + hpet_msi_controller.name = "IR-HPET-MSI"; + d = msi_create_irq_domain(fn, domain_info, parent); if (!d) { irq_domain_free_fwnode(fn); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/9] iommu/vt-d: Implement select() method on remapping irqdomain
From: David Woodhouse Signed-off-by: David Woodhouse --- drivers/iommu/intel/irq_remapping.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 511dfb4884bc..40c2fec122b8 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1435,7 +1435,20 @@ static void intel_irq_remapping_deactivate(struct irq_domain *domain, modify_irte(&data->irq_2_iommu, &entry); } +static int intel_irq_remapping_select(struct irq_domain *d, + struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + if (x86_fwspec_is_ioapic(fwspec)) + return d == map_ioapic_to_ir(fwspec->param[0]); + else if (x86_fwspec_is_hpet(fwspec)) + return d == map_hpet_to_ir(fwspec->param[0]); + + return 0; +} + static const struct irq_domain_ops intel_ir_domain_ops = { + .select = intel_irq_remapping_select, .alloc = intel_irq_remapping_alloc, .free = intel_irq_remapping_free, .activate = intel_irq_remapping_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/9] Remove irq_remapping_get_irq_domain()
I didn't much like the I/OAPIC and HPET drivers having magical knowledge that they had to substitute x86_vector_domain if their call to irq_remapping_get_irq_domain() returned NULL. When Thomas tried to make it handle error returns from …get_irq_domain() distinctly from the NULL case too, it made me even sadder. So I killed it with fire. Now they just use irq_find_matching_fwspec() to find an appropriate irqdomain. Each remapping irqdomain just needs to say 'yep, that's me' for the HPETs or I/OAPICs which are within their scope, while the x86_vector_domain accepts them all but only if interrupt remapping is *disabled*. No more special knowledge in the caller. If IR is enabled and there's a child device which escapes the scope of all remapping units, it gets NULL for its parent irqdomain and will fail to initialise, which is the correct thing to do in that "should never happen" case. For HPET that'll mean that it just doesn't support MSI, while I/OAPIC will refuse to initialise and trigger a BUG_ON because Linux quite likes it when *all* the I/OAPICs it knows about get initialised successfully. This is on top of the previous 'ext_dest_id' series at https://patchwork.kernel.org/project/kvm/list/?series=362037 https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id David Woodhouse (9): genirq/irqdomain: Implement get_name() method on irqchip fwnodes x86/apic: Add select() method on vector irqdomain iommu/amd: Implement select() method on remapping irqdomain iommu/vt-d: Implement select() method on remapping irqdomain iommu/hyper-v: Implement select() method on remapping irqdomain x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain x86: Kill all traces of irq_remapping_get_irq_domain() iommu/vt-d: Simplify intel_irq_remapping_select() arch/x86/include/asm/hw_irq.h| 2 -- arch/x86/include/asm/irq_remapping.h | 9 - arch/x86/include/asm/irqdomain.h | 3 +++ arch/x86/kernel/apic/io_apic.c | 24 arch/x86/kernel/apic/vector.c| 43 +++ arch/x86/kernel/hpet.c | 23 +-- drivers/iommu/amd/iommu.c| 53 +++-- drivers/iommu/hyperv-iommu.c | 18 +- drivers/iommu/intel/irq_remapping.c | 43 +-- drivers/iommu/irq_remapping.c| 14 -- drivers/iommu/irq_remapping.h| 3 --- kernel/irq/irqdomain.c | 11 ++- 12 files changed, 126 insertions(+), 120 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 7/9] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
From: David Woodhouse Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/io_apic.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index ca2da19d5c55..73cacc92c3bb 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2305,36 +2305,36 @@ static inline void __init check_timer(void) static int mp_irqdomain_create(int ioapic) { - struct irq_alloc_info info; struct irq_domain *parent; int hwirqs = mp_ioapic_pin_count(ioapic); struct ioapic *ip = &ioapics[ioapic]; struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg; struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic); struct fwnode_handle *fn; - char *name = "IO-APIC"; + struct irq_fwspec fwspec; if (cfg->type == IOAPIC_DOMAIN_INVALID) return 0; - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; - info.devid = mpc_ioapic_id(ioapic); - parent = irq_remapping_get_irq_domain(&info); - if (!parent) - parent = x86_vector_domain; - else - name = "IO-APIC-IR"; - /* Handle device tree enumerated APICs proper */ if (cfg->dev) { fn = of_node_to_fwnode(cfg->dev); } else { - fn = irq_domain_alloc_named_id_fwnode(name, ioapic); + fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic); if (!fn) return -ENOMEM; } + fwspec.fwnode = fn; + fwspec.param_count = 1; + fwspec.param[0] = ioapic; + parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY); + if (!parent) { + if (!cfg->dev) + irq_domain_free_fwnode(fn); + return -ENODEV; + } + ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops, (void *)(long)ioapic); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes
From: David Woodhouse Signed-off-by: David Woodhouse --- kernel/irq/irqdomain.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 76cd7ebd1178..6440f97c412e 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { } static inline void debugfs_remove_domain_dir(struct irq_domain *d) { } #endif -const struct fwnode_operations irqchip_fwnode_ops; +static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode) +{ + struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode); + + return fwid->name; +} + +const struct fwnode_operations irqchip_fwnode_ops = { + .get_name = irqchip_fwnode_get_name, +}; EXPORT_SYMBOL_GPL(irqchip_fwnode_ops); /** -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/9] iommu/hyper-v: Implement select() method on remapping irqdomain
From: David Woodhouse Signed-off-by: David Woodhouse --- drivers/iommu/hyperv-iommu.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 37dd485a5640..6a8966fbc3bd 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = { .irq_set_affinity = hyperv_ir_set_affinity, }; +static int hyperv_irq_remapping_select(struct irq_domain *d, + struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + /* Claim only the first (and only) I/OAPIC */ + return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0; +} + static int hyperv_irq_remapping_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *arg) @@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain, } static const struct irq_domain_ops hyperv_ir_domain_ops = { + .select = hyperv_irq_remapping_select, .alloc = hyperv_irq_remapping_alloc, .free = hyperv_irq_remapping_free, }; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select()
From: David Woodhouse Now that the old get_irq_domain() method has gone, we can consolidate on just the map_XXX_to_iommu() functions. Signed-off-by: David Woodhouse --- drivers/iommu/intel/irq_remapping.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index ccf61cd18f69..8a41bcb10e26 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -204,13 +204,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu, return rc; } -static struct irq_domain *map_hpet_to_ir(u8 hpet_id) +static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id) { int i; for (i = 0; i < MAX_HPET_TBS; i++) { if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu) - return ir_hpet[i].iommu->ir_domain; + return ir_hpet[i].iommu; } return NULL; } @@ -226,13 +226,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic) return NULL; } -static struct irq_domain *map_ioapic_to_ir(int apic) -{ - struct intel_iommu *iommu = map_ioapic_to_iommu(apic); - - return iommu ? iommu->ir_domain : NULL; -} - static struct irq_domain *map_dev_to_ir(struct pci_dev *dev) { struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev); @@ -1422,12 +1415,14 @@ static int intel_irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, enum irq_domain_bus_token bus_token) { + struct intel_iommu *iommu = NULL; + if (x86_fwspec_is_ioapic(fwspec)) - return d == map_ioapic_to_ir(fwspec->param[0]); + iommu = map_ioapic_to_iommu(fwspec->param[0]); else if (x86_fwspec_is_hpet(fwspec)) - return d == map_hpet_to_ir(fwspec->param[0]); + iommu = map_hpet_to_iommu(fwspec->param[0]); - return 0; + return iommu && d == iommu->ir_domain; } static const struct irq_domain_ops intel_ir_domain_ops = { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain
From: David Woodhouse Signed-off-by: David Woodhouse --- drivers/iommu/amd/iommu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 13d0a8f42d56..7ecebc5d255f 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3590,6 +3590,24 @@ struct irq_remap_ops amd_iommu_irq_ops = { .get_irq_domain = get_irq_domain, }; +static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, + enum irq_domain_bus_token bus_token) +{ + struct amd_iommu *iommu; + int devid = -1; + + if (x86_fwspec_is_ioapic(fwspec)) + devid = get_ioapic_devid(fwspec->param[0]); + else if (x86_fwspec_is_ioapic(fwspec)) + devid = get_hpet_devid(fwspec->param[0]); + + if (devid < 0) + return 0; + + iommu = amd_iommu_rlookup_table[devid]; + return iommu && iommu->ir_domain == d; +} + static void irq_remapping_prepare_irte(struct amd_ir_data *data, struct irq_cfg *irq_cfg, struct irq_alloc_info *info, @@ -3813,6 +3831,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain, } static const struct irq_domain_ops amd_ir_domain_ops = { + .select = irq_remapping_select, .alloc = irq_remapping_alloc, .free = irq_remapping_free, .activate = irq_remapping_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain()
From: David Woodhouse Signed-off-by: David Woodhouse --- arch/x86/include/asm/hw_irq.h| 2 -- arch/x86/include/asm/irq_remapping.h | 9 drivers/iommu/amd/iommu.c| 34 drivers/iommu/hyperv-iommu.c | 9 drivers/iommu/intel/irq_remapping.c | 17 -- drivers/iommu/irq_remapping.c| 14 drivers/iommu/irq_remapping.h| 3 --- 7 files changed, 88 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index aabd8f1b6bb0..aef795f17478 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -40,8 +40,6 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_PCI_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, - X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT, - X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, }; struct ioapic_alloc_info { diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h index af4a151d70b3..7cc49432187f 100644 --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int); extern int irq_remap_enable_fault_handling(void); extern void panic_if_irq_remap(const char *msg); -extern struct irq_domain * -irq_remapping_get_irq_domain(struct irq_alloc_info *info); - /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */ extern struct irq_domain * arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id); @@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg) { } -static inline struct irq_domain * -irq_remapping_get_irq_domain(struct irq_alloc_info *info) -{ - return NULL; -} - #endif /* CONFIG_IRQ_REMAP */ #endif /* __X86_IRQ_REMAPPING_H */ diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 7ecebc5d255f..16adbf9d8fbb 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info) { switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: - case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: return get_ioapic_devid(info->devid); case X86_IRQ_ALLOC_TYPE_HPET: - case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: return get_hpet_devid(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: @@ -3550,44 +3548,12 @@ static int get_devid(struct irq_alloc_info *info) } } -static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info, - int devid) -{ - struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; - - if (!iommu) - return NULL; - - switch (info->type) { - case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: - case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: - return iommu->ir_domain; - default: - WARN_ON_ONCE(1); - return NULL; - } -} - -static struct irq_domain *get_irq_domain(struct irq_alloc_info *info) -{ - int devid; - - if (!info) - return NULL; - - devid = get_devid(info); - if (devid < 0) - return NULL; - return get_irq_domain_for_devid(info, devid); -} - struct irq_remap_ops amd_iommu_irq_ops = { .prepare= amd_iommu_prepare, .enable = amd_iommu_enable, .disable= amd_iommu_disable, .reenable = amd_iommu_reenable, .enable_faulting= amd_iommu_enable_faulting, - .get_irq_domain = get_irq_domain, }; static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec, diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 6a8966fbc3bd..e7ed2bb83358 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void) return IRQ_REMAP_X2APIC_MODE; } -static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info) -{ - if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT) - return ioapic_ir_domain; - else - return NULL; -} - struct irq_remap_ops hyperv_irq_remap_ops = { .prepare= hyperv_prepare_irq_remapping, .enable = hyperv_enable_irq_remapping, - .get_irq_domain = hyperv_get_irq_domain, }; #endif diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 40c2fec122b8..ccf61cd18f69 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest) irte->redi
[PATCH 2/9] x86/apic: Add select() method on vector irqdomain
From: David Woodhouse This will be used to select the irqdomain for I/OAPIC and HPET. Signed-off-by: David Woodhouse --- arch/x86/include/asm/irqdomain.h | 3 +++ arch/x86/kernel/apic/vector.c| 43 2 files changed, 46 insertions(+) diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h index cd684d45cb5f..125c23b7bad3 100644 --- a/arch/x86/include/asm/irqdomain.h +++ b/arch/x86/include/asm/irqdomain.h @@ -12,6 +12,9 @@ enum { X86_IRQ_ALLOC_LEGACY= 0x2, }; +extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec); +extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec); + extern struct irq_domain *x86_vector_domain; extern void init_irq_alloc_info(struct irq_alloc_info *info, diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index bb2e2a2488a5..b9b05caa28a4 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d, } #endif +int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec) +{ + if (fwspec->param_count != 1) + return 0; + + if (is_fwnode_irqchip(fwspec->fwnode)) { + const char *fwname = fwnode_get_name(fwspec->fwnode); + return fwname && !strncmp(fwname, "IO-APIC-", 8) && + simple_strtol(fwname+8, NULL, 10) == fwspec->param[0]; + } + return to_of_node(fwspec->fwnode) && + of_device_is_compatible(to_of_node(fwspec->fwnode), + "intel,ce4100-ioapic"); +} + +int x86_fwspec_is_hpet(struct irq_fwspec *fwspec) +{ + if (fwspec->param_count != 1) + return 0; + + if (is_fwnode_irqchip(fwspec->fwnode)) { + const char *fwname = fwnode_get_name(fwspec->fwnode); + return fwname && !strncmp(fwname, "HPET-MSI-", 9) && + simple_strtol(fwname+9, NULL, 10) == fwspec->param[0]; + } + return 0; +} + +static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec, +enum irq_domain_bus_token bus_token) +{ + /* +* HPET and I/OAPIC cannot be parented in the vector domain +* if IRQ remapping is enabled. APIC IDs above 15 bits are +* only permitted if IRQ remapping is enabled, so check that. +*/ + if (apic->apic_id_valid(32768)) + return 0; + + return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec); +} + static const struct irq_domain_ops x86_vector_domain_ops = { + .select = x86_vector_select, .alloc = x86_vector_alloc_irqs, .free = x86_vector_free_irqs, .activate = x86_vector_activate, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Thu, 2020-10-08 at 14:40 +0200, Thomas Gleixner wrote: > Subject: x86/iommu: Make interrupt remapping more robust > From: Thomas Gleixner > Date: Thu, 08 Oct 2020 14:09:44 +0200 > > Needs to be split into pieces and cover PCI proper. Right now PCI gets a > NULL pointer assigned which makes it explode at the wrong place > later. Also hyperv iommu wants some love. > > NOT-Signed-off-by: Thomas Gleixner > --- > arch/x86/kernel/apic/io_apic.c |4 +++- > arch/x86/kernel/apic/msi.c | 24 ++-- > drivers/iommu/amd/iommu.c |6 +++--- > drivers/iommu/intel/irq_remapping.c |4 ++-- > 4 files changed, 22 insertions(+), 16 deletions(-) > > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi > info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; > info.devid = mpc_ioapic_id(ioapic); > parent = irq_remapping_get_irq_domain(&info); > - if (!parent) > + if (IS_ERR(parent)) > + return PTR_ERR(parent); > + else if (!parent) > parent = x86_vector_domain; > else > name = "IO-APIC-IR"; > --- a/arch/x86/kernel/apic/msi.c > +++ b/arch/x86/kernel/apic/msi.c > @@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d > struct irq_domain *hpet_create_irq_domain(int hpet_id) > { > struct msi_domain_info *domain_info; > + struct fwnode_handle *fn = NULL; > struct irq_domain *parent, *d; > struct irq_alloc_info info; > - struct fwnode_handle *fn; > > if (x86_vector_domain == NULL) > return NULL; > @@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai > init_irq_alloc_info(&info, NULL); > info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; > info.devid = hpet_id; > + > parent = irq_remapping_get_irq_domain(&info); > - if (parent == NULL) > + if (IS_ERR(parent)) > + goto fail; > + else if (!parent) > parent = x86_vector_domain; > else > hpet_msi_controller.name = "IR-HPET-MSI"; Hrm... this is the '-ENODEV changes' that you wanted me to pick up, right? I confess I don't like it very much. I'd much prefer to be able to use a generic foo_get_parent_irq_domain() function which just returns an answer or an error. Definitely none of this "if it returns NULL, caller fills it in for themselves with some magic global because we're special". And I don't much like abusing the irq_alloc_info for this either. I didn't like the irq_alloc_info very much in its *original* use cases, for actually allocating IRQs. Abusing it for this is worse. I'm more than happy to start digging into this, but once I do I fear I'm not going to stop until HPET and IOAPIC don't know *anything* about whether they're behind IR or not. And yes, I know that IOAPIC has a different EOI method for IR but AFAICT that's *already* hosed for Hyper-V because it doesn't *really* do remapping and so the RTEs *can* change there to move interrupts around. There's more differences between ioapic_ack_level() and ioapic_ir_ack_level() than the erratum workaround and whether we use data->entry.vector... aren't there? For the specific case you were targeting with this patch, you already successfully persuaded me it should never happen, and there's a WARN_ON which would trigger if it ever does (with too many CPUs in the system). Let's come back and tackle this can of worms in a separate cleanup series. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Thu, 2020-10-08 at 11:34 +0200, Thomas Gleixner wrote: > The overall conclusion for this is: > > 1) X2APIC support on bare metal w/o irq remapping is not going to > happen unless you: > > - added support in multi-queue devices which utilize managed > interrupts > > - audited the whole tree for other assumptions related to the > reachability of possible CPUs. > > I'm not expecting you to be done with that before I retire so for > me it's just not going to happen :) Makes sense. It probably does mean we should a BUG_ON for the case where IRQ remapping *is* enabled but any device is found which isn't behind it. But that's OK. > 2) X2APIC support on VIRT is possible if the extended ID magic is > supported by the hypervisor because that does not make any CPU > unreachable for MSI and therefore the multi-queue muck and > everything else just works. > > This requires to have either the domain affinity limitation for HPET > in place or just to force disable HPET or at least HPET-MSI which is > a reasonable tradeoff. > > HPET is not required for guests which have kvmclock and > APIC/deadline timer and known (hypervisor provided) frequencies. HPET-MSI should work fine. Like the IOAPIC, it's just a child of the *actual* MSI domain. The address/data in the MSI message are completely opaque to it, and if the parent domain happens to put meaningful information into bits 11-5 of the MSI address, the HPET won't even notice. The HPET's Tn_FSB_INT_ADDR register does have a full 32 bits of the MSI address; it's not doing bit-swizzling like the IOAPIC does, which might potentially *not* have been able to set certain bits in the MSI. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Wed, 2020-10-07 at 17:57 +0200, Thomas Gleixner wrote: > TLDR & HTF; > > Multiqueue devices want to have at max 1 queue per CPU or if the device > has less queues than CPUs they want the queues to have a fixed > association to a set of CPUs. > > At setup time this is established considering possible CPUs to handle > 'physical' hotplug correctly. > > If a queue has no online CPUs it cannot be started. If it's active and > the last CPU goes down then it's quiesced and stopped and the core code > shuts down the interrupt and does not move it to a still online CPU. > > So with your hackery, we end up in a situation where we have a large > possible mask, but not all CPUs in that mask can be reached, which means > in a 1 queue per CPU scenario all unreachable CPUs would have > disfunctional queues. > > So that spreading algorithm needs to know about this limitation. OK, thanks. So the queue exists, with an MSI assigned to point to an offline CPU(s), but it cannot actually be used until/unless at least one CPU in its mask comes online. So when I said I wanted to try treating "reachable" the same way as "online", that would mean the queue can't start until/unless at least one *reachable* CPU in its mask comes online. The underlying problem here is that until a CPU comes online, we don't actually *know* if it's reachable or not. So if we want carefully create the affinity masks at setup time so that they don't include any unreachable CPUs... that basically means we don't include any non-present CPUs at all (unless they've been added once and then removed). That's because — at least on x86 — we don't assign CPU numbers to CPUs which haven't yet been added. Theoretically we *could* but if there are more than NR_CPUS listed in (e.g.) the MADT then we could run out of CPU numbers. Then if the admin hotplugs one of the CPUs we *didn't* have space for, we'd not be able to handle it. I suppose there might be options like pre-assigning CPU numbers only to non-present APIC IDs below 256 (or 32768). Or *grouping* the CPU numbers, so some not-yet-assigned CPU#s are for low APICIDs and some are for high APICIDs but it doesn't actually have to be a 1:1 predetermined mapping. But those really do seem like hacks which might only apply on x86, while the generic approach of treating "reachable" like "online" seems like it would work in other cases too. Fundamentally, there are three sets of CPUs. There are those known to be reachable, those known not to be, and those which are not yet known. So another approach we could use is to work with a cpumask of those *known* not to be reachable, and to filter those *out* of the prebuilt affinities. That gives us basically the right behaviour without hotplug, but does include absent CPUs in a mask that *if* they are ever added, wouldn't be able to receive the IRQ. Which does mean we'd have to refrain from bringing up the corresponding queue. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On Wed, 2020-10-07 at 19:23 +0200, Thomas Gleixner wrote: > > It so happens that in Linux, we don't really architect the software > > like that. So each of the PCI MSI domain, HPET, and IOAPIC have their > > *own* message composer which has the same limits and composes basically > > the same messages as if it was *their* format, not dictated to them by > > the APIC upstream. And that's what we're both getting our panties in a > > knot about, I think. > > Are you actually reading what I write and caring to look at the code? > > PCI-MSI does not have a compose message callback in the irq chip. The > message is composed by the underlying parent domain. Same for HPET. > > The only dogdy part is the IO/APIC for hysterical raisins and because > I did not come around yet to sort that out. Right. And the problem is that the "underlying parent domain" in this case is actually x86_vector_domain. Whereas it probably makes more sense, at least in theory, for there to be an *intermediate* domain responsible for composing the Compat MSI messages. Both the Compat-MSI and the IR-MSI domains would be children of the generic x86_vector_domain, and then any given HPET, IOAPIC or PCI-MSI domain would be a child of either one of those generic MSI domains. > > It really doesn't matter that much to the underlying generic irqdomain > > support for limited affinities. Except that you want to make the > > generic code support the concept of a child domain supporting *more* > > CPUs than its parent, which really doesn't make much sense if you think > > about it. > > Right. So we really want to stick the restriction into a compat-MSI > domain to make stuff match reality and to avoid banging the head against > the wall sooner than later. Right. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On 7 October 2020 17:02:59 BST, Thomas Gleixner wrote: >On Wed, Oct 07 2020 at 15:23, David Woodhouse wrote: >> On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote: >>> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote: >>> > On 7 October 2020 13:59:00 BST, Thomas Gleixner > wrote: >>> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote: >>> > > > To fix *that* case, we really do need the whole series giving >us per- >>> > > > domain restricted affinity, and to use it for those >MSIs/IOAPICs that >>> > > > the IRQ remapping doesn't cover. >>> > > >>> > > Which do not exist today. >>> > >>> > Sure. But at patch 10/13 into this particular patch series, it >*does* >>> > exist. >>> >>> As I told you before: Your ordering is wrong. We do not introduce >bugs >>> first and then fix them later >> >> I didn't introduce that bug; it's been there for years. Fixing it >> properly requires per-irqdomain affinity limits. >> >> There's a cute little TODO at least in the Intel irq-remapping >driver, >> noting that we should probably check if there are any IOAPICs that >> aren't in the scope of any DRHD at all. But that's all. > >So someone forgot to remove the cute little TODO when this was added: > > if (parse_ioapics_under_ir()) { >pr_info("Not enabling interrupt remapping\n"); >goto error; >} And HPET, and PCI devices including those that might be hotplugged in future and not be covered by any extant IOMMU's scope? -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On 7 October 2020 16:57:36 BST, Thomas Gleixner wrote: >On Wed, Oct 07 2020 at 15:10, David Woodhouse wrote: >> On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote: >>> What is preventing you to change the function signature? But handing >>> down irqdomain here is not cutting it. The right thing to do is to >>> replace 'struct irq_affinity_desc *affinity' with something more >>> flexible. >> >> Yeah, although I do think I want to ditch this part completely, and >> treat the "possible" mask for the domain very much more like we do >> cpu_online_mask. In that we can allow affinities to be *requested* >> which are outside it, and they can just never be *effective* while >> those CPUs aren't present and reachable. > >Huch? > >> That way a lot of the nastiness about preparing an "acceptable" mask >in >> advance can go away. > >There is not lot's of nastiness. OK, but I think we do have to cope with the fact that the limit is dynamic, and a CPU might be added which widens the mask. I think that's fundamental and not x86-specific. >> It's *also* driven, as I noted, by the realisation that on x86, the >> x86_non_ir_cpumask will only ever contain those CPUs which have been >> brought online so far and meet the criteria... but won't that be true >> on *any* system where CPU numbers are virtual and not 1:1 mapped with >> whatever determines reachability by the IRQ domain? It isn't *such* >an >> x86ism, surely? > >Yes, but that's exactly the reason why I don't want to have whatever >mask name you chose to be directly exposed and want it to be part of >the >irq domains because then you can express arbitrary per domain limits. The x86_non_ir_mask isn't intended to be directly exposed to any generic IRQ code. It's set up by the x86 APIC code so that those x86 IRQ domains which are affected by it, can set it with irqdomain_set_max_affinity() for the generic code to see. Without each having to create the cpumask from scratch for themselves. > ... reading for once the kids are elsewhere... Thanks. >It's not rocket science to fix that as the irq remapping code already >differentiates between the device types. I don't actually like that very much. The IRQ remapping code could just compose the MSI messages that it desires without really having to care so much about the child device. The bit where IRQ remapping actually composes IOAPIC RTEs makes me particularly sad. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On Wed, 2020-10-07 at 17:25 +0200, Thomas Gleixner wrote: > It's clearly how the hardware works. MSI has a message store of some > sorts and if the entry is enabled then the MSI chip (in PCI or > elsewhere) will send exactly the message which is in that message > store. It knows absolutely nothing about what the message means and how > it is composed. The only things which MSI knows about is whether the > message address is 64bit wide or not and whether the entries are > maskable or not and how many entries it can store. > > Software allocates a message target at the underlying irq domain (vector > or remap) and that underlying irq domain defines the properties. > > If qemu emulates it differently then it's qemu's problem, but that does > not make it in anyway something which influences the irq domain > abstractions which are correctly modeled after how the hardware works. > > > Not really the important part to deal with right now, either way. > > Oh yes it is. We define that upfront and not after the fact. The way the hardware works is that something handles physical address cycles to addresses in the (on x86) 0xFEEx range, and turns them into actual interrupts on the appropriate CPU — where the APIC ID and vector (etc.) are directly encoded in the bits of the address and the data written. That compatibility x86 APIC MSI format is where the 8-bit (or 15-bit) limit comes from. Then interrupt remapping comes along, and now those physical address cycles are actually handled by the IOMMU — which can either handle the compatibility format as before, or use a different format of address/data bits and perform a lookup in its IRTE table. The PCI MSI domain, HPET, and even the IOAPIC are just the things out there on the bus which might perform those physical address cycles. And yes, as you say they're just a message store sending exactly the message that was composed for them. They know absolutely nothing about what the message means and how it is composed. It so happens that in Linux, we don't really architect the software like that. So each of the PCI MSI domain, HPET, and IOAPIC have their *own* message composer which has the same limits and composes basically the same messages as if it was *their* format, not dictated to them by the APIC upstream. And that's what we're both getting our panties in a knot about, I think. It really doesn't matter that much to the underlying generic irqdomain support for limited affinities. Except that you want to make the generic code support the concept of a child domain supporting *more* CPUs than its parent, which really doesn't make much sense if you think about it. But it isn't that hard to do, and if it means we don't have to argue any more about the x86 hierarchy not matching the hardware then it's a price worth paying. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote: > > > The information has to property of the relevant irq domains and the > > > hierarchy allows you nicely to retrieve it from there instead of > > > sprinkling this all over the place. > > > > No. This is not a property of the parent domain per se. Especially if > > you're thinking that we could inherit the affinity mask from the > > parent, then twice no. > > > > This is a property of the MSI domain itself, and how many bits of > > destination ID the hardware at *this* level can interpret and pass on > > to the parent domain. > > Errm what? > > The MSI domain does not know anything about what the underlying domain > can handle and it shouldn't. > > If MSI is on top of remapping then the remapping domain defines what the > MSI domain can do and not the other way round. Ditto for the non > remapped case in which the vector domain decides. > > The top most MSI irq chip does not even have a compose function, neither > for the remap nor for the vector case. The composition is done by the > parent domain from the data which the parent domain constructed. Same > for the IO/APIC just less clearly separated. > > The top most chip just takes what the underlying domain constructed and > writes it to the message store, because that's what the top most chip > controls. It does not control the content. Sure, whatever. The way we arrange the IRQ domain hierarchy in x86 Linux doesn't really match my understanding of the real hardware, or how qemu emulates it either. But it is at least internally self- consistent, and in this model it probably does make some sense to claim the 8-bit limit on x86_vector_domain itself, and *remove* that limit in the remapping irqdomain. Not really the important part to deal with right now, either way. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote: > On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote: > > On 7 October 2020 13:59:00 BST, Thomas Gleixner wrote: > > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote: > > > > To fix *that* case, we really do need the whole series giving us per- > > > > domain restricted affinity, and to use it for those MSIs/IOAPICs that > > > > the IRQ remapping doesn't cover. > > > > > > Which do not exist today. > > > > Sure. But at patch 10/13 into this particular patch series, it *does* > > exist. > > As I told you before: Your ordering is wrong. We do not introduce bugs > first and then fix them later I didn't introduce that bug; it's been there for years. Fixing it properly requires per-irqdomain affinity limits. There's a cute little TODO at least in the Intel irq-remapping driver, noting that we should probably check if there are any IOAPICs that aren't in the scope of any DRHD at all. But that's all. If it happens, I think we'll end up doing the right thing and instantiating a non-IR IOAPIC domain for it, and if we *don't* have any CPU with an APIC ID above... (checks notes)... 7 ... then it'll just about work out. Over 7 and we're screwed (because logical mode; see also https://git.infradead.org/users/dwmw2/linux.git/commit/429f0c33f for a straw man but that's getting even further down the rabbit hole) smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote: > On Wed, Oct 07 2020 at 08:19, David Woodhouse wrote: > > On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote: > > > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: > > > > From: David Woodhouse > > > > > > > > This is the maximum possible set of CPUs which can be used. Use it > > > > to calculate the default affinity requested from __irq_alloc_descs() > > > > by first attempting to find the intersection with irq_default_affinity, > > > > or falling back to using just the max_affinity if the intersection > > > > would be empty. > > > > > > And why do we need that as yet another argument? > > > > > > This is an optional property of the irq domain, really and no caller has > > > any business with that. > > > > Because irq_domain_alloc_descs() doesn't actually *take* the domain as > > an argument. It's more of an internal function, which is only non- > > static because it's used from kernel/irq/ipi.c too for some reason. If > > we convert the IPI code to just call __irq_alloc_descs() directly, > > perhaps that we can actually make irq_domain_alloc_decs() static. > > What is preventing you to change the function signature? But handing > down irqdomain here is not cutting it. The right thing to do is to > replace 'struct irq_affinity_desc *affinity' with something more > flexible. Yeah, although I do think I want to ditch this part completely, and treat the "possible" mask for the domain very much more like we do cpu_online_mask. In that we can allow affinities to be *requested* which are outside it, and they can just never be *effective* while those CPUs aren't present and reachable. That way a lot of the nastiness about preparing an "acceptable" mask in advance can go away. It's *also* driven, as I noted, by the realisation that on x86, the x86_non_ir_cpumask will only ever contain those CPUs which have been brought online so far and meet the criteria... but won't that be true on *any* system where CPU numbers are virtual and not 1:1 mapped with whatever determines reachability by the IRQ domain? It isn't *such* an x86ism, surely? > Fact is, that if there are CPUs which cannot be targeted by device > interrupts then the multiqueue affinity mechanism has to be fixed to > handle this. Right now it's just broken. I think part of the problem there is that I don't really understand how this part is *supposed* to work. I was focusing on getting the simple case working first, in the expectation that we'd come back to that part ansd you'd keep me honest. Is there some decent documentation on this that I'm missing? > It's pretty obvious that the irq domains are the right place to store > that information: > > const struct cpumask *irqdomain_get_possible_affinity(struct irq_domain *d) > { > while (d) { > if (d->get_possible_affinity) > return d->get_possible_affinity(d); > d = d->parent; > } > return cpu_possible_mask; > } Sure. > So if you look at X86 then you have either: > >[VECTOR] - [IO/APIC] > |-- [MSI] > |-- [WHATEVER] > > or > >[VECTOR] ---[REMAP]--- [IO/APIC] > ||-- [MSI] > |[WHATEVER] Hierarchically, theoretically the IOAPIC and HPET really ought to be children of the MSI domain. It's the Compatibility MSI which has the restriction on destination ID, because of how many bits it interprets from the MSI address. HPET and IOAPIC are just generating MSIs that hit that upstream limit. > So if REMAP allows cpu_possible_mask and VECTOR some restricted subset > then irqdomain_get_possible_affinity() will return the correct result > independent whether remapping is enabled or not. Sure. Although VECTOR doesn't have the restriction. As noted, it's the Compatibility MSI that does. So the diagram looks something like this: [ VECTOR ] [ REMAP ] [ IR-MSI ] [ IR-HPET ] | |---[ IR-PCI-MSI ] | |---[ IR-IOAPIC ] | |--[ COMPAT-MSI ] [ HPET ] |-- [ PCI-MSI ] |-- [ IOAPIC ] In this diagram, it's COMPAT-MSI that has the affinity restriction, inherited by its children. Now, I understand that you're not keen on IOAPIC actually being a child of the MSI domain, and that's fine. In Linux right now, those generic 'IR-MSI
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On 7 October 2020 13:59:00 BST, Thomas Gleixner wrote: >On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote: >> On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote: >>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: >> This is the case I called out in the cover letter: >> >> This patch series implements a per-domain "maximum affinity" set >and >> uses it for the non-remapped IOAPIC and MSI domains on x86. As >well as >> allowing more CPUs to be used without interrupt remapping, this >also >> fixes the case where some IOAPICs or PCI devices aren't actually >in >> scope of any active IOMMU and are operating without remapping. >> >> To fix *that* case, we really do need the whole series giving us per- >> domain restricted affinity, and to use it for those MSIs/IOAPICs that >> the IRQ remapping doesn't cover. > >Which do not exist today. Sure. But at patch 10/13 into this particular patch series, it *does* exist. (Ignoring, for the moment, that I'm about to rewrite half the preceding patches to take a different approach) >>> > ip->irqdomain->parent = parent; >>> > + if (parent == x86_vector_domain) >>> > + irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask); >>> >>> OMG >> >> This IOAPIC function may or may not be behind remapping. > >>> > d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK; >>> > + irq_domain_set_affinity(d, &x86_non_ir_cpumask); >>> >>> So here it's unconditional >> >> Yes, this code is only for the non-remapped case and there's a >separate >> arch_create_remap_msi_irq_domain() function a few lines further down >> which creates the IR-PCI-MSI IRQ domain. So no need for a condition >> here. >> >>> > + if (parent == x86_vector_domain) >>> > + irq_domain_set_affinity(d, &x86_non_ir_cpumask); >>> >>> And here we need a condition again. Completely obvious and >reviewable - NOT. >> >> And HPET may or may not be behind remapping so again needs the >> condition. I had figured that part was fairly obvious but can note it >> in the commit comment. > >And all of this is completely wrong to begin with. > >The information has to property of the relevant irq domains and the >hierarchy allows you nicely to retrieve it from there instead of >sprinkling this all over the place. No. This is not a property of the parent domain per se. Especially if you're thinking that we could inherit the affinity mask from the parent, then twice no. This is a property of the MSI domain itself, and how many bits of destination ID the hardware at *this* level can interpret and pass on to the parent domain. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
On Wed, 2020-10-07 at 13:15 +0200, Paolo Bonzini wrote: > On 07/10/20 10:59, David Woodhouse wrote: > > Yeah, I was expecting the per-irqdomain affinity support to take a few > > iterations. But this part, still sticking with the current behaviour of > > only allowing CPUs to come online at all if they can be reached by all > > interrupts, can probably go in first. > > > > It's presumably (hopefully!) a blocker for the qemu patch which exposes > > the same feature bit defined in this patch. > > Yeah, though we could split it further and get the documentation part in > first. That would let the QEMU part go through. Potentially. Although I've worked out that the first patch in my series, adding x2apic_set_max_apicid(), is actually a bug fix because it fixes the behaviour if you only *hotplug* CPUs with APIC IDs > 255 and there were none of them present at boot time. So I'll post this set on its own to start with, and then focus on the per-irqdomain affinity support after that. David Woodhouse (5): x86/apic: Fix x2apic enablement without interrupt remapping x86/msi: Only use high bits of MSI address for DMAR unit x86/ioapic: Handle Extended Destination ID field in RTE x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
On Wed, 2020-10-07 at 10:14 +0200, Paolo Bonzini wrote: > Looks like the rest of the series needs some more work, but anyway: > > Acked-by: Paolo Bonzini Thanks. Yeah, I was expecting the per-irqdomain affinity support to take a few iterations. But this part, still sticking with the current behaviour of only allowing CPUs to come online at all if they can be reached by all interrupts, can probably go in first. It's presumably (hopefully!) a blocker for the qemu patch which exposes the same feature bit defined in this patch. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote: > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: > > > From: David Woodhouse > > > > When interrupt remapping isn't enabled, only the first 255 CPUs can > > No, only CPUs with an APICid < 255 Ack. > > receive external interrupts. Set the appropriate max affinity for > > the IOAPIC and MSI IRQ domains accordingly. > > > > This also fixes the case where interrupt remapping is enabled but some > > devices are not within the scope of any active IOMMU. > > What? If this fixes an pre-existing problem then > > 1) Explain the problem proper > 2) Have a patch at the beginning of the series which fixes it > independently of this pile > > If it's fixing a problem in your pile, then you got the ordering wrong. It's not that simple; there's not a single patch which fixes that and which can go first. I can, and do, fix the "no IR" case in a simple patch that goes first, simply by restricting the kernel to the APIC IDs which can be targeted. This is the case I called out in the cover letter: This patch series implements a per-domain "maximum affinity" set and uses it for the non-remapped IOAPIC and MSI domains on x86. As well as allowing more CPUs to be used without interrupt remapping, this also fixes the case where some IOAPICs or PCI devices aren't actually in scope of any active IOMMU and are operating without remapping. To fix *that* case, we really do need the whole series giving us per- domain restricted affinity, and to use it for those MSIs/IOAPICs that the IRQ remapping doesn't cover. > You didn't start kernel programming as of yesterday, so you really know > how that works. > > > ip->irqdomain->parent = parent; > > + if (parent == x86_vector_domain) > > + irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask); > > OMG This IOAPIC function may or may not be behind remapping. > > > if (cfg->type == IOAPIC_DOMAIN_LEGACY || > > cfg->type == IOAPIC_DOMAIN_STRICT) > > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c > > index 4d891967bea4..af5ce5c4da02 100644 > > --- a/arch/x86/kernel/apic/msi.c > > +++ b/arch/x86/kernel/apic/msi.c > > @@ -259,6 +259,7 @@ struct irq_domain * __init > > native_create_pci_msi_domain(void) > > pr_warn("Failed to initialize PCI-MSI irqdomain.\n"); > > } else { > > d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK; > > + irq_domain_set_affinity(d, &x86_non_ir_cpumask); > > So here it's unconditional Yes, this code is only for the non-remapped case and there's a separate arch_create_remap_msi_irq_domain() function a few lines further down which creates the IR-PCI-MSI IRQ domain. So no need for a condition here. > > } > > return d; > > } > > @@ -479,6 +480,8 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id) > > irq_domain_free_fwnode(fn); > > kfree(domain_info); > > } > > + if (parent == x86_vector_domain) > > + irq_domain_set_affinity(d, &x86_non_ir_cpumask); > > And here we need a condition again. Completely obvious and reviewable - NOT. And HPET may or may not be behind remapping so again needs the condition. I had figured that part was fairly obvious but can note it in the commit comment. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask
On Tue, 2020-10-06 at 23:42 +0200, Thomas Gleixner wrote: > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: > > From: David Woodhouse > > > > This is the mask of CPUs to which IRQs can be delivered without > > interrupt > > remapping. > > > > +/* Mask of CPUs which can be targeted by non-remapped interrupts. > > */ > > +cpumask_t x86_non_ir_cpumask = { CPU_BITS_ALL }; > > What? By default, if we didn't hit any limits, all CPUs can be targeted by external interrupts. It's the default today. Or at least we pretend it is, modulo the bugs :) > > #ifdef CONFIG_X86_32 > > > > /* > > @@ -1838,6 +1841,7 @@ static __init void x2apic_enable(void) > > static __init void try_to_enable_x2apic(int remap_mode) > > { > > u32 apic_limit = 0; > > + int i; > > > > if (x2apic_state == X2APIC_DISABLED) > > return; > > @@ -1880,6 +1884,14 @@ static __init void try_to_enable_x2apic(int > > remap_mode) > > if (apic_limit) > > x2apic_set_max_apicid(apic_limit); > > > > + /* Build the affinity mask for interrupts that can't be remapped. */ > > + cpumask_clear(&x86_non_ir_cpumask); > > + i = min_t(unsigned int, num_possible_cpus() - 1, apic_limit); > > + for ( ; i >= 0; i--) { > > + if (cpu_physical_id(i) <= apic_limit) > > + cpumask_set_cpu(i, &x86_non_ir_cpumask); > > + } > > Blink. If the APIC id is not linear with the cpu numbers then this > results in a reduced addressable set of CPUs. WHY? Hm, good question. That loop was cargo-culted from hyperv-iommu.c; perhaps it makes more sense there because Hyper-V really does promise that linearity, or perhaps it was already buggy. Will fix. In fact, in apic.c I could probably just use the cpuid_to_apicid array which is right there in the file. > > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > > index aa9a3b54a96c..4d0ef46fedb9 100644 > > --- a/arch/x86/kernel/apic/io_apic.c > > +++ b/arch/x86/kernel/apic/io_apic.c > > @@ -2098,6 +2098,8 @@ static int mp_alloc_timer_irq(int ioapic, int pin) > > struct irq_alloc_info info; > > > > ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0); > > + if (domain->parent == x86_vector_domain) > > + info.mask = &x86_non_ir_cpumask; > > We are not going to sprinkle such domain checks all over the > place. Again, the mask is a property of the interrupt domain. Yeah, that's a hangover from the first attempts which I forgot to delete. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 08/13] genirq: Add irq_domain_set_affinity()
On Tue, 2020-10-06 at 23:32 +0200, Thomas Gleixner wrote: > What the heck? Why does this need a setter function which is exported? > So that random driver writers can fiddle with it? > > The affinity mask restriction of an irq domain is already known when the > domain is created. It's exported because __irq_domain_add is exported. I didn't want to just add a rarely-used extra argument to __irq_domain_add and the other public APIs which call into it, and figured a separate setter function was the simplest option. I can rework to add the argument to __irq_domain_add if you prefer. Or we can declare that restricted affinity *isn't* something that IRQ domains provided by modules can have, and just not export the setter function? smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote: > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: > > From: David Woodhouse > > > > This is the maximum possible set of CPUs which can be used. Use it > > to calculate the default affinity requested from __irq_alloc_descs() > > by first attempting to find the intersection with irq_default_affinity, > > or falling back to using just the max_affinity if the intersection > > would be empty. > > And why do we need that as yet another argument? > > This is an optional property of the irq domain, really and no caller has > any business with that. Because irq_domain_alloc_descs() doesn't actually *take* the domain as an argument. It's more of an internal function, which is only non- static because it's used from kernel/irq/ipi.c too for some reason. If we convert the IPI code to just call __irq_alloc_descs() directly, perhaps that we can actually make irq_domain_alloc_decs() static. > > int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t > > hwirq, > > - int node, const struct irq_affinity_desc *affinity) > > + int node, const struct irq_affinity_desc *affinity, > > + const struct cpumask *max_affinity) > > { > > + cpumask_var_t default_affinity; > > unsigned int hint; > > + int i; > > + > > + /* Check requested per-IRQ affinities are in the possible range */ > > + if (affinity && max_affinity) { > > + for (i = 0; i < cnt; i++) > > + if (!cpumask_subset(&affinity[i].mask, max_affinity)) > > + return -EINVAL; > > https://lore.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos > > What is preventing the affinity spreading code from spreading the masks > out to unusable CPUs? The changelog is silent about that part. I'm coming to the conclusion that we should allow unusable CPUs to be specified at this point, just as we do offline CPUs. That's largely driven by the realisation that our x86_non_ir_cpumask is only going to contain online CPUs anyway, and hotplugged CPUs only get added to it as they are brought online. > > + /* > > +* Generate default affinity. Either the possible subset of > > +* irq_default_affinity if such a subset is non-empty, or fall > > +* back to the provided max_affinity if there is no intersection. > > .. > > +* And just a copy of irq_default_affinity in the > > +* !CONFIG_CPUMASK_OFFSTACK case. > > We know that already... > > > +*/ > > + memset(&default_affinity, 0, sizeof(default_affinity)); > > Right, memset() before allocating is useful. The memset is because there's no cpumask_var_t initialiser that I can see. So cpumask_available() on the uninitialised 'default_affinity' variable might be true even in the OFFSTACK case. > > + if ((max_affinity && > > +!cpumask_subset(irq_default_affinity, max_affinity))) { > > + if (!alloc_cpumask_var(&default_affinity, GFP_KERNEL)) > > + return -ENOMEM; > > + cpumask_and(default_affinity, max_affinity, > > + irq_default_affinity); > > + if (cpumask_empty(default_affinity)) > > + cpumask_copy(default_affinity, max_affinity); > > + } else if (cpumask_available(default_affinity)) > > + cpumask_copy(default_affinity, irq_default_affinity); > > That's garbage and unreadable. That's why there was a comment explaining it... at which point you claimed to already know :) Clearly the comment didn't do the job it was supposed to do. I'll rework. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs()
On 6 October 2020 22:01:18 BST, Thomas Gleixner wrote: >On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote: >> >> #else /* CONFIG_SMP */ >> >> +#define irq_default_affinity (NULL) > >... > >> static int alloc_descs(unsigned int start, unsigned int cnt, int >node, >> const struct irq_affinity_desc *affinity, >> + const struct cpumask *default_affinity, >> struct module *owner) >> { >> struct irq_desc *desc; >> int i; >> >> /* Validate affinity mask(s) */ >> +if (!default_affinity || cpumask_empty(default_affinity)) >> +return -EINVAL; > >How is that supposed to work on UP? Hm, good point. >Aside of that I really hate to have yet another argument just >because. Yeah, I was trying to avoid having to allocate a whole array of irq_affinity_desc just to fill them all in with the same default. But perhaps I need to treat the "affinity_max" like we do cpu_online_mask, and allow affinity to be set even to offline/unreachable CPUs at this point. Then we can be more relaxed about the default affinities. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs()
From: David Woodhouse Instead of plugging in irq_default_affinity at the lowest level in desc_smp_init() if the caller didn't specify an affinity for this specific IRQ in its array, allow the default affinity to be passed down from __irq_alloc_descs() instead. This is in preparation for irq domains having their own default (and indeed maximum) affinities. An alternative possibility would have been to make the caller allocate an entire array of struct irq_affinity_desc for the allocation, but that's a lot nastier than simply passing in an additional const cpumask. This commit has no visible API change outside static functions in irqdesc.c for now; the actual change will come later. Signed-off-by: David Woodhouse --- include/linux/interrupt.h | 2 ++ kernel/irq/irqdesc.c | 21 + 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index f9aee3538461..cd0ff293486a 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -364,6 +364,8 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec, #else /* CONFIG_SMP */ +#define irq_default_affinity (NULL) + static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m) { return -EINVAL; diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 1a7723604399..4ac91b9fc618 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -81,8 +81,6 @@ static int alloc_masks(struct irq_desc *desc, int node) static void desc_smp_init(struct irq_desc *desc, int node, const struct cpumask *affinity) { - if (!affinity) - affinity = irq_default_affinity; cpumask_copy(desc->irq_common_data.affinity, affinity); #ifdef CONFIG_GENERIC_PENDING_IRQ @@ -465,12 +463,16 @@ static void free_desc(unsigned int irq) static int alloc_descs(unsigned int start, unsigned int cnt, int node, const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity, struct module *owner) { struct irq_desc *desc; int i; /* Validate affinity mask(s) */ + if (!default_affinity || cpumask_empty(default_affinity)) + return -EINVAL; + if (affinity) { for (i = 0; i < cnt; i++) { if (cpumask_empty(&affinity[i].mask)) @@ -479,7 +481,7 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node, } for (i = 0; i < cnt; i++) { - const struct cpumask *mask = NULL; + const struct cpumask *mask = default_affinity; unsigned int flags = 0; if (affinity) { @@ -488,10 +490,10 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node, IRQD_MANAGED_SHUTDOWN; } mask = &affinity->mask; - node = cpu_to_node(cpumask_first(mask)); affinity++; } + node = cpu_to_node(cpumask_first(mask)); desc = alloc_desc(start + i, node, flags, mask, owner); if (!desc) goto err; @@ -538,7 +540,7 @@ int __init early_irq_init(void) nr_irqs = initcnt; for (i = 0; i < initcnt; i++) { - desc = alloc_desc(i, node, 0, NULL, NULL); + desc = alloc_desc(i, node, 0, irq_default_affinity, NULL); set_bit(i, allocated_irqs); irq_insert_desc(i, desc); } @@ -573,7 +575,8 @@ int __init early_irq_init(void) raw_spin_lock_init(&desc[i].lock); lockdep_set_class(&desc[i].lock, &irq_desc_lock_class); mutex_init(&desc[i].request_mutex); - desc_set_defaults(i, &desc[i], node, NULL, NULL); + desc_set_defaults(i, &desc[i], node, irq_default_affinity, + NULL); } return arch_early_irq_init(); } @@ -590,12 +593,14 @@ static void free_desc(unsigned int irq) unsigned long flags; raw_spin_lock_irqsave(&desc->lock, flags); - desc_set_defaults(irq, desc, irq_desc_get_node(desc), NULL, NULL); + desc_set_defaults(irq, desc, irq_desc_get_node(desc), + irq_default_affinity, NULL); raw_spin_unlock_irqrestore(&desc->lock, flags); } static inline int alloc_descs(unsigned int start, unsigned int cnt, int node, const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity, struct module *owner) { u32 i; @@ -803,7 +808,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned in
[PATCH 04/13] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
From: David Woodhouse Some hypervisors can allow the guest to use the Extended Destination ID field in the IOAPIC RTE and MSI address to address up to 32768 CPUs. Signed-off-by: David Woodhouse --- arch/x86/include/asm/mpspec.h | 1 + arch/x86/include/asm/x86_init.h | 2 ++ arch/x86/kernel/apic/apic.c | 15 ++- arch/x86/kernel/apic/msi.c | 9 - arch/x86/kernel/x86_init.c | 1 + 5 files changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h index e90ac7e9ae2c..25ee8ca0a1f2 100644 --- a/arch/x86/include/asm/mpspec.h +++ b/arch/x86/include/asm/mpspec.h @@ -42,6 +42,7 @@ extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES); extern unsigned int boot_cpu_physical_apicid; extern u8 boot_cpu_apic_version; extern unsigned long mp_lapic_addr; +extern int msi_ext_dest_id; #ifdef CONFIG_X86_LOCAL_APIC extern int smp_found_config; diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 397196fae24d..5af3fe9e38f3 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -114,6 +114,7 @@ struct x86_init_pci { * @init_platform: platform setup * @guest_late_init: guest late init * @x2apic_available: X2APIC detection + * @msi_ext_dest_id: MSI and IOAPIC support 15-bit APIC IDs * @init_mem_mapping: setup early mappings during init_mem_mapping() * @init_after_bootmem:guest init after boot allocator is finished */ @@ -121,6 +122,7 @@ struct x86_hyper_init { void (*init_platform)(void); void (*guest_late_init)(void); bool (*x2apic_available)(void); + bool (*msi_ext_dest_id)(void); void (*init_mem_mapping)(void); void (*init_after_bootmem)(void); }; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index a75767052a92..459c78558f36 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1837,6 +1837,8 @@ static __init void x2apic_enable(void) static __init void try_to_enable_x2apic(int remap_mode) { + u32 apic_limit = 0; + if (x2apic_state == X2APIC_DISABLED) return; @@ -1858,7 +1860,15 @@ static __init void try_to_enable_x2apic(int remap_mode) return; } - x2apic_set_max_apicid(255); + /* +* If the hypervisor supports extended destination ID +* in IOAPIC and MSI, we can support that many CPUs. +*/ + if (x86_init.hyper.msi_ext_dest_id()) { + msi_ext_dest_id = 1; + apic_limit = 32767; + } else + apic_limit = 255; } /* @@ -1867,6 +1877,9 @@ static __init void try_to_enable_x2apic(int remap_mode) */ x2apic_phys = 1; } + if (apic_limit) + x2apic_set_max_apicid(apic_limit); + x2apic_enable(); } diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 356f8acf4927..4d891967bea4 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -23,6 +23,8 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; +int msi_ext_dest_id __ro_after_init; + static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, int dmar) { msg->address_hi = MSI_ADDR_BASE_HI; @@ -45,10 +47,15 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, int * Only the IOMMU itself can use the trick of putting destination * APIC ID into the high bits of the address. Anything else would * just be writing to memory if it tried that, and needs IR to -* address APICs above 255. +* address APICs which can't be addressed in the normal 32-bit +* address range at 0xFFEx. That is typically just 8 bits, but +* some hypervisors allow the extended destination ID field in bits +* 11-5 to be used, giving support for 15 bits of APIC IDs in total. */ if (dmar) msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else if (msi_ext_dest_id && cfg->dest_apicid < 0x8000) + msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3; else WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); } diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a3038d8deb6a..8b395821cb8d 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = { .init_platform = x86_init_noop,
[PATCH 12/13] iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant
From: David Woodhouse It took me a while to realise that this "IRQ remapping" driver exists not to actually remap interrupts, but to return -EINVAL if anyone ever tries to set the affinity to a set of CPUs which can't be reached *without* remapping. Having fixed the core IRQ domain code to handle such limited, all this hackery can now die. I haven't deleted it entirely because its existence still causes the kernel to use X2APIC in cluster mode not physical. I'm not sure we *want* that, but it wants further thought and testing before ripping it out too. Signed-off-by: David Woodhouse --- drivers/iommu/hyperv-iommu.c | 149 +-- 1 file changed, 1 insertion(+), 148 deletions(-) diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e09e2d734c57..46a794d34f57 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -24,156 +24,12 @@ #include "irq_remapping.h" #ifdef CONFIG_IRQ_REMAP - -/* - * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt - * Redirection Table. Hyper-V exposes one single IO-APIC and so define - * 24 IO APIC remmapping entries. - */ -#define IOAPIC_REMAPPING_ENTRY 24 - -static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE }; -static struct irq_domain *ioapic_ir_domain; - -static int hyperv_ir_set_affinity(struct irq_data *data, - const struct cpumask *mask, bool force) -{ - struct irq_data *parent = data->parent_data; - struct irq_cfg *cfg = irqd_cfg(data); - struct IO_APIC_route_entry *entry; - int ret; - - /* Return error If new irq affinity is out of ioapic_max_cpumask. */ - if (!cpumask_subset(mask, &ioapic_max_cpumask)) - return -EINVAL; - - ret = parent->chip->irq_set_affinity(parent, mask, force); - if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE) - return ret; - - entry = data->chip_data; - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - send_cleanup_vector(cfg); - - return 0; -} - -static struct irq_chip hyperv_ir_chip = { - .name = "HYPERV-IR", - .irq_ack= apic_ack_irq, - .irq_set_affinity = hyperv_ir_set_affinity, -}; - -static int hyperv_irq_remapping_alloc(struct irq_domain *domain, -unsigned int virq, unsigned int nr_irqs, -void *arg) -{ - struct irq_alloc_info *info = arg; - struct irq_data *irq_data; - struct irq_desc *desc; - int ret = 0; - - if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1) - return -EINVAL; - - ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg); - if (ret < 0) - return ret; - - irq_data = irq_domain_get_irq_data(domain, virq); - if (!irq_data) { - irq_domain_free_irqs_common(domain, virq, nr_irqs); - return -EINVAL; - } - - irq_data->chip = &hyperv_ir_chip; - - /* -* If there is interrupt remapping function of IOMMU, setting irq -* affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't -* support interrupt remapping function, setting irq affinity of IO-APIC -* interrupts still needs to change IO-APIC registers. But ioapic_ -* configure_entry() will ignore value of cfg->vector and cfg-> -* dest_apicid when IO-APIC's parent irq domain is not the vector -* domain.(See ioapic_configure_entry()) In order to setting vector -* and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved -* in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_ -* affinity() set vector and dest_apicid directly into IO-APIC entry. -*/ - irq_data->chip_data = info->ioapic.entry; - - /* -* Hypver-V IO APIC irq affinity should be in the scope of -* ioapic_max_cpumask because no irq remapping support. -*/ - desc = irq_data_to_desc(irq_data); - cpumask_copy(desc->irq_common_data.affinity, &ioapic_max_cpumask); - - return 0; -} - -static void hyperv_irq_remapping_free(struct irq_domain *domain, -unsigned int virq, unsigned int nr_irqs) -{ - irq_domain_free_irqs_common(domain, virq, nr_irqs); -} - -static int hyperv_irq_remapping_activate(struct irq_domain *domain, - struct irq_data *irq_data, bool reserve) -{ - struct irq_cfg *cfg = irqd_cfg(irq_data); - struct IO_APIC_route_entry *entry = irq_data->chip_data; - - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - - return 0; -} - -static const struct irq_domain_ops hyperv_ir_domain_ops = { - .alloc = hyp
[PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID
From: David Woodhouse This allows the host to indicate that IOAPIC and MSI emulation supports 15-bit destination IDs, allowing up to 32Ki CPUs without remapping. Signed-off-by: David Woodhouse --- Documentation/virt/kvm/cpuid.rst | 4 arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/kvm.c| 6 ++ 3 files changed, 11 insertions(+) diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst index a7dff9186bed..1726b5925d2b 100644 --- a/Documentation/virt/kvm/cpuid.rst +++ b/Documentation/virt/kvm/cpuid.rst @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT 14 guest checks this feature bit async pf acknowledgment msr 0x4b564d07. +KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit + before using extended destination + ID bits in MSI address bits 11-5. + KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side per-cpu warps are expeced in kvmclock diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 812e9b4c1114..950afebfba88 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -32,6 +32,7 @@ #define KVM_FEATURE_POLL_CONTROL 12 #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 +#define KVM_FEATURE_MSI_EXT_DEST_ID15 #define KVM_HINTS_REALTIME 0 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1b51b727b140..4986b4399aef 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -743,12 +743,18 @@ static void __init kvm_init_platform(void) x86_platform.apic_post_init = kvm_apic_init; } +static bool __init kvm_msi_ext_dest_id(void) +{ + return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID); +} + const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, .type = X86_HYPER_KVM, .init.guest_late_init = kvm_guest_init, .init.x2apic_available = kvm_para_available, + .init.msi_ext_dest_id = kvm_msi_ext_dest_id, .init.init_platform = kvm_init_platform, }; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 01/13] x86/apic: Use x2apic in guest kernels even with unusable CPUs.
From: David Woodhouse If the BIOS has enabled x2apic mode, then leave it enabled and don't fall back to xapic just because some CPUs can't be targeted. Previously, if there are CPUs with APIC IDs > 255, the kernel will disable x2apic and fall back to xapic. And then not use the additional CPUs anyway, since xapic can't target them either. In fact, xapic mode can target even *fewer* CPUs, since it can't use the one with APIC ID 255 either. Using x2apic instead gives us at least one more CPU without needing interrupt remapping in the guest. Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h| 1 + arch/x86/kernel/apic/apic.c| 18 ++ arch/x86/kernel/apic/x2apic_phys.c | 9 + 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 1c129abb7f09..b0fd204e0023 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void) extern int x2apic_mode; extern int x2apic_phys; +extern void __init x2apic_set_max_apicid(u32 apicid); extern void __init check_x2apic(void); extern void x2apic_setup(void); static inline int x2apic_enabled(void) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b3eef1d5c903..a75767052a92 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1841,14 +1841,24 @@ static __init void try_to_enable_x2apic(int remap_mode) return; if (remap_mode != IRQ_REMAP_X2APIC_MODE) { - /* IR is required if there is APIC ID > 255 even when running -* under KVM + /* +* If there are APIC IDs above 255, they cannot be used +* without interrupt remapping. In a virtual machine, but +* not on bare metal, X2APIC can be used anyway. In the +* case where BIOS has enabled X2APIC, don't disable it +* just because there are APIC IDs that can't be used. +* They couldn't be used if the kernel falls back to XAPIC +* anyway. */ if (max_physical_apicid > 255 || !x86_init.hyper.x2apic_available()) { pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); - x2apic_disable(); - return; + if (!x2apic_mode) { + x2apic_disable(); + return; + } + + x2apic_set_max_apicid(255); } /* diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index bc9693841353..b4b4e89c1118 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -8,6 +8,12 @@ int x2apic_phys; static struct apic apic_x2apic_phys; +static u32 x2apic_max_apicid; + +void __init x2apic_set_max_apicid(u32 apicid) +{ + x2apic_max_apicid = apicid; +} static int __init set_x2apic_phys_mode(char *arg) { @@ -98,6 +104,9 @@ static int x2apic_phys_probe(void) /* Common x2apic functions, also used by x2apic_cluster */ int x2apic_apic_id_valid(u32 apicid) { + if (x2apic_max_apicid && apicid > x2apic_max_apicid) + return 0; + return 1; } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 11/13] x86/smp: Allow more than 255 CPUs even without interrupt remapping
From: David Woodhouse Now that external interrupt affinity can be limited to the range of CPUs that can be reached through legacy IOAPIC RTEs and MSI, it is possible to use additional CPUs. Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/apic.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 069f5e9f1d28..750a92464bec 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1881,8 +1881,6 @@ static __init void try_to_enable_x2apic(int remap_mode) */ x2apic_phys = 1; } - if (apic_limit) - x2apic_set_max_apicid(apic_limit); /* Build the affinity mask for interrupts that can't be remapped. */ cpumask_clear(&x86_non_ir_cpumask); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
From: David Woodhouse This is the maximum possible set of CPUs which can be used. Use it to calculate the default affinity requested from __irq_alloc_descs() by first attempting to find the intersection with irq_default_affinity, or falling back to using just the max_affinity if the intersection would be empty. Signed-off-by: David Woodhouse --- include/linux/irqdomain.h | 3 ++- kernel/irq/ipi.c | 2 +- kernel/irq/irqdomain.c| 45 +-- 3 files changed, 42 insertions(+), 8 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 5d9de881..6b5576da77f0 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -278,7 +278,8 @@ extern void irq_set_default_host(struct irq_domain *host); extern struct irq_domain *irq_get_default_host(void); extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs, irq_hw_number_t hwirq, int node, - const struct irq_affinity_desc *affinity); + const struct irq_affinity_desc *affinity, + const struct cpumask *max_affinity); static inline struct fwnode_handle *of_node_to_fwnode(struct device_node *node) { diff --git a/kernel/irq/ipi.c b/kernel/irq/ipi.c index 43e3d1be622c..13f56210eca9 100644 --- a/kernel/irq/ipi.c +++ b/kernel/irq/ipi.c @@ -75,7 +75,7 @@ int irq_reserve_ipi(struct irq_domain *domain, } } - virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE, NULL); + virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE, NULL, dest); if (virq <= 0) { pr_warn("Can't reserve IPI, failed to alloc descs\n"); return -ENOMEM; diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index c93e00ca11d8..ffd41f21afca 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -660,7 +660,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain, } /* Allocate a virtual interrupt number */ - virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL); + virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), + NULL, NULL); if (virq <= 0) { pr_debug("-> virq allocation failed\n"); return 0; @@ -1011,25 +1012,57 @@ int irq_domain_translate_twocell(struct irq_domain *d, EXPORT_SYMBOL_GPL(irq_domain_translate_twocell); int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t hwirq, - int node, const struct irq_affinity_desc *affinity) + int node, const struct irq_affinity_desc *affinity, + const struct cpumask *max_affinity) { + cpumask_var_t default_affinity; unsigned int hint; + int i; + + /* Check requested per-IRQ affinities are in the possible range */ + if (affinity && max_affinity) { + for (i = 0; i < cnt; i++) + if (!cpumask_subset(&affinity[i].mask, max_affinity)) + return -EINVAL; + } + + /* +* Generate default affinity. Either the possible subset of +* irq_default_affinity if such a subset is non-empty, or fall +* back to the provided max_affinity if there is no intersection. +* And just a copy of irq_default_affinity in the +* !CONFIG_CPUMASK_OFFSTACK case. +*/ + memset(&default_affinity, 0, sizeof(default_affinity)); + if ((max_affinity && +!cpumask_subset(irq_default_affinity, max_affinity))) { + if (!alloc_cpumask_var(&default_affinity, GFP_KERNEL)) + return -ENOMEM; + cpumask_and(default_affinity, max_affinity, + irq_default_affinity); + if (cpumask_empty(default_affinity)) + cpumask_copy(default_affinity, max_affinity); + } else if (cpumask_available(default_affinity)) + cpumask_copy(default_affinity, irq_default_affinity); if (virq >= 0) { virq = __irq_alloc_descs(virq, virq, cnt, node, THIS_MODULE, -affinity, NULL); +affinity, default_affinity); } else { hint = hwirq % nr_irqs; if (hint == 0) hint++; virq = __irq_alloc_descs(-1, hint, cnt, node, THIS_MODULE, -affinity, NULL); +affinity, default_affinity); if (virq <= 0 && hint > 1) { virq = __irq_alloc_descs(-1, 1, cnt,
[PATCH 08/13] genirq: Add irq_domain_set_affinity()
From: David Woodhouse This allows a maximal affinity to be set, for IRQ domains which cannot target all CPUs in the system. Signed-off-by: David Woodhouse --- include/linux/irqdomain.h | 4 kernel/irq/irqdomain.c| 28 ++-- kernel/irq/manage.c | 19 +++ 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 6b5576da77f0..cf686f18c1fa 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -151,6 +151,7 @@ struct irq_domain_chip_generic; * drivers using the generic chip library which uses this pointer. * @parent: Pointer to parent irq_domain to support hierarchy irq_domains * @debugfs_file: dentry for the domain debugfs file + * @max_affinity: mask of CPUs targetable by this IRQ domain * * Revmap data, used internally by irq_domain * @revmap_direct_max_irq: The largest hwirq that can be set for controllers that @@ -177,6 +178,7 @@ struct irq_domain { #ifdef CONFIG_GENERIC_IRQ_DEBUGFS struct dentry *debugfs_file; #endif + const struct cpumask *max_affinity; /* reverse map data. The linear map gets appended to the irq_domain */ irq_hw_number_t hwirq_max; @@ -453,6 +455,8 @@ extern void irq_domain_set_info(struct irq_domain *domain, unsigned int virq, void *chip_data, irq_flow_handler_t handler, void *handler_data, const char *handler_name); extern void irq_domain_reset_irq_data(struct irq_data *irq_data); +extern int irq_domain_set_affinity(struct irq_domain *domain, + const struct cpumask *affinity); #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY extern struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent, unsigned int flags, unsigned int size, diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index ffd41f21afca..0b298355be1b 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -661,7 +661,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain, /* Allocate a virtual interrupt number */ virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), - NULL, NULL); + NULL, domain->max_affinity); if (virq <= 0) { pr_debug("-> virq allocation failed\n"); return 0; @@ -1078,6 +1078,27 @@ void irq_domain_reset_irq_data(struct irq_data *irq_data) } EXPORT_SYMBOL_GPL(irq_domain_reset_irq_data); +/** + * irq_domain_set_affinity - Set maximum CPU affinity for domain + * @parent:Domain to set affinity for + * @affinity: Pointer to cpumask, consumed by domain + * + * Sets the maximal set of CPUs to which interrupts in this domain may + * be delivered. Must only be called after creation, before any interrupts + * have been in the domain. + * + * This function retains a pointer to the cpumask which is passed in. + */ +int irq_domain_set_affinity(struct irq_domain *domain, + const struct cpumask *affinity) +{ + if (cpumask_empty(affinity)) + return -EINVAL; + domain->max_affinity = affinity; + return 0; +} +EXPORT_SYMBOL_GPL(irq_domain_set_affinity); + #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY /** * irq_domain_create_hierarchy - Add a irqdomain into the hierarchy @@ -1110,6 +1131,9 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent, if (domain) { domain->parent = parent; domain->flags |= flags; + if (parent && parent->max_affinity) + irq_domain_set_affinity(domain, + parent->max_affinity); } return domain; @@ -1375,7 +1399,7 @@ int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base, virq = irq_base; } else { virq = irq_domain_alloc_descs(irq_base, nr_irqs, 0, node, - affinity, NULL); + affinity, domain->max_affinity); if (virq < 0) { pr_debug("cannot allocate IRQ(base %d, count %d)\n", irq_base, nr_irqs); diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 5df903fccb60..e8c5f8ecd306 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -345,6 +345,10 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, struct irq_desc *desc = irq_data_to_desc(data); int ret = 0; + if (data->domain && data->domain->max_affinity && + !cpumask_subset(mask, data->domain->max_affinity)) + return -EINVAL; + if (!chi
[PATCH 02/13] x86/msi: Only use high bits of MSI address for DMAR unit
From: David Woodhouse The Intel IOMMU has an MSI-like configuration for its interrupt, but it isn't really MSI. So it gets to abuse the high 32 bits of the address, and puts the high 24 bits of the extended APIC ID there. This isn't something that can be used in the general case for real MSIs, since external devices using the high bits of the address would be performing writes to actual memory space above 4GiB, not targeted at the APIC. Factor the hack out and allow it only to be used when appropriate, adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That should never happen since the legacy MSI messages are not supposed to be used with Interrupt Remapping enabled. The x2apic_enabled() check isn't needed because we won't bring up CPUs with higher APIC IDs unless x2apic is enabled anyway. Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/msi.c | 34 -- 1 file changed, 28 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 6313f0a05db7..356f8acf4927 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -23,13 +23,10 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; -static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) +static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, int dmar) { msg->address_hi = MSI_ADDR_BASE_HI; - if (x2apic_enabled()) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); - msg->address_lo = MSI_ADDR_BASE_LO | ((apic->irq_dest_mode == 0) ? @@ -43,18 +40,42 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) MSI_DATA_LEVEL_ASSERT | MSI_DATA_DELIVERY_FIXED | MSI_DATA_VECTOR(cfg->vector); + + /* +* Only the IOMMU itself can use the trick of putting destination +* APIC ID into the high bits of the address. Anything else would +* just be writing to memory if it tried that, and needs IR to +* address APICs above 255. +*/ + if (dmar) + msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else + WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); } void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) { - __irq_msi_compose_msg(irqd_cfg(data), msg); + __irq_msi_compose_msg(irqd_cfg(data), msg, 0); } +/* + * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the + * high bits of the destination APIC ID. This can't be done in the general + * case for MSIs as it would be targeting real memory above 4GiB not the + * APIC. + */ +static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) +{ + __irq_msi_compose_msg(irqd_cfg(data), msg, 1); + + + +} static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg) { struct msi_msg msg[2] = { [1] = { }, }; - __irq_msi_compose_msg(cfg, msg); + __irq_msi_compose_msg(cfg, msg, 0); irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg); } @@ -288,6 +309,7 @@ static struct irq_chip dmar_msi_controller = { .irq_ack= irq_chip_ack_parent, .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_compose_msi_msg= dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, .flags = IRQCHIP_SKIP_SET_WAKE, }; -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 09/13] x86/irq: Add x86_non_ir_cpumask
From: David Woodhouse This is the mask of CPUs to which IRQs can be delivered without interrupt remapping. Signed-off-by: David Woodhouse --- arch/x86/include/asm/mpspec.h | 1 + arch/x86/kernel/apic/apic.c| 12 arch/x86/kernel/apic/io_apic.c | 2 ++ 3 files changed, 15 insertions(+) diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h index 25ee8ca0a1f2..b2090be5b444 100644 --- a/arch/x86/include/asm/mpspec.h +++ b/arch/x86/include/asm/mpspec.h @@ -141,5 +141,6 @@ static inline void physid_set_mask_of_physid(int physid, physid_mask_t *map) #define PHYSID_MASK_NONE { {[0 ... PHYSID_ARRAY_SIZE-1] = 0UL} } extern physid_mask_t phys_cpu_present_map; +extern cpumask_t x86_non_ir_cpumask; #endif /* _ASM_X86_MPSPEC_H */ diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 459c78558f36..069f5e9f1d28 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -103,6 +103,9 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid); EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid); EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_acpiid); +/* Mask of CPUs which can be targeted by non-remapped interrupts. */ +cpumask_t x86_non_ir_cpumask = { CPU_BITS_ALL }; + #ifdef CONFIG_X86_32 /* @@ -1838,6 +1841,7 @@ static __init void x2apic_enable(void) static __init void try_to_enable_x2apic(int remap_mode) { u32 apic_limit = 0; + int i; if (x2apic_state == X2APIC_DISABLED) return; @@ -1880,6 +1884,14 @@ static __init void try_to_enable_x2apic(int remap_mode) if (apic_limit) x2apic_set_max_apicid(apic_limit); + /* Build the affinity mask for interrupts that can't be remapped. */ + cpumask_clear(&x86_non_ir_cpumask); + i = min_t(unsigned int, num_possible_cpus() - 1, apic_limit); + for ( ; i >= 0; i--) { + if (cpu_physical_id(i) <= apic_limit) + cpumask_set_cpu(i, &x86_non_ir_cpumask); + } + x2apic_enable(); } diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index aa9a3b54a96c..4d0ef46fedb9 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2098,6 +2098,8 @@ static int mp_alloc_timer_irq(int ioapic, int pin) struct irq_alloc_info info; ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0); + if (domain->parent == x86_vector_domain) + info.mask = &x86_non_ir_cpumask; info.devid = mpc_ioapic_id(ioapic); info.ioapic.pin = pin; mutex_lock(&ioapic_mutex); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 06/13] genirq: Add default_affinity argument to __irq_alloc_descs()
From: David Woodhouse It already takes an array of affinities for each individual irq being allocated but that's awkward to allocate and populate in the case where they're all the same and inherited from the irqdomain, so pass the default in separately as a simple cpumask. Signed-off-by: David Woodhouse --- include/linux/irq.h| 10 ++ kernel/irq/devres.c| 8 ++-- kernel/irq/irqdesc.c | 10 -- kernel/irq/irqdomain.c | 6 +++--- 4 files changed, 23 insertions(+), 11 deletions(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index 1b7f4dfee35b..6e119594d35d 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -897,15 +897,17 @@ unsigned int arch_dynirq_lower_bound(unsigned int from); int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, - const struct irq_affinity_desc *affinity); + const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity); int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, - const struct irq_affinity_desc *affinity); + const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity); /* use macros to avoid needing export.h for THIS_MODULE */ #define irq_alloc_descs(irq, from, cnt, node) \ - __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL) + __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL, NULL) #define irq_alloc_desc(node) \ irq_alloc_descs(-1, 0, 1, node) @@ -920,7 +922,7 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, irq_alloc_descs(-1, from, cnt, node) #define devm_irq_alloc_descs(dev, irq, from, cnt, node)\ - __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL) + __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL, NULL) #define devm_irq_alloc_desc(dev, node) \ devm_irq_alloc_descs(dev, -1, 0, 1, node) diff --git a/kernel/irq/devres.c b/kernel/irq/devres.c index f6e5515ee077..079339decc23 100644 --- a/kernel/irq/devres.c +++ b/kernel/irq/devres.c @@ -170,6 +170,8 @@ static void devm_irq_desc_release(struct device *dev, void *res) * @affinity: Optional pointer to an irq_affinity_desc array of size @cnt * which hints where the irq descriptors should be allocated * and which default affinities to use + * @default_affinity: Optional pointer to a cpumask indicating the default + * affinity to use where not specified by the @affinity array * * Returns the first irq number or error code. * @@ -177,7 +179,8 @@ static void devm_irq_desc_release(struct device *dev, void *res) */ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, - const struct irq_affinity_desc *affinity) + const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity) { struct irq_desc_devres *dr; int base; @@ -186,7 +189,8 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, if (!dr) return -ENOMEM; - base = __irq_alloc_descs(irq, from, cnt, node, owner, affinity); + base = __irq_alloc_descs(irq, from, cnt, node, owner, affinity, +default_affinity); if (base < 0) { devres_free(dr); return base; diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 4ac91b9fc618..fcc3b8a1fe01 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -770,15 +770,21 @@ EXPORT_SYMBOL_GPL(irq_free_descs); * @affinity: Optional pointer to an affinity mask array of size @cnt which * hints where the irq descriptors should be allocated and which * default affinities to use + * @default_affinity: Optional pointer to a cpumask indicating the default + * affinity where not specified in the @affinity array * * Returns the first irq number or error code */ int __ref __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, - struct module *owner, const struct irq_affinity_desc *affinity) + struct module *owner, const struct irq_affinity_desc *affinity, + const struct cpumask *default_affinity) { int start, ret; + if (!default_affinity) + default_affinity = irq_default_affinity; + if (!cnt) return -EINVAL; @@ -808,7 +814,7 @@ __irq_alloc_descs(int irq, unsi
[PATCH 03/13] x86/ioapic: Handle Extended Destination ID field in RTE
From: David Woodhouse The IOAPIC Redirection Table Entries contain an 8-bit Extended Destination ID field which maps to bits 11-4 of the MSI address. The lowest bit is used to indicate remappable format, when interrupt remapping is in use. A hypervisor can use the other 7 bits to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest. No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR IOAPIC domain. Signed-off-by: David Woodhouse --- arch/x86/include/asm/io_apic.h | 3 ++- arch/x86/kernel/apic/io_apic.c | 19 +-- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index a1a26f6d3aa4..e65a0b7379d0 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -78,7 +78,8 @@ struct IO_APIC_route_entry { mask: 1, /* 0: enabled, 1: disabled */ __reserved_2: 15; - __u32 __reserved_3: 24, + __u32 __reserved_3: 17, + ext_dest: 7, dest: 8; } __attribute__ ((packed)); diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index a33380059db6..aa9a3b54a96c 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries) buf, (ir_entry->index2 << 15) | ir_entry->index, ir_entry->zero); else - printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", + printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf, entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ? - "logical " : "physical", + "logical " : "physical", entry.ext_dest, entry.dest, entry.delivery_mode); } } @@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void) */ if (ioapic_i8259.pin != -1) { struct IO_APIC_route_entry entry; + u32 apic_id = read_apic_id(); memset(&entry, 0, sizeof(entry)); entry.mask = IOAPIC_UNMASKED; @@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void) entry.polarity = IOAPIC_POL_HIGH; entry.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; entry.delivery_mode = dest_ExtINT; - entry.dest = read_apic_id(); + entry.dest = apic_id & 0xff; + entry.ext_dest = apic_id >> 8; /* * Add it to the IO-APIC irq-routing table: @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd) * ioapic chip to verify that. */ if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid; + mpd->entry.dest = cfg->dest_apicid & 0xff; + mpd->entry.ext_dest = cfg->dest_apicid >> 8; mpd->entry.vector = cfg->vector; } for_each_irq_pin(entry, mpd->irq_2_pin) @@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void) int apic, pin, i; struct IO_APIC_route_entry entry0, entry1; unsigned char save_control, save_freq_select; + u32 apic_id; pin = find_isa_irq_pin(8, mp_INT); if (pin == -1) { @@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void) entry0 = ioapic_read_entry(apic, pin); clear_IO_APIC_pin(apic, pin); + apic_id = hard_smp_processor_id(); memset(&entry1, 0, sizeof(entry1)); entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; entry1.mask = IOAPIC_UNMASKED; - entry1.dest = hard_smp_processor_id(); + entry1.dest = apic_id & 0xff; + entry1.ext_dest = apic_id >> 8; entry1.delivery_mode = dest_ExtINT; entry1.polarity = entry0.polarity; entry1.trigger = IOAPIC_EDGE; @@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, memset(entry, 0, sizeof(*entry)); entry->delivery_mode = apic->irq_delivery_mode; entry->dest_mode = apic->irq_dest_mode; - entry->dest = cfg->dest_apicid; + entry->dest = cfg->dest_apicid & 0xff; + entry->ext_dest = cfg->dest_apicid >> 8; entry->vector= cfg->vect
[PATCH 0/13] Fix per-domain IRQ affinity, allow >255 CPUs on x86 without IRQ remapping
Linux currently refuses to use >255 CPUs on x86 unless it has interrupt remapping. This is a bit gratuitous because it could use those extra CPUs just fine; it just can't target external interrupts at them. The only problem is that our generic IRQ domain code cann't cope with the concept of domains which can only target a subset of CPUs. The hyperv-iommu IRQ remapping driver works around this — not by actually doing any remapping, but just returning -EINVAL if the affinity is ever set to an unreachable CPU. This almost works, but ends up being a bit late because irq_set_affinity_locked() doesn't call into the irqchip driver immediately; the error only happens later. This patch series implements a per-domain "maximum affinity" set and uses it for the non-remapped IOAPIC and MSI domains on x86. As well as allowing more CPUs to be used without interrupt remapping, this also fixes the case where some IOAPICs or PCI devices aren't actually in scope of any active IOMMU and are operating without remapping. While we're at it, recognise that the 8-bit limit is a bit gratuitous and a hypervisor could offer at least 15 bits of APIC ID in the IOAPIC RTE and MSI address bits 11-5 without even needing to use remapping. David Woodhouse (13): x86/apic: Use x2apic in guest kernels even with unusable CPUs. x86/msi: Only use high bits of MSI address for DMAR unit x86/ioapic: Handle Extended Destination ID field in RTE x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available genirq: Prepare for default affinity to be passed to __irq_alloc_descs() genirq: Add default_affinity argument to __irq_alloc_descs() irqdomain: Add max_affinity argument to irq_domain_alloc_descs() genirq: Add irq_domain_set_affinity() x86/irq: Add x86_non_ir_cpumask x86/irq: Limit IOAPIC and MSI domains' affinity without IR x86/smp: Allow more than 255 CPUs even without interrupt remapping iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Documentation/virt/kvm/cpuid.rst | 4 + arch/x86/include/asm/apic.h | 1 + arch/x86/include/asm/io_apic.h | 3 +- arch/x86/include/asm/mpspec.h| 2 + arch/x86/include/asm/x86_init.h | 2 + arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/apic/apic.c | 41 +- arch/x86/kernel/apic/io_apic.c | 23 -- arch/x86/kernel/apic/msi.c | 44 +-- arch/x86/kernel/apic/x2apic_phys.c | 9 +++ arch/x86/kernel/kvm.c| 6 ++ arch/x86/kernel/x86_init.c | 1 + drivers/iommu/hyperv-iommu.c | 149 +-- include/linux/interrupt.h| 2 + include/linux/irq.h | 10 ++- include/linux/irqdomain.h| 7 +- kernel/irq/devres.c | 8 +- kernel/irq/ipi.c | 2 +- kernel/irq/irqdesc.c | 29 --- kernel/irq/irqdomain.c | 69 ++-- kernel/irq/manage.c | 19 - 21 files changed, 240 insertions(+), 192 deletions(-) smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR
From: David Woodhouse When interrupt remapping isn't enabled, only the first 255 CPUs can receive external interrupts. Set the appropriate max affinity for the IOAPIC and MSI IRQ domains accordingly. This also fixes the case where interrupt remapping is enabled but some devices are not within the scope of any active IOMMU. Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/io_apic.c | 2 ++ arch/x86/kernel/apic/msi.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 4d0ef46fedb9..1c8ce7bc098f 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2332,6 +2332,8 @@ static int mp_irqdomain_create(int ioapic) } ip->irqdomain->parent = parent; + if (parent == x86_vector_domain) + irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask); if (cfg->type == IOAPIC_DOMAIN_LEGACY || cfg->type == IOAPIC_DOMAIN_STRICT) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 4d891967bea4..af5ce5c4da02 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -259,6 +259,7 @@ struct irq_domain * __init native_create_pci_msi_domain(void) pr_warn("Failed to initialize PCI-MSI irqdomain.\n"); } else { d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK; + irq_domain_set_affinity(d, &x86_non_ir_cpumask); } return d; } @@ -479,6 +480,8 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id) irq_domain_free_fwnode(fn); kfree(domain_info); } + if (parent == x86_vector_domain) + irq_domain_set_affinity(d, &x86_non_ir_cpumask); return d; } -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths
From: David Woodhouse Instead of bailing out completely, such a unit can still be used for interrupt remapping. Signed-off-by: David Woodhouse --- drivers/iommu/intel/dmar.c | 46 +- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index 93e6345f3414..4420a759f095 100644 --- a/drivers/iommu/intel/dmar.c +++ b/drivers/iommu/intel/dmar.c @@ -1024,8 +1024,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd) { struct intel_iommu *iommu; u32 ver, sts; - int agaw = 0; - int msagaw = 0; + int agaw = -1; + int msagaw = -1; int err; if (!drhd->reg_base_addr) { @@ -1050,17 +1050,28 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd) } err = -EINVAL; - agaw = iommu_calculate_agaw(iommu); - if (agaw < 0) { - pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n", - iommu->seq_id); - goto err_unmap; - } - msagaw = iommu_calculate_max_sagaw(iommu); - if (msagaw < 0) { - pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n", - iommu->seq_id); - goto err_unmap; + if (cap_sagaw(iommu->cap) == 0) { + pr_info("%s: No supported address widths. Not attempting DMA translation.\n", + iommu->name); + drhd->ignored = 1; + } + + if (!drhd->ignored) { + agaw = iommu_calculate_agaw(iommu); + if (agaw < 0) { + pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n", + iommu->seq_id); + drhd->ignored = 1; + } + } + if (!drhd->ignored) { + msagaw = iommu_calculate_max_sagaw(iommu); + if (msagaw < 0) { + pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n", + iommu->seq_id); + drhd->ignored = 1; + agaw = -1; + } } iommu->agaw = agaw; iommu->msagaw = msagaw; @@ -1087,7 +1098,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd) raw_spin_lock_init(&iommu->register_lock); - if (intel_iommu_enabled) { + /* +* This is only for hotplug; at boot time intel_iommu_enabled won't +* be set yet. When intel_iommu_init() runs, it registers the units +* present at boot time, then sets intel_iommu_enabled. +*/ + if (intel_iommu_enabled && !drhd->ignored) { err = iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, "%s", iommu->name); @@ -1117,7 +1133,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd) static void free_iommu(struct intel_iommu *iommu) { - if (intel_iommu_enabled) { + if (intel_iommu_enabled && iommu->iommu.ops) { iommu_device_unregister(&iommu->iommu); iommu_device_sysfs_remove(&iommu->iommu); } -- 2.17.1 smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement
On Fri, 2020-05-15 at 10:19 -0700, Raj, Ashok wrote: > Hi Christoph > > On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote: > > Can you please lift the untrusted flag into struct device? It really > > isn't a PCI specific concept, and we should not have code poking into > > pci_dev all over the iommu code. > > Just for clarification: All IOMMU's today mostly pertain to only PCI devices > and for devices that aren't PCI like HPET for instance we give a PCI > identifier. Facilities like ATS for instance require the platform to have > an IOMMU. > > what additional problems does moving this to struct device solve? Even the Intel IOMMU supports ACPI devices for which there is no struct pci_dev; only a B/D/F in the ANDD record indicating which entry in the context table to use. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address
On Thu, 2020-03-26 at 18:11 +0100, Alexander Graf wrote: > I'm with you on that sentiment, but in the environment I'm currently > looking at, we have neither DT nor ACPI: The kernel gets purely > configured via kernel command line. For other unenumerable artifacts on > the system, such as virtio-mmio platform devices, that works well enough > and also basically "hacks a kernel parameter" to specify the system layout. Well... maybe it should also feed in a DT for those too? smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/5] iommu/vt-d: Consolidate various cache flush ops
On Fri, 2019-11-22 at 11:04 +0800, Lu Baolu wrote: > Intel VT-d 3.0 introduces more caches and interfaces for software to > flush when it runs in the scalable mode. Currently various cache flush > helpers are scattered around. This consolidates them by putting them in > the existing iommu_flush structure. > > /* struct iommu_flush - Intel IOMMU cache invalidation ops > * > * @cc_inv: invalidate context cache > * @iotlb_inv: Invalidate IOTLB and paging structure caches when software > * has changed second-level tables. > * @p_iotlb_inv: Invalidate IOTLB and paging structure caches when software > * has changed first-level tables. > * @pc_inv: invalidate pasid cache > * @dev_tlb_inv: invalidate cached mappings used by requests-without-PASID > * from the Device-TLB on a endpoint device. > * @p_dev_tlb_inv: invalidate cached mappings used by requests-with-PASID > * from the Device-TLB on an endpoint device > */ > struct iommu_flush { > void (*cc_inv)(struct intel_iommu *iommu, u16 did, >u16 sid, u8 fm, u64 type); > void (*iotlb_inv)(struct intel_iommu *iommu, u16 did, u64 addr, > unsigned int size_order, u64 type); > void (*p_iotlb_inv)(struct intel_iommu *iommu, u16 did, u32 pasid, > u64 addr, unsigned long npages, bool ih); > void (*pc_inv)(struct intel_iommu *iommu, u16 did, u32 pasid, >u64 granu); > void (*dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid, > u16 qdep, u64 addr, unsigned int mask); > void (*p_dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid, > u32 pasid, u16 qdep, u64 addr, > unsigned long npages); > }; > > The name of each cache flush ops is defined according to the spec section 6.5 > so that people are easy to look up them in the spec. Hm, indirect function calls are quite expensive these days. I would have preferred to go in the opposite direction, since surely aren't going to have *many* of these implementations. Currently there's only one for register-based and one for queued invalidation, right? Even if VT-d 3.0 throws an extra version in, I think I'd prefer to take out the indirection completely and have an if/then helper. Would love to see a microbenchmark of unmap operations before and after this patch series with retpoline enabled, to see the effect. smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] intel-iommu: Turn off translations at shutdown
On Fri, 2019-11-08 at 08:47 +, Zeng, Jason wrote: > > -Original Message- > > From: David Woodhouse > > Sent: Friday, November 8, 2019 3:54 PM > > To: Deepa Dinamani ; j...@8bytes.org; linux- > > ker...@vger.kernel.org > > Cc: iommu@lists.linux-foundation.org; Zeng, Jason ; > > Tian, Kevin > > Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown > > > > On Thu, 2019-11-07 at 12:59 -0800, Deepa Dinamani wrote: > > > The intel-iommu driver assumes that the iommu state is > > > cleaned up at the start of the new kernel. > > > But, when we try to kexec boot something other than the > > > Linux kernel, the cleanup cannot be relied upon. > > > Hence, cleanup before we go down for reboot. > > > > > > Keeping the cleanup at initialization also, in case BIOS > > > leaves the IOMMU enabled. > > > > > > I considered turning off iommu only during kexec reboot, > > > but a clean shutdown seems always a good idea. But if > > > someone wants to make it conditional, we can do that. > > > > This is going to break things for the VMM live update scheme that Jason > > presented at KVM Forum, isn't it? > > > > In that case we rely on the IOMMU still operating during the > > transition. > > For VMM live update case, we should be able to detect and bypass > the shutdown that Deepa introduced here, so keep IOMMU still operating? Is that a 'yes' to Deepa's "if someone wants to make it conditional, we can do that" ? smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu