Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread David Woodhouse
On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > From: Baolu Lu 
> > 
> > > --- a/include/linux/dmar.h
> > > +++ b/include/linux/dmar.h
> > > @@ -19,7 +19,7 @@
> > >   struct acpi_dmar_header;
> > > 
> > >   #ifdef  CONFIG_X86
> > > -# define DMAR_UNITS_SUPPORTED	MAX_IO_APICS
> > > +# define DMAR_UNITS_SUPPORTED	640
> > >   #else
> > >   # define	DMAR_UNITS_SUPPORTED	64
> > >   #endif
> 
> ... is it necessary to permanently do 10x increase which wastes memory
> on most platforms which won't have such need.

I was just looking at that. It mostly adds about 3½ KiB to each struct
dmar_domain.

I think the only actual static array is the dmar_seq_ids bitmap, which
grows to 640 *bits* and is fairly negligible; the main growth is the
iommu_refcnt[] and iommu_did[] arrays in each struct dmar_domain, which
account for that 3½ KiB.
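
As a rough, illustrative sketch (not in the original mail) of where those
bytes go — assuming the member types are still 'int' and 'u16' as in the
current struct dmar_domain — the per-domain cost looks like this:

	/* Illustration only; trimmed-down view of struct dmar_domain */
	struct dmar_domain {
		int	iommu_refcnt[DMAR_UNITS_SUPPORTED];	/* 4 bytes per unit */
		u16	iommu_did[DMAR_UNITS_SUPPORTED];	/* 2 bytes per unit */
		/* ... everything else is unaffected ... */
	};

	/* ~6 bytes per supported unit, so 640 units is ~3.75 KiB of array
	 * in each domain, versus the current MAX_IO_APICS-sized arrays. */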

> Does it make more sense to have a configurable approach similar to
> CONFIG_NR_CPUS? or even better can we just replace those static
> arrays with dynamic allocation so removing this restriction
> completely?

Hotplug makes that fun, but I suppose you only need to grow the array
in a given struct dmar_domain if you actually add a device to it that's
behind a newly added IOMMU. I don't know if the complexity of making it
fully dynamic is worth it though. We could make it a config option,
and/or a command line option (perhaps automatically derived from
CONFIG_NR_CPUS).
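
Purely as a hypothetical sketch of the config-option idea (the symbol
CONFIG_DMAR_UNITS_SUPPORTED is invented here, not an existing Kconfig
option), the header change could be as small as:

	/* include/linux/dmar.h -- hypothetical, illustrative only */
	#ifdef CONFIG_DMAR_UNITS_SUPPORTED
	# define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
	#else
	# define DMAR_UNITS_SUPPORTED	MAX_IO_APICS
	#endif

That would let configs for large platforms raise the limit without
penalising everyone else.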

If it wasn't for hotplug, I think we'd know the right number by the
time we actually need it anyway, wouldn't we? Can we have a heuristic
for how many DMAR units are likely to be hotplugged? Is it as simple as
the ratio of present to not-yet-present CPUs in MADT?


> another nit: dmar is intel specific thus CONFIG_X86 is always true.

DMAR exists on IA64 too.



Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled

2021-03-17 Thread David Woodhouse



On 17 March 2021 13:32:35 GMT, Joerg Roedel  wrote:
>On Wed, Mar 17, 2021 at 11:47:11AM +0000, David Woodhouse wrote:
>> If you've already moved the Stoney Ridge check out of the way, there's
>> no real reason why you can't just set init_state=IOMMU_CMDLINE_DISABLED
>> directly from parse_amd_iommu_options(), is there? Then you don't need
>> the condition here at all?
>
>True, there is even more room for optimization. The amd_iommu_disabled
>variable can go away entirely, including its checks in
>early_amd_iommu_init(). I will do a patch-set on-top of this for v5.13
>which does more cleanups.

If we can get to the point where we don't even need to check 
amd_iommu_irq_remap in the ...select() function because the IRQ domain is never 
even registered in the case where the flag ends up false, all the better :)

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled

2021-03-17 Thread David Woodhouse
On Wed, 2021-03-17 at 10:10 +0100, Joerg Roedel wrote:
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 3280e6f5b720..61dae1800b7f 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2919,12 +2919,12 @@ static int __init state_next(void)
> }
> break;
> case IOMMU_IVRS_DETECTED:
> -   ret = early_amd_iommu_init();
> -   init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
> -   if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
> -   pr_info("AMD IOMMU disabled\n");
> +   if (amd_iommu_disabled) {
> init_state = IOMMU_CMDLINE_DISABLED;
> ret = -EINVAL;
> +   } else {
> +   ret = early_amd_iommu_init();
> +   init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
> }
> break;
> case IOMMU_ACPI_FINISHED:
> -- 

If you've already moved the Stoney Ridge check out of the way, there's
no real reason why you can't just set init_state=IOMMU_CMDLINE_DISABLED
directly from parse_amd_iommu_options(), is there? Then you don't need
the condition here at all?



[PATCH] iommu/amd: Don't initialise remapping irqdomain if IOMMU is disabled

2021-03-15 Thread David Woodhouse
From: David Woodhouse 

When the IOMMU is disabled, the driver still enumerates and initialises
the hardware in order to turn it off. Because IRQ remapping setup is
done early, the irqdomain is set up opportunistically.

In commit b34f10c2dc59 ("iommu/amd: Stop irq_remapping_select() matching
when remapping is disabled") I already made the irq_remapping_select()
function check the amd_iommu_irq_setup flag, because that flag might only
get cleared after the irqdomain setup is done, when the IVRS is parsed.

However, in the case where 'amd_iommu=off' is passed on the command line,
the IRQ remapping setup isn't done but the amd_iommu_irq_setup flag is
still set by the early IRQ remap init code. Stop it doing that, by
bailing out of amd_iommu_prepare() early when it's disabled.

This avoids the crash in irq_remapping_select() as it dereferences the
NULL amd_iommu_rlookup_table[]:

[0.243659] Switched APIC routing to physical x2apic.
[0.262206] BUG: kernel NULL pointer dereference, address: 0500
[0.262927] #PF: supervisor read access in kernel mode
[0.263390] #PF: error_code(0x) - not-present page
[0.263844] PGD 0 P4D 0
[0.264135] Oops:  [#1] SMP PTI
[0.264460] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3 #831
[0.265069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[0.265825] RIP: 0010:irq_remapping_select+0x57/0xb0
[0.266327] Code: 4b 0c 48 3d 30 e0 a7 9e 75 0d eb 35 48 8b 00 48 3d 30 e0 a7 9e 74 2a 0f b6 50 10 39 d1 75 ed 0f b7 40 12 48 8b 15 69 e3 d2 01 <48> 8b 14 c2 48 85 d2 74 0e b8 01 00 00 00 48 3b aa 90 04 00 00 74
[0.268412] RSP: :9e803db0 EFLAGS: 00010246
[0.268919] RAX: 00a0 RBX: 9e803df8 RCX: 
[0.269550] RDX:  RSI:  RDI: 98120112fe79
[0.270245] RBP: 9812011c8218 R08: 0001 R09: 000a
[0.270922] R10: 000a R11: f000 R12: 9812011c8218
[0.271549] R13: 98120181ed88 R14:  R15: 
[0.272221] FS:  () GS:98127dc0() knlGS:
[0.272997] CS:  0010 DS:  ES:  CR0: 80050033
[0.273508] CR2: 0500 CR3: 3081 CR4: 06b0
[0.274178] Call Trace:
[0.274416]  irq_find_matching_fwspec+0x41/0xc0
[0.274812]  mp_irqdomain_create+0x65/0x150
[0.275251]  setup_IO_APIC+0x70/0x811

Fixes: a1a785b57242 ("iommu/amd: Implement select() method on remapping irqdomain")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212017
Signed-off-by: David Woodhouse 
---
 drivers/iommu/amd/init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9126efcbaf2c..07edd837b076 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2998,6 +2998,9 @@ int __init amd_iommu_prepare(void)
 {
int ret;
 
+   if (amd_iommu_disabled)
+   return -ENODEV;
+
amd_iommu_irq_remap = true;
 
ret = iommu_go_to_state(IOMMU_ACPI_FINISHED);
-- 
2.29.2



Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2021-01-20 Thread David Woodhouse
On Wed, 2021-01-20 at 18:04 +0100, Greg KH wrote:
> I tried applying these to 5.4, 4.19, and 4.14, and they all fail to
> build:
> 
> drivers/iommu/dmar.c: In function ‘free_iommu’:
> drivers/iommu/dmar.c:1140:35: error: ‘struct intel_iommu’ has no member named ‘drhd’
>  1140 |  if (intel_iommu_enabled && !iommu->drhd->ignored) {
>   |   ^~
> 
> So if you could provide a working set of patches backported, I will be
> glad to queue them up.

Thanks.

I'm just heckling at Sam's backport of those, and we'll post tested
patches as soon as we're done.




Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2021-01-20 Thread David Woodhouse
On Wed, 2021-01-20 at 13:06 +0100, Greg KH wrote:
> On Wed, Jan 20, 2021 at 09:42:43AM +0000, David Woodhouse wrote:
> > On Thu, 2020-09-24 at 15:08 +0100, David Woodhouse wrote:
> > > From: David Woodhouse 
> > > 
> > > Instead of bailing out completely, such a unit can still be used for
> > > interrupt remapping.
> > > 
> > > Signed-off-by: David Woodhouse 
> > 
> > Could we have this for stable too please, along with the trivial
> > subsequent fixup. They are:
> > 
> > c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths")
> > 9def3b1a07c4 ("iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built")
> > 
> > They apply fairly straightforwardly when backported; let me know if you
> > want us to send patches.
> 
> What stable kernel(s) do you want this in?  The above patches are
> already in 5.10.

It's a fairly simple bug fix, to still use a given IOMMU for interrupt
remapping even if it can't be used for DMA mapping.

Those features are somewhat orthogonal, and it was wrong for the kernel
to bail out on the IOMMU hardware completely.

The interrupt remapping support is what's required for Intel boxes (or
VMs) to run with more than 255 CPUs. It should be fairly simple to fix
the same bug at least as far back as 4.14.




Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2021-01-20 Thread David Woodhouse
On Thu, 2020-09-24 at 15:08 +0100, David Woodhouse wrote:
> From: David Woodhouse 
> 
> Instead of bailing out completely, such a unit can still be used for
> interrupt remapping.
> 
> Signed-off-by: David Woodhouse 

Could we have this for stable too please, along with the trivial
subsequent fixup. They are:

c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported 
address widths")
9def3b1a07c4 ("iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not 
built")

They apply fairly straightforwardly when backported; let me know if you
want us to send patches.



[PATCH] iommu/amd: Stop irq_remapping_select() matching when remapping is disabled

2021-01-04 Thread David Woodhouse
From: David Woodhouse 

The AMD IOMMU initialisation registers the IRQ remapping domain for
each IOMMU before doing the final sanity check that every I/OAPIC is
covered.

This means that the AMD irq_remapping_select() function gets invoked
even when IRQ remapping has been disabled, eventually leading to a NULL
pointer dereference in alloc_irq_table().

Unfortunately, the IVRS isn't fully parsed early enough for the sanity
check to be done in time to avoid registering the IRQ domain altogether.
Doing that would be nice, but is a larger and more error-prone task. The
simple fix is just for irq_remapping_select() to refuse to report a
match when IRQ remapping has been disabled.

Link: https://lore.kernel.org/lkml/ed4be9b4-24ac-7128-c522-7ef359e81...@gmx.at
Fixes: a1a785b57242 ("iommu/amd: Implement select() method on remapping irqdomain")
Reported-by: Johnathan Smithinovic 
Signed-off-by: David Woodhouse 
---
 drivers/iommu/amd/iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7e2c445a1fae..f0adbc48fd17 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3854,6 +3854,9 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
struct amd_iommu *iommu;
int devid = -1;
 
+   if (!amd_iommu_irq_remap)
+   return 0;
+
if (x86_fwspec_is_ioapic(fwspec))
devid = get_ioapic_devid(fwspec->param[0]);
else if (x86_fwspec_is_hpet(fwspec))
-- 
2.29.2




[PATCH] iommu/amd: Set iommu->int_enabled consistently when interrupts are set up

2021-01-04 Thread David Woodhouse
From: David Woodhouse 

When I made the INTCAPXT support stop gratuitously pretending to be MSI,
I missed the fact that iommu_setup_msi() also sets the ->int_enabled
flag, and didn't do the same in the iommu_setup_intcapxt() code path.
That means a resume from suspend will try to allocate the IRQ domains
again, accidentally re-enabling interrupts as it does, resulting in much
sadness.

Lift out the bit which sets iommu->int_enabled into the iommu_init_irq()
function which is also where it gets checked.

Link: https://lore.kernel.org/r/20210104132250.ge32...@zn.tnic/
Fixes: d1adcfbb520c ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode")
Reported-by: Borislav Petkov 
Signed-off-by: David Woodhouse 
---
There's a possibility we also need to ensure that the actual
MMIO_INTCAPXT_xxx_OFFSET registers are restored too. Unless you
actually trigger something to generate faults, you'll never know.
I don't see offhand how that was working in the pretend-to-be-MSI case
either.

 drivers/iommu/amd/init.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f54cd79b43e4..6a1f7048dacc 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1973,8 +1973,6 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
return r;
}
 
-   iommu->int_enabled = true;
-
return 0;
 }
 
@@ -2169,6 +2167,7 @@ static int iommu_init_irq(struct amd_iommu *iommu)
if (ret)
return ret;
 
+   iommu->int_enabled = true;
 enable_faults:
iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
-- 
2.29.2




Re: [EXTERNAL] PROBLEM: commit f36a74b9345a leads to not booting system with AMD 2990WX

2021-01-04 Thread David Woodhouse
On Tue, 2021-01-05 at 00:05 +0100, Johnathan Smithinovic wrote:
> commit f36a74b9345a leads to not booting system with AMD 2990WX
> 
> 
> When trying to boot 5.11-rc2 as usual the messages of the bootloader stay on my
> screen and not much appears to happen (fans run a bit slower than in GRUB,
> devices don't seem to get accessed). Without this commit everything seems to
> work as usual.
> 
> (In the hope that it is helpful I appended messages mentioning "IO APIC"
> from 5.11-rc2 with the mentioned commit reverted (and the kernel parameter
> apic=debug added).
> I'm sorry that I can't provide more details: I'm unsure how I can gather
> more information since my Motherboard (ASUS ROG ZENITH EXTREME) does not have
> any documented serial ports and USB devices don't seem to get turned on.)

No problem, that was enough to reproduce it in qemu just by simulating
the same BIOS bug which causes it to *start* enabling, then abort, the
IRQ remapping. Thanks for the report.

Does this fix it?

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7e2c445a1fae..f0adbc48fd17 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3854,6 +3854,9 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
struct amd_iommu *iommu;
int devid = -1;
 
+   if (!amd_iommu_irq_remap)
+   return 0;
+
if (x86_fwspec_is_ioapic(fwspec))
devid = get_ioapic_devid(fwspec->param[0]);
else if (x86_fwspec_is_hpet(fwspec))




Re: Interrupts enabled after amd_iommu_resume+0x0/0x40

2021-01-04 Thread David Woodhouse
On Tue, 2021-01-05 at 00:23 +0100, Borislav Petkov wrote:
> On Mon, Jan 04, 2021 at 02:22:50PM +0100, Borislav Petkov wrote:
> > Hi folks,
> > 
> > syscore_resume() doesn't like when the AMD iommu driver enables
> > interrupts in its ->resume hook when I resume the box from suspend to
> > RAM.
> > 
> > All kinds of warnings get triggered too:
> > 
> > [   17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:824 ktime_get+0x8d/0xa0
> > [   17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:867 ktime_get_with_offset+0xda/0xf0
> > [   17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:824 ktime_get+0x8d/0xa0
> > [   17.386830] WARNING: CPU: 0 PID: 2 at kernel/time/timekeeping.c:867 ktime_get_with_offset+0xda/0xf0
> > [   17.386830] WARNING: CPU: 0 PID: 1576 at drivers/base/syscore.c:103 syscore_resume+0x12d/0x160
> > 
> > but when I comment out the
> > 
> >amd_iommu_enable_interrupts()
> > 
> > call in the resume hook, all is fine and quiet and box resumes.
> > 
> > I'll try to bisect later to try to pinpoint it better because I don't
> > see what recent change would've caused this. But someone might have a
> > better idea so CC the usual suspects.
> 
> Ok, bisected to:
> 
> # first bad commit: [d1adcfbb520c43c10fc22fcdccdd4204e014fb53] iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
> 
> That patch reverts cleanly ontop of -rc2 and with it reverted, the
> machine resumes fine.

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f54cd79b43e4..6a1f7048dacc 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1973,8 +1973,6 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
return r;
}
 
-   iommu->int_enabled = true;
-
return 0;
 }
 
@@ -2169,6 +2167,7 @@ static int iommu_init_irq(struct amd_iommu *iommu)
if (ret)
return ret;
 
+   iommu->int_enabled = true;
 enable_faults:
iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 


?



[PATCH] iommu/vt-d: Don't read VCCAP register unless it exists

2020-11-26 Thread David Woodhouse
From: David Woodhouse 

My virtual IOMMU implementation is whining that the guest is reading a
register that doesn't exist. Only read the VCCAP_REG if the corresponding
capability is set in ECAP_REG to indicate that it actually exists.

Fixes: 3375303e8287 ("iommu/vt-d: Add custom allocator for IOASID")
Signed-off-by: David Woodhouse 
Reviewed-by: Liu Yi L 
Cc: stable@vger.kernel.org # v5.7+
---
 drivers/iommu/intel/dmar.c  | 3 ++-
 drivers/iommu/intel/iommu.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 404b40af31cb..38d1d40cfe34 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -967,7 +967,8 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
warn_invalid_dmar(phys_addr, " returns all ones");
goto unmap;
}
-   iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
+   if (ecap_vcs(iommu->ecap))
+   iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
 
/* the registers might be more than one page */
map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8651f6d4dfa0..0823761f3a7c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1833,7 +1833,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
}
-   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
+   if (vccap_pasid(iommu->vccap))
ioasid_unregister_allocator(&iommu->pasid_allocator);
 
 #endif
@@ -3209,7 +3209,7 @@ static void register_pasid_allocator(struct intel_iommu *iommu)
 * is active. All vIOMMU allocators will eventually be calling the same
 * host allocator.
 */
-   if (!ecap_vcs(iommu->ecap) || !vccap_pasid(iommu->vccap))
+   if (!vccap_pasid(iommu->vccap))
return;
 
pr_info("Register custom PASID allocator\n");





Re: [PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers

2020-11-10 Thread David Woodhouse
On Tue, 2020-11-10 at 01:31 -0500, Qian Cai wrote:
> On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> > From: Thomas Gleixner 
> > 
> > 'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
> > the trigger type (edge/level) and the active low/high configuration. While
> > there are defines for initializing these variables and struct members, they
> > are not used consequently and the meaning of 'trigger' and 'polarity' is
> > opaque and confusing at best.
> > 
> > Rename them to 'is_level' and 'active_low' and make them boolean in various
> > structs so it's entirely clear what the meaning is.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Signed-off-by: David Woodhouse 
> > ---
> >   arch/x86/include/asm/hw_irq.h   |   6 +-
> >   arch/x86/kernel/apic/io_apic.c  | 244 +---
> >   arch/x86/pci/intel_mid_pci.c|   8 +-
> >   drivers/iommu/amd/iommu.c   |  10 +-
> >   drivers/iommu/intel/irq_remapping.c |   9 +-
> >   5 files changed, 130 insertions(+), 147 deletions(-)
> 
> Reverting the rest of patchset up to this commit on next-20201109 fixed an
> endless soft-lockups issue booting an AMD server below. I noticed that the
> failed boots always has this IOMMU IO_PAGE_FAULT before those soft-lockups:

Hm, attempting to reproduce this shows something else. Ever since
commit be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu
api") in 5.5 the following stops working for me:

$ qemu-system-x86_64 -serial mon:stdio -kernel bzImage -machine q35,accel=kvm,kernel-irqchip=split -m 2G -device amd-iommu,intremap=off -append "console=ttyS0 apic=verbose debug" -display none

It hasn't got a hard drive but I can watch the SATA interrupts fail as
it probes the CD-ROM:

[7.403327] ata3.00: qc timeout (cmd 0xa1)
[7.405980] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Adding 'iommu=off' to the kernel command line makes it work again, in
that it correctly panics at the lack of a root file system, quickly.





Re: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Woodhouse
On Sun, 2020-10-25 at 09:49 +, David Laight wrote:
> Just looking at a random one of these patches...
> 
> Does the compiler manage to optimise that reasonably?
> Or does it generate a lot of shifts and masks as each
> bitfield is set?
> 
> The code generation for bitfields is often a lot worse
> than that for |= setting bits in a word.

Indeed, it appears to be utterly appalling. That was one of my
motivations for doing it with masks and shifts in the first place.

Because in ioapic_setup_msg_from_msi(), for example, these two are
consecutive in both source and destination:

entry->vector   = msg.arch_data.vector;
entry->delivery_mode= msg.arch_data.delivery_mode;

And so are those:

entry->ir_format= msg.arch_addr_lo.dmar_format;
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;

But GCC 7.5.0 here is doing them all separately...

11e0 :
{
11e0:   53  push   %rbx
11e1:   48 89 f3mov%rsi,%rbx
11e4:   48 83 ec 18 sub$0x18,%rsp
irq_chip_compose_msi_msg(irq_data, &msg);
11e8:   48 89 e6mov%rsp,%rsi
{
11eb:   65 48 8b 04 25 28 00mov%gs:0x28,%rax
11f2:   00 00 
11f4:   48 89 44 24 10  mov%rax,0x10(%rsp)
11f9:   31 c0   xor%eax,%eax
irq_chip_compose_msi_msg(irq_data, &msg);
11fb:   e8 00 00 00 00  callq  1200 

entry->vector   = msg.arch_data.vector;
1200:   0f b6 44 24 08  movzbl 0x8(%rsp),%eax
entry->delivery_mode= msg.arch_data.delivery_mode;
1205:   0f b6 53 01 movzbl 0x1(%rbx),%edx
1209:   0f b6 74 24 09  movzbl 0x9(%rsp),%esi
entry->vector   = msg.arch_data.vector;
120e:   88 03   mov%al,(%rbx)
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
1210:   0f b6 04 24 movzbl (%rsp),%eax
entry->delivery_mode= msg.arch_data.delivery_mode;
1214:   83 e2 f0and$0xfff0,%edx
1217:   83 e6 07and$0x7,%esi
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
121a:   09 f2   or %esi,%edx
121c:   8d 0c 00lea(%rax,%rax,1),%ecx
entry->ir_format= msg.arch_addr_lo.dmar_format;
121f:   c0 e8 04shr$0x4,%al
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
1222:   83 e1 08and$0x8,%ecx
1225:   09 ca   or %ecx,%edx
1227:   88 53 01mov%dl,0x1(%rbx)
entry->ir_format= msg.arch_addr_lo.dmar_format;
122a:   89 c2   mov%eax,%edx
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;
122c:   8b 04 24mov(%rsp),%eax
122f:   83 e2 01and$0x1,%edx
1232:   c1 e8 05shr$0x5,%eax
1235:   66 25 ff 7f and$0x7fff,%ax
1239:   8d 0c 00lea(%rax,%rax,1),%ecx
123c:   66 c1 e8 07 shr$0x7,%ax
1240:   88 43 07mov%al,0x7(%rbx)
1243:   09 ca   or %ecx,%edx
}
1245:   48 8b 44 24 10  mov0x10(%rsp),%rax
124a:   65 48 33 04 25 28 00xor%gs:0x28,%rax
1251:   00 00 
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;
1253:   88 53 06mov%dl,0x6(%rbx)
}
1256:   75 06   jne125e 

1258:   48 83 c4 18 add$0x18,%rsp
125c:   5b  pop%rbx
125d:   c3  retq   
125e:   e8 00 00 00 00  callq  1263 

1263:   0f 1f 00nopl   (%rax)
1266:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
126d:   00 00 00 


Still, this isn't really a fast path so I'm content to file the GCC PR
for the really *obvious* misses and let it take its course.
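
For anyone wanting to poke at the compiler side, here is a minimal
standalone sketch (not from the original mail) of the pattern in
question — adjacent bitfields copied between identically laid-out
structs; the struct and function names are invented for illustration:

	/* Reduced test case, illustrative only */
	struct msg_like {
		unsigned int vector		:  8;
		unsigned int delivery_mode	:  3;
		unsigned int rest		: 21;
	};

	struct rte_like {
		unsigned int vector		:  8;
		unsigned int delivery_mode	:  3;
		unsigned int rest		: 21;
	};

	void copy_fields(struct rte_like *entry, const struct msg_like *msg)
	{
		/* Adjacent in both source and destination, as in
		 * ioapic_setup_msg_from_msi(); one might hope for a single
		 * masked load/store pair, but GCC emits separate extract
		 * and insert sequences for each member. */
		entry->vector = msg->vector;
		entry->delivery_mode = msg->delivery_mode;
	}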



Re: [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields

2020-10-25 Thread David Woodhouse
On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> Fix the conditions for enabling x2apic on guests without interrupt 
> remapping, and support 15-bit Extended Destination ID to allow 32768 
> CPUs without IR on hypervisors that support it.
> 
> Make the I/OAPIC code generate its RTE directly from the MSI message
> created by the parent irqchip, and fix up a bunch of magic mask/shift
> macros to use bitfields for MSI messages and I/OAPIC RTEs while we're
> at it.

Forgot to mention (since I thought I'd posted it in a previous series)
that v3 also ditches irq_remapping_get_irq_domain() and some icky
special cases of hard-coding "x86_vector_domain", and makes HPET and
I/OAPIC use irq_find_matching_fwspec() to find their parent irqdomain.

> v3:
>  • Lots of bitfield cleanups from Thomas.
>  • Disable hyperv-iommu if 15-bit extension is present.
>  • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c
>  • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream
>through KVM tree (and the other half needs to wait, or have an #ifdef) so
>is left at the top of the tree.
> 
> v2:
>  • Minor cleanups.
>  • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
>  • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
>  • Clean up HPET MSI support into hpet.c now that we can.




[PATCH v3 26/35] iommu/hyper-v: Implement select() method on remapping irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 drivers/iommu/hyperv-iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..78a264ad9405 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -101,7 +101,16 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain,
irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+  struct irq_fwspec *fwspec,
+  enum irq_domain_bus_token bus_token)
+{
+   /* Claim only the first (and only) I/OAPIC */
+   return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+   .select = hyperv_irq_remapping_select,
.alloc = hyperv_irq_remapping_alloc,
.free = hyperv_irq_remapping_free,
 };
-- 
2.26.2



[PATCH v3 10/35] x86/hpet: Move MSI support into hpet.c

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

This isn't really dependent on PCI MSI; it's just generic MSI which is now
supported by the generic x86_vector_domain. Move the HPET MSI support back
into hpet.c with the rest of the HPET support.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hpet.h |  11 
 arch/x86/kernel/apic/msi.c  | 111 -
 arch/x86/kernel/hpet.c  | 118 ++--
 3 files changed, 112 insertions(+), 128 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6352dee37cda..ab9f3dd87c80 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,17 +74,6 @@ extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
-struct irq_data;
-struct hpet_channel;
-struct irq_domain;
-
-extern void hpet_msi_unmask(struct irq_data *data);
-extern void hpet_msi_mask(struct irq_data *data);
-extern void hpet_msi_write(struct hpet_channel *hc, struct msi_msg *msg);
-extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
-extern int hpet_assign_irq(struct irq_domain *domain,
-  struct hpet_channel *hc, int dev_num);
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 4eda617eda1e..44ebe25e7703 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -340,114 +340,3 @@ void dmar_free_hwirq(int irq)
irq_domain_free_irqs(irq, 1);
 }
 #endif
-
-/*
- * MSI message composition
- */
-#ifdef CONFIG_HPET_TIMER
-static inline int hpet_dev_id(struct irq_domain *domain)
-{
-   struct msi_domain_info *info = msi_get_domain_info(domain);
-
-   return (int)(long)info->data;
-}
-
-static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
-{
-   hpet_msi_write(irq_data_get_irq_handler_data(data), msg);
-}
-
-static struct irq_chip hpet_msi_controller __ro_after_init = {
-   .name = "HPET-MSI",
-   .irq_unmask = hpet_msi_unmask,
-   .irq_mask = hpet_msi_mask,
-   .irq_ack = irq_chip_ack_parent,
-   .irq_set_affinity = msi_domain_set_affinity,
-   .irq_retrigger = irq_chip_retrigger_hierarchy,
-   .irq_write_msi_msg = hpet_msi_write_msg,
-   .flags = IRQCHIP_SKIP_SET_WAKE,
-};
-
-static int hpet_msi_init(struct irq_domain *domain,
-struct msi_domain_info *info, unsigned int virq,
-irq_hw_number_t hwirq, msi_alloc_info_t *arg)
-{
-   irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-   irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
-   handle_edge_irq, arg->data, "edge");
-
-   return 0;
-}
-
-static void hpet_msi_free(struct irq_domain *domain,
- struct msi_domain_info *info, unsigned int virq)
-{
-   irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
-}
-
-static struct msi_domain_ops hpet_msi_domain_ops = {
-   .msi_init   = hpet_msi_init,
-   .msi_free   = hpet_msi_free,
-};
-
-static struct msi_domain_info hpet_msi_domain_info = {
-   .ops= &hpet_msi_domain_ops,
-   .chip   = &hpet_msi_controller,
-   .flags  = MSI_FLAG_USE_DEF_DOM_OPS,
-};
-
-struct irq_domain *hpet_create_irq_domain(int hpet_id)
-{
-   struct msi_domain_info *domain_info;
-   struct irq_domain *parent, *d;
-   struct irq_alloc_info info;
-   struct fwnode_handle *fn;
-
-   if (x86_vector_domain == NULL)
-   return NULL;
-
-   domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
-   if (!domain_info)
-   return NULL;
-
-   *domain_info = hpet_msi_domain_info;
-   domain_info->data = (void *)(long)hpet_id;
-
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-   info.devid = hpet_id;
-   parent = irq_remapping_get_irq_domain(&info);
-   if (parent == NULL)
-   parent = x86_vector_domain;
-   else
-   hpet_msi_controller.name = "IR-HPET-MSI";
-
-   fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
- hpet_id);
-   if (!fn) {
-   kfree(domain_info);
-   return NULL;
-   }
-
-   d = msi_create_irq_domain(fn, domain_info, parent);
-   if (!d) {
-   irq_domain_free_fwnode(fn);
-   kfree(domain_info);
-   }
-   return d;
-}
-
-int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc,
-   int dev_num)
-{
-   struct irq_alloc_info info;
-
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET;
-   info.data = hc;
-   info.devid = hpet_dev_id(domain);
-   info.hwirq = dev

[PATCH v3 21/35] x86/ioapic: Generate RTE directly from parent irqchip's MSI message

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/O-APIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.

The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC.  The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.

[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
bitfields and added commentry to explain the mapping magic ]

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h   |  11 ++-
 arch/x86/kernel/apic/io_apic.c  | 103 +++-
 drivers/iommu/amd/iommu.c   |  12 
 drivers/iommu/hyperv-iommu.c|  31 -
 drivers/iommu/intel/irq_remapping.c |  31 ++---
 5 files changed, 83 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 517847a94dbe..83a69f62637e 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -45,12 +45,11 @@ enum irq_alloc_type {
 };
 
 struct ioapic_alloc_info {
-   int pin;
-   int node;
-   u32 is_level: 1;
-   u32 active_low  : 1;
-   u32 valid   : 1;
-   struct IO_APIC_route_entry  *entry;
+   int pin;
+   int node;
+   u32 is_level: 1;
+   u32 active_low  : 1;
+   u32 valid   : 1;
 };
 
 struct uv_alloc_info {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 07e754131854..ea983da1a57f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -48,6 +48,7 @@
 #include  /* time_after() */
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -63,7 +64,6 @@
 #include 
 #include 
 #include 
-
 #include 
 
 #definefor_each_ioapic(idx)\
@@ -1848,21 +1848,58 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data)
eoi_ioapic_pin(data->entry.vector, data);
 }
 
+/*
+ * The I/OAPIC is just a device for generating MSI messages from legacy
+ * interrupt pins. Various fields of the RTE translate into bits of the
+ * resulting MSI which had a historical meaning.
+ *
+ * With interrupt remapping, many of those bits have different meanings
+ * in the underlying MSI, but the way that the I/OAPIC transforms them
+ * from its RTE to the MSI message is the same. This function allows
+ * the parent IRQ domain to compose the MSI message, then takes the
+ * relevant bits to put them in the appropriate places in the RTE in
+ * order to generate that message when the IRQ happens.
+ *
+ * The setup here relies on a preconfigured route entry (is_level,
+ * active_low, masked) because the parent domain is merely composing the
+ * generic message routing information which is used for the MSI.
+ */
+static void ioapic_setup_msg_from_msi(struct irq_data *irq_data,
+ struct IO_APIC_route_entry *entry)
+{
+   struct msi_msg msg;
+
+   /* Let the parent domain compose the MSI message */
+   irq_chip_compose_msi_msg(irq_data, &msg);
+
+   /*
+* - Real vector
+* - DMAR/IR: 8bit subhandle (ioapic.pin)
+* - AMD/IR:  8bit IRTE index
+*/
+   entry->vector   = msg.arch_data.vector;
+   /* Delivery mode (for DMAR/IR all 0) */
+   entry->delivery_mode= msg.arch_data.delivery_mode;
+   /* Destination mode or DMAR/IR index bit 15 */
+   entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
+   /* DMAR/IR: 1, 0 for all other modes */
+   entry->ir_format= msg.arch_addr_lo.dmar_format;
+   /*
+* DMAR/IR: index bit 0-14.
+*
+* All other modes have bit 0-6 of dmar_index_0_14 cleared and the
+* topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
+*/
+   entry->ir_index_0_14

[PATCH v3 29/35] x86: Kill all traces of irq_remapping_get_irq_domain()

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

All users are converted to use the fwspec based parent domain lookup.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h|  2 --
 arch/x86/include/asm/irq_remapping.h |  9 
 drivers/iommu/amd/iommu.c| 34 
 drivers/iommu/hyperv-iommu.c |  9 
 drivers/iommu/intel/irq_remapping.c  | 17 --
 drivers/iommu/irq_remapping.c| 14 
 drivers/iommu/irq_remapping.h|  3 ---
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 83a69f62637e..458f5a676402 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
-   X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-   X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-   return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 31b22244e9c2..463d322a4f3b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3601,10 +3601,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
-   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
return get_ioapic_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_HPET:
-   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3615,44 +3613,12 @@ static int get_devid(struct irq_alloc_info *info)
}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-  int devid)
-{
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-   if (!iommu)
-   return NULL;
-
-   switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return iommu->ir_domain;
-   default:
-   WARN_ON_ONCE(1);
-   return NULL;
-   }
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-   int devid;
-
-   if (!info)
-   return NULL;
-
-   devid = get_devid(info);
-   if (devid < 0)
-   return NULL;
-   return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
.prepare= amd_iommu_prepare,
.enable = amd_iommu_enable,
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_irq_domain = get_irq_domain,
 };
 
 static void fill_msi_msg(struct msi_msg *msg, u32 index)
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 78a264ad9405..a629a6be65c7 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-   return ioapic_ir_domain;
-   else
-   return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
.prepare= hyperv_prepare_irq_remapping,
.enable = hyperv_enable_irq_remapping,
-   .get_irq_domain = hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index b3b079c0b51e..bca44015bc1d 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1127,29 +1127,12 @@ static void prepare_i

[PATCH v3 13/35] iommu/intel: Use msi_msg shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Use the bitfields in the x86 shadow struct to compose the MSI message.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 drivers/iommu/intel/irq_remapping.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 5628d43b795e..30269b738fa5 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "../irq_remapping.h"
 
@@ -1260,6 +1259,21 @@ static struct irq_chip intel_ir_chip = {
.irq_set_vcpu_affinity  = intel_ir_set_vcpu_affinity,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index, u32 subhandle)
+{
+   memset(msg, 0, sizeof(*msg));
+
+   msg->arch_addr_lo.dmar_base_address = X86_MSI_BASE_ADDRESS_LOW;
+   msg->arch_addr_lo.dmar_subhandle_valid = true;
+   msg->arch_addr_lo.dmar_format = true;
+   msg->arch_addr_lo.dmar_index_0_14 = index & 0x7FFF;
+   msg->arch_addr_lo.dmar_index_15 = !!(index & 0x8000);
+
+   msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+
+   msg->arch_data.dmar_subhandle = subhandle;
+}
+
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 struct irq_cfg *irq_cfg,
 struct irq_alloc_info *info,
@@ -1267,7 +1281,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 {
struct IR_IO_APIC_route_entry *entry;
struct irte *irte = &data->irte_entry;
-   struct msi_msg *msg = &data->msi_entry;
 
prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
switch (info->type) {
@@ -1308,12 +1321,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
else
set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
-   msg->address_hi = MSI_ADDR_BASE_HI;
-   msg->data = sub_handle;
-   msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
- MSI_ADDR_IR_SHV |
- MSI_ADDR_IR_INDEX1(index) |
- MSI_ADDR_IR_INDEX2(index);
+   fill_msi_msg(&data->msi_entry, index, sub_handle);
break;
 
default:
-- 
2.26.2



[PATCH v3 20/35] x86/ioapic: Cleanup IO/APIC route entry structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Having two separate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.

Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.

[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
register access into the same union and eliminating a few more
places where bits were accessed through masks and shifts.]

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/io_apic.h  |  78 ++-
 arch/x86/kernel/apic/io_apic.c  | 144 +---
 drivers/iommu/amd/iommu.c   |   8 +-
 drivers/iommu/hyperv-iommu.c|   4 +-
 drivers/iommu/intel/irq_remapping.c |  19 ++--
 5 files changed, 108 insertions(+), 145 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..73da644b2f0d 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -13,15 +13,6 @@
  * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
  */
 
-/* I/O Unit Redirection Table */
-#define IO_APIC_REDIR_VECTOR_MASK  0x000FF
-#define IO_APIC_REDIR_DEST_LOGICAL 0x00800
-#define IO_APIC_REDIR_DEST_PHYSICAL0x0
-#define IO_APIC_REDIR_SEND_PENDING (1 << 12)
-#define IO_APIC_REDIR_REMOTE_IRR   (1 << 14)
-#define IO_APIC_REDIR_LEVEL_TRIGGER(1 << 15)
-#define IO_APIC_REDIR_MASKED   (1 << 16)
-
 /*
  * The structure of the IO-APIC:
  */
@@ -65,52 +56,39 @@ union IO_APIC_reg_03 {
 };
 
 struct IO_APIC_route_entry {
-   __u32   vector  :  8,
-   delivery_mode   :  3,   /* 000: FIXED
-* 001: lowest prio
-* 111: ExtINT
-*/
-   dest_mode   :  1,   /* 0: physical, 1: logical */
-   delivery_status :  1,
-   polarity:  1,
-   irr :  1,
-   trigger :  1,   /* 0: edge, 1: level */
-   mask:  1,   /* 0: enabled, 1: disabled */
-   __reserved_2: 15;
-
-   __u32   __reserved_3: 24,
-   dest:  8;
-} __attribute__ ((packed));
-
-struct IR_IO_APIC_route_entry {
-   __u64   vector  : 8,
-   zero: 3,
-   index2  : 1,
-   delivery_status : 1,
-   polarity: 1,
-   irr : 1,
-   trigger : 1,
-   mask: 1,
-   reserved: 31,
-   format  : 1,
-   index   : 15;
+   union {
+   struct {
+   u64 vector  :  8,
+   delivery_mode   :  3,
+   dest_mode_logical   :  1,
+   delivery_status :  1,
+   active_low  :  1,
+   irr :  1,
+   is_level:  1,
+   masked  :  1,
+   reserved_0  : 15,
+   reserved_1  : 24,
+   destid_0_7  :  8;
+   };
+   struct {
+   u64 ir_shared_0 :  8,
+   ir_zero :  3,
+   ir_index_15 :  1,
+   ir_shared_1 :  5,
+   ir_reserved_0   : 31,
+   ir_format   :  1,
+   ir_index_0_14   : 15;
+   };
+   struct {
+   u64 w1  : 32,
+   w2  : 32;
+   };
+   };
 } __attribute__ ((packed));
 
 struct irq_alloc_info;
 struct ioapic_domain_cfg;
 
-#define IOAPIC_EDGE0
-#define IOAPIC_LEVEL   1
-
-#define IOAPIC_MASKED  1
-#define IOAPIC_UNMASKED0
-
-#define IOAPIC_POL_HIGH0
-#define IOAPIC_POL_LOW 1
-
-#define IOAPIC_DEST_MODE_PHYSICAL  0
-#define IOAPIC_DEST_MODE_LOGICAL   1
-
 #defineIOAPIC_MAP_ALLOC0x1
 #defineIOAPIC_MAP_CHECK0x2
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 24a7bba7cbf4..07e754131854 100644
--- a/arch/x

[PATCH v3 11/35] genirq/msi: Allow shadow declarations of msi_msg::$member

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Architectures like x86 have their MSI messages in various bits of the data,
address_lo and address_hi fields. Composing or decomposing these messages
with bitmasks and shifts is possible, but unreadable gunk.

Allow architectures to provide an architecture specific representation for
each member of msi_msg. Provide empty defaults for each and stick them into
an union.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 include/asm-generic/msi.h |  4 
 include/linux/msi.h   | 46 +++
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/msi.h b/include/asm-generic/msi.h
index e6795f088bdd..25344de0e8f9 100644
--- a/include/asm-generic/msi.h
+++ b/include/asm-generic/msi.h
@@ -4,6 +4,8 @@
 
 #include 
 
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+
 #ifndef NUM_MSI_ALLOC_SCRATCHPAD_REGS
 # define NUM_MSI_ALLOC_SCRATCHPAD_REGS 2
 #endif
@@ -30,4 +32,6 @@ typedef struct msi_alloc_info {
 
 #define GENERIC_MSI_DOMAIN_OPS 1
 
+#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
+
 #endif
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 6b584cc4757c..360a0a7e7341 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -4,11 +4,50 @@
 
 #include 
 #include 
+#include 
+
+/* Dummy shadow structures if an architecture does not define them */
+#ifndef arch_msi_msg_addr_lo
+typedef struct arch_msi_msg_addr_lo {
+   u32 address_lo;
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#endif
+
+#ifndef arch_msi_msg_addr_hi
+typedef struct arch_msi_msg_addr_hi {
+   u32 address_hi;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#endif
+
+#ifndef arch_msi_msg_data
+typedef struct arch_msi_msg_data {
+   u32 data;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#endif
 
+/**
+ * msi_msg - Representation of a MSI message
+ * @address_lo:Low 32 bits of msi message address
+ * @arch_addrlo:   Architecture specific shadow of @address_lo
+ * @address_hi:High 32 bits of msi message address
+ * (only used when device supports it)
+ * @arch_addrhi:   Architecture specific shadow of @address_hi
+ * @data:  MSI message data (usually 16 bits)
+ * @arch_data: Architecture specific shadow of @data
+ */
 struct msi_msg {
-   u32 address_lo; /* low 32 bits of msi message address */
-   u32 address_hi; /* high 32 bits of msi message address */
-   u32 data;   /* 16 bits of msi message data */
+   union {
+   u32 address_lo;
+   arch_msi_msg_addr_lo_t  arch_addr_lo;
+   };
+   union {
+   u32 address_hi;
+   arch_msi_msg_addr_hi_t  arch_addr_hi;
+   };
+   union {
+   u32 data;
+   arch_msi_msg_data_t arch_data;
+   };
 };
 
 extern int pci_msi_ignore_mask;
@@ -243,7 +282,6 @@ struct msi_controller {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 
 #include 
-#include 
 
 struct irq_domain;
 struct irq_domain_ops;
-- 
2.26.2



[PATCH v3 24/35] iommu/amd: Implement select() method on remapping irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Preparatory change to remove irq_remapping_get_irq_domain().

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 drivers/iommu/amd/iommu.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9744cdbefd88..31b22244e9c2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3882,7 +3882,26 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
irte_info->index);
 }
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+   enum irq_domain_bus_token bus_token)
+{
+   struct amd_iommu *iommu;
+   int devid = -1;
+
+   if (x86_fwspec_is_ioapic(fwspec))
+   devid = get_ioapic_devid(fwspec->param[0]);
+   else if (x86_fwspec_is_hpet(fwspec))
+   devid = get_hpet_devid(fwspec->param[0]);
+
+   if (devid < 0)
+   return 0;
+
+   iommu = amd_iommu_rlookup_table[devid];
+   return iommu && iommu->ir_domain == d;
+}
+
 static const struct irq_domain_ops amd_ir_domain_ops = {
+   .select = irq_remapping_select,
.alloc = irq_remapping_alloc,
.free = irq_remapping_free,
.activate = irq_remapping_activate,
-- 
2.26.2



[PATCH v3 19/35] x86/io_apic: Cleanup trigger/polarity helpers

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consistently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.

Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/hw_irq.h   |   6 +-
 arch/x86/kernel/apic/io_apic.c  | 244 +---
 arch/x86/pci/intel_mid_pci.c|   8 +-
 drivers/iommu/amd/iommu.c   |  10 +-
 drivers/iommu/intel/irq_remapping.c |   9 +-
 5 files changed, 130 insertions(+), 147 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index a4aeeaace040..517847a94dbe 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -47,9 +47,9 @@ enum irq_alloc_type {
 struct ioapic_alloc_info {
int pin;
int node;
-   u32 trigger : 1;
-   u32 polarity : 1;
-   u32 valid : 1;
+   u32 is_level: 1;
+   u32 active_low  : 1;
+   u32 valid   : 1;
struct IO_APIC_route_entry  *entry;
 };
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c6d92d2570d0..24a7bba7cbf4 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -89,12 +89,12 @@ struct irq_pin_list {
 };
 
 struct mp_chip_data {
-   struct list_head irq_2_pin;
-   struct IO_APIC_route_entry entry;
-   int trigger;
-   int polarity;
+   struct list_headirq_2_pin;
+   struct IO_APIC_route_entry  entry;
+   boolis_level;
+   boolactive_low;
+   boolisa_irq;
u32 count;
-   bool isa_irq;
 };
 
 struct mp_ioapic_gsi {
@@ -745,44 +745,7 @@ static int __init find_isa_irq_apic(int irq, int type)
return -1;
 }
 
-#ifdef CONFIG_EISA
-/*
- * EISA Edge/Level control register, ELCR
- */
-static int EISA_ELCR(unsigned int irq)
-{
-   if (irq < nr_legacy_irqs()) {
-   unsigned int port = 0x4d0 + (irq >> 3);
-   return (inb(port) >> (irq & 7)) & 1;
-   }
-   apic_printk(APIC_VERBOSE, KERN_INFO
-   "Broken MPtable reports ISA irq %d\n", irq);
-   return 0;
-}
-
-#endif
-
-/* ISA interrupts are always active high edge triggered,
- * when listed as conforming in the MP table. */
-
-#define default_ISA_trigger(idx)   (IOAPIC_EDGE)
-#define default_ISA_polarity(idx)  (IOAPIC_POL_HIGH)
-
-/* EISA interrupts are always polarity zero and can be edge or level
- * trigger depending on the ELCR value.  If an interrupt is listed as
- * EISA conforming in the MP table, that means its trigger type must
- * be read in from the ELCR */
-
-#define default_EISA_trigger(idx)  (EISA_ELCR(mp_irqs[idx].srcbusirq))
-#define default_EISA_polarity(idx) default_ISA_polarity(idx)
-
-/* PCI interrupts are always active low level triggered,
- * when listed as conforming in the MP table. */
-
-#define default_PCI_trigger(idx)   (IOAPIC_LEVEL)
-#define default_PCI_polarity(idx)  (IOAPIC_POL_LOW)
-
-static int irq_polarity(int idx)
+static bool irq_active_low(int idx)
 {
int bus = mp_irqs[idx].srcbus;
 
@@ -791,90 +754,139 @@ static int irq_polarity(int idx)
 */
switch (mp_irqs[idx].irqflag & MP_IRQPOL_MASK) {
case MP_IRQPOL_DEFAULT:
-   /* conforms to spec, ie. bus-type dependent polarity */
-   if (test_bit(bus, mp_bus_not_pci))
-   return default_ISA_polarity(idx);
-   else
-   return default_PCI_polarity(idx);
+   /*
+* Conforms to spec, ie. bus-type dependent polarity.  PCI
+* defaults to low active. [E]ISA defaults to high active.
+*/
+   return !test_bit(bus, mp_bus_not_pci);
case MP_IRQPOL_ACTIVE_HIGH:
-   return IOAPIC_POL_HIGH;
+   return false;
case MP_IRQPOL_RESERVED:
pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n");
fallthrough;
case MP_IRQPOL_ACTIVE_LOW:
default: /* Pointless default required due to do gcc stupidity */
-   return IOAPIC_POL_LOW;
+   return true;
}
 }
 
 #ifdef 

[PATCH v3 27/35] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/hpet.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..08651a4e6aa0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
 {
struct msi_domain_info *domain_info;
struct irq_domain *parent, *d;
-   struct irq_alloc_info info;
struct fwnode_handle *fn;
+   struct irq_fwspec fwspec;
 
if (x86_vector_domain == NULL)
return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
*domain_info = hpet_msi_domain_info;
domain_info->data = (void *)(long)hpet_id;
 
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-   info.devid = hpet_id;
-   parent = irq_remapping_get_irq_domain(&info);
-   if (parent == NULL)
-   parent = x86_vector_domain;
-   else
-   hpet_msi_controller.name = "IR-HPET-MSI";
-
fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
  hpet_id);
if (!fn) {
@@ -572,6 +563,19 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
return NULL;
}
 
+   fwspec.fwnode = fn;
+   fwspec.param_count = 1;
+   fwspec.param[0] = hpet_id;
+
+   parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+   if (!parent) {
+   irq_domain_free_fwnode(fn);
+   kfree(domain_info);
+   return NULL;
+   }
+   if (parent != x86_vector_domain)
+   hpet_msi_controller.name = "IR-HPET-MSI";
+
d = msi_create_irq_domain(fn, domain_info, parent);
if (!d) {
irq_domain_free_fwnode(fn);
-- 
2.26.2



[PATCH v3 08/35] x86/apic: Cleanup destination mode

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.

Make it a boolean and rename it to 'dest_mode_logical'

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/apic.h   | 2 +-
 arch/x86/kernel/apic/apic.c   | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 4 ++--
 arch/x86/kernel/apic/apic_noop.c  | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 4 ++--
 arch/x86/kernel/apic/bigsmp_32.c  | 3 +--
 arch/x86/kernel/apic/io_apic.c| 2 +-
 arch/x86/kernel/apic/msi.c| 6 +++---
 arch/x86/kernel/apic/probe_32.c   | 3 +--
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 arch/x86/kernel/apic/x2apic_phys.c| 2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c| 2 +-
 arch/x86/kernel/smpboot.c | 7 ++-
 arch/x86/platform/uv/uv_irq.c | 2 +-
 arch/x86/xen/apic.c   | 3 +--
 drivers/iommu/amd/amd_iommu_types.h   | 2 +-
 drivers/iommu/amd/iommu.c | 8 
 drivers/iommu/intel/irq_remapping.c   | 2 +-
 18 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 019d7ac3b16e..652f62252349 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,7 @@ struct apic {
u32 disable_esr;
 
enum apic_delivery_modes delivery_mode;
-   u32 irq_dest_mode;
+   booldest_mode_logical;
 
u32 (*calc_dest_apicid)(unsigned int cpu);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 29d28b34cb2f..54f04355aaa2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-   if (apic->irq_dest_mode == 1) {
+   if (apic->dest_mode_logical) {
int logical_apicid, ldr_apicid;
 
/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c 
b/arch/x86/kernel/apic/apic_flat_64.c
index bbb1b89fe711..8f72b4351c9f 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -114,7 +114,7 @@ static struct apic apic_flat __ro_after_init = {
.apic_id_registered = flat_apic_id_registered,
 
.delivery_mode  = APIC_DELIVERY_MODE_FIXED,
-   .irq_dest_mode  = 1, /* logical */
+   .dest_mode_logical  = true,
 
.disable_esr= 0,
 
@@ -205,7 +205,7 @@ static struct apic apic_physflat __ro_after_init = {
.apic_id_registered = flat_apic_id_registered,
 
.delivery_mode  = APIC_DELIVERY_MODE_FIXED,
-   .irq_dest_mode  = 0, /* physical */
+   .dest_mode_logical  = false,
 
.disable_esr= 0,
 
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 38f167ce5031..fe78319e0f7a 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -96,8 +96,7 @@ struct apic apic_noop __ro_after_init = {
.apic_id_registered = noop_apic_id_registered,
 
.delivery_mode  = APIC_DELIVERY_MODE_FIXED,
-   /* logical delivery broadcast to all CPUs: */
-   .irq_dest_mode  = 1,
+   .dest_mode_logical  = true,
 
.disable_esr= 0,
 
@@ -105,7 +104,6 @@ struct apic apic_noop __ro_after_init = {
.init_apic_ldr  = noop_init_apic_ldr,
.ioapic_phys_id_map = default_ioapic_phys_id_map,
.setup_apic_routing = NULL,
-
.cpu_present_to_apicid  = default_cpu_present_to_apicid,
.apicid_to_cpu_present  = physid_set_mask_of_physid,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index 4ebf9fe2c95d..a54d817eb4b6 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -247,7 +247,7 @@ static const struct apic apic_numachip1 __refconst = {
.apic_id_registered = numachip_apic_id_registered,
 
.delivery_mode  = APIC_DELIVERY_MODE_FIXED,
-   .irq_dest_mode  = 0, /* physical */
+   .dest_mode_logical  = false,
 
.disable_esr= 0,
 
@@ -294,7 +294,7 @@ static const struct apic apic_numachip2 __refconst = {
.apic_id_registered = numachip_apic_id_registered,
 
.delivery_mode  = APIC_DELIVERY_MODE_FIXED,
-   .irq_dest_mode  = 0, /* physical */
+   .dest_mode_logical  = false,
 
.disable_esr= 0,
 
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kern

[PATCH v3 16/35] x86/kvm: Use msi_msg shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Use the bitfields in the x86 shadow structs instead of decomposing the
32bit value with macros.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/kvm/irq_comm.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 4aa1c2e00e2a..8a4de3f12820 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -16,8 +16,6 @@
 
 #include 
 
-#include 
-
 #include "irq.h"
 
 #include "ioapic.h"
@@ -104,22 +102,19 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
 void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 struct kvm_lapic_irq *irq)
 {
-   trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_format ?
-(u64)e->msi.address_hi << 32 : 0),
- e->msi.data);
-
-   irq->dest_id = (e->msi.address_lo &
-   MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-   if (kvm->arch.x2apic_format)
-   irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
-   irq->vector = (e->msi.data &
-   MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-   irq->dest_mode = kvm_lapic_irq_dest_mode(
-   !!((1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo));
-   irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-   irq->delivery_mode = e->msi.data & 0x700;
-   irq->msi_redir_hint = ((e->msi.address_lo
-   & MSI_ADDR_REDIRECTION_LOWPRI) > 0);
+   struct msi_msg msg = { .address_lo = e->msi.address_lo,
+  .address_hi = e->msi.address_hi,
+  .data = e->msi.data };
+
+   trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ?
+ (u64)msg.address_hi << 32 : 0), msg.data);
+
+   irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format);
+   irq->vector = msg.arch_data.vector;
+   irq->dest_mode = 
kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical);
+   irq->trig_mode = msg.arch_data.is_level;
+   irq->delivery_mode = msg.arch_data.delivery_mode << 8;
+   irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
irq->level = 1;
irq->shorthand = APIC_DEST_NOSHORT;
 }
-- 
2.26.2



[PATCH v3 30/35] iommu/vt-d: Simplify intel_irq_remapping_select()

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Now that the old get_irq_domain() method has gone, consolidate on just the
map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 drivers/iommu/intel/irq_remapping.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index bca44015bc1d..aeffda92b10b 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -203,13 +203,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
int i;
 
for (i = 0; i < MAX_HPET_TBS; i++) {
if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-   return ir_hpet[i].iommu->ir_domain;
+   return ir_hpet[i].iommu;
}
return NULL;
 }
@@ -225,13 +225,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-   struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-   return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1418,12 +1411,14 @@ static int intel_irq_remapping_select(struct irq_domain 
*d,
  struct irq_fwspec *fwspec,
  enum irq_domain_bus_token bus_token)
 {
+   struct intel_iommu *iommu = NULL;
+
if (x86_fwspec_is_ioapic(fwspec))
-   return d == map_ioapic_to_ir(fwspec->param[0]);
+   iommu = map_ioapic_to_iommu(fwspec->param[0]);
else if (x86_fwspec_is_hpet(fwspec))
-   return d == map_hpet_to_ir(fwspec->param[0]);
+   iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-   return 0;
+   return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {
-- 
2.26.2



[PATCH v3 34/35] x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

No functional change; just reserve the feature bit for now so that VMMs
can start to implement it.

This will allow the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu
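
On the guest side, consuming the bit is expected to be a one-line feature
test, roughly like this (a sketch; the actual hook-up lands in the final
patch of this series):

static bool __init kvm_msi_ext_dest_id(void)
{
        return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
}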

Signed-off-by: David Woodhouse 
Acked-by: Paolo Bonzini 
---
 Documentation/virt/kvm/cpuid.rst | 4 
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 9150e9d1c39b..f70b655821d5 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT  14  guest checks 
this feature bit
   async pf acknowledgment msr
   0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID   15  guest checks this feature bit
+  before using extended destination
+  ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24  host will warn if no guest-side
   per-cpu warps are expeced in
   kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL   12
 #define KVM_FEATURE_PV_SCHED_YIELD 13
 #define KVM_FEATURE_ASYNC_PF_INT   14
+#define KVM_FEATURE_MSI_EXT_DEST_ID15
 
 #define KVM_HINTS_REALTIME  0
 
-- 
2.26.2



[PATCH v3 25/35] iommu/vt-d: Implement select() method on remapping irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Preparatory for removing irq_remapping_get_irq_domain()

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 drivers/iommu/intel/irq_remapping.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 96c84b19940e..b3b079c0b51e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1431,7 +1431,20 @@ static void intel_irq_remapping_deactivate(struct 
irq_domain *domain,
modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ enum irq_domain_bus_token bus_token)
+{
+   if (x86_fwspec_is_ioapic(fwspec))
+   return d == map_ioapic_to_ir(fwspec->param[0]);
+   else if (x86_fwspec_is_hpet(fwspec))
+   return d == map_hpet_to_ir(fwspec->param[0]);
+
+   return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+   .select = intel_irq_remapping_select,
.alloc = intel_irq_remapping_alloc,
.free = intel_irq_remapping_free,
.activate = intel_irq_remapping_activate,
-- 
2.26.2



[PATCH v3 02/35] x86/msi: Only use high bits of MSI address for DMAR unit

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

The Intel IOMMU has an MSI-like configuration for its interrupt, but it
isn't really MSI. So it gets to abuse the high 32 bits of the address, and
puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate, adding a
WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That
should never happen since the compatibility MSI messages are not used when
Interrupt Remapping is enabled.

The x2apic_enabled() check isn't needed because Linux won't bring up CPUs
with higher APIC IDs unless IR and x2apic are enabled anyway.
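
As a standalone illustration of the split (not kernel code; the example APIC
ID is arbitrary, field positions as described above):

/* Illustration only: how the DMAR unit's "MSI" encodes a large APIC ID. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t apicid = 0x1234;            /* example APIC ID above 255 */
        uint32_t address_lo, address_hi;

        address_lo  = 0xfee00000;            /* MSI base address */
        address_lo |= (apicid & 0xff) << 12; /* destination ID bits 0-7 */
        address_hi  = (apicid >> 8) << 8;    /* DMAR only: APIC ID bits 8-31 */

        printf("address_hi=%#x address_lo=%#x\n", address_hi, address_lo);
        return 0;
}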

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/msi.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..516df47bde73 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,11 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+ bool dmar)
 {
msg->address_hi = MSI_ADDR_BASE_HI;
 
-   if (x2apic_enabled())
-   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
msg->address_lo =
MSI_ADDR_BASE_LO |
((apic->irq_dest_mode == 0) ?
@@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, 
struct msi_msg *msg)
MSI_DATA_LEVEL_ASSERT |
MSI_DATA_DELIVERY_FIXED |
MSI_DATA_VECTOR(cfg->vector);
+
+   /*
+* Only the IOMMU itself can use the trick of putting destination
+* APIC ID into the high bits of the address. Anything else would
+* just be writing to memory if it tried that, and needs IR to
+* address higher APIC IDs.
+*/
+   if (dmar)
+   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+   else
+   WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-   __irq_msi_compose_msg(irqd_cfg(data), msg);
+   __irq_msi_compose_msg(irqd_cfg(data), msg, false);
 }
 
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
struct msi_msg msg[2] = { [1] = { }, };
 
-   __irq_msi_compose_msg(cfg, msg);
+   __irq_msi_compose_msg(cfg, msg, false);
irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct 
irq_domain *parent,
 #endif
 
 #ifdef CONFIG_DMAR_TABLE
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+   __irq_msi_compose_msg(irqd_cfg(data), msg, true);
+}
+
 static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
dmar_msi_write(data->irq, msg);
@@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = {
.irq_ack= irq_chip_ack_parent,
.irq_set_affinity   = msi_domain_set_affinity,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
+   .irq_compose_msi_msg= dmar_msi_compose_msg,
.irq_write_msi_msg  = dmar_msi_write_msg,
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2



[PATCH v3 03/35] x86/apic/uv: Fix inconsistent destination mode

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

The UV x2apic is strictly using physical destination mode, but
apic::dest_logical is initialized with APIC_DEST_LOGICAL.

This does not matter much because UV does not use any of the generic
functions which use apic::dest_logical, but is still inconsistent.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c 
b/arch/x86/kernel/apic/x2apic_uv_x.c
index 714233cee0b5..9ade9e6a95ff 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -811,7 +811,7 @@ static struct apic apic_x2apic_uv_x __ro_after_init = {
.irq_dest_mode  = 0, /* Physical */
 
.disable_esr= 0,
-   .dest_logical   = APIC_DEST_LOGICAL,
+   .dest_logical   = APIC_DEST_PHYSICAL,
.check_apicid_used  = NULL,
 
.init_apic_ldr  = uv_init_apic_ldr,
-- 
2.26.2



[PATCH v3 06/35] x86/apic: Replace pointless apic::dest_logical usage

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

All these functions are only used for logical destination mode. So reading
the destination mode mask from the apic structure is a pointless
exercise. Just hand in the proper constant: APIC_DEST_LOGICAL.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/apic_flat_64.c   | 2 +-
 arch/x86/kernel/apic/ipi.c| 6 +++---
 arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c 
b/arch/x86/kernel/apic/apic_flat_64.c
index fdd38a17f835..6df837fd5081 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -53,7 +53,7 @@ static void _flat_send_IPI_mask(unsigned long mask, int 
vector)
unsigned long flags;
 
local_irq_save(flags);
-   __default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+   __default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 387154e39e08..d1fb874fbe64 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -260,7 +260,7 @@ void default_send_IPI_mask_sequence_logical(const struct 
cpumask *mask,
for_each_cpu(query_cpu, mask)
__default_send_IPI_dest_field(
early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-   vector, apic->dest_logical);
+   vector, APIC_DEST_LOGICAL);
local_irq_restore(flags);
 }
 
@@ -279,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct 
cpumask *mask,
continue;
__default_send_IPI_dest_field(
early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-   vector, apic->dest_logical);
+   vector, APIC_DEST_LOGICAL);
}
local_irq_restore(flags);
 }
@@ -297,7 +297,7 @@ void default_send_IPI_mask_logical(const struct cpumask 
*cpumask, int vector)
 
local_irq_save(flags);
WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
-   __default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+   __default_send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
local_irq_restore(flags);
 }
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c 
b/arch/x86/kernel/apic/x2apic_cluster.c
index 82fb43d807f7..f77e9fb7aac1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -61,7 +61,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int 
vector, int apic_dest)
if (!dest)
continue;
 
-   __x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
+   __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
/* Remove cluster CPUs from tmpmask */
cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
}
-- 
2.26.2



[PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Use the msi_msg shadow structs and compose the message with named bitfields
instead of the unreadable macro maze.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/pci/xen.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index c552cd2d0632..3d41a09c2c14 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 
gsi,
 
 #if defined(CONFIG_PCI_MSI)
 #include 
-#include 
 
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
@@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int 
nvec, int type)
return ret;
 }
 
-#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
-   MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
-
 static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
struct msi_msg *msg)
 {
-   /* We set vector == 0 to tell the hypervisor we don't care about it,
-* but we want a pirq setup instead.
-* We use the dest_id field to pass the pirq that we want. */
-   msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
-   msg->address_lo =
-   MSI_ADDR_BASE_LO |
-   MSI_ADDR_DEST_MODE_PHYSICAL |
-   MSI_ADDR_REDIRECTION_CPU |
-   MSI_ADDR_DEST_ID(pirq);
-
-   msg->data = XEN_PIRQ_MSI_DATA;
+   /*
+* We set vector == 0 to tell the hypervisor we don't care about
+* it, but we want a pirq setup instead.  We use the dest_id fields
+* to pass the pirq that we want.
+*/
+   memset(msg, 0, sizeof(*msg));
+   msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+   msg->arch_addr_hi.destid_8_31 = pirq >> 8;
+   msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
+   msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+   msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
 }
 
 static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-- 
2.26.2



[PATCH v3 28/35] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/io_apic.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ea983da1a57f..443d2c9086b9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2320,36 +2320,37 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-   struct irq_alloc_info info;
struct irq_domain *parent;
int hwirqs = mp_ioapic_pin_count(ioapic);
struct ioapic *ip = &ioapics[ioapic];
struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
struct fwnode_handle *fn;
-   char *name = "IO-APIC";
+   struct irq_fwspec fwspec;
 
if (cfg->type == IOAPIC_DOMAIN_INVALID)
return 0;
 
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-   info.devid = mpc_ioapic_id(ioapic);
-   parent = irq_remapping_get_irq_domain(&info);
-   if (!parent)
-   parent = x86_vector_domain;
-   else
-   name = "IO-APIC-IR";
-
/* Handle device tree enumerated APICs proper */
if (cfg->dev) {
fn = of_node_to_fwnode(cfg->dev);
} else {
-   fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+   fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
if (!fn)
return -ENOMEM;
}
 
+   fwspec.fwnode = fn;
+   fwspec.param_count = 1;
+   fwspec.param[0] = ioapic;
+
+   parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+   if (!parent) {
+   if (!cfg->dev)
+   irq_domain_free_fwnode(fn);
+   return -ENODEV;
+   }
+
ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 (void *)(long)ioapic);
 
-- 
2.26.2



[PATCH v3 31/35] x86/ioapic: Handle Extended Destination ID field in RTE

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4
of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower 8
bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable
format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.
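
For illustration, placing a 15-bit APIC ID into the two destination fields
introduced below works out as follows (standalone sketch, not kernel code):

#include <assert.h>
#include <stdint.h>

/* Illustration only: split a 15-bit APIC ID across the RTE destination fields. */
void rte_set_dest(uint32_t apicid, uint8_t *destid_0_7, uint8_t *virt_destid_8_14)
{
        assert(apicid < 0x8000);                  /* at most 15 bits */
        *destid_0_7       = apicid & 0xff;        /* classic 8-bit destination ID */
        *virt_destid_8_14 = (apicid >> 8) & 0x7f; /* bit 48 (remap format) stays 0 */
}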

[ tglx: Converted it to the cleaned up entry/msi_msg format and added
commentary ]

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 20 +++-
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 73da644b2f0d..437aa8d00e53 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -67,7 +67,8 @@ struct IO_APIC_route_entry {
is_level:  1,
masked  :  1,
reserved_0  : 15,
-   reserved_1  : 24,
+   reserved_1  : 17,
+   virt_destid_8_14:  7,
destid_0_7  :  8;
};
struct {
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 443d2c9086b9..1cfd65ef295b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1238,9 +1238,10 @@ static void io_apic_print_entries(unsigned int apic, 
unsigned int nr_entries)
   (entry.ir_index_15 << 15) | entry.ir_index_0_14,
entry.ir_zero);
} else {
-   printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", buf,
+   printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf,
   entry.dest_mode_logical ? "logical " : 
"physical",
-  entry.destid_0_7, entry.delivery_mode);
+  entry.virt_destid_8_14, entry.destid_0_7,
+  entry.delivery_mode);
}
}
 }
@@ -1409,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 */
if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
+   u32 apic_id = read_apic_id();
 
memset(&entry, 0, sizeof(entry));
entry.masked= false;
@@ -1416,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
entry.active_low= false;
entry.dest_mode_logical = false;
entry.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
-   entry.destid_0_7= read_apic_id();
+   entry.destid_0_7= apic_id & 0xFF;
+   entry.virt_destid_8_14  = apic_id >> 8;
 
/*
 * Add it to the IO-APIC irq-routing table:
@@ -1885,7 +1888,11 @@ static void ioapic_setup_msg_from_msi(struct irq_data 
*irq_data,
/* DMAR/IR: 1, 0 for all other modes */
entry->ir_format= msg.arch_addr_lo.dmar_format;
/*
-* DMAR/IR: index bit 0-14.
+* - DMAR/IR: index bit 0-14.
+*
+* - Virt: If the host supports x2apic without a virtualized IR
+* unit then bit 0-6 of dmar_index_0_14 are providing bit
+* 8-14 of the destination id.
 *
 * All other modes have bit 0-6 of dmar_index_0_14 cleared and the
 * topmost 8 bits are destination id bit 0-7 (entry::destid_0_7).
@@ -2063,6 +2070,7 @@ static inline void __init unlock_ExtINT_logic(void)
int apic, pin, i;
struct IO_APIC_route_entry entry0, entry1;
unsigned char save_control, save_freq_select;
+   u32 apic_id;
 
pin  = find_isa_irq_pin(8, mp_INT);
if (pin == -1) {
@@ -2078,11 +2086,13 @@ static inline void __init unlock_ExtINT_logic(void)
entry0 = ioapic_read_entry(apic, pin);
clear_IO_APIC_pin(apic, pin);
 
+   apic_id = hard_smp_processor_id();
memset(&entry1, 0, sizeof(entry1));
 
entry1.dest_mode_logical= true;
entry1.masked   = false;
-   entry1.destid_0_7   = hard_smp_processor_i

[PATCH v3 33/35] iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

If the 15-bit APIC ID support is present in emulated MSI then there's no
need for the pseudo-remapping support.

Signed-off-by: David Woodhouse 
---
 drivers/iommu/hyperv-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index a629a6be65c7..9438daa24fdb 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -121,6 +121,7 @@ static int __init hyperv_prepare_irq_remapping(void)
int i;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
+   x86_init.hyper.msi_ext_dest_id() ||
!x2apic_supported())
return -ENODEV;
 
-- 
2.26.2



[PATCH v3 05/35] x86/apic: Cleanup delivery mode defines

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.

Rename the enum and the constants so they actually make sense.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/apic.h   |  3 ++-
 arch/x86/include/asm/apicdef.h| 16 +++-
 arch/x86/kernel/apic/apic_flat_64.c   |  4 ++--
 arch/x86/kernel/apic/apic_noop.c  |  2 +-
 arch/x86/kernel/apic/apic_numachip.c  |  4 ++--
 arch/x86/kernel/apic/bigsmp_32.c  |  2 +-
 arch/x86/kernel/apic/io_apic.c| 11 ++-
 arch/x86/kernel/apic/probe_32.c   |  2 +-
 arch/x86/kernel/apic/x2apic_cluster.c |  2 +-
 arch/x86/kernel/apic/x2apic_phys.c|  2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c|  6 +++---
 arch/x86/platform/uv/uv_irq.c |  2 +-
 drivers/iommu/amd/iommu.c |  4 ++--
 drivers/iommu/intel/irq_remapping.c   |  2 +-
 drivers/pci/controller/pci-hyperv.c   |  6 +++---
 15 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b0fd204e0023..53bb62e0fdd5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -309,7 +309,8 @@ struct apic {
/* dest_logical is used by the IPI functions */
u32 dest_logical;
u32 disable_esr;
-   u32 irq_delivery_mode;
+
+   enum apic_delivery_modes delivery_mode;
u32 irq_dest_mode;
 
u32 (*calc_dest_apicid)(unsigned int cpu);
diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 05e694ed8386..5716f22f81ac 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -432,15 +432,13 @@ struct local_apic {
  #define BAD_APICID 0xu
 #endif
 
-enum ioapic_irq_destination_types {
-   dest_Fixed  = 0,
-   dest_LowestPrio = 1,
-   dest_SMI= 2,
-   dest__reserved_1= 3,
-   dest_NMI= 4,
-   dest_INIT   = 5,
-   dest__reserved_2= 6,
-   dest_ExtINT = 7
+enum apic_delivery_modes {
+   APIC_DELIVERY_MODE_FIXED= 0,
+   APIC_DELIVERY_MODE_LOWESTPRIO   = 1,
+   APIC_DELIVERY_MODE_SMI  = 2,
+   APIC_DELIVERY_MODE_NMI  = 4,
+   APIC_DELIVERY_MODE_INIT = 5,
+   APIC_DELIVERY_MODE_EXTINT   = 7,
 };
 
 #endif /* _ASM_X86_APICDEF_H */
diff --git a/arch/x86/kernel/apic/apic_flat_64.c 
b/arch/x86/kernel/apic/apic_flat_64.c
index 7862b152a052..fdd38a17f835 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -113,7 +113,7 @@ static struct apic apic_flat __ro_after_init = {
.apic_id_valid  = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,
 
-   .irq_delivery_mode  = dest_Fixed,
+   .delivery_mode  = APIC_DELIVERY_MODE_FIXED,
.irq_dest_mode  = 1, /* logical */
 
.disable_esr= 0,
@@ -206,7 +206,7 @@ static struct apic apic_physflat __ro_after_init = {
.apic_id_valid  = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,
 
-   .irq_delivery_mode  = dest_Fixed,
+   .delivery_mode  = APIC_DELIVERY_MODE_FIXED,
.irq_dest_mode  = 0, /* physical */
 
.disable_esr= 0,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 780c702969b7..4fc934b11851 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -95,7 +95,7 @@ struct apic apic_noop __ro_after_init = {
.apic_id_valid  = default_apic_id_valid,
.apic_id_registered = noop_apic_id_registered,
 
-   .irq_delivery_mode  = dest_Fixed,
+   .delivery_mode  = APIC_DELIVERY_MODE_FIXED,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode  = 1,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index 35edd57f064a..db715d082ec9 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -246,7 +246,7 @@ static const struct apic apic_numachip1 __refconst = {
.apic_id_valid  = numachip_apic_id_valid,
.apic_id_registered = numachip_apic_id_registered,
 
-   .irq_delivery_mode  = dest_Fixed,
+   .delivery_mode  = APIC_DELIVERY_MODE_FIXED,
.irq_dest_mode  = 0, /* physical */
 
.disable_esr= 0,
@@ -295,7 +295,7 @@ static const struct apic apic_numachip2 __refconst = {
.api

[PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Prerequisite to make x86 more irqdomain compliant.
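
With get_name() wired up, fwnode_get_name() returns the name that was passed
to irq_domain_alloc_named_id_fwnode(), which lets a parent domain's select()
identify e.g. an I/OAPIC by name. A sketch in the spirit of the
x86_fwspec_is_ioapic() helper added later in this series (the helper name
here is illustrative):

static bool fwspec_is_ioapic(struct irq_fwspec *fwspec)
{
        const char *fwname = fwnode_get_name(fwspec->fwnode);

        /* "IO-APIC-<id>" as allocated by irq_domain_alloc_named_id_fwnode() */
        return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
               simple_strtol(fwname + 8, NULL, 10) == fwspec->param[0];
}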

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 kernel/irq/irqdomain.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..673fa64c1c44 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain 
*d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+   struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, 
fwnode);
+
+   return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+   .get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**
-- 
2.26.2



[PATCH v3 18/35] x86/msi: Remove msidef.h

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Nothing uses the macro maze anymore.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/msidef.h | 57 ---
 1 file changed, 57 deletions(-)
 delete mode 100644 arch/x86/include/asm/msidef.h

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
deleted file mode 100644
index ee2f8ccc32d0..
--- a/arch/x86/include/asm/msidef.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_MSIDEF_H
-#define _ASM_X86_MSIDEF_H
-
-/*
- * Constants for Intel APIC based MSI messages.
- */
-
-/*
- * Shifts for MSI data
- */
-
-#define MSI_DATA_VECTOR_SHIFT  0
-#define  MSI_DATA_VECTOR_MASK  0x00ff
-#define MSI_DATA_VECTOR(v) (((v) << MSI_DATA_VECTOR_SHIFT) 
& \
-MSI_DATA_VECTOR_MASK)
-
-#define MSI_DATA_DELIVERY_MODE_SHIFT   8
-#define  MSI_DATA_DELIVERY_FIXED   (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
-#define  MSI_DATA_DELIVERY_LOWPRI  (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
-
-#define MSI_DATA_LEVEL_SHIFT   14
-#define MSI_DATA_LEVEL_DEASSERT(0 << MSI_DATA_LEVEL_SHIFT)
-#define MSI_DATA_LEVEL_ASSERT  (1 << MSI_DATA_LEVEL_SHIFT)
-
-#define MSI_DATA_TRIGGER_SHIFT 15
-#define  MSI_DATA_TRIGGER_EDGE (0 << MSI_DATA_TRIGGER_SHIFT)
-#define  MSI_DATA_TRIGGER_LEVEL(1 << MSI_DATA_TRIGGER_SHIFT)
-
-/*
- * Shift/mask fields for msi address
- */
-
-#define MSI_ADDR_BASE_HI   0
-#define MSI_ADDR_BASE_LO   0xfee0
-
-#define MSI_ADDR_DEST_MODE_SHIFT   2
-#define  MSI_ADDR_DEST_MODE_PHYSICAL   (0 << MSI_ADDR_DEST_MODE_SHIFT)
-#define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT)
-
-#define MSI_ADDR_REDIRECTION_SHIFT 3
-#define  MSI_ADDR_REDIRECTION_CPU  (0 << MSI_ADDR_REDIRECTION_SHIFT)
-   /* dedicated cpu */
-#define  MSI_ADDR_REDIRECTION_LOWPRI   (1 << MSI_ADDR_REDIRECTION_SHIFT)
-   /* lowest priority */
-
-#define MSI_ADDR_DEST_ID_SHIFT 12
-#define MSI_ADDR_DEST_ID_MASK  0x000
-#define  MSI_ADDR_DEST_ID(dest)(((dest) << 
MSI_ADDR_DEST_ID_SHIFT) & \
-MSI_ADDR_DEST_ID_MASK)
-#define MSI_ADDR_EXT_DEST_ID(dest) ((dest) & 0xff00)
-
-#define MSI_ADDR_IR_EXT_INT(1 << 4)
-#define MSI_ADDR_IR_SHV(1 << 3)
-#define MSI_ADDR_IR_INDEX1(index)  ((index & 0x8000) >> 13)
-#define MSI_ADDR_IR_INDEX2(index)  ((index & 0x7fff) << 5)
-#endif /* _ASM_X86_MSIDEF_H */
-- 
2.26.2



[PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields

2020-10-24 Thread David Woodhouse

Fix the conditions for enabling x2apic on guests without interrupt 
remapping, and support 15-bit Extended Destination ID to allow 32768 
CPUs without IR on hypervisors that support it.

Make the I/OAPIC code generate its RTE directly from the MSI message
created by the parent irqchip, and fix up a bunch of magic mask/shift
macros to use bitfields for MSI messages and I/OAPIC RTEs while we're
at it.

v3:
 • Lots of bitfield cleanups from Thomas.
 • Disable hyperv-iommu if 15-bit extension is present.
 • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c
 • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream
   through KVM tree (and the other half needs to wait, or have an #ifdef) so
   is left at the top of the tree.

v2:
 • Minor cleanups.
 • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
 • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
 • Clean up HPET MSI support into hpet.c now that we can.

David Woodhouse (19):
  x86/apic: Fix x2apic enablement without interrupt remapping
  x86/msi: Only use high bits of MSI address for DMAR unit
  x86/apic: Always provide irq_compose_msi_msg() method for vector domain
  x86/hpet: Move MSI support into hpet.c
  x86/ioapic: Generate RTE directly from parent irqchip's MSI message
  genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  x86/apic: Add select() method on vector irqdomain
  iommu/amd: Implement select() method on remapping irqdomain
  iommu/vt-d: Implement select() method on remapping irqdomain
  iommu/hyper-v: Implement select() method on remapping irqdomain
  x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  x86: Kill all traces of irq_remapping_get_irq_domain()
  iommu/vt-d: Simplify intel_irq_remapping_select()
  x86/ioapic: Handle Extended Destination ID field in RTE
  x86/apic: Support 15 bits of APIC ID in MSI where available
  iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are 
available
  x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
  x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected

Thomas Gleixner (16):
  x86/apic/uv: Fix inconsistent destination mode
  x86/devicetree: Fix the ioapic interrupt type table
  x86/apic: Cleanup delivery mode defines
  x86/apic: Replace pointless apic::dest_logical usage
  x86/apic: Get rid of apic::dest_logical
  x86/apic: Cleanup destination mode
  genirq/msi: Allow shadow declarations of msi_msg::$member
  x86/msi: Provide msi message shadow structs
  iommu/intel: Use msi_msg shadow structs
  iommu/amd: Use msi_msg shadow structs
  PCI: vmd: Use msi_msg shadow structs
  x86/kvm: Use msi_msg shadow structs
  x86/pci/xen: Use msi_msg shadow structs
  x86/msi: Remove msidef.h
  x86/io_apic: Cleanup trigger/polarity helpers
  x86/ioapic: Cleanup IO/APIC route entry structs

 Documentation/virt/kvm/cpuid.rst  |   4 +
 arch/x86/include/asm/apic.h   |  16 +-
 arch/x86/include/asm/apicdef.h|  16 +-
 arch/x86/include/asm/hpet.h   |  11 -
 arch/x86/include/asm/hw_irq.h |  13 +-
 arch/x86/include/asm/io_apic.h|  79 ++
 arch/x86/include/asm/irq_remapping.h  |   9 -
 arch/x86/include/asm/irqdomain.h  |   3 +
 arch/x86/include/asm/msi.h|  50 
 arch/x86/include/asm/msidef.h |  57 
 arch/x86/include/asm/x86_init.h   |   2 +
 arch/x86/include/uapi/asm/kvm_para.h  |   1 +
 arch/x86/kernel/apic/apic.c   |  73 -
 arch/x86/kernel/apic/apic_flat_64.c   |  18 +-
 arch/x86/kernel/apic/apic_noop.c  |  10 +-
 arch/x86/kernel/apic/apic_numachip.c  |  16 +-
 arch/x86/kernel/apic/bigsmp_32.c  |   9 +-
 arch/x86/kernel/apic/io_apic.c| 503 ++
 arch/x86/kernel/apic/ipi.c|   6 +-
 arch/x86/kernel/apic/msi.c| 153 +--
 arch/x86/kernel/apic/probe_32.c   |   9 +-
 arch/x86/kernel/apic/vector.c |  49 
 arch/x86/kernel/apic/x2apic_cluster.c |  10 +-
 arch/x86/kernel/apic/x2apic_phys.c|  17 +-
 arch/x86/kernel/apic/x2apic_uv_x.c|  12 +-
 arch/x86/kernel/devicetree.c  |  30 +-
 arch/x86/kernel/hpet.c| 122 -
 arch/x86/kernel/kvm.c |   6 +
 arch/x86/kernel/smpboot.c |   8 +-
 arch/x86/kernel/x86_init.c|   1 +
 arch/x86/kvm/irq_comm.c   |  31 +--
 arch/x86/pci/intel_mid_pci.c  |   8 +-
 arch/x86/pci/xen.c|  26 +-
 arch/x86/platform/uv/uv_irq.c |   4 +-
 arch/x86/xen/apic.c   |   7 +-
 drivers/iommu/amd/amd_iommu_types.h   |   2 +-
 drivers/iommu/amd/init.c  |  46 ++--
 drivers/iommu/amd/iommu.c |  93 +++
 drivers/iommu

[PATCH v3 12/35] x86/msi: Provide msi message shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Create shadow structs with named bitfields for msi_msg data, address_lo and
address_hi and use them in the MSI message composer.

Provide a function to retrieve the destination ID. This could be inline,
but that'd create a circular header dependency.
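
The helper essentially just reassembles the two destination ID fields; a
minimal sketch (its body is cut off in the diff below) looks like:

u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
{
        u32 dest = msg->arch_addr_lo.destid_0_7;

        if (extid)
                dest |= msg->arch_addr_hi.destid_8_31 << 8;

        return dest;
}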

[dwmw2: fix bitfields not all to be a union]
Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/msi.h  | 49 +
 arch/x86/kernel/apic/apic.c | 35 ++
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index cd30013d15d3..322fd905da9c 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,4 +9,53 @@ typedef struct irq_alloc_info msi_alloc_info_t;
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg);
 
+/* Structs and defines for the X86 specific MSI message format */
+
+typedef struct x86_msi_data {
+   u32 vector  :  8,
+   delivery_mode   :  3,
+   dest_mode_logical   :  1,
+   reserved:  2,
+   active_low  :  1,
+   is_level:  1;
+
+   u32 dmar_subhandle;
+} __attribute__ ((packed)) arch_msi_msg_data_t;
+#define arch_msi_msg_data  x86_msi_data
+
+typedef struct x86_msi_addr_lo {
+   union {
+   struct {
+   u32 reserved_0  :  2,
+   dest_mode_logical   :  1,
+   redirect_hint   :  1,
+   reserved_1  :  8,
+   destid_0_7  :  8,
+   base_address: 12;
+   };
+   struct {
+   u32 dmar_reserved_0 :  2,
+   dmar_index_15   :  1,
+   dmar_subhandle_valid:  1,
+   dmar_format :  1,
+   dmar_index_0_14 : 15,
+   dmar_base_address   : 12;
+   };
+   };
+} __attribute__ ((packed)) arch_msi_msg_addr_lo_t;
+#define arch_msi_msg_addr_lo   x86_msi_addr_lo
+
+#define X86_MSI_BASE_ADDRESS_LOW   (0xfee0 >> 20)
+
+typedef struct x86_msi_addr_hi {
+   u32 reserved:  8,
+   destid_8_31 : 24;
+} __attribute__ ((packed)) arch_msi_msg_addr_hi_t;
+#define arch_msi_msg_addr_hi   x86_msi_addr_hi
+
+#define X86_MSI_BASE_ADDRESS_HIGH  (0)
+
+struct msi_msg;
+u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
+
 #endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4c15bf29ea2c..f7196ee0f005 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,7 +50,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -2484,22 +2483,16 @@ int hard_smp_processor_id(void)
 void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
   bool dmar)
 {
-   msg->address_hi = MSI_ADDR_BASE_HI;
+   memset(msg, 0, sizeof(*msg));
 
-   msg->address_lo =
-   MSI_ADDR_BASE_LO |
-   (apic->dest_mode_logical ?
-   MSI_ADDR_DEST_MODE_LOGICAL :
-   MSI_ADDR_DEST_MODE_PHYSICAL) |
-   MSI_ADDR_REDIRECTION_CPU |
-   MSI_ADDR_DEST_ID(cfg->dest_apicid);
+   msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+   msg->arch_addr_lo.dest_mode_logical = apic->dest_mode_logical;
+   msg->arch_addr_lo.destid_0_7 = cfg->dest_apicid & 0xFF;
 
-   msg->data =
-   MSI_DATA_TRIGGER_EDGE |
-   MSI_DATA_LEVEL_ASSERT |
-   MSI_DATA_DELIVERY_FIXED |
-   MSI_DATA_VECTOR(cfg->vector);
+   msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+   msg->arch_data.vector = cfg->vector;
 
+   msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
/*
 * Only the IOMMU itself can use the trick of putting destination
 * APIC ID into the high bits of the address. Anything else would
@@ -2507,11 +2500,21 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct 
msi_msg *msg,
 * address higher APIC IDs.
 */
if (dmar)
-   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+   msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
else
-   WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+   WARN_ON_ONCE(cfg->dest_apicid > 0xFF);
 }
 
+u32 x86_msi_msg_get_des

[PATCH v3 32/35] x86/apic: Support 15 bits of APIC ID in MSI where available

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/OAPIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/OAPIC generates its own and had support for the extended bits added in
a previous commit.
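
The interesting part of __irq_msi_compose_msg() then reads roughly as follows
(a sketch; the tail of the hunk is truncated in the diff below):

        if (dmar)
                msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
        else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
                msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
        else
                WARN_ON_ONCE(cfg->dest_apicid > 0xFF);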

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201009104616.1314746-6-dw...@infradead.org
---
 arch/x86/include/asm/msi.h  |  3 ++-
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c | 26 --
 arch/x86/kernel/x86_init.c  |  1 +
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 322fd905da9c..b85147d75626 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -29,7 +29,8 @@ typedef struct x86_msi_addr_lo {
u32 reserved_0  :  2,
dest_mode_logical   :  1,
redirect_hint   :  1,
-   reserved_1  :  8,
+   reserved_1  :  1,
+   virt_destid_8_14:  7,
destid_0_7  :  8,
base_address: 12;
};
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index dde5b3f1e7cd..5c69f7eb5d47 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -116,6 +116,7 @@ struct x86_init_pci {
  * @init_platform: platform setup
  * @guest_late_init:   guest late init
  * @x2apic_available:  X2APIC detection
+ * @msi_ext_dest_id:   MSI supports 15-bit APIC IDs
  * @init_mem_mapping:  setup early mappings during init_mem_mapping()
  * @init_after_bootmem:guest init after boot allocator is 
finished
  */
@@ -123,6 +124,7 @@ struct x86_hyper_init {
void (*init_platform)(void);
void (*guest_late_init)(void);
bool (*x2apic_available)(void);
+   bool (*msi_ext_dest_id)(void);
void (*init_mem_mapping)(void);
void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f7196ee0f005..6bd20c0de8bc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -93,6 +93,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = 
BAD_APICID;
  */
 static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP;
 
+/*
+ * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID
+ */
+static bool virt_ext_dest_id __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -1841,6 +1846,8 @@ static __init void try_to_enable_x2apic(int remap_mode)
return;
 
if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
+   u32 apic_limit = 255;
+
/*
 * Using X2APIC without IR is not architecturally supported
 * on bare metal but may be supported in guests.
@@ -1851,12 +1858,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
return;
}
 
+   /*
+* If the hypervisor supports extended destination ID in
+* MSI, that increases the maximum APIC ID that can be
+* used for non-remapped IRQ domains.
+*/
+   if (x86_init.hyper.msi_ext_dest_id()) {
+   virt_ext_dest_id = 1;
+   apic_limit = 32767;
+   }
+
/*
 * Without IR, all CPUs can be addressed by IOAPIC/MSI only
 * in physical mode, and CPUs with an APIC ID that cannnot
 * be addressed must not be brought online.
 */
-   x2apic_set_max_apicid(255);
+   x2apic_set_max_apicid(apic_limit);
x2apic_phys = 1;
}
x2apic_enable();
@@ -2497,10 +2514,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct 
msi_msg *msg,
 * Only the IOMMU itself can use the trick of putting destination
 * APIC ID into the high bits of the address. Anything else would
 * just be writing to memory if it tried that, and needs IR to
-* address higher APIC IDs.
+* address APICs which can't be addressed in the normal 32-bit
+* address range at 0xFFEx. That is typically just 8 bits, but
+* some hypervisors allow the extended destination ID field in bits
+* 5-11 to be used, giving support for 15 bits of APIC IDs in total.
 */
if (dmar)
msg->arch_addr_hi.destid_8_

[PATCH v3 14/35] iommu/amd: Use msi_msg shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Get rid of the macro mess and use the shadow structs for the x86 specific
MSI message format. Convert the intcapxt setup to use named bitfields as
well while touching it anyway.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 drivers/iommu/amd/init.c  | 46 ++-
 drivers/iommu/amd/iommu.c | 14 +++-
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 82e4af8f09bb..263670d36fed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1966,10 +1965,16 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
return 0;
 }
 
-#define XT_INT_DEST_MODE(x)(((x) & 0x1ULL) << 2)
-#define XT_INT_DEST_LO(x)  (((x) & 0xFFULL) << 8)
-#define XT_INT_VEC(x)  (((x) & 0xFFULL) << 32)
-#define XT_INT_DEST_HI(x)  ((((x) >> 24) & 0xFFULL) << 56)
+union intcapxt {
+   u64 capxt;
+   u64 reserved_0  :  2,
+   dest_mode_logical   :  1,
+   reserved_1  :  5,
+   destid_0_23 : 24,
+   vector  :  8,
+   reserved_2  : 16,
+   destid_24_31:  8;
+} __attribute__ ((packed));
 
 /*
  * Setup the IntCapXT registers with interrupt routing information
@@ -1978,28 +1983,29 @@ static int iommu_setup_msi(struct amd_iommu *iommu)
  */
 static void iommu_update_intcapxt(struct amd_iommu *iommu)
 {
-   u64 val;
-   u32 addr_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
-   u32 addr_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
-   u32 data= readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
-   bool dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
-   u32 dest= ((addr_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xFF);
+   struct msi_msg msg;
+   union intcapxt xt;
+   u32 destid;
 
-   if (x2apic_enabled())
-   dest |= MSI_ADDR_EXT_DEST_ID(addr_hi);
+   msg.address_lo = readl(iommu->mmio_base + MMIO_MSI_ADDR_LO_OFFSET);
+   msg.address_hi = readl(iommu->mmio_base + MMIO_MSI_ADDR_HI_OFFSET);
+   msg.data = readl(iommu->mmio_base + MMIO_MSI_DATA_OFFSET);
 
-   val = XT_INT_VEC(data & 0xFF) |
- XT_INT_DEST_MODE(dm) |
- XT_INT_DEST_LO(dest) |
- XT_INT_DEST_HI(dest);
+   destid = x86_msi_msg_get_destid(&msg, x2apic_enabled());
+
+   xt.capxt = 0ULL;
+   xt.dest_mode_logical = msg.arch_data.dest_mode_logical;
+   xt.vector = msg.arch_data.vector;
+   xt.destid_0_23 = destid & GENMASK(23, 0);
+   xt.destid_24_31 = destid >> 24;
 
/**
 * Current IOMMU implemtation uses the same IRQ for all
 * 3 IOMMU interrupts.
 */
-   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
-   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
-   writeq(val, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
 }
 
 static void _irq_notifier_notify(struct irq_affinity_notify *notify,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d7f0c8908602..473de0920b64 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -35,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -3656,13 +3655,20 @@ struct irq_remap_ops amd_iommu_irq_ops = {
.get_irq_domain = get_irq_domain,
 };
 
+static void fill_msi_msg(struct msi_msg *msg, u32 index)
+{
+   msg->data = index;
+   msg->address_lo = 0;
+   msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+   msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
   struct irq_cfg *irq_cfg,
   struct irq_alloc_info *info,
   int devid, int index, int sub_handle)
 {
struct irq_2_irte *irte_info = &data->irq_2_irte;
-   struct msi_msg *msg = &data->msi_entry;
struct IO_APIC_route_entry *entry;
struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
@@ -3693,9 +3699,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data 
*data,
case X86_IRQ_ALLOC_TYPE_HPET:
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   msg->address_hi = MSI_ADDR_BASE_HI;
-   

[PATCH v3 07/35] x86/apic: Get rid of apic::dest_logical

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

struct apic has two members which store information about the destination
mode: dest_logical and irq_dest_mode.

dest_logical contains a mask which was historically used to set the
destination mode in IPI messages. Over time the usage was reduced and the
logical/physical functions were separated.

There are only a few places which still use 'dest_logical' but they can
use 'irq_dest_mode' instead.

irq_dest_mode is actually a boolean where 0 means physical destination mode
and 1 means logical destination mode. Of course the name does not reflect
the functionality. This will be cleaned up in a subsequent change.

Remove apic::dest_logical and fixup the remaining users.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/apic.h   | 2 --
 arch/x86/kernel/apic/apic.c   | 2 +-
 arch/x86/kernel/apic/apic_flat_64.c   | 8 ++--
 arch/x86/kernel/apic/apic_noop.c  | 4 +---
 arch/x86/kernel/apic/apic_numachip.c  | 8 ++--
 arch/x86/kernel/apic/bigsmp_32.c  | 4 +---
 arch/x86/kernel/apic/probe_32.c   | 4 +---
 arch/x86/kernel/apic/x2apic_cluster.c | 4 +---
 arch/x86/kernel/apic/x2apic_phys.c| 4 +---
 arch/x86/kernel/apic/x2apic_uv_x.c| 4 +---
 arch/x86/kernel/smpboot.c | 5 +++--
 arch/x86/xen/apic.c   | 4 +---
 12 files changed, 15 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 53bb62e0fdd5..019d7ac3b16e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -306,8 +306,6 @@ struct apic {
void(*send_IPI_all)(int vector);
void(*send_IPI_self)(int vector);
 
-   /* dest_logical is used by the IPI functions */
-   u32 dest_logical;
u32 disable_esr;
 
enum apic_delivery_modes delivery_mode;
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 113f6ca7b828..29d28b34cb2f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1591,7 +1591,7 @@ static void setup_local_APIC(void)
apic->init_apic_ldr();
 
 #ifdef CONFIG_X86_32
-   if (apic->dest_logical) {
+   if (apic->irq_dest_mode == 1) {
int logical_apicid, ldr_apicid;
 
/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c 
b/arch/x86/kernel/apic/apic_flat_64.c
index 6df837fd5081..bbb1b89fe711 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -117,11 +117,9 @@ static struct apic apic_flat __ro_after_init = {
.irq_dest_mode  = 1, /* logical */
 
.disable_esr= 0,
-   .dest_logical   = APIC_DEST_LOGICAL,
-   .check_apicid_used  = NULL,
 
+   .check_apicid_used  = NULL,
.init_apic_ldr  = flat_init_apic_ldr,
-
.ioapic_phys_id_map = NULL,
.setup_apic_routing = NULL,
.cpu_present_to_apicid  = default_cpu_present_to_apicid,
@@ -210,11 +208,9 @@ static struct apic apic_physflat __ro_after_init = {
.irq_dest_mode  = 0, /* physical */
 
.disable_esr= 0,
-   .dest_logical   = 0,
-   .check_apicid_used  = NULL,
 
+   .check_apicid_used  = NULL,
.init_apic_ldr  = physflat_init_apic_ldr,
-
.ioapic_phys_id_map = NULL,
.setup_apic_routing = NULL,
.cpu_present_to_apicid  = default_cpu_present_to_apicid,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 4fc934b11851..38f167ce5031 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -100,11 +100,9 @@ struct apic apic_noop __ro_after_init = {
.irq_dest_mode  = 1,
 
.disable_esr= 0,
-   .dest_logical   = APIC_DEST_LOGICAL,
-   .check_apicid_used  = default_check_apicid_used,
 
+   .check_apicid_used  = default_check_apicid_used,
.init_apic_ldr  = noop_init_apic_ldr,
-
.ioapic_phys_id_map = default_ioapic_phys_id_map,
.setup_apic_routing = NULL,
 
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index db715d082ec9..4ebf9fe2c95d 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -250,11 +250,9 @@ static const struct apic apic_numachip1 __refconst = {
.irq_dest_mode  = 0, /* physical */
 
.disable_esr= 0,
-   .dest_logical   = 0,
-   .check_apicid_used  = NULL,
 
+   .check_apicid_used  = NULL,
.init_apic_ldr 

[PATCH v3 04/35] x86/devicetree: Fix the ioapic interrupt type table

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

The ioapic interrupt type table is wrong as it assumes that polarity in
IO/APIC context means active high when set. But the IO/APIC polarity is
working the other way round. This works because the ordering of the entries
is consistent with the device tree and the type information is not used by
the IO/APIC interrupt chip.

The whole trigger and polarity business of IO/APIC is misleading and the
corresponding constants which are defined as 0/1 are not used consistently
and are going to be removed.

Rename the type table members to 'is_level' and 'active_low' and adjust the
type information for consistency sake.

No functional change.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/devicetree.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index ddffd80f5c52..6a4cb71c2498 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -184,31 +184,31 @@ static unsigned int ioapic_id;
 
 struct of_ioapic_type {
u32 out_type;
-   u32 trigger;
-   u32 polarity;
+   u32 is_level;
+   u32 active_low;
 };
 
 static struct of_ioapic_type of_ioapic_type[] =
 {
{
-   .out_type   = IRQ_TYPE_EDGE_RISING,
-   .trigger= IOAPIC_EDGE,
-   .polarity   = 1,
+   .out_type   = IRQ_TYPE_EDGE_FALLING,
+   .is_level   = 0,
+   .active_low = 1,
},
{
-   .out_type   = IRQ_TYPE_LEVEL_LOW,
-   .trigger= IOAPIC_LEVEL,
-   .polarity   = 0,
+   .out_type   = IRQ_TYPE_LEVEL_HIGH,
+   .is_level   = 1,
+   .active_low = 0,
},
{
-   .out_type   = IRQ_TYPE_LEVEL_HIGH,
-   .trigger= IOAPIC_LEVEL,
-   .polarity   = 1,
+   .out_type   = IRQ_TYPE_LEVEL_LOW,
+   .is_level   = 1,
+   .active_low = 1,
},
{
-   .out_type   = IRQ_TYPE_EDGE_FALLING,
-   .trigger= IOAPIC_EDGE,
-   .polarity   = 0,
+   .out_type   = IRQ_TYPE_EDGE_RISING,
+   .is_level   = 0,
+   .active_low = 0,
},
 };
 
@@ -228,7 +228,7 @@ static int dt_irqdomain_alloc(struct irq_domain *domain, 
unsigned int virq,
return -EINVAL;
 
it = &of_ioapic_type[type_index];
-   ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity);
+   ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->is_level, it->active_low);
tmp.devid = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
tmp.ioapic.pin = fwspec->param[0];
 
-- 
2.26.2



[PATCH v3 01/35] x86/apic: Fix x2apic enablement without interrupt remapping

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

Currently, Linux as a hypervisor guest will enable x2apic only if there are
no CPUs present at boot time with an APIC ID above 255.

Hotplugging a CPU later with a higher APIC ID would result in a CPU which
cannot be targeted by external interrupts.

Add a filter in x2apic_apic_id_valid() which can be used to prevent such
CPUs from coming online, and allow x2apic to be enabled even if they are
present at boot time.

Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under 
KVM")
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/apic.h|  1 +
 arch/x86/kernel/apic/apic.c| 14 --
 arch/x86/kernel/apic/x2apic_phys.c |  9 +
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..113f6ca7b828 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode)
return;
 
if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-   /* IR is required if there is APIC ID > 255 even when running
-* under KVM
+   /*
+* Using X2APIC without IR is not architecturally supported
+* on bare metal but may be supported in guests.
 */
-   if (max_physical_apicid > 255 ||
-   !x86_init.hyper.x2apic_available()) {
+   if (!x86_init.hyper.x2apic_available()) {
pr_info("x2apic: IRQ remapping doesn't support X2APIC 
mode\n");
x2apic_disable();
return;
}
 
/*
-* without IR all CPUs can be addressed by IOAPIC/MSI
-* only in physical mode
+* Without IR, all CPUs can be addressed by IOAPIC/MSI only
+* in physical mode, and CPUs with an APIC ID that cannot
+* be addressed must not be brought online.
 */
+   x2apic_set_max_apicid(255);
x2apic_phys = 1;
}
x2apic_enable();
diff --git a/arch/x86/kernel/apic/x2apic_phys.c 
b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..e14eae6d6ea7 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid __ro_after_init;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+   x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+   if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+   return 0;
+
return 1;
 }
 
-- 
2.26.2



[PATCH v3 15/35] PCI: vmd: Use msi_msg shadow structs

2020-10-24 Thread David Woodhouse
From: Thomas Gleixner 

Use the x86 shadow structs in msi_msg instead of the macros.

Signed-off-by: Thomas Gleixner 
Signed-off-by: David Woodhouse 
---
 drivers/pci/controller/vmd.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index aa1b12bac9a1..72de3c6f644e 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define VMD_CFGBAR 0
 #define VMD_MEMBAR12
@@ -131,10 +130,10 @@ static void vmd_compose_msi_msg(struct irq_data *data, 
struct msi_msg *msg)
struct vmd_irq_list *irq = vmdirq->irq;
struct vmd_dev *vmd = irq_data_get_irq_handler_data(data);
 
-   msg->address_hi = MSI_ADDR_BASE_HI;
-   msg->address_lo = MSI_ADDR_BASE_LO |
- MSI_ADDR_DEST_ID(index_from_irqs(vmd, irq));
-   msg->data = 0;
+   memset(msg, 0, sizeof(*msg));
+   msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
+   msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
+   msg->arch_addr_lo.destid_0_7 = index_from_irqs(vmd, irq);
 }
 
 /*
-- 
2.26.2
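
For context, the shadow-struct layout used here looks roughly like this (an
abbreviated, paraphrased sketch; field names and widths are from memory, not
copied from the kernel headers):

/*
 * Sketch only: msi_msg carries a union of the raw u32 address_lo and an
 * x86-specific bitfield view, so callers set named fields instead of
 * open-coding masks and shifts.
 */
struct sketch_x86_msi_addr_lo {
	union {
		struct {
			u32	reserved_0		:  2,
				dest_mode_logical	:  1,
				redirect_hint		:  1,
				reserved_1		:  1,
				virt_destid_8_14	:  7,
				destid_0_7		:  8,
				base_address		: 12;	/* 0xfee */
		};
		u32 raw;
	};
};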



[PATCH v3 09/35] x86/apic: Always provide irq_compose_msi_msg() method for vector domain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver
interrupts through MSI without having any PCI in the system at all.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/apic.h   |  8 +++-
 arch/x86/kernel/apic/apic.c   | 32 ++
 arch/x86/kernel/apic/msi.c| 37 ---
 arch/x86/kernel/apic/vector.c |  6 ++
 4 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 652f62252349..b489bb9894ae 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -520,12 +520,10 @@ static inline void apic_smt_update(void) { }
 #endif
 
 struct msi_msg;
+struct irq_cfg;
 
-#ifdef CONFIG_PCI_MSI
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg);
-#else
-# define x86_vector_msi_compose_msg NULL
-#endif
+extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+ bool dmar);
 
 extern void ioapic_zap_locks(void);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 54f04355aaa2..4c15bf29ea2c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void)
return read_apic_id();
 }
 
+void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
+  bool dmar)
+{
+   msg->address_hi = MSI_ADDR_BASE_HI;
+
+   msg->address_lo =
+   MSI_ADDR_BASE_LO |
+   (apic->dest_mode_logical ?
+   MSI_ADDR_DEST_MODE_LOGICAL :
+   MSI_ADDR_DEST_MODE_PHYSICAL) |
+   MSI_ADDR_REDIRECTION_CPU |
+   MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+   msg->data =
+   MSI_DATA_TRIGGER_EDGE |
+   MSI_DATA_LEVEL_ASSERT |
+   MSI_DATA_DELIVERY_FIXED |
+   MSI_DATA_VECTOR(cfg->vector);
+
+   /*
+* Only the IOMMU itself can use the trick of putting destination
+* APIC ID into the high bits of the address. Anything else would
+* just be writing to memory if it tried that, and needs IR to
+* address higher APIC IDs.
+*/
+   if (dmar)
+   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+   else
+   WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 46ffd41a4238..4eda617eda1e 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -23,42 +22,6 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
- bool dmar)
-{
-   msg->address_hi = MSI_ADDR_BASE_HI;
-
-   msg->address_lo =
-   MSI_ADDR_BASE_LO |
-   (apic->dest_mode_logical ?
-   MSI_ADDR_DEST_MODE_LOGICAL :
-   MSI_ADDR_DEST_MODE_PHYSICAL) |
-   MSI_ADDR_REDIRECTION_CPU |
-   MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-   msg->data =
-   MSI_DATA_TRIGGER_EDGE |
-   MSI_DATA_LEVEL_ASSERT |
-   MSI_DATA_DELIVERY_FIXED |
-   MSI_DATA_VECTOR(cfg->vector);
-
-   /*
-* Only the IOMMU itself can use the trick of putting destination
-* APIC ID into the high bits of the address. Anything else would
-* just be writing to memory if it tried that, and needs IR to
-* address higher APIC IDs.
-*/
-   if (dmar)
-   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-   else
-   WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
-}
-
-void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
-{
-   __irq_msi_compose_msg(irqd_cfg(data), msg, false);
-}
-
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
struct msi_msg msg[2] = { [1] = { }, };
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1eac53632786..bb2e2a2488a5 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd)
apic_ack_irq(irqd);
 }
 
+static void x86_vector_msi_compose_msg(struct irq_data *data,
+  struct msi_msg *msg)
+{
+   __irq_msi_c

[PATCH v3 23/35] x86/apic: Add select() method on vector irqdomain

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

This will be used to select the irqdomain for I/O-APIC and HPET.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/vector.c| 43 
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..125c23b7bad3 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
X86_IRQ_ALLOC_LEGACY= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..b9b05caa28a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, 
struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+   if (fwspec->param_count != 1)
+   return 0;
+
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
+   const char *fwname = fwnode_get_name(fwspec->fwnode);
+   return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+   simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+   }
+   return to_of_node(fwspec->fwnode) &&
+   of_device_is_compatible(to_of_node(fwspec->fwnode),
+   "intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+   if (fwspec->param_count != 1)
+   return 0;
+
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
+   const char *fwname = fwnode_get_name(fwspec->fwnode);
+   return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+   simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+   }
+   return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+enum irq_domain_bus_token bus_token)
+{
+   /*
+* HPET and I/OAPIC cannot be parented in the vector domain
+* if IRQ remapping is enabled. APIC IDs above 15 bits are
+* only permitted if IRQ remapping is enabled, so check that.
+*/
+   if (apic->apic_id_valid(32768))
+   return 0;
+
+   return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+   .select = x86_vector_select,
.alloc  = x86_vector_alloc_irqs,
.free   = x86_vector_free_irqs,
.activate   = x86_vector_activate,
-- 
2.26.2



[PATCH v3 35/35] x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected

2020-10-24 Thread David Woodhouse
From: David Woodhouse 

This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse 
Acked-by: Paolo Bonzini 
---
 arch/x86/kernel/kvm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1c0f2560a41c..b82de2843814 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -740,6 +740,11 @@ static void __init kvm_apic_init(void)
 #endif
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+   return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 static void __init kvm_init_platform(void)
 {
kvmclock_init();
@@ -769,6 +774,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
.type   = X86_HYPER_KVM,
.init.guest_late_init   = kvm_guest_init,
.init.x2apic_available  = kvm_para_available,
+   .init.msi_ext_dest_id   = kvm_msi_ext_dest_id,
.init.init_platform = kvm_init_platform,
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
.runtime.sev_es_hcall_prepare   = kvm_sev_es_hcall_prepare,
-- 
2.26.2
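
For reference, the encoding this feature bit advertises can be sketched like
so (illustrative helper only, invented for this note; APIC ID bits 7:0 stay in
address bits 19:12 as in the compatibility format, bits 14:8 ride in address
bits 11:5, and the kernel actually composes this via the msi_msg shadow
structs):

static u32 sketch_ext_dest_id_addr_lo(u32 apicid)
{
	return 0xfee00000 |
	       ((apicid & 0xff) << 12) |	/* destid_0_7 */
	       (((apicid >> 8) & 0x7f) << 5);	/* virt_destid_8_14 */
}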



Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-17 Thread David Woodhouse
On Wed, 2020-10-14 at 14:57 +0200, Joerg Roedel wrote:
> On Wed, Oct 14, 2020 at 03:25:08PM +0800, Lu Baolu wrote:
> > I suppose Joerg will pick this up. I guess you don't need to resend it
> > unless Joerg asks you to do.
> 
> Yes, will pick this up soon, no need to re-send.

Please could it, and the commit it fixes, both go to stable@



Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-13 Thread David Woodhouse
On Tue, 2020-10-13 at 09:30 +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units
> with no supported address widths") dmar.c needs struct iommu_device to
> be selected. We can drop this dependency by not dereferencing struct
> iommu_device if IOMMU_API is not selected and by reusing the information
> stored in iommu->drhd->ignored instead.
> 
> This fixes the following build error when IOMMU_API is not selected:
> 
> drivers/iommu/intel/dmar.c: In function ‘free_iommu’:
> drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no 
> member named ‘ops’
>  1139 |  if (intel_iommu_enabled && iommu->iommu.ops) {
> ^
> 
> Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no 
> supported address widths")
> Signed-off-by: Bartosz Golaszewski 

Oops, apologies for that. Thanks for fixing it.

Acked-by: David Woodhouse 





[PATCH 6/9] x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/hpet.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3b8b12769f3b..fb7736ca7b5b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -543,8 +543,8 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
 {
struct msi_domain_info *domain_info;
struct irq_domain *parent, *d;
-   struct irq_alloc_info info;
struct fwnode_handle *fn;
+   struct irq_fwspec fwspec;
 
if (x86_vector_domain == NULL)
return NULL;
@@ -556,15 +556,6 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
*domain_info = hpet_msi_domain_info;
domain_info->data = (void *)(long)hpet_id;
 
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-   info.devid = hpet_id;
-   parent = irq_remapping_get_irq_domain(&info);
-   if (parent == NULL)
-   parent = x86_vector_domain;
-   else
-   hpet_msi_controller.name = "IR-HPET-MSI";
-
fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
  hpet_id);
if (!fn) {
@@ -572,6 +563,18 @@ static struct irq_domain *hpet_create_irq_domain(int 
hpet_id)
return NULL;
}
 
+   fwspec.fwnode = fn;
+   fwspec.param_count = 1;
+   fwspec.param[0] = hpet_id;
+   parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+   if (!parent) {
+   irq_domain_free_fwnode(fn);
+   kfree(domain_info);
+   return NULL;
+   }
+   if (parent != x86_vector_domain)
+   hpet_msi_controller.name = "IR-HPET-MSI";
+
d = msi_create_irq_domain(fn, domain_info, parent);
if (!d) {
irq_domain_free_fwnode(fn);
-- 
2.26.2



[PATCH 4/9] iommu/vt-d: Implement select() method on remapping irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 drivers/iommu/intel/irq_remapping.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 511dfb4884bc..40c2fec122b8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1435,7 +1435,20 @@ static void intel_irq_remapping_deactivate(struct 
irq_domain *domain,
modify_irte(&data->irq_2_iommu, &entry);
 }
 
+static int intel_irq_remapping_select(struct irq_domain *d,
+ struct irq_fwspec *fwspec,
+ enum irq_domain_bus_token bus_token)
+{
+   if (x86_fwspec_is_ioapic(fwspec))
+   return d == map_ioapic_to_ir(fwspec->param[0]);
+   else if (x86_fwspec_is_hpet(fwspec))
+   return d == map_hpet_to_ir(fwspec->param[0]);
+
+   return 0;
+}
+
 static const struct irq_domain_ops intel_ir_domain_ops = {
+   .select = intel_irq_remapping_select,
.alloc = intel_irq_remapping_alloc,
.free = intel_irq_remapping_free,
.activate = intel_irq_remapping_activate,
-- 
2.26.2



[PATCH 0/9] Remove irq_remapping_get_irq_domain()

2020-10-13 Thread David Woodhouse
I didn't much like the I/OAPIC and HPET drivers having magical knowledge
that they had to substitute x86_vector_domain if their call to
irq_remapping_get_irq_domain() returned NULL.

When Thomas tried to make it handle error returns from …get_irq_domain() 
distinctly from the NULL case too, it made me even sadder. So I killed 
it with fire.

Now they just use irq_find_matching_fwspec() to find an appropriate
irqdomain. Each remapping irqdomain just needs to say 'yep, that's me'
for the HPETs or I/OAPICs which are within their scope, while the
x86_vector_domain accepts them all but only if interrupt remapping
is *disabled*. No more special knowledge in the caller.

If IR is enabled and there's a child device which escapes the scope of
all remapping units, it gets NULL for its parent irqdomain and will
fail to initialise, which is the correct thing to do in that "should
never happen" case. For HPET that'll mean that it just doesn't support
MSI, while I/OAPIC will refuse to initialise and trigger a BUG_ON
because Linux quite likes it when *all* the I/OAPICs it knows about get
initialised successfully.
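
For illustration, the lookup the callers now do boils down to this (a
condensed sketch, not lifted verbatim from any single patch; the helper name
is made up):

#include <linux/irqdomain.h>

/*
 * The child (I/OAPIC or HPET) describes itself with an irq_fwspec and lets
 * each candidate domain's .select() callback decide whether it owns that
 * device; no special-casing of x86_vector_domain in the caller.
 */
static struct irq_domain *sketch_find_parent_domain(struct fwnode_handle *fn,
						    u32 id)
{
	struct irq_fwspec fwspec;

	fwspec.fwnode = fn;
	fwspec.param_count = 1;
	fwspec.param[0] = id;

	return irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
}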

This is on top of the previous 'ext_dest_id' series at
https://patchwork.kernel.org/project/kvm/list/?series=362037

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/ext_dest_id

David Woodhouse (9):
  genirq/irqdomain: Implement get_name() method on irqchip fwnodes
  x86/apic: Add select() method on vector irqdomain
  iommu/amd: Implement select() method on remapping irqdomain
  iommu/vt-d: Implement select() method on remapping irqdomain
  iommu/hyper-v: Implement select() method on remapping irqdomain
  x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
  x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
  x86: Kill all traces of irq_remapping_get_irq_domain()
  iommu/vt-d: Simplify intel_irq_remapping_select()

 arch/x86/include/asm/hw_irq.h|  2 --
 arch/x86/include/asm/irq_remapping.h |  9 -
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/io_apic.c   | 24 
 arch/x86/kernel/apic/vector.c| 43 
+++
 arch/x86/kernel/hpet.c   | 23 +--
 drivers/iommu/amd/iommu.c| 53 
+++--
 drivers/iommu/hyperv-iommu.c | 18 +-
 drivers/iommu/intel/irq_remapping.c  | 43 
+--
 drivers/iommu/irq_remapping.c| 14 --
 drivers/iommu/irq_remapping.h|  3 ---
 kernel/irq/irqdomain.c   | 11 ++-
 12 files changed, 126 insertions(+), 120 deletions(-)




[PATCH 7/9] x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/io_apic.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ca2da19d5c55..73cacc92c3bb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2305,36 +2305,36 @@ static inline void __init check_timer(void)
 
 static int mp_irqdomain_create(int ioapic)
 {
-   struct irq_alloc_info info;
struct irq_domain *parent;
int hwirqs = mp_ioapic_pin_count(ioapic);
struct ioapic *ip = &ioapics[ioapic];
struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
struct fwnode_handle *fn;
-   char *name = "IO-APIC";
+   struct irq_fwspec fwspec;
 
if (cfg->type == IOAPIC_DOMAIN_INVALID)
return 0;
 
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
-   info.devid = mpc_ioapic_id(ioapic);
-   parent = irq_remapping_get_irq_domain(&info);
-   if (!parent)
-   parent = x86_vector_domain;
-   else
-   name = "IO-APIC-IR";
-
/* Handle device tree enumerated APICs proper */
if (cfg->dev) {
fn = of_node_to_fwnode(cfg->dev);
} else {
-   fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+   fn = irq_domain_alloc_named_id_fwnode("IO-APIC", ioapic);
if (!fn)
return -ENOMEM;
}
 
+   fwspec.fwnode = fn;
+   fwspec.param_count = 1;
+   fwspec.param[0] = ioapic;
+   parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
+   if (!parent) {
+   if (!cfg->dev)
+   irq_domain_free_fwnode(fn);
+   return -ENODEV;
+   }
+
ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
 (void *)(long)ioapic);
 
-- 
2.26.2



[PATCH 1/9] genirq/irqdomain: Implement get_name() method on irqchip fwnodes

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 kernel/irq/irqdomain.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 76cd7ebd1178..6440f97c412e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain 
*d) { }
 static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
 #endif
 
-const struct fwnode_operations irqchip_fwnode_ops;
+static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
+{
+   struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, 
fwnode);
+
+   return fwid->name;
+}
+
+const struct fwnode_operations irqchip_fwnode_ops = {
+   .get_name = irqchip_fwnode_get_name,
+};
 EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
 
 /**
-- 
2.26.2



[PATCH 5/9] iommu/hyper-v: Implement select() method on remapping irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 drivers/iommu/hyperv-iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 37dd485a5640..6a8966fbc3bd 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -61,6 +61,14 @@ static struct irq_chip hyperv_ir_chip = {
.irq_set_affinity   = hyperv_ir_set_affinity,
 };
 
+static int hyperv_irq_remapping_select(struct irq_domain *d,
+  struct irq_fwspec *fwspec,
+  enum irq_domain_bus_token bus_token)
+{
+   /* Claim only the first (and only) I/OAPIC */
+   return x86_fwspec_is_ioapic(fwspec) && fwspec->param[0] == 0;
+}
+
 static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
 unsigned int virq, unsigned int nr_irqs,
 void *arg)
@@ -102,6 +110,7 @@ static void hyperv_irq_remapping_free(struct irq_domain 
*domain,
 }
 
 static const struct irq_domain_ops hyperv_ir_domain_ops = {
+   .select = hyperv_irq_remapping_select,
.alloc = hyperv_irq_remapping_alloc,
.free = hyperv_irq_remapping_free,
 };
-- 
2.26.2



[PATCH 9/9] iommu/vt-d: Simplify intel_irq_remapping_select()

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Now that the old get_irq_domain() method has gone, we can consolidate on
just the map_XXX_to_iommu() functions.

Signed-off-by: David Woodhouse 
---
 drivers/iommu/intel/irq_remapping.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index ccf61cd18f69..8a41bcb10e26 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,13 +204,13 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
return rc;
 }
 
-static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
+static struct intel_iommu *map_hpet_to_iommu(u8 hpet_id)
 {
int i;
 
for (i = 0; i < MAX_HPET_TBS; i++) {
if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-   return ir_hpet[i].iommu->ir_domain;
+   return ir_hpet[i].iommu;
}
return NULL;
 }
@@ -226,13 +226,6 @@ static struct intel_iommu *map_ioapic_to_iommu(int apic)
return NULL;
 }
 
-static struct irq_domain *map_ioapic_to_ir(int apic)
-{
-   struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
-
-   return iommu ? iommu->ir_domain : NULL;
-}
-
 static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
@@ -1422,12 +1415,14 @@ static int intel_irq_remapping_select(struct irq_domain 
*d,
  struct irq_fwspec *fwspec,
  enum irq_domain_bus_token bus_token)
 {
+   struct intel_iommu *iommu = NULL;
+
if (x86_fwspec_is_ioapic(fwspec))
-   return d == map_ioapic_to_ir(fwspec->param[0]);
+   iommu = map_ioapic_to_iommu(fwspec->param[0]);
else if (x86_fwspec_is_hpet(fwspec))
-   return d == map_hpet_to_ir(fwspec->param[0]);
+   iommu = map_hpet_to_iommu(fwspec->param[0]);
 
-   return 0;
+   return iommu && d == iommu->ir_domain;
 }
 
 static const struct irq_domain_ops intel_ir_domain_ops = {
-- 
2.26.2



[PATCH 3/9] iommu/amd: Implement select() method on remapping irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 drivers/iommu/amd/iommu.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 13d0a8f42d56..7ecebc5d255f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3590,6 +3590,24 @@ struct irq_remap_ops amd_iommu_irq_ops = {
.get_irq_domain = get_irq_domain,
 };
 
+static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec 
*fwspec,
+   enum irq_domain_bus_token bus_token)
+{
+   struct amd_iommu *iommu;
+   int devid = -1;
+
+   if (x86_fwspec_is_ioapic(fwspec))
+   devid = get_ioapic_devid(fwspec->param[0]);
+   else if (x86_fwspec_is_hpet(fwspec))
+   devid = get_hpet_devid(fwspec->param[0]);
+
+   if (devid < 0)
+   return 0;
+
+   iommu = amd_iommu_rlookup_table[devid];
+   return iommu && iommu->ir_domain == d;
+}
+
 static void irq_remapping_prepare_irte(struct amd_ir_data *data,
   struct irq_cfg *irq_cfg,
   struct irq_alloc_info *info,
@@ -3813,6 +3831,7 @@ static void irq_remapping_deactivate(struct irq_domain 
*domain,
 }
 
 static const struct irq_domain_ops amd_ir_domain_ops = {
+   .select = irq_remapping_select,
.alloc = irq_remapping_alloc,
.free = irq_remapping_free,
.activate = irq_remapping_activate,
-- 
2.26.2



[PATCH 8/9] x86: Kill all traces of irq_remapping_get_irq_domain()

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/hw_irq.h|  2 --
 arch/x86/include/asm/irq_remapping.h |  9 
 drivers/iommu/amd/iommu.c| 34 
 drivers/iommu/hyperv-iommu.c |  9 
 drivers/iommu/intel/irq_remapping.c  | 17 --
 drivers/iommu/irq_remapping.c| 14 
 drivers/iommu/irq_remapping.h|  3 ---
 7 files changed, 88 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index aabd8f1b6bb0..aef795f17478 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,8 +40,6 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
-   X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
-   X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct ioapic_alloc_info {
diff --git a/arch/x86/include/asm/irq_remapping.h 
b/arch/x86/include/asm/irq_remapping.h
index af4a151d70b3..7cc49432187f 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,9 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-extern struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info);
-
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *
 arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int 
id);
@@ -71,11 +68,5 @@ static inline void panic_if_irq_remap(const char *msg)
 {
 }
 
-static inline struct irq_domain *
-irq_remapping_get_irq_domain(struct irq_alloc_info *info)
-{
-   return NULL;
-}
-
 #endif /* CONFIG_IRQ_REMAP */
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7ecebc5d255f..16adbf9d8fbb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3536,10 +3536,8 @@ static int get_devid(struct irq_alloc_info *info)
 {
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
-   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
return get_ioapic_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_HPET:
-   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
@@ -3550,44 +3548,12 @@ static int get_devid(struct irq_alloc_info *info)
}
 }
 
-static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
-  int devid)
-{
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-   if (!iommu)
-   return NULL;
-
-   switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return iommu->ir_domain;
-   default:
-   WARN_ON_ONCE(1);
-   return NULL;
-   }
-}
-
-static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
-{
-   int devid;
-
-   if (!info)
-   return NULL;
-
-   devid = get_devid(info);
-   if (devid < 0)
-   return NULL;
-   return get_irq_domain_for_devid(info, devid);
-}
-
 struct irq_remap_ops amd_iommu_irq_ops = {
.prepare= amd_iommu_prepare,
.enable = amd_iommu_enable,
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_irq_domain = get_irq_domain,
 };
 
 static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec 
*fwspec,
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 6a8966fbc3bd..e7ed2bb83358 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -160,18 +160,9 @@ static int __init hyperv_enable_irq_remapping(void)
return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
-{
-   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
-   return ioapic_ir_domain;
-   else
-   return NULL;
-}
-
 struct irq_remap_ops hyperv_irq_remap_ops = {
.prepare= hyperv_prepare_irq_remapping,
.enable = hyperv_enable_irq_remapping,
-   .get_irq_domain = hyperv_get_irq_domain,
 };
 
 #endif
diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 40c2fec122b8..ccf61cd18f69 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1128,29 +1128,12 @@ static void prepare_irte(struct irte *irte, int vector, 
unsigned int dest)
irte->redi

[PATCH 2/9] x86/apic: Add select() method on vector irqdomain

2020-10-13 Thread David Woodhouse
From: David Woodhouse 

This will be used to select the irqdomain for I/OAPIC and HPET.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/irqdomain.h |  3 +++
 arch/x86/kernel/apic/vector.c| 43 
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
index cd684d45cb5f..125c23b7bad3 100644
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -12,6 +12,9 @@ enum {
X86_IRQ_ALLOC_LEGACY= 0x2,
 };
 
+extern int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec);
+extern int x86_fwspec_is_hpet(struct irq_fwspec *fwspec);
+
 extern struct irq_domain *x86_vector_domain;
 
 extern void init_irq_alloc_info(struct irq_alloc_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb2e2a2488a5..b9b05caa28a4 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -636,7 +636,50 @@ static void x86_vector_debug_show(struct seq_file *m, 
struct irq_domain *d,
 }
 #endif
 
+int x86_fwspec_is_ioapic(struct irq_fwspec *fwspec)
+{
+   if (fwspec->param_count != 1)
+   return 0;
+
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
+   const char *fwname = fwnode_get_name(fwspec->fwnode);
+   return fwname && !strncmp(fwname, "IO-APIC-", 8) &&
+   simple_strtol(fwname+8, NULL, 10) == fwspec->param[0];
+   }
+   return to_of_node(fwspec->fwnode) &&
+   of_device_is_compatible(to_of_node(fwspec->fwnode),
+   "intel,ce4100-ioapic");
+}
+
+int x86_fwspec_is_hpet(struct irq_fwspec *fwspec)
+{
+   if (fwspec->param_count != 1)
+   return 0;
+
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
+   const char *fwname = fwnode_get_name(fwspec->fwnode);
+   return fwname && !strncmp(fwname, "HPET-MSI-", 9) &&
+   simple_strtol(fwname+9, NULL, 10) == fwspec->param[0];
+   }
+   return 0;
+}
+
+static int x86_vector_select(struct irq_domain *d, struct irq_fwspec *fwspec,
+enum irq_domain_bus_token bus_token)
+{
+   /*
+* HPET and I/OAPIC cannot be parented in the vector domain
+* if IRQ remapping is enabled. APIC IDs above 15 bits are
+* only permitted if IRQ remapping is enabled, so check that.
+*/
+   if (apic->apic_id_valid(32768))
+   return 0;
+
+   return x86_fwspec_is_ioapic(fwspec) || x86_fwspec_is_hpet(fwspec);
+}
+
 static const struct irq_domain_ops x86_vector_domain_ops = {
+   .select = x86_vector_select,
.alloc  = x86_vector_alloc_irqs,
.free   = x86_vector_free_irqs,
.activate   = x86_vector_activate,
-- 
2.26.2



Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-09 Thread David Woodhouse
On Thu, 2020-10-08 at 14:40 +0200, Thomas Gleixner wrote:
> Subject: x86/iommu: Make interrupt remapping more robust
> From: Thomas Gleixner 
> Date: Thu, 08 Oct 2020 14:09:44 +0200
> 
> Needs to be split into pieces and cover PCI proper. Right now PCI gets a
> NULL pointer assigned which makes it explode at the wrong place
> later. Also hyperv iommu wants some love.
> 
> NOT-Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/kernel/apic/io_apic.c  |4 +++-
>  arch/x86/kernel/apic/msi.c  |   24 ++--
>  drivers/iommu/amd/iommu.c   |6 +++---
>  drivers/iommu/intel/irq_remapping.c |4 ++--
>  4 files changed, 22 insertions(+), 16 deletions(-)
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2300,7 +2300,9 @@ static int mp_irqdomain_create(int ioapi
> info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
> info.devid = mpc_ioapic_id(ioapic);
> parent = irq_remapping_get_irq_domain(&info);
> -   if (!parent)
> +   if (IS_ERR(parent))
> +   return PTR_ERR(parent);
> +   else if (!parent)
> parent = x86_vector_domain;
> else
> name = "IO-APIC-IR";
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -415,9 +415,9 @@ static struct msi_domain_info hpet_msi_d
>  struct irq_domain *hpet_create_irq_domain(int hpet_id)
>  {
> struct msi_domain_info *domain_info;
> +   struct fwnode_handle *fn = NULL;
> struct irq_domain *parent, *d;
> struct irq_alloc_info info;
> -   struct fwnode_handle *fn;
>  
> if (x86_vector_domain == NULL)
> return NULL;
> @@ -432,25 +432,29 @@ struct irq_domain *hpet_create_irq_domai
> init_irq_alloc_info(&info, NULL);
> info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
> info.devid = hpet_id;
> +
> parent = irq_remapping_get_irq_domain(&info);
> -   if (parent == NULL)
> +   if (IS_ERR(parent))
> +   goto fail;
> +   else if (!parent)
> parent = x86_vector_domain;
> else
> hpet_msi_controller.name = "IR-HPET-MSI";

Hrm... this is the '-ENODEV changes' that you wanted me to pick up,
right?

I confess I don't like it very much.

I'd much prefer to be able to use a generic foo_get_parent_irq_domain()
function which just returns an answer or an error.

Definitely none of this "if it returns NULL, caller fills it in for
themselves with some magic global because we're special".

And I don't much like abusing the irq_alloc_info for this either. I
didn't like the irq_alloc_info very much in its *original* use cases,
for actually allocating IRQs. Abusing it for this is worse.

I'm more than happy to start digging into this, but once I do I fear
I'm not going to stop until HPET and IOAPIC don't know *anything* about
whether they're behind IR or not.

And yes, I know that IOAPIC has a different EOI method for IR but
AFAICT that's *already* hosed for Hyper-V because it doesn't *really*
do remapping and so the RTEs *can* change there to move interrupts
around. There's more differences between ioapic_ack_level() and
ioapic_ir_ack_level() than the erratum workaround and whether we use
data->entry.vector... aren't there?

For the specific case you were targeting with this patch, you already
successfully persuaded me it should never happen, and there's a WARN_ON
which would trigger if it ever does (with too many CPUs in the system).

Let's come back and tackle this can of worms in a separate cleanup
series.



Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-08 Thread David Woodhouse
On Thu, 2020-10-08 at 11:34 +0200, Thomas Gleixner wrote:
> The overall conclusion for this is:
> 
>  1) X2APIC support on bare metal w/o irq remapping is not going to
> happen unless you:
> 
>   - added support in multi-queue devices which utilize managed
> interrupts
> 
>   - audited the whole tree for other assumptions related to the
> reachability of possible CPUs.
> 
> I'm not expecting you to be done with that before I retire so for
> me it's just not going to happen :)

Makes sense. It probably does mean we should add a BUG_ON for the case
where IRQ remapping *is* enabled but any device is found which isn't
behind it. But that's OK.

>  2) X2APIC support on VIRT is possible if the extended ID magic is
> supported by the hypervisor because that does not make any CPU
> unreachable for MSI and therefore the multi-queue muck and
> everything else just works.
> 
> This requires to have either the domain affinity limitation for HPET
> in place or just to force disable HPET or at least HPET-MSI which is
> a reasonable tradeoff.
> 
> HPET is not required for guests which have kvmclock and
> APIC/deadline timer and known (hypervisor provided) frequencies.

HPET-MSI should work fine. Like the IOAPIC, it's just a child of the
*actual* MSI domain. The address/data in the MSI message are completely
opaque to it, and if the parent domain happens to put meaningful
information into bits 11-5 of the MSI address, the HPET won't even
notice.

The HPET's Tn_FSB_INT_ADDR register does have a full 32 bits of the MSI
address; it's not doing bit-swizzling like the IOAPIC does, which might
potentially *not* have been able to set certain bits in the MSI.
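
Illustratively, programming it is just storing the raw message; this is a
sketch with the register offset recalled from the HPET spec (treat it as an
assumption), not code lifted from the kernel:

#include <linux/io.h>
#include <linux/msi.h>

/* Timer N's FSB interrupt route register pair holds the raw MSI data in the
 * low dword and address_lo in the high dword, so whatever the parent domain
 * composed is stored verbatim - no bit-swizzling as in an IOAPIC RTE.
 */
#define SKETCH_HPET_Tn_FSB_ROUTE(n)	(0x110 + (n) * 0x20)

static void sketch_hpet_fsb_write(void __iomem *hpet_base, int timer,
				  const struct msi_msg *msg)
{
	writel(msg->data, hpet_base + SKETCH_HPET_Tn_FSB_ROUTE(timer));
	writel(msg->address_lo, hpet_base + SKETCH_HPET_Tn_FSB_ROUTE(timer) + 4);
}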



Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-08 Thread David Woodhouse
On Wed, 2020-10-07 at 17:57 +0200, Thomas Gleixner wrote:
> TLDR & HTF;
> 
> Multiqueue devices want to have at max 1 queue per CPU or if the device
> has less queues than CPUs they want the queues to have a fixed
> association to a set of CPUs.
> 
> At setup time this is established considering possible CPUs to handle
> 'physical' hotplug correctly.
> 
> If a queue has no online CPUs it cannot be started. If it's active and
> the last CPU goes down then it's quiesced and stopped and the core code
> shuts down the interrupt and does not move it to a still online CPU.
> 
> So with your hackery, we end up in a situation where we have a large
> possible mask, but not all CPUs in that mask can be reached, which means
> in a 1 queue per CPU scenario all unreachable CPUs would have
> disfunctional queues.
> 
> So that spreading algorithm needs to know about this limitation.

OK, thanks. So the queue exists, with an MSI assigned to point to an
offline CPU(s), but it cannot actually be used until/unless at least
one CPU in its mask comes online.

So when I said I wanted to try treating "reachable" the same way as
"online", that would mean the queue can't start until/unless at least
one *reachable* CPU in its mask comes online.


The underlying problem here is that until a CPU comes online, we don't
actually *know* if it's reachable or not.

So if we want to carefully create the affinity masks at setup time so that
they don't include any unreachable CPUs... that basically means we
don't include any non-present CPUs at all (unless they've been added
once and then removed).

That's because — at least on x86 — we don't assign CPU numbers to CPUs
which haven't yet been added. Theoretically we *could* but if there are
more than NR_CPUS listed in (e.g.) the MADT then we could run out of
CPU numbers. Then if the admin hotplugs one of the CPUs we *didn't*
have space for, we'd not be able to handle it.

I suppose there might be options like pre-assigning CPU numbers only to
non-present APIC IDs below 256 (or 32768). Or *grouping* the CPU
numbers, so some not-yet-assigned CPU#s are for low APICIDs and some
are for high APICIDs but it doesn't actually have to be a 1:1
predetermined mapping.

But those really do seem like hacks which might only apply on x86,
while the generic approach of treating "reachable" like "online" seems
like it would work in other cases too.

Fundamentally, there are three sets of CPUs. There are those known to
be reachable, those known not to be, and those which are not yet known.

So another approach we could use is to work with a cpumask of those
*known* not to be reachable, and to filter those *out* of the prebuilt
affinities. That gives us basically the right behaviour without
hotplug, but does include absent CPUs in a mask that *if* they are ever
added, wouldn't be able to receive the IRQ. Which does mean we'd have
to refrain from bringing up the corresponding queue. 
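
As a minimal sketch of that last idea (the "known unreachable" mask here is
hypothetical, something the arch code would maintain):

#include <linux/cpumask.h>

/*
 * Subtract the CPUs known to be unreachable when building the effective
 * affinity; the queue still must not start until at least one reachable
 * CPU in the resulting mask comes online.
 */
static void sketch_build_effective_affinity(struct cpumask *effective,
					    const struct cpumask *requested,
					    const struct cpumask *known_unreachable)
{
	cpumask_andnot(effective, requested, known_unreachable);
}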




Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 19:23 +0200, Thomas Gleixner wrote:
> > It so happens that in Linux, we don't really architect the software
> > like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
> > *own* message composer which has the same limits and composes basically
> > the same messages as if it was *their* format, not dictated to them by
> > the APIC upstream. And that's what we're both getting our panties in a
> > knot about, I think.
> 
> Are you actually reading what I write and caring to look at the code?
> 
> PCI-MSI does not have a compose message callback in the irq chip. The
> message is composed by the underlying parent domain. Same for HPET.
> 
> The only dogdy part is the IO/APIC for hysterical raisins and because
> I did not come around yet to sort that out.

Right. And the problem is that the "underlying parent domain" in this
case is actually x86_vector_domain. Whereas it probably makes more
sense, at least in theory, for there to be an *intermediate* domain
responsible for composing the Compat MSI messages.

Both the Compat-MSI and the IR-MSI domains would be children of the
generic x86_vector_domain, and then any given HPET, IOAPIC or PCI-MSI
domain would be a child of either one of those generic MSI domains.

> > It really doesn't matter that much to the underlying generic irqdomain
> > support for limited affinities. Except that you want to make the
> > generic code support the concept of a child domain supporting *more*
> > CPUs than its parent, which really doesn't make much sense if you think
> > about it.
> 
> Right. So we really want to stick the restriction into a compat-MSI
> domain to make stuff match reality and to avoid banging the head against
> the wall sooner than later.

Right.



Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse



On 7 October 2020 17:02:59 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 15:23, David Woodhouse wrote:
>> On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
>>> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
>>> > On 7 October 2020 13:59:00 BST, Thomas Gleixner
> wrote:
>>> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>>> > > > To fix *that* case, we really do need the whole series giving
>us per-
>>> > > > domain restricted affinity, and to use it for those
>MSIs/IOAPICs that
>>> > > > the IRQ remapping doesn't cover.
>>> > > 
>>> > > Which do not exist today.
>>> > 
>>> > Sure. But at patch 10/13 into this particular patch series, it
>*does*
>>> > exist.
>>> 
>>> As I told you before: Your ordering is wrong. We do not introduce
>bugs
>>> first and then fix them later 
>>
>> I didn't introduce that bug; it's been there for years. Fixing it
>> properly requires per-irqdomain affinity limits.
>>
>> There's a cute little TODO at least in the Intel irq-remapping
>driver,
>> noting that we should probably check if there are any IOAPICs that
>> aren't in the scope of any DRHD at all. But that's all.
>
>So someone forgot to remove the cute little TODO when this was added:
>
>   if (parse_ioapics_under_ir()) {
>pr_info("Not enabling interrupt remapping\n");
>goto error;
>}

And HPET, and PCI devices including those that might be hotplugged in future 
and not be covered by any extant IOMMU's scope?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse



On 7 October 2020 16:57:36 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 15:10, David Woodhouse wrote:
>> On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
>>> What is preventing you to change the function signature? But handing
>>> down irqdomain here is not cutting it. The right thing to do is to
>>> replace 'struct irq_affinity_desc *affinity' with something more
>>> flexible.
>>
>> Yeah, although I do think I want to ditch this part completely, and
>> treat the "possible" mask for the domain very much more like we do
>> cpu_online_mask. In that we can allow affinities to be *requested*
>> which are outside it, and they can just never be *effective* while
>> those CPUs aren't present and reachable.
>
>Huch?
>
>> That way a lot of the nastiness about preparing an "acceptable" mask
>in
>> advance can go away.
>
>There is not lot's of nastiness.

OK, but I think we do have to cope with the fact that the limit is dynamic, and 
a CPU might be added which widens the mask. I think that's fundamental and not 
x86-specific.

>> It's *also* driven, as I noted, by the realisation that on x86, the
>> x86_non_ir_cpumask will only ever contain those CPUs which have been
>> brought online so far and meet the criteria... but won't that be true
>> on *any* system where CPU numbers are virtual and not 1:1 mapped with
>> whatever determines reachability by the IRQ domain? It isn't *such*
>an
>> x86ism, surely?
>
>Yes, but that's exactly the reason why I don't want to have whatever
>mask name you chose to be directly exposed and want it to be part of
>the
>irq domains because then you can express arbitrary per domain limits.

The x86_non_ir_mask isn't intended to be directly exposed to any generic IRQ 
code.

It's set up by the x86 APIC code so that those x86 IRQ domains which are 
affected by it can set it with irqdomain_set_max_affinity() for the generic 
code to see, without each having to create the cpumask from scratch for 
themselves.

> ... reading for once the kids are elsewhere...

Thanks.

>It's not rocket science to fix that as the irq remapping code already
>differentiates between the device types.

I don't actually like that very much. The IRQ remapping code could just compose 
the MSI messages that it desires without really having to care so much about 
the child device. The bit where IRQ remapping actually composes IOAPIC RTEs 
makes me particularly sad.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 17:25 +0200, Thomas Gleixner wrote:
> It's clearly how the hardware works. MSI has a message store of some
> sorts and if the entry is enabled then the MSI chip (in PCI or
> elsewhere) will send exactly the message which is in that message
> store. It knows absolutely nothing about what the message means and how
> it is composed. The only things which MSI knows about is whether the
> message address is 64bit wide or not and whether the entries are
> maskable or not and how many entries it can store.
> 
> Software allocates a message target at the underlying irq domain (vector
> or remap) and that underlying irq domain defines the properties.
> 
> If qemu emulates it differently then it's qemu's problem, but that does
> not make it in anyway something which influences the irq domain
> abstractions which are correctly modeled after how the hardware works.
> 
> > Not really the important part to deal with right now, either way.
> 
> Oh yes it is. We define that upfront and not after the fact.

The way the hardware works is that something handles physical address
cycles to addresses in the (on x86) 0xFEExxxxx range, and turns them
into actual interrupts on the appropriate CPU — where the APIC ID and
vector (etc.) are directly encoded in the bits of the address and the
data written. That compatibility x86 APIC MSI format is where the 8-bit 
(or 15-bit) limit comes from.
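
Concretely, something like this (just an illustrative sketch of the
compatibility format, not the real kernel code; the helper name is
made up):

	/* Compat-format MSI address: the target APIC ID is encoded
	 * directly in the address bits of the write. */
	static u32 compat_msi_addr(u32 apicid)
	{
		u32 addr = 0xfee00000;

		addr |= (apicid & 0xff) << 12;	/* 8-bit destination ID */
		/* The 15-bit extension puts APIC ID bits 14:8 into
		 * address bits 11:5, if the platform supports it. */
		addr |= ((apicid >> 8) & 0x7f) << 5;

		return addr;
	}

with the vector in the low 8 bits of the data. Run out of bits there
and you simply can't encode the target CPU.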

Then interrupt remapping comes along, and now those physical address
cycles are actually handled by the IOMMU — which can either handle the
compatibility format as before, or use a different format of
address/data bits and perform a lookup in its IRTE table.

The PCI MSI domain, HPET, and even the IOAPIC are just the things out
there on the bus which might perform those physical address cycles. And
yes, as you say they're just a message store sending exactly the
message that was composed for them. They know absolutely nothing about
what the message means and how it is composed.

It so happens that in Linux, we don't really architect the software
like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
*own* message composer which has the same limits and composes basically
the same messages as if it was *their* format, not dictated to them by
the APIC upstream. And that's what we're both getting our panties in a
knot about, I think.

It really doesn't matter that much to the underlying generic irqdomain
support for limited affinities. Except that you want to make the
generic code support the concept of a child domain supporting *more*
CPUs than its parent, which really doesn't make much sense if you think
about it. But it isn't that hard to do, and if it means we don't have
to argue any more about the x86 hierarchy not matching the hardware
then it's a price worth paying. 





Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
> > > The information has to property of the relevant irq domains and the
> > > hierarchy allows you nicely to retrieve it from there instead of
> > > sprinkling this all over the place.
> > 
> > No. This is not a property of the parent domain per se. Especially if
> > you're thinking that we could inherit the affinity mask from the
> > parent, then twice no.
> > 
> > This is a property of the MSI domain itself, and how many bits of
> > destination ID the hardware at *this* level can interpret and pass on
> > to the parent domain.
> 
> Errm what?
> 
> The MSI domain does not know anything about what the underlying domain
> can handle and it shouldn't.
> 
> If MSI is on top of remapping then the remapping domain defines what the
> MSI domain can do and not the other way round. Ditto for the non
> remapped case in which the vector domain decides.
> 
> The top most MSI irq chip does not even have a compose function, neither
> for the remap nor for the vector case. The composition is done by the
> parent domain from the data which the parent domain constructed. Same
> for the IO/APIC just less clearly separated.
> 
> The top most chip just takes what the underlying domain constructed and
> writes it to the message store, because that's what the top most chip
> controls. It does not control the content.

Sure, whatever. The way we arrange the IRQ domain hierarchy in x86
Linux doesn't really match my understanding of the real hardware, or
how qemu emulates it either. But it is at least internally self-
consistent, and in this model it probably does make some sense to claim
the 8-bit limit on x86_vector_domain itself, and *remove* that limit in
the remapping irqdomain.

Not really the important part to deal with right now, either way.




Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
> > On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
> > > > To fix *that* case, we really do need the whole series giving us per-
> > > > domain restricted affinity, and to use it for those MSIs/IOAPICs that
> > > > the IRQ remapping doesn't cover.
> > > 
> > > Which do not exist today.
> > 
> > Sure. But at patch 10/13 into this particular patch series, it *does*
> > exist.
> 
> As I told you before: Your ordering is wrong. We do not introduce bugs
> first and then fix them later 

I didn't introduce that bug; it's been there for years. Fixing it
properly requires per-irqdomain affinity limits.

There's a cute little TODO at least in the Intel irq-remapping driver,
noting that we should probably check if there are any IOAPICs that
aren't in the scope of any DRHD at all. But that's all.

If it happens, I think we'll end up doing the right thing and
instantiating a non-IR IOAPIC domain for it, and if we *don't* have any
CPU with an APIC ID above... (checks notes)... 7 ... then it'll just
about work out. Over 7 and we're screwed (because logical mode; see
also https://git.infradead.org/users/dwmw2/linux.git/commit/429f0c33f
for a straw man but that's getting even further down the rabbit hole)




Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 08:19, David Woodhouse wrote:
> > On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
> > > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > This is the maximum possible set of CPUs which can be used. Use it
> > > > to calculate the default affinity requested from __irq_alloc_descs()
> > > > by first attempting to find the intersection with irq_default_affinity,
> > > > or falling back to using just the max_affinity if the intersection
> > > > would be empty.
> > > 
> > > And why do we need that as yet another argument?
> > > 
> > > This is an optional property of the irq domain, really and no caller has
> > > any business with that. 
> > 
> > Because irq_domain_alloc_descs() doesn't actually *take* the domain as
> > an argument. It's more of an internal function, which is only non-
> > static because it's used from kernel/irq/ipi.c too for some reason. If
> > we convert the IPI code to just call __irq_alloc_descs() directly,
> > perhaps that we can actually make irq_domain_alloc_decs() static.
> 
> What is preventing you to change the function signature? But handing
> down irqdomain here is not cutting it. The right thing to do is to
> replace 'struct irq_affinity_desc *affinity' with something more
> flexible.

Yeah, although I do think I want to ditch this part completely, and
treat the "possible" mask for the domain very much more like we do
cpu_online_mask. In that we can allow affinities to be *requested*
which are outside it, and they can just never be *effective* while
those CPUs aren't present and reachable.

That way a lot of the nastiness about preparing an "acceptable" mask in
advance can go away.

It's *also* driven, as I noted, by the realisation that on x86, the
x86_non_ir_cpumask will only ever contain those CPUs which have been
brought online so far and meet the criteria... but won't that be true
on *any* system where CPU numbers are virtual and not 1:1 mapped with
whatever determines reachability by the IRQ domain? It isn't *such* an
x86ism, surely?

> Fact is, that if there are CPUs which cannot be targeted by device
> interrupts then the multiqueue affinity mechanism has to be fixed to
> handle this. Right now it's just broken.

I think part of the problem there is that I don't really understand how
this part is *supposed* to work. I was focusing on getting the simple
case working first, in the expectation that we'd come back to that part
ansd you'd keep me honest. Is there some decent documentation on this
that I'm missing?

> It's pretty obvious that the irq domains are the right place to store
> that information:
> 
> const struct cpumask *irqdomain_get_possible_affinity(struct irq_domain *d)
> {
> while (d) {
>   if (d->get_possible_affinity)
>   return d->get_possible_affinity(d);
> d = d->parent;
> }
> return cpu_possible_mask;
> }

Sure.

> So if you look at X86 then you have either:
> 
>[VECTOR] - [IO/APIC]
>   |-- [MSI]
>   |-- [WHATEVER]
> 
> or
> 
>[VECTOR] ---[REMAP]--- [IO/APIC]
>  ||-- [MSI]
>  |[WHATEVER]

Hierarchically, theoretically the IOAPIC and HPET really ought to be
children of the MSI domain. It's the Compatibility MSI which has the
restriction on destination ID, because of how many bits it interprets
from the MSI address. HPET and IOAPIC are just generating MSIs that hit
that upstream limit.

> So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
> then irqdomain_get_possible_affinity() will return the correct result
> independent whether remapping is enabled or not.

Sure. Although VECTOR doesn't have the restriction. As noted, it's the
Compatibility MSI that does. So the diagram looks something like this:

 [ VECTOR ]  [ REMAP ]  [ IR-MSI ]  [ IR-HPET ]
  | |---[ IR-PCI-MSI ]
  | |---[ IR-IOAPIC ]
  |
  |--[ COMPAT-MSI ]  [ HPET ]
 |-- [ PCI-MSI ]
 |-- [ IOAPIC ]


In this diagram, it's COMPAT-MSI that has the affinity restriction,
inherited by its children.

Now, I understand that you're not keen on IOAPIC actually being a child
of the MSI domain, and that's fine. In Linux right now, those generic
'IR-MSI'

Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse



On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>> On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote:
>>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
>> This is the case I called out in the cover letter:
>>
>> This patch series implements a per-domain "maximum affinity" set
>and
>> uses it for the non-remapped IOAPIC and MSI domains on x86. As
>well as
>> allowing more CPUs to be used without interrupt remapping, this
>also
>> fixes the case where some IOAPICs or PCI devices aren't actually
>in
>> scope of any active IOMMU and are operating without remapping.
>>
>> To fix *that* case, we really do need the whole series giving us per-
>> domain restricted affinity, and to use it for those MSIs/IOAPICs that
>> the IRQ remapping doesn't cover.
>
>Which do not exist today.

Sure. But at patch 10/13 into this particular patch series, it *does* exist.

(Ignoring, for the moment, that I'm about to rewrite half the preceding patches 
to take a different approach)


>>> >   ip->irqdomain->parent = parent;
>>> > + if (parent == x86_vector_domain)
>>> > + irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
>>> 
>>> OMG
>>
>> This IOAPIC function may or may not be behind remapping.
>
>>> >   d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
>>> > + irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>>> 
>>> So here it's unconditional
>>
>> Yes, this code is only for the non-remapped case and there's a
>separate
>> arch_create_remap_msi_irq_domain() function a few lines further down
>> which creates the IR-PCI-MSI IRQ domain. So no need for a condition
>> here.
>>
>>> > + if (parent == x86_vector_domain)
>>> > + irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>>> 
>>> And here we need a condition again. Completely obvious and
>reviewable - NOT.
>>
>> And HPET may or may not be behind remapping so again needs the
>> condition. I had figured that part was fairly obvious but can note it
>> in the commit comment.
>
>And all of this is completely wrong to begin with.
>
>The information has to property of the relevant irq domains and the
>hierarchy allows you nicely to retrieve it from there instead of
>sprinkling this all over the place.

No. This is not a property of the parent domain per se. Especially if you're 
thinking that we could inherit the affinity mask from the parent, then twice no.

This is a property of the MSI domain itself, and how many bits of destination 
ID the hardware at *this* level can interpret and pass on to the parent domain.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 13:15 +0200, Paolo Bonzini wrote:
> On 07/10/20 10:59, David Woodhouse wrote:
> > Yeah, I was expecting the per-irqdomain affinity support to take a few
> > iterations. But this part, still sticking with the current behaviour of
> > only allowing CPUs to come online at all if they can be reached by all
> > interrupts, can probably go in first.
> > 
> > It's presumably (hopefully!) a blocker for the qemu patch which exposes
> > the same feature bit defined in this patch.
> 
> Yeah, though we could split it further and get the documentation part in
> first.  That would let the QEMU part go through.

Potentially. Although I've worked out that the first patch in my
series, adding x2apic_set_max_apicid(), is actually a bug fix because
it fixes the behaviour if you only *hotplug* CPUs with APIC IDs > 255
and there were none of them present at boot time.

So I'll post this set on its own to start with, and then focus on the
per-irqdomain affinity support after that.

David Woodhouse (5):
  x86/apic: Fix x2apic enablement without interrupt remapping
  x86/msi: Only use high bits of MSI address for DMAR unit
  x86/ioapic: Handle Extended Destination ID field in RTE
  x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID



Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 10:14 +0200, Paolo Bonzini wrote:
> Looks like the rest of the series needs some more work, but anyway:
> 
> Acked-by: Paolo Bonzini 

Thanks.

Yeah, I was expecting the per-irqdomain affinity support to take a few
iterations. But this part, still sticking with the current behaviour of
only allowing CPUs to come online at all if they can be reached by all
interrupts, can probably go in first.

It's presumably (hopefully!) a blocker for the qemu patch which exposes
the same feature bit defined in this patch.




Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> 
> > From: David Woodhouse 
> > 
> > When interrupt remapping isn't enabled, only the first 255 CPUs can
> 
> No, only CPUs with an APICid < 255 

Ack.

> > receive external interrupts. Set the appropriate max affinity for
> > the IOAPIC and MSI IRQ domains accordingly.
> > 
> > This also fixes the case where interrupt remapping is enabled but some
> > devices are not within the scope of any active IOMMU.
> 
> What? If this fixes an pre-existing problem then
> 
>   1) Explain the problem proper
>   2) Have a patch at the beginning of the series which fixes it
>  independently of this pile
> 
> If it's fixing a problem in your pile, then you got the ordering wrong.

It's not that simple; there's not a single patch which fixes that and
which can go first. I can, and do, fix the "no IR" case in a simple
patch that goes first, simply by restricting the kernel to the APIC IDs
which can be targeted.

This is the case I called out in the cover letter:

This patch series implements a per-domain "maximum affinity" set and
uses it for the non-remapped IOAPIC and MSI domains on x86. As well as
allowing more CPUs to be used without interrupt remapping, this also
fixes the case where some IOAPICs or PCI devices aren't actually in
scope of any active IOMMU and are operating without remapping.

To fix *that* case, we really do need the whole series giving us per-
domain restricted affinity, and to use it for those MSIs/IOAPICs that
the IRQ remapping doesn't cover.

> You didn't start kernel programming as of yesterday, so you really know
> how that works.
> 
> > ip->irqdomain->parent = parent;
> > +   if (parent == x86_vector_domain)
> > +   irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
> 
> OMG

This IOAPIC function may or may not be behind remapping.

> 
> > if (cfg->type == IOAPIC_DOMAIN_LEGACY ||
> > cfg->type == IOAPIC_DOMAIN_STRICT)
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index 4d891967bea4..af5ce5c4da02 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -259,6 +259,7 @@ struct irq_domain * __init 
> > native_create_pci_msi_domain(void)
> > pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
> > } else {
> > d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
> > +   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
> 
> So here it's unconditional

Yes, this code is only for the non-remapped case and there's a separate
arch_create_remap_msi_irq_domain() function a few lines further down
which creates the IR-PCI-MSI IRQ domain. So no need for a condition
here.

> > }
> > return d;
> >  }
> > @@ -479,6 +480,8 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id)
> > irq_domain_free_fwnode(fn);
> > kfree(domain_info);
> > }
> > +   if (parent == x86_vector_domain)
> > +   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
> 
> And here we need a condition again. Completely obvious and reviewable - NOT.

And HPET may or may not be behind remapping so again needs the
condition. I had figured that part was fairly obvious but can note it
in the commit comment.




Re: [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:42 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > This is the mask of CPUs to which IRQs can be delivered without
> > interrupt
> > remapping.
> >  
> > +/* Mask of CPUs which can be targeted by non-remapped interrupts.
> > */
> > +cpumask_t x86_non_ir_cpumask = { CPU_BITS_ALL };
> 
> What?

By default, if we didn't hit any limits, all CPUs can be targeted by
external interrupts. It's the default today.

Or at least we pretend it is, modulo the bugs :)

> >  #ifdef CONFIG_X86_32
> >  
> >  /*
> > @@ -1838,6 +1841,7 @@ static __init void x2apic_enable(void)
> >  static __init void try_to_enable_x2apic(int remap_mode)
> >  {
> > u32 apic_limit = 0;
> > +   int i;
> >  
> > if (x2apic_state == X2APIC_DISABLED)
> > return;
> > @@ -1880,6 +1884,14 @@ static __init void try_to_enable_x2apic(int 
> > remap_mode)
> > if (apic_limit)
> > x2apic_set_max_apicid(apic_limit);
> >  
> > +   /* Build the affinity mask for interrupts that can't be remapped. */
> > +   cpumask_clear(&x86_non_ir_cpumask);
> > +   i = min_t(unsigned int, num_possible_cpus() - 1, apic_limit);
> > +   for ( ; i >= 0; i--) {
> > +   if (cpu_physical_id(i) <= apic_limit)
> > +   cpumask_set_cpu(i, &x86_non_ir_cpumask);
> > +   }
> 
> Blink. If the APIC id is not linear with the cpu numbers then this
> results in a reduced addressable set of CPUs. WHY?

Hm, good question. That loop was cargo-culted from hyperv-iommu.c;
perhaps it makes more sense there because Hyper-V really does promise
that linearity, or perhaps it was already buggy. Will fix.

In fact, in apic.c I could probably just use the cpuid_to_apicid array
which is right there in the file.
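
Something like this, perhaps (untested sketch, assuming apic_limit
still means "highest directly-reachable APIC ID" and that this only
runs when there actually is a limit; the guard for not-yet-assigned
entries depends on how cpuid_to_apicid marks them):

	unsigned int cpu;

	/* Build the mask of CPUs reachable without remapping. */
	cpumask_clear(&x86_non_ir_cpumask);
	for_each_possible_cpu(cpu) {
		int apicid = cpuid_to_apicid[cpu];

		if (apicid >= 0 && apicid <= apic_limit)
			cpumask_set_cpu(cpu, &x86_non_ir_cpumask);
	}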

> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index aa9a3b54a96c..4d0ef46fedb9 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -2098,6 +2098,8 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
> > struct irq_alloc_info info;
> >  
> > ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0);
> > +   if (domain->parent == x86_vector_domain)
> > +   info.mask = &x86_non_ir_cpumask;
> 
> We are not going to sprinkle such domain checks all over the
> place. Again, the mask is a property of the interrupt domain.

Yeah, that's a hangover from the first attempts which I forgot to
delete.



Re: [PATCH 08/13] genirq: Add irq_domain_set_affinity()

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:32 +0200, Thomas Gleixner wrote:
> What the heck? Why does this need a setter function which is exported?
> So that random driver writers can fiddle with it?
> 
> The affinity mask restriction of an irq domain is already known when the
> domain is created.

It's exported because __irq_domain_add is exported. I didn't want to
just add a rarely-used extra argument to __irq_domain_add and the other
public APIs which call into it, and figured a separate setter function
was the simplest option.

I can rework to add the argument to __irq_domain_add if you prefer.

Or we can declare that restricted affinity *isn't* something that IRQ
domains provided by modules can have, and just not export the setter
function?
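
(For comparison, the argument-based variant would just mean growing the
existing prototype along these lines, sketch only:

	struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode,
					    int size, irq_hw_number_t hwirq_max,
					    int direct_max,
					    const struct cpumask *max_affinity,
					    const struct irq_domain_ops *ops,
					    void *host_data);

with all the irq_domain_add_*() wrappers passing NULL for it.)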




Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > This is the maximum possible set of CPUs which can be used. Use it
> > to calculate the default affinity requested from __irq_alloc_descs()
> > by first attempting to find the intersection with irq_default_affinity,
> > or falling back to using just the max_affinity if the intersection
> > would be empty.
> 
> And why do we need that as yet another argument?
> 
> This is an optional property of the irq domain, really and no caller has
> any business with that. 

Because irq_domain_alloc_descs() doesn't actually *take* the domain as
an argument. It's more of an internal function, which is only non-
static because it's used from kernel/irq/ipi.c too for some reason. If
we convert the IPI code to just call __irq_alloc_descs() directly,
perhaps then we can actually make irq_domain_alloc_descs() static.

> >  int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t 
> > hwirq,
> > -  int node, const struct irq_affinity_desc *affinity)
> > +  int node, const struct irq_affinity_desc *affinity,
> > +  const struct cpumask *max_affinity)
> >  {
> > +   cpumask_var_t default_affinity;
> > unsigned int hint;
> > +   int i;
> > +
> > +   /* Check requested per-IRQ affinities are in the possible range */
> > +   if (affinity && max_affinity) {
> > +   for (i = 0; i < cnt; i++)
> > +   if (!cpumask_subset(&affinity[i].mask, max_affinity))
> > +   return -EINVAL;
> 
> https://lore.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos
> 
> What is preventing the affinity spreading code from spreading the masks
> out to unusable CPUs? The changelog is silent about that part.

I'm coming to the conclusion that we should allow unusable CPUs to be
specified at this point, just as we do offline CPUs. That's largely
driven by the realisation that our x86_non_ir_cpumask is only going to
contain online CPUs anyway, and hotplugged CPUs only get added to it as
they are brought online.

> > +   /*
> > +* Generate default affinity. Either the possible subset of
> > +* irq_default_affinity if such a subset is non-empty, or fall
> > +* back to the provided max_affinity if there is no intersection.
> 
> ..
> > +* And just a copy of irq_default_affinity in the
> > +* !CONFIG_CPUMASK_OFFSTACK case.
> 
> We know that already...
> 
> > +*/
> > +   memset(&default_affinity, 0, sizeof(default_affinity));
> 
> Right, memset() before allocating is useful.

The memset is because there's no cpumask_var_t initialiser that I can
see. So cpumask_available() on the uninitialised 'default_affinity'
variable might be true even in the OFFSTACK case.

> > +   if ((max_affinity &&
> > +!cpumask_subset(irq_default_affinity, max_affinity))) {
> > +   if (!alloc_cpumask_var(&default_affinity, GFP_KERNEL))
> > +   return -ENOMEM;
> > +   cpumask_and(default_affinity, max_affinity,
> > +   irq_default_affinity);
> > +   if (cpumask_empty(default_affinity))
> > +   cpumask_copy(default_affinity, max_affinity);
> > +   } else if (cpumask_available(default_affinity))
> > +   cpumask_copy(default_affinity, irq_default_affinity);
> 
> That's garbage and unreadable.

That's why there was a comment explaining it... at which point you
claimed to already know :)

Clearly the comment didn't do the job it was supposed to do. I'll
rework.
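
FWIW the rework will probably end up looking something like this
(sketch only, certainly not the final patch):

	const struct cpumask *default_affinity = irq_default_affinity;
	cpumask_var_t tmp_mask;
	bool allocated = false;

	if (max_affinity &&
	    !cpumask_subset(irq_default_affinity, max_affinity)) {
		if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL))
			return -ENOMEM;
		allocated = true;
		/* Intersect with the reachable set; if that comes out
		 * empty, fall back to the reachable set itself. */
		if (cpumask_and(tmp_mask, irq_default_affinity, max_affinity))
			default_affinity = tmp_mask;
		else
			default_affinity = max_affinity;
	}

	/* ... use default_affinity for the allocation ... */

	if (allocated)
		free_cpumask_var(tmp_mask);

which avoids the memset/cpumask_available() dance entirely.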




Re: [PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs()

2020-10-06 Thread David Woodhouse



On 6 October 2020 22:01:18 BST, Thomas Gleixner  wrote:
>On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
>>  
>>  #else /* CONFIG_SMP */
>>  
>> +#define irq_default_affinity (NULL)
>
>...
>
>>  static int alloc_descs(unsigned int start, unsigned int cnt, int
>node,
>> const struct irq_affinity_desc *affinity,
>> +   const struct cpumask *default_affinity,
>> struct module *owner)
>>  {
>>  struct irq_desc *desc;
>>  int i;
>>  
>>  /* Validate affinity mask(s) */
>> +if (!default_affinity || cpumask_empty(default_affinity))
>> +return -EINVAL;
>
>How is that supposed to work on UP?

Hm, good point.

>Aside of that I really hate to have yet another argument just
>because.

Yeah, I was trying to avoid having to allocate a whole array of 
irq_affinity_desc just to fill them all in with the same default.

But perhaps I need to treat the "affinity_max" like we do cpu_online_mask, and 
allow affinity to be set even to offline/unreachable CPUs at this point. Then 
we can be more relaxed about the default affinities.
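
Roughly this shape, as a sketch of the idea rather than real code:

	/* Accept the requested mask as-is; only apply the domain limit
	 * when picking the effective target, much as we already do for
	 * offline CPUs. */
	cpumask_and(effective, requested, domain->max_affinity);
	if (cpumask_empty(effective))
		cpumask_copy(effective, domain->max_affinity);

so a mask full of not-yet-usable CPUs doesn't get rejected outright.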

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


[PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs()

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

Instead of plugging in irq_default_affinity at the lowest level in
desc_smp_init() if the caller didn't specify an affinity for this
specific IRQ in its array, allow the default affinity to be passed
down from __irq_alloc_descs() instead.

This is in preparation for irq domains having their own default (and
indeed maximum) affinities.

An alternative possibility would have been to make the caller allocate
an entire array of struct irq_affinity_desc for the allocation, but
that's a lot nastier than simply passing in an additional const cpumask.

This commit has no visible API change outside static functions in
irqdesc.c for now; the actual change will come later.

Signed-off-by: David Woodhouse 
---
 include/linux/interrupt.h |  2 ++
 kernel/irq/irqdesc.c  | 21 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index f9aee3538461..cd0ff293486a 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -364,6 +364,8 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, 
unsigned int maxvec,
 
 #else /* CONFIG_SMP */
 
+#define irq_default_affinity (NULL)
+
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
 {
return -EINVAL;
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 1a7723604399..4ac91b9fc618 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -81,8 +81,6 @@ static int alloc_masks(struct irq_desc *desc, int node)
 static void desc_smp_init(struct irq_desc *desc, int node,
  const struct cpumask *affinity)
 {
-   if (!affinity)
-   affinity = irq_default_affinity;
cpumask_copy(desc->irq_common_data.affinity, affinity);
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
@@ -465,12 +463,16 @@ static void free_desc(unsigned int irq)
 
 static int alloc_descs(unsigned int start, unsigned int cnt, int node,
   const struct irq_affinity_desc *affinity,
+  const struct cpumask *default_affinity,
   struct module *owner)
 {
struct irq_desc *desc;
int i;
 
/* Validate affinity mask(s) */
+   if (!default_affinity || cpumask_empty(default_affinity))
+   return -EINVAL;
+
if (affinity) {
for (i = 0; i < cnt; i++) {
if (cpumask_empty(&affinity[i].mask))
@@ -479,7 +481,7 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
}
 
for (i = 0; i < cnt; i++) {
-   const struct cpumask *mask = NULL;
+   const struct cpumask *mask = default_affinity;
unsigned int flags = 0;
 
if (affinity) {
@@ -488,10 +490,10 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
IRQD_MANAGED_SHUTDOWN;
}
mask = &affinity->mask;
-   node = cpu_to_node(cpumask_first(mask));
affinity++;
}
 
+   node = cpu_to_node(cpumask_first(mask));
desc = alloc_desc(start + i, node, flags, mask, owner);
if (!desc)
goto err;
@@ -538,7 +540,7 @@ int __init early_irq_init(void)
nr_irqs = initcnt;
 
for (i = 0; i < initcnt; i++) {
-   desc = alloc_desc(i, node, 0, NULL, NULL);
+   desc = alloc_desc(i, node, 0, irq_default_affinity, NULL);
set_bit(i, allocated_irqs);
irq_insert_desc(i, desc);
}
@@ -573,7 +575,8 @@ int __init early_irq_init(void)
raw_spin_lock_init(&desc[i].lock);
lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
mutex_init(&desc[i].request_mutex);
-   desc_set_defaults(i, &desc[i], node, NULL, NULL);
+   desc_set_defaults(i, &desc[i], node, irq_default_affinity,
+ NULL);
}
return arch_early_irq_init();
 }
@@ -590,12 +593,14 @@ static void free_desc(unsigned int irq)
unsigned long flags;
 
raw_spin_lock_irqsave(&desc->lock, flags);
-   desc_set_defaults(irq, desc, irq_desc_get_node(desc), NULL, NULL);
+   desc_set_defaults(irq, desc, irq_desc_get_node(desc),
+ irq_default_affinity, NULL);
raw_spin_unlock_irqrestore(&desc->lock, flags);
 }
 
 static inline int alloc_descs(unsigned int start, unsigned int cnt, int node,
  const struct irq_affinity_desc *affinity,
+ const struct cpumask *default_affinity,
  struct module *owner)
 {
u32 i;
@@ -803,7 +808,7 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned in

[PATCH 04/13] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

Some hypervisors can allow the guest to use the Extended Destination ID
field in the IOAPIC RTE and MSI address to address up to 32768 CPUs.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/mpspec.h   |  1 +
 arch/x86/include/asm/x86_init.h |  2 ++
 arch/x86/kernel/apic/apic.c | 15 ++-
 arch/x86/kernel/apic/msi.c  |  9 -
 arch/x86/kernel/x86_init.c  |  1 +
 5 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index e90ac7e9ae2c..25ee8ca0a1f2 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -42,6 +42,7 @@ extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
 extern unsigned int boot_cpu_physical_apicid;
 extern u8 boot_cpu_apic_version;
 extern unsigned long mp_lapic_addr;
+extern int msi_ext_dest_id;
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern int smp_found_config;
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 397196fae24d..5af3fe9e38f3 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -114,6 +114,7 @@ struct x86_init_pci {
  * @init_platform: platform setup
  * @guest_late_init:   guest late init
  * @x2apic_available:  X2APIC detection
+ * @msi_ext_dest_id:   MSI and IOAPIC support 15-bit APIC IDs
  * @init_mem_mapping:  setup early mappings during init_mem_mapping()
  * @init_after_bootmem:guest init after boot allocator is 
finished
  */
@@ -121,6 +122,7 @@ struct x86_hyper_init {
void (*init_platform)(void);
void (*guest_late_init)(void);
bool (*x2apic_available)(void);
+   bool (*msi_ext_dest_id)(void);
void (*init_mem_mapping)(void);
void (*init_after_bootmem)(void);
 };
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a75767052a92..459c78558f36 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1837,6 +1837,8 @@ static __init void x2apic_enable(void)
 
 static __init void try_to_enable_x2apic(int remap_mode)
 {
+   u32 apic_limit = 0;
+
if (x2apic_state == X2APIC_DISABLED)
return;
 
@@ -1858,7 +1860,15 @@ static __init void try_to_enable_x2apic(int remap_mode)
return;
}
 
-   x2apic_set_max_apicid(255);
+   /*
+* If the hypervisor supports extended destination ID
+* in IOAPIC and MSI, we can support that many CPUs.
+*/
+   if (x86_init.hyper.msi_ext_dest_id()) {
+   msi_ext_dest_id = 1;
+   apic_limit = 32767;
+   } else
+   apic_limit = 255;
}
 
/*
@@ -1867,6 +1877,9 @@ static __init void try_to_enable_x2apic(int remap_mode)
 */
x2apic_phys = 1;
}
+   if (apic_limit)
+   x2apic_set_max_apicid(apic_limit);
+
x2apic_enable();
 }
 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 356f8acf4927..4d891967bea4 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,6 +23,8 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
+int msi_ext_dest_id __ro_after_init;
+
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, 
int dmar)
 {
msg->address_hi = MSI_ADDR_BASE_HI;
@@ -45,10 +47,15 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, 
struct msi_msg *msg, int
 * Only the IOMMU itself can use the trick of putting destination
 * APIC ID into the high bits of the address. Anything else would
 * just be writing to memory if it tried that, and needs IR to
-* address APICs above 255.
+* address APICs which can't be addressed in the normal 32-bit
+* address range at 0xFFEx. That is typically just 8 bits, but
+* some hypervisors allow the extended destination ID field in bits
+* 11-5 to be used, giving support for 15 bits of APIC IDs in total.
 */
if (dmar)
msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+   else if (msi_ext_dest_id && cfg->dest_apicid < 0x8000)
+   msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3;
else
WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a3038d8deb6a..8b395821cb8d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = {
.init_platform  = x86_init_noop,

[PATCH 12/13] iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

It took me a while to realise that this "IRQ remapping" driver exists
not to actually remap interrupts, but to return -EINVAL if anyone ever
tries to set the affinity to a set of CPUs which can't be reached
*without* remapping. Having fixed the core IRQ domain code to handle
such limited affinities, all this hackery can now die.

I haven't deleted it entirely because its existence still causes the
kernel to use X2APIC in cluster mode not physical. I'm not sure we
*want* that, but it wants further thought and testing before ripping
it out too.

Signed-off-by: David Woodhouse 
---
 drivers/iommu/hyperv-iommu.c | 149 +--
 1 file changed, 1 insertion(+), 148 deletions(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..46a794d34f57 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -24,156 +24,12 @@
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
-
-/*
- * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
- * Redirection Table. Hyper-V exposes one single IO-APIC and so define
- * 24 IO APIC remmapping entries.
- */
-#define IOAPIC_REMAPPING_ENTRY 24
-
-static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
-static struct irq_domain *ioapic_ir_domain;
-
-static int hyperv_ir_set_affinity(struct irq_data *data,
-   const struct cpumask *mask, bool force)
-{
-   struct irq_data *parent = data->parent_data;
-   struct irq_cfg *cfg = irqd_cfg(data);
-   struct IO_APIC_route_entry *entry;
-   int ret;
-
-   /* Return error If new irq affinity is out of ioapic_max_cpumask. */
-   if (!cpumask_subset(mask, &ioapic_max_cpumask))
-   return -EINVAL;
-
-   ret = parent->chip->irq_set_affinity(parent, mask, force);
-   if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
-   return ret;
-
-   entry = data->chip_data;
-   entry->dest = cfg->dest_apicid;
-   entry->vector = cfg->vector;
-   send_cleanup_vector(cfg);
-
-   return 0;
-}
-
-static struct irq_chip hyperv_ir_chip = {
-   .name   = "HYPERV-IR",
-   .irq_ack= apic_ack_irq,
-   .irq_set_affinity   = hyperv_ir_set_affinity,
-};
-
-static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
-unsigned int virq, unsigned int nr_irqs,
-void *arg)
-{
-   struct irq_alloc_info *info = arg;
-   struct irq_data *irq_data;
-   struct irq_desc *desc;
-   int ret = 0;
-
-   if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
-   return -EINVAL;
-
-   ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
-   if (ret < 0)
-   return ret;
-
-   irq_data = irq_domain_get_irq_data(domain, virq);
-   if (!irq_data) {
-   irq_domain_free_irqs_common(domain, virq, nr_irqs);
-   return -EINVAL;
-   }
-
-   irq_data->chip = &hyperv_ir_chip;
-
-   /*
-* If there is interrupt remapping function of IOMMU, setting irq
-* affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't
-* support interrupt remapping function, setting irq affinity of IO-APIC
-* interrupts still needs to change IO-APIC registers. But ioapic_
-* configure_entry() will ignore value of cfg->vector and cfg->
-* dest_apicid when IO-APIC's parent irq domain is not the vector
-* domain.(See ioapic_configure_entry()) In order to setting vector
-* and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved
-* in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
-* affinity() set vector and dest_apicid directly into IO-APIC entry.
-*/
-   irq_data->chip_data = info->ioapic.entry;
-
-   /*
-* Hypver-V IO APIC irq affinity should be in the scope of
-* ioapic_max_cpumask because no irq remapping support.
-*/
-   desc = irq_data_to_desc(irq_data);
-   cpumask_copy(desc->irq_common_data.affinity, &ioapic_max_cpumask);
-
-   return 0;
-}
-
-static void hyperv_irq_remapping_free(struct irq_domain *domain,
-unsigned int virq, unsigned int nr_irqs)
-{
-   irq_domain_free_irqs_common(domain, virq, nr_irqs);
-}
-
-static int hyperv_irq_remapping_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool reserve)
-{
-   struct irq_cfg *cfg = irqd_cfg(irq_data);
-   struct IO_APIC_route_entry *entry = irq_data->chip_data;
-
-   entry->dest = cfg->dest_apicid;
-   entry->vector = cfg->vector;
-
-   return 0;
-}
-
-static const struct irq_domain_ops hyperv_ir_domain_ops = {
-   .alloc = hyp

[PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

This allows the host to indicate that IOAPIC and MSI emulation supports
15-bit destination IDs, allowing up to 32Ki CPUs without remapping.

Signed-off-by: David Woodhouse 
---
 Documentation/virt/kvm/cpuid.rst | 4 
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c| 6 ++
 3 files changed, 11 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..1726b5925d2b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT  14  guest checks 
this feature bit
   async pf acknowledgment msr
   0x4b564d07.
 
+KVM_FEATURE_MSI_EXT_DEST_ID   15  guest checks this feature bit
+  before using extended destination
+  ID bits in MSI address bits 11-5.
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24  host will warn if no guest-side
   per-cpu warps are expeced in
   kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..950afebfba88 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL   12
 #define KVM_FEATURE_PV_SCHED_YIELD 13
 #define KVM_FEATURE_ASYNC_PF_INT   14
+#define KVM_FEATURE_MSI_EXT_DEST_ID15
 
 #define KVM_HINTS_REALTIME  0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..4986b4399aef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
x86_platform.apic_post_init = kvm_apic_init;
 }
 
+static bool __init kvm_msi_ext_dest_id(void)
+{
+   return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
.name   = "KVM",
.detect = kvm_detect,
.type   = X86_HYPER_KVM,
.init.guest_late_init   = kvm_guest_init,
.init.x2apic_available  = kvm_para_available,
+   .init.msi_ext_dest_id   = kvm_msi_ext_dest_id,
.init.init_platform = kvm_init_platform,
 };
 
-- 
2.26.2



[PATCH 01/13] x86/apic: Use x2apic in guest kernels even with unusable CPUs.

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

If the BIOS has enabled x2apic mode, then leave it enabled and don't
fall back to xapic just because some CPUs can't be targeted.

Previously, if there were CPUs with APIC IDs > 255, the kernel would
disable x2apic and fall back to xapic, and then not use the additional
CPUs anyway, since xapic can't target them either.

In fact, xapic mode can target even *fewer* CPUs, since it can't use
the one with APIC ID 255 either. Using x2apic instead gives us at least
one more CPU without needing interrupt remapping in the guest.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/apic.h|  1 +
 arch/x86/kernel/apic/apic.c| 18 ++
 arch/x86/kernel/apic/x2apic_phys.c |  9 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c129abb7f09..b0fd204e0023 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void)
 
 extern int x2apic_mode;
 extern int x2apic_phys;
+extern void __init x2apic_set_max_apicid(u32 apicid);
 extern void __init check_x2apic(void);
 extern void x2apic_setup(void);
 static inline int x2apic_enabled(void)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b3eef1d5c903..a75767052a92 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1841,14 +1841,24 @@ static __init void try_to_enable_x2apic(int remap_mode)
return;
 
if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
-   /* IR is required if there is APIC ID > 255 even when running
-* under KVM
+   /*
+* If there are APIC IDs above 255, they cannot be used
+* without interrupt remapping. In a virtual machine, but
+* not on bare metal, X2APIC can be used anyway. In the
+* case where BIOS has enabled X2APIC, don't disable it
+* just because there are APIC IDs that can't be used.
+* They couldn't be used if the kernel falls back to XAPIC
+* anyway.
 */
if (max_physical_apicid > 255 ||
!x86_init.hyper.x2apic_available()) {
pr_info("x2apic: IRQ remapping doesn't support X2APIC 
mode\n");
-   x2apic_disable();
-   return;
+   if (!x2apic_mode) {
+   x2apic_disable();
+   return;
+   }
+
+   x2apic_set_max_apicid(255);
}
 
/*
diff --git a/arch/x86/kernel/apic/x2apic_phys.c 
b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..b4b4e89c1118 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -8,6 +8,12 @@
 int x2apic_phys;
 
 static struct apic apic_x2apic_phys;
+static u32 x2apic_max_apicid;
+
+void __init x2apic_set_max_apicid(u32 apicid)
+{
+   x2apic_max_apicid = apicid;
+}
 
 static int __init set_x2apic_phys_mode(char *arg)
 {
@@ -98,6 +104,9 @@ static int x2apic_phys_probe(void)
 /* Common x2apic functions, also used by x2apic_cluster */
 int x2apic_apic_id_valid(u32 apicid)
 {
+   if (x2apic_max_apicid && apicid > x2apic_max_apicid)
+   return 0;
+
return 1;
 }
 
-- 
2.26.2



[PATCH 11/13] x86/smp: Allow more than 255 CPUs even without interrupt remapping

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

Now that external interrupt affinity can be limited to the range of
CPUs that can be reached through legacy IOAPIC RTEs and MSI, it is
possible to use additional CPUs.

Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/apic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 069f5e9f1d28..750a92464bec 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1881,8 +1881,6 @@ static __init void try_to_enable_x2apic(int remap_mode)
 */
x2apic_phys = 1;
}
-   if (apic_limit)
-   x2apic_set_max_apicid(apic_limit);
 
/* Build the affinity mask for interrupts that can't be remapped. */
cpumask_clear(&x86_non_ir_cpumask);
-- 
2.26.2



[PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

This is the maximum possible set of CPUs which can be used. Use it
to calculate the default affinity requested from __irq_alloc_descs()
by first attempting to find the intersection with irq_default_affinity,
or falling back to using just the max_affinity if the intersection
would be empty.

Signed-off-by: David Woodhouse 
---
 include/linux/irqdomain.h |  3 ++-
 kernel/irq/ipi.c  |  2 +-
 kernel/irq/irqdomain.c| 45 +--
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 5d9de881..6b5576da77f0 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -278,7 +278,8 @@ extern void irq_set_default_host(struct irq_domain *host);
 extern struct irq_domain *irq_get_default_host(void);
 extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
  irq_hw_number_t hwirq, int node,
- const struct irq_affinity_desc *affinity);
+ const struct irq_affinity_desc *affinity,
+ const struct cpumask *max_affinity);
 
 static inline struct fwnode_handle *of_node_to_fwnode(struct device_node *node)
 {
diff --git a/kernel/irq/ipi.c b/kernel/irq/ipi.c
index 43e3d1be622c..13f56210eca9 100644
--- a/kernel/irq/ipi.c
+++ b/kernel/irq/ipi.c
@@ -75,7 +75,7 @@ int irq_reserve_ipi(struct irq_domain *domain,
}
}
 
-   virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE, NULL);
+   virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE, NULL, dest);
if (virq <= 0) {
pr_warn("Can't reserve IPI, failed to alloc descs\n");
return -ENOMEM;
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index c93e00ca11d8..ffd41f21afca 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -660,7 +660,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
}
 
/* Allocate a virtual interrupt number */
-   virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), 
NULL);
+   virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+ NULL, NULL);
if (virq <= 0) {
pr_debug("-> virq allocation failed\n");
return 0;
@@ -1011,25 +1012,57 @@ int irq_domain_translate_twocell(struct irq_domain *d,
 EXPORT_SYMBOL_GPL(irq_domain_translate_twocell);
 
 int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t hwirq,
-  int node, const struct irq_affinity_desc *affinity)
+  int node, const struct irq_affinity_desc *affinity,
+  const struct cpumask *max_affinity)
 {
+   cpumask_var_t default_affinity;
unsigned int hint;
+   int i;
+
+   /* Check requested per-IRQ affinities are in the possible range */
+   if (affinity && max_affinity) {
+   for (i = 0; i < cnt; i++)
+   if (!cpumask_subset(&affinity[i].mask, max_affinity))
+   return -EINVAL;
+   }
+
+   /*
+* Generate default affinity. Either the possible subset of
+* irq_default_affinity if such a subset is non-empty, or fall
+* back to the provided max_affinity if there is no intersection.
+* And just a copy of irq_default_affinity in the
+* !CONFIG_CPUMASK_OFFSTACK case.
+*/
+   memset(&default_affinity, 0, sizeof(default_affinity));
+   if ((max_affinity &&
+!cpumask_subset(irq_default_affinity, max_affinity))) {
+   if (!alloc_cpumask_var(&default_affinity, GFP_KERNEL))
+   return -ENOMEM;
+   cpumask_and(default_affinity, max_affinity,
+   irq_default_affinity);
+   if (cpumask_empty(default_affinity))
+   cpumask_copy(default_affinity, max_affinity);
+   } else if (cpumask_available(default_affinity))
+   cpumask_copy(default_affinity, irq_default_affinity);
 
if (virq >= 0) {
virq = __irq_alloc_descs(virq, virq, cnt, node, THIS_MODULE,
-affinity, NULL);
+affinity, default_affinity);
} else {
hint = hwirq % nr_irqs;
if (hint == 0)
hint++;
virq = __irq_alloc_descs(-1, hint, cnt, node, THIS_MODULE,
-affinity, NULL);
+affinity, default_affinity);
if (virq <= 0 && hint > 1) {
virq = __irq_alloc_descs(-1, 1, cnt, 

[PATCH 08/13] genirq: Add irq_domain_set_affinity()

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

This allows a maximal affinity to be set, for IRQ domains which cannot
target all CPUs in the system.

Signed-off-by: David Woodhouse 
---
 include/linux/irqdomain.h |  4 
 kernel/irq/irqdomain.c| 28 ++--
 kernel/irq/manage.c   | 19 +++
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 6b5576da77f0..cf686f18c1fa 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -151,6 +151,7 @@ struct irq_domain_chip_generic;
  *  drivers using the generic chip library which uses this pointer.
  * @parent: Pointer to parent irq_domain to support hierarchy irq_domains
  * @debugfs_file: dentry for the domain debugfs file
+ * @max_affinity: mask of CPUs targetable by this IRQ domain
  *
  * Revmap data, used internally by irq_domain
  * @revmap_direct_max_irq: The largest hwirq that can be set for controllers 
that
@@ -177,6 +178,7 @@ struct irq_domain {
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
struct dentry   *debugfs_file;
 #endif
+   const struct cpumask *max_affinity;
 
/* reverse map data. The linear map gets appended to the irq_domain */
irq_hw_number_t hwirq_max;
@@ -453,6 +455,8 @@ extern void irq_domain_set_info(struct irq_domain *domain, 
unsigned int virq,
void *chip_data, irq_flow_handler_t handler,
void *handler_data, const char *handler_name);
 extern void irq_domain_reset_irq_data(struct irq_data *irq_data);
+extern int irq_domain_set_affinity(struct irq_domain *domain,
+  const struct cpumask *affinity);
 #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
 extern struct irq_domain *irq_domain_create_hierarchy(struct irq_domain 
*parent,
unsigned int flags, unsigned int size,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index ffd41f21afca..0b298355be1b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -661,7 +661,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 
/* Allocate a virtual interrupt number */
virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
- NULL, NULL);
+ NULL, domain->max_affinity);
if (virq <= 0) {
pr_debug("-> virq allocation failed\n");
return 0;
@@ -1078,6 +1078,27 @@ void irq_domain_reset_irq_data(struct irq_data *irq_data)
 }
 EXPORT_SYMBOL_GPL(irq_domain_reset_irq_data);
 
+/**
+ * irq_domain_set_affinity - Set maximum CPU affinity for domain
+ * @parent:Domain to set affinity for
+ * @affinity:  Pointer to cpumask, consumed by domain
+ *
+ * Sets the maximal set of CPUs to which interrupts in this domain may
+ * be delivered. Must only be called after creation, before any interrupts
+ * have been in the domain.
+ *
+ * This function retains a pointer to the cpumask which is passed in.
+ */
+int irq_domain_set_affinity(struct irq_domain *domain,
+   const struct cpumask *affinity)
+{
+   if (cpumask_empty(affinity))
+   return -EINVAL;
+   domain->max_affinity = affinity;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(irq_domain_set_affinity);
+
 #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
 /**
  * irq_domain_create_hierarchy - Add a irqdomain into the hierarchy
@@ -1110,6 +1131,9 @@ struct irq_domain *irq_domain_create_hierarchy(struct 
irq_domain *parent,
if (domain) {
domain->parent = parent;
domain->flags |= flags;
+   if (parent && parent->max_affinity)
+   irq_domain_set_affinity(domain,
+   parent->max_affinity);
}
 
return domain;
@@ -1375,7 +1399,7 @@ int __irq_domain_alloc_irqs(struct irq_domain *domain, 
int irq_base,
virq = irq_base;
} else {
virq = irq_domain_alloc_descs(irq_base, nr_irqs, 0, node,
- affinity, NULL);
+ affinity, domain->max_affinity);
if (virq < 0) {
pr_debug("cannot allocate IRQ(base %d, count %d)\n",
 irq_base, nr_irqs);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 5df903fccb60..e8c5f8ecd306 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -345,6 +345,10 @@ int irq_set_affinity_locked(struct irq_data *data, const 
struct cpumask *mask,
struct irq_desc *desc = irq_data_to_desc(data);
int ret = 0;
 
+   if (data->domain && data->domain->max_affinity &&
+   !cpumask_subset(mask, data->domain->max_affinity))
+   return -EINVAL;
+
if (!chi

[PATCH 02/13] x86/msi: Only use high bits of MSI address for DMAR unit

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

The Intel IOMMU has an MSI-like configuration for its interrupt, but
it isn't really MSI. So it gets to abuse the high 32 bits of the address,
and puts the high 24 bits of the extended APIC ID there.

This isn't something that can be used in the general case for real MSIs,
since external devices using the high bits of the address would be
performing writes to actual memory space above 4GiB, not targeted at the
APIC.

Factor the hack out and allow it only to be used when appropriate,
adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC
ID. That should never happen since the legacy MSI messages are not supposed
to be used with Interrupt Remapping enabled.

The x2apic_enabled() check isn't needed because we won't bring up CPUs
with higher APIC IDs unless x2apic is enabled anyway.

Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/msi.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..356f8acf4927 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,13 +23,10 @@
 
 struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
-static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, int dmar)
 {
msg->address_hi = MSI_ADDR_BASE_HI;
 
-   if (x2apic_enabled())
-   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
msg->address_lo =
MSI_ADDR_BASE_LO |
((apic->irq_dest_mode == 0) ?
@@ -43,18 +40,42 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
MSI_DATA_LEVEL_ASSERT |
MSI_DATA_DELIVERY_FIXED |
MSI_DATA_VECTOR(cfg->vector);
+
+   /*
+* Only the IOMMU itself can use the trick of putting destination
+* APIC ID into the high bits of the address. Anything else would
+* just be writing to memory if it tried that, and needs IR to
+* address APICs above 255.
+*/
+   if (dmar)
+   msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+   else
+   WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid));
 }
 
 void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-   __irq_msi_compose_msg(irqd_cfg(data), msg);
+   __irq_msi_compose_msg(irqd_cfg(data), msg, 0);
 }
 
+/*
+ * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the
+ * high bits of the destination APIC ID. This can't be done in the general
+ * case for MSIs as it would be targeting real memory above 4GiB not the
+ * APIC.
+ */
+static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+   __irq_msi_compose_msg(irqd_cfg(data), msg, 1);
+}
+
 static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
 {
struct msi_msg msg[2] = { [1] = { }, };
 
-   __irq_msi_compose_msg(cfg, msg);
+   __irq_msi_compose_msg(cfg, msg, 0);
irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }
 
@@ -288,6 +309,7 @@ static struct irq_chip dmar_msi_controller = {
.irq_ack= irq_chip_ack_parent,
.irq_set_affinity   = msi_domain_set_affinity,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
+   .irq_compose_msi_msg= dmar_msi_compose_msg,
.irq_write_msi_msg  = dmar_msi_write_msg,
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
-- 
2.26.2



[PATCH 09/13] x86/irq: Add x86_non_ir_cpumask

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

This is the mask of CPUs to which IRQs can be delivered without interrupt
remapping.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/mpspec.h  |  1 +
 arch/x86/kernel/apic/apic.c| 12 
 arch/x86/kernel/apic/io_apic.c |  2 ++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 25ee8ca0a1f2..b2090be5b444 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -141,5 +141,6 @@ static inline void physid_set_mask_of_physid(int physid, 
physid_mask_t *map)
 #define PHYSID_MASK_NONE   { {[0 ... PHYSID_ARRAY_SIZE-1] = 0UL} }
 
 extern physid_mask_t phys_cpu_present_map;
+extern cpumask_t x86_non_ir_cpumask;
 
 #endif /* _ASM_X86_MPSPEC_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 459c78558f36..069f5e9f1d28 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -103,6 +103,9 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_acpiid);
 
+/* Mask of CPUs which can be targeted by non-remapped interrupts. */
+cpumask_t x86_non_ir_cpumask = { CPU_BITS_ALL };
+
 #ifdef CONFIG_X86_32
 
 /*
@@ -1838,6 +1841,7 @@ static __init void x2apic_enable(void)
 static __init void try_to_enable_x2apic(int remap_mode)
 {
u32 apic_limit = 0;
+   int i;
 
if (x2apic_state == X2APIC_DISABLED)
return;
@@ -1880,6 +1884,14 @@ static __init void try_to_enable_x2apic(int remap_mode)
if (apic_limit)
x2apic_set_max_apicid(apic_limit);
 
+   /* Build the affinity mask for interrupts that can't be remapped. */
+   cpumask_clear(&x86_non_ir_cpumask);
+   i = min_t(unsigned int, num_possible_cpus() - 1, apic_limit);
+   for ( ; i >= 0; i--) {
+   if (cpu_physical_id(i) <= apic_limit)
+   cpumask_set_cpu(i, &x86_non_ir_cpumask);
+   }
+
x2apic_enable();
 }
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index aa9a3b54a96c..4d0ef46fedb9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2098,6 +2098,8 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
struct irq_alloc_info info;
 
ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0);
+   if (domain->parent == x86_vector_domain)
+   info.mask = &x86_non_ir_cpumask;
info.devid = mpc_ioapic_id(ioapic);
info.ioapic.pin = pin;
mutex_lock(&ioapic_mutex);
-- 
2.26.2



[PATCH 06/13] genirq: Add default_affinity argument to __irq_alloc_descs()

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

It already takes an array of affinities for each individual irq being
allocated but that's awkward to allocate and populate in the case
where they're all the same and inherited from the irqdomain, so pass
the default in separately as a simple cpumask.
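
For illustration only (this caller and its names are made up, not part of
the series): with the extra argument, a user that wants every descriptor
in a range to share one mask no longer has to build a throwaway
irq_affinity_desc array:

#include <linux/irq.h>
#include <linux/module.h>

/* Hypothetical caller: allocate 8 descriptors whose default affinity is
 * the shared 'reachable' mask, with no per-IRQ affinity array at all. */
static int example_alloc_descs(const struct cpumask *reachable)
{
        return __irq_alloc_descs(-1, 0, 8, NUMA_NO_NODE, THIS_MODULE,
                                 NULL, reachable);
}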

Signed-off-by: David Woodhouse 
---
 include/linux/irq.h| 10 ++
 kernel/irq/devres.c|  8 ++--
 kernel/irq/irqdesc.c   | 10 --
 kernel/irq/irqdomain.c |  6 +++---
 4 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1b7f4dfee35b..6e119594d35d 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -897,15 +897,17 @@ unsigned int arch_dynirq_lower_bound(unsigned int from);
 
 int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
  struct module *owner,
- const struct irq_affinity_desc *affinity);
+ const struct irq_affinity_desc *affinity,
+ const struct cpumask *default_affinity);
 
 int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from,
   unsigned int cnt, int node, struct module *owner,
-  const struct irq_affinity_desc *affinity);
+  const struct irq_affinity_desc *affinity,
+  const struct cpumask *default_affinity);
 
 /* use macros to avoid needing export.h for THIS_MODULE */
 #define irq_alloc_descs(irq, from, cnt, node)  \
-   __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL)
+   __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL, NULL)
 
 #define irq_alloc_desc(node)   \
irq_alloc_descs(-1, 0, 1, node)
@@ -920,7 +922,7 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from,
irq_alloc_descs(-1, from, cnt, node)
 
 #define devm_irq_alloc_descs(dev, irq, from, cnt, node)\
-   __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL)
+   __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL, NULL)
 
 #define devm_irq_alloc_desc(dev, node) \
devm_irq_alloc_descs(dev, -1, 0, 1, node)
diff --git a/kernel/irq/devres.c b/kernel/irq/devres.c
index f6e5515ee077..079339decc23 100644
--- a/kernel/irq/devres.c
+++ b/kernel/irq/devres.c
@@ -170,6 +170,8 @@ static void devm_irq_desc_release(struct device *dev, void *res)
  * @affinity:  Optional pointer to an irq_affinity_desc array of size @cnt
  * which hints where the irq descriptors should be allocated
  * and which default affinities to use
+ * @default_affinity: Optional pointer to a cpumask indicating the default
+ *  affinity to use where not specified by the @affinity array
  *
  * Returns the first irq number or error code.
  *
@@ -177,7 +179,8 @@ static void devm_irq_desc_release(struct device *dev, void *res)
  */
 int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from,
   unsigned int cnt, int node, struct module *owner,
-  const struct irq_affinity_desc *affinity)
+  const struct irq_affinity_desc *affinity,
+  const struct cpumask *default_affinity)
 {
struct irq_desc_devres *dr;
int base;
@@ -186,7 +189,8 @@ int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from,
if (!dr)
return -ENOMEM;
 
-   base = __irq_alloc_descs(irq, from, cnt, node, owner, affinity);
+   base = __irq_alloc_descs(irq, from, cnt, node, owner, affinity,
+default_affinity);
if (base < 0) {
devres_free(dr);
return base;
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 4ac91b9fc618..fcc3b8a1fe01 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -770,15 +770,21 @@ EXPORT_SYMBOL_GPL(irq_free_descs);
  * @affinity:  Optional pointer to an affinity mask array of size @cnt which
  * hints where the irq descriptors should be allocated and which
  * default affinities to use
+ * @default_affinity: Optional pointer to a cpumask indicating the default
+ *  affinity where not specified in the @affinity array
  *
  * Returns the first irq number or error code
  */
 int __ref
 __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
- struct module *owner, const struct irq_affinity_desc *affinity)
+ struct module *owner, const struct irq_affinity_desc *affinity,
+ const struct cpumask *default_affinity)
 {
int start, ret;
 
+   if (!default_affinity)
+   default_affinity = irq_default_affinity;
+
if (!cnt)
return -EINVAL;
 
@@ -808,7 +814,7 @@ __irq_alloc_descs(int irq, unsi

[PATCH 03/13] x86/ioapic: Handle Extended Destination ID field in RTE

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

The IOAPIC Redirection Table Entries contain an 8-bit Extended
Destination ID field which maps to bits 11-4 of the MSI address.

The lowest bit is used to indicate remappable format, when interrupt
remapping is in use. A hypervisor can use the other 7 bits to permit
guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs
before having to expose a vIOMMU and interrupt remapping to the guest.
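
To make the arithmetic concrete, a tiny stand-alone sketch (the APIC ID
value 0x4567 is just an example) of how a 15-bit APIC ID splits across
the 'dest' and 'ext_dest' fields used later in this patch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t apicid = 0x4567;          /* example ID, must be < 32768 */
        uint8_t dest = apicid & 0xff;      /* low 8 bits -> RTE 'dest'    */
        uint8_t ext_dest = apicid >> 8;    /* next 7 bits -> 'ext_dest'   */

        /* prints: dest=0x67 ext_dest=0x45 max=32767 */
        printf("dest=%#x ext_dest=%#x max=%u\n", dest, ext_dest,
               (1u << 15) - 1);
        return 0;
}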

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR IOAPIC domain.

Signed-off-by: David Woodhouse 
---
 arch/x86/include/asm/io_apic.h |  3 ++-
 arch/x86/kernel/apic/io_apic.c | 19 +--
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a1a26f6d3aa4..e65a0b7379d0 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -78,7 +78,8 @@ struct IO_APIC_route_entry {
mask:  1,   /* 0: enabled, 1: disabled */
__reserved_2: 15;
 
-   __u32   __reserved_3: 24,
+   __u32   __reserved_3: 17,
+   ext_dest:  7,
dest:  8;
 } __attribute__ ((packed));
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a33380059db6..aa9a3b54a96c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
   buf, (ir_entry->index2 << 15) | ir_entry->index,
   ir_entry->zero);
else
-   printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
+   printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n",
   buf,
   entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
-  "logical " : "physical",
+  "logical " : "physical", entry.ext_dest,
   entry.dest, entry.delivery_mode);
}
 }
@@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void)
 */
if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
+   u32 apic_id = read_apic_id();
 
memset(&entry, 0, sizeof(entry));
entry.mask  = IOAPIC_UNMASKED;
@@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void)
entry.polarity  = IOAPIC_POL_HIGH;
entry.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
entry.delivery_mode = dest_ExtINT;
-   entry.dest  = read_apic_id();
+   entry.dest  = apic_id & 0xff;
+   entry.ext_dest  = apic_id >> 8;
 
/*
 * Add it to the IO-APIC irq-routing table:
@@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd)
 * ioapic chip to verify that.
 */
if (irqd->chip == &ioapic_chip) {
-   mpd->entry.dest = cfg->dest_apicid;
+   mpd->entry.dest = cfg->dest_apicid & 0xff;
+   mpd->entry.ext_dest = cfg->dest_apicid >> 8;
mpd->entry.vector = cfg->vector;
}
for_each_irq_pin(entry, mpd->irq_2_pin)
@@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void)
int apic, pin, i;
struct IO_APIC_route_entry entry0, entry1;
unsigned char save_control, save_freq_select;
+   u32 apic_id;
 
pin  = find_isa_irq_pin(8, mp_INT);
if (pin == -1) {
@@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void)
entry0 = ioapic_read_entry(apic, pin);
clear_IO_APIC_pin(apic, pin);
 
+   apic_id = hard_smp_processor_id();
memset(&entry1, 0, sizeof(entry1));
 
entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
entry1.mask = IOAPIC_UNMASKED;
-   entry1.dest = hard_smp_processor_id();
+   entry1.dest = apic_id & 0xff;
+   entry1.ext_dest = apic_id >> 8;
entry1.delivery_mode = dest_ExtINT;
entry1.polarity = entry0.polarity;
entry1.trigger = IOAPIC_EDGE;
@@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
memset(entry, 0, sizeof(*entry));
entry->delivery_mode = apic->irq_delivery_mode;
entry->dest_mode = apic->irq_dest_mode;
-   entry->dest  = cfg->dest_apicid;
+   entry->dest  = cfg->dest_apicid & 0xff;
+   entry->ext_dest  = cfg->dest_apicid >> 8;
entry->vector= cfg->vect

[PATCH 0/13] Fix per-domain IRQ affinity, allow >255 CPUs on x86 without IRQ remapping

2020-10-05 Thread David Woodhouse
Linux currently refuses to use >255 CPUs on x86 unless it has interrupt
remapping. This is a bit gratuitous because it could use those extra
CPUs just fine; it just can't target external interrupts at them.

The only problem is that our generic IRQ domain code can't cope with
the concept of domains which can only target a subset of CPUs.

The hyperv-iommu IRQ remapping driver works around this — not by
actually doing any remapping, but just returning -EINVAL if the
affinity is ever set to an unreachable CPU. This almost works, but ends
up being a bit late because irq_set_affinity_locked() doesn't call into
the irqchip driver immediately; the error only happens later.

This patch series implements a per-domain "maximum affinity" set and
uses it for the non-remapped IOAPIC and MSI domains on x86. As well as
allowing more CPUs to be used without interrupt remapping, this also
fixes the case where some IOAPICs or PCI devices aren't actually in
scope of any active IOMMU and are operating without remapping.
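
As a rough sketch of how a domain opts in (the caller below is
hypothetical; irq_domain_set_affinity() and x86_non_ir_cpumask are
introduced in patches 08 and 09 of this series):

#include <linux/irqdomain.h>
#include <asm/mpspec.h>

/* Sketch: restrict a freshly created domain to CPUs reachable without
 * interrupt remapping. The domain keeps a pointer to the mask, so the
 * mask must outlive the domain. */
static void example_limit_domain(struct irq_domain *d)
{
        if (irq_domain_set_affinity(d, &x86_non_ir_cpumask))
                pr_warn("failed to restrict irq domain affinity\n");
}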

While we're at it, recognise that the 8-bit limit is a bit gratuitous
and a hypervisor could offer at least 15 bits of APIC ID in the IOAPIC
RTE and MSI address bits 11-5 without even needing to use remapping.

David Woodhouse (13):
  x86/apic: Use x2apic in guest kernels even with unusable CPUs.
  x86/msi: Only use high bits of MSI address for DMAR unit
  x86/ioapic: Handle Extended Destination ID field in RTE
  x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  genirq: Prepare for default affinity to be passed to __irq_alloc_descs()
  genirq: Add default_affinity argument to __irq_alloc_descs()
  irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
  genirq: Add irq_domain_set_affinity()
  x86/irq: Add x86_non_ir_cpumask
  x86/irq: Limit IOAPIC and MSI domains' affinity without IR
  x86/smp: Allow more than 255 CPUs even without interrupt remapping
  iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant
  x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

 Documentation/virt/kvm/cpuid.rst |   4 +
 arch/x86/include/asm/apic.h  |   1 +
 arch/x86/include/asm/io_apic.h   |   3 +-
 arch/x86/include/asm/mpspec.h|   2 +
 arch/x86/include/asm/x86_init.h  |   2 +
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kernel/apic/apic.c  |  41 +-
 arch/x86/kernel/apic/io_apic.c   |  23 --
 arch/x86/kernel/apic/msi.c   |  44 +--
 arch/x86/kernel/apic/x2apic_phys.c   |   9 +++
 arch/x86/kernel/kvm.c|   6 ++
 arch/x86/kernel/x86_init.c   |   1 +
 drivers/iommu/hyperv-iommu.c | 149 +--
 include/linux/interrupt.h|   2 +
 include/linux/irq.h  |  10 ++-
 include/linux/irqdomain.h|   7 +-
 kernel/irq/devres.c  |   8 +-
 kernel/irq/ipi.c |   2 +-
 kernel/irq/irqdesc.c |  29 ---
 kernel/irq/irqdomain.c   |  69 ++--
 kernel/irq/manage.c  |  19 -
 21 files changed, 240 insertions(+), 192 deletions(-)



[PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-05 Thread David Woodhouse
From: David Woodhouse 

When interrupt remapping isn't enabled, only the first 255 CPUs can
receive external interrupts. Set the appropriate max affinity for
the IOAPIC and MSI IRQ domains accordingly.

This also fixes the case where interrupt remapping is enabled but some
devices are not within the scope of any active IOMMU.

Signed-off-by: David Woodhouse 
---
 arch/x86/kernel/apic/io_apic.c | 2 ++
 arch/x86/kernel/apic/msi.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 4d0ef46fedb9..1c8ce7bc098f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2332,6 +2332,8 @@ static int mp_irqdomain_create(int ioapic)
}
 
ip->irqdomain->parent = parent;
+   if (parent == x86_vector_domain)
+   irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
 
if (cfg->type == IOAPIC_DOMAIN_LEGACY ||
cfg->type == IOAPIC_DOMAIN_STRICT)
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 4d891967bea4..af5ce5c4da02 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -259,6 +259,7 @@ struct irq_domain * __init native_create_pci_msi_domain(void)
pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
} else {
d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
+   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
}
return d;
 }
@@ -479,6 +480,8 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id)
irq_domain_free_fwnode(fn);
kfree(domain_info);
}
+   if (parent == x86_vector_domain)
+   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
return d;
 }
 
-- 
2.26.2



[PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2020-09-24 Thread David Woodhouse
From: David Woodhouse 

Instead of bailing out completely, such a unit can still be used for
interrupt remapping.

Signed-off-by: David Woodhouse 
---
 drivers/iommu/intel/dmar.c | 46 +-
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 93e6345f3414..4420a759f095 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1024,8 +1024,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
struct intel_iommu *iommu;
u32 ver, sts;
-   int agaw = 0;
-   int msagaw = 0;
+   int agaw = -1;
+   int msagaw = -1;
int err;
 
if (!drhd->reg_base_addr) {
@@ -1050,17 +1050,28 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
}
 
err = -EINVAL;
-   agaw = iommu_calculate_agaw(iommu);
-   if (agaw < 0) {
-   pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
-   }
-   msagaw = iommu_calculate_max_sagaw(iommu);
-   if (msagaw < 0) {
-   pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
+   if (cap_sagaw(iommu->cap) == 0) {
+   pr_info("%s: No supported address widths. Not attempting DMA translation.\n",
+   iommu->name);
+   drhd->ignored = 1;
+   }
+
+   if (!drhd->ignored) {
+   agaw = iommu_calculate_agaw(iommu);
+   if (agaw < 0) {
+   pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   }
+   }
+   if (!drhd->ignored) {
+   msagaw = iommu_calculate_max_sagaw(iommu);
+   if (msagaw < 0) {
+   pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   agaw = -1;
+   }
}
iommu->agaw = agaw;
iommu->msagaw = msagaw;
@@ -1087,7 +1098,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
raw_spin_lock_init(&iommu->register_lock);
 
-   if (intel_iommu_enabled) {
+   /*
+* This is only for hotplug; at boot time intel_iommu_enabled won't
+* be set yet. When intel_iommu_init() runs, it registers the units
+* present at boot time, then sets intel_iommu_enabled.
+*/
+   if (intel_iommu_enabled && !drhd->ignored) {
err = iommu_device_sysfs_add(&iommu->iommu, NULL,
 intel_iommu_groups,
 "%s", iommu->name);
@@ -1117,7 +1133,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
 static void free_iommu(struct intel_iommu *iommu)
 {
-   if (intel_iommu_enabled) {
+   if (intel_iommu_enabled && iommu->iommu.ops) {
iommu_device_unregister(&iommu->iommu);
iommu_device_sysfs_remove(&iommu->iommu);
}
-- 
2.17.1




Re: [PATCH 0/4] PCI, iommu: Factor 'untrusted' check for ATS enablement

2020-05-18 Thread David Woodhouse
On Fri, 2020-05-15 at 10:19 -0700, Raj, Ashok wrote:
> Hi Christoph
> 
> On Fri, May 15, 2020 at 08:43:51AM -0700, Christoph Hellwig wrote:
> > Can you please lift the untrusted flag into struct device?  It really
> > isn't a PCI specific concept, and we should not have code poking into
> > pci_dev all over the iommu code.
> 
> Just for clarification: All IOMMU's today mostly pertain to only PCI devices
> and for devices that aren't PCI like HPET for instance we give a PCI
> identifier. Facilities like ATS for instance require the platform to have 
> an IOMMU.
> 
> what additional problems does moving this to struct device solve?

Even the Intel IOMMU supports ACPI devices for which there is no struct
pci_dev; only a B/D/F in the ANDD record indicating which entry in the
context table to use.



Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address

2020-03-26 Thread David Woodhouse
On Thu, 2020-03-26 at 18:11 +0100, Alexander Graf wrote:
> I'm with you on that sentiment, but in the environment I'm currently 
> looking at, we have neither DT nor ACPI: The kernel gets purely 
> configured via kernel command line. For other unenumerable artifacts on 
> the system, such as virtio-mmio platform devices, that works well enough 
> and also basically "hacks a kernel parameter" to specify the system layout.

Well... maybe it should also feed in a DT for those too?




Re: [PATCH 0/5] iommu/vt-d: Consolidate various cache flush ops

2019-12-03 Thread David Woodhouse
On Fri, 2019-11-22 at 11:04 +0800, Lu Baolu wrote:
> Intel VT-d 3.0 introduces more caches and interfaces for software to
> flush when it runs in the scalable mode. Currently various cache flush
> helpers are scattered around. This consolidates them by putting them in
> the existing iommu_flush structure.
> 
> /* struct iommu_flush - Intel IOMMU cache invalidation ops
>  *
>  * @cc_inv: invalidate context cache
>  * @iotlb_inv: Invalidate IOTLB and paging structure caches when software
>  * has changed second-level tables.
>  * @p_iotlb_inv: Invalidate IOTLB and paging structure caches when software
>  *   has changed first-level tables.
>  * @pc_inv: invalidate pasid cache
>  * @dev_tlb_inv: invalidate cached mappings used by requests-without-PASID
>  *   from the Device-TLB on a endpoint device.
>  * @p_dev_tlb_inv: invalidate cached mappings used by requests-with-PASID
>  * from the Device-TLB on an endpoint device
>  */
> struct iommu_flush {
> void (*cc_inv)(struct intel_iommu *iommu, u16 did,
>u16 sid, u8 fm, u64 type);
> void (*iotlb_inv)(struct intel_iommu *iommu, u16 did, u64 addr,
>   unsigned int size_order, u64 type);
> void (*p_iotlb_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
> u64 addr, unsigned long npages, bool ih);
> void (*pc_inv)(struct intel_iommu *iommu, u16 did, u32 pasid,
>u64 granu);
> void (*dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned int mask);
> void (*p_dev_tlb_inv)(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>   u32 pasid, u16 qdep, u64 addr,
>   unsigned long npages);
> };
> 
> The name of each cache flush ops is defined according to the spec section 6.5
> so that people are easy to look up them in the spec.

Hm, indirect function calls are quite expensive these days.

I would have preferred to go in the opposite direction, since surely we
aren't going to have *many* of these implementations. Currently there's
only one for register-based and one for queued invalidation, right?
Even if VT-d 3.0 throws an extra version in, I think I'd prefer to take
out the indirection completely and have an if/then helper.
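
For the sake of argument, the direct version I have in mind is roughly
this (a sketch, not a patch; it just branches between the existing QI
and register-based flush routines):

/* Sketch: one branch instead of an indirect call through iommu->flush. */
static void example_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
                                unsigned int size_order, u64 type)
{
        if (iommu->qi)          /* queued invalidation available */
                qi_flush_iotlb(iommu, did, addr, size_order, type);
        else                    /* fall back to register-based flush */
                __iommu_flush_iotlb(iommu, did, addr, size_order, type);
}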

Would love to see a microbenchmark of unmap operations before and after
this patch series with retpoline enabled, to see the effect.






Re: [PATCH] intel-iommu: Turn off translations at shutdown

2019-11-08 Thread David Woodhouse
On Fri, 2019-11-08 at 08:47 +, Zeng, Jason wrote:
> > -Original Message-
> > From: David Woodhouse 
> > Sent: Friday, November 8, 2019 3:54 PM
> > To: Deepa Dinamani ; j...@8bytes.org; linux-
> > ker...@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org; Zeng, Jason ;
> > Tian, Kevin 
> > Subject: Re: [PATCH] intel-iommu: Turn off translations at shutdown
> > 
> > On Thu, 2019-11-07 at 12:59 -0800, Deepa Dinamani wrote:
> > > The intel-iommu driver assumes that the iommu state is
> > > cleaned up at the start of the new kernel.
> > > But, when we try to kexec boot something other than the
> > > Linux kernel, the cleanup cannot be relied upon.
> > > Hence, cleanup before we go down for reboot.
> > > 
> > > Keeping the cleanup at initialization also, in case BIOS
> > > leaves the IOMMU enabled.
> > > 
> > > I considered turning off iommu only during kexec reboot,
> > > but a clean shutdown seems always a good idea. But if
> > > someone wants to make it conditional, we can do that.
> > 
> > This is going to break things for the VMM live update scheme that Jason
> > presented at KVM Forum, isn't it?
> > 
> > In that case we rely on the IOMMU still operating during the
> > transition.
> 
> For VMM live update case, we should be able to detect and bypass
> the shutdown that Deepa introduced here, so keep IOMMU still operating?

Is that a 'yes' to Deepa's "if someone wants to make it conditional, we
can do that"?
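
One possible shape for "conditional" (purely illustrative; the skip flag
is hypothetical and would have to be set by whatever orchestrates the
live update, it is not an existing kernel parameter):

/* Illustrative sketch only. */
static bool iommu_skip_shutdown;

static void example_intel_iommu_shutdown(void)
{
        struct dmar_drhd_unit *drhd;
        struct intel_iommu *iommu = NULL;

        if (no_iommu || dmar_disabled || iommu_skip_shutdown)
                return;

        down_write(&dmar_global_lock);
        for_each_active_iommu(iommu, drhd)
                iommu_disable_translation(iommu);
        up_write(&dmar_global_lock);
}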



