On Wed, Jan 06, 2021 at 10:27:49AM +0800, Lu Baolu wrote:
> The pci_subdevice_msi_create_irq_domain() should fail if the underlying
> platform is not able to support IMS (Interrupt Message Storage). Otherwise,
> the isolation of interrupt is not guaranteed.
>
> For x86, IMS is only supported on bare metal for now. We could enable it
> in the virtualization environments in the future if interrupt HYPERCALL
> domain is supported or the hardware has the capability of interrupt
> isolation for subdevices.
>
> Suggested-by: Thomas Gleixner <t...@linutronix.de>
> Link: https://lore.kernel.org/linux-pci/87pn4nk7nn....@nanos.tec.linutronix.de/
> Link: https://lore.kernel.org/linux-pci/877dqrnzr3....@nanos.tec.linutronix.de/
> Link: https://lore.kernel.org/linux-pci/877dqqmc2h....@nanos.tec.linutronix.de/
> Signed-off-by: Lu Baolu <baolu...@linux.intel.com>
> ---
>  arch/x86/pci/common.c       | 47 +++++++++++++++++++++++++++++++++++++
>  drivers/base/platform-msi.c |  8 +++++++
>  include/linux/msi.h         |  1 +
>  3 files changed, 56 insertions(+)
>
>
> Background:
> Learnt from the discussions in this thread:
>
> https://lore.kernel.org/linux-pci/160408357912.912050.17005584526266191420.st...@djiang5-desk3.ch.intel.com/
>
> The device IMS (Interrupt Message Storage) should not be enabled in any
> virtualization environments unless there is a HYPERCALL domain which
> makes the changes in the message store managed by the hypervisor.
>
> As the initial step, we allow the IMS to be enabled only if we are
> running on the bare metal. It's easy to enable IMS in the virtualization
> environments if above preconditions are met in the future.
>
> We ever thought about moving on_bare_metal() to a generic file so that
> it could be well maintained and used. But we need some suggestions about
> where to put it. Your comments are very appreciated.
>
> This patch is only for comments purpose. Please don't merge it. We will
> include it in the Intel IMS implementation later once we reach a
> consensus.
>
> Change log:
> v1->v2:
>  - v1:
>    https://lore.kernel.org/linux-pci/20201210004624.345282-1-baolu...@linux.intel.com/
>  - Rename probably_on_bare_metal() with on_bare_metal();
>  - Some vendors might use the same name for both bare metal and virtual
>    environment. Before we add vendor specific code to distinguish
>    between them, let's return false in on_bare_metal(). This won't
>    introduce any regression. The only impact is that the coming new
>    platform msi feature won't be supported until the vendor specific code
>    is provided.
>
> Best regards,
> baolu
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f456fcd0..963e0401f2b2 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -724,3 +724,50 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
>  	return dev;
>  }
>  #endif
> +
> +/*
> + * We want to figure out which context we are running in. But the hardware
> + * does not introduce a reliable way (instruction, CPUID leaf, MSR, whatever)
> + * which can be manipulated by the VMM to let the OS figure out where it runs.
> + * So we go with the below probably on_bare_metal() function as a replacement
> + * for definitely on_bare_metal() to go forward only for the very simple reason
> + * that this is the only option we have.
> + *
> + * People might use the same vendor name for both bare metal and virtual
> + * environment. We can remove those names once we have vendor specific code to
> + * distinguish between them.
> + */
> +static const char * const vmm_vendor_name[] = {
> +	"QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
> +	"innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE",
> +	"Microsoft Corporation", "Amazon EC2"
> +};

Maybe it is not a concern at all, but this approach makes forward/backward
compatibility without a kernel upgrade impossible.

Once QEMU (for example) gains the needed support, someone will have to
remove "QEMU" from this array and rewrite on_bare_metal(), because the
check is no longer bare metal vs. virtual. A kernel upgrade/downgrade will
then be required every time the QEMU version is switched, and the change
will also have to go to stable@ and the distros.

I can already feel the pain of the people in the field who will have to
debug such code.

Am I missing something completely?

Thanks