[tip: x86/pasid] Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)
The following commit has been merged into the x86/pasid branch of tip:

Commit-ID:     4e7b11567d946ebe14a3d10b697b078971a9da89
Gitweb:        https://git.kernel.org/tip/4e7b11567d946ebe14a3d10b697b078971a9da89
Author:        Ashok Raj
AuthorDate:    Tue, 15 Sep 2020 09:30:07 -07:00
Committer:     Borislav Petkov
CommitterDate: Thu, 17 Sep 2020 19:29:42 +02:00

Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj
Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Borislav Petkov
Reviewed-by: Tony Luck
Link: https://lkml.kernel.org/r/1600187413-163670-4-git-send-email-fenghua...@intel.com
---
 Documentation/x86/index.rst |   1 +-
 Documentation/x86/sva.rst   | 257 +++-
 2 files changed, 258 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9..e5d5ff0 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
    usb-legacy-support
    i386/index
    x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 0000000..076efd5
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,257 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to the PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU is also
+required to support the PCIe features ATS and PRI. ATS allows devices
+to cache translations for virtual addresses. The IOMMU driver uses the
+mmu_notifier() support to keep the device TLB cache and the CPU cache in
+sync. When an ATS lookup fails for a virtual address, the device should
+use the PRI in order to request the virtual address to be paged into the
+CPU page tables. The device must use ATS again in order the fetch the
+translation before use.
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress should be provided.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permits hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA-capable device, this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU-specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will
[tip: x86/urgent] x86/hotplug: Silence APIC only after all interrupts are migrated
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     52d6b926aabc47643cd910c85edb262b7f44c168
Gitweb:        https://git.kernel.org/tip/52d6b926aabc47643cd910c85edb262b7f44c168
Author:        Ashok Raj
AuthorDate:    Wed, 26 Aug 2020 21:12:10 -07:00
Committer:     Thomas Gleixner
CommitterDate: Thu, 27 Aug 2020 09:29:23 +02:00

x86/hotplug: Silence APIC only after all interrupts are migrated

There is a race when taking a CPU offline. Current code looks like this:

native_cpu_disable()
{
	...
	apic_soft_disable();
	/*
	 * Any existing set bits for pending interrupt to
	 * this CPU are preserved and will be sent via IPI
	 * to another CPU by fixup_irqs().
	 */
	cpu_disable_common();
	{
		/*
		 * Race window happens here. Once local APIC has been
		 * disabled any new interrupts from the device to
		 * the old CPU are lost
		 */
		fixup_irqs(); // Too late to capture anything in IRR.
		...
	}
}

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green
Signed-off-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Tested-by: Mathias Nyman
Tested-by: Evan Green
Reviewed-by: Evan Green
Cc: sta...@vger.kernel.org
Link: https://lore.kernel.org/lkml/875zdarr4h@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok@intel.com
---
 arch/x86/kernel/smpboot.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 27aa04a..f5ef689 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1594,14 +1594,28 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Disable the local APIC. Otherwise IPI broadcasts will reach
-	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
-	 */
-	apic_soft_disable();
 	cpu_disable_common();
 
+	/*
+	 * Disable the local APIC. Otherwise IPI broadcasts will reach
+	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
+	 * messages.
+	 *
+	 * Disabling the APIC must happen after cpu_disable_common()
+	 * which invokes fixup_irqs().
+	 *
+	 * Disabling the APIC preserves already set bits in IRR, but
+	 * an interrupt arriving after disabling the local APIC does not
+	 * set the corresponding IRR bit.
+	 *
+	 * fixup_irqs() scans IRR for set bits so it can raise a not
+	 * yet handled interrupt on the new destination CPU via an IPI
+	 * but obviously it can't do so for IRR bits which are not set.
+	 * IOW, interrupts arriving after disabling the local APIC will
+	 * be lost.
+	 */
+	apic_soft_disable();
+
 	return 0;
 }
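
The ordering problem described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct toy_apic and every toy_* function are hypothetical stand-ins invented for this illustration, not kernel interfaces. The model assumes a disabled APIC latches nothing into IRR and that the IRR scan forwards exactly the bits that are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: a hypothetical user-space model, not kernel code. */
struct toy_apic {
	bool enabled;	/* false once the APIC has been soft-disabled */
	uint32_t irr;	/* one pending bit per toy vector 0..31 */
};

/* A device raises a vector. A disabled APIC does not latch it in IRR. */
bool toy_raise(struct toy_apic *apic, unsigned int vector)
{
	if (!apic->enabled || vector >= 32)
		return false;
	apic->irr |= UINT32_C(1) << vector;
	return true;
}

/* Stand-in for fixup_irqs(): forward every set IRR bit, return the count. */
unsigned int toy_fixup_irqs(struct toy_apic *apic)
{
	unsigned int forwarded = 0;

	for (unsigned int v = 0; v < 32; v++) {
		if (apic->irr & (UINT32_C(1) << v)) {
			apic->irr &= ~(UINT32_C(1) << v);
			forwarded++;
		}
	}
	return forwarded;
}

/*
 * Take the toy CPU offline. Vector 3 is already pending; vector 5 arrives
 * in the window just before the IRR scan. Returns how many interrupts were
 * forwarded to another CPU.
 */
unsigned int toy_offline(bool disable_apic_first)
{
	struct toy_apic apic = { .enabled = true, .irr = 0 };
	unsigned int forwarded;

	toy_raise(&apic, 3);
	if (disable_apic_first)
		apic.enabled = false;	/* old ordering */
	toy_raise(&apic, 5);		/* lost if the APIC is already off */
	forwarded = toy_fixup_irqs(&apic);
	apic.enabled = false;		/* new ordering: disable last */
	return forwarded;
}

/* Old ordering forwards only the earlier interrupt; new ordering both. */
void toy_apic_selfcheck(void)
{
	assert(toy_offline(true) == 1);
	assert(toy_offline(false) == 2);
}
```

In this model the old ordering drops the interrupt that arrives inside the window, while disabling the APIC after the scan forwards both. The real fix addresses the same window, but the actual hardware behaviour is of course more involved than this toy.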
[tip: x86/microcode] x86/microcode: Update late microcode in parallel
The following commit has been merged into the x86/microcode branch of tip:

Commit-ID:     93946a33b5693a6bbcf917a170198ff4afaa7a31
Gitweb:        https://git.kernel.org/tip/93946a33b5693a6bbcf917a170198ff4afaa7a31
Author:        Ashok Raj
AuthorDate:    Thu, 22 Aug 2019 23:43:47 +03:00
Committer:     Borislav Petkov
CommitterDate: Tue, 01 Oct 2019 15:58:54 +02:00

x86/microcode: Update late microcode in parallel

Microcode update was changed to be serialized due to restrictions after
Spectre days. Updating serially on a large multi-socket system can be
painful since it is being done on one CPU at a time.

Cloud customers have expressed discontent as services disappear for
a prolonged time. The restriction is that only one core (or only one
thread of a core in the case of an SMT system) goes through the update
while other cores (or respectively, SMT threads) are quiesced.

Do the microcode update only on the first thread of each core while
other siblings simply wait for this to complete.

 [ bp: Simplify, massage, cleanup comments. ]

Signed-off-by: Ashok Raj
Signed-off-by: Mihai Carabas
Signed-off-by: Borislav Petkov
Cc: Boris Ostrovsky
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jon Grimm
Cc: kanth.ghatr...@oracle.com
Cc: konrad.w...@oracle.com
Cc: patrick.c...@oracle.com
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: x86-ml
Link: https://lkml.kernel.org/r/1566506627-16536-2-git-send-email-mihai.cara...@oracle.com
---
 arch/x86/kernel/cpu/microcode/core.c | 36 +++
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index cb0fdca..7019d4b 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -63,11 +63,6 @@ LIST_HEAD(microcode_cache);
  */
 static DEFINE_MUTEX(microcode_mutex);
 
-/*
- * Serialize late loading so that CPUs get updated one-by-one.
- */
-static DEFINE_RAW_SPINLOCK(update_lock);
-
 struct ucode_cpu_info ucode_cpu_info[NR_CPUS];
 
 struct cpu_info_ctx {
@@ -566,11 +561,18 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
-	raw_spin_lock(&update_lock);
-	apply_microcode_local(&err);
-	raw_spin_unlock(&update_lock);
+	/*
+	 * On an SMT system, it suffices to load the microcode on one sibling of
+	 * the core because the microcode engine is shared between the threads.
+	 * Synchronization still needs to take place so that no concurrent
+	 * loading attempts happen on multiple threads of an SMT core. See
+	 * below.
+	 */
+	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+		apply_microcode_local(&err);
+	else
+		goto wait_for_siblings;
 
-	/* siblings return UCODE_OK because their engine got updated already */
 	if (err > UCODE_NFOUND) {
 		pr_warn("Error reloading microcode on CPU %d\n", cpu);
 		ret = -1;
@@ -578,14 +580,18 @@ static int __reload_late(void *info)
 		ret = 1;
 	}
 
+wait_for_siblings:
+	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
+		panic("Timeout during microcode update!\n");
+
 	/*
-	 * Increase the wait timeout to a safe value here since we're
-	 * serializing the microcode update and that could take a while on a
-	 * large number of CPUs. And that is fine as the *actual* timeout will
-	 * be determined by the last CPU finished updating and thus cut short.
+	 * At least one thread has completed update on each core.
+	 * For others, simply call the update to make sure the
+	 * per-cpu cpuinfo can be updated with right microcode
+	 * revision.
 	 */
-	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC * num_online_cpus()))
-		panic("Timeout during microcode update!\n");
+	if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu)
+		apply_microcode_local(&err);
 
 	return ret;
 }
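
The "first thread of each core" selection that the patch expresses as cpumask_first(topology_sibling_cpumask(cpu)) == cpu can be sketched in plain user-space C. The following is a minimal model, not the kernel implementation: the eight-CPU topology table and all toy_* names are assumptions made up for this example, with CPUs n and n+4 treated as SMT siblings.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Hypothetical topology: CPUs n and n+4 are SMT siblings on core n. */
static const int toy_core_of[TOY_NR_CPUS] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/*
 * Lowest-numbered CPU sharing a core with `cpu`; plays the role of
 * cpumask_first(topology_sibling_cpumask(cpu)) in this toy model.
 */
int toy_first_sibling(int cpu)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		if (toy_core_of[i] == toy_core_of[cpu])
			return i;
	return cpu;
}

/* Only the first sibling of each core performs the actual load. */
bool toy_loads_microcode(int cpu)
{
	return toy_first_sibling(cpu) == cpu;
}

/* Number of CPUs that load microcode: one per core. */
int toy_count_loads(void)
{
	int loads = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_loads_microcode(cpu))
			loads++;
	return loads;
}

/* Four cores with two threads each: four loads instead of eight. */
void toy_ucode_selfcheck(void)
{
	assert(toy_first_sibling(4) == 0);
	assert(toy_loads_microcode(1));
	assert(!toy_loads_microcode(5));
	assert(toy_count_loads() == 4);
}
```

With this topology the cores update in parallel, one load per core, and the remaining siblings only wait. The model deliberately leaves out the later step in which the waiting siblings call the update again to refresh their per-CPU revision information.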