[Xen-devel] Patch "x86/smpboot: Make logical package management more robust" has been added to the 4.9-stable tree
This is a note to let you know that I've just added the patch titled x86/smpboot: Make logical package management more robust to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: x86-smpboot-make-logical-package-management-more-robust.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please letknow about it. From 9d85eb9119f4eeeb48e87adfcd71f752655700e9 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 12 Dec 2016 11:04:53 +0100 Subject: x86/smpboot: Make logical package management more robust From: Thomas Gleixner commit 9d85eb9119f4eeeb48e87adfcd71f752655700e9 upstream. The logical package management has several issues: - The APIC ids provided by ACPI are not required to be the same as the initial APIC id which can be retrieved by CPUID. The APIC ids provided by ACPI are those which are written by the BIOS into the APIC. The initial id is set by hardware and can not be changed. The hardware provided ids contain the real hardware package information. Especially AMD sets the effective APIC id different from the hardware id as they need to reserve space for the IOAPIC ids starting at id 0. As a consequence those machines trigger the currently active firmware bug printouts in dmesg, These are obviously wrong. - Virtual machines have their own interesting of enumerating APICs and packages which are not reliably covered by the current implementation. The sizing of the mapping array has been tweaked to be generously large to handle systems which provide a wrong core count when HT is disabled so the whole magic which checks for space in the physical hotplug case is not needed anymore. Simplify the whole machinery and do the mapping when the CPU starts and the CPUID derived physical package information is available. This solves the observed problems on AMD machines and works for the virtualization issues as well. Remove the extra call from XEN cpu bringup code as it is not longer required. Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)") Reported-and-tested-by: Borislav Petkov Tested-by: Boris Ostrovsky Signed-off-by: Thomas Gleixner Cc: Juergen Gross Cc: Peter Zijlstra Cc: M. Vefa Bicakci Cc: xen-devel Cc: Charles (Chas) Williams Cc: Borislav Petkov Cc: Alok Kataria Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/apic/apic.c | 15 arch/x86/kernel/cpu/common.c | 24 ++-- arch/x86/kernel/smpboot.c| 51 --- arch/x86/xen/smp.c |6 - 4 files changed, 27 insertions(+), 69 deletions(-) --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2159,21 +2159,6 @@ int __generic_processor_info(int apicid, } /* -* This can happen on physical hotplug. The sanity check at boot time -* is done from native_smp_prepare_cpus() after num_possible_cpus() is -* established. -*/ - if (topology_update_package_map(apicid, cpu) < 0) { - int thiscpu = max + disabled_cpus; - - pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n", - thiscpu, apicid); - - disabled_cpus++; - return -ENOSPC; - } - - /* * Validate version */ if (version == 0x0) { --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cp } /* - * The physical to logical package id mapping is initialized from the - * acpi/mptables information. Make sure that CPUID actually agrees with - * that. + * Validate that ACPI/mptables have the same information about the + * effective APIC id and update the package map. */ -static void sanitize_package_id(struct cpuinfo_x86 *c) +static void validate_apic_and_package_id(struct cpuinfo_x86 *c) { #ifdef CONFIG_SMP - unsigned int pkg, apicid, cpu = smp_processor_id(); + unsigned int apicid, cpu = smp_processor_id(); apicid = apic->cpu_present_to_apicid(cpu); - pkg = apicid >> boot_cpu_data.x86_coreid_bits; - if (apicid != c->initial_apicid) { - pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n", + if (apicid != c->apicid) { + pr_err(FW_BUG "CPU%u: APIC id mismatch.
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/10/2016 02:13 PM, Thomas Gleixner wrote: On Sat, 10 Dec 2016, Thomas Gleixner wrote: On Fri, 9 Dec 2016, Boris Ostrovsky wrote: On 12/09/2016 06:02 PM, Boris Ostrovsky wrote: On 12/09/2016 05:06 PM, Thomas Gleixner wrote: On Thu, 8 Dec 2016, Thomas Gleixner wrote: Boris, can you please verify if that makes the topology_update_package_map() call which you placed into the Xen cpu starting code obsolete ? Will do. I did test your patch but without removing topology_update_package_map() call. It complained about package IDs being wrong, but that's expected until I fix Xen part. Ignore my statement about earlier testing --- it was all on single-node machines. Something is broken with multi-node on Intel, but failure modes are different. Prior to this patch build_sched_domain() reports an error and pretty soon we crash in scheduler (don't remember off the top of my head). With patch applied I crash mush later, when one of the drivers does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, for example). Hmm. But the cpu_to_node() association is unrelated to the logical package management. Just came to my mind after hitting send. We had the whole persistent cpuid to nodeid association work merged in 4.9. So that might be related. Yes, that's exactly the reason. It uses _PXM to set nodeID and _PXM is exposed to dom0 (which is a privileged PV guest). Re: you previous message: after I "fix" the problem above, I see pr_info("Max logical packages: %u\n", __max_logical_packages); but no pr_warn(CPU %u Converting physical %u to logical package %u\n", ...) with or without topology_update_package_map() in arch/x86/xen/smp.c:cpu_bringup() -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Sat, 10 Dec 2016, Thomas Gleixner wrote: > On Fri, 9 Dec 2016, Boris Ostrovsky wrote: > > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote: > > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote: > > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote: > > > > > > > > Boris, can you please verify if that makes the > > > > topology_update_package_map() call which you placed into the Xen cpu > > > > starting code obsolete ? > > > > > > Will do. I did test your patch but without removing > > > topology_update_package_map() call. It complained about package IDs > > > being wrong, but that's expected until I fix Xen part. > > > > Ignore my statement about earlier testing --- it was all on single-node > > machines. > > > > Something is broken with multi-node on Intel, but failure modes are > > different. > > Prior to this patch build_sched_domain() reports an error and pretty soon we > > crash in scheduler (don't remember off the top of my head). With patch > > applied > > I crash mush later, when one of the drivers does kmalloc_node(.., > > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen > > ("x86: Booted up 1 node, 32 CPUs" is reported, for example). > > Hmm. But the cpu_to_node() association is unrelated to the logical package > management. Just came to my mind after hitting send. We had the whole persistent cpuid to nodeid association work merged in 4.9. So that might be related. Thanks, tglx ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Fri, 9 Dec 2016, Boris Ostrovsky wrote: > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote: > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote: > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote: > > > > > > Boris, can you please verify if that makes the > > > topology_update_package_map() call which you placed into the Xen cpu > > > starting code obsolete ? > > > > Will do. I did test your patch but without removing > > topology_update_package_map() call. It complained about package IDs > > being wrong, but that's expected until I fix Xen part. > > Ignore my statement about earlier testing --- it was all on single-node > machines. > > Something is broken with multi-node on Intel, but failure modes are different. > Prior to this patch build_sched_domain() reports an error and pretty soon we > crash in scheduler (don't remember off the top of my head). With patch applied > I crash mush later, when one of the drivers does kmalloc_node(.., > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen > ("x86: Booted up 1 node, 32 CPUs" is reported, for example). Hmm. But the cpu_to_node() association is unrelated to the logical package management. > 2-node AMD box doesn't have these problems. > > I haven't upgraded the Intel machine for about a month but this all must have > happened in 4.9 timeframe. > > So I can't answer your question since we clearly have other problems on Xen. I > will be looking into this. Fair enough. What you could do though with this patch applied and the extra XEN call to topology_update_package_map() removed is to watchout for the following messages: pr_info("Max logical packages: %u\n", __max_logical_packages); and pr_warn(CPU %u Converting physical %u to logical package %u\n", ...) Ideally the latter wont show. Thanks, tglx ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On Fri, 9 Dec 2016, Boris Ostrovsky wrote: > On 12/09/2016 06:00 PM, Thomas Gleixner wrote: > > On Fri, 9 Dec 2016, Boris Ostrovsky wrote: > > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote: > > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote: > > > > > > > > Boris, can you please verify if that makes the > > > > topology_update_package_map() call which you placed into the Xen cpu > > > > starting code obsolete ? > > > > > > Will do. I did test your patch but without removing > > > topology_update_package_map() call. It complained about package IDs > > > being wrong, but that's expected until I fix Xen part. > > > > That should not longer be the case as I changed the approach to that > > management thing. > > > I didn't notice this email before I sent the earlier message. > > Is these anything else besides this patch that I should use? I applied it to > Linus tree and it didn't apply cleanly (there was some fuzz and such) so I > wonder whether I am missing something. No. I did it against tip, but there is nothing which it depends on. Thanks, tglx ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/09/2016 06:00 PM, Thomas Gleixner wrote: On Fri, 9 Dec 2016, Boris Ostrovsky wrote: On 12/09/2016 05:06 PM, Thomas Gleixner wrote: On Thu, 8 Dec 2016, Thomas Gleixner wrote: Boris, can you please verify if that makes the topology_update_package_map() call which you placed into the Xen cpu starting code obsolete ? Will do. I did test your patch but without removing topology_update_package_map() call. It complained about package IDs being wrong, but that's expected until I fix Xen part. That should not longer be the case as I changed the approach to that management thing. I didn't notice this email before I sent the earlier message. Is these anything else besides this patch that I should use? I applied it to Linus tree and it didn't apply cleanly (there was some fuzz and such) so I wonder whether I am missing something. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust
On 12/09/2016 06:02 PM, Boris Ostrovsky wrote: On 12/09/2016 05:06 PM, Thomas Gleixner wrote: On Thu, 8 Dec 2016, Thomas Gleixner wrote: Boris, can you please verify if that makes the topology_update_package_map() call which you placed into the Xen cpu starting code obsolete ? Will do. I did test your patch but without removing topology_update_package_map() call. It complained about package IDs being wrong, but that's expected until I fix Xen part. Ignore my statement about earlier testing --- it was all on single-node machines. Something is broken with multi-node on Intel, but failure modes are different. Prior to this patch build_sched_domain() reports an error and pretty soon we crash in scheduler (don't remember off the top of my head). With patch applied I crash mush later, when one of the drivers does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, for example). 2-node AMD box doesn't have these problems. I haven't upgraded the Intel machine for about a month but this all must have happened in 4.9 timeframe. So I can't answer your question since we clearly have other problems on Xen. I will be looking into this. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel