After the hotplug rework, Charles Williams reported that his
VMware-virtualized system no longer boots and crashes in
rapl_cpu_online(). As it turns out, topology_max_packages() reports four
while topology_logical_package_id() returns 65535 for CPUs two and three.
That means cpu_to_rapl_pmu() for those CPUs indexes past the end of the
allocated rapl_pmus->pmus[] array. "M. Vefa Bicakci" reported the same
problem on Xen. This patch makes init_rapl_pmus() error out in such an
invalid situation.

Reported-by: "Charles (Chas) Williams" <ciwil...@brocade.com>
Tested-by: "M. Vefa Bicakci" <m....@runbox.com>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
I am not sure whether this is a race with the new hotplug code or
something that was always there. Both M. Vefa Bicakci and Charles say
that the box sometimes boots fine without the patch.
smp_store_boot_cpu_info() should have run before the notifier and thus
should have set the info properly. However, I got the following boot log
from Charles with this patch:

[ 0.017110] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.017111] smpboot: APIC(1) Converting physical 1 to logical package 1
[ 0.017113] smpboot: Max logical packages: 2
…
[ 1.995494] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535
[ 1.995647] rapl pmu error: max package: 2 but CPU1 belongs to 65535

So it seems that the information got overwritten. I am not sure how to
proceed here. That memory corruption should be found and fixed, and a
boot crash might motivate someone to do so… I can't reproduce this on
bare metal.

Thread starts at d40f8e3c-b332-c331-38b9-11eb4f4aa...@brocade.com

 arch/x86/events/intel/rapl.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
 	int maxpkg = topology_max_packages();
 	size_t size;
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (topology_logical_package_id(cpu) >= maxpkg) {
+			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+			       maxpkg, cpu, topology_logical_package_id(cpu));
+			return -EINVAL;
+		}
+	}
 
 	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
 	rapl_pmus = kzalloc(size, GFP_KERNEL);
-- 
2.10.2
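
For readers who want to try the bounds check outside the kernel, here is a
minimal user-space sketch of the same idea. It is not kernel code: struct
fake_topology, check_package_ids() and INVALID_PKG are hypothetical names
invented for this illustration, and the package IDs are supplied as a plain
array instead of coming from topology_logical_package_id().

```c
#include <stddef.h>
#include <stdio.h>

/* Value seen in the report for a CPU whose package ID was not set. */
#define INVALID_PKG 65535u

/* Hypothetical stand-in for the per-CPU topology data; not a kernel API. */
struct fake_topology {
	unsigned int max_packages;	/* models topology_max_packages() */
	size_t nr_cpus;			/* number of possible CPUs */
	const unsigned int *pkg_id;	/* models topology_logical_package_id(cpu) */
};

/*
 * Return 0 if every CPU's package ID is a valid index into an array of
 * max_packages entries, or -1 on the first CPU whose ID is out of range.
 */
static int check_package_ids(const struct fake_topology *t)
{
	for (size_t cpu = 0; cpu < t->nr_cpus; cpu++) {
		if (t->pkg_id[cpu] >= t->max_packages) {
			fprintf(stderr,
				"max package: %u but CPU%zu belongs to %u\n",
				t->max_packages, cpu, t->pkg_id[cpu]);
			return -1;
		}
	}
	return 0;
}
```

With four packages and IDs {0, 1, INVALID_PKG, INVALID_PKG}, the check fails
at CPU 2, which corresponds to the reported configuration; with IDs
{0, 1, 2, 3} it passes.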