** Description changed:

  [Impact]
  Applications, particularly those in the HPC domain (e.g. openmpi), can be 
optimized for the processor and cache topology. However, the ARM CPU topology 
isn't correctly exposed to userspace.
  
  [Fix]
  The ACPI 6.2 specification introduced a Processor Properties Topology Table 
(PPTT). This is what ARM server vendors are using to expose their topology. The 
Linux kernel needs support for parsing this table and exposing the parsed 
topology to userspace.
  
  [Test Case]
  lscpu on a HiSilicon D06 without the fix. Note that it reports a 24-socket 
system with 4 cores per socket. That is wrong: this is a 2-socket system with 
48 cores per socket. An HPC app that optimized for this (bogus) topology would 
therefore suffer a performance penalty.
  
  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  4
  Socket(s):           24
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop
  
  With the fix (and this is correct):
  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  48
  Socket(s):           2
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  L1d cache:           64K
  L1i cache:           64K
  L2 cache:            512K
  L3 cache:            32768K
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop
  
  [Regression Risk]
  Here's a patch-by-patch risk analysis of the changeset:
  
  0001-ACPICA-ACPI-6.2-Additional-PPTT-flags.patch
  Just adds new #defines. No functional change.
  
  0002-drivers-base-cacheinfo-move-cache_setup_of_node.patch
  Moves code down in a file. No functional change. (I double-checked that the 
moved code is the same).
  
  0003-drivers-base-cacheinfo-setup-DT-cache-properties-ear.patch
- The code this touches is all #ifdef CONFIG_OF, which only applies to the ARM 
& Power ports of Ubuntu. I've tested on ARM.
+ The code this touches is all #ifdef CONFIG_OF, which only applies to the ARM 
& Power ports of Ubuntu. I've tested on arm64 and regression tested on ppc64el 
(POWER9). POWER booted fine, and there was no change to lscpu output.
  
  0004-cacheinfo-rename-of_node-to-fw_token.patch
  Renames a variable, and changes its type from "struct device_node *" to 
"void *". No functional change.
  
  0005-arm64-acpi-Create-arch-specific-cpu-to-acpi-id-helpe.patch
  Adds a new function (but doesn't use it yet).
  
  0006-ACPI-PPTT-Add-Processor-Properties-Topology-Table-pa.patch
  Adds new code that isn't called yet.
  
  0007-UBUNTU-Config-CONFIG_ACPI_PPTT-y.patch
  Turns on the new code (CONFIG_ACPI_PPTT=y).
  
  0008-ACPI-Enable-PPTT-support-on-ARM64.patch
  Kconfig/Makefile bits for the new code.
  
  0009-drivers-base-cacheinfo-Add-support-for-ACPI-based-fi.patch
  Finally, we call the new code during initialization, so let's go back and 
look at that code. The first thing it does is to check for the presence of a 
PPTT table before proceeding, so the regression risk to systems *without* a 
PPTT table is negligible. The PPTT table was introduced in the ACPI 6.2 
specification, which was released in May 2017. Regression risk should therefore 
be restricted to systems manufactured (or firmware updated) after that time 
that happened to include a PPTT table. I don't know of anyone other than ARM 
licensees doing this - but it's possible there are others. And, it's possible 
that there's a table out there that tickles a bug in the parsing code. Should 
that be the case, a hotfix would be to use an initrd to override/strip[*] the 
PPTT from the XSDT table until a suitable fix is put in place.
  
  0010-arm64-Add-support-for-ACPI-based-firmware-tables.patch
  0011-arm64-topology-rename-cluster_id.patch
  0012-arm64-topology-enable-ACPI-PPTT-based-CPU-topology.patch
  These patches only touch arm64 code. I explicitly tested on an arm64 server.
  
  0013-ACPI-Add-PPTT-to-injectable-table-list.patch
  Allows users to override the PPTT table. No change to default behavior.
  
+ 0014-arm64-topology-divorce-MC-scheduling-domain-from-cor.patch
+ Only touches arm64 code. I explicitly tested on an arm64 server.
+ 
  [*]
  https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt

** Description changed:

  [Impact]
- Applications, particularly those in the HPC domain (e.g. openmpi), can be 
optimized for the processor and cache topology. However, the ARM CPU topology 
isn't correctly exposed to userspace.
+ Applications, particularly those in the HPC domain (e.g. openmpi), can be 
optimized for the processor and cache topology. However, the ARM CPU topology 
isn't correctly exposed to userspace. The kernel's scheduler also uses this 
information to optimize task placement.
  
  [Fix]
  The ACPI 6.2 specification introduced a Processor Properties Topology Table 
(PPTT). This is what ARM server vendors are using to expose their topology. The 
Linux kernel needs support for parsing this table and exposing the parsed 
topology to userspace.
  
  [Test Case]
  lscpu on a HiSilicon D06 without the fix. Note that it reports a 24-socket 
system with 4 cores per socket. That is wrong: this is a 2-socket system with 
48 cores per socket. An HPC app that optimized for this (bogus) topology would 
therefore suffer a performance penalty.
  
  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  4
  Socket(s):           24
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop
  
  With the fix (and this is correct):
  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  48
  Socket(s):           2
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  L1d cache:           64K
  L1i cache:           64K
  L2 cache:            512K
  L3 cache:            32768K
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop
  
  [Regression Risk]
  Here's a patch-by-patch risk analysis of the changeset:
  
  0001-ACPICA-ACPI-6.2-Additional-PPTT-flags.patch
  Just adds new #defines. No functional change.
  
  0002-drivers-base-cacheinfo-move-cache_setup_of_node.patch
  Moves code down in a file. No functional change. (I double-checked that the 
moved code is the same).
  
  0003-drivers-base-cacheinfo-setup-DT-cache-properties-ear.patch
  The code this touches is all #ifdef CONFIG_OF, which only applies to the ARM 
& Power ports of Ubuntu. I've tested on arm64 and regression tested on ppc64el 
(POWER9). POWER booted fine, and there was no change to lscpu output.
  
  0004-cacheinfo-rename-of_node-to-fw_token.patch
  Renames a variable, and changes its type from "struct device_node *" to 
"void *". No functional change.
  
  0005-arm64-acpi-Create-arch-specific-cpu-to-acpi-id-helpe.patch
  Adds a new function (but doesn't use it yet).
  
  0006-ACPI-PPTT-Add-Processor-Properties-Topology-Table-pa.patch
  Adds new code that isn't called yet.
  
  0007-UBUNTU-Config-CONFIG_ACPI_PPTT-y.patch
  Turns on the new code (CONFIG_ACPI_PPTT=y).
  
  0008-ACPI-Enable-PPTT-support-on-ARM64.patch
  Kconfig/Makefile bits for the new code.
  
  0009-drivers-base-cacheinfo-Add-support-for-ACPI-based-fi.patch
  Finally, we call the new code during initialization, so let's go back and 
look at that code. The first thing it does is to check for the presence of a 
PPTT table before proceeding, so the regression risk to systems *without* a 
PPTT table is negligible. The PPTT table was introduced in the ACPI 6.2 
specification, which was released in May 2017. Regression risk should therefore 
be restricted to systems manufactured (or firmware updated) after that time 
that happened to include a PPTT table. I don't know of anyone other than ARM 
licensees doing this - but it's possible there are others. And, it's possible 
that there's a table out there that tickles a bug in the parsing code. Should 
that be the case, a hotfix would be to use an initrd to override/strip[*] the 
PPTT from the XSDT table until a suitable fix is put in place.
  
  0010-arm64-Add-support-for-ACPI-based-firmware-tables.patch
  0011-arm64-topology-rename-cluster_id.patch
  0012-arm64-topology-enable-ACPI-PPTT-based-CPU-topology.patch
  These patches only touch arm64 code. I explicitly tested on an arm64 server.
  
  0013-ACPI-Add-PPTT-to-injectable-table-list.patch
  Allows users to override the PPTT table. No change to default behavior.
  
  0014-arm64-topology-divorce-MC-scheduling-domain-from-cor.patch
  Only touches arm64 code. I explicitly tested on an arm64 server.
  
  [*]
  https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1770231

Title:
  Expose arm64 CPU topology to userspace

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  [Impact]
  Applications, particularly those in the HPC domain (e.g. openmpi), can be 
optimized for the processor and cache topology. However, the ARM CPU topology 
isn't correctly exposed to userspace. The kernel's scheduler also uses this 
information to optimize task placement.

  [Fix]
  The ACPI 6.2 specification introduced a Processor Properties Topology Table 
(PPTT). This is what ARM server vendors are using to expose their topology. The 
Linux kernel needs support for parsing this table and exposing the parsed 
topology to userspace.
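
To make "exposing the parsed topology to userspace" concrete, here is a minimal sketch that counts cores per package from the per-CPU sysfs topology files (physical_package_id and core_id). The function name cores_per_package is made up for this illustration, and the sketch assumes core_id values are unique within a package, which may not hold on every platform.

```python
import re
from collections import defaultdict
from pathlib import Path


def cores_per_package(cpu_root="/sys/devices/system/cpu"):
    """Return {package_id: count of distinct core_ids} read from sysfs.

    Hypothetical helper for illustration; assumes core_id is unique
    within a package.
    """
    root = Path(cpu_root)
    if not root.is_dir():
        return {}
    cores = defaultdict(set)
    for cpu_dir in root.iterdir():
        # Only cpu0, cpu1, ...; skip cpufreq, cpuidle and friends.
        if not re.fullmatch(r"cpu\d+", cpu_dir.name):
            continue
        topo = cpu_dir / "topology"
        try:
            package = int((topo / "physical_package_id").read_text())
            core = int((topo / "core_id").read_text())
        except (OSError, ValueError):
            continue  # offline CPU, or topology files not present
        cores[package].add(core)
    return {pkg: len(ids) for pkg, ids in sorted(cores.items())}
```

If those assumptions hold, the number of keys corresponds to the socket count and each value to the cores in that socket, which is the same information lscpu summarizes.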

  [Test Case]
  lscpu on a HiSilicon D06 without the fix. Note that it reports a 24-socket 
system with 4 cores per socket. That is wrong: this is a 2-socket system with 
48 cores per socket. An HPC app that optimized for this (bogus) topology would 
therefore suffer a performance penalty.

  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  4
  Socket(s):           24
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop

  With the fix (and this is correct):
  ubuntu@d06-1:~$ lscpu
  Architecture:        aarch64
  Byte Order:          Little Endian
  CPU(s):              96
  On-line CPU(s) list: 0-95
  Thread(s) per core:  1
  Core(s) per socket:  48
  Socket(s):           2
  NUMA node(s):        4
  Vendor ID:           0x48
  Model:               0
  Stepping:            0x0
  BogoMIPS:            200.00
  L1d cache:           64K
  L1i cache:           64K
  L2 cache:            512K
  L3 cache:            32768K
  NUMA node0 CPU(s):   0-23
  NUMA node1 CPU(s):   24-47
  NUMA node2 CPU(s):   48-71
  NUMA node3 CPU(s):   72-95
  Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm dcpop
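
The fixed output also gains cache lines (L1d, L1i, L2, L3) that the unfixed output lacks. As a rough way to look at the same data without lscpu, the sketch below lists the cache entries sysfs exposes for one CPU. The helper name list_caches is hypothetical, and the sketch assumes the usual cache/indexN layout with level, type and size files.

```python
from pathlib import Path


def _read(path):
    """Return the stripped file contents, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_caches(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return a sorted list of (level, type, size) tuples for one CPU.

    Hypothetical helper; returns an empty list when the kernel exposes
    no cache information for this CPU.
    """
    cache_root = Path(cpu_dir) / "cache"
    if not cache_root.is_dir():
        return []
    caches = []
    for index in cache_root.glob("index[0-9]*"):
        level = _read(index / "level")
        if level is None or not level.isdigit():
            continue  # incomplete entry, skip it
        caches.append((int(level), _read(index / "type") or "", _read(index / "size") or ""))
    return sorted(caches)
```

An empty result would be consistent with the unfixed output above, where lscpu prints no cache lines at all, though that correspondence is an assumption rather than something verified here.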

  [Regression Risk]
  Here's a patch-by-patch risk analysis of the changeset:

  0001-ACPICA-ACPI-6.2-Additional-PPTT-flags.patch
  Just adds new #defines. No functional change.

  0002-drivers-base-cacheinfo-move-cache_setup_of_node.patch
  Moves code down in a file. No functional change. (I double-checked that the 
moved code is the same).

  0003-drivers-base-cacheinfo-setup-DT-cache-properties-ear.patch
  The code this touches is all #ifdef CONFIG_OF, which only applies to the ARM 
& Power ports of Ubuntu. I've tested on arm64 and regression tested on ppc64el 
(POWER9). POWER booted fine, and there was no change to lscpu output.

  0004-cacheinfo-rename-of_node-to-fw_token.patch
  Renames a variable, and changes its type from "struct device_node *" to 
"void *". No functional change.

  0005-arm64-acpi-Create-arch-specific-cpu-to-acpi-id-helpe.patch
  Adds a new function (but doesn't use it yet).

  0006-ACPI-PPTT-Add-Processor-Properties-Topology-Table-pa.patch
  Adds new code that isn't called yet.

  0007-UBUNTU-Config-CONFIG_ACPI_PPTT-y.patch
  Turns on the new code (CONFIG_ACPI_PPTT=y).

  0008-ACPI-Enable-PPTT-support-on-ARM64.patch
  Kconfig/Makefile bits for the new code.

  0009-drivers-base-cacheinfo-Add-support-for-ACPI-based-fi.patch
  Finally, we call the new code during initialization, so let's go back and 
look at that code. The first thing it does is to check for the presence of a 
PPTT table before proceeding, so the regression risk to systems *without* a 
PPTT table is negligible. The PPTT table was introduced in the ACPI 6.2 
specification, which was released in May 2017. Regression risk should therefore 
be restricted to systems manufactured (or firmware updated) after that time 
that happened to include a PPTT table. I don't know of anyone other than ARM 
licensees doing this - but it's possible there are others. And, it's possible 
that there's a table out there that tickles a bug in the parsing code. Should 
that be the case, a hotfix would be to use an initrd to override/strip[*] the 
PPTT from the XSDT table until a suitable fix is put in place.
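
Since the regression risk hinges on whether firmware ships a PPTT at all, a quick userspace check may help when triaging a report. The sketch below is not the kernel's parser: it only looks for a PPTT under /sys/firmware/acpi/tables, decodes the standard 36-byte ACPI table header, and verifies the checksum. The function names are hypothetical, and reading that directory typically requires root.

```python
import struct
from pathlib import Path

# Common ACPI table header: signature, length, revision, checksum,
# OEM ID, OEM table ID, OEM revision, creator ID, creator revision.
_HEADER = struct.Struct("<4sIBB6s8sI4sI")  # 36 bytes, little-endian


def parse_acpi_header(blob):
    """Decode the common header of an ACPI table and check its checksum."""
    if len(blob) < _HEADER.size:
        raise ValueError("data shorter than an ACPI table header")
    signature, length, revision, _csum, oem_id, oem_table_id, *_rest = (
        _HEADER.unpack_from(blob)
    )
    if length < _HEADER.size or length > len(blob):
        raise ValueError("length field inconsistent with available data")
    return {
        "signature": signature.decode("ascii", "replace"),
        "length": length,
        "revision": revision,
        "oem_id": oem_id.decode("ascii", "replace").strip(),
        "oem_table_id": oem_table_id.decode("ascii", "replace").strip(),
        # All bytes of a valid table sum to zero modulo 256.
        "checksum_ok": sum(blob[:length]) % 256 == 0,
    }


def read_pptt(tables_dir="/sys/firmware/acpi/tables"):
    """Return the parsed PPTT header, or None if no PPTT is exposed."""
    try:
        return parse_acpi_header((Path(tables_dir) / "PPTT").read_bytes())
    except FileNotFoundError:
        return None
```

A None result suggests the system falls into the "without a PPTT" group described above; a header with a failing checksum would be worth attaching to a bug report.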

  0010-arm64-Add-support-for-ACPI-based-firmware-tables.patch
  0011-arm64-topology-rename-cluster_id.patch
  0012-arm64-topology-enable-ACPI-PPTT-based-CPU-topology.patch
  These patches only touch arm64 code. I explicitly tested on an arm64 server.

  0013-ACPI-Add-PPTT-to-injectable-table-list.patch
  Allows users to override the PPTT table. No change to default behavior.

  0014-arm64-topology-divorce-MC-scheduling-domain-from-cor.patch
  Only touches arm64 code. I explicitly tested on an arm64 server.

  [*]
  https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt
