Re: [PATCH v4 00/10] Introduce SMT level and add PowerPC support

2023-08-10 Thread Laurent Dufour

On 10/08/2023 at 08:23, Michael Ellerman wrote:

Thomas Gleixner  writes:

Laurent, Michael!

On Wed, Jul 05 2023 at 16:51, Laurent Dufour wrote:

I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.


Thanks for getting this into shape.

I've merged it into:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/core

and tagged it at patch 7 for consumption into the powerpc tree, so the
powerpc specific changes can be applied there on top:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp-core-for-ppc-23-07-28


Thanks. I've merged this and applied the powerpc patches on top.

I've left it sitting in my topic/cpu-smt branch for the build bots to
chew on:

   
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/cpu-smt

I'll plan to merge it into my next in the next day or two.


Thanks Michael!


Re: [PATCH v4 00/10] Introduce SMT level and add PowerPC support

2023-07-31 Thread Laurent Dufour




On 28/07/2023 at 09:58, Thomas Gleixner wrote:

Laurent, Michael!

On Wed, Jul 05 2023 at 16:51, Laurent Dufour wrote:

I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.


Thanks for getting this into shape.

I've merged it into:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/core

and tagged it at patch 7 for consumption into the powerpc tree, so the
powerpc specific changes can be applied there on top:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp-core-for-ppc-23-07-28


Thanks Thomas!


[PATCH] powerpc/kexec: fix minor typo

2023-07-25 Thread Laurent Dufour
The function name in the kernel-doc description was not correct.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202307251721.bugcsceq-...@intel.com/
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kexec/file_load_64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 110d28bede2a..73e492d18804 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -933,9 +933,9 @@ int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
 }
 
 /**
- * get_cpu_node_size - Compute the size of a CPU node in the FDT.
- * This should be done only once and the value is stored in
- * a static variable.
+ * cpu_node_size - Compute the size of a CPU node in the FDT.
+ * This should be done only once and the value is stored in
+ * a static variable.
  * Returns the max size of a CPU node in the FDT.
  */
 static unsigned int cpu_node_size(void)
-- 
2.41.0



Re: [PATCH v4 00/10] Introduce SMT level and add PowerPC support

2023-07-10 Thread Laurent Dufour




On 09/07/2023 at 17:25, Zhang, Rui wrote:

Hi, Laurent,

I ran into a boot hang regression with latest upstream code, and it
took me a while to bisect the offending commit and workaround it.

Now I have tested this patch series on an Intel RaptorLake Hybrid
platform (4 Pcores with HT and 4 Ecores without HT), and it works as
expected.

So, for patch 1~7 in this series,

Tested-by: Zhang Rui 


Thanks Rui!


thanks,
rui

On Wed, 2023-07-05 at 16:51 +0200, Laurent Dufour wrote:

I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel is activating all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4
threads active (system has been booted with the 'smt-enabled=4' kernel
option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4 5 6 7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding new CPUs or
when optimizing the energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series is adding a new SMT
level (cpu_smt_num_threads) and a few callbacks allowing the architecture
code to finely control this value, setting a max and an "at boot" level,
and controlling whether a thread should be onlined or not.

v4:
   Rebase on top of 6.5's updates
   Remove a dependency on the x86 symbol cpu_primary_thread_mask
v3:
   Fix a build error in patch 6/9
v2:
   As Thomas suggested,
     Reword some commit descriptions
     Remove topology_smt_supported()
     Remove topology_smt_threads_supported()
     Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC
     Remove switch() in __store_smt_control()
   Update kernel-parameters.txt

[1]
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2]
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/


Laurent Dufour (2):
   cpu/hotplug: remove dependency against cpu_primary_thread_mask
   cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (8):
   cpu/SMT: Move SMT prototypes into cpu_smt.h
   cpu/SMT: Move smt/control simple exit cases earlier
   cpu/SMT: Store the current/max number of threads
   cpu/SMT: Create topology_smt_thread_allowed()
   cpu/SMT: Allow enabling partial SMT states via sysfs
   powerpc/pseries: Initialise CPU hotplug callbacks earlier
   powerpc: Add HOTPLUG_SMT support
   powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

  .../ABI/testing/sysfs-devices-system-cpu  |   1 +
  .../admin-guide/kernel-parameters.txt |   4 +-
  arch/Kconfig  |   3 +
  arch/powerpc/Kconfig  |   2 +
  arch/powerpc/include/asm/topology.h   |  15 ++
  arch/powerpc/kernel/smp.c |   8 +-
  arch/powerpc/platforms/pseries/hotplug-cpu.c  |  30 ++--
  arch/powerpc/platforms/pseries/pseries.h  |   2 +
  arch/powerpc/platforms/pseries/setup.c    |   2 +
  arch/x86/include/asm/topology.h   |   4 +-
  arch/x86/kernel/cpu/common.c  |   2 +-
  arch/x86/kernel/smpboot.c |   8 -
  include/linux/cpu.h   |  25 +--
  include/linux/cpu_smt.h   |  33 
  kernel/cpu.c  | 142 +-----
  15 files changed, 196 insertions(+), 85 deletions(-)
  create mode 100644 include/linux/cpu_smt.h





[PATCH v4 02/10] cpu/SMT: Move SMT prototypes into cpu_smt.h

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

In order to export the cpuhp_smt_control enum as part of the interface
between generic and architecture code, the architecture code needs to
include asm/topology.h.

But that leads to circular header dependencies. So split the enum and
related declarations into a separate header.

Signed-off-by: Michael Ellerman 
[ldufour: rewording the commit's description]
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h |  2 ++
 include/linux/cpu.h | 25 +
 include/linux/cpu_smt.h | 29 +
 kernel/cpu.c|  1 +
 4 files changed, 33 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index caf41c4869a0..ae49ed4417d0 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,6 +136,8 @@ static inline int topology_max_smt_threads(void)
return __max_smt_threads;
 }
 
+#include <linux/cpu_smt.h>
+
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 6e6e57ec69e8..6b326a9e8191 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/cpu_smt.h>
 
 struct device;
 struct device_node;
@@ -204,30 +205,6 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
-enum cpuhp_smt_control {
-   CPU_SMT_ENABLED,
-   CPU_SMT_DISABLED,
-   CPU_SMT_FORCE_DISABLED,
-   CPU_SMT_NOT_SUPPORTED,
-   CPU_SMT_NOT_IMPLEMENTED,
-};
-
-#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
-extern enum cpuhp_smt_control cpu_smt_control;
-extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
-extern bool cpu_smt_possible(void);
-extern int cpuhp_smt_enable(void);
-extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
-#else
-# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
-static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
-static inline bool cpu_smt_possible(void) { return false; }
-static inline int cpuhp_smt_enable(void) { return 0; }
-static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
-#endif
-
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
new file mode 100644
index ..722c2e306fef
--- /dev/null
+++ b/include/linux/cpu_smt.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CPU_SMT_H_
+#define _LINUX_CPU_SMT_H_
+
+enum cpuhp_smt_control {
+   CPU_SMT_ENABLED,
+   CPU_SMT_DISABLED,
+   CPU_SMT_FORCE_DISABLED,
+   CPU_SMT_NOT_SUPPORTED,
+   CPU_SMT_NOT_IMPLEMENTED,
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
+extern enum cpuhp_smt_control cpu_smt_control;
+extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
+extern int cpuhp_smt_enable(void);
+extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
+#else
+# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
+static inline int cpuhp_smt_enable(void) { return 0; }
+static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
+#endif
+
+#endif /* _LINUX_CPU_SMT_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 03309f2f35a4..e02204c4675a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -592,6 +592,7 @@ static void lockdep_release_cpus_lock(void)
 void __weak arch_smt_update(void) { }
 
 #ifdef CONFIG_HOTPLUG_SMT
+
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
-- 
2.41.0
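
For context, a contrived standalone sketch (not kernel code; all names here
are invented for illustration) of the cycle this split avoids: the enum
lives in a leaf "header" with no dependencies, so the topology "header" can
use it without pulling in the big cpu header, which would otherwise include
topology back and close the loop.

/* circular_fix_demo.c -- compile with: cc circular_fix_demo.c */

/* --- leaf header (linux/cpu_smt.h analogue): defines only the enum --- */
enum demo_smt_control { DEMO_SMT_ENABLED, DEMO_SMT_DISABLED };

/* --- arch header (asm/topology.h analogue): uses the enum without
 * needing the generic cpu header at all --- */
static inline enum demo_smt_control demo_arch_default(void)
{
	return DEMO_SMT_ENABLED;
}

int main(void)
{
	return demo_arch_default() == DEMO_SMT_ENABLED ? 0 : 1;
}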



[PATCH v4 10/10] powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Integrate with the generic SMT support, so that when a CPU is DLPAR
onlined it is brought up with the correct SMT mode.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 61fb7cb00880..e62835a12d73 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -398,6 +398,14 @@ static int dlpar_online_cpu(struct device_node *dn)
for_each_present_cpu(cpu) {
if (get_hard_smp_processor_id(cpu) != thread)
continue;
+
+   if (!topology_is_primary_thread(cpu)) {
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   break;
+   if (!topology_smt_thread_allowed(cpu))
+   break;
+   }
+
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
-- 
2.41.0



[PATCH v4 09/10] powerpc: Add HOTPLUG_SMT support

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allow
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support that can be tweaked in future.

Signed-off-by: Michael Ellerman 
[ldufour: pass current SMT level to cpu_smt_set_num_threads]
[ldufour: remove topology_smt_supported]
[ldufour: remove topology_smt_threads_supported]
[ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC]
[ldufour: update kernel-parameters.txt]
Signed-off-by: Laurent Dufour 
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++--
 arch/powerpc/Kconfig|  2 ++
 arch/powerpc/include/asm/topology.h | 15 +++
 arch/powerpc/kernel/smp.c   |  8 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44bcaf791ce6..979f9bad59da 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3853,10 +3853,10 @@
nosmp   [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC.  legacy for "maxcpus=0".
 
-   nosmt   [KNL,MIPS,S390] Disable symmetric multithreading (SMT).
+   nosmt   [KNL,MIPS,S390,PPC] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
-   [KNL,X86] Disable symmetric multithreading (SMT).
+   [KNL,X86,PPC] Disable symmetric multithreading (SMT).
nosmt=force: Force disable SMT, cannot be undone
 via the sysfs control file.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 0b1172cbeccb..aef38d2ca542 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -272,6 +272,8 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_SMT  if HOTPLUG_CPU
+   select SMT_NUM_THREADS_DYNAMIC
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..f4e6f2dd04b7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -143,5 +143,20 @@ static inline int cpu_to_coregroup_id(int cpu)
 #endif
 #endif
 
+#ifdef CONFIG_HOTPLUG_SMT
+#include <linux/cpu_smt.h>
+#include <asm/cputhreads.h>
+
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpu == cpu_first_thread_sibling(cpu);
+}
+
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index fbbb695bae3d..b9f0f8f11c37 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1087,7 +1087,7 @@ static int __init init_big_cores(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-   unsigned int cpu;
+   unsigned int cpu, num_threads;
 
DBG("smp_prepare_cpus\n");
 
@@ -1154,6 +1154,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
if (smp_ops && smp_ops->probe)
smp_ops->probe();
+
+   // Initialise the generic SMT topology support
+   num_threads = 1;
+   if (smt_enabled_at_boot)
+   num_threads = smt_enabled_at_boot;
+   cpu_smt_set_num_threads(num_threads, threads_per_core);
 }
 
 void smp_prepare_boot_cpu(void)
-- 
2.41.0
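
As an aside, a standalone sketch of how the two helpers above carve up a
core. The SMT8 layout with consecutively numbered threads is an assumption
of the example; in the kernel, cpu_first_thread_sibling() and
cpu_thread_in_core() come from asm/cputhreads.h.

#include <stdio.h>
#include <stdbool.h>

/* Illustration only: stand-ins for the powerpc cputhreads helpers,
 * assuming 8 consecutively numbered threads per core. */
#define THREADS_PER_CORE 8

static unsigned int cpu_first_thread_sibling(unsigned int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1);
}

static unsigned int cpu_thread_in_core(unsigned int cpu)
{
	return cpu & (THREADS_PER_CORE - 1);
}

static unsigned int cpu_smt_num_threads = 4;	/* e.g. booted smt-enabled=4 */

static bool topology_is_primary_thread(unsigned int cpu)
{
	return cpu == cpu_first_thread_sibling(cpu);
}

static bool topology_smt_thread_allowed(unsigned int cpu)
{
	return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
}

int main(void)
{
	/* CPUs 0-7 are core 0, CPUs 8-15 are core 1 in this layout */
	for (unsigned int cpu = 0; cpu < 16; cpu++)
		printf("CPU %2u: primary=%d allowed=%d\n", cpu,
		       topology_is_primary_thread(cpu),
		       topology_smt_thread_allowed(cpu));
	return 0;
}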



[PATCH v4 07/10] cpu/SMT: Allow enabling partial SMT states via sysfs

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Add support to the /sys/devices/system/cpu/smt/control interface for
enabling a specified number of SMT threads per core, including partial
SMT states where not all threads are brought online.

The current interface accepts "on" and "off", to enable either 1 or all
SMT threads per core.

This commit allows writing an integer, between 1 and the number of SMT
threads supported by the machine. Writing 1 is a synonym for "off", 2 or
more enables SMT with the specified number of threads.

When reading the file, if all threads are online "on" is returned, to
avoid changing behaviour for existing users. If some other number of
threads is online then the integer value is returned.

Architectures, like x86, supporting only 1 thread or all threads should not
define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architectures supporting partial SMT
states, like PowerPC, should define it.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit's description]
[ldufour: remove switch() in __store_smt_control()]
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202306282340.ihqm0fla-...@intel.com/
[ldufour: fix build issue in control_show()]
Signed-off-by: Laurent Dufour 
---
 .../ABI/testing/sysfs-devices-system-cpu  |  1 +
 kernel/cpu.c  | 60 ++-
 2 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..6dba65fb1956 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -555,6 +555,7 @@ Description:	Control Symmetric Multi Threading (SMT)
			 ================ =========================================
			 "on"		  SMT is enabled
			 "off"		  SMT is disabled
+			 "<N>"		  SMT is enabled with N threads per core.
			 "forceoff"	  SMT is force disabled. Cannot be changed.
			 "notsupported"	  SMT is not supported by the CPU
			 "notimplemented" SMT runtime toggling is not
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9a8d0685e055..7e8f1b044772 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2876,11 +2876,19 @@ static const struct attribute_group cpuhp_cpu_root_attr_group = {
 
 #ifdef CONFIG_HOTPLUG_SMT
 
+static bool cpu_smt_num_threads_valid(unsigned int threads)
+{
+   if (IS_ENABLED(CONFIG_SMT_NUM_THREADS_DYNAMIC))
+   return threads >= 1 && threads <= cpu_smt_max_threads;
+   return threads == 1 || threads == cpu_smt_max_threads;
+}
+
 static ssize_t
 __store_smt_control(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
 {
-   int ctrlval, ret;
+   int ctrlval, ret, num_threads, orig_threads;
+   bool force_off;
 
if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
return -EPERM;
@@ -2888,30 +2896,39 @@ __store_smt_control(struct device *dev, struct device_attribute *attr,
if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
return -ENODEV;
 
-   if (sysfs_streq(buf, "on"))
+   if (sysfs_streq(buf, "on")) {
ctrlval = CPU_SMT_ENABLED;
-   else if (sysfs_streq(buf, "off"))
+   num_threads = cpu_smt_max_threads;
+   } else if (sysfs_streq(buf, "off")) {
ctrlval = CPU_SMT_DISABLED;
-   else if (sysfs_streq(buf, "forceoff"))
+   num_threads = 1;
+   } else if (sysfs_streq(buf, "forceoff")) {
ctrlval = CPU_SMT_FORCE_DISABLED;
-   else
+   num_threads = 1;
+   } else if (kstrtoint(buf, 10, &num_threads) == 0) {
+   if (num_threads == 1)
+   ctrlval = CPU_SMT_DISABLED;
+   else if (cpu_smt_num_threads_valid(num_threads))
+   ctrlval = CPU_SMT_ENABLED;
+   else
+   return -EINVAL;
+   } else {
return -EINVAL;
+   }
 
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
 
-   if (ctrlval != cpu_smt_control) {
-   switch (ctrlval) {
-   case CPU_SMT_ENABLED:
-   ret = cpuhp_smt_enable();
-   break;
-   case CPU_SMT_DISABLED:
-   case CPU_SMT_FORCE_DISABLED:
-   ret = cpuhp_smt_disable(ctrlval);
-   break;
-   }
-   }
+   orig_threads = cpu_smt_num_threads;
+   cpu_smt_num_threads = num_threads;
+
+   force_off = ctrlval != cpu_smt_control && ctrlval == CPU_SMT_FORCE_DISABLED;
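
A standalone sketch of the cpu_smt_num_threads_valid() policy above,
contrasting the dynamic case (e.g. powerpc) with the on/off-only case
(e.g. x86); the 8-thread maximum is an assumption of the example.

#include <stdio.h>
#include <stdbool.h>

/* Sketch of cpu_smt_num_threads_valid() for an assumed 8-thread maximum. */
#define CPU_SMT_MAX_THREADS 8

static bool valid_dynamic(unsigned int threads)	    /* SMT_NUM_THREADS_DYNAMIC=y */
{
	return threads >= 1 && threads <= CPU_SMT_MAX_THREADS;
}

static bool valid_on_off_only(unsigned int threads) /* SMT_NUM_THREADS_DYNAMIC=n */
{
	return threads == 1 || threads == CPU_SMT_MAX_THREADS;
}

int main(void)
{
	for (unsigned int t = 1; t <= CPU_SMT_MAX_THREADS; t++)
		printf("write \"%u\": dynamic=%s, on/off-only=%s\n", t,
		       valid_dynamic(t) ? "ok" : "-EINVAL",
		       valid_on_off_only(t) ? "ok" : "-EINVAL");
	return 0;
}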

[PATCH v4 06/10] cpu/SMT: Create topology_smt_thread_allowed()

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states, i.e. when not all SMT
threads are brought online.

To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architectures supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting partial SMT states, there is no need to define
topology_smt_thread_allowed(); the generic code assumes that all the threads
are allowed, or only the primary ones.

Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.

Signed-off-by: Michael Ellerman 
Suggested-by: Thomas Gleixner 
[ldufour: slightly reword the commit's description]
[ldufour: introduce CONFIG_SMT_NUM_THREADS_DYNAMIC]
Signed-off-by: Laurent Dufour 
---
 arch/Kconfig |  3 +++
 kernel/cpu.c | 24 +++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index aff2746c8af2..63c5d6a2022b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -34,6 +34,9 @@ config ARCH_HAS_SUBPAGE_FAULTS
 config HOTPLUG_SMT
bool
 
+config SMT_NUM_THREADS_DYNAMIC
+   bool
+
 # Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
 config HOTPLUG_CORE_SYNC
bool
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 70add058e77b..9a8d0685e055 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -645,9 +645,23 @@ static int __init smt_cmdline_disable(char *str)
 }
 early_param("nosmt", smt_cmdline_disable);
 
+/*
+ * For architectures supporting partial SMT states check if the thread is allowed.
+ * Otherwise this has already been checked through cpu_smt_max_threads when
+ * setting the SMT level.
+ */
+static inline bool cpu_smt_thread_allowed(unsigned int cpu)
+{
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+   return topology_smt_thread_allowed(cpu);
+#else
+   return true;
+#endif
+}
+
 static inline bool cpu_smt_allowed(unsigned int cpu)
 {
-   if (cpu_smt_control == CPU_SMT_ENABLED)
+   if (cpu_smt_control == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
return true;
 
if (topology_is_primary_thread(cpu))
@@ -2642,6 +2656,12 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
for_each_online_cpu(cpu) {
if (topology_is_primary_thread(cpu))
continue;
+   /*
+* Disable can be called with CPU_SMT_ENABLED when changing
+* from a higher to lower number of SMT threads per core.
+*/
+   if (ctrlval == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
+   continue;
ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
if (ret)
break;
@@ -2676,6 +2696,8 @@ int cpuhp_smt_enable(void)
/* Skip online CPUs and CPUs on offline nodes */
if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
continue;
+   if (!cpu_smt_thread_allowed(cpu))
+   continue;
ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
if (ret)
break;
-- 
2.41.0
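
A standalone sketch of the cpuhp_smt_disable() walk above when moving from
8 to 4 threads per core with ctrlval == CPU_SMT_ENABLED (the SMT8 core with
consecutive CPU ids is an assumption of the example): primaries and
still-allowed secondaries are skipped, only the now-disallowed threads
would be taken down.

#include <stdio.h>
#include <stdbool.h>

#define THREADS_PER_CORE 8			/* assumed layout */
static unsigned int cpu_smt_num_threads = 4;	/* new, lower target level */

static bool topology_is_primary_thread(unsigned int cpu)
{
	return cpu % THREADS_PER_CORE == 0;
}

static bool cpu_smt_thread_allowed(unsigned int cpu)
{
	return cpu % THREADS_PER_CORE < cpu_smt_num_threads;
}

int main(void)
{
	for (unsigned int cpu = 0; cpu < THREADS_PER_CORE; cpu++) {
		if (topology_is_primary_thread(cpu))
			continue;		/* never taken offline here */
		if (cpu_smt_thread_allowed(cpu))
			continue;		/* stays online at the new level */
		printf("cpu_down(%u)\n", cpu);	/* CPUs 4-7 in this example */
	}
	return 0;
}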



[PATCH v4 08/10] powerpc/pseries: Initialise CPU hotplug callbacks earlier

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.

The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").

For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.

So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 22 
 arch/powerpc/platforms/pseries/pseries.h |  2 ++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1a3cb313976a..61fb7cb00880 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -845,15 +845,9 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
 };
 
-static int __init pseries_cpu_hotplug_init(void)
+void __init pseries_cpu_hotplug_init(void)
 {
int qcss_tok;
-   unsigned int node;
-
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
-   ppc_md.cpu_probe = dlpar_cpu_probe;
-   ppc_md.cpu_release = dlpar_cpu_release;
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
@@ -862,12 +856,22 @@ static int __init pseries_cpu_hotplug_init(void)
qcss_tok == RTAS_UNKNOWN_SERVICE) {
printk(KERN_INFO "CPU Hotplug not supported by firmware "
"- disabling.\n");
-   return 0;
+   return;
}
 
smp_ops->cpu_offline_self = pseries_cpu_offline_self;
smp_ops->cpu_disable = pseries_cpu_disable;
smp_ops->cpu_die = pseries_cpu_die;
+}
+
+static int __init pseries_dlpar_init(void)
+{
+   unsigned int node;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+   ppc_md.cpu_probe = dlpar_cpu_probe;
+   ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
/* Processors can be added/removed only on LPAR */
if (firmware_has_feature(FW_FEATURE_LPAR)) {
@@ -886,4 +890,4 @@ static int __init pseries_cpu_hotplug_init(void)
 
return 0;
 }
-machine_arch_initcall(pseries, pseries_cpu_hotplug_init);
+machine_arch_initcall(pseries, pseries_dlpar_init);
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index f8bce40ebd0c..f8893ba46e83 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -75,11 +75,13 @@ static inline int dlpar_hp_pmem(struct pseries_hp_errorlog *hp_elog)
 
 #ifdef CONFIG_HOTPLUG_CPU
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
+void pseries_cpu_hotplug_init(void);
 #else
 static inline int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
return -EOPNOTSUPP;
 }
+static inline void pseries_cpu_hotplug_init(void) { }
 #endif
 
 /* PCI root bridge prepare function override for pseries */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..41451b76c6e5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -816,6 +816,8 @@ static void __init pSeries_setup_arch(void)
/* Discover PIC type and setup ppc_md accordingly */
smp_init_pseries();
 
+   // Setup CPU hotplug callbacks
+   pseries_cpu_hotplug_init();
 
if (radix_enabled() && !mmu_has_feature(MMU_FTR_GTSE))
if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
-- 
2.41.0



[PATCH v4 05/10] cpu/SMT: Remove topology_smt_supported()

2023-07-05 Thread Laurent Dufour
Since the maximum number of threads is now passed to
cpu_smt_set_num_threads(), checking that value is enough to know if SMT is
supported.

Cc: Michael Ellerman 
Suggested-by: Thomas Gleixner 
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h | 2 --
 arch/x86/kernel/smpboot.c   | 8 
 kernel/cpu.c| 4 ++--
 3 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index ae49ed4417d0..3235ba1e5b06 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -141,7 +141,6 @@ static inline int topology_max_smt_threads(void)
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
-bool topology_smt_supported(void);
 
 extern struct cpumask __cpu_primary_thread_mask;
 #define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
@@ -164,7 +163,6 @@ static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
-static inline bool topology_smt_supported(void) { return false; }
 #endif /* !CONFIG_SMP */
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ed2d51960a7d..f8e709fd2cd5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -326,14 +326,6 @@ static void notrace start_secondary(void *unused)
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
-/**
- * topology_smt_supported - Check whether SMT is supported by the CPUs
- */
-bool topology_smt_supported(void)
-{
-   return smp_num_siblings > 1;
-}
-
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  * @phys_pkg:  The physical package id to map
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d7dd535cb5b5..70add058e77b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -621,7 +621,7 @@ void __init cpu_smt_set_num_threads(unsigned int num_threads,
 {
WARN_ON(!num_threads || (num_threads > max_threads));
 
-   if (!topology_smt_supported())
+   if (max_threads == 1)
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
 
cpu_smt_max_threads = max_threads;
@@ -1801,7 +1801,7 @@ early_param("cpuhp.parallel", parallel_bringup_parse_param);
 
 static inline bool cpuhp_smt_aware(void)
 {
-   return topology_smt_supported();
+   return cpu_smt_max_threads > 1;
 }
 
 static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
-- 
2.41.0



[PATCH v4 03/10] cpu/SMT: Move smt/control simple exit cases earlier

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Move the simple exit cases, ie. which don't depend on the value written,
earlier in the function. That makes it clearer that regardless of the
input those states can not be transitioned out of.

That does have a user-visible effect, in that the error returned will
now always be EPERM/ENODEV for those states, regardless of the value
written. Previously writing an invalid value would return EINVAL even
when in those states.

Signed-off-by: Michael Ellerman 
---
 kernel/cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index e02204c4675a..b6fe170c93e9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2841,6 +2841,12 @@ __store_smt_control(struct device *dev, struct device_attribute *attr,
 {
int ctrlval, ret;
 
+   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
+   return -EPERM;
+
+   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+   return -ENODEV;
+
if (sysfs_streq(buf, "on"))
ctrlval = CPU_SMT_ENABLED;
else if (sysfs_streq(buf, "off"))
@@ -2850,12 +2856,6 @@ __store_smt_control(struct device *dev, struct device_attribute *attr,
else
return -EINVAL;
 
-   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
-   return -EPERM;
-
-   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
-   return -ENODEV;
-
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
-- 
2.41.0
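
A standalone sketch (illustrative harness, not kernel code) of the
user-visible change described above: writing an invalid value while SMT is
force-disabled now reports the state error rather than the parse error.

#include <stdio.h>
#include <string.h>
#include <errno.h>

enum { DEMO_ENABLED, DEMO_FORCE_DISABLED } demo_state = DEMO_FORCE_DISABLED;

static int store_old(const char *buf)	/* parse first, then check state */
{
	if (strcmp(buf, "on") && strcmp(buf, "off") && strcmp(buf, "forceoff"))
		return -EINVAL;
	if (demo_state == DEMO_FORCE_DISABLED)
		return -EPERM;
	return 0;
}

static int store_new(const char *buf)	/* check state first, then parse */
{
	if (demo_state == DEMO_FORCE_DISABLED)
		return -EPERM;
	if (strcmp(buf, "on") && strcmp(buf, "off") && strcmp(buf, "forceoff"))
		return -EINVAL;
	return 0;
}

int main(void)
{
	/* before: -EINVAL (-22); after: -EPERM (-1) */
	printf("old=%d new=%d\n", store_old("bogus"), store_new("bogus"));
	return 0;
}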



[PATCH v4 00/10] Introduce SMT level and add PowerPC support

2023-07-05 Thread Laurent Dufour
I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel is activating all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:0*1*2*3*4 5 6 7
Core   1:8*9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding new CPUs or
when optimizing the energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series is adding a new SMT level
(cpu_smt_num_threads) and a few callbacks allowing the architecture code to
finely control this value, setting a max and an "at boot" level, and
controlling whether a thread should be onlined or not.
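
To make the end-user view concrete, a minimal user-space sketch (assuming a
kernel with this series applied and partial-SMT support; error handling
kept to a bare minimum): request 4 threads per core through the extended
smt/control file, then read the setting back.

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/devices/system/cpu/smt/control", "r+");
	char state[32];

	if (!f) {
		perror("smt/control");
		return 1;
	}
	fputs("4", f);		/* like "on", but limited to 4 threads/core */
	fflush(f);
	rewind(f);
	if (fgets(state, sizeof(state), f))
		printf("control now reads: %s", state);	/* "4", or "on" if 4 == max */
	fclose(f);
	return 0;
}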

v4:
  Rebase on top of 6.5's updates
  Remove a dependency on the x86 symbol cpu_primary_thread_mask
v3:
  Fix a build error in patch 6/9
v2:
  As Thomas suggested,
    Reword some commit descriptions
Remove topology_smt_supported()
Remove topology_smt_threads_supported()
Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC
Remove switch() in __store_smt_control()
  Update kernel-parameters.txt

[1] 
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2] 
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/


Laurent Dufour (2):
  cpu/hotplug: remove dependency against cpu_primary_thread_mask
  cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (8):
  cpu/SMT: Move SMT prototypes into cpu_smt.h
  cpu/SMT: Move smt/control simple exit cases earlier
  cpu/SMT: Store the current/max number of threads
  cpu/SMT: Create topology_smt_thread_allowed()
  cpu/SMT: Allow enabling partial SMT states via sysfs
  powerpc/pseries: Initialise CPU hotplug callbacks earlier
  powerpc: Add HOTPLUG_SMT support
  powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

 .../ABI/testing/sysfs-devices-system-cpu  |   1 +
 .../admin-guide/kernel-parameters.txt |   4 +-
 arch/Kconfig  |   3 +
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/include/asm/topology.h   |  15 ++
 arch/powerpc/kernel/smp.c |   8 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  30 ++--
 arch/powerpc/platforms/pseries/pseries.h  |   2 +
 arch/powerpc/platforms/pseries/setup.c|   2 +
 arch/x86/include/asm/topology.h   |   4 +-
 arch/x86/kernel/cpu/common.c  |   2 +-
 arch/x86/kernel/smpboot.c |   8 -
 include/linux/cpu.h   |  25 +--
 include/linux/cpu_smt.h   |  33 
 kernel/cpu.c  | 142 +-
 15 files changed, 196 insertions(+), 85 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

-- 
2.41.0



[PATCH v4 04/10] cpu/SMT: Store the current/max number of threads

2023-07-05 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states at boot time, ie. when
not all SMT threads are brought online.

To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.

The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architectures, like PowerPC, allow the number of
threads to be set through the kernel command line, also pass that value.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit message]
[ldufour: rename cpu_smt_check_topology and add a num_threads argument]
Signed-off-by: Laurent Dufour 
---
 arch/x86/kernel/cpu/common.c |  2 +-
 include/linux/cpu_smt.h  |  8 ++--
 kernel/cpu.c | 21 -
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 52683fddafaf..12a48a85da3d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2317,7 +2317,7 @@ void __init arch_cpu_finalize_init(void)
 * identify_boot_cpu() initialized SMT support information, let the
 * core code know.
 */
-   cpu_smt_check_topology();
+   cpu_smt_set_num_threads(smp_num_siblings, smp_num_siblings);
 
if (!IS_ENABLED(CONFIG_SMP)) {
pr_info("CPU: ");
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
index 722c2e306fef..0c1664294b57 100644
--- a/include/linux/cpu_smt.h
+++ b/include/linux/cpu_smt.h
@@ -12,15 +12,19 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+extern unsigned int cpu_smt_num_threads;
 extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
+extern void cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads);
 extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+# define cpu_smt_num_threads 1
 static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
+static inline void cpu_smt_set_num_threads(unsigned int num_threads,
+  unsigned int max_threads) { }
 static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
diff --git a/kernel/cpu.c b/kernel/cpu.c
index b6fe170c93e9..d7dd535cb5b5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -594,6 +594,8 @@ void __weak arch_smt_update(void) { }
 #ifdef CONFIG_HOTPLUG_SMT
 
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+static unsigned int cpu_smt_max_threads __ro_after_init;
+unsigned int cpu_smt_num_threads __read_mostly = UINT_MAX;
 
 void __init cpu_smt_disable(bool force)
 {
@@ -607,16 +609,33 @@ void __init cpu_smt_disable(bool force)
pr_info("SMT: disabled\n");
cpu_smt_control = CPU_SMT_DISABLED;
}
+   cpu_smt_num_threads = 1;
 }
 
 /*
  * The decision whether SMT is supported can only be done after the full
  * CPU identification. Called from architecture code.
  */
-void __init cpu_smt_check_topology(void)
+void __init cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads)
 {
+   WARN_ON(!num_threads || (num_threads > max_threads));
+
if (!topology_smt_supported())
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
+   cpu_smt_max_threads = max_threads;
+
+   /*
+* If SMT has been disabled via the kernel command line or SMT is
+* not supported, set cpu_smt_num_threads to 1 for consistency.
+* If enabled, take the architecture requested number of threads
+* to bring up into account.
+*/
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   cpu_smt_num_threads = 1;
+   else if (num_threads < cpu_smt_num_threads)
+   cpu_smt_num_threads = num_threads;
 }
 
 static int __init smt_cmdline_disable(char *str)
-- 
2.41.0
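
A standalone sketch of the cpu_smt_set_num_threads() semantics above;
values are illustrative. UINT_MAX stands in for "no limit requested yet";
"nosmt" on the command line would already have forced the control state to
disabled and the current level to 1. The max_threads == 1 test is a
stand-in for !topology_smt_supported() (it becomes the real check in patch
5/10).

#include <stdio.h>
#include <limits.h>

enum { DEMO_ENABLED, DEMO_DISABLED, DEMO_NOT_SUPPORTED } demo_control = DEMO_ENABLED;
static unsigned int demo_max_threads;
static unsigned int demo_num_threads = UINT_MAX;

static void demo_set_num_threads(unsigned int num_threads,
				 unsigned int max_threads)
{
	if (max_threads == 1)	/* stand-in for !topology_smt_supported() */
		demo_control = DEMO_NOT_SUPPORTED;

	demo_max_threads = max_threads;

	/* disabled or unsupported SMT pins the level to 1 for consistency;
	 * otherwise honour the architecture's requested boot-time level */
	if (demo_control != DEMO_ENABLED)
		demo_num_threads = 1;
	else if (num_threads < demo_num_threads)
		demo_num_threads = num_threads;
}

int main(void)
{
	demo_set_num_threads(4, 8);	/* e.g. smt-enabled=4 on an SMT8 box */
	printf("current=%u max=%u\n", demo_num_threads, demo_max_threads);
	return 0;
}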



[PATCH v4 01/10] cpu/hotplug: remove dependancy against cpu_primary_thread_mask

2023-07-05 Thread Laurent Dufour
Commit 18415f33e2ac ("cpu/hotplug: Allow "parallel" bringup up to
CPUHP_BP_KICK_AP_STATE") introduced a dependency on a global variable,
cpu_primary_thread_mask, exported by the x86 code. This variable is only
used when CONFIG_HOTPLUG_PARALLEL is set.

Since cpuhp_get_primary_thread_mask() and cpuhp_smt_aware() are only used
when CONFIG_HOTPLUG_PARALLEL is set, don't define them when it is not set.

There is no functional change introduced by this patch.

Cc: Thomas Gleixner 
Signed-off-by: Laurent Dufour 
---
 kernel/cpu.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 88a7ede322bd..03309f2f35a4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -650,22 +650,8 @@ bool cpu_smt_possible(void)
 }
 EXPORT_SYMBOL_GPL(cpu_smt_possible);
 
-static inline bool cpuhp_smt_aware(void)
-{
-   return topology_smt_supported();
-}
-
-static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
-{
-   return cpu_primary_thread_mask;
-}
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
-static inline bool cpuhp_smt_aware(void) { return false; }
-static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
-{
-   return cpu_present_mask;
-}
 #endif
 
 static inline enum cpuhp_state
@@ -1793,6 +1779,16 @@ static int __init parallel_bringup_parse_param(char *arg)
 }
 early_param("cpuhp.parallel", parallel_bringup_parse_param);
 
+static inline bool cpuhp_smt_aware(void)
+{
+   return topology_smt_supported();
+}
+
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+   return cpu_primary_thread_mask;
+}
+
 /*
  * On architectures which have enabled parallel bringup this invokes all BP
  * prepare states for each of the to be onlined APs first. The last state
-- 
2.41.0



Re: [PATCH v3 3/9] cpu/SMT: Store the current/max number of threads

2023-07-05 Thread Laurent Dufour




On 05/07/2023 at 05:05, Zhang, Rui wrote:

On Thu, 2023-06-29 at 16:31 +0200, Laurent Dufour wrote:

From: Michael Ellerman 

Some architectures allows partial SMT states at boot time,


s/allows/allow.


Thanks Rui !


Re: [PATCH v3 6/9] cpu/SMT: Allow enabling partial SMT states via sysfs

2023-07-05 Thread Laurent Dufour




On 05/07/2023 at 05:14, Zhang, Rui wrote:

On Thu, 2023-06-29 at 16:31 +0200, Laurent Dufour wrote:

@@ -2580,6 +2597,17 @@ static ssize_t control_show(struct device *dev,
  {
 	const char *state = smt_states[cpu_smt_control];
 
+#ifdef CONFIG_HOTPLUG_SMT
+	/*
+	 * If SMT is enabled but not all threads are enabled then show the
+	 * number of threads. If all threads are enabled show "on". Otherwise
+	 * show the state name.
+	 */
+	if (cpu_smt_control == CPU_SMT_ENABLED &&
+	    cpu_smt_num_threads != cpu_smt_max_threads)
+		return sysfs_emit(buf, "%d\n", cpu_smt_num_threads);
+#endif
+


My understanding is that cpu_smt_control is always set to
CPU_SMT_NOT_IMPLEMENTED when CONFIG_HOTPLUG_SMT is not set, so this
ifdef is not necessary, right?


Hi Rui,

Indeed, cpu_smt_control, cpu_smt_num_threads and cpu_smt_max_threads are 
only defined when CONFIG_HOTPLUG_SMT is set. This is the reason for this 
#ifdef block.


This has been reported by the kernel test robot testing v2:
https://lore.kernel.org/oe-kbuild-all/202306282340.ihqm0fla-...@intel.com

Cheers,
Laurent.


Re: [PATCH v3 0/9] Introduce SMT level and add PowerPC support

2023-07-05 Thread Laurent Dufour

On 05/07/2023 at 05:04, Zhang, Rui wrote:

Hi, Laurent,

I want to test this patch set and found that it does not apply on top
of the latest upstream git, because of some changes in this merge window,
so better rebase.


Hi Rui,

Thanks for your interest for this series.
Thomas's latest changes came into the PowerPC next branch.
I'm working on a rebase.

Cheers,
Laurent.


thanks,
rui

On Thu, 2023-06-29 at 16:31 +0200, Laurent Dufour wrote:

I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel is activating all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4
threads active (system has been booted with the 'smt-enabled=4' kernel
option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4 5 6 7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding new CPUs or
when optimizing the energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series is adding a new SMT
level (cpu_smt_num_threads) and a few callbacks allowing the architecture
code to finely control this value, setting a max and an "at boot" level,
and controlling whether a thread should be onlined or not.

v3:
   Fix a build error in patch 6/9
v2:
   As Thomas suggested,
     Reword some commit descriptions
     Remove topology_smt_supported()
     Remove topology_smt_threads_supported()
     Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC
     Remove switch() in __store_smt_control()
   Update kernel-parameters.txt

[1]
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2]
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/

Laurent Dufour (1):
   cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (8):
   cpu/SMT: Move SMT prototypes into cpu_smt.h
   cpu/SMT: Move smt/control simple exit cases earlier
   cpu/SMT: Store the current/max number of threads
   cpu/SMT: Create topology_smt_thread_allowed()
   cpu/SMT: Allow enabling partial SMT states via sysfs
   powerpc/pseries: Initialise CPU hotplug callbacks earlier
   powerpc: Add HOTPLUG_SMT support
   powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

  .../ABI/testing/sysfs-devices-system-cpu  |   1 +
  .../admin-guide/kernel-parameters.txt |   4 +-
  arch/Kconfig  |   3 +
  arch/powerpc/Kconfig  |   2 +
  arch/powerpc/include/asm/topology.h   |  15 +++
  arch/powerpc/kernel/smp.c |   8 +-
  arch/powerpc/platforms/pseries/hotplug-cpu.c  |  30 +++--
  arch/powerpc/platforms/pseries/pseries.h  |   2 +
  arch/powerpc/platforms/pseries/setup.c    |   2 +
  arch/x86/include/asm/topology.h   |   4 +-
  arch/x86/kernel/cpu/bugs.c    |   3 +-
  arch/x86/kernel/smpboot.c |   8 --
  include/linux/cpu.h   |  25 +---
  include/linux/cpu_smt.h   |  33 +
  kernel/cpu.c  | 118 ++----
  15 files changed, 187 insertions(+), 71 deletions(-)
  create mode 100644 include/linux/cpu_smt.h





Re: [PATCH v3 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-30 Thread Laurent Dufour

Hi Michael,

On 29/06/2023 at 16:31, Laurent Dufour wrote:

From: Michael Ellerman 

Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allow
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support that can be tweaked in future.

Signed-off-by: Michael Ellerman 
[ldufour: pass current SMT level to cpu_smt_set_num_threads]
[ldufour: remove topology_smt_supported]
[ldufour: remove topology_smt_threads_supported]
[ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC]
[ldufour: update kernel-parameters.txt]
Signed-off-by: Laurent Dufour 
---
  Documentation/admin-guide/kernel-parameters.txt |  4 ++--
  arch/powerpc/Kconfig|  2 ++
  arch/powerpc/include/asm/topology.h | 15 +++
  arch/powerpc/kernel/smp.c   |  8 +++-
  4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..5efb6c73a928 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3838,10 +3838,10 @@
nosmp   [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC.  legacy for "maxcpus=0".
  
-	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
+	nosmt		[KNL,S390,PPC] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
-			[KNL,X86] Disable symmetric multithreading (SMT).
+			[KNL,X86,PPC] Disable symmetric multithreading (SMT).
 			nosmt=force: Force disable SMT, cannot be undone
 			via the sysfs control file.
  
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8b955bc7b59f..bacabc3d7f0c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -273,6 +273,8 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_SMT  if HOTPLUG_CPU
+   select SMT_NUM_THREADS_DYNAMIC


I missed that this list should be kept sorted alphabetically.
Could you fix that when applying the series, or should I send a new
version?


Thanks,
Laurent.


select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..f4e6f2dd04b7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -143,5 +143,20 @@ static inline int cpu_to_coregroup_id(int cpu)
  #endif
  #endif
  
+#ifdef CONFIG_HOTPLUG_SMT
+#include <linux/cpu_smt.h>
+#include <asm/cputhreads.h>
+
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpu == cpu_first_thread_sibling(cpu);
+}
+
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
+}
+#endif
+
  #endif /* __KERNEL__ */
  #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 406e6d0ffae3..eb539325dff8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1087,7 +1087,7 @@ static int __init init_big_cores(void)
  
  void __init smp_prepare_cpus(unsigned int max_cpus)
  {
-	unsigned int cpu;
+	unsigned int cpu, num_threads;
 
 	DBG("smp_prepare_cpus\n");
 
@@ -1154,6 +1154,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
 	if (smp_ops && smp_ops->probe)
 		smp_ops->probe();
+
+	// Initialise the generic SMT topology support
+	num_threads = 1;
+	if (smt_enabled_at_boot)
+		num_threads = smt_enabled_at_boot;
+	cpu_smt_set_num_threads(num_threads, threads_per_core);
  }
  
  void smp_prepare_boot_cpu(void)


Re: [PATCH v3 0/9] Introduce SMT level and add PowerPC support

2023-06-30 Thread Laurent Dufour




On 30/06/2023 at 15:32, Sachin Sant wrote:




On 29-Jun-2023, at 8:01 PM, Laurent Dufour  wrote:

I'm taking over the series Michael sent previously [1], which smartly
reworked the initial series I sent [2].  This series addresses the
comments Thomas and I sent on Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel is activating all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:0*1*2*3*4 5 6 7
Core   1:8*9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding new CPUs or
when optimizing the energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series is adding a new SMT level
(cpu_smt_num_threads) and a few callbacks allowing the architecture code to
finely control this value, setting a max and an "at boot" level, and
controlling whether a thread should be onlined or not.

v3:
  Fix a build error in the patch 6/9


Successfully tested the V3 version on a Power10 LPAR. Add/remove of
processor core worked correctly, preserving the SMT level (on a kernel
booted with smt-enabled= parameter)

Laurent (Thanks!) also provided a patch to update the ppc64_cpu &
lparstat utilities. With the patched ppc64_cpu utility I verified that the
SMT level changed at runtime was preserved across a processor core add (on
a kernel booted without the smt-enabled= parameter).

Based on these test results

Tested-by: Sachin Sant 


Thanks a lot, Sachin!

Once this series is accepted, I'll send the series to update ppc64_cpu.



[PATCH v3 2/9] cpu/SMT: Move smt/control simple exit cases earlier

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Move the simple exit cases, ie. which don't depend on the value written,
earlier in the function. That makes it clearer that regardless of the
input those states can not be transitioned out of.

That does have a user-visible effect, in that the error returned will
now always be EPERM/ENODEV for those states, regardless of the value
written. Previously writing an invalid value would return EINVAL even
when in those states.

Signed-off-by: Michael Ellerman 
---
 kernel/cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 237394e0574a..c67049bb3fc8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2482,6 +2482,12 @@ __store_smt_control(struct device *dev, struct device_attribute *attr,
 {
int ctrlval, ret;
 
+   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
+   return -EPERM;
+
+   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+   return -ENODEV;
+
if (sysfs_streq(buf, "on"))
ctrlval = CPU_SMT_ENABLED;
else if (sysfs_streq(buf, "off"))
@@ -2491,12 +2497,6 @@ __store_smt_control(struct device *dev, struct device_attribute *attr,
else
return -EINVAL;
 
-   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
-   return -EPERM;
-
-   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
-   return -ENODEV;
-
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
-- 
2.41.0



[PATCH v3 5/9] cpu/SMT: Create topology_smt_thread_allowed()

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states, i.e. when not all SMT
threads are brought online.

To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architectures supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting partial SMT states, there is no need to define
topology_smt_thread_allowed(); the generic code assumes that all the threads
are allowed, or only the primary ones.

Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.

Signed-off-by: Michael Ellerman 
Suggested-by: Thomas Gleixner 
[ldufour: slightly reword the commit's description]
[ldufour: introduce CONFIG_SMT_NUM_THREADS_DYNAMIC]
Signed-off-by: Laurent Dufour 
---
 arch/Kconfig |  3 +++
 kernel/cpu.c | 24 +++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..c69e9c662a87 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -34,6 +34,9 @@ config ARCH_HAS_SUBPAGE_FAULTS
 config HOTPLUG_SMT
bool
 
+config SMT_NUM_THREADS_DYNAMIC
+   bool
+
 config GENERIC_ENTRY
bool
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e354af92b2b8..29bf310651c6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -466,9 +466,23 @@ static int __init smt_cmdline_disable(char *str)
 }
 early_param("nosmt", smt_cmdline_disable);
 
+/*
+ * For architectures supporting partial SMT states check if the thread is allowed.
+ * Otherwise this has already been checked through cpu_smt_max_threads when
+ * setting the SMT level.
+ */
+static inline bool cpu_smt_thread_allowed(unsigned int cpu)
+{
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+   return topology_smt_thread_allowed(cpu);
+#else
+   return true;
+#endif
+}
+
 static inline bool cpu_smt_allowed(unsigned int cpu)
 {
-   if (cpu_smt_control == CPU_SMT_ENABLED)
+   if (cpu_smt_control == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
return true;
 
if (topology_is_primary_thread(cpu))
@@ -2283,6 +2297,12 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
for_each_online_cpu(cpu) {
if (topology_is_primary_thread(cpu))
continue;
+   /*
+* Disable can be called with CPU_SMT_ENABLED when changing
+* from a higher to lower number of SMT threads per core.
+*/
+   if (ctrlval == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
+   continue;
ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
if (ret)
break;
@@ -2317,6 +2337,8 @@ int cpuhp_smt_enable(void)
/* Skip online CPUs and CPUs on offline nodes */
if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
continue;
+   if (!cpu_smt_thread_allowed(cpu))
+   continue;
ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
if (ret)
break;
-- 
2.41.0



[PATCH v3 6/9] cpu/SMT: Allow enabling partial SMT states via sysfs

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Add support to the /sys/devices/system/cpu/smt/control interface for
enabling a specified number of SMT threads per core, including partial
SMT states where not all threads are brought online.

The current interface accepts "on" and "off", to enable either 1 or all
SMT threads per core.

This commit allows writing an integer, between 1 and the number of SMT
threads supported by the machine. Writing 1 is a synonym for "off", 2 or
more enables SMT with the specified number of threads.

When reading the file, if all threads are online "on" is returned, to
avoid changing behaviour for existing users. If some other number of
threads is online then the integer value is returned.

Architectures like x86, which only support 1 thread or all threads, should
not define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architectures supporting partial
SMT states, like PowerPC, should define it.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit's description]
[ldufour: remove switch() in __store_smt_control()]
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202306282340.ihqm0fla-...@intel.com/
[ldufour: fix build issue in control_show()]
Signed-off-by: Laurent Dufour 
---
 .../ABI/testing/sysfs-devices-system-cpu  |  1 +
 kernel/cpu.c  | 60 ++-
 2 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index f54867cadb0f..3c4cfb59d495 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -555,6 +555,7 @@ Description:   Control Symmetric Multi Threading (SMT)
                         ================ =========================================
                         "on"             SMT is enabled
                         "off"            SMT is disabled
+                        "<N>"            SMT is enabled with N threads per core.
                         "forceoff"       SMT is force disabled. Cannot be changed.
                         "notsupported"   SMT is not supported by the CPU
                         "notimplemented" SMT runtime toggling is not
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 29bf310651c6..d63f633e34cd 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2517,11 +2517,19 @@ static const struct attribute_group 
cpuhp_cpu_root_attr_group = {
 
 #ifdef CONFIG_HOTPLUG_SMT
 
+static bool cpu_smt_num_threads_valid(unsigned int threads)
+{
+   if (IS_ENABLED(CONFIG_SMT_NUM_THREADS_DYNAMIC))
+   return threads >= 1 && threads <= cpu_smt_max_threads;
+   return threads == 1 || threads == cpu_smt_max_threads;
+}
+
 static ssize_t
 __store_smt_control(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
 {
-   int ctrlval, ret;
+   int ctrlval, ret, num_threads, orig_threads;
+   bool force_off;
 
if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
return -EPERM;
@@ -2529,30 +2537,39 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
return -ENODEV;
 
-   if (sysfs_streq(buf, "on"))
+   if (sysfs_streq(buf, "on")) {
ctrlval = CPU_SMT_ENABLED;
-   else if (sysfs_streq(buf, "off"))
+   num_threads = cpu_smt_max_threads;
+   } else if (sysfs_streq(buf, "off")) {
ctrlval = CPU_SMT_DISABLED;
-   else if (sysfs_streq(buf, "forceoff"))
+   num_threads = 1;
+   } else if (sysfs_streq(buf, "forceoff")) {
ctrlval = CPU_SMT_FORCE_DISABLED;
-   else
+   num_threads = 1;
+   } else if (kstrtoint(buf, 10, &num_threads) == 0) {
+   if (num_threads == 1)
+   ctrlval = CPU_SMT_DISABLED;
+   else if (cpu_smt_num_threads_valid(num_threads))
+   ctrlval = CPU_SMT_ENABLED;
+   else
+   return -EINVAL;
+   } else {
return -EINVAL;
+   }
 
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
 
-   if (ctrlval != cpu_smt_control) {
-   switch (ctrlval) {
-   case CPU_SMT_ENABLED:
-   ret = cpuhp_smt_enable();
-   break;
-   case CPU_SMT_DISABLED:
-   case CPU_SMT_FORCE_DISABLED:
-   ret = cpuhp_smt_disable(ctrlval);
-   break;
-   }
-   }
+   orig_threads = cpu_smt_num_threads;
+   cpu_smt_num_threads = num_threads;
+
+   force_off = ctrlval != cpu_smt_control && ctrlval == CPU_SMT_FORCE_DISABLED;
+
+   if (num_threads > orig_threads)
+   ret = cpuhp_smt_enable();
+   else if (num_threads < orig_threads || force_off)
+   ret = cpuhp_smt_disable(ctrlval);

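Assuming a machine with SMT8 cores and the series applied, a session with
the extended control file would look like this (hypothetical transcript;
as described above, reading back reports the integer only while not all
threads are online):

  # cat /sys/devices/system/cpu/smt/control
  on
  # echo 4 > /sys/devices/system/cpu/smt/control
  # cat /sys/devices/system/cpu/smt/control
  4
  # echo on > /sys/devices/system/cpu/smt/control
  # cat /sys/devices/system/cpu/smt/control
  on

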
[PATCH v3 9/9] powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Integrate with the generic SMT support, so that when a CPU is DLPAR
onlined it is brought up with the correct SMT mode.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 61fb7cb00880..e62835a12d73 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -398,6 +398,14 @@ static int dlpar_online_cpu(struct device_node *dn)
for_each_present_cpu(cpu) {
if (get_hard_smp_processor_id(cpu) != thread)
continue;
+
+   if (!topology_is_primary_thread(cpu)) {
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   break;
+   if (!topology_smt_thread_allowed(cpu))
+   break;
+   }
+
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
-- 
2.41.0
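
With this change, the scenario from the cover letter (boot with
smt-enabled=4, then DLPAR-add a core) should no longer produce a mixed
SMT level. The expected result would be along these lines (hypothetical
output, following the cover letter's format):

  ltcden3-lp12:~ # ppc64_cpu --info
  Core   0:    0*    1*    2*    3*    4     5     6     7
  Core   1:    8*    9*   10*   11*   12    13    14    15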



[PATCH v3 4/9] cpu/SMT: Remove topology_smt_supported()

2023-06-29 Thread Laurent Dufour
Since the maximum number of threads is now passed to
cpu_smt_set_num_threads(), checking that value is enough to know if SMT is
supported.

Cc: Michael Ellerman 
Suggested-by: Thomas Gleixner 
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h | 2 --
 arch/x86/kernel/smpboot.c   | 8 
 kernel/cpu.c| 2 +-
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 66927a59e822..87358a8fe843 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,7 +143,6 @@ int topology_update_die_map(unsigned int dieid, unsigned 
int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
-bool topology_smt_supported(void);
 #else
 #define topology_max_packages()(1)
 static inline int
@@ -156,7 +155,6 @@ static inline int topology_phys_to_logical_die(unsigned int 
die,
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
-static inline bool topology_smt_supported(void) { return false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..3052c171668d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -278,14 +278,6 @@ bool topology_is_primary_thread(unsigned int cpu)
return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
 }
 
-/**
- * topology_smt_supported - Check whether SMT is supported by the CPUs
- */
-bool topology_smt_supported(void)
-{
-   return smp_num_siblings > 1;
-}
-
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
diff --git a/kernel/cpu.c b/kernel/cpu.c
index edca8b7bd400..e354af92b2b8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -442,7 +442,7 @@ void __init cpu_smt_set_num_threads(unsigned int 
num_threads,
 {
WARN_ON(!num_threads || (num_threads > max_threads));
 
-   if (!topology_smt_supported())
+   if (max_threads == 1)
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
 
cpu_smt_max_threads = max_threads;
-- 
2.41.0



[PATCH v3 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allowing
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support, that can be tweaked in future.

Signed-off-by: Michael Ellerman 
[ldufour: pass current SMT level to cpu_smt_set_num_threads]
[ldufour: remove topology_smt_supported]
[ldufour: remove topology_smt_threads_supported]
[ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC]
[ldufour: update kernel-parameters.txt]
Signed-off-by: Laurent Dufour 
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++--
 arch/powerpc/Kconfig|  2 ++
 arch/powerpc/include/asm/topology.h | 15 +++
 arch/powerpc/kernel/smp.c   |  8 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..5efb6c73a928 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3838,10 +3838,10 @@
nosmp   [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC.  legacy for "maxcpus=0".
 
-   nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
+   nosmt   [KNL,S390,PPC] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
-   [KNL,X86] Disable symmetric multithreading (SMT).
+   [KNL,X86,PPC] Disable symmetric multithreading (SMT).
nosmt=force: Force disable SMT, cannot be undone
 via the sysfs control file.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8b955bc7b59f..bacabc3d7f0c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -273,6 +273,8 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_SMT  if HOTPLUG_CPU
+   select SMT_NUM_THREADS_DYNAMIC
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..f4e6f2dd04b7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -143,5 +143,20 @@ static inline int cpu_to_coregroup_id(int cpu)
 #endif
 #endif
 
+#ifdef CONFIG_HOTPLUG_SMT
+#include <linux/cpu_smt.h>
+#include <asm/cputhreads.h>
+
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpu == cpu_first_thread_sibling(cpu);
+}
+
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 406e6d0ffae3..eb539325dff8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1087,7 +1087,7 @@ static int __init init_big_cores(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-   unsigned int cpu;
+   unsigned int cpu, num_threads;
 
DBG("smp_prepare_cpus\n");
 
@@ -1154,6 +1154,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
if (smp_ops && smp_ops->probe)
smp_ops->probe();
+
+   // Initialise the generic SMT topology support
+   num_threads = 1;
+   if (smt_enabled_at_boot)
+   num_threads = smt_enabled_at_boot;
+   cpu_smt_set_num_threads(num_threads, threads_per_core);
 }
 
 void smp_prepare_boot_cpu(void)
-- 
2.41.0
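
The helper topology_is_primary_thread() added above can be modelled in the
same spirit as the sketch after patch 5/9: assuming sibling threads are
numbered consecutively within a core, only the first thread of each core is
primary. A standalone sketch (threads_per_core is an illustrative stand-in
for the kernel variable):

  #include <stdbool.h>
  #include <stdio.h>

  static unsigned int threads_per_core = 8; /* illustrative */

  /* Userspace model of cpu_first_thread_sibling(). */
  static unsigned int cpu_first_thread_sibling(unsigned int cpu)
  {
          return cpu - (cpu % threads_per_core);
  }

  static bool topology_is_primary_thread(unsigned int cpu)
  {
          return cpu == cpu_first_thread_sibling(cpu);
  }

  int main(void)
  {
          unsigned int cpu;

          /* On two SMT8 cores, only cpus 0 and 8 are primary threads. */
          for (cpu = 0; cpu < 16; cpu++)
                  if (topology_is_primary_thread(cpu))
                          printf("cpu %u is a primary thread\n", cpu);
          return 0;
  }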



[PATCH v3 3/9] cpu/SMT: Store the current/max number of threads

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states at boot time, i.e. when
not all SMT threads are brought online.

To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.

The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architectures like PowerPC allow the number of
threads to be set through the kernel command line, also pass that value.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit message]
[ldufour: rename cpu_smt_check_topology and add a num_threads argument]
Signed-off-by: Laurent Dufour 
---
 arch/x86/kernel/cpu/bugs.c |  3 ++-
 include/linux/cpu_smt.h|  8 ++--
 kernel/cpu.c   | 21 -
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 182af64387d0..ed71ad385ea7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "cpu.h"
 
@@ -133,7 +134,7 @@ void __init check_bugs(void)
 * identify_boot_cpu() initialized SMT support information, let the
 * core code know.
 */
-   cpu_smt_check_topology();
+   cpu_smt_set_num_threads(smp_num_siblings, smp_num_siblings);
 
if (!IS_ENABLED(CONFIG_SMP)) {
pr_info("CPU: ");
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
index 722c2e306fef..0c1664294b57 100644
--- a/include/linux/cpu_smt.h
+++ b/include/linux/cpu_smt.h
@@ -12,15 +12,19 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+extern unsigned int cpu_smt_num_threads;
 extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
+extern void cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads);
 extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+# define cpu_smt_num_threads 1
 static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
+static inline void cpu_smt_set_num_threads(unsigned int num_threads,
+  unsigned int max_threads) { }
 static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c67049bb3fc8..edca8b7bd400 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -415,6 +415,8 @@ void __weak arch_smt_update(void) { }
 #ifdef CONFIG_HOTPLUG_SMT
 
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+static unsigned int cpu_smt_max_threads __ro_after_init;
+unsigned int cpu_smt_num_threads __read_mostly = UINT_MAX;
 
 void __init cpu_smt_disable(bool force)
 {
@@ -428,16 +430,33 @@ void __init cpu_smt_disable(bool force)
pr_info("SMT: disabled\n");
cpu_smt_control = CPU_SMT_DISABLED;
}
+   cpu_smt_num_threads = 1;
 }
 
 /*
  * The decision whether SMT is supported can only be done after the full
  * CPU identification. Called from architecture code.
  */
-void __init cpu_smt_check_topology(void)
+void __init cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads)
 {
+   WARN_ON(!num_threads || (num_threads > max_threads));
+
if (!topology_smt_supported())
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
+   cpu_smt_max_threads = max_threads;
+
+   /*
+* If SMT has been disabled via the kernel command line or SMT is
+* not supported, set cpu_smt_num_threads to 1 for consistency.
+* If enabled, take the architecture requested number of threads
+* to bring up into account.
+*/
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   cpu_smt_num_threads = 1;
+   else if (num_threads < cpu_smt_num_threads)
+   cpu_smt_num_threads = num_threads;
 }
 
 static int __init smt_cmdline_disable(char *str)
-- 
2.41.0
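
The clamping performed by cpu_smt_set_num_threads() can be exercised with a
small standalone model (kernel state mocked as plain globals; the
max_threads == 1 check reflects the simplification made in patch 4/9):

  #include <stdio.h>

  enum cpuhp_smt_control { CPU_SMT_ENABLED, CPU_SMT_DISABLED,
                           CPU_SMT_NOT_SUPPORTED };

  static enum cpuhp_smt_control cpu_smt_control = CPU_SMT_ENABLED;
  static unsigned int cpu_smt_max_threads;
  static unsigned int cpu_smt_num_threads = (unsigned int)-1; /* UINT_MAX */

  static void cpu_smt_set_num_threads(unsigned int num_threads,
                                      unsigned int max_threads)
  {
          if (max_threads == 1)
                  cpu_smt_control = CPU_SMT_NOT_SUPPORTED;

          cpu_smt_max_threads = max_threads;

          /* Clamp exactly as in the hunk above. */
          if (cpu_smt_control != CPU_SMT_ENABLED)
                  cpu_smt_num_threads = 1;
          else if (num_threads < cpu_smt_num_threads)
                  cpu_smt_num_threads = num_threads;
  }

  int main(void)
  {
          /* e.g. powerpc booted with smt-enabled=4 on SMT8 cores */
          cpu_smt_set_num_threads(4, 8);
          printf("num=%u max=%u\n", cpu_smt_num_threads, cpu_smt_max_threads);
          return 0;
  }

This prints "num=4 max=8": the requested boot-time level is kept, capped by
the hardware maximum.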



[PATCH v3 7/9] powerpc/pseries: Initialise CPU hotplug callbacks earlier

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.

The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").

For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.

So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 22 
 arch/powerpc/platforms/pseries/pseries.h |  2 ++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1a3cb313976a..61fb7cb00880 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -845,15 +845,9 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
 };
 
-static int __init pseries_cpu_hotplug_init(void)
+void __init pseries_cpu_hotplug_init(void)
 {
int qcss_tok;
-   unsigned int node;
-
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
-   ppc_md.cpu_probe = dlpar_cpu_probe;
-   ppc_md.cpu_release = dlpar_cpu_release;
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
@@ -862,12 +856,22 @@ static int __init pseries_cpu_hotplug_init(void)
qcss_tok == RTAS_UNKNOWN_SERVICE) {
printk(KERN_INFO "CPU Hotplug not supported by firmware "
"- disabling.\n");
-   return 0;
+   return;
}
 
smp_ops->cpu_offline_self = pseries_cpu_offline_self;
smp_ops->cpu_disable = pseries_cpu_disable;
smp_ops->cpu_die = pseries_cpu_die;
+}
+
+static int __init pseries_dlpar_init(void)
+{
+   unsigned int node;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+   ppc_md.cpu_probe = dlpar_cpu_probe;
+   ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
/* Processors can be added/removed only on LPAR */
if (firmware_has_feature(FW_FEATURE_LPAR)) {
@@ -886,4 +890,4 @@ static int __init pseries_cpu_hotplug_init(void)
 
return 0;
 }
-machine_arch_initcall(pseries, pseries_cpu_hotplug_init);
+machine_arch_initcall(pseries, pseries_dlpar_init);
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index f8bce40ebd0c..f8893ba46e83 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -75,11 +75,13 @@ static inline int dlpar_hp_pmem(struct pseries_hp_errorlog 
*hp_elog)
 
 #ifdef CONFIG_HOTPLUG_CPU
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
+void pseries_cpu_hotplug_init(void);
 #else
 static inline int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
return -EOPNOTSUPP;
 }
+static inline void pseries_cpu_hotplug_init(void) { }
 #endif
 
 /* PCI root bridge prepare function override for pseries */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..41451b76c6e5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -816,6 +816,8 @@ static void __init pSeries_setup_arch(void)
/* Discover PIC type and setup ppc_md accordingly */
smp_init_pseries();
 
+   // Setup CPU hotplug callbacks
+   pseries_cpu_hotplug_init();
 
if (radix_enabled() && !mmu_has_feature(MMU_FTR_GTSE))
if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
-- 
2.41.0



[PATCH v3 0/9] Introduce SMT level and add PowerPC support

2023-06-29 Thread Laurent Dufour
I'm taking over the series Michael sent previously [1] which is smartly
reviewing the initial series I sent [2].  This series is addressing the
comments sent by Thomas and me on the Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel activates all of its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system, for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4     5     6     7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding a new CPU or
when optimizing energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series adds a new SMT level
(cpu_smt_num_threads) and a few callbacks allowing the architecture code to
finely control this value, setting a maximum and an "at boot" level, and
controlling whether a thread should be onlined or not.

v3:
  Fix a build error in the patch 6/9
v2:
  As Thomas suggested,
Reword some commit's description
Remove topology_smt_supported()
Remove topology_smt_threads_supported()
Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC
Remove switch() in __store_smt_control()
  Update kernel-parameters.txt

[1] 
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2] 
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/

Laurent Dufour (1):
  cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (8):
  cpu/SMT: Move SMT prototypes into cpu_smt.h
  cpu/SMT: Move smt/control simple exit cases earlier
  cpu/SMT: Store the current/max number of threads
  cpu/SMT: Create topology_smt_thread_allowed()
  cpu/SMT: Allow enabling partial SMT states via sysfs
  powerpc/pseries: Initialise CPU hotplug callbacks earlier
  powerpc: Add HOTPLUG_SMT support
  powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

 .../ABI/testing/sysfs-devices-system-cpu  |   1 +
 .../admin-guide/kernel-parameters.txt |   4 +-
 arch/Kconfig  |   3 +
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/include/asm/topology.h   |  15 +++
 arch/powerpc/kernel/smp.c |   8 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  30 +++--
 arch/powerpc/platforms/pseries/pseries.h  |   2 +
 arch/powerpc/platforms/pseries/setup.c|   2 +
 arch/x86/include/asm/topology.h   |   4 +-
 arch/x86/kernel/cpu/bugs.c|   3 +-
 arch/x86/kernel/smpboot.c |   8 --
 include/linux/cpu.h   |  25 +---
 include/linux/cpu_smt.h   |  33 +
 kernel/cpu.c  | 118 ++
 15 files changed, 187 insertions(+), 71 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

-- 
2.41.0
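
As a quick map for reviewers, the arch-facing surface added by the series
boils down to one Kconfig symbol, SMT_NUM_THREADS_DYNAMIC (selected by
architectures supporting partial SMT states), plus the following
declarations, copied from the patches below:

  /* Called once at boot by arch code (patch 3/9): current and max
   * number of SMT threads per core. */
  void __init cpu_smt_set_num_threads(unsigned int num_threads,
                                      unsigned int max_threads);

  /* Current SMT level, readable by arch code (patch 3/9). */
  extern unsigned int cpu_smt_num_threads;

  /* Arch helper, needed only with SMT_NUM_THREADS_DYNAMIC (patch 5/9):
   * may this thread be brought online at the current SMT level? */
  bool topology_smt_thread_allowed(unsigned int cpu);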



[PATCH v3 1/9] cpu/SMT: Move SMT prototypes into cpu_smt.h

2023-06-29 Thread Laurent Dufour
From: Michael Ellerman 

In order to export the cpuhp_smt_control enum as part of the interface
between generic and architecture code, the architecture code needs to
include asm/topology.h.

But that leads to circular header dependencies. So split the enum and
related declarations into a separate header.

Signed-off-by: Michael Ellerman 
[ldufour: rewording the commit's description]
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h |  2 ++
 include/linux/cpu.h | 25 +
 include/linux/cpu_smt.h | 29 +
 kernel/cpu.c|  1 +
 4 files changed, 33 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..66927a59e822 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,6 +136,8 @@ static inline int topology_max_smt_threads(void)
return __max_smt_threads;
 }
 
+#include 
+
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 8582a7142623..40548f3c201c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct device_node;
@@ -202,30 +203,6 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
-enum cpuhp_smt_control {
-   CPU_SMT_ENABLED,
-   CPU_SMT_DISABLED,
-   CPU_SMT_FORCE_DISABLED,
-   CPU_SMT_NOT_SUPPORTED,
-   CPU_SMT_NOT_IMPLEMENTED,
-};
-
-#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
-extern enum cpuhp_smt_control cpu_smt_control;
-extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
-extern bool cpu_smt_possible(void);
-extern int cpuhp_smt_enable(void);
-extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
-#else
-# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
-static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
-static inline bool cpu_smt_possible(void) { return false; }
-static inline int cpuhp_smt_enable(void) { return 0; }
-static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
-#endif
-
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
new file mode 100644
index ..722c2e306fef
--- /dev/null
+++ b/include/linux/cpu_smt.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CPU_SMT_H_
+#define _LINUX_CPU_SMT_H_
+
+enum cpuhp_smt_control {
+   CPU_SMT_ENABLED,
+   CPU_SMT_DISABLED,
+   CPU_SMT_FORCE_DISABLED,
+   CPU_SMT_NOT_SUPPORTED,
+   CPU_SMT_NOT_IMPLEMENTED,
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
+extern enum cpuhp_smt_control cpu_smt_control;
+extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
+extern int cpuhp_smt_enable(void);
+extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
+#else
+# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
+static inline int cpuhp_smt_enable(void) { return 0; }
+static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
+#endif
+
+#endif /* _LINUX_CPU_SMT_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f4a2c5845bcb..237394e0574a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -413,6 +413,7 @@ static void lockdep_release_cpus_lock(void)
 void __weak arch_smt_update(void) { }
 
 #ifdef CONFIG_HOTPLUG_SMT
+
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
-- 
2.41.0
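
The benefit of the split can be modelled standalone: when the feature is
compiled out, the "variable" degrades to a constant the compiler can fold
away. A sketch of the pattern (HAVE_HOTPLUG_SMT is an illustrative
stand-in for CONFIG_HOTPLUG_SMT, not a real kernel symbol):

  #include <stdio.h>

  enum cpuhp_smt_control {
          CPU_SMT_ENABLED,
          CPU_SMT_DISABLED,
          CPU_SMT_FORCE_DISABLED,
          CPU_SMT_NOT_SUPPORTED,
          CPU_SMT_NOT_IMPLEMENTED,
  };

  #ifdef HAVE_HOTPLUG_SMT
  enum cpuhp_smt_control cpu_smt_control = CPU_SMT_ENABLED;
  #else
  #define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
  #endif

  int main(void)
  {
          /* Folds to constant 0 when HAVE_HOTPLUG_SMT is not defined. */
          printf("smt enabled: %d\n", cpu_smt_control == CPU_SMT_ENABLED);
          return 0;
  }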



Re: [PATCH v2 0/9] Introduce SMT level and add PowerPC support

2023-06-29 Thread Laurent Dufour




On 29/06/2023 at 13:10, Michael Ellerman wrote:

Sachin Sant  writes:

On 28-Jun-2023, at 3:35 PM, Laurent Dufour  wrote:

I'm taking over the series Michael sent previously [1] which is smartly
reviewing the initial series I sent [2].  This series is addressing the
comments sent by Thomas and me on the Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel activates all of its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system, for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4     5     6     7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.



Thanks for the patches, Laurent.

Is the SMT level retained even when dynamically changing SMT values?
I am observing a difference in behaviour with and without the smt-enabled
kernel command line option.

When the smt-enabled= option is specified, the SMT level is retained across
CPU core remove and add.

Without this option, when changing the SMT level at runtime using
ppc64_cpu --smt=<N>, the SMT level is not retained after a
CPU core add.


That's because ppc64_cpu is not using the sysfs SMT control file, it's
just onlining/offlining threads manually.

If you run:
  $ ppc64_cpu --smt=4

And then also do:

  $ echo 4 > /sys/devices/system/cpu/smt/control

It should work as expected?

ppc64_cpu will need to be updated to do that automatically.


Hi Sachin and Michael,

Yes, ppc64_cpu will need an update, and I have a patch ready to be sent
once this series is accepted.


By the way, I have a fix for the build issue reported against patch
6/9. I'll send a v3 soon.


Cheers,

Laurent.


[PATCH v2 0/9] Introduce SMT level and add PowerPC support

2023-06-28 Thread Laurent Dufour
I'm taking over the series Michael sent previously [1] which is smartly
reviewing the initial series I sent [2].  This series is addressing the
comments sent by Thomas and me on the Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel activates all of its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system, for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4     5     6     7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding a new CPU or
when optimizing energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series adds a new SMT level
(cpu_smt_num_threads) and a few callbacks allowing the architecture code to
finely control this value, setting a maximum and an "at boot" level, and
controlling whether a thread should be onlined or not.


v2:
  As Thomas suggested,
Reword some commit's description
Remove topology_smt_supported()
Remove topology_smt_threads_supported()
Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC
Remove switch() in __store_smt_control()
  Update kernel-parameters.txt

[1] 
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2] 
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/

Laurent Dufour (1):
  cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (8):
  cpu/SMT: Move SMT prototypes into cpu_smt.h
  cpu/SMT: Move smt/control simple exit cases earlier
  cpu/SMT: Store the current/max number of threads
  cpu/SMT: Create topology_smt_thread_allowed()
  cpu/SMT: Allow enabling partial SMT states via sysfs
  powerpc/pseries: Initialise CPU hotplug callbacks earlier
  powerpc: Add HOTPLUG_SMT support
  powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

 .../ABI/testing/sysfs-devices-system-cpu  |   1 +
 .../admin-guide/kernel-parameters.txt |   4 +-
 arch/Kconfig  |   3 +
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/include/asm/topology.h   |  15 +++
 arch/powerpc/kernel/smp.c |   8 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  30 +++--
 arch/powerpc/platforms/pseries/pseries.h  |   2 +
 arch/powerpc/platforms/pseries/setup.c|   2 +
 arch/x86/include/asm/topology.h   |   4 +-
 arch/x86/kernel/cpu/bugs.c|   3 +-
 arch/x86/kernel/smpboot.c |   8 --
 include/linux/cpu.h   |  25 +---
 include/linux/cpu_smt.h   |  33 +
 kernel/cpu.c  | 116 ++
 15 files changed, 185 insertions(+), 71 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

-- 
2.41.0



[PATCH v2 7/9] powerpc/pseries: Initialise CPU hotplug callbacks earlier

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.

The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").

For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.

So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 22 
 arch/powerpc/platforms/pseries/pseries.h |  2 ++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1a3cb313976a..61fb7cb00880 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -845,15 +845,9 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
 };
 
-static int __init pseries_cpu_hotplug_init(void)
+void __init pseries_cpu_hotplug_init(void)
 {
int qcss_tok;
-   unsigned int node;
-
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
-   ppc_md.cpu_probe = dlpar_cpu_probe;
-   ppc_md.cpu_release = dlpar_cpu_release;
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
@@ -862,12 +856,22 @@ static int __init pseries_cpu_hotplug_init(void)
qcss_tok == RTAS_UNKNOWN_SERVICE) {
printk(KERN_INFO "CPU Hotplug not supported by firmware "
"- disabling.\n");
-   return 0;
+   return;
}
 
smp_ops->cpu_offline_self = pseries_cpu_offline_self;
smp_ops->cpu_disable = pseries_cpu_disable;
smp_ops->cpu_die = pseries_cpu_die;
+}
+
+static int __init pseries_dlpar_init(void)
+{
+   unsigned int node;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+   ppc_md.cpu_probe = dlpar_cpu_probe;
+   ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
/* Processors can be added/removed only on LPAR */
if (firmware_has_feature(FW_FEATURE_LPAR)) {
@@ -886,4 +890,4 @@ static int __init pseries_cpu_hotplug_init(void)
 
return 0;
 }
-machine_arch_initcall(pseries, pseries_cpu_hotplug_init);
+machine_arch_initcall(pseries, pseries_dlpar_init);
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index f8bce40ebd0c..f8893ba46e83 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -75,11 +75,13 @@ static inline int dlpar_hp_pmem(struct pseries_hp_errorlog 
*hp_elog)
 
 #ifdef CONFIG_HOTPLUG_CPU
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
+void pseries_cpu_hotplug_init(void);
 #else
 static inline int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
return -EOPNOTSUPP;
 }
+static inline void pseries_cpu_hotplug_init(void) { }
 #endif
 
 /* PCI root bridge prepare function override for pseries */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..41451b76c6e5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -816,6 +816,8 @@ static void __init pSeries_setup_arch(void)
/* Discover PIC type and setup ppc_md accordingly */
smp_init_pseries();
 
+   // Setup CPU hotplug callbacks
+   pseries_cpu_hotplug_init();
 
if (radix_enabled() && !mmu_has_feature(MMU_FTR_GTSE))
if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
-- 
2.41.0



[PATCH v2 4/9] cpu/SMT: Remove topology_smt_supported()

2023-06-28 Thread Laurent Dufour
Since the maximum number of threads is now passed to
cpu_smt_set_num_threads(), checking that value is enough to know if SMT is
supported.

Cc: Michael Ellerman 
Suggested-by: Thomas Gleixner 
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h | 2 --
 arch/x86/kernel/smpboot.c   | 8 
 kernel/cpu.c| 2 +-
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 66927a59e822..87358a8fe843 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,7 +143,6 @@ int topology_update_die_map(unsigned int dieid, unsigned 
int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
-bool topology_smt_supported(void);
 #else
 #define topology_max_packages()(1)
 static inline int
@@ -156,7 +155,6 @@ static inline int topology_phys_to_logical_die(unsigned int 
die,
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
-static inline bool topology_smt_supported(void) { return false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..3052c171668d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -278,14 +278,6 @@ bool topology_is_primary_thread(unsigned int cpu)
return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
 }
 
-/**
- * topology_smt_supported - Check whether SMT is supported by the CPUs
- */
-bool topology_smt_supported(void)
-{
-   return smp_num_siblings > 1;
-}
-
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
diff --git a/kernel/cpu.c b/kernel/cpu.c
index edca8b7bd400..e354af92b2b8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -442,7 +442,7 @@ void __init cpu_smt_set_num_threads(unsigned int 
num_threads,
 {
WARN_ON(!num_threads || (num_threads > max_threads));
 
-   if (!topology_smt_supported())
+   if (max_threads == 1)
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
 
cpu_smt_max_threads = max_threads;
-- 
2.41.0



[PATCH v2 2/9] cpu/SMT: Move smt/control simple exit cases earlier

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Move the simple exit cases, i.e. those which don't depend on the value
written, earlier in the function. That makes it clearer that, regardless of
the input, those states cannot be transitioned out of.

That does have a user-visible effect, in that the error returned will
now always be EPERM/ENODEV for those states, regardless of the value
written. Previously writing an invalid value would return EINVAL even
when in those states.

Signed-off-by: Michael Ellerman 
---
 kernel/cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 237394e0574a..c67049bb3fc8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2482,6 +2482,12 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
 {
int ctrlval, ret;
 
+   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
+   return -EPERM;
+
+   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+   return -ENODEV;
+
if (sysfs_streq(buf, "on"))
ctrlval = CPU_SMT_ENABLED;
else if (sysfs_streq(buf, "off"))
@@ -2491,12 +2497,6 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
else
return -EINVAL;
 
-   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
-   return -EPERM;
-
-   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
-   return -ENODEV;
-
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
-- 
2.41.0
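
The user-visible change is easiest to see with nosmt=force on the kernel
command line: an unrecognised value used to return EINVAL even in that
state, whereas it now fails with EPERM like any other write (hypothetical
transcript):

  # echo banana > /sys/devices/system/cpu/smt/control
  -bash: echo: write error: Operation not permitted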



[PATCH v2 9/9] powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Integrate with the generic SMT support, so that when a CPU is DLPAR
onlined it is brought up with the correct SMT mode.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 61fb7cb00880..e62835a12d73 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -398,6 +398,14 @@ static int dlpar_online_cpu(struct device_node *dn)
for_each_present_cpu(cpu) {
if (get_hard_smp_processor_id(cpu) != thread)
continue;
+
+   if (!topology_is_primary_thread(cpu)) {
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   break;
+   if (!topology_smt_thread_allowed(cpu))
+   break;
+   }
+
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
-- 
2.41.0



[PATCH v2 3/9] cpu/SMT: Store the current/max number of threads

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states at boot time, i.e. when
not all SMT threads are brought online.

To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.

The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architectures like PowerPC allow the number of
threads to be set through the kernel command line, also pass that value.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit message]
[ldufour: rename cpu_smt_check_topology and add a num_threads argument]
Signed-off-by: Laurent Dufour 
---
 arch/x86/kernel/cpu/bugs.c |  3 ++-
 include/linux/cpu_smt.h|  8 ++--
 kernel/cpu.c   | 21 -
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 182af64387d0..ed71ad385ea7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "cpu.h"
 
@@ -133,7 +134,7 @@ void __init check_bugs(void)
 * identify_boot_cpu() initialized SMT support information, let the
 * core code know.
 */
-   cpu_smt_check_topology();
+   cpu_smt_set_num_threads(smp_num_siblings, smp_num_siblings);
 
if (!IS_ENABLED(CONFIG_SMP)) {
pr_info("CPU: ");
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
index 722c2e306fef..0c1664294b57 100644
--- a/include/linux/cpu_smt.h
+++ b/include/linux/cpu_smt.h
@@ -12,15 +12,19 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+extern unsigned int cpu_smt_num_threads;
 extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
+extern void cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads);
 extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+# define cpu_smt_num_threads 1
 static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
+static inline void cpu_smt_set_num_threads(unsigned int num_threads,
+  unsigned int max_threads) { }
 static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c67049bb3fc8..edca8b7bd400 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -415,6 +415,8 @@ void __weak arch_smt_update(void) { }
 #ifdef CONFIG_HOTPLUG_SMT
 
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+static unsigned int cpu_smt_max_threads __ro_after_init;
+unsigned int cpu_smt_num_threads __read_mostly = UINT_MAX;
 
 void __init cpu_smt_disable(bool force)
 {
@@ -428,16 +430,33 @@ void __init cpu_smt_disable(bool force)
pr_info("SMT: disabled\n");
cpu_smt_control = CPU_SMT_DISABLED;
}
+   cpu_smt_num_threads = 1;
 }
 
 /*
  * The decision whether SMT is supported can only be done after the full
  * CPU identification. Called from architecture code.
  */
-void __init cpu_smt_check_topology(void)
+void __init cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads)
 {
+   WARN_ON(!num_threads || (num_threads > max_threads));
+
if (!topology_smt_supported())
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
+   cpu_smt_max_threads = max_threads;
+
+   /*
+* If SMT has been disabled via the kernel command line or SMT is
+* not supported, set cpu_smt_num_threads to 1 for consistency.
+* If enabled, take the architecture requested number of threads
+* to bring up into account.
+*/
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   cpu_smt_num_threads = 1;
+   else if (num_threads < cpu_smt_num_threads)
+   cpu_smt_num_threads = num_threads;
 }
 
 static int __init smt_cmdline_disable(char *str)
-- 
2.41.0



[PATCH v2 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allowing
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support, that can be tweaked in future.

Signed-off-by: Michael Ellerman 
[ldufour: pass current SMT level to cpu_smt_set_num_threads]
[ldufour: remove topology_smt_supported]
[ldufour: remove topology_smt_threads_supported]
[ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC]
[ldufour: update kernel-parameters.txt]
Signed-off-by: Laurent Dufour 
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++--
 arch/powerpc/Kconfig|  2 ++
 arch/powerpc/include/asm/topology.h | 15 +++
 arch/powerpc/kernel/smp.c   |  8 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..5efb6c73a928 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3838,10 +3838,10 @@
nosmp   [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC.  legacy for "maxcpus=0".
 
-   nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
+   nosmt   [KNL,S390,PPC] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
-   [KNL,X86] Disable symmetric multithreading (SMT).
+   [KNL,X86,PPC] Disable symmetric multithreading (SMT).
nosmt=force: Force disable SMT, cannot be undone
 via the sysfs control file.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8b955bc7b59f..bacabc3d7f0c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -273,6 +273,8 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_SMT  if HOTPLUG_CPU
+   select SMT_NUM_THREADS_DYNAMIC
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..f4e6f2dd04b7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -143,5 +143,20 @@ static inline int cpu_to_coregroup_id(int cpu)
 #endif
 #endif
 
+#ifdef CONFIG_HOTPLUG_SMT
+#include <linux/cpu_smt.h>
+#include <asm/cputhreads.h>
+
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpu == cpu_first_thread_sibling(cpu);
+}
+
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 406e6d0ffae3..eb539325dff8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1087,7 +1087,7 @@ static int __init init_big_cores(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-   unsigned int cpu;
+   unsigned int cpu, num_threads;
 
DBG("smp_prepare_cpus\n");
 
@@ -1154,6 +1154,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
if (smp_ops && smp_ops->probe)
smp_ops->probe();
+
+   // Initialise the generic SMT topology support
+   num_threads = 1;
+   if (smt_enabled_at_boot)
+   num_threads = smt_enabled_at_boot;
+   cpu_smt_set_num_threads(num_threads, threads_per_core);
 }
 
 void smp_prepare_boot_cpu(void)
-- 
2.41.0



[PATCH v2 5/9] cpu/SMT: Create topology_smt_thread_allowed()

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures allow partial SMT states, i.e. when not all SMT
threads are brought online.

To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architecture supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting the partial SMT states, there is no need to define
topology_cpu_smt_allowed(), the generic code assumed that all the threads
are allowed or only the primary ones.

Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.

Signed-off-by: Michael Ellerman 
Suggested-by: Thomas Gleixner 
[ldufour: slightly reword the commit's description]
[ldufour: introduce CONFIG_SMT_NUM_THREADS_DYNAMIC]
Signed-off-by: Laurent Dufour 
---
 arch/Kconfig |  3 +++
 kernel/cpu.c | 24 +++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..c69e9c662a87 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -34,6 +34,9 @@ config ARCH_HAS_SUBPAGE_FAULTS
 config HOTPLUG_SMT
bool
 
+config SMT_NUM_THREADS_DYNAMIC
+   bool
+
 config GENERIC_ENTRY
bool
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e354af92b2b8..29bf310651c6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -466,9 +466,23 @@ static int __init smt_cmdline_disable(char *str)
 }
 early_param("nosmt", smt_cmdline_disable);
 
+/*
+ * For architectures supporting partial SMT states, check if the thread is allowed.
+ * Otherwise this has already been checked through cpu_smt_max_threads when
+ * setting the SMT level.
+ */
+static inline bool cpu_smt_thread_allowed(unsigned int cpu)
+{
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+   return topology_smt_thread_allowed(cpu);
+#else
+   return true;
+#endif
+}
+
 static inline bool cpu_smt_allowed(unsigned int cpu)
 {
-   if (cpu_smt_control == CPU_SMT_ENABLED)
+   if (cpu_smt_control == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
return true;
 
if (topology_is_primary_thread(cpu))
@@ -2283,6 +2297,12 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
for_each_online_cpu(cpu) {
if (topology_is_primary_thread(cpu))
continue;
+   /*
+* Disable can be called with CPU_SMT_ENABLED when changing
+* from a higher to lower number of SMT threads per core.
+*/
+   if (ctrlval == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
+   continue;
ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
if (ret)
break;
@@ -2317,6 +2337,8 @@ int cpuhp_smt_enable(void)
/* Skip online CPUs and CPUs on offline nodes */
if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
continue;
+   if (!cpu_smt_thread_allowed(cpu))
+   continue;
ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
if (ret)
break;
-- 
2.41.0



[PATCH v2 1/9] cpu/SMT: Move SMT prototypes into cpu_smt.h

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

In order to export the cpuhp_smt_control enum as part of the interface
between generic and architecture code, the architecture code needs to
include asm/topology.h.

But that leads to circular header dependencies. So split the enum and
related declarations into a separate header.

Signed-off-by: Michael Ellerman 
[ldufour: rewording the commit's description]
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h |  2 ++
 include/linux/cpu.h | 25 +
 include/linux/cpu_smt.h | 29 +
 kernel/cpu.c|  1 +
 4 files changed, 33 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..66927a59e822 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,6 +136,8 @@ static inline int topology_max_smt_threads(void)
return __max_smt_threads;
 }
 
+#include 
+
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 8582a7142623..40548f3c201c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct device_node;
@@ -202,30 +203,6 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
-enum cpuhp_smt_control {
-   CPU_SMT_ENABLED,
-   CPU_SMT_DISABLED,
-   CPU_SMT_FORCE_DISABLED,
-   CPU_SMT_NOT_SUPPORTED,
-   CPU_SMT_NOT_IMPLEMENTED,
-};
-
-#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
-extern enum cpuhp_smt_control cpu_smt_control;
-extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
-extern bool cpu_smt_possible(void);
-extern int cpuhp_smt_enable(void);
-extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
-#else
-# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
-static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
-static inline bool cpu_smt_possible(void) { return false; }
-static inline int cpuhp_smt_enable(void) { return 0; }
-static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
-#endif
-
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
new file mode 100644
index ..722c2e306fef
--- /dev/null
+++ b/include/linux/cpu_smt.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CPU_SMT_H_
+#define _LINUX_CPU_SMT_H_
+
+enum cpuhp_smt_control {
+   CPU_SMT_ENABLED,
+   CPU_SMT_DISABLED,
+   CPU_SMT_FORCE_DISABLED,
+   CPU_SMT_NOT_SUPPORTED,
+   CPU_SMT_NOT_IMPLEMENTED,
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
+extern enum cpuhp_smt_control cpu_smt_control;
+extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
+extern int cpuhp_smt_enable(void);
+extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
+#else
+# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
+static inline int cpuhp_smt_enable(void) { return 0; }
+static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
+#endif
+
+#endif /* _LINUX_CPU_SMT_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f4a2c5845bcb..237394e0574a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -413,6 +413,7 @@ static void lockdep_release_cpus_lock(void)
 void __weak arch_smt_update(void) { }
 
 #ifdef CONFIG_HOTPLUG_SMT
+
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
-- 
2.41.0



[PATCH v2 6/9] cpu/SMT: Allow enabling partial SMT states via sysfs

2023-06-28 Thread Laurent Dufour
From: Michael Ellerman 

Add support to the /sys/devices/system/cpu/smt/control interface for
enabling a specified number of SMT threads per core, including partial
SMT states where not all threads are brought online.

The current interface accepts "on" and "off", to enable either 1 or all
SMT threads per core.

This commit allows writing an integer, between 1 and the number of SMT
threads supported by the machine. Writing 1 is a synonym for "off", 2 or
more enables SMT with the specified number of threads.

When reading the file, if all threads are online "on" is returned, to
avoid changing behaviour for existing users. If some other number of
threads is online then the integer value is returned.

Architectures like x86, which only support 1 thread or all threads, should
not define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architectures supporting partial
SMT states, like PowerPC, should define it.

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit's description]
[ldufour: remove switch() in __store_smt_control()]
Signed-off-by: Laurent Dufour 
---
 .../ABI/testing/sysfs-devices-system-cpu  |  1 +
 kernel/cpu.c  | 58 ++-
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index f54867cadb0f..3c4cfb59d495 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -555,6 +555,7 @@ Description:Control Symmetric Multi Threading (SMT)
  
=
 "on" SMT is enabled
 "off"SMT is disabled
+""SMT is enabled with N threads per 
core.
 "forceoff"   SMT is force disabled. Cannot be 
changed.
 "notsupported"   SMT is not supported by the CPU
 "notimplemented" SMT runtime toggling is not
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 29bf310651c6..e0111c1a2f98 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2517,11 +2517,19 @@ static const struct attribute_group 
cpuhp_cpu_root_attr_group = {
 
 #ifdef CONFIG_HOTPLUG_SMT
 
+static bool cpu_smt_num_threads_valid(unsigned int threads)
+{
+   if (IS_ENABLED(CONFIG_SMT_NUM_THREADS_DYNAMIC))
+   return threads >= 1 && threads <= cpu_smt_max_threads;
+   return threads == 1 || threads == cpu_smt_max_threads;
+}
+
 static ssize_t
 __store_smt_control(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
 {
-   int ctrlval, ret;
+   int ctrlval, ret, num_threads, orig_threads;
+   bool force_off;
 
if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
return -EPERM;
@@ -2529,30 +2537,39 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
return -ENODEV;
 
-   if (sysfs_streq(buf, "on"))
+   if (sysfs_streq(buf, "on")) {
ctrlval = CPU_SMT_ENABLED;
-   else if (sysfs_streq(buf, "off"))
+   num_threads = cpu_smt_max_threads;
+   } else if (sysfs_streq(buf, "off")) {
ctrlval = CPU_SMT_DISABLED;
-   else if (sysfs_streq(buf, "forceoff"))
+   num_threads = 1;
+   } else if (sysfs_streq(buf, "forceoff")) {
ctrlval = CPU_SMT_FORCE_DISABLED;
-   else
+   num_threads = 1;
+   } else if (kstrtoint(buf, 10, &num_threads) == 0) {
+   if (num_threads == 1)
+   ctrlval = CPU_SMT_DISABLED;
+   else if (cpu_smt_num_threads_valid(num_threads))
+   ctrlval = CPU_SMT_ENABLED;
+   else
+   return -EINVAL;
+   } else {
return -EINVAL;
+   }
 
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
 
-   if (ctrlval != cpu_smt_control) {
-   switch (ctrlval) {
-   case CPU_SMT_ENABLED:
-   ret = cpuhp_smt_enable();
-   break;
-   case CPU_SMT_DISABLED:
-   case CPU_SMT_FORCE_DISABLED:
-   ret = cpuhp_smt_disable(ctrlval);
-   break;
-   }
-   }
+   orig_threads = cpu_smt_num_threads;
+   cpu_smt_num_threads = num_threads;
+
+   force_off = ctrlval != cpu_smt_control && ctrlval == 
CPU_SMT_FORCE_DISABLED;
+
+   if (num_threads > orig_threads)
+   ret = cpuhp_smt_enable();
+   else if (num_

[PATCH 04/10] cpu/SMT: Remove topology_smt_supported()

2023-06-15 Thread Laurent Dufour
Since the maximum number of threads is now passed to
cpu_smt_set_num_threads(), checking that value is enough to know if SMT is
supported.

Cc: Michael Ellerman 
Suggested-by: Thomas Gleixner 
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/topology.h | 2 --
 arch/x86/kernel/smpboot.c   | 8 
 kernel/cpu.c| 2 +-
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 66927a59e822..87358a8fe843 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,7 +143,6 @@ int topology_update_die_map(unsigned int dieid, unsigned 
int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
-bool topology_smt_supported(void);
 #else
 #define topology_max_packages()(1)
 static inline int
@@ -156,7 +155,6 @@ static inline int topology_phys_to_logical_die(unsigned int 
die,
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
-static inline bool topology_smt_supported(void) { return false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..3052c171668d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -278,14 +278,6 @@ bool topology_is_primary_thread(unsigned int cpu)
return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
 }
 
-/**
- * topology_smt_supported - Check whether SMT is supported by the CPUs
- */
-bool topology_smt_supported(void)
-{
-   return smp_num_siblings > 1;
-}
-
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
diff --git a/kernel/cpu.c b/kernel/cpu.c
index edca8b7bd400..e354af92b2b8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -442,7 +442,7 @@ void __init cpu_smt_set_num_threads(unsigned int 
num_threads,
 {
WARN_ON(!num_threads || (num_threads > max_threads));
 
-   if (!topology_smt_supported())
+   if (max_threads == 1)
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
 
cpu_smt_max_threads = max_threads;
-- 
2.41.0



[PATCH 08/10] powerpc/pseries: Initialise CPU hotplug callbacks earlier

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

As part of the generic HOTPLUG_SMT code, there is support for disabling
secondary SMT threads at boot time, by passing "nosmt" on the kernel
command line.

The way that is implemented is the secondary threads are brought partly
online, and then taken back offline again. That is done to support x86
CPUs needing certain initialisation done on all threads. However powerpc
has similar needs, see commit d70a54e2d085 ("powerpc/powernv: Ignore
smt-enabled on Power8 and later").

For that to work the powerpc CPU hotplug callbacks need to be registered
before secondary CPUs are brought online, otherwise __cpu_disable()
fails due to smp_ops->cpu_disable being NULL.

So split the basic initialisation into pseries_cpu_hotplug_init() which
can be called early from setup_arch(). The DLPAR related initialisation
can still be done later, because it needs to do allocations.
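
The resulting initialisation order is roughly (a sketch of the call flow,
not actual code):

	pSeries_setup_arch()
	   pseries_cpu_hotplug_init()    /* set smp_ops->cpu_disable etc. */
	...
	secondary CPUs brought up ("nosmt" handling can now offline them)
	...
	machine_arch_initcall()
	   pseries_dlpar_init()          /* DLPAR setup, needs allocations */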

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 22 
 arch/powerpc/platforms/pseries/pseries.h |  2 ++
 arch/powerpc/platforms/pseries/setup.c   |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1a3cb313976a..61fb7cb00880 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -845,15 +845,9 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
 };
 
-static int __init pseries_cpu_hotplug_init(void)
+void __init pseries_cpu_hotplug_init(void)
 {
int qcss_tok;
-   unsigned int node;
-
-#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
-   ppc_md.cpu_probe = dlpar_cpu_probe;
-   ppc_md.cpu_release = dlpar_cpu_release;
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
rtas_stop_self_token = rtas_function_token(RTAS_FN_STOP_SELF);
qcss_tok = rtas_function_token(RTAS_FN_QUERY_CPU_STOPPED_STATE);
@@ -862,12 +856,22 @@ static int __init pseries_cpu_hotplug_init(void)
qcss_tok == RTAS_UNKNOWN_SERVICE) {
printk(KERN_INFO "CPU Hotplug not supported by firmware "
"- disabling.\n");
-   return 0;
+   return;
}
 
smp_ops->cpu_offline_self = pseries_cpu_offline_self;
smp_ops->cpu_disable = pseries_cpu_disable;
smp_ops->cpu_die = pseries_cpu_die;
+}
+
+static int __init pseries_dlpar_init(void)
+{
+   unsigned int node;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+   ppc_md.cpu_probe = dlpar_cpu_probe;
+   ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
/* Processors can be added/removed only on LPAR */
if (firmware_has_feature(FW_FEATURE_LPAR)) {
@@ -886,4 +890,4 @@ static int __init pseries_cpu_hotplug_init(void)
 
return 0;
 }
-machine_arch_initcall(pseries, pseries_cpu_hotplug_init);
+machine_arch_initcall(pseries, pseries_dlpar_init);
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index f8bce40ebd0c..f8893ba46e83 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -75,11 +75,13 @@ static inline int dlpar_hp_pmem(struct pseries_hp_errorlog 
*hp_elog)
 
 #ifdef CONFIG_HOTPLUG_CPU
 int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
+void pseries_cpu_hotplug_init(void);
 #else
 static inline int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 {
return -EOPNOTSUPP;
 }
+static inline void pseries_cpu_hotplug_init(void) { }
 #endif
 
 /* PCI root bridge prepare function override for pseries */
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e2a57cfa6c83..41451b76c6e5 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -816,6 +816,8 @@ static void __init pSeries_setup_arch(void)
/* Discover PIC type and setup ppc_md accordingly */
smp_init_pseries();
 
+   // Setup CPU hotplug callbacks
+   pseries_cpu_hotplug_init();
 
if (radix_enabled() && !mmu_has_feature(MMU_FTR_GTSE))
if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
-- 
2.41.0



[PATCH 01/10] cpu/SMT: Move SMT prototypes into cpu_smt.h

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

A subsequent patch would like to use the cpuhp_smt_control enum as part
of the interface between generic and arch code.

Currently that leads to circular header dependencies. So split the enum
and related declarations into a separate header.

Signed-off-by: Michael Ellerman 
---
 arch/x86/include/asm/topology.h |  2 ++
 include/linux/cpu.h | 25 +
 include/linux/cpu_smt.h | 29 +
 kernel/cpu.c|  1 +
 4 files changed, 33 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..66927a59e822 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,6 +136,8 @@ static inline int topology_max_smt_threads(void)
return __max_smt_threads;
 }
 
+#include <linux/cpu_smt.h>
+
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 8582a7142623..40548f3c201c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/cpu_smt.h>
 
 struct device;
 struct device_node;
@@ -202,30 +203,6 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
-enum cpuhp_smt_control {
-   CPU_SMT_ENABLED,
-   CPU_SMT_DISABLED,
-   CPU_SMT_FORCE_DISABLED,
-   CPU_SMT_NOT_SUPPORTED,
-   CPU_SMT_NOT_IMPLEMENTED,
-};
-
-#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
-extern enum cpuhp_smt_control cpu_smt_control;
-extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
-extern bool cpu_smt_possible(void);
-extern int cpuhp_smt_enable(void);
-extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
-#else
-# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
-static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
-static inline bool cpu_smt_possible(void) { return false; }
-static inline int cpuhp_smt_enable(void) { return 0; }
-static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
-#endif
-
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
new file mode 100644
index ..722c2e306fef
--- /dev/null
+++ b/include/linux/cpu_smt.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CPU_SMT_H_
+#define _LINUX_CPU_SMT_H_
+
+enum cpuhp_smt_control {
+   CPU_SMT_ENABLED,
+   CPU_SMT_DISABLED,
+   CPU_SMT_FORCE_DISABLED,
+   CPU_SMT_NOT_SUPPORTED,
+   CPU_SMT_NOT_IMPLEMENTED,
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
+extern enum cpuhp_smt_control cpu_smt_control;
+extern void cpu_smt_disable(bool force);
+extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
+extern int cpuhp_smt_enable(void);
+extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
+#else
+# define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+static inline void cpu_smt_disable(bool force) { }
+static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
+static inline int cpuhp_smt_enable(void) { return 0; }
+static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
+#endif
+
+#endif /* _LINUX_CPU_SMT_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f4a2c5845bcb..237394e0574a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -413,6 +413,7 @@ static void lockdep_release_cpus_lock(void)
 void __weak arch_smt_update(void) { }
 
 #ifdef CONFIG_HOTPLUG_SMT
+
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
-- 
2.41.0



[PATCH 03/10] cpu/SMT: Store the current/max number of threads

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

Some architectures (ppc64) allow partial SMT states at boot time, i.e. when
not all SMT threads are brought online.

To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.

The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architectures, like PowerPC, allow the number of
threads to be set through the kernel command line, also pass that value.
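
For illustration, architecture code is then expected to make a call along
these lines (a sketch mirroring the powerpc patch later in this series; x86
simply passes smp_num_siblings for both arguments):

	unsigned int num_threads = smt_enabled_at_boot ? smt_enabled_at_boot : 1;

	cpu_smt_set_num_threads(num_threads, threads_per_core);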

Signed-off-by: Michael Ellerman 
[ldufour: slightly reword the commit message]
[ldufour: rename cpu_smt_check_topology and add a num_threads argument]
Signed-off-by: Laurent Dufour 
---
 arch/x86/kernel/cpu/bugs.c |  3 ++-
 include/linux/cpu_smt.h|  8 ++--
 kernel/cpu.c   | 21 -
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 182af64387d0..ed71ad385ea7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include <linux/cpu_smt.h>
 
 #include "cpu.h"
 
@@ -133,7 +134,7 @@ void __init check_bugs(void)
 * identify_boot_cpu() initialized SMT support information, let the
 * core code know.
 */
-   cpu_smt_check_topology();
+   cpu_smt_set_num_threads(smp_num_siblings, smp_num_siblings);
 
if (!IS_ENABLED(CONFIG_SMP)) {
pr_info("CPU: ");
diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h
index 722c2e306fef..0c1664294b57 100644
--- a/include/linux/cpu_smt.h
+++ b/include/linux/cpu_smt.h
@@ -12,15 +12,19 @@ enum cpuhp_smt_control {
 
 #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
 extern enum cpuhp_smt_control cpu_smt_control;
+extern unsigned int cpu_smt_num_threads;
 extern void cpu_smt_disable(bool force);
-extern void cpu_smt_check_topology(void);
+extern void cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads);
 extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control   (CPU_SMT_NOT_IMPLEMENTED)
+# define cpu_smt_num_threads 1
 static inline void cpu_smt_disable(bool force) { }
-static inline void cpu_smt_check_topology(void) { }
+static inline void cpu_smt_set_num_threads(unsigned int num_threads,
+  unsigned int max_threads) { }
 static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 
0; }
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c67049bb3fc8..edca8b7bd400 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -415,6 +415,8 @@ void __weak arch_smt_update(void) { }
 #ifdef CONFIG_HOTPLUG_SMT
 
 enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
+static unsigned int cpu_smt_max_threads __ro_after_init;
+unsigned int cpu_smt_num_threads __read_mostly = UINT_MAX;
 
 void __init cpu_smt_disable(bool force)
 {
@@ -428,16 +430,33 @@ void __init cpu_smt_disable(bool force)
pr_info("SMT: disabled\n");
cpu_smt_control = CPU_SMT_DISABLED;
}
+   cpu_smt_num_threads = 1;
 }
 
 /*
  * The decision whether SMT is supported can only be done after the full
  * CPU identification. Called from architecture code.
  */
-void __init cpu_smt_check_topology(void)
+void __init cpu_smt_set_num_threads(unsigned int num_threads,
+   unsigned int max_threads)
 {
+   WARN_ON(!num_threads || (num_threads > max_threads));
+
if (!topology_smt_supported())
cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
+
+   cpu_smt_max_threads = max_threads;
+
+   /*
+* If SMT has been disabled via the kernel command line or SMT is
+* not supported, set cpu_smt_num_threads to 1 for consistency.
+* If enabled, take the architecture requested number of threads
+* to bring up into account.
+*/
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   cpu_smt_num_threads = 1;
+   else if (num_threads < cpu_smt_num_threads)
+   cpu_smt_num_threads = num_threads;
 }
 
 static int __init smt_cmdline_disable(char *str)
-- 
2.41.0



[PATCH 09/10] powerpc: Add HOTPLUG_SMT support

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
parameter.

Implement the recently added hooks to allow partial SMT states, allowing
any number of threads per core.

Tie the config symbol to HOTPLUG_CPU, which enables it on the major
platforms that support SMT. If there are other platforms that want the
SMT support that can be tweaked in future.
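
As an illustration, booting a hypothetical SMT8 LPAR with 'smt-enabled=4'
should now be reflected by the generic sysfs interface:

  # cat /proc/cmdline
  ... smt-enabled=4 ...
  # cat /sys/devices/system/cpu/smt/control
  4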

Signed-off-by: Michael Ellerman 
[ldufour: pass current SMT level to cpu_smt_set_num_threads]
[ldufour: remove topology_smt_supported]
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/Kconfig|  1 +
 arch/powerpc/include/asm/topology.h | 20 
 arch/powerpc/kernel/smp.c   |  8 +++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 539d1f03ff42..5cf87ca10a9c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -273,6 +273,7 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_SMT  if HOTPLUG_CPU
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 8a4d4f4d9749..7602f17d688a 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -143,5 +143,25 @@ static inline int cpu_to_coregroup_id(int cpu)
 #endif
 #endif
 
+#ifdef CONFIG_HOTPLUG_SMT
+#include <linux/cpu_smt.h>
+#include <asm/cputhreads.h>
+
+static inline bool topology_smt_threads_supported(unsigned int num_threads)
+{
+   return num_threads <= threads_per_core;
+}
+
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpu == cpu_first_thread_sibling(cpu);
+}
+
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 265801a3e94c..cdb77d36cdd0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1087,7 +1087,7 @@ static int __init init_big_cores(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-   unsigned int cpu;
+   unsigned int cpu, num_threads;
 
DBG("smp_prepare_cpus\n");
 
@@ -1154,6 +1154,12 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
if (smp_ops && smp_ops->probe)
smp_ops->probe();
+
+   // Initialise the generic SMT topology support
+   num_threads = 1;
+   if (smt_enabled_at_boot)
+   num_threads = smt_enabled_at_boot;
+   cpu_smt_set_num_threads(num_threads, threads_per_core);
 }
 
 void smp_prepare_boot_cpu(void)
-- 
2.41.0



[PATCH 07/10] cpu/SMT: Allow enabling partial SMT states via sysfs

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

Add support to the /sys/devices/system/cpu/smt/control interface for
enabling a specified number of SMT threads per core, including partial
SMT states where not all threads are brought online.

The current interface accepts "on" and "off", to enable either 1 or all
SMT threads per core.

This commit allows writing an integer, between 1 and the number of SMT
threads supported by the machine. Writing 1 is a synonym for "off"; writing
2 or more enables SMT with the specified number of threads.

When reading the file, if all threads are online "on" is returned, to
avoid changing behaviour for existing users. If some other number of
threads is online then the integer value is returned.

There is a hook which allows arch code to control how many threads per
core are supported. To retain the existing behaviour, the x86 hook only
supports 1 thread or all threads.

Signed-off-by: Michael Ellerman 
---
 .../ABI/testing/sysfs-devices-system-cpu  |  1 +
 kernel/cpu.c  | 39 ---
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index f54867cadb0f..3c4cfb59d495 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -555,6 +555,7 @@ Description:Control Symmetric Multi Threading (SMT)
  
=
 "on" SMT is enabled
 "off"SMT is disabled
+""SMT is enabled with N threads per 
core.
 "forceoff"   SMT is force disabled. Cannot be 
changed.
 "notsupported"   SMT is not supported by the CPU
 "notimplemented" SMT runtime toggling is not
diff --git a/kernel/cpu.c b/kernel/cpu.c
index ae2fa26a5b63..248f0734098a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2507,7 +2507,7 @@ static ssize_t
 __store_smt_control(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
 {
-   int ctrlval, ret;
+   int ctrlval, ret, num_threads, orig_threads;
 
if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
return -EPERM;
@@ -2515,20 +2515,38 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
return -ENODEV;
 
-   if (sysfs_streq(buf, "on"))
+   if (sysfs_streq(buf, "on")) {
ctrlval = CPU_SMT_ENABLED;
-   else if (sysfs_streq(buf, "off"))
+   num_threads = cpu_smt_max_threads;
+   } else if (sysfs_streq(buf, "off")) {
ctrlval = CPU_SMT_DISABLED;
-   else if (sysfs_streq(buf, "forceoff"))
+   num_threads = 1;
+   } else if (sysfs_streq(buf, "forceoff")) {
ctrlval = CPU_SMT_FORCE_DISABLED;
-   else
+   num_threads = 1;
+   } else if (kstrtoint(buf, 10, &num_threads) == 0) {
+   if (num_threads == 1)
+   ctrlval = CPU_SMT_DISABLED;
+   else if (num_threads > 1 && 
topology_smt_threads_supported(num_threads))
+   ctrlval = CPU_SMT_ENABLED;
+   else
+   return -EINVAL;
+   } else {
return -EINVAL;
+   }
 
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
 
-   if (ctrlval != cpu_smt_control) {
+   orig_threads = cpu_smt_num_threads;
+   cpu_smt_num_threads = num_threads;
+
+   if (num_threads > orig_threads) {
+   ret = cpuhp_smt_enable();
+   } else if (num_threads < orig_threads) {
+   ret = cpuhp_smt_disable(ctrlval);
+   } else if (ctrlval != cpu_smt_control) {
switch (ctrlval) {
case CPU_SMT_ENABLED:
ret = cpuhp_smt_enable();
@@ -2566,6 +2584,15 @@ static ssize_t control_show(struct device *dev,
 {
const char *state = smt_states[cpu_smt_control];
 
+   /*
+* If SMT is enabled but not all threads are enabled then show the
+* number of threads. If all threads are enabled show "on". Otherwise
+* show the state name.
+*/
+   if (cpu_smt_control == CPU_SMT_ENABLED &&
+   cpu_smt_num_threads != cpu_smt_max_threads)
+   return sysfs_emit(buf, "%d\n", cpu_smt_num_threads);
+
return snprintf(buf, PAGE_SIZE - 2, "%s\n", state);
 }
 
-- 
2.41.0



[PATCH 02/10] cpu/SMT: Move smt/control simple exit cases earlier

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

Move the simple exit cases, i.e. those which don't depend on the value
written, earlier in the function. That makes it clearer that regardless of
the input those states cannot be transitioned out of.

That does have a user-visible effect, in that the error returned will
now always be EPERM/ENODEV for those states, regardless of the value
written. Previously writing an invalid value would return EINVAL even
when in those states.
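
For instance, once SMT has been force-disabled, any write now fails the
same way (a sketch):

  # echo forceoff > /sys/devices/system/cpu/smt/control
  # echo on > /sys/devices/system/cpu/smt/control
  -bash: echo: write error: Operation not permitted
  # echo bogus > /sys/devices/system/cpu/smt/control
  -bash: echo: write error: Operation not permitted   (previously EINVAL)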

Signed-off-by: Michael Ellerman 
---
 kernel/cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 237394e0574a..c67049bb3fc8 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2482,6 +2482,12 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
 {
int ctrlval, ret;
 
+   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
+   return -EPERM;
+
+   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+   return -ENODEV;
+
if (sysfs_streq(buf, "on"))
ctrlval = CPU_SMT_ENABLED;
else if (sysfs_streq(buf, "off"))
@@ -2491,12 +2497,6 @@ __store_smt_control(struct device *dev, struct 
device_attribute *attr,
else
return -EINVAL;
 
-   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
-   return -EPERM;
-
-   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
-   return -ENODEV;
-
ret = lock_device_hotplug_sysfs();
if (ret)
return ret;
-- 
2.41.0



[PATCH 00/10] Introduce SMT level and add PowerPC support

2023-06-15 Thread Laurent Dufour
I'm taking over the series Michael sent previously [1], which smartly
reworks the initial series I sent [2].  This series addresses the
comments sent by Thomas and me on Michael's one.

Here is a short introduction to the issue this series is addressing:

When a new CPU is added, the kernel activates all its threads. This leads
to a weird, but functional, result when adding a CPU on an SMT 4 system,
for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4     5     6     7
Core   1:    8*    9*   10*   11*   12*   13*   14*   15*

This mixed SMT level may confuse end users and/or some applications.
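
With this series applied, the same hot add is expected to honour the
current SMT level instead, giving for instance (hypothetical output):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:    0*    1*    2*    3*    4     5     6     7
Core   1:    8*    9*   10*   11*   12    13    14    15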

There is no SMT level recorded in the kernel (common code), nor in user
space, as far as I know. Such a level is helpful when adding a new CPU or
when optimizing energy efficiency (when reactivating CPUs).

When SMP and HOTPLUG_SMT are defined, this series adds a new SMT level
(cpu_smt_num_threads) and a few callbacks allowing the architecture code to
finely control this value, setting a max and an "at boot" level, and
controlling whether a thread should be onlined or not.

[1] 
https://lore.kernel.org/linuxppc-dev/20230524155630.794584-1-...@ellerman.id.au/
[2] 
https://lore.kernel.org/linuxppc-dev/20230331153905.31698-1-lduf...@linux.ibm.com/

Laurent Dufour (1):
  cpu/SMT: Remove topology_smt_supported()

Michael Ellerman (9):
  cpu/SMT: Move SMT prototypes into cpu_smt.h
  cpu/SMT: Move smt/control simple exit cases earlier
  cpu/SMT: Store the current/max number of threads
  cpu/SMT: Create topology_smt_threads_supported()
  cpu/SMT: Create topology_smt_thread_allowed()
  cpu/SMT: Allow enabling partial SMT states via sysfs
  powerpc/pseries: Initialise CPU hotplug callbacks earlier
  powerpc: Add HOTPLUG_SMT support
  powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

 .../ABI/testing/sysfs-devices-system-cpu  |  1 +
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/topology.h   | 20 +
 arch/powerpc/kernel/smp.c |  8 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  | 30 +--
 arch/powerpc/platforms/pseries/pseries.h  |  2 +
 arch/powerpc/platforms/pseries/setup.c|  2 +
 arch/x86/include/asm/topology.h   |  8 +-
 arch/x86/kernel/cpu/bugs.c|  3 +-
 arch/x86/kernel/smpboot.c | 25 +-
 include/linux/cpu.h   | 25 +-
 include/linux/cpu_smt.h   | 33 
 kernel/cpu.c  | 83 +++
 13 files changed, 187 insertions(+), 54 deletions(-)
 create mode 100644 include/linux/cpu_smt.h

-- 
2.41.0



[PATCH 06/10] cpu/SMT: Create topology_smt_thread_allowed()

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

A subsequent patch will enable partial SMT states, i.e. when not all SMT
threads are brought online.

To support that, add an arch helper which checks whether a given CPU is
allowed to be brought online depending on how many SMT threads are
currently enabled.

Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.
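
For illustration, the powerpc implementation added later in this series
reduces to a single comparison against the current SMT level (sketch):

	static inline bool topology_smt_thread_allowed(unsigned int cpu)
	{
		/* Threads 0 .. cpu_smt_num_threads - 1 of a core may be online. */
		return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
	}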

Signed-off-by: Michael Ellerman 
---
 arch/x86/include/asm/topology.h |  2 ++
 arch/x86/kernel/smpboot.c   | 15 +++
 kernel/cpu.c| 10 +-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 232df5ffab34..4696d4566cb5 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -144,6 +144,7 @@ int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_threads_supported(unsigned int threads);
+bool topology_smt_thread_allowed(unsigned int cpu);
 #else
 #define topology_max_packages()(1)
 static inline int
@@ -157,6 +158,7 @@ static inline int topology_max_die_per_package(void) { 
return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
 static inline bool topology_smt_threads_supported(unsigned int threads) { 
return false; }
+static inline bool topology_smt_thread_allowed(unsigned int cpu) { return 
false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d163ef55577b..cfae55c2d1b0 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -290,6 +290,21 @@ bool topology_smt_threads_supported(unsigned int threads)
return threads == 1 || threads == smp_num_siblings;
 }
 
+/**
+ * topology_smt_thread_allowed - When enabling SMT check whether this 
particular
+ *  CPU thread is allowed to be brought online.
+ * @cpu:   CPU to check
+ */
+bool topology_smt_thread_allowed(unsigned int cpu)
+{
+   /*
+* No extra logic s required here to support different thread values
+* because threads will always == 1 or smp_num_siblings because of
+* topology_smt_threads_supported().
+*/
+   return true;
+}
+
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e354af92b2b8..ae2fa26a5b63 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -468,7 +468,7 @@ early_param("nosmt", smt_cmdline_disable);
 
 static inline bool cpu_smt_allowed(unsigned int cpu)
 {
-   if (cpu_smt_control == CPU_SMT_ENABLED)
+   if (cpu_smt_control == CPU_SMT_ENABLED && 
topology_smt_thread_allowed(cpu))
return true;
 
if (topology_is_primary_thread(cpu))
@@ -2283,6 +2283,12 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
for_each_online_cpu(cpu) {
if (topology_is_primary_thread(cpu))
continue;
+   /*
+* Disable can be called with CPU_SMT_ENABLED when changing
+* from a higher to lower number of SMT threads per core.
+*/
+   if (ctrlval == CPU_SMT_ENABLED && 
topology_smt_thread_allowed(cpu))
+   continue;
ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
if (ret)
break;
@@ -2317,6 +2323,8 @@ int cpuhp_smt_enable(void)
/* Skip online CPUs and CPUs on offline nodes */
if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
continue;
+   if (!topology_smt_thread_allowed(cpu))
+   continue;
ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
if (ret)
break;
-- 
2.41.0



[PATCH 10/10] powerpc/pseries: Honour current SMT state when DLPAR onlining CPUs

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

Integrate with the generic SMT support, so that when a CPU is DLPAR
onlined it is brought up with the correct SMT mode.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 61fb7cb00880..e62835a12d73 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -398,6 +398,14 @@ static int dlpar_online_cpu(struct device_node *dn)
for_each_present_cpu(cpu) {
if (get_hard_smp_processor_id(cpu) != thread)
continue;
+
+   if (!topology_is_primary_thread(cpu)) {
+   if (cpu_smt_control != CPU_SMT_ENABLED)
+   break;
+   if (!topology_smt_thread_allowed(cpu))
+   break;
+   }
+
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
rc = device_online(get_cpu_device(cpu));
-- 
2.41.0



[PATCH 05/10] cpu/SMT: Create topology_smt_threads_supported()

2023-06-15 Thread Laurent Dufour
From: Michael Ellerman 

A subsequent patch will enable partial SMT states, i.e. when not all SMT
threads are brought online.

To support that, add an arch helper to check how many SMT threads are
supported.

To retain existing behaviour, the x86 implementation only allows a
single thread or all threads to be online.
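
By contrast, an architecture that does support partial SMT (powerpc, later
in this series) can simply bound-check the requested value (sketch):

	static inline bool topology_smt_threads_supported(unsigned int num_threads)
	{
		/* Any value up to the hardware threads-per-core is acceptable. */
		return num_threads <= threads_per_core;
	}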

Signed-off-by: Michael Ellerman 
---
 arch/x86/include/asm/topology.h |  2 ++
 arch/x86/kernel/smpboot.c   | 12 
 2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 87358a8fe843..232df5ffab34 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,6 +143,7 @@ int topology_update_die_map(unsigned int dieid, unsigned 
int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
+bool topology_smt_threads_supported(unsigned int threads);
 #else
 #define topology_max_packages()(1)
 static inline int
@@ -155,6 +156,7 @@ static inline int topology_phys_to_logical_die(unsigned int 
die,
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
+static inline bool topology_smt_threads_supported(unsigned int threads) { 
return false; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3052c171668d..d163ef55577b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -278,6 +278,18 @@ bool topology_is_primary_thread(unsigned int cpu)
return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
 }
 
+/**
+ * topology_smt_threads_supported - Check if the given number of SMT threads
+ * is supported.
+ *
+ * @threads:   The number of SMT threads.
+ */
+bool topology_smt_threads_supported(unsigned int threads)
+{
+   // Only support a single thread or all threads.
+   return threads == 1 || threads == smp_num_siblings;
+}
+
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
-- 
2.41.0



Re: [PATCH 3/9] cpu/SMT: Store the current/max number of threads

2023-06-14 Thread Laurent Dufour
On 13/06/2023 20:53:56, Thomas Gleixner wrote:
> On Tue, Jun 13 2023 at 19:16, Laurent Dufour wrote:
>> On 10/06/2023 23:26:18, Thomas Gleixner wrote:
>>> On Thu, May 25 2023 at 01:56, Michael Ellerman wrote:
>>>>  #ifdef CONFIG_HOTPLUG_SMT
>>>>  enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
>>>> +static unsigned int cpu_smt_max_threads __ro_after_init;
>>>> +unsigned int cpu_smt_num_threads;
>>>
>>> Why needs this to be global? cpu_smt_control is pointlessly global already.
>>
>> I agree that cpu_smt_*_threads should be static.

I spoke too quickly, cpu_smt_num_threads is used in the powerpc code.

When a new CPU is added it is used to decide whether a thread has to be
onlined or not, and there is no way to pass it as an argument at this time.
In detail, it is used in topology_smt_thread_allowed() called by
dlpar_online_cpu() (see patch "powerpc/pseries: Honour current SMT state
when DLPAR onlining CPUs" at the end of this series).

I think the best option is to keep it global.

>>
>> However, regarding cpu_smt_control, it is used in 2 places in the x86 code:
>>  - arch/x86/power/hibernate.c in arch_resume_nosmt()
>>  - arch/x86/kernel/cpu/bugs.c in spectre_v2_user_select_mitigation()
> 
> Bah. I must have fatfingered the grep then.
> 
>> An accessor function may be introduced to read that value in these 2
>> functions, but I'm wondering if that's really the best option.
>>
>> Unless there is a real need to change this through this series, I think
>> cpu_smt_control can remain global.
> 
> That's fine.
> 
> Thanks,
> 
> tglx



Re: [PATCH 3/9] cpu/SMT: Store the current/max number of threads

2023-06-13 Thread Laurent Dufour
On 10/06/2023 23:26:18, Thomas Gleixner wrote:
> On Thu, May 25 2023 at 01:56, Michael Ellerman wrote:
>>  #ifdef CONFIG_HOTPLUG_SMT
>>  enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
>> +static unsigned int cpu_smt_max_threads __ro_after_init;
>> +unsigned int cpu_smt_num_threads;
> 
> Why needs this to be global? cpu_smt_control is pointlessly global already.

I agree that cpu_smt_*_threads should be static.

However, regarding cpu_smt_control, it is used in 2 places in the x86 code:
 - arch/x86/power/hibernate.c in arch_resume_nosmt()
 - arch/x86/kernel/cpu/bugs.c in spectre_v2_user_select_mitigation()

An accessor function may be introduced to read that value in these 2
functions, but I'm wondering if that's really the best option.

Unless there is a real need to change this through this series, I think
cpu_smt_control can remain global.

Thomas, are you ok with that?

> 
>>  void __init cpu_smt_disable(bool force)
>>  {
>> @@ -433,10 +435,18 @@ void __init cpu_smt_disable(bool force)
>>   * The decision whether SMT is supported can only be done after the full
>>   * CPU identification. Called from architecture code.
>>   */
>> -void __init cpu_smt_check_topology(void)
>> +void __init cpu_smt_check_topology(unsigned int num_threads)
>>  {
>>  if (!topology_smt_supported())
>>  cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
>> +
>> +cpu_smt_max_threads = num_threads;
>> +
>> +// May already be disabled by nosmt command line parameter
>> +if (cpu_smt_control != CPU_SMT_ENABLED)
>> +cpu_smt_num_threads = 1;
>> +else
>> +cpu_smt_num_threads = num_threads;
> 
> Taking Laurents findings into account this should be something like
> the incomplete below.
> 
> x86 would simply invoke cpu_smt_set_num_threads() with both arguments as
> smp_num_siblings while PPC can funnel its command line parameter through
> the num_threads argument.

I do prefer cpu_smt_set_num_threads() also.

Thanks,
Laurent

> 
> Thanks,
> 
> tglx
> ---
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -414,6 +414,8 @@ void __weak arch_smt_update(void) { }
>  
>  #ifdef CONFIG_HOTPLUG_SMT
>  enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
> +static unsigned int cpu_smt_max_threads __ro_after_init;
> +static unsigned int cpu_smt_num_threads = UINT_MAX;
>  
>  void __init cpu_smt_disable(bool force)
>  {
> @@ -427,24 +429,31 @@ void __init cpu_smt_disable(bool force)
>   pr_info("SMT: disabled\n");
>   cpu_smt_control = CPU_SMT_DISABLED;
>   }
> + cpu_smt_num_threads = 1;
>  }
>  
>  /*
>   * The decision whether SMT is supported can only be done after the full
>   * CPU identification. Called from architecture code.
>   */
> -void __init cpu_smt_check_topology(void)
> +void __init cpu_smt_set_num_threads(unsigned int max_threads, unsigned int 
> num_threads)
>  {
> - if (!topology_smt_supported())
> + if (max_threads == 1)
>   cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
> -}
>  
> -static int __init smt_cmdline_disable(char *str)
> -{
> - cpu_smt_disable(str && !strcmp(str, "force"));
> - return 0;
> + cpu_smt_max_threads = max_threads;
> +
> + /*
> +  * If SMT has been disabled via the kernel command line or SMT is
> +  * not supported, set cpu_smt_num_threads to 1 for consistency.
> +  * If enabled, take the architecture requested number of threads
> +  * to bring up into account.
> +  */
> + if (cpu_smt_control != CPU_SMT_ENABLED)
> + cpu_smt_num_threads = 1;
> + else if (num_threads < cpu_smt_num_threads)
> + cpu_smt_num_threads = num_threads;
>  }
> -early_param("nosmt", smt_cmdline_disable);
>  
>  static inline bool cpu_smt_allowed(unsigned int cpu)
>  {
> @@ -463,6 +472,13 @@ static inline bool cpu_smt_allowed(unsig
>   return !cpumask_test_cpu(cpu, _booted_once_mask);
>  }
>  
> +static int __init smt_cmdline_disable(char *str)
> +{
> + cpu_smt_disable(str && !strcmp(str, "force"));
> + return 0;
> +}
> +early_param("nosmt", smt_cmdline_disable);
> +
>  /* Returns true if SMT is not supported of forcefully (irreversibly) 
> disabled */
>  bool cpu_smt_possible(void)
>  {



Re: [PATCH 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-12 Thread Laurent Dufour
On 10/06/2023 23:10:02, Thomas Gleixner wrote:
> On Thu, Jun 01 2023 at 18:19, Laurent Dufour wrote:
>> @@ -435,12 +435,17 @@ void __init cpu_smt_disable(bool force)
>>   * The decision whether SMT is supported can only be done after the full
>>   * CPU identification. Called from architecture code.
>>   */
>> -void __init cpu_smt_check_topology(unsigned int num_threads)
>> +void __init cpu_smt_check_topology(unsigned int num_threads,
>> +   unsigned int max_threads)
>>  {
>>  if (!topology_smt_supported())
>>  cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
>>  
>> -cpu_smt_max_threads = num_threads;
>> +cpu_smt_max_threads = max_threads;
>> +
>> +WARN_ON(num_threads > max_threads);
>> +if (num_threads > max_threads)
>> +num_threads = max_threads;
> 
> This does not work. The call site does:
> 
>> +cpu_smt_check_topology(smt_enabled_at_boot, threads_per_core);
> 
> smt_enabled_at_boot is 0 when 'smt-enabled=off', which is not what the
> hotplug core expects. If SMT is disabled it brings up the primary
> thread, which means cpu_smt_num_threads = 1.

Thanks, Thomas,
Definitely, a test against smt_enabled_at_boot == 0 is required here.

> This needs more thoughts to avoid a completely inconsistent duct tape
> mess.

Despite the test against smt_enabled_at_boot, mentioned above, I can't see
anything else to rework. Am I missing something?

> 
> Btw, the command line parser and the variable smt_enabled_at_boot being
> type int allow negative number of threads too... Maybe not what you want.

I do agree, it should be an unsigned type.

Thanks,
Laurent.

> Thanks,
> 
> tglx
> 
> 
> 
> 



Re: [PATCH 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-01 Thread Laurent Dufour
On 01/06/2023 15:27:30, Laurent Dufour wrote:
> On 24/05/2023 17:56:29, Michael Ellerman wrote:
>> Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
>> files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
>> parameter.
> 
> Hi Michael,
> 
> It seems that there is now a conflict with the PPC 'smt-enabled'
> boot option.
> 
> Booting the patched kernel with 'smt-enabled=4', later, change to the SMT
> level (for instance to 6) done through /sys/devices/system/cpu/smt/control
> are not applied. Nothing happens.
> Based on my early debugging, I think the reason is that cpu_smt_num_threads=8
> when entering __store_smt_control(). But I need to dig further.

I dug deeper.

FWIW, I think smt_enabled_at_boot should be passed to
cpu_smt_check_topology() in smp_prepare_cpus(), instead of
threads_per_core. But that's not enough to fix the issue because this value
is also used to set cpu_smt_max_threads.

To achieve that, cpu_smt_check_topology() should receive 2 parameters: the
current SMT level defined at boot time, and the maximum SMT level.

The attached patch is fixing the issue on my ppc64 test LPAR.
This patch does not address the x86 architecture (I didn't get the time to
do it, but it should be doable) and should be spread among patches 3 and 8
of your series.

Hope this helps.

Cheers,
Laurent.

> 
> BTW, should the 'smt-enabled' PPC specific option remain?
> 
> Cheers,
> Laurent.
> 
>> Implement the recently added hooks to allow partial SMT states, allowing
>> any number of threads per core.
>>
>> Tie the config symbol to HOTPLUG_CPU, which enables it on the major
>> platforms that support SMT. If there are other platforms that want the
>> SMT support that can be tweaked in future.
>>
>> Signed-off-by: Michael Ellerman 
>> ---
>>  arch/powerpc/Kconfig|  1 +
>>  arch/powerpc/include/asm/topology.h | 25 +
>>  arch/powerpc/kernel/smp.c   |  3 +++
>>  3 files changed, 29 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 539d1f03ff42..5cf87ca10a9c 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -273,6 +273,7 @@ config PPC
>>  select HAVE_SYSCALL_TRACEPOINTS
>>  select HAVE_VIRT_CPU_ACCOUNTING
>>  select HAVE_VIRT_CPU_ACCOUNTING_GEN
>> +select HOTPLUG_SMT  if HOTPLUG_CPU
>>  select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
>>  select IOMMU_HELPER if PPC64
>>  select IRQ_DOMAIN
>> diff --git a/arch/powerpc/include/asm/topology.h 
>> b/arch/powerpc/include/asm/topology.h
>> index 8a4d4f4d9749..1e9117a22d14 100644
>> --- a/arch/powerpc/include/asm/topology.h
>> +++ b/arch/powerpc/include/asm/topology.h
>> @@ -143,5 +143,30 @@ static inline int cpu_to_coregroup_id(int cpu)
>>  #endif
>>  #endif
>>  
>> +#ifdef CONFIG_HOTPLUG_SMT
>> +#include <linux/cpu_smt.h>
>> +#include <asm/cputhreads.h>
>> +
>> +static inline bool topology_smt_supported(void)
>> +{
>> +return threads_per_core > 1;
>> +}
>> +
>> +static inline bool topology_smt_threads_supported(unsigned int num_threads)
>> +{
>> +return num_threads <= threads_per_core;
>> +}
>> +
>> +static inline bool topology_is_primary_thread(unsigned int cpu)
>> +{
>> +return cpu == cpu_first_thread_sibling(cpu);
>> +}
>> +
>> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
>> +{
>> +return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
>> +}
>> +#endif
>> +
>>  #endif /* __KERNEL__ */
>>  #endif  /* _ASM_POWERPC_TOPOLOGY_H */
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 265801a3e94c..eed20b9253b7 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1154,6 +1154,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>>  
>>  if (smp_ops && smp_ops->probe)
>>  smp_ops->probe();
>> +
>> +// Initialise the generic SMT topology support
>> +cpu_smt_check_topology(threads_per_core);
>>  }
>>  
>>  void smp_prepare_boot_cpu(void)
> 
From 682e7d78fb98d6298926e88e5093e2172488ea6f Mon Sep 17 00:00:00 2001
From: Laurent Dufour 
Date: Thu, 1 Jun 2023 18:02:55 +0200
Subject: [PATCH] Consider the SMT level specified at boot time

This allows the PPC kernel to boot with an SMT level different from 1 or
the threads-per-core value.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kernel/smp.c | 2 +-
 include/lin

Re: [PATCH 8/9] powerpc: Add HOTPLUG_SMT support

2023-06-01 Thread Laurent Dufour
On 24/05/2023 17:56:29, Michael Ellerman wrote:
> Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support
> files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot
> parameter.

Hi Michael,

It seems that there is now a conflict with the PPC 'smt-enabled' boot
option.

Booting the patched kernel with 'smt-enabled=4', later, change to the SMT
level (for instance to 6) done through /sys/devices/system/cpu/smt/control
are not applied. Nothing happens.
Based on my early debugging, I think the reason is that cpu_smt_num_threads=8
when entering __store_smt_control(). But I need to dig further.

BTW, should the 'smt-enabled' PPC specific option remain?

Cheers,
Laurent.

> Implement the recently added hooks to allow partial SMT states, allowing
> any number of threads per core.
> 
> Tie the config symbol to HOTPLUG_CPU, which enables it on the major
> platforms that support SMT. If there are other platforms that want the
> SMT support that can be tweaked in future.
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/Kconfig|  1 +
>  arch/powerpc/include/asm/topology.h | 25 +
>  arch/powerpc/kernel/smp.c   |  3 +++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 539d1f03ff42..5cf87ca10a9c 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -273,6 +273,7 @@ config PPC
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HAVE_VIRT_CPU_ACCOUNTING_GEN
> + select HOTPLUG_SMT  if HOTPLUG_CPU
>   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
>   select IOMMU_HELPER if PPC64
>   select IRQ_DOMAIN
> diff --git a/arch/powerpc/include/asm/topology.h 
> b/arch/powerpc/include/asm/topology.h
> index 8a4d4f4d9749..1e9117a22d14 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -143,5 +143,30 @@ static inline int cpu_to_coregroup_id(int cpu)
>  #endif
>  #endif
>  
> +#ifdef CONFIG_HOTPLUG_SMT
> +#include <linux/cpu_smt.h>
> +#include <asm/cputhreads.h>
> +
> +static inline bool topology_smt_supported(void)
> +{
> + return threads_per_core > 1;
> +}
> +
> +static inline bool topology_smt_threads_supported(unsigned int num_threads)
> +{
> + return num_threads <= threads_per_core;
> +}
> +
> +static inline bool topology_is_primary_thread(unsigned int cpu)
> +{
> + return cpu == cpu_first_thread_sibling(cpu);
> +}
> +
> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
> +{
> + return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
> +}
> +#endif
> +
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_POWERPC_TOPOLOGY_H */
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 265801a3e94c..eed20b9253b7 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1154,6 +1154,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  
>   if (smp_ops && smp_ops->probe)
>   smp_ops->probe();
> +
> + // Initialise the generic SMT topology support
> + cpu_smt_check_topology(threads_per_core);
>  }
>  
>  void smp_prepare_boot_cpu(void)



Re: [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler

2023-04-24 Thread Laurent Dufour
On 23/04/2023 12:52:12, Sourabh Jain wrote:
> On PowerPC memblock regions are used to prepare elfcorehdr which
> describes the memory regions of the running kernel to the kdump kernel.
> Since the notifier used for the memory hotplug crash handler gets
> initiated before the update of the memblock region happens (as depicted
> below) the newly prepared elfcorehdr still holds the old memory regions.
> If the elfcorehdr is prepared with stale memblock regions then the newly
> prepared elfcorehdr will still be holding stale memory regions. And dump
> collection with stale elfcorehdr will lead to dump collection failure or
> incomplete dump collection.
> 
> The sequence of actions done on PowerPC when an LMB memory hot removed:
> 
>  Initiate memory hot remove
>   |
>   v
>  offline pages
>   |
>   v
>  initiate memory notify call
>  chain for MEM_OFFLINE event  <---> Prepare new elfcorehdr and replace
>   it with old one
>   |
>   v
>  update memblock regions
> 
> Such challenges only exist for memory remove case. For the memory add
> case the memory regions are updated first and then memory notify calls
> the arch crash hotplug handler to update the elfcorehdr.
> 
> This patch passes additional information about the hot removed LMB to
> the arch crash hotplug handler in the form of memory_notify object.
> 
> How passing memory_notify to arch crash hotplug handler will help?
> 
> memory_notify holds the start PFN and page count of the hot removed
> memory. With that base address and the size of the hot removed memory
> can be calculated and same can be used to avoid adding hot removed
> memory region to get added in the elfcorehdr.
> 
> Signed-off-by: Sourabh Jain 
> Reviewed-by: Laurent Dufour 

I don't remember sending a Reviewed-by on this patch earlier, do you?

> ---
>  arch/powerpc/include/asm/kexec.h |  2 +-
>  arch/powerpc/kexec/core_64.c |  3 ++-
>  arch/x86/include/asm/kexec.h |  2 +-
>  arch/x86/kernel/crash.c  |  3 ++-
>  include/linux/kexec.h|  2 +-
>  kernel/crash_core.c  | 14 +++---
>  6 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index f01ba767af56e..7e811bad5ec92 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -104,7 +104,7 @@ struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>  #if defined(CONFIG_CRASH_HOTPLUG)
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>  #endif
>  #endif
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 611b89bcea2be..147ea6288a526 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -551,10 +551,11 @@ int update_cpus_node(void *fdt)
>   * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to 
> update the
>   *necessary kexec segments based on the 
> hotplug event.
>   * @image: the active struct kimage
> + * @arg: struct memory_notify handler for memory add/remove case and NULL 
> for CPU case.
>   *
>   * Update FDT segment to include newly added CPU. No action for CPU remove 
> case.
>   */
> -void arch_crash_handle_hotplug_event(struct kimage *image)
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>  {
>   void *fdt, *ptr;
>   unsigned long mem;
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 1bc852ce347d4..70c3b23b468b6 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -213,7 +213,7 @@ extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>  extern void kdump_nmi_shootdown_cpus(void);
>  
>  #ifdef CONFIG_CRASH_HOTPLUG
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index ead602636f3e0..b45d13193b579 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -445,11 +445,12 @@ int crash_load_segments(struct kimage *image)
>  /**
>   * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>   * @image: the active struct kimage
> + * @arg: s

Re: [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support

2023-04-24 Thread Laurent Dufour
On 23/04/2023 12:52:13, Sourabh Jain wrote:
> Extend PowerPC arch crash hotplug handler to support memory hotplug
> events. Since elfcorehdr is used to exchange the memory info between the
> kernels hence it needs to be recreated to reflect the changes due to
> memory hotplug events.
> 
> The way memory hotplug events are handled on PowerPC and the notifier
> call chain used in generic code to trigger the arch crash handler, the
> process to recreate the elfcorehdr is different for memory add and
> remove case.
> 
> For memory remove case the memory change notifier call chain is
> triggered first and then memblock regions is updated. Whereas for the
> memory hot add case, memblock regions are updated before invoking the
> memory change notifier call chain.
> 
> On PowerPC, memblock regions list is used to prepare the elfcorehdr. In
> case of memory hot remove the memblock regions are updated after the
> arch crash hotplug handler is triggered, hence an additional step is
> taken to ensure that memory ranges used to prepare elfcorehdr do not
> include hot removed memory.
> 
> When memory is hot removed, it is possible that the memory region count
> may increase. So to accommodate a growing number of memory regions, the
> elfcorehdr kexec segment is built with additional buffer space.
> 
> The changes done here will also work for the kexec_load system call given
> that the kexec tool builds the elfcorehdr with additional space to
> accommodate future memory regions as it is done for kexec_file_load
> system call in the kernel.
> 
> Signed-off-by: Sourabh Jain 
> Reviewed-by: Laurent Dufour 

I don't remember sending a Reviewed-by on this patch earlier, do you?

> ---
>  arch/powerpc/include/asm/kexec_ranges.h |  1 +
>  arch/powerpc/kexec/core_64.c| 77 +-
>  arch/powerpc/kexec/file_load_64.c   | 36 ++-
>  arch/powerpc/kexec/ranges.c | 85 +
>  4 files changed, 195 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
> b/arch/powerpc/include/asm/kexec_ranges.h
> index f83866a19e870..802abf580cf0f 100644
> --- a/arch/powerpc/include/asm/kexec_ranges.h
> +++ b/arch/powerpc/include/asm/kexec_ranges.h
> @@ -7,6 +7,7 @@
>  void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
>  struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
>  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
> +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
>  int add_tce_mem_ranges(struct crash_mem **mem_ranges);
>  int add_initrd_mem_range(struct crash_mem **mem_ranges);
>  #ifdef CONFIG_PPC_64S_HASH_MMU
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 147ea6288a526..01a764b1c9b07 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -547,6 +548,76 @@ int update_cpus_node(void *fdt)
>  #undef pr_fmt
>  #define pr_fmt(fmt) "crash hp: " fmt
>  
> +/**
> + * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with 
> old
> + *  elfcorehdr in the kexec segment array.
> + * @image: the active struct kimage
> + * @arg: struct memory_notify data handler
> + */
> +static void update_crash_elfcorehdr(struct kimage *image, struct 
> memory_notify *mn)
> +{
> + int ret;
> + struct crash_mem *cmem = NULL;
> + struct kexec_segment *ksegment;
> + void *ptr, *mem, *elfbuf = NULL;
> + unsigned long elfsz, memsz, base_addr, size;
> +
> +	ksegment = &image->segment[image->elfcorehdr_index];
> + mem = (void *) ksegment->mem;
> + memsz = ksegment->memsz;
> +
> +	ret = get_crash_memory_ranges(&cmem);
> + if (ret) {
> + pr_err("Failed to get crash mem range\n");
> + return;
> + }
> +
> + /*
> +  * The hot unplugged memory is not yet removed from crash memory
> +  * ranges, remove it here.
> +  */
> + if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
> + base_addr = PFN_PHYS(mn->start_pfn);
> + size = mn->nr_pages * PAGE_SIZE;
> +		ret = remove_mem_range(&cmem, base_addr, size);
> +		if (ret) {
> +			pr_err("Failed to remove hot-unplugged memory from crash memory ranges.\n");
> + return;
> + }
> + }
> +
> +	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
> + if (ret) {
> + pr_err("Fail

Re: [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support

2023-04-24 Thread Laurent Dufour
On 23/04/2023 12:52:11, Sourabh Jain wrote:
> Introduce powerpc crash hotplug handler to update the necessary kexec
> segments in the kernel on CPU/Memory hotplug events. Currently, these
> updates are done by monitoring CPU/Memory hotplug events in userspace.
> 
> A common crash hotplug handler is triggered from generic infrastructure
> for both CPU/Memory hotplug events. But in this patch, crash updates are
> handled only for CPU hotplug events. Support for the crash update on
> memory hotplug events is added in upcoming patches.
> 
> The elfcorehdr segment is used to exchange the CPU and other
> dump-related information between the kernels. Ideally, the elfcorehdr
> segment needs to be recreated on CPU hotplug events to reflect the
> changes. But on powerpc, the elfcorehdr is built with possible CPUs
> hence there is no need to update/recreate the elfcorehdr on CPU hotplug
> events.
> 
> In addition to elfcorehdr, there is another kexec segment that holds
> CPU data on powerpc: the FDT (Flattened Device Tree). During the kdump
> kernel boot, it is expected that the crashing CPU must be present in
> the FDT, else the kdump kernel boot fails.
> 
> Now the only action needed on powerpc to handle the crash CPU hotplug
> event is to add hot added CPUs to the kdump FDT segment to avoid a
> kdump kernel boot failure. So for the CPU hot add event, the FDT
> segment is updated with the hot added CPU. Since there is no need to
> remove hot unplugged CPUs from the FDT segment, no action is taken for
> the CPU hot remove event.
> 
> To accommodate a growing number of CPUs, FDT is built with additional
> buffer space to ensure that it can hold possible CPU nodes.
> 
> The changes done here will also work for the kexec_load system call
> given that the kexec tool builds the FDT segment with additional space
> to accommodate possible CPU nodes.
> 
> Since memory crash hotplug support is not there yet, the crash hotplug
> handler simply warns the user and returns.
> 
> Signed-off-by: Sourabh Jain 
> Reviewed-by: Laurent Dufour 
> ---
>  arch/powerpc/include/asm/kexec.h  |  4 ++
>  arch/powerpc/kexec/core_64.c  | 61 +++
>  arch/powerpc/kexec/elf_64.c   | 12 +-
>  arch/powerpc/kexec/file_load_64.c | 14 +++
>  4 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..f01ba767af56e 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,10 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +void arch_crash_handle_hotplug_event(struct kimage *image);
> +#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
> +#endif
>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..611b89bcea2be 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -543,6 +543,67 @@ int update_cpus_node(void *fdt)
>   return ret;
>  }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +#undef pr_fmt
> +#define pr_fmt(fmt) "crash hp: " fmt
> +
> +/**
> + * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to 
> update the
> + *necessary kexec segments based on the 
> hotplug event.
> + * @image: the active struct kimage
> + *
> + * Update FDT segment to include newly added CPU. No action for CPU remove 
> case.
> + */
> +void arch_crash_handle_hotplug_event(struct kimage *image)
> +{
> + void *fdt, *ptr;
> + unsigned long mem;
> + int i, fdt_index = -1;
> + unsigned int hp_action = image->hp_action;
> +
> + /*
> +  * Since the hot-unplugged CPU is already part of crash FDT,
> +  * no action is needed for CPU remove case.
> +  */
> + if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
> + return;
> +
> + /* crash update on memory hotplug events is not supported yet */
> + if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == 
> KEXEC_CRASH_HP_ADD_MEMORY) {
> + pr_info_once("Crash update is not supported for memory 
> hotplug\n");
> + return;
> + }
> +
> + /* Find the FDT segment index in kexec segment array. */
> + for (i = 0; i < image->nr_segments; i++) {
> + mem = image->segment[i].mem;
> + ptr = __va(mem);
> 

Re: [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG

2023-04-24 Thread Laurent Dufour
On 23/04/2023 12:52:10, Sourabh Jain wrote:
> Due to CPU/Memory hot plug/unplug or online/offline events, the system
> resources change. A similar change should be reflected in the loaded
> kdump kernel kexec segments that describe the state of the CPU and
> memory of the running kernel.
> 
> If the kdump kernel kexec segments are not updated after CPU/Memory
> hot plug/unplug or online/offline events and the kdump kernel tries to
> collect the dump with the stale system resource data, this might
> lead to dump collection failure or an inaccurate dump collection.
> 
> The current method to keep the kdump kernel kexec segments up to date is
> by reloading the complete kdump kernel whenever a CPU/Memory hot
> plug/unplug or online/offline event is observed in userspace. Reloading
> the kdump kernel for every CPU/Memory hot plug/unplug or online/offline
> event is inefficient and creates a large window where the kdump service
> is not available. It can be improved by doing in-kernel updates to only
> necessary kdump kernel kexec segments which describe CPU and Memory
> resources of the running kernel to the kdump kernel.
> 
> The kernel changes related to in-kernel updates to the kdump kernel
> kexec segments are kept under the CRASH_HOTPLUG config option.
> 
> Later in the series, a powerpc crash hotplug handler is introduced to
> update the kdump kernel kexec segments on CPU/Memory hotplug events.
> This arch-specific handler is triggered from a generic crash handler
> that registers with the CPU/Memory add/remove notifiers.
> 
> The CRASH_HOTPLUG config option is enabled by default.
> 
> Signed-off-by: Sourabh Jain 
> Reviewed-by: Laurent Dufour 

I can't remember having sent a Reviewed-by on that patch earlier.

Anyway, I can't find any issue with that one, so replace with:
Reviewed-by: Laurent Dufour 

> ---
>  arch/powerpc/Kconfig | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a6c4407d3ec83..ac0dc0ffe89b4 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -681,6 +681,18 @@ config CRASH_DUMP
> The same kernel binary can be used as production kernel and dump
> capture kernel.
>  
> +config CRASH_HOTPLUG
> + bool "In-kernel update to kdump kernel on system configuration changes"
> + default y
> + depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> + help
> +   Quick and efficient mechanism to update the kdump kernel in the
> +   event of CPU/Memory hot plug/unplug or online/offline events. This
> +	  approach does an in-kernel update of only the necessary kexec segments
> +	  instead of unloading and reloading the entire kdump kernel from userspace.
> +
> +   If unsure, say Y.
> +
>  config FA_DUMP
>   bool "Firmware-assisted dump"
>   depends on PPC64 && (PPC_RTAS || PPC_POWERNV)



Re: [PATCH 1/2] pseries/smp: export the smt level in the SYS FS.

2023-04-13 Thread Laurent Dufour
On 13/04/2023 15:37:59, Michael Ellerman wrote:
> Hi Laurent,
> 
> Laurent Dufour  writes:
>> There is no SMT level recorded in the kernel nor in user space.
>> Indeed there is no real constraint about that: mixed SMT levels are
>> allowed and the system works fine this way.
>>
>> However, when new CPUs are added, the kernel onlines all their threads,
>> which leads to mixed SMT levels and confuses end users a bit.
>>
>> To prevent this, export an SMT level from the kernel so user space
>> applications, like the energy daemon, can read it to adjust their settings.
>> There is no action other than recording the value when an SMT value is
>> written into the new sysfs entry. User space applications like ppc64_cpu
>> should update the sysfs entry when changing the SMT level to keep the
>> system consistent.
>>
>> Suggested-by: Srikar Dronamraju 
>> Signed-off-by: Laurent Dufour 
>> ---
>>  arch/powerpc/platforms/pseries/pseries.h |  3 ++
>>  arch/powerpc/platforms/pseries/smp.c | 39 
>>  2 files changed, 42 insertions(+)
> 
> There is a generic sysfs interface for smt in /sys/devices/system/cpu/smt
> 
> I think we should be enabling that on powerpc and then adapting it to
> our needs, rather than adding a pseries specific file.

Thanks Michael, I was not aware of this sysfs interface.

> Currently the generic code is only aware of SMT on/off, so it would need
> to be taught about SMT4 and 8 at least.

Do you think we should limit our support to SMT4 and SMT8 only?

> There are already hooks in the generic code to check the SMT level when
> bringing CPUs up, see cpu_smt_allowed(), they may work for the pseries
> hotplug case too, though maybe we need some additional logic.
> 
> Wiring up the basic support is pretty straight forward, something like
> the diff below.

I'll look into how to wire this up.
Thanks a lot!

> cheers
> 
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 0f123f1f62a1..a48576f1c579 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -260,6 +260,7 @@ config PPC
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HAVE_VIRT_CPU_ACCOUNTING_GEN
> + select HOTPLUG_SMT  if HOTPLUG_CPU
>   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
>   select IOMMU_HELPER if PPC64
>   select IRQ_DOMAIN
> diff --git a/arch/powerpc/include/asm/topology.h 
> b/arch/powerpc/include/asm/topology.h
> index 8a4d4f4d9749..bd23ba716d23 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -143,5 +143,8 @@ static inline int cpu_to_coregroup_id(int cpu)
>  #endif
>  #endif
> 
> +bool topology_is_primary_thread(unsigned int cpu);
> +bool topology_smt_supported(void);
> +
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_POWERPC_TOPOLOGY_H */
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 265801a3e94c..8619609809d5 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1769,4 +1769,20 @@ void __noreturn arch_cpu_idle_dead(void)
>   start_secondary_resume();
>  }
> 
> +/**
> + * topology_is_primary_thread - Check whether CPU is the primary SMT thread
> + * @cpu: CPU to check
> + */
> +bool topology_is_primary_thread(unsigned int cpu)
> +{
> + return cpu == cpu_first_thread_sibling(cpu);
> +}
> +
> +/**
> + * topology_smt_supported - Check whether SMT is supported by the CPUs
> + */
> +bool topology_smt_supported(void)
> +{
> + return threads_per_core > 1;
> +}
>  #endif



Re: [PATCH 1/2] pseries/smp: export the smt level in the SYS FS.

2023-04-03 Thread Laurent Dufour
On 31/03/2023 18:05:27, Michal Suchánek wrote:
> Hello,
> 
> On Fri, Mar 31, 2023 at 05:39:04PM +0200, Laurent Dufour wrote:
>> There is no SMT level recorded in the kernel nor in user space.
>> Indeed there is no real constraint about that: mixed SMT levels are
>> allowed and the system works fine this way.
>>
>> However, when new CPUs are added, the kernel onlines all their threads,
>> which leads to mixed SMT levels and confuses end users a bit.
>>
>> To prevent this, export an SMT level from the kernel so user space
>> applications, like the energy daemon, can read it to adjust their settings.
>> There is no action other than recording the value when an SMT value is
>> written into the new sysfs entry. User space applications like ppc64_cpu
>> should update the sysfs entry when changing the SMT level to keep the
>> system consistent.
>>
>> Suggested-by: Srikar Dronamraju 
>> Signed-off-by: Laurent Dufour 
>> ---
>>  arch/powerpc/platforms/pseries/pseries.h |  3 ++
>>  arch/powerpc/platforms/pseries/smp.c | 39 
>>  2 files changed, 42 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h 
>> b/arch/powerpc/platforms/pseries/pseries.h
>> index f8bce40ebd0c..af0a145af98f 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -23,7 +23,9 @@ extern int pSeries_machine_check_exception(struct pt_regs 
>> *regs);
>>  extern long pseries_machine_check_realmode(struct pt_regs *regs);
>>  void pSeries_machine_check_log_err(void);
>>  
>> +
>>  #ifdef CONFIG_SMP
>> +extern int pseries_smt;
>>  extern void smp_init_pseries(void);
>>  
>>  /* Get state of physical CPU from query_cpu_stopped */
>> @@ -34,6 +36,7 @@ int smp_query_cpu_stopped(unsigned int pcpu);
>>  #define QCSS_HARDWARE_ERROR -1
>>  #define QCSS_HARDWARE_BUSY -2
>>  #else
>> +#define pseries_smt 1
> 
> Is this really needed for anything?
> 
> The code using pseries_smt would not compile with a define, and would be
> only compiled with SMP enabled anyway so we should not need this.
> 

Hi Michal,

I do agree, the pseries code implies SMP.

When writing that code, I came across that SMP conditional block and just
added this define to be sure the code would compile in the case SMP is not
defined, but that's probably useless.

Instead of resending a new series, Michael, could you please remove that
line when applying the patch to your tree?

Thanks,
Laurent.

> Thanks
> 
> Michal
> 
>>  static inline void smp_init_pseries(void) { }
>>  #endif
>>  
>> diff --git a/arch/powerpc/platforms/pseries/smp.c 
>> b/arch/powerpc/platforms/pseries/smp.c
>> index c597711ef20a..6c382922f8f3 100644
>> --- a/arch/powerpc/platforms/pseries/smp.c
>> +++ b/arch/powerpc/platforms/pseries/smp.c
>> @@ -21,6 +21,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  #include 
>> @@ -45,6 +46,8 @@
>>  
>>  #include "pseries.h"
>>  
>> +int pseries_smt;
>> +
>>  /*
>>   * The Primary thread of each non-boot processor was started from the OF 
>> client
>>   * interface by prom_hold_cpus and is spinning on secondary_hold_spinloop.
>> @@ -280,3 +283,39 @@ void __init smp_init_pseries(void)
>>  
>>  pr_debug(" <- smp_init_pSeries()\n");
>>  }
>> +
>> +static ssize_t pseries_smt_store(struct class *class,
>> + struct class_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> +int smt;
>> +
>> +	if (kstrtou32(buf, 0, &smt) || !smt || smt > (u32) threads_per_core) {
>> +pr_err("Invalid pseries_smt specified.\n");
>> +return -EINVAL;
>> +}
>> +
>> +pseries_smt = smt;
>> +
>> +return count;
>> +}
>> +
>> +static ssize_t pseries_smt_show(struct class *class, struct class_attribute 
>> *attr,
>> +  char *buf)
>> +{
>> +return sysfs_emit(buf, "%d\n", pseries_smt);
>> +}
>> +
>> +static CLASS_ATTR_RW(pseries_smt);
>> +
>> +static int __init pseries_smt_init(void)
>> +{
>> +int rc;
>> +
>> +pseries_smt = smt_enabled_at_boot;
>> +	rc = sysfs_create_file(kernel_kobj, &class_attr_pseries_smt.attr);
>> +if (rc)
>> +pr_err("Can't create pseries_smt sysfs/kernel entry.\n");
>> +return rc;
>> +}
>> +machine_device_initcall(pseries, pseries_smt_init);
>> -- 
>> 2.40.0
>>



[PATCH 0/2] Online new threads according to the current SMT level

2023-03-31 Thread Laurent Dufour
When a new CPU is added, the kernel activates all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:0*1*2*3*4 5 6 7
Core   1:8*9*   10*   11*   12*   13*   14*   15*

This mixed SMT level is confusing end users, and some applications like
lparstat report wrong values.

There is no SMT level recorded in the kernel, nor in user space. Such a
level could be helpful when adding new CPUs or when optimizing energy
efficiency. This series introduces a new sysfs entry named 'pseries_smt' to
store the current SMT level.

The SMT level is provided on a best effort basis: writing a new value into
that entry only records it into the kernel. This way, it can be used when
new CPUs are onlined, for instance. There is no real change to the CPU setup
when a value is written; no CPUs are onlined or offlined.

At boot time `pseries_smt` is loaded with smt_enabled_at_boot, which
contains the SMT level set at boot time, even if no kernel option is
specified.
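
As an illustration, a consumer such as the energy daemon could read the new
entry with something like the sketch below (the /sys/kernel/pseries_smt path
comes from patch 1/2; the reader itself is hypothetical):

#include <stdio.h>

int main(void)
{
	int smt;
	FILE *f = fopen("/sys/kernel/pseries_smt", "r");

	if (!f)
		return 1;	/* entry missing: kernel without this series */
	if (fscanf(f, "%d", &smt) == 1)
		printf("recorded SMT level: %d\n", smt);
	fclose(f);
	return 0;
}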

The change is specific to pseries since CPU hot-plug is only provided for
this platform.

The second patch of this series implements the change to online only
the right number of threads when a new CPU is added.

Laurent Dufour (2):
  pseries/smp: export the smt level in the SYS FS.
  powerpc/pseries/cpuhp: respect current SMT when adding new CPU

 arch/powerpc/platforms/pseries/hotplug-cpu.c | 18 ++---
 arch/powerpc/platforms/pseries/pseries.h |  3 ++
 arch/powerpc/platforms/pseries/smp.c | 39 
 3 files changed, 55 insertions(+), 5 deletions(-)

-- 
2.40.0



[PATCH 2/2] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-03-31 Thread Laurent Dufour
When a new CPU is added, the kernel activates all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:0*1*2*3*4 5 6 7
Core   1:8*9*   10*   11*   12*   13*   14*   15*

We rely on the newly introduced pseries_smt value, which should be updated
when changing the SMT level with ppc64_cpu --smt=x and at boot time using
the smt-enabled kernel option.

This way, on an LPAR running in SMT=4, newly added CPUs will run 4
threads, which is what an end user would expect.

Cc: Srikar Dronamraju 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 18 +-
 arch/powerpc/platforms/pseries/smp.c |  2 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1a3cb313976a..e623ed8649b3 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn)
 {
int rc = 0;
unsigned int cpu;
-   int len, nthreads, i;
+   int len, nthreads, i, smt;
const __be32 *intserv;
u32 thread;
 
@@ -392,6 +392,11 @@ static int dlpar_online_cpu(struct device_node *dn)
 
nthreads = len / sizeof(u32);
 
+   smt = READ_ONCE(pseries_smt);
+   /* We should online at least one thread */
+   if (smt < 1)
+   smt = 1;
+
cpu_maps_update_begin();
for (i = 0; i < nthreads; i++) {
thread = be32_to_cpu(intserv[i]);
@@ -400,10 +405,13 @@ static int dlpar_online_cpu(struct device_node *dn)
continue;
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
-   rc = device_online(get_cpu_device(cpu));
-   if (rc) {
-   dlpar_offline_cpu(dn);
-   goto out;
+		/* Don't activate CPUs over the current SMT setting */
+   if (smt-- > 0) {
+   rc = device_online(get_cpu_device(cpu));
+   if (rc) {
+   dlpar_offline_cpu(dn);
+   goto out;
+   }
}
cpu_maps_update_begin();
 
diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index 6c382922f8f3..ef8070651846 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -295,7 +295,7 @@ static ssize_t pseries_smt_store(struct class *class,
return -EINVAL;
}
 
-   pseries_smt = smt;
+   WRITE_ONCE(pseries_smt, smt);
 
return count;
 }
-- 
2.40.0



[PATCH 1/2] pseries/smp: export the smt level in the SYS FS.

2023-03-31 Thread Laurent Dufour
There is no SMT level recorded in the kernel nor in user space.
Indeed there is no real constraint about that: mixed SMT levels are
allowed and the system works fine this way.

However, when new CPUs are added, the kernel onlines all their threads,
which leads to mixed SMT levels and confuses end users a bit.

To prevent this, export an SMT level from the kernel so user space
applications, like the energy daemon, can read it to adjust their settings.
There is no action other than recording the value when an SMT value is
written into the new sysfs entry. User space applications like ppc64_cpu
should update the sysfs entry when changing the SMT level to keep the system
consistent.

Suggested-by: Srikar Dronamraju 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/pseries.h |  3 ++
 arch/powerpc/platforms/pseries/smp.c | 39 
 2 files changed, 42 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index f8bce40ebd0c..af0a145af98f 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -23,7 +23,9 @@ extern int pSeries_machine_check_exception(struct pt_regs 
*regs);
 extern long pseries_machine_check_realmode(struct pt_regs *regs);
 void pSeries_machine_check_log_err(void);
 
+
 #ifdef CONFIG_SMP
+extern int pseries_smt;
 extern void smp_init_pseries(void);
 
 /* Get state of physical CPU from query_cpu_stopped */
@@ -34,6 +36,7 @@ int smp_query_cpu_stopped(unsigned int pcpu);
 #define QCSS_HARDWARE_ERROR -1
 #define QCSS_HARDWARE_BUSY -2
 #else
+#define pseries_smt 1
 static inline void smp_init_pseries(void) { }
 #endif
 
diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index c597711ef20a..6c382922f8f3 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -45,6 +46,8 @@
 
 #include "pseries.h"
 
+int pseries_smt;
+
 /*
  * The Primary thread of each non-boot processor was started from the OF client
  * interface by prom_hold_cpus and is spinning on secondary_hold_spinloop.
@@ -280,3 +283,39 @@ void __init smp_init_pseries(void)
 
pr_debug(" <- smp_init_pSeries()\n");
 }
+
+static ssize_t pseries_smt_store(struct class *class,
+struct class_attribute *attr,
+const char *buf, size_t count)
+{
+   int smt;
+
+	if (kstrtou32(buf, 0, &smt) || !smt || smt > (u32) threads_per_core) {
+   pr_err("Invalid pseries_smt specified.\n");
+   return -EINVAL;
+   }
+
+   pseries_smt = smt;
+
+   return count;
+}
+
+static ssize_t pseries_smt_show(struct class *class, struct class_attribute 
*attr,
+ char *buf)
+{
+   return sysfs_emit(buf, "%d\n", pseries_smt);
+}
+
+static CLASS_ATTR_RW(pseries_smt);
+
+static int __init pseries_smt_init(void)
+{
+   int rc;
+
+   pseries_smt = smt_enabled_at_boot;
+	rc = sysfs_create_file(kernel_kobj, &class_attr_pseries_smt.attr);
+   if (rc)
+   pr_err("Can't create pseries_smt sysfs/kernel entry.\n");
+   return rc;
+}
+machine_device_initcall(pseries, pseries_smt_init);
-- 
2.40.0



Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-03-31 Thread Laurent Dufour
On 30/03/2023 18:19:38, Michal Suchánek wrote:
> On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote:
>> On 13/02/2023 16:40:50, Nathan Lynch wrote:
>>> Michal Suchánek  writes:
>>>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>>>> Laurent Dufour  writes:
>>>>>> When a new CPU is added, the kernel activates all its threads. This
>>>>>> leads to a weird, but functional, result when adding a CPU on an SMT 4
>>>>>> system for instance.
>>>>>>
>>>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 
>>>>>> threads
>>>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>>>
>>>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>>>> Core   0:0*1*2*3*4 5 6 7
>>>>>> Core   1:8*9*   10*   11*   12*   13*   14*   15*
>>>>>>
>>>>>> There is no SMT value in the kernel. It is possible to run an unbalanced
>>>>>> LPAR
>>>>>> with 2 threads for a CPU, 4 for another one, and 5 on the last one.
> 
>> Indeed, that's not so easy. There are multiple ways for the SMT level to be
>> impacted:
>>  - smt-enabled kernel option
>>  - smtstate systemctl service (if activated), saving SMT level at shutdown
>> time to restore it at boot time
>>  - pseries-energyd daemon (if activated) could turn off threads
>>  - ppc64_cpu --smt=x user command
>>  - sysfs direct writing to turn off/on specific threads.
>>
>> There is no SMT level saved, on "disk" or in the kernel, and any of these
>> options can interact in parallel. So from the user space point of view, the
>> best we could do is look at the current SMT values (there could be multiple
>> values in the case of a mixed SMT state), pick one value and apply it.
>>
>> Extending the drmgr's hook is still valid, and I sent a patch series on the
>> powerpc-utils mailing list to achieve that. However, changing the SMT level
>> in that hook means that newly added CPUs will first be turned on and there is
>> a window where these threads could be seen active. Not a big deal, but not
>> turning on these extra threads looks better to me.
> 
> Which means
> 
> 1) add an option to not online hotplugged CPUs by default

After discussing this with Srikar, it turns out that exposing the smt-enabled
value set at boot time (or not) in sysfs, and updating it when the SMT level
is changed using ppc64_cpu, will be better. This will also allow the energy
daemon to take this value into account.

> 2) when a tool that wants to manage CPU onlining is active it can set
> the option so that no threads are onlined automatically, and online the
> desired threads

When new CPUs are added, the kernel will automatically online the right
number of threads based on that recorded SMT level.

> 
> 3) when no such tool is active the default should be to online all
> threads to preserve compatibility with existing behavior

I don't think we should keep the existing behavior: customers are confused
and some user space tools like lparstat have difficulties handling mixed
SMT levels.

I'll submit a new series exposing the SMT level and propose a patch for
ppc64_cpu too.

> 
>> That being said, I can't see any benefit of a user space implementation
>> compared to the option I'm proposing in that patch.
> 
> The userspace implementation can implement arbitrarily complex policy;
> that's not something that belongs in the kernel.
> 
> Thanks
> 
> Michal



Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-03-30 Thread Laurent Dufour
On 13/02/2023 16:40:50, Nathan Lynch wrote:
> Michal Suchánek  writes:
>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>> Laurent Dufour  writes:
>>>> When a new CPU is added, the kernel activates all its threads. This
>>>> leads to a weird, but functional, result when adding a CPU on an SMT 4
>>>> system for instance.
>>>>
>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>
>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>> Core   0:0*1*2*3*4 5 6 7
>>>> Core   1:8*9*   10*   11*   12*   13*   14*   15*
>>>>
>>>> There is no SMT value in the kernel. It is possible to run an unbalanced LPAR
>>>> with 2 threads for a CPU, 4 for another one, and 5 on the last one.
>>>>
>>>> To work around this possibility, and assuming that the LPAR runs with the
>>>> same number of threads for each CPU, which is the common case,
>>>
>>> I am skeptical at best of baking that assumption into this code. Mixed
>>> SMT modes within a partition doesn't strike me as an unreasonable
>>> possibility for some use cases. And if that's wrong, then we should just
>>> add a global smt value instead of using heuristics.
>>>
>>>> the number
>>>> of active threads of the CPU doing the hot-plug operation is computed. Only
>>>> that number of threads will be activated for the newly added CPU.
>>>>
>>>> This way, on an LPAR running in SMT=4, newly added CPUs will run 4
>>>> threads, which is what an end user would expect.
>>>
>>> I could see why most users would prefer this new behavior. But surely
>>> some users have come to expect the existing behavior, which has been in
>>> place for years, and developed workarounds that might be broken by this
>>> change?
>>>
>>> I would suggest that to handle this well, we need to give user space
>>> more ability to tell the kernel what actions to take on added cores, on
>>> an opt-in basis.
>>>
>>> This could take the form of extending the DLPAR sysfs command set:
>>>
>>> Option 1 - Add a flag that tells the kernel not to online any threads at
>>> all; user space will online the desired threads later.
>>>
>>> Option 2 - Add an option that tells the kernel which SMT mode to apply.
>>
>> powerpc-utils grew some drmgr hooks recently so maybe the policy can be
>> moved to userspace?
> 
> I'm not sure whether the hook mechanism would come into play, but yes, I
> am suggesting that user space be given the option of overriding the
> kernel's current behavior.

Indeed, that's not so easy. There are multiple ways for the SMT level to be
impacted:
 - smt-enabled kernel option
 - smtstate systemctl service (if activated), saving SMT level at shutdown
time to restore it at boot time
 - pseries-energyd daemon (if activated) could turn off threads
 - ppc64_cpu --smt=x user command
 - sysfs direct writing to turn off/on specific threads.

There is no SMT level saved, on "disk" or in the kernel, and any of these
options can interact in parallel. So from the user space point of view, the
best we could do is look at the current SMT values (there could be multiple
values in the case of a mixed SMT state), pick one value and apply it.
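
For instance, here is a minimal user-space sketch of that "look at the
current values" approach, counting the online threads of the first core via
sysfs (the core size and the policy choice are assumptions here):

#include <stdio.h>

int main(void)
{
	char path[64];
	int cpu, smt = 0;

	/* assume the first core spans cpu0..cpu7 (SMT8 hardware) */
	for (cpu = 0; cpu < 8; cpu++) {
		int online = (cpu == 0);	/* cpu0 often has no 'online' file */
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/online", cpu);
		f = fopen(path, "r");
		if (f) {
			if (fscanf(f, "%d", &online) != 1)
				online = 0;
			fclose(f);
		}
		smt += online;
	}
	printf("apparent SMT level of core 0: %d\n", smt);
	return 0;
}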

Extending the drmgr's hook is still valid, and I sent a patch series on the
powerpc-utils mailing list to achieve that. However, changing the SMT level
in that hook means that newly added CPUs will first be turned on and there is
a window where these threads could be seen active. Not a big deal, but not
turning on these extra threads looks better to me.

That being said, I can't see any benefit of a user space implementation
compared to the option I'm proposing in that patch.

Does anyone have a better idea?

Cheers,
Laurent.


Re: [PATCH v9 3/6] powerpc/crash: add a new member to the kimage_arch struct

2023-03-13 Thread Laurent Dufour
On 12/03/2023 19:11:51, Sourabh Jain wrote:
> A new member "fdt_index" is added to the kimage_arch struct to hold
> the index of the FDT (Flattened Device Tree) segment in the kexec
> segment array.
> 
> fdt_index will provide direct access to the FDT segment in the kexec
> segment array after the kdump kernel is loaded.
> 
> The new attribute will be used in the arch crash hotplug handler
> (added in upcoming patches) on every CPU and memory hotplug event.
> 
> The fdt_index is populated for both kexec_load and kexec_file_load
> case.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  arch/powerpc/include/asm/kexec.h |  5 +
>  arch/powerpc/kexec/core_64.c | 31 +++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..348eb96e8ca67 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,8 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +int machine_kexec_post_load(struct kimage *image);
> +#define machine_kexec_post_load machine_kexec_post_load

Minor comment: when CONFIG_CRASH_HOTPLUG is not set the function simply
returns 0, so why not define it as an inline in that case?

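Something like the following untested sketch, in arch/powerpc/include/asm/kexec.h,
is what I have in mind:

#if defined(CONFIG_CRASH_HOTPLUG)
int machine_kexec_post_load(struct kimage *image);
#define machine_kexec_post_load machine_kexec_post_load
#else
/* nothing to do when crash hotplug support is not built in */
static inline int machine_kexec_post_load(struct kimage *image)
{
	return 0;
}
#endif
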
>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> @@ -118,6 +120,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
>  struct kimage_arch {
>   struct crash_mem *exclude_ranges;
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + int fdt_index;
> +#endif
>   unsigned long backup_start;
>   void *backup_buf;
>   void *fdt;
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..531486c973988 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -77,6 +77,37 @@ int machine_kexec_prepare(struct kimage *image)
>   return 0;
>  }
>  
> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + int i;
> + void *ptr;
> + unsigned long mem;
> +
> + /* Mark fdt_index invalid */
> + kimage->arch.fdt_index = -1;

Why is that not done in the series introducing the generic
crash hotplug update, in do_kimage_alloc_init()?
That way there would be a guarantee that the field is never used while
still set to its default value of 0.

> +
> + /* fdt_index remains invalid if it is not a crash kernel load */
> + if (kimage->type != KEXEC_TYPE_CRASH)
> + return 0;
> + /*
> +  * Find the FDT segment index in kexec segment array and
> +  * assign it to kimage's member fdt_index to enable direct
> +  * access to FDT segment later on.
> +  */
> + for (i = 0; i < kimage->nr_segments; i++) {
> + mem = kimage->segment[i].mem;
> + ptr = __va(mem);
> +
> + if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
> + kimage->arch.fdt_index = i;
> + break;
> + }
> + }
> +#endif
> + return 0;
> +}
> +
>  /* Called during kexec sequence with MMU off */
>  static notrace void copy_segments(unsigned long ind)
>  {



Re: [PATCH v9 1/6] powerpc/kexec: turn some static helper functions public

2023-03-13 Thread Laurent Dufour
On 12/03/2023 19:11:49, Sourabh Jain wrote:
> Move update_cpus_node and get_crash_memory_ranges functions from
> kexec/file_load.c to kexec/core_64.c to make these functions usable
file_load_64.c
> by other kexec compoenets.
 components
> 
> Later in the series, these functions are utilized to do in-kernel update to
> kexec segments on CPU/Memory hotplug events for both kexec_load and
> kexec_file_load syscalls.
> 
> No functional change intended.
>

FWIW, despite the 2 minor typos above,

Reviewed-by: Laurent Dufour 

> Signed-off-by: Sourabh Jain 
> ---
>  arch/powerpc/include/asm/kexec.h  |   6 ++
>  arch/powerpc/kexec/core_64.c  | 166 ++
>  arch/powerpc/kexec/file_load_64.c | 162 -
>  3 files changed, 172 insertions(+), 162 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index a1ddba01e7d13..8090ad7d97d9d 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -99,6 +99,12 @@ void relocate_new_kernel(unsigned long indirection_page, 
> unsigned long reboot_co
>  
>  void kexec_copy_flush(struct kimage *image);
>  
> +#ifdef CONFIG_PPC64
> +struct crash_mem;
> +int update_cpus_node(void *fdt);
> +int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#endif
> +
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
>  #define crash_free_reserved_phys_range crash_free_reserved_phys_range
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index a79e28c91e2be..0b292f93a74cc 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -30,6 +32,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  int machine_kexec_prepare(struct kimage *image)
>  {
> @@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image)
>   /* NOTREACHED */
>  }
>  
> +/**
> + * get_crash_memory_ranges - Get crash memory ranges. This list includes
> + *   first/crashing kernel's memory regions that
> + *   would be exported via an elfcore.
> + * @mem_ranges:  Range list to add the memory ranges to.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int get_crash_memory_ranges(struct crash_mem **mem_ranges)
> +{
> + phys_addr_t base, end;
> + struct crash_mem *tmem;
> + u64 i;
> + int ret;
> +
> +	for_each_mem_range(i, &base, &end) {
> + u64 size = end - base;
> +
> + /* Skip backup memory region, which needs a separate entry */
> + if (base == BACKUP_SRC_START) {
> + if (size > BACKUP_SRC_SIZE) {
> + base = BACKUP_SRC_END + 1;
> + size -= BACKUP_SRC_SIZE;
> + } else
> + continue;
> + }
> +
> + ret = add_mem_range(mem_ranges, base, size);
> + if (ret)
> + goto out;
> +
> + /* Try merging adjacent ranges before reallocation attempt */
> + if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
> + sort_memory_ranges(*mem_ranges, true);
> + }
> +
> + /* Reallocate memory ranges if there is no space to split ranges */
> + tmem = *mem_ranges;
> + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
> + tmem = realloc_mem_ranges(mem_ranges);
> + if (!tmem)
> + goto out;
> + }
> +
> + /* Exclude crashkernel region */
> + ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
> + if (ret)
> + goto out;
> +
> + /*
> +  * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
> +  *regions are exported to save their context at the time of
> +  *crash, they should actually be backed up just like the
> +  *first 64K bytes of memory.
> +  */
> + ret = add_rtas_mem_range(mem_ranges);
> + if (ret)
> + goto out;
> +
> + ret = add_opal_mem_range(mem_ranges);
> + if (ret)
> + goto out;
> +
> + /* create a separate program header for the backup region */
> + ret = add_mem_range(mem_ranges, BACKUP_SRC_START, B

[PATCH] powerpc/mm: fix mmap_lock bad unlock

2023-03-06 Thread Laurent Dufour
When a page fault is tried while holding the per-VMA lock, bad_access_pkey()
and bad_access() should not be called because they assume the mmap_lock is
held.
In case a bad access is detected, fall back to the default path,
grabbing the mmap_lock to handle the fault and report the error.

Fixes: 169db3bb4609 ("powerc/mm: try VMA lock-based page fault handling first")
Reported-by: Sachin Sant 
Link: 
https://lore.kernel.org/linux-mm/842502fb-f99c-417c-9648-a37d0ecdc...@linux.ibm.com
Cc: Suren Baghdasaryan 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/mm/fault.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c7ae86b04b8a..e191b3ebd8d6 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -479,17 +479,13 @@ static int ___do_page_fault(struct pt_regs *regs, 
unsigned long address,
 
if (unlikely(access_pkey_error(is_write, is_exec,
   (error_code & DSISR_KEYFAULT), vma))) {
-   int rc = bad_access_pkey(regs, address, vma);
-
vma_end_read(vma);
-   return rc;
+   goto lock_mmap;
}
 
if (unlikely(access_error(is_write, is_exec, vma))) {
-   int rc = bad_access(regs, address);
-
vma_end_read(vma);
-   return rc;
+   goto lock_mmap;
}
 
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
-- 
2.39.2



Re: [PATCH] powerpc/mm: fix mmap_lock bad unlock

2023-03-06 Thread Laurent Dufour
On 06/03/2023 15:07:26, David Hildenbrand wrote:
> On 06.03.23 14:55, Laurent Dufour wrote:
>> When a page fault is tried while holding the per-VMA lock, bad_access_pkey()
>> and bad_access() should not be called because they assume the mmap_lock is
>> held.
>> In case a bad access is detected, fall back to the default path,
>> grabbing the mmap_lock to handle the fault and report the error.
>>
>> Fixes: 169db3bb4609 ("powerc/mm: try VMA lock-based page fault handling
>> first")
>> Reported-by: Sachin Sant 
>> Link:
>> https://lore.kernel.org/linux-mm/842502fb-f99c-417c-9648-a37d0ecdc...@linux.ibm.com
>> Cc: Suren Baghdasaryan 
>> Signed-off-by: Laurent Dufour 
>> ---
>>   arch/powerpc/mm/fault.c | 8 ++--
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index c7ae86b04b8a..e191b3ebd8d6 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -479,17 +479,13 @@ static int ___do_page_fault(struct pt_regs *regs,
>> unsigned long address,
>>     if (unlikely(access_pkey_error(is_write, is_exec,
>>  (error_code & DSISR_KEYFAULT), vma))) {
>> -    int rc = bad_access_pkey(regs, address, vma);
>> -
>>   vma_end_read(vma);
>> -    return rc;
>> +    goto lock_mmap;
>>   }
>>     if (unlikely(access_error(is_write, is_exec, vma))) {
>> -    int rc = bad_access(regs, address);
>> -
>>   vma_end_read(vma);
>> -    return rc;
>> +    goto lock_mmap;
>>   }
>>     fault = handle_mm_fault(vma, address, flags |
>> FAULT_FLAG_VMA_LOCK, regs);
> 
> IIUC, that commit is neither upstream nor in mm-stable -- it's unstable.
> Maybe raise that as a review comment in reply to the original patch, so we
> can easily connect the dots and squash it into the original, problematic
> patch that is still under review.
> 
Oh yes, I missed that. I'll reply to Suren's thread.

Thanks,
Laurent.


Re: Selftests powerpc/primitives test hangs (linux-next)

2023-03-06 Thread Laurent Dufour
On 03/03/2023 10:19:29, Sachin Sant wrote:
> While running powerpc/primitives selftests, the test (load_unaligned_zeropad)
> hangs indefinitely. This behaviour is seen with linux-next 6.2.0-next-20230303
> on a Power10 logical partition.
> 
> Git bisect points to following commit
> 
> commit 169db3bb460903443e25ac9c0737da45d6bb5402
>powerc/mm: try VMA lock-based page fault handling first
> 
> - Sachin
> 

My mistake, bad_access_pkey() and bad_access() are releasing the mmap_lock.
Writing a fix...


Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-02-14 Thread Laurent Dufour
On 13/02/2023 16:40:50, Nathan Lynch wrote:
> Michal Suchánek  writes:
>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>> Laurent Dufour  writes:
>>>> When a new CPU is added, the kernel activates all its threads. This
>>>> leads to a weird, but functional, result when adding a CPU on an SMT 4
>>>> system for instance.
>>>>
>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>
>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>> Core   0:0*1*2*3*4 5 6 7
>>>> Core   1:8*9*   10*   11*   12*   13*   14*   15*
>>>>
>>>> There is no SMT value in the kernel. It is possible to run an unbalanced LPAR
>>>> with 2 threads for a CPU, 4 for another one, and 5 on the last one.
>>>>
>>>> To work around this possibility, and assuming that the LPAR runs with the
>>>> same number of threads for each CPU, which is the common case,
>>>
>>> I am skeptical at best of baking that assumption into this code. Mixed
>>> SMT modes within a partition doesn't strike me as an unreasonable
>>> possibility for some use cases. And if that's wrong, then we should just
>>> add a global smt value instead of using heuristics.
>>>
>>>> the number
>>>> of active threads of the CPU doing the hot-plug operation is computed. Only
>>>> that number of threads will be activated for the newly added CPU.
>>>>
>>>> This way, on an LPAR running in SMT=4, newly added CPUs will run 4
>>>> threads, which is what an end user would expect.
>>>
>>> I could see why most users would prefer this new behavior. But surely
>>> some users have come to expect the existing behavior, which has been in
>>> place for years, and developed workarounds that might be broken by this
>>> change?
>>>
>>> I would suggest that to handle this well, we need to give user space
>>> more ability to tell the kernel what actions to take on added cores, on
>>> an opt-in basis.
>>>
>>> This could take the form of extending the DLPAR sysfs command set:
>>>
>>> Option 1 - Add a flag that tells the kernel not to online any threads at
>>> all; user space will online the desired threads later.
>>>
>>> Option 2 - Add an option that tells the kernel which SMT mode to apply.
>>
>> powerpc-utils grew some drmgr hooks recently so maybe the policy can be
>> moved to userspace?
> 
> I'm not sure whether the hook mechanism would come into play, but yes, I
> am suggesting that user space be given the option of overriding the
> kernel's current behavior.

I agree, sounds doable using the new drmgr hook mechanism.


[PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-02-13 Thread Laurent Dufour
When a new CPU is added, the kernel activates all its threads. This
leads to a weird, but functional, result when adding a CPU on an SMT 4
system for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core   0:0*1*2*3*4 5 6 7
Core   1:8*9*   10*   11*   12*   13*   14*   15*

There is no SMT value in the kernel. It is possible to run an unbalanced LPAR
with 2 threads for a CPU, 4 for another one, and 5 on the last one.

To work around this possibility, and assuming that the LPAR runs with the
same number of threads for each CPU, which is the common case, the number
of active threads of the CPU doing the hot-plug operation is computed. Only
that number of threads will be activated for the newly added CPU.

This way, on an LPAR running in SMT=4, newly added CPUs will run 4
threads, which is what an end user would expect.

Cc: Srikar Dronamraju 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 24 
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 090ae5a1e0f5..58a7c97fc475 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn)
 {
int rc = 0;
unsigned int cpu;
-   int len, nthreads, i;
+   int len, nthreads, i, smt;
const __be32 *intserv;
u32 thread;
 
@@ -392,6 +392,17 @@ static int dlpar_online_cpu(struct device_node *dn)
 
nthreads = len / sizeof(u32);
 
+	/*
+	 * Compute the number of active threads for the current CPU, assuming
+	 * the system is homogeneous; we don't want to activate more threads
+	 * than the current SMT setting.
+	 */
+   for (cpu = cpu_first_thread_sibling(raw_smp_processor_id()), smt = 0;
+cpu <= cpu_last_thread_sibling(raw_smp_processor_id()); cpu++) {
+   if (cpu_online(cpu))
+   smt++;
+   }
+
cpu_maps_update_begin();
for (i = 0; i < nthreads; i++) {
thread = be32_to_cpu(intserv[i]);
@@ -400,10 +411,13 @@ static int dlpar_online_cpu(struct device_node *dn)
continue;
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
-   rc = device_online(get_cpu_device(cpu));
-   if (rc) {
-   dlpar_offline_cpu(dn);
-   goto out;
+		/* Don't activate CPUs over the current SMT setting */
+   if (smt-- > 0) {
+   rc = device_online(get_cpu_device(cpu));
+   if (rc) {
+   dlpar_offline_cpu(dn);
+   goto out;
+   }
}
cpu_maps_update_begin();
 
-- 
2.39.1



Re: [PATCH v2 3/4] powerpc/rtas: remove lock and args fields from global rtas struct

2023-01-24 Thread Laurent Dufour
On 24/01/2023 15:04:47, Nathan Lynch wrote:
> Only code internal to the RTAS subsystem needs access to the central
> lock and parameter block. Remove these from the globally visible
> 'rtas' struct and make them file-static in rtas.c.
> 
> Some changed lines in rtas_call() lack appropriate spacing around
> operators and cause checkpatch errors; fix these as well.

Reviewed-by: Laurent Dufour 

> 
> Suggested-by: Laurent Dufour 
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/include/asm/rtas-types.h |  2 --
>  arch/powerpc/kernel/rtas.c| 50 ---
>  2 files changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas-types.h 
> b/arch/powerpc/include/asm/rtas-types.h
> index 8df6235d64d1..f2ad4a96cbc5 100644
> --- a/arch/powerpc/include/asm/rtas-types.h
> +++ b/arch/powerpc/include/asm/rtas-types.h
> @@ -18,8 +18,6 @@ struct rtas_t {
>   unsigned long entry;/* physical address pointer */
>   unsigned long base; /* physical address pointer */
>   unsigned long size;
> - arch_spinlock_t lock;
> - struct rtas_args args;
>   struct device_node *dev;/* virtual address pointer */
>  };
>  
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index e60e2f5af7b9..0059bb2a8f04 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -60,9 +60,17 @@ static inline void do_enter_rtas(unsigned long args)
>   srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
>  }
>  
> -struct rtas_t rtas = {
> - .lock = __ARCH_SPIN_LOCK_UNLOCKED
> -};
> +struct rtas_t rtas;
> +
> +/*
> + * Nearly all RTAS calls need to be serialized. All uses of the
> + * default rtas_args block must hold rtas_lock.
> + *
> + * Exceptions to the RTAS serialization requirement (e.g. stop-self)
> + * must use a separate rtas_args structure.
> + */
> +static arch_spinlock_t rtas_lock = __ARCH_SPIN_LOCK_UNLOCKED;
> +static struct rtas_args rtas_args;
>  
>  DEFINE_SPINLOCK(rtas_data_buf_lock);
>  EXPORT_SYMBOL_GPL(rtas_data_buf_lock);
> @@ -90,13 +98,13 @@ static unsigned long lock_rtas(void)
>  
>   local_irq_save(flags);
>   preempt_disable();
> -	arch_spin_lock(&rtas.lock);
> +	arch_spin_lock(&rtas_lock);
>   return flags;
>  }
>  
>  static void unlock_rtas(unsigned long flags)
>  {
> -	arch_spin_unlock(&rtas.lock);
> +	arch_spin_unlock(&rtas_lock);
>   local_irq_restore(flags);
>   preempt_enable();
>  }
> @@ -114,7 +122,7 @@ static void call_rtas_display_status(unsigned char c)
>   return;
>  
>   s = lock_rtas();
> -	rtas_call_unlocked(&rtas.args, 10, 1, 1, NULL, c);
> +	rtas_call_unlocked(&rtas_args, 10, 1, 1, NULL, c);
>   unlock_rtas(s);
>  }
>  
> @@ -386,7 +394,7 @@ static int rtas_last_error_token;
>   *  most recent failed call to rtas.  Because the error text
>   *  might go stale if there are any other intervening rtas calls,
>   *  this routine must be called atomically with whatever produced
> - *  the error (i.e. with rtas.lock still held from the previous call).
> + *  the error (i.e. with rtas_lock still held from the previous call).
>   */
>  static char *__fetch_rtas_last_error(char *altbuf)
>  {
> @@ -406,13 +414,13 @@ static char *__fetch_rtas_last_error(char *altbuf)
>   err_args.args[1] = cpu_to_be32(bufsz);
>   err_args.args[2] = 0;
>  
> - save_args = rtas.args;
> - rtas.args = err_args;
> + save_args = rtas_args;
> + rtas_args = err_args;
>  
> -	do_enter_rtas(__pa(&rtas.args));
> +	do_enter_rtas(__pa(&rtas_args));
>  
> - err_args = rtas.args;
> - rtas.args = save_args;
> + err_args = rtas_args;
> + rtas_args = save_args;
>  
>   /* Log the error in the unlikely case that there was one. */
>   if (unlikely(err_args.args[2] == 0)) {
> @@ -534,7 +542,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   va_list list;
>   int i;
>   unsigned long s;
> - struct rtas_args *rtas_args;
> + struct rtas_args *args;
>   char *buff_copy = NULL;
>   int ret;
>  
> @@ -559,21 +567,21 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   s = lock_rtas();
>  
>   /* We use the global rtas args buffer */
> -	rtas_args = &rtas.args;
> +	args = &rtas_args;
>  
>   va_start(list, outputs);
> - va_rtas_call_unlocked(rtas_args, token, nargs, nret, list);
> + va_rtas_call_unlocked(args, token, nargs, nret, list);
>   va_end(list);
>  
>   /* A -1 return code indicates that the last command couldn't
>  be completed due to a

Re: [PATCH v2 2/4] powerpc/rtas: make all exports GPL

2023-01-24 Thread Laurent Dufour
On 24/01/2023 15:04:46, Nathan Lynch wrote:
> The first symbol exports of RTAS functions and data came with the (now
> removed) scanlog driver in 2003:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb
> 
> At the time this was applied, EXPORT_SYMBOL_GPL() was very new, and
> the exports of rtas_call() etc have remained non-GPL. As new APIs have
> been added to the RTAS subsystem, their symbol exports have followed
> the convention set by existing code.
> 
> However, the historical evidence is that RTAS function exports have
> been added over time only to satisfy the needs of in-kernel users, and
> these clients must have fairly intimate knowledge of how the APIs work
> to use them safely. No out of tree users are known, and future ones
> seem unlikely.
> 
> Arguably the default for RTAS symbols should have become
> EXPORT_SYMBOL_GPL once it was available. Let's make it so now, and
> exceptions can be evaluated as needed.

I also think this is unlikely to happen. But in case a non-GPL driver ever
needs one of these symbols, I guess it will be hard to move backward once
this is upstream. Crossing fingers!

Reviewed-by: Laurent Dufour 

> 
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 6c5716b19d69..e60e2f5af7b9 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -65,10 +65,10 @@ struct rtas_t rtas = {
>  };
>  
>  DEFINE_SPINLOCK(rtas_data_buf_lock);
> -EXPORT_SYMBOL(rtas_data_buf_lock);
> +EXPORT_SYMBOL_GPL(rtas_data_buf_lock);
>  
>  char rtas_data_buf[RTAS_DATA_BUF_SIZE] __cacheline_aligned;
> -EXPORT_SYMBOL(rtas_data_buf);
> +EXPORT_SYMBOL_GPL(rtas_data_buf);
>  
>  unsigned long rtas_rmo_buf;
>  
> @@ -77,7 +77,7 @@ unsigned long rtas_rmo_buf;
>   * This is done like this so rtas_flash can be a module.
>   */
>  void (*rtas_flash_term_hook)(int);
> -EXPORT_SYMBOL(rtas_flash_term_hook);
> +EXPORT_SYMBOL_GPL(rtas_flash_term_hook);
>  
>  /* RTAS use home made raw locking instead of spin_lock_irqsave
>   * because those can be called from within really nasty contexts
> @@ -325,7 +325,7 @@ void rtas_progress(char *s, unsigned short hex)
>   
>   spin_unlock(&progress_lock);
>  }
> -EXPORT_SYMBOL(rtas_progress);/* needed by rtas_flash module 
> */
> +EXPORT_SYMBOL_GPL(rtas_progress);/* needed by rtas_flash module 
> */
>  
>  int rtas_token(const char *service)
>  {
> @@ -335,13 +335,13 @@ int rtas_token(const char *service)
>   tokp = of_get_property(rtas.dev, service, NULL);
>   return tokp ? be32_to_cpu(*tokp) : RTAS_UNKNOWN_SERVICE;
>  }
> -EXPORT_SYMBOL(rtas_token);
> +EXPORT_SYMBOL_GPL(rtas_token);
>  
>  int rtas_service_present(const char *service)
>  {
>   return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
>  }
> -EXPORT_SYMBOL(rtas_service_present);
> +EXPORT_SYMBOL_GPL(rtas_service_present);
>  
>  #ifdef CONFIG_RTAS_ERROR_LOGGING
>  
> @@ -356,7 +356,7 @@ int rtas_get_error_log_max(void)
>  {
>   return rtas_error_log_max;
>  }
> -EXPORT_SYMBOL(rtas_get_error_log_max);
> +EXPORT_SYMBOL_GPL(rtas_get_error_log_max);
>  
>  static void __init init_error_log_max(void)
>  {
> @@ -584,7 +584,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   }
>   return ret;
>  }
> -EXPORT_SYMBOL(rtas_call);
> +EXPORT_SYMBOL_GPL(rtas_call);
>  
>  /**
>   * rtas_busy_delay_time() - From an RTAS status value, calculate the
> @@ -622,7 +622,7 @@ unsigned int rtas_busy_delay_time(int status)
>  
>   return ms;
>  }
> -EXPORT_SYMBOL(rtas_busy_delay_time);
> +EXPORT_SYMBOL_GPL(rtas_busy_delay_time);
>  
>  /**
>   * rtas_busy_delay() - helper for RTAS busy and extended delay statuses
> @@ -696,7 +696,7 @@ bool rtas_busy_delay(int status)
>  
>   return ret;
>  }
> -EXPORT_SYMBOL(rtas_busy_delay);
> +EXPORT_SYMBOL_GPL(rtas_busy_delay);
>  
>  static int rtas_error_rc(int rtas_rc)
>  {
> @@ -741,7 +741,7 @@ int rtas_get_power_level(int powerdomain, int *level)
>   return rtas_error_rc(rc);
>   return rc;
>  }
> -EXPORT_SYMBOL(rtas_get_power_level);
> +EXPORT_SYMBOL_GPL(rtas_get_power_level);
>  
>  int rtas_set_power_level(int powerdomain, int level, int *setlevel)
>  {
> @@ -759,7 +759,7 @@ int rtas_set_power_level(int powerdomain, int level, int 
> *setlevel)
>   return rtas_error_rc(rc);
>   return rc;
>  }
> -EXPORT_SYMBOL(rtas_set_power_leve

Re: [PATCH v7 4/8] crash: add phdr for possible CPUs in elfcorehdr

2023-01-20 Thread Laurent Dufour
On 19/01/2023 19:29:52, Laurent Dufour wrote:
> On 15/01/2023 16:02:02, Sourabh Jain wrote:
>> On architectures like PowerPC the crash notes are available for all
>> possible CPUs. So let's populate the elfcorehdr for all possible
>> CPUs having crash notes to avoid updating elfcorehdr during in-kernel
>> crash update on CPU hotplug events.
>>
>> The similar technique is used in kexec-tool for kexec_load case.
>>
>> Signed-off-by: Sourabh Jain 
>> ---
>>  kernel/crash_core.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> This patch does not apply on ppc/next (53ab112a9508).
> 
> As far as I could see, crash_prepare_elf64_headers() is defined in the file
> kernel/kexec_file.c and that's not recent, see babac4a84a88 (kexec_file,
> x86: move re-factored code to generic side, 2018-04-13)
> 
> Am I missing something?

My mistake, it sounds like your series is based on top of Eric's one (not yet
upstream):

https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devol...@oracle.com/

> 
>>
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 910d377ea317e..19f987b3851e8 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct kimage *image, 
>> struct crash_mem *mem,
>>  ehdr->e_ehsize = sizeof(Elf64_Ehdr);
>>  ehdr->e_phentsize = sizeof(Elf64_Phdr);
>>  
>> -/* Prepare one phdr of type PT_NOTE for each present CPU */
>> -for_each_present_cpu(cpu) {
>> +/* Prepare one phdr of type PT_NOTE for possible CPU with crash note. */
>> +for_each_possible_cpu(cpu) {
>>  #ifdef CONFIG_CRASH_HOTPLUG
>>  if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>>  /* Skip the soon-to-be offlined cpu */
>> @@ -373,8 +373,11 @@ int crash_prepare_elf64_headers(struct kimage *image, 
>> struct crash_mem *mem,
>>  continue;
>>  }
>>  #endif
>> -phdr->p_type = PT_NOTE;
>>  notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
>> +if (!notes_addr)
>> +continue;
>> +
>> +phdr->p_type = PT_NOTE;
>>  phdr->p_offset = phdr->p_paddr = notes_addr;
>>  phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
>>  (ehdr->e_phnum)++;
> 



Re: [PATCH v7 3/8] powerpc/crash: update kimage_arch struct

2023-01-19 Thread Laurent Dufour
On 15/01/2023 16:02:01, Sourabh Jain wrote:
> Add a new member "fdt_index" to kimage_arch struct to hold the index of
> the FDT (Flattened Device Tree) segment in the kexec segment array.
> 
> Having direct access to FDT segment will help arch crash hotplug handler
> to avoid looping kexec segment array to identify the FDT segment index
> for every FDT update on hotplug events.
> 
> The fdt_index is initialized during the kexec load for both kexec_load and
> kexec_file_load system call.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  arch/powerpc/include/asm/kexec.h  |  7 +++
>  arch/powerpc/kexec/core_64.c  | 27 +++
>  arch/powerpc/kexec/elf_64.c   |  6 ++
>  arch/powerpc/kexec/file_load_64.c |  5 +
>  4 files changed, 45 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..5a322c1737661 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,10 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +int machine_kexec_post_load(struct kimage *image);
> +#define machine_kexec_post_load machine_kexec_post_load
> +#endif
>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> @@ -118,6 +122,9 @@ extern const struct kexec_file_ops kexec_elf64_ops;
>  struct kimage_arch {
>   struct crash_mem *exclude_ranges;
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + int fdt_index;
> +#endif
>   unsigned long backup_start;
>   void *backup_buf;
>   void *fdt;
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..3d4fe1aa6f761 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -77,6 +77,33 @@ int machine_kexec_prepare(struct kimage *image)
>   return 0;
>  }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)

I think you should add a small function header describing that this
function is recording the index of the FDT segment for later use.

> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> + int i;
> + void *ptr;
> + unsigned long mem;
> +
> + /* Mark fdt_index invalid */
> + kimage->arch.fdt_index = -1;

Is that really needed?
This is already done in arch_kexec_kernel_image_probe() called before this
function, isn't it?

> +
> + if (kimage->type != KEXEC_TYPE_CRASH)
> + return 0;
> +
> + for (i = 0; i < kimage->nr_segments; i++) {
> + mem = kimage->segment[i].mem;
> + ptr = __va(mem);
> +
> + if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
> + kimage->arch.fdt_index = i;
> + break;
> + }
> + }
> +
> + return 0;
> +}
> +#endif
> +
>  /* Called during kexec sequence with MMU off */
>  static notrace void copy_segments(unsigned long ind)
>  {
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index eeb258002d1e0..2a17f171661f1 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -123,6 +123,12 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = true;
>   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> +
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + image->arch.fdt_index = image->nr_segments;

I'm sorry, I'm not familiar with that code; could you explain why
fdt_index has to be assigned here, and to that value?

> +#endif
> + kbuf.memsz = fdt_totalsize(fdt);
> +
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out_free_fdt;
> diff --git a/arch/powerpc/kexec/file_load_64.c 
> b/arch/powerpc/kexec/file_load_64.c
> index 9bc70b4d8eafc..725f74d1b928c 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -1153,6 +1153,11 @@ int arch_kexec_kernel_image_probe(struct kimage 
> *image, void *buf,
>   return ret;
>   }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> + /* Mark fdt_index invalid */
> + image->arch.fdt_index = -1;
> +#endif
> +
>   return kexec_image_probe_default(image, buf, buf_len);
>  }
>  



Re: [PATCH v7 4/8] crash: add phdr for possible CPUs in elfcorehdr

2023-01-19 Thread Laurent Dufour
On 15/01/2023 16:02:02, Sourabh Jain wrote:
> On architectures like PowerPC the crash notes are available for all
> possible CPUs. So let's populate the elfcorehdr for all possible
> CPUs having crash notes to avoid updating elfcorehdr during in-kernel
> crash update on CPU hotplug events.
> 
> The similar technique is used in kexec-tool for kexec_load case.
> 
> Signed-off-by: Sourabh Jain 
> ---
>  kernel/crash_core.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

This patch does not apply on ppc/next (53ab112a9508).

As far as I could see, crash_prepare_elf64_headers() is defined in the file
kernel/kexec_file.c and that's not recent, see babac4a84a88 (kexec_file,
x86: move re-factored code to generic side, 2018-04-13)

Am I missing something?

> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 910d377ea317e..19f987b3851e8 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct kimage *image, 
> struct crash_mem *mem,
>   ehdr->e_ehsize = sizeof(Elf64_Ehdr);
>   ehdr->e_phentsize = sizeof(Elf64_Phdr);
>  
> - /* Prepare one phdr of type PT_NOTE for each present CPU */
> - for_each_present_cpu(cpu) {
> + /* Prepare one phdr of type PT_NOTE for possible CPU with crash note. */
> + for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_CRASH_HOTPLUG
>   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>   /* Skip the soon-to-be offlined cpu */
> @@ -373,8 +373,11 @@ int crash_prepare_elf64_headers(struct kimage *image, 
> struct crash_mem *mem,
>   continue;
>   }
>  #endif
> - phdr->p_type = PT_NOTE;
>   notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
> + if (!notes_addr)
> + continue;
> +
> + phdr->p_type = PT_NOTE;
>   phdr->p_offset = phdr->p_paddr = notes_addr;
>   phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
>   (ehdr->e_phnum)++;



Re: [PATCH] powerpc/rtas: upgrade internal arch spinlocks

2023-01-12 Thread Laurent Dufour
On 10/01/2023 05:42:55, Nathan Lynch wrote:
> At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a
> raw spinlock") was written, the spinlock lockup detection code called
> __delay(), which will not make progress if the timebase is not
> advancing. Since the interprocessor timebase synchronization sequence
> for chrp, cell, and some now-unsupported Power models can temporarily
> freeze the timebase through an RTAS function (freeze-time-base), the
> lock that serializes most RTAS calls was converted to arch_spinlock_t
> to prevent kernel hangs in the lockup detection code.
> 
> However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock
> lockup detection code") removed that inconvenient property from the
> lock debug code several years ago. So now it should be safe to
> reintroduce generic locks into the RTAS support code, primarily to
> increase lockdep coverage.
> 
> Making rtas.lock a spinlock_t would violate lock type nesting rules
> because it can be acquired while holding raw locks, e.g. pci_lock and
> irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent
> reason not to upgrade timebase_lock as well.
> 
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/include/asm/rtas-types.h |  2 +-
>  arch/powerpc/kernel/rtas.c| 52 ---
>  2 files changed, 15 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas-types.h 
> b/arch/powerpc/include/asm/rtas-types.h
> index 8df6235d64d1..a58f96eb2d19 100644
> --- a/arch/powerpc/include/asm/rtas-types.h
> +++ b/arch/powerpc/include/asm/rtas-types.h
> @@ -18,7 +18,7 @@ struct rtas_t {
>   unsigned long entry;/* physical address pointer */
>   unsigned long base; /* physical address pointer */
>   unsigned long size;
> - arch_spinlock_t lock;
> + raw_spinlock_t lock;
>   struct rtas_args args;
>   struct device_node *dev;/* virtual address pointer */
>  };
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index deded51a7978..a834726f18e3 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -61,7 +61,7 @@ static inline void do_enter_rtas(unsigned long args)
>  }
>  
>  struct rtas_t rtas = {
> - .lock = __ARCH_SPIN_LOCK_UNLOCKED
> + .lock = __RAW_SPIN_LOCK_UNLOCKED(rtas.lock),
>  };
>  EXPORT_SYMBOL(rtas);

This is beyond the scope of this patch, but the RTAS lock is exposed
through the structure rtas_t, while it is only used in that file.

In case of future changes to that lock, and in order not to break the
kABI, I think it would be good to move it out of that structure and to
define it statically in that file.
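
For instance, something along these lines (an untested sketch, the exact
naming is mine):

	/* private to rtas.c, no longer exposed through struct rtas_t */
	static DEFINE_RAW_SPINLOCK(rtas_lock);

with the .lock field dropped from struct rtas_t in rtas-types.h.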

Otherwise, looks good to me.

Reviewed-by: Laurent Dufour 

>  
> @@ -80,28 +80,6 @@ unsigned long rtas_rmo_buf;
>  void (*rtas_flash_term_hook)(int);
>  EXPORT_SYMBOL(rtas_flash_term_hook);
>  
> -/* RTAS use home made raw locking instead of spin_lock_irqsave
> - * because those can be called from within really nasty contexts
> - * such as having the timebase stopped which would lockup with
> - * normal locks and spinlock debugging enabled
> - */
> -static unsigned long lock_rtas(void)
> -{
> - unsigned long flags;
> -
> - local_irq_save(flags);
> - preempt_disable();
> - arch_spin_lock(&rtas.lock);
> - return flags;
> -}
> -
> -static void unlock_rtas(unsigned long flags)
> -{
> - arch_spin_unlock(&rtas.lock);
> - local_irq_restore(flags);
> - preempt_enable();
> -}
> -
>  /*
>   * call_rtas_display_status and call_rtas_display_status_delay
>   * are designed only for very early low-level debugging, which
> @@ -109,14 +87,14 @@ static void unlock_rtas(unsigned long flags)
>   */
>  static void call_rtas_display_status(unsigned char c)
>  {
> - unsigned long s;
> + unsigned long flags;
>  
>   if (!rtas.base)
>   return;
>  
> - s = lock_rtas();
> + raw_spin_lock_irqsave(&rtas.lock, flags);
>   rtas_call_unlocked(&rtas.args, 10, 1, 1, NULL, c);
> - unlock_rtas(s);
> + raw_spin_unlock_irqrestore(&rtas.lock, flags);
>  }
>  
>  static void call_rtas_display_status_delay(char c)
> @@ -534,7 +512,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>  {
>   va_list list;
>   int i;
> - unsigned long s;
> + unsigned long flags;
>   struct rtas_args *rtas_args;
>   char *buff_copy = NULL;
>   int ret;
> @@ -557,8 +535,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   return -1;
>   }
>  
> - s = lock_rtas();
> -
> + raw_spin_lock_irqsave(&rtas.lock,

[PATCH] pseries/mobility: reset the RCU watchdogs after a LPM

2022-11-25 Thread Laurent Dufour
The RCU watchdog timer should be reset when restarting the CPU after a Live
Partition Mobility operation.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/mobility.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 634fac5db3f9..9e10f38dd9ad 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -636,8 +636,10 @@ static int do_join(void *arg)
}
/*
 * Execution may have been suspended for several seconds, so
-* reset the watchdog.
+* reset the watchdogs.
 */
+   rcu_cpu_stall_reset();
+   /* touch_nmi_watchdog() also touch the soft lockup watchdog */
touch_nmi_watchdog();
return ret;
 }
-- 
2.38.1



[PATCH] powerpc/pseries: unregister VPA when hot unplugging a CPU

2022-11-14 Thread Laurent Dufour
The VPA should be unregistered when offlining a CPU. Otherwise there could
be a short window where two CPUs share the same VPA.

This happens because the hypervisor keeps the VPA attached to the vCPU
even after it goes offline.

Here is a potential situation:
 1. remove proc A,
 2. add proc B. If proc B gets proc A's place in cpu_present_map, then it
registers proc A's VPAs.
 3. If proc B is then re-added to the LP, its threads are sharing VPAs with
proc A briefly as they come online.

As the hypervisor may check the VPA's yield_count field for oddity, it may
detect an unexpected value and kill the LPAR.

Suggested-by: Nathan Lynch 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index e0a7ac5db15d..090ae5a1e0f5 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -70,6 +70,7 @@ static void pseries_cpu_offline_self(void)
xics_teardown_cpu();
 
unregister_slb_shadow(hwcpu);
+   unregister_vpa(hwcpu);
rtas_stop_self();
 
/* Should never get here... */
-- 
2.38.1



[PATCH v2 0/2] Consider the size of the added CPU nodes in the kexec FDT

2022-11-10 Thread Laurent Dufour
When adding CPUs to an already big system (tests show it seems to start
with more than 256 CPUs), the kernel shows error messages when building
the FDT for the kexec kernel (kdump or kexec).

It's worth mentioning that the kdump kernel is reloaded after a CPU add
operation.

The messages look like this (the property's name may vary):
[10175.025675] Unable to add 32-64-bridge property: FDT_ERR_NOSPACE

This happens because the size of the FDT is computed based on the size of
the FDT the kernel received at boot time. There is additional space added
in kexec_extra_fdt_size_ppc64() for the added memory but nothing is done
for the added CPUs.

This patch adds this feature so adding new CPUs will increase the size of
the FDT for the kexec kernel.

To compute the additional size required, the number of CPU nodes in the
initial FDT (the one the kernel receives at boot time) is recorded. When a
kexec FDT is created, the number of CPU nodes in the current FDT is used to
compute the additional size.
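
In other words, something like this in kexec_extra_fdt_size_ppc64()
(a sketch of what patch 2/2 below does):

	if (cpu_nodes > boot_cpu_node_count)
		extra_size += (cpu_nodes - boot_cpu_node_count) * cpu_node_size();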

The first patch of this series creates a variable set by the boot
code when parsing the initial FDT at boot time.
The second patch computes the required additional space.

This has been tested on a PowerVM LPAR running with more than 256 CPUs in
shared mode, adding 320 CPUs to this LPAR.

Changes in v2:
 - Fix build issue, moving definition in prom.h

Laurent Dufour (2):
  powerpc: export the CPU node count
  powerpc: Take in account addition CPU node when building kexec FDT

 arch/powerpc/include/asm/prom.h   |  1 +
 arch/powerpc/kernel/prom.c|  3 ++
 arch/powerpc/kexec/file_load_64.c | 60 ++-
 3 files changed, 63 insertions(+), 1 deletion(-)

-- 
2.38.1



[PATCH v2 2/2] powerpc: Take in account addition CPU node when building kexec FDT

2022-11-10 Thread Laurent Dufour
On a system with a large number of CPUs, the creation of the FDT for a
kexec kernel may fail because the allocated FDT is not large enough.

When this happens, such a message is displayed on the console:

Unable to add ibm,processor-vadd-size property: FDT_ERR_NOSPACE

The property's name may change depending on when the buffer overwrite is
detected.

Obviously the created FDT is missing information, and the system dump or
kexec kernel is expected to fail to run properly.

When the FDT is allocated, the size of the FDT the kernel received at boot
time is used and an extra size can be applied. Currently, only memory added
after boot time is taken into account, not the added CPU nodes.

The extra size should take into account these additional CPU nodes and
compute the required extra space. To achieve that, the size of a CPU node,
including its subnodes, is computed once and multiplied by the number of
additional CPU nodes.

The assumption is that the size of the CPU node is the _same_ for all the
nodes; the only variable part should be the name "PowerPC,POWERxx@##" where
"##" may vary a bit.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kexec/file_load_64.c | 60 ++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 349a781cea0b..6865cd7dc3ca 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct umem_info {
u64 *buf;   /* data buffer for usable-memory property */
@@ -928,6 +929,46 @@ int setup_purgatory_ppc64(struct kimage *image, const void 
*slave_code,
return ret;
 }
 
+/**
+ * get_cpu_node_size - Compute the size of a CPU node in the FDT.
+ * This should be done only once and the value is stored in
+ * a static variable.
+ * Returns the max size of a CPU node in the FDT.
+ */
+static unsigned int cpu_node_size(void)
+{
+   static unsigned int cpu_node_size;
+   struct device_node *dn;
+   struct property *pp;
+
+   /*
+* Don't compute it twice, we are assuming that the per CPU node size
+* doesn't change during the system's life.
+*/
+   if (cpu_node_size)
+   return cpu_node_size;
+
+   dn = of_find_node_by_type(NULL, "cpu");
+   if (!dn) {
+   /* Unlikely to happen */
+   WARN_ON_ONCE(1);
+   return 0;
+   }
+
+   /*
+* We compute the sub node size for a CPU node, assuming it
+* will be the same for all.
+*/
+   cpu_node_size += strlen(dn->name) + 5;
+   for_each_property_of_node(dn, pp) {
+   cpu_node_size += strlen(pp->name);
+   cpu_node_size += pp->length;
+   }
+
+   of_node_put(dn);
+   return cpu_node_size;
+}
+
 /**
  * kexec_extra_fdt_size_ppc64 - Return the estimated additional size needed to
  *  setup FDT for kexec/kdump kernel.
@@ -937,7 +978,10 @@ int setup_purgatory_ppc64(struct kimage *image, const void 
*slave_code,
  */
 unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 {
+   struct device_node *dn;
u64 usm_entries;
+   unsigned int cpu_nodes = 0;
+   unsigned int extra_size;
 
if (image->type != KEXEC_TYPE_CRASH)
return 0;
@@ -949,7 +993,21 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage 
*image)
 */
usm_entries = ((memblock_end_of_DRAM() / drmem_lmb_size()) +
   (2 * (resource_size(&crashk_res) / drmem_lmb_size())));
-   return (unsigned int)(usm_entries * sizeof(u64));
+
+   extra_size = (unsigned int)(usm_entries * sizeof(u64));
+
+   /*
+* Get the number of CPU nodes in the current DT. This allows to
+* reserve places for CPU nodes added since the boot time.
+*/
+   for_each_node_by_type(dn, "cpu") {
+   cpu_nodes++;
+   }
+
+   if (cpu_nodes > boot_cpu_node_count)
+   extra_size += (cpu_nodes - boot_cpu_node_count) * 
cpu_node_size();
+
+   return extra_size;
 }
 
 /**
-- 
2.38.1



[PATCH v2 1/2] powerpc: export the CPU node count

2022-11-10 Thread Laurent Dufour
At boot time, the FDT is parsed to compute the number of CPUs.
In addition, count the number of CPU nodes and export it.

This is useful when building the FDT for a kexeced kernel, since we need to
take into account the CPU nodes added since boot time by CPU hotplug
operations.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/prom.h | 1 +
 arch/powerpc/kernel/prom.c  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 2e82820fbd64..c0107d8ddd8c 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -85,6 +85,7 @@ struct of_drc_info {
 extern int of_read_drc_info_cell(struct property **prop,
const __be32 **curval, struct of_drc_info *data);
 
+extern unsigned int boot_cpu_node_count;
 
 /*
  * There are two methods for telling firmware what our capabilities are.
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 1eed87d954ba..645f4450dfc3 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -72,6 +72,7 @@ int __initdata iommu_is_off;
 int __initdata iommu_force_on;
 unsigned long tce_alloc_start, tce_alloc_end;
 u64 ppc64_rma_size;
+unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
 static int __initdata boot_cpu_count;
@@ -335,6 +336,8 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
if (type == NULL || strcmp(type, "cpu") != 0)
return 0;
 
+   boot_cpu_node_count++;
+
/* Get physical cpuid */
intserv = of_get_flat_dt_prop(node, "ibm,ppc-interrupt-server#s", &len);
if (!intserv)
-- 
2.38.1



Re: [PATCH 1/2] powerpc: export the CPU node count

2022-11-07 Thread Laurent Dufour
On 07/11/2022 13:11:17, Nicholas Piggin wrote:
> On Sat Oct 29, 2022 at 2:00 AM AEST, Laurent Dufour wrote:
>> At boot time, the FDT is parsed to compute the number of CPUs.
>> In addition, count the number of CPU nodes and export it.
>>
>> This is useful when building the FDT for a kexeced kernel, since we need to
>> take into account the CPU nodes added since boot time by CPU hotplug
>> operations.
> 
> It would be nice if it just realloced memory in this case, but that
> looks like a bigger change.

I agree, and I think the best option in the long term would be the series
Sourabh Jain sent in June, updating the crash kernel FDT without reloading
it:
(https://lore.kernel.org/linuxppc-dev/20220620070106.93141-1-sourabhj...@linux.ibm.com/)

In the meantime, this solves the issue.

> 
> But these patches look okay to me, if you can solve the compile bug.

Indeed, the compile bugs are raised because I added the definition of the
new variable 'boot_cpu_node_count' in kexec_ranges.h, and added the
inclusion of that file in prom.c.

I was not confident putting this new variable definition in that header
file, but I didn't find a better option.

Do you have a better idea of a header file to use?

Could I just declare this variable "extern" in
arch/powerpc/kexec/file_load_64.c? This looks ugly to me.
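
Maybe prom.h, next to the other device-tree declarations? Something like
this (just a sketch):

	/* declaration only, the definition stays in prom.c */
	extern unsigned int boot_cpu_node_count;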

Thanks,
Laurent.


> Thanks,
> Nick
> 
>>
>> Signed-off-by: Laurent Dufour 
>> ---
>>  arch/powerpc/include/asm/kexec_ranges.h | 2 ++
>>  arch/powerpc/kernel/prom.c  | 4 
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
>> b/arch/powerpc/include/asm/kexec_ranges.h
>> index f83866a19e87..bf35d00ddd09 100644
>> --- a/arch/powerpc/include/asm/kexec_ranges.h
>> +++ b/arch/powerpc/include/asm/kexec_ranges.h
>> @@ -22,4 +22,6 @@ int add_rtas_mem_range(struct crash_mem **mem_ranges);
>>  int add_opal_mem_range(struct crash_mem **mem_ranges);
>>  int add_reserved_mem_ranges(struct crash_mem **mem_ranges);
>>  
>> +extern unsigned int boot_cpu_node_count;
>> +
>>  #endif /* _ASM_POWERPC_KEXEC_RANGES_H */
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index 1eed87d954ba..d326148fd5a4 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -56,6 +56,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  
>> @@ -72,6 +73,7 @@ int __initdata iommu_is_off;
>>  int __initdata iommu_force_on;
>>  unsigned long tce_alloc_start, tce_alloc_end;
>>  u64 ppc64_rma_size;
>> +unsigned int boot_cpu_node_count __ro_after_init;
>>  #endif
>>  static phys_addr_t first_memblock_size;
>>  static int __initdata boot_cpu_count;
>> @@ -335,6 +337,8 @@ static int __init early_init_dt_scan_cpus(unsigned long 
>> node,
>>  if (type == NULL || strcmp(type, "cpu") != 0)
>>  return 0;
>>  
>> +boot_cpu_node_count++;
>> +
>>  /* Get physical cpuid */
>>  intserv = of_get_flat_dt_prop(node, "ibm,ppc-interrupt-server#s", &len);
>>  if (!intserv)
>> -- 
>> 2.38.1
> 



[PATCH 2/2] powerpc: Take in account addition CPU node when building kexec FDT

2022-10-28 Thread Laurent Dufour
On a system with a large number of CPUs, the creation of the FDT for a
kexec kernel may fail because the allocated FDT is not large enough.

When this happens, such a message is displayed on the console:

Unable to add ibm,processor-vadd-size property: FDT_ERR_NOSPACE

The property's name may change depending on when the buffer overwrite is
detected.

Obviously the created FDT is missing information, and the system dump or
kexec kernel is expected to fail to run properly.

When the FDT is allocated, the size of the FDT the kernel received at boot
time is used and an extra size can be applied. Currently, only memory added
after boot time is taken into account, not the added CPU nodes.

The extra size should take into account these additional CPU nodes and
compute the required extra space. To achieve that, the size of a CPU node,
including its subnodes, is computed once and multiplied by the number of
additional CPU nodes.

The assumption is that the size of the CPU node is the _same_ for all the
nodes; the only variable part should be the name "PowerPC,POWERxx@##" where
"##" may vary a bit.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kexec/file_load_64.c | 59 ++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 349a781cea0b..1476922cd7c5 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -928,6 +928,46 @@ int setup_purgatory_ppc64(struct kimage *image, const void 
*slave_code,
return ret;
 }
 
+/**
+ * get_cpu_node_size - Compute the size of a CPU node in the FDT.
+ * This should be done only once and the value is stored in
+ * a static variable.
+ * Returns the max size of a CPU node in the FDT.
+ */
+static unsigned int cpu_node_size(void)
+{
+   static unsigned int cpu_node_size;
+   struct device_node *dn;
+   struct property *pp;
+
+   /*
+* Don't compute it twice, we are assuming that the per CPU node size
+* doesn't change during the system's life.
+*/
+   if (cpu_node_size)
+   return cpu_node_size;
+
+   dn = of_find_node_by_type(NULL, "cpu");
+   if (!dn) {
+   /* Unlikely to happen */
+   WARN_ON_ONCE(1);
+   return 0;
+   }
+
+   /*
+* We compute the sub node size for a CPU node, assuming it
+* will be the same for all.
+*/
+   cpu_node_size += strlen(dn->name) + 5;
+   for_each_property_of_node(dn, pp) {
+   cpu_node_size += strlen(pp->name);
+   cpu_node_size += pp->length;
+   }
+
+   of_node_put(dn);
+   return cpu_node_size;
+}
+
 /**
  * kexec_extra_fdt_size_ppc64 - Return the estimated additional size needed to
  *  setup FDT for kexec/kdump kernel.
@@ -937,7 +977,10 @@ int setup_purgatory_ppc64(struct kimage *image, const void 
*slave_code,
  */
 unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 {
+   struct device_node *dn;
u64 usm_entries;
+   unsigned int cpu_nodes = 0;
+   unsigned int extra_size;
 
if (image->type != KEXEC_TYPE_CRASH)
return 0;
@@ -949,7 +992,21 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage 
*image)
 */
usm_entries = ((memblock_end_of_DRAM() / drmem_lmb_size()) +
   (2 * (resource_size(&crashk_res) / drmem_lmb_size())));
-   return (unsigned int)(usm_entries * sizeof(u64));
+
+   extra_size = (unsigned int)(usm_entries * sizeof(u64));
+
+   /*
+* Get the number of CPU nodes in the current DT. This allows to
+* reserve places for CPU nodes added since the boot time.
+*/
+   for_each_node_by_type(dn, "cpu") {
+   cpu_nodes++;
+   }
+
+   if (cpu_nodes > boot_cpu_node_count)
+   extra_size += (cpu_nodes - boot_cpu_node_count) * 
cpu_node_size();
+
+   return extra_size;
 }
 
 /**
-- 
2.38.1



[PATCH 1/2] powerpc: export the CPU node count

2022-10-28 Thread Laurent Dufour
At boot time, the FDT is parsed to compute the number of CPUs.
In addition, count the number of CPU nodes and export it.

This is useful when building the FDT for a kexeced kernel, since we need to
take into account the CPU nodes added since boot time by CPU hotplug
operations.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/kexec_ranges.h | 2 ++
 arch/powerpc/kernel/prom.c  | 4 
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..bf35d00ddd09 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -22,4 +22,6 @@ int add_rtas_mem_range(struct crash_mem **mem_ranges);
 int add_opal_mem_range(struct crash_mem **mem_ranges);
 int add_reserved_mem_ranges(struct crash_mem **mem_ranges);
 
+extern unsigned int boot_cpu_node_count;
+
 #endif /* _ASM_POWERPC_KEXEC_RANGES_H */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 1eed87d954ba..d326148fd5a4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -72,6 +73,7 @@ int __initdata iommu_is_off;
 int __initdata iommu_force_on;
 unsigned long tce_alloc_start, tce_alloc_end;
 u64 ppc64_rma_size;
+unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
 static int __initdata boot_cpu_count;
@@ -335,6 +337,8 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
if (type == NULL || strcmp(type, "cpu") != 0)
return 0;
 
+   boot_cpu_node_count++;
+
/* Get physical cpuid */
intserv = of_get_flat_dt_prop(node, "ibm,ppc-interrupt-server#s", &len);
if (!intserv)
-- 
2.38.1



[PATCH 0/2] Consider the size of the added CPU nodes in the kexec FDT

2022-10-28 Thread Laurent Dufour
When adding CPUs to an already big system (tests show it seems to start
with more than 256 CPUs), the kernel shows error messages when building
the FDT for the kexec kernel (kdump or kexec).

It's worth mentioning that the kdump kernel is reloaded after a CPU add
operation.

The messages look like this (the property's name may vary):
[10175.025675] Unable to add 32-64-bridge property: FDT_ERR_NOSPACE

This happens because the size of the FDT is computed based on the size of
the FDT the kernel received at boot time. There is additional space added
in kexec_extra_fdt_size_ppc64() for the added memory but nothing is done
for the added CPUs.

This patch adds this feature so adding new CPUs will increase the size of
the FDT for the kexec kernel.

To compute the additional size required, the number of CPU nodes in the
initial FDT (the one the kernel receives at boot time) is recorded. When a
kexec FDT is created, the number of CPU nodes in the current FDT is used to
compute the additional size.

The first patch of this series creates a variable set by the boot
code when parsing the initial FDT at boot time.
The second patch computes the required additional space.

This has been tested on a PowerVM LPAR running with more than 256 CPUs in
shared mode, adding 320 CPUs to this LPAR.

Laurent Dufour (2):
  powerpc: export the CPU node count
  powerpc: Take in account addition CPU node when building kexec FDT

 arch/powerpc/include/asm/kexec_ranges.h |  2 +
 arch/powerpc/kernel/prom.c  |  4 ++
 arch/powerpc/kexec/file_load_64.c   | 59 -
 3 files changed, 64 insertions(+), 1 deletion(-)

-- 
2.38.1



Re: [PATCH 11/17] powerpc/qspinlock: allow propagation of yield CPU down the queue

2022-10-06 Thread Laurent Dufour
On 28/07/2022 08:31:14, Nicholas Piggin wrote:
> Having all CPUs poll the lock word for the owner CPU that should be
> yielded to defeats most of the purpose of using MCS queueing for
> scalability. Yet it may be desirable for queued waiters to yield
> to a preempted owner.
> 
> s390 addresses this problem by having queued waiters sample the lock
> word to find the owner much less frequently. In this approach, the
> waiters never sample it directly, but the queue head propagates the
> owner CPU back to the next waiter if it ever finds the owner has
> been preempted. Queued waiters then subsequently propagate the owner
> CPU back to the next waiter, and so on.
> 
> Disable this option by default for now, i.e., no logical change.
> ---
>  arch/powerpc/lib/qspinlock.c | 85 +++-
>  1 file changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 94f007f66942..28c85a2d5635 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -12,6 +12,7 @@
>  struct qnode {
>   struct qnode*next;
>   struct qspinlock *lock;
> + int yield_cpu;
>   u8  locked; /* 1 if lock acquired */
>  };
>  
> @@ -28,6 +29,7 @@ static int HEAD_SPINS __read_mostly = (1<<8);
>  static bool pv_yield_owner __read_mostly = true;
>  static bool pv_yield_allow_steal __read_mostly = false;
>  static bool pv_yield_prev __read_mostly = true;
> +static bool pv_yield_propagate_owner __read_mostly = true;
>  
>  static DEFINE_PER_CPU_ALIGNED(struct qnodes, qnodes);
>  
> @@ -257,13 +259,66 @@ static __always_inline void 
> yield_head_to_locked_owner(struct qspinlock *lock, u
>   __yield_to_locked_owner(lock, val, paravirt, clear_mustq);
>  }
>  
> +static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, 
> int *set_yield_cpu, bool paravirt)
> +{
> + struct qnode *next;
> + int owner;
> +
> + if (!paravirt)
> + return;
> + if (!pv_yield_propagate_owner)
> + return;
> +
> + owner = get_owner_cpu(val);
> + if (*set_yield_cpu == owner)
> + return;
> +
> + next = READ_ONCE(node->next);
> + if (!next)
> + return;
> +
> + if (vcpu_is_preempted(owner)) {
> + next->yield_cpu = owner;
> + *set_yield_cpu = owner;
> + } else if (*set_yield_cpu != -1) {
> + next->yield_cpu = owner;
> + *set_yield_cpu = owner;
> + }

This is a bit confusing: the else branch is the same as the true one.
It might be written like this:

if (vcpu_is_preempted(owner) || *set_yield_cpu != -1) {
next->yield_cpu = owner;
*set_yield_cpu = owner;
}

> +}
> +
>  static __always_inline void yield_to_prev(struct qspinlock *lock, struct 
> qnode *node, int prev_cpu, bool paravirt)
>  {
>   u32 yield_count;
> + int yield_cpu;
>  
>   if (!paravirt)
>   goto relax;
>  
> + if (!pv_yield_propagate_owner)
> + goto yield_prev;
> +
> + yield_cpu = READ_ONCE(node->yield_cpu);
> + if (yield_cpu == -1) {
> + /* Propagate back the -1 CPU */
> + if (node->next && node->next->yield_cpu != -1)
> + node->next->yield_cpu = yield_cpu;
> + goto yield_prev;
> + }
> +
> + yield_count = yield_count_of(yield_cpu);
> + if ((yield_count & 1) == 0)
> + goto yield_prev; /* owner vcpu is running */
> +
> + smp_rmb();
> +
> + if (yield_cpu == node->yield_cpu) {
> + if (node->next && node->next->yield_cpu != yield_cpu)
> + node->next->yield_cpu = yield_cpu;
> + yield_to_preempted(yield_cpu, yield_count);
> + return;
> + }
> +

In the case that test is false, it means the lock owner has probably
changed. Why are we yielding to the previous node instead of reading
node->yield_cpu again, checking against the -1 value, etc.?
Yielding to the previous node is valid, but it might be better to yield to
the owner, isn't it?
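
Something like this (an untested sketch reusing the function's locals,
and leaving aside the propagation to node->next) is what I would have
expected instead:

	/* keep re-reading the propagated owner instead of giving up */
	for (;;) {
		yield_cpu = READ_ONCE(node->yield_cpu);
		if (yield_cpu == -1)
			break;		/* fall back to yielding to prev */

		yield_count = yield_count_of(yield_cpu);
		if ((yield_count & 1) == 0)
			break;		/* owner vcpu is running */

		smp_rmb();

		if (yield_cpu == READ_ONCE(node->yield_cpu)) {
			yield_to_preempted(yield_cpu, yield_count);
			return;
		}
	}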

> +yield_prev:
>   if (!pv_yield_prev)
>   goto relax;
>  
> @@ -337,6 +392,7 @@ static __always_inline void 
> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>   node = >nodes[idx];
>   node->next = NULL;
>   node->lock = lock;
> + node->yield_cpu = -1;
>   node->locked = 0;
>  
>   tail = encode_tail_cpu();
> @@ -358,13 +414,21 @@ static __always_inline void 
> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>   while (!node->locked)
>   yield_to_prev(lock, node, prev_cpu, paravirt);
>  
> + /* Clear out stale propagated yield_cpu */
> + if (paravirt && pv_yield_propagate_owner && node->yield_cpu != 
> -1)
> + node->yield_cpu = -1;

Why do these tests instead of directly setting node->yield_cpu to -1?
Is the 

Re: [PATCH 16/17] powerpc/qspinlock: allow indefinite spinning on a preempted owner

2022-09-22 Thread Laurent Dufour
On 28/07/2022 08:31:19, Nicholas Piggin wrote:
> Provide an option that holds off queueing indefinitely while the lock
> owner is preempted. This could reduce queueing latencies for very
> overcommitted vcpu situations.
> 
> This is disabled by default.

Hi Nick,

I must have missed something here.

If this option is turned on, a CPU trying to take the lock while there is a
preempted owner will spin, checking lock->val and yielding to the lock
owner CPU. Am I right?

If yes, why not queue and spin checking its own node's value, while
yielding to the lock owner CPU? That would generate less cache bouncing,
which is what the queued spinlock is trying to address, isn't it?
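
Something along these lines (an untested sketch reusing this series'
helpers) is what I have in mind:

	/* queued waiter: spin on our own node->locked (a local cache
	 * line) while still yielding to the preempted lock owner
	 */
	while (!READ_ONCE(node->locked)) {
		u32 val = READ_ONCE(lock->val);
		bool preempted;

		if (val & _Q_LOCKED_VAL)
			yield_to_locked_owner(lock, val, paravirt, &preempted);
		else
			cpu_relax();
	}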

Thanks,
Laurent.

> ---
>  arch/powerpc/lib/qspinlock.c | 91 +++-
>  1 file changed, 79 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 24f68bd71e2b..5cfd69931e31 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -35,6 +35,7 @@ static int HEAD_SPINS __read_mostly = (1<<8);
>  
>  static bool pv_yield_owner __read_mostly = true;
>  static bool pv_yield_allow_steal __read_mostly = false;
> +static bool pv_spin_on_preempted_owner __read_mostly = false;
>  static bool pv_yield_prev __read_mostly = true;
>  static bool pv_yield_propagate_owner __read_mostly = true;
>  static bool pv_prod_head __read_mostly = false;
> @@ -220,13 +221,15 @@ static struct qnode *get_tail_qnode(struct qspinlock 
> *lock, u32 val)
>   BUG();
>  }
>  
> -static __always_inline void __yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool clear_mustq)
> +static __always_inline void __yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool clear_mustq, bool *preempted)
>  {
>   int owner;
>   u32 yield_count;
>  
>   BUG_ON(!(val & _Q_LOCKED_VAL));
>  
> + *preempted = false;
> +
>   if (!paravirt)
>   goto relax;
>  
> @@ -241,6 +244,8 @@ static __always_inline void 
> __yield_to_locked_owner(struct qspinlock *lock, u32
>  
>   spin_end();
>  
> + *preempted = true;
> +
>   /*
>* Read the lock word after sampling the yield count. On the other side
>* there may a wmb because the yield count update is done by the
> @@ -265,14 +270,14 @@ static __always_inline void 
> __yield_to_locked_owner(struct qspinlock *lock, u32
>   spin_cpu_relax();
>  }
>  
> -static __always_inline void yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt)
> +static __always_inline void yield_to_locked_owner(struct qspinlock *lock, 
> u32 val, bool paravirt, bool *preempted)
>  {
> - __yield_to_locked_owner(lock, val, paravirt, false);
> + __yield_to_locked_owner(lock, val, paravirt, false, preempted);
>  }
>  
> -static __always_inline void yield_head_to_locked_owner(struct qspinlock 
> *lock, u32 val, bool paravirt, bool clear_mustq)
> +static __always_inline void yield_head_to_locked_owner(struct qspinlock 
> *lock, u32 val, bool paravirt, bool clear_mustq, bool *preempted)
>  {
> - __yield_to_locked_owner(lock, val, paravirt, clear_mustq);
> + __yield_to_locked_owner(lock, val, paravirt, clear_mustq, preempted);
>  }
>  
>  static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, 
> int *set_yield_cpu, bool paravirt)
> @@ -364,12 +369,33 @@ static __always_inline void yield_to_prev(struct 
> qspinlock *lock, struct qnode *
>  
>  static __always_inline bool try_to_steal_lock(struct qspinlock *lock, bool 
> paravirt)
>  {
> - int iters;
> + int iters = 0;
> +
> + if (!STEAL_SPINS) {
> + if (paravirt && pv_spin_on_preempted_owner) {
> + spin_begin();
> + for (;;) {
> + u32 val = READ_ONCE(lock->val);
> + bool preempted;
> +
> + if (val & _Q_MUST_Q_VAL)
> + break;
> + if (!(val & _Q_LOCKED_VAL))
> + break;
> + if (!vcpu_is_preempted(get_owner_cpu(val)))
> + break;
> + yield_to_locked_owner(lock, val, paravirt, 
> );
> + }
> + spin_end();
> + }
> + return false;
> + }
>  
>   /* Attempt to steal the lock */
>   spin_begin();
>   for (;;) {
>   u32 val = READ_ONCE(lock->val);
> + bool preempted;
>  
>   if (val & _Q_MUST_Q_VAL)
>   break;
> @@ -382,9 +408,22 @@ static __always_inline bool try_to_steal_lock(struct 
> qspinlock *lock, bool parav
>   continue;
>   }
>  
> - yield_to_locked_owner(lock, val, paravirt);
> -
> - iters++;
> + yield_to_locked_owner(lock, val, 

Re: [RFC PATCH RESEND 28/28] kernel/fork: throttle call_rcu() calls in vm_area_free

2022-09-09 Thread Laurent Dufour
On 09/09/2022 at 18:02, Suren Baghdasaryan wrote:
> On Fri, Sep 9, 2022 at 8:19 AM Laurent Dufour  wrote:
>>
>> Le 01/09/2022 à 19:35, Suren Baghdasaryan a écrit :
>>> call_rcu() can take a long time when callback offloading is enabled.
>>> Its use in the vm_area_free can cause regressions in the exit path when
>>> multiple VMAs are being freed. To minimize that impact, place VMAs into
>>> a list and free them in groups using one call_rcu() call per group.
>>>
>>> Signed-off-by: Suren Baghdasaryan 
>>> ---
>>>  include/linux/mm.h   |  1 +
>>>  include/linux/mm_types.h | 11 ++-
>>>  kernel/fork.c| 68 +++-
>>>  mm/init-mm.c |  3 ++
>>>  mm/mmap.c|  1 +
>>>  5 files changed, 75 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index a3cbaa7b9119..81dff694ac14 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -249,6 +249,7 @@ void setup_initial_init_mm(void *start_code, void 
>>> *end_code,
>>>  struct vm_area_struct *vm_area_alloc(struct mm_struct *);
>>>  struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
>>>  void vm_area_free(struct vm_area_struct *);
>>> +void drain_free_vmas(struct mm_struct *mm);
>>>
>>>  #ifndef CONFIG_MMU
>>>  extern struct rb_root nommu_region_tree;
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 36562e702baf..6f3effc493b1 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -412,7 +412,11 @@ struct vm_area_struct {
>>>   struct vm_area_struct *vm_next, *vm_prev;
>>>   };
>>>  #ifdef CONFIG_PER_VMA_LOCK
>>> - struct rcu_head vm_rcu; /* Used for deferred freeing. */
>>> + struct {
>>> + struct list_head vm_free_list;
>>> + /* Used for deferred freeing. */
>>> + struct rcu_head vm_rcu;
>>> + };
>>>  #endif
>>>   };
>>>
>>> @@ -573,6 +577,11 @@ struct mm_struct {
>>> */
>>>  #ifdef CONFIG_PER_VMA_LOCK
>>>   int mm_lock_seq;
>>> + struct {
>>> + struct list_head head;
>>> + spinlock_t lock;
>>> + int size;
>>> + } vma_free_list;
>>>  #endif
>>>
>>>
>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>> index b443ba3a247a..7c88710aed72 100644
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -483,26 +483,75 @@ struct vm_area_struct *vm_area_dup(struct 
>>> vm_area_struct *orig)
>>>  }
>>>
>>>  #ifdef CONFIG_PER_VMA_LOCK
>>> -static void __vm_area_free(struct rcu_head *head)
>>> +static inline void __vm_area_free(struct vm_area_struct *vma)
>>>  {
>>> - struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
>>> -   vm_rcu);
>>>   /* The vma should either have no lock holders or be write-locked. */
>>>   vma_assert_no_reader(vma);
>>>   kmem_cache_free(vm_area_cachep, vma);
>>>  }
>>> -#endif
>>> +
>>> +static void vma_free_rcu_callback(struct rcu_head *head)
>>> +{
>>> + struct vm_area_struct *first_vma;
>>> + struct vm_area_struct *vma, *vma2;
>>> +
>>> + first_vma = container_of(head, struct vm_area_struct, vm_rcu);
>>> + list_for_each_entry_safe(vma, vma2, &first_vma->vm_free_list, 
>>> vm_free_list)
>>
>> Is it safe to walk the list against concurrent calls to
>> list_splice_init() or list_add()?
> 
> I think it is. drain_free_vmas() moves the to-be-destroyed and already
> isolated VMAs from mm->vma_free_list into to_destroy list and then
> passes that list to vma_free_rcu_callback(). At this point the list of
> VMAs passed to vma_free_rcu_callback() is not accessible either from
> mm (VMAs were isolated before vm_area_free() was called) or from
> drain_free_vmas() since they were already removed from
> mm->vma_free_list. Does that make sense?

Got it!
Thanks for the explanation.

> 
>>
>>> + __vm_area_free(vma);
>>> + __vm_area_free(first_v

Re: [RFC PATCH RESEND 10/28] mm/mmap: mark VMAs as locked in vma_adjust

2022-09-09 Thread Laurent Dufour
On 09/09/2022 at 02:51, Suren Baghdasaryan wrote:
> On Tue, Sep 6, 2022 at 8:35 AM Laurent Dufour  wrote:
>>
>> Le 01/09/2022 à 19:34, Suren Baghdasaryan a écrit :
>>> vma_adjust modifies a VMA and possibly its neighbors. Mark them as locked
>>> before making the modifications.
>>>
>>> Signed-off-by: Suren Baghdasaryan 
>>> ---
>>>  mm/mmap.c | 11 ++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index f89c9b058105..ed58cf0689b2 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -710,6 +710,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>>> long start,
>>>   long adjust_next = 0;
>>>   int remove_next = 0;
>>>
>>> + vma_mark_locked(vma);
>>> + if (next)
>>> + vma_mark_locked(next);
>>> +
>>
>> I was wondering if the VMAs insert and expand should be locked too.
>>
>> For expand, I can't see any valid reason, but for insert, I'm puzzled.
>> I would think that it is better to lock the VMA to be inserted but I can't
>> really justify that.
>>
>> It may be nice to detail why there is no need to lock insert and expand here.
> 
> 'expand' is always locked before it's passed to __vma_adjust() by
> vma_merge(). It has to be locked before we decide "Can it merge with
> the predecessor?" here
> https://elixir.bootlin.com/linux/latest/source/mm/mmap.c#L1201 because
> a change in VMA can affect that decision. I spent many hours tracking
> the issue caused by not locking the VMA before making this decision.
> It might be good to add a comment about this...
> 
> AFAIKT 'insert' is only used by __split_vma() and it's always a brand
> new VMA which is not yet linked into mm->mmap. Any reason
> __vma_adjust() should lock it?

No, I think that's good this way.

> 
>>
>>>   if (next && !insert) {
>>>   struct vm_area_struct *exporter = NULL, *importer = NULL;
>>>
>>> @@ -754,8 +758,11 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>>> long start,
>>>* If next doesn't have anon_vma, import from vma 
>>> after
>>>* next, if the vma overlaps with it.
>>>*/
>>> - if (remove_next == 2 && !next->anon_vma)
>>> + if (remove_next == 2 && !next->anon_vma) {
>>>   exporter = next->vm_next;
>>> + if (exporter)
>>> + vma_mark_locked(exporter);
>>> + }
>>>
>>>   } else if (end > next->vm_start) {
>>>   /*
>>> @@ -931,6 +938,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned 
>>> long start,
>>>* "vma->vm_next" gap must be updated.
>>>*/
>>>   next = vma->vm_next;
>>> + if (next)
>>> + vma_mark_locked(next);
>>>   } else {
>>>   /*
>>>* For the scope of the comment "next" and
>>
>> --
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to kernel-team+unsubscr...@android.com.
>>



Re: [RFC PATCH RESEND 28/28] kernel/fork: throttle call_rcu() calls in vm_area_free

2022-09-09 Thread Laurent Dufour
On 01/09/2022 at 19:35, Suren Baghdasaryan wrote:
> call_rcu() can take a long time when callback offloading is enabled.
> Its use in the vm_area_free can cause regressions in the exit path when
> multiple VMAs are being freed. To minimize that impact, place VMAs into
> a list and free them in groups using one call_rcu() call per group.
> 
> Signed-off-by: Suren Baghdasaryan 
> ---
>  include/linux/mm.h   |  1 +
>  include/linux/mm_types.h | 11 ++-
>  kernel/fork.c| 68 +++-
>  mm/init-mm.c |  3 ++
>  mm/mmap.c|  1 +
>  5 files changed, 75 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a3cbaa7b9119..81dff694ac14 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -249,6 +249,7 @@ void setup_initial_init_mm(void *start_code, void 
> *end_code,
>  struct vm_area_struct *vm_area_alloc(struct mm_struct *);
>  struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
>  void vm_area_free(struct vm_area_struct *);
> +void drain_free_vmas(struct mm_struct *mm);
>  
>  #ifndef CONFIG_MMU
>  extern struct rb_root nommu_region_tree;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 36562e702baf..6f3effc493b1 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -412,7 +412,11 @@ struct vm_area_struct {
>   struct vm_area_struct *vm_next, *vm_prev;
>   };
>  #ifdef CONFIG_PER_VMA_LOCK
> - struct rcu_head vm_rcu; /* Used for deferred freeing. */
> + struct {
> + struct list_head vm_free_list;
> + /* Used for deferred freeing. */
> + struct rcu_head vm_rcu;
> + };
>  #endif
>   };
>  
> @@ -573,6 +577,11 @@ struct mm_struct {
> */
>  #ifdef CONFIG_PER_VMA_LOCK
>   int mm_lock_seq;
> + struct {
> + struct list_head head;
> + spinlock_t lock;
> + int size;
> + } vma_free_list;
>  #endif
>  
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b443ba3a247a..7c88710aed72 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -483,26 +483,75 @@ struct vm_area_struct *vm_area_dup(struct 
> vm_area_struct *orig)
>  }
>  
>  #ifdef CONFIG_PER_VMA_LOCK
> -static void __vm_area_free(struct rcu_head *head)
> +static inline void __vm_area_free(struct vm_area_struct *vma)
>  {
> - struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
> -   vm_rcu);
>   /* The vma should either have no lock holders or be write-locked. */
>   vma_assert_no_reader(vma);
>   kmem_cache_free(vm_area_cachep, vma);
>  }
> -#endif
> +
> +static void vma_free_rcu_callback(struct rcu_head *head)
> +{
> + struct vm_area_struct *first_vma;
> + struct vm_area_struct *vma, *vma2;
> +
> + first_vma = container_of(head, struct vm_area_struct, vm_rcu);
> + list_for_each_entry_safe(vma, vma2, &first_vma->vm_free_list, 
> vm_free_list)

Is it safe to walk the list against concurrent calls to
list_splice_init() or list_add()?

> + __vm_area_free(vma);
> + __vm_area_free(first_vma);
> +}
> +
> +void drain_free_vmas(struct mm_struct *mm)
> +{
> + struct vm_area_struct *first_vma;
> + LIST_HEAD(to_destroy);
> +
> + spin_lock(&mm->vma_free_list.lock);
> + list_splice_init(&mm->vma_free_list.head, &to_destroy);
> + mm->vma_free_list.size = 0;
> + spin_unlock(&mm->vma_free_list.lock);
> +
> + if (list_empty(&to_destroy))
> + return;
> +
> + first_vma = list_first_entry(&to_destroy, struct vm_area_struct, 
> vm_free_list);
> + /* Remove the head which is allocated on the stack */
> + list_del(&to_destroy);
> +
> + call_rcu(&first_vma->vm_rcu, vma_free_rcu_callback);
> +}
> +
> +#define VM_AREA_FREE_LIST_MAX32
> +
> +void vm_area_free(struct vm_area_struct *vma)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + bool drain;
> +
> + free_anon_vma_name(vma);
> +
> + spin_lock(&mm->vma_free_list.lock);
> + list_add(&vma->vm_free_list, &mm->vma_free_list.head);
> + mm->vma_free_list.size++;
> + drain = mm->vma_free_list.size > VM_AREA_FREE_LIST_MAX;
> + spin_unlock(&mm->vma_free_list.lock);
> +
> + if (drain)
> + drain_free_vmas(mm);
> +}
> +
> +#else /* CONFIG_PER_VMA_LOCK */
> +
> +void drain_free_vmas(struct mm_struct *mm) {}
>  
>  void vm_area_free(struct vm_area_struct *vma)
>  {
>   free_anon_vma_name(vma);
> -#ifdef CONFIG_PER_VMA_LOCK
> - call_rcu(&vma->vm_rcu, __vm_area_free);
> -#else
>   kmem_cache_free(vm_area_cachep, vma);
> -#endif
>  }
>  
> +#endif /* CONFIG_PER_VMA_LOCK */
> +
>  static void account_kernel_stack(struct task_struct *tsk, int account)
>  {
>   if (IS_ENABLED(CONFIG_VMAP_STACK)) {
> @@ -1137,6 +1186,9 @@ 

Re: [RFC PATCH RESEND 21/28] mm: introduce find_and_lock_anon_vma to be used from arch-specific code

2022-09-09 Thread Laurent Dufour
On 01/09/2022 at 19:35, Suren Baghdasaryan wrote:
> Introduce find_and_lock_anon_vma function to lookup and lock an anonymous
> VMA during page fault handling. When VMA is not found, can't be locked
> or changes after being locked, the function returns NULL. The lookup is
> performed under RCU protection to prevent the found VMA from being
> destroyed before the VMA lock is acquired. VMA lock statistics are
> updated according to the results.
> 
> Signed-off-by: Suren Baghdasaryan 
> ---
>  include/linux/mm.h |  3 +++
>  mm/memory.c| 45 +
>  2 files changed, 48 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7c3190eaabd7..a3cbaa7b9119 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -684,6 +684,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma)
>   vma);
>  }
>  
> +struct vm_area_struct *find_and_lock_anon_vma(struct mm_struct *mm,
> +   unsigned long address);
> +
>  #else /* CONFIG_PER_VMA_LOCK */
>  
>  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> diff --git a/mm/memory.c b/mm/memory.c
> index 29d2f49f922a..bf557f7056de 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5183,6 +5183,51 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL_GPL(handle_mm_fault);
>  
> +#ifdef CONFIG_PER_VMA_LOCK
> +static inline struct vm_area_struct *find_vma_under_rcu(struct mm_struct *mm,
> + unsigned long address)
> +{
> + struct vm_area_struct *vma = __find_vma(mm, address);
> +
> + if (!vma || vma->vm_start > address)
> + return NULL;
> +
> + if (!vma_is_anonymous(vma))
> + return NULL;
> +

It looks more natural to me to first check that the VMA is still part of
the RB tree before trying to read-lock it.
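
Untested, but something like the following is what I had in mind; the
check after the trylock is still needed since the VMA may be isolated
in between (names taken from this patch, sketch only):

	if (!vma_is_anonymous(vma))
		return NULL;

	/* Don't bother locking a VMA which is already isolated */
	if (RB_EMPTY_NODE(&vma->vm_rb))
		return NULL;

	if (!vma_read_trylock(vma)) {
		count_vm_vma_lock_event(VMA_LOCK_ABORT);
		return NULL;
	}

	/* Re-check: the VMA may have been isolated before we got the lock */
	if (RB_EMPTY_NODE(&vma->vm_rb)) {
		vma_read_unlock(vma);
		count_vm_vma_lock_event(VMA_LOCK_MISS);
		return NULL;
	}

	return vma;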

> + if (!vma_read_trylock(vma)) {
> + count_vm_vma_lock_event(VMA_LOCK_ABORT);
> + return NULL;
> + }
> +
> + /* Check if the VMA got isolated after we found it */
> + if (RB_EMPTY_NODE(&vma->vm_rb)) {
> + vma_read_unlock(vma);
> + count_vm_vma_lock_event(VMA_LOCK_MISS);
> + return NULL;
> + }
> +
> + return vma;
> +}
> +
> +/*
> + * Lookup and lock an anonymous VMA. Returned VMA is guaranteed to be stable
> + * and not isolated. If the VMA is not found or is being modified the
> + * function returns NULL.
> + */
> +struct vm_area_struct *find_and_lock_anon_vma(struct mm_struct *mm,
> +   unsigned long address)
> +{
> + struct vm_area_struct *vma;
> +
> + rcu_read_lock();
> + vma = find_vma_under_rcu(mm, address);
> + rcu_read_unlock();
> +
> + return vma;
> +}
> +#endif /* CONFIG_PER_VMA_LOCK */
> +
>  #ifndef __PAGETABLE_P4D_FOLDED
>  /*
>   * Allocate p4d page table.



Re: [RFC PATCH RESEND 20/28] mm: introduce per-VMA lock statistics

2022-09-09 Thread Laurent Dufour
Le 01/09/2022 à 19:35, Suren Baghdasaryan a écrit :
> Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra
> statistics about handling page fault under VMA lock.
> 

Why not make this the default when per-VMA locks are enabled?
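
For instance (untested sketch, just flipping the default in this
patch's new Kconfig entry):

	config PER_VMA_LOCK_STATS
		bool "Statistics for per-vma locks"
		depends on PER_VMA_LOCK
		default y
		help
		  Statistics for per-vma locks.
		  If in doubt, say Y.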

> Signed-off-by: Suren Baghdasaryan 
> ---
>  include/linux/vm_event_item.h | 6 ++
>  include/linux/vmstat.h| 6 ++
>  mm/Kconfig.debug  | 8 
>  mm/vmstat.c   | 6 ++
>  4 files changed, 26 insertions(+)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index f3fc36cd2276..a325783ed05d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -150,6 +150,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_X86
>   DIRECT_MAP_LEVEL2_SPLIT,
>   DIRECT_MAP_LEVEL3_SPLIT,
> +#endif
> +#ifdef CONFIG_PER_VMA_LOCK_STATS
> + VMA_LOCK_SUCCESS,
> + VMA_LOCK_ABORT,
> + VMA_LOCK_RETRY,
> + VMA_LOCK_MISS,
>  #endif
>   NR_VM_EVENT_ITEMS
>  };
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index bfe38869498d..0c2611899cfc 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -131,6 +131,12 @@ static inline void vm_events_fold_cpu(int cpu)
>  #define count_vm_vmacache_event(x) do {} while (0)
>  #endif
>  
> +#ifdef CONFIG_PER_VMA_LOCK_STATS
> +#define count_vm_vma_lock_event(x) count_vm_event(x)
> +#else
> +#define count_vm_vma_lock_event(x) do {} while (0)
> +#endif
> +
>  #define __count_zid_vm_events(item, zid, delta) \
>   __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta)
>  
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index ce8dded36de9..075642763a03 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -207,3 +207,11 @@ config PTDUMP_DEBUGFS
> kernel.
>  
> If in doubt, say N.
> +
> +
> +config PER_VMA_LOCK_STATS
> + bool "Statistics for per-vma locks"
> + depends on PER_VMA_LOCK
> + help
> +   Statistics for per-vma locks.
> +   If in doubt, say N.
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 90af9a8572f5..3f3804c846a6 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1411,6 +1411,12 @@ const char * const vmstat_text[] = {
>   "direct_map_level2_splits",
>   "direct_map_level3_splits",
>  #endif
> +#ifdef CONFIG_PER_VMA_LOCK_STATS
> + "vma_lock_success",
> + "vma_lock_abort",
> + "vma_lock_retry",
> + "vma_lock_miss",
> +#endif
>  #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
>  };
>  #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */



Re: [RFC PATCH RESEND 19/28] mm: disallow do_swap_page to handle page faults under VMA lock

2022-09-09 Thread Laurent Dufour
Le 01/09/2022 à 19:35, Suren Baghdasaryan a écrit :
> Due to the possibility of do_swap_page dropping mmap_lock, abort fault
> handling under VMA lock and retry holding mmap_lock. This can be handled
> more gracefully in the future.
> 
> Signed-off-by: Suren Baghdasaryan 

Reviewed-by: Laurent Dufour 

> ---
>  mm/memory.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 9ac9944e8c62..29d2f49f922a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3738,6 +3738,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>   vm_fault_t ret = 0;
>   void *shadow = NULL;
>  
> + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> + ret = VM_FAULT_RETRY;
> + goto out;
> + }
> +
>   if (!pte_unmap_same(vmf))
>   goto out;
>  



Re: [RFC PATCH RESEND 18/28] mm: add FAULT_FLAG_VMA_LOCK flag

2022-09-09 Thread Laurent Dufour
Le 01/09/2022 à 19:35, Suren Baghdasaryan a écrit :
> Add a new flag to distinguish page faults handled under protection of
> per-vma lock.
> 
> Signed-off-by: Suren Baghdasaryan 

FWIW,

Reviewed-by: Laurent Dufour 

> ---
>  include/linux/mm.h   | 3 ++-
>  include/linux/mm_types.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0d9c1563c354..7c3190eaabd7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -466,7 +466,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
>   { FAULT_FLAG_USER,  "USER" }, \
>   { FAULT_FLAG_REMOTE,"REMOTE" }, \
>   { FAULT_FLAG_INSTRUCTION,   "INSTRUCTION" }, \
> - { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
> + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
> + { FAULT_FLAG_VMA_LOCK,  "VMA_LOCK" }
>  
>  /*
>   * vm_fault is filled by the pagefault handler and passed to the vma's
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6a03f59c1e78..36562e702baf 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -886,6 +886,7 @@ enum fault_flag {
>   FAULT_FLAG_INTERRUPTIBLE =  1 << 9,
>   FAULT_FLAG_UNSHARE =1 << 10,
>   FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
> + FAULT_FLAG_VMA_LOCK =   1 << 12,
>  };
>  
>  typedef unsigned int __bitwise zap_flags_t;


