Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

2023-11-26 Thread Oliver Sang
hi, Linus,

On Sun, Nov 26, 2023 at 03:20:58PM -0800, Linus Torvalds wrote:
> On Sun, 26 Nov 2023 at 12:23, Linus Torvalds
>  wrote:
> >
> > IOW, I might have messed up some "trivial cleanup" when prepping for
> > sending it out...
> 
> Bah. Famous last words. One of the "trivial cleanups" made the code
> more "obvious" by renaming the nospec mask as just "mask".
> 
> And that trivial rename broke that patch *entirely*, because now that
> name shadowed the "fmode_t" mask argument.
> 
> Don't even ask how long it took me to go from "I *tested* this,
> dammit, now it doesn't work at all" to "Oh God, I'm so stupid".
> 
> So that nobody else would waste any time on this, attached is a new
> attempt. This time actually tested *after* the changes.

we applied the new patch on top of 0ede61d858 and confirmed the regression is
gone; the result is even 3.4% better than 93faf426e3 now.

Tested-by: kernel test robot 

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")
  c712b4365b ("Improve __fget_files_rcu() code generation (and thus 
__fget_light())")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74 c712b4365b5b4dbe1d1380edd37
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    228481 ±  4%      -4.6%     217900 ±  6%     -11.7%     201857 ±  5%  meminfo.DirectMap4k
     89056            -2.0%      87309            -1.6%      87606        proc-vmstat.nr_slab_unreclaimable
     16.28            -0.7%      16.16            -1.0%      16.12        turbostat.RAMWatt
      0.01 ±  9%  +58125.6%       4.17 ±175%  +23253.5%       1.67 ±222%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    781.67 ± 10%      +6.5%     832.50 ± 19%     -14.3%     670.17 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     97958 ±  7%      -9.7%      88449 ±  4%      -0.6%      97399 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%      -5.2%       0.00 ±  7%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.16.threads
    399440            -2.9%     388036            +3.4%     412848        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.workload
     19.99 ±  4%      -2.2       17.74            +1.2       21.18 ±  2%  perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%      +0.8        2.11 ±  3%     +31.1       32.36 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%      +5.0       37.70           -32.7        0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00           +27.9       27.85            +0.0        0.00        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%      -2.3       17.75            +0.4       20.43 ±  2%  perf-profile.children.cycles-pp.fput
      0.24 ± 10%      -0.1        0.18 ±  2%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%      +0.5        1.98 ±  3%     +30.8       32.32 ±  2%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%      +6.0       37.86           -31.8        0.00        perf-profile.children.cycles-pp.__fget_light
      0.00           +27.7       27.67            +0.0        0.00        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%     -20.6       10.35 ±  2%     -30.9        0.00        perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%      -2.4       17.53            -0.3       19.62 ±  2%  perf-profile.self.cycles-pp.fput
      9.81 ±  4%      -2.4        7.42 ±  2%      +1.7       11.51 ±  4%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%      -0.1        0.17 ±  4%      -0.1        0.18 ± 11%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.44 ±  7%      +0.0        0.45 ±  5%      +0.1        0.52 ±  4%  perf-profile.self.cycles-pp.__poll
      0.85 ±  4%      +0.1        0.92 ±  3%     +30.3       31.17 ±  2%  perf-profile.self.cycles-pp.__fdget
      0.00           +26.5       26.48            +0.0        0.00        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%      -2.1%  2.101e+10        perf-stat.i.branch-instructions

Re: [PATCH] powerpc/mm: Fix null-pointer dereference in pgtable_cache_add

2023-11-26 Thread Kunwu Chan

Hi Christophe,

Thanks for your reply.
It's my bad. According to your reply, I read the code in
sysfs_do_create_link_sd() and there is indeed a NULL pointer check.


My intention was to check for a NULL pointer right after the memory
allocation. Could we add a comment here, for someone like me, noting that
the NULL pointer check is not needed?
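For example, something like this (only a rough sketch; the exact wording
is just a suggestion and untested):

        align = max_t(unsigned long, align, minalign);
        /*
         * No NULL check for kasprintf() here: as you point out, a NULL
         * name ends up in sysfs_do_create_link_sd(), which already
         * warns and bails out.
         */
        name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
        new = kmem_cache_create(name, table_size, align, 0, ctor(shift));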


Thanks,
Kunwu

On 2023/11/24 23:17, Christophe Leroy wrote:



Le 22/11/2023 à 10:00, Kunwu Chan a écrit :


kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.


Are you sure this is needed? Did you check what happens when name is NULL?

If I followed the code correctly, I end up in sysfs_do_create_link_sd(),
which already handles the NULL name case with a big-hammer warning.
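If I remember it right, the guard there is roughly of this shape
(paraphrasing from memory, not the exact code):

        /* near the top of sysfs_do_create_link_sd() */
        if (WARN_ON(!name))
                return -EINVAL;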



Signed-off-by: Kunwu Chan 
---
   arch/powerpc/mm/init-common.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 119ef491f797..0884fc601c46 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -139,6 +139,8 @@ void pgtable_cache_add(unsigned int shift)

  align = max_t(unsigned long, align, minalign);
  name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+   if (!name)
+   return;
  new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
  if (!new)
  panic("Could not allocate pgtable cache for order %d", shift);
--
2.34.1



[PATCH 3/3] powerpc/64: Only warn for kuap locked when KCSAN not present

2023-11-26 Thread Rohan McLure
Arbitrary instrumented locations, including syscall handlers, can call
arch_local_irq_restore() transitively when KCSAN is enabled, and in turn
also replay_soft_interrupts_irqrestore(). The precondition checked on
entry to this routine is that KUAP is enabled (user access prohibited).
Failure to meet this condition only triggers a warning, however, and
afterwards KUAP is enabled anyway. That is, KUAP being disabled on entry
is in fact permissible, but not possible on an uninstrumented kernel.

Disable this assertion only when KCSAN is enabled.

Suggested-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/irq_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index 938e66829eae..1b7e8ebb052a 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -189,7 +189,8 @@ static inline __no_kcsan void 
replay_soft_interrupts_irqrestore(void)
 * and re-locking AMR but we shouldn't get here in the first place,
 * hence the warning.
 */
-   kuap_assert_locked();
+   if (!IS_ENABLED(CONFIG_KCSAN))
+   kuap_assert_locked();
 
if (kuap_state != AMR_KUAP_BLOCKED)
set_kuap(AMR_KUAP_BLOCKED);
-- 
2.43.0



[PATCH 2/3] powerpc: Apply __always_inline to interrupt_{enter,exit}_prepare()

2023-11-26 Thread Rohan McLure
In keeping with the advice given by Documentation/core-api/entry.rst,
entry and exit handlers for interrupts should not be instrumented.
Guarantee that the interrupt_{enter,exit}_prepare() routines are inlined
so that they inherit instrumentation from their caller.

KCSAN kernels were observed to compile without inlining these routines,
which would lead to grief in NMI handlers.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/interrupt.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index a4196ab1d016..5f9be87c01ca 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -150,7 +150,7 @@ static inline void booke_restore_dbcr0(void)
 #endif
 }
 
-static inline void interrupt_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
irq_soft_mask_set(IRQS_ALL_DISABLED);
@@ -215,11 +215,11 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs)
  * However interrupt_nmi_exit_prepare does return directly to regs, because
  * NMIs do not do "exit work" or replay soft-masked interrupts.
  */
-static inline void interrupt_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_exit_prepare(struct pt_regs *regs)
 {
 }
 
-static inline void interrupt_async_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
/* Ensure interrupt_enter_prepare does not enable MSR[EE] */
@@ -238,7 +238,7 @@ static inline void interrupt_async_enter_prepare(struct 
pt_regs *regs)
irq_enter();
 }
 
-static inline void interrupt_async_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_exit_prepare(struct pt_regs *regs)
 {
/*
 * Adjust at exit so the main handler sees the true NIA. This must
@@ -278,7 +278,7 @@ static inline bool nmi_disables_ftrace(struct pt_regs *regs)
return true;
 }
 
-static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
 #ifdef CONFIG_PPC64
state->irq_soft_mask = local_paca->irq_soft_mask;
@@ -340,7 +340,7 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
nmi_enter();
 }
 
-static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
if (mfmsr() & MSR_DR) {
// nmi_exit if relocations are on
-- 
2.43.0



[PATCH 1/3] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-11-26 Thread Rohan McLure
Prior to this patch, KCSAN can detect data races of the following forms:

[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock

In case [1], the calling context does not need an mmiowb() call to be
issued; otherwise it would do so itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.

In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to returning, so nesting_count
remains balanced. In the worst case, a critical section observes an
mmiowb to be pending during an mmiowb_spin_unlock() call and is then
interrupted, leading to an extraneous call to mmiowb(). This data race
is clearly innocuous.

Resolve KCSAN warnings of type [1] by means of READ_ONCE()/WRITE_ONCE().
As increments and decrements of nesting_count are balanced across
interrupt contexts, resolve type [2] warnings by simply revoking
instrumentation, using data_race() rather than READ_ONCE() and
WRITE_ONCE(); the memory consistency semantics of plain accesses still
lead to correct behaviour.

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reported-by: Gautam Menghani 
Tested-by: Gautam Menghani 
Acked-by: Arnd Bergmann 
---
Previously discussed here:
https://lore.kernel.org/linuxppc-dev/20230510033117.1395895-4-rmcl...@linux.ibm.com/
but pushed back because it affects other architectures. Reissuing to
linuxppc-dev, as it does not make a functional change.
---
 include/asm-generic/mmiowb.h | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..f8c7c8a84e9e 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -37,25 +37,28 @@ static inline void mmiowb_set_pending(void)
struct mmiowb_state *ms = __mmiowb_state();
 
if (likely(ms->nesting_count))
-   ms->mmiowb_pending = ms->nesting_count;
+   WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
 }
 
 static inline void mmiowb_spin_lock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
-   ms->nesting_count++;
+
+   /* Increment need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count++);
 }
 
 static inline void mmiowb_spin_unlock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 pending = READ_ONCE(ms->mmiowb_pending);
 
-   if (unlikely(ms->mmiowb_pending)) {
-   ms->mmiowb_pending = 0;
+   WRITE_ONCE(ms->mmiowb_pending, 0);
+   if (unlikely(pending))
mmiowb();
-   }
 
-   ms->nesting_count--;
+   /* Decrement need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count--);
 }
 #else
 #define mmiowb_set_pending()   do { } while (0)
-- 
2.43.0



Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-11-26 Thread Hari Bathini

Hi Pingfan, Michael,

On 17/10/23 4:03 pm, Hari Bathini wrote:



On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu may not be cpu0, which causes a problem when
allocating memory for paca_ptrs[]. In theory, however, there is no
requirement to assign a cpu's logical id according to its position in the
device tree. But there are helpers like cpu_first_thread_sibling(), which
make assumptions about the mapping inside a core. Hence the mapping is
only partially loosened, i.e. the mapping of cores is unbound while the
mapping inside a core is kept.

*** Implement ***
At this early stage, there is plenty of memory to use. Hence, this patch
allocates interim memory to link the cpu info on a list, then reorders
cpus by changing the list head. As a result, there is a rotate shift
between the sequence number in the device tree and the cpu logical number.

*** Result ***
After this patch, a boot cpu's logical id will always be mapped into the
range [0, threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be brought online. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 



On second thought, we are probably better off with no impact for the
bootcpu < nr_cpu_ids case, and changing only the logical numbering of two
cores otherwise. Something like the below (please share your thoughts):

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..78a8312aa8c4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
 unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
 static int __initdata boot_cpu_count;
+#endif

 static int __init early_parse_mem(char *p)
 {
@@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned 
long node,

fdt_boot_cpuid_phys(initial_boot_params)) {
found = boot_cpu_count;
found_thread = i;
+   /*
+* Map boot-cpu logical id into the range
+* of [0, thread_per_core) if it can't be
+* accommodated within nr_cpu_ids.
+*/
+   if (i != boot_cpu_count && boot_cpu_count >= 
nr_cpu_ids) {
+   boot_cpuid = i;
+   DBG("Logical CPU number for boot CPU changed from %d 
to %d\n",
+   boot_cpu_count, i);
+   } else {
+   boot_cpuid = boot_cpu_count;
+   }
+
+   /* Ensure boot thread is accounted for in nr_cpu_ids */
+   if (boot_cpuid >= nr_cpu_ids) {
+   set_nr_cpu_ids(boot_cpuid + 1);
+   DBG("Adjusted nr_cpu_ids to %u, to include boot 
CPU.\n",
+   nr_cpu_ids);
+   }
}
 #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned 
long node,

if (found < 0)
return 0;

-   DBG("boot cpu: logical %d physical %d\n", found,
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;

boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c

index b7b733474b60..f7179525c774 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -409,6 +409,12 @@ static void __init cpu_init_thread_core_maps(int tpc)

 u32 *cpu_to_phys_id = NULL;

+struct interrupt_server_node {
+   boolavail;
+   int len;
+   __be32 intserv[];
+};
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *  cpu_possible_mask
@@ -429,9 +435,13 @@ u32 *cpu_to_phys_id = NULL;
  */
 void __init smp_setup_cpu_maps(void)
 {
+   struct interrupt_server_node *core0_node = NULL, *bt_node = NULL;
+   int orig_boot_cpu = -1, orig_boot_thread = -1;
+   bool found_boot_cpu = false;
struct device_node *dn;
-   int cpu = 0;
int nthreads = 1;
+   int cpu = 0;
+   int j, len;

DBG("smp_setup_cpu_maps()\n");

@@ -442,9 +452,9 @@ void __init smp_setup_cpu_maps(void)
  __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));

for_each_node_by_type(dn, "cpu") {
+   bool 

[PATCH] powerpc/powernv/pci: Do setup dev PE in pnv_pci_enable_device_hook

2023-11-26 Thread Luming Yu
After hot-removing a PCIe device whose pci_dn has the pnv_php driver
attached, a PCI rescan via echo 1 > /sys/bus/pci/rescan can fail with an
error message like:
pci 0020:0e:00.0: BAR 0: assigned [mem 0x3fe80182-0x3fe80182
64bit]
nvme nvme1: pci function 0020:0e:00.0
nvme 0020:0e:00.0 pci_enable_device() blocked, no PE assigned.

It appears that the pci_dn object is reused with only pe_number
clobbered in this case. A simple call to pnv_ioda_setup_dev_PE() should
get the PE number back and solve the problem.

Signed-off-by: Luming Yu 
---
 arch/powerpc/platforms/powernv/pci-ioda.c |  11 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 215 --
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.h |  46 
 include/soc/arc/aux.h |  59 -
 4 files changed, 9 insertions(+), 322 deletions(-)
 delete mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
 delete mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.h
 delete mode 100644 include/soc/arc/aux.h

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 28fac4770073..9d7add79ee3d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2325,11 +2325,18 @@ static resource_size_t pnv_pci_default_alignment(void)
 static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 {
struct pci_dn *pdn;
+   struct pnv_ioda_pe *pe;
 
pdn = pci_get_pdn(dev);
-   if (!pdn || pdn->pe_number == IODA_INVALID_PE) {
-   pci_err(dev, "pci_enable_device() blocked, no PE assigned.\n");
+   if (!pdn)
return false;
+
+   if (pdn->pe_number == IODA_INVALID_PE) {
+   pe = pnv_ioda_setup_dev_PE(dev);
+   if (!pe) {
+   pci_err(dev, "pci_enable_device() blocked, no PE 
assigned.\n");
+   return false;
+   }
}
 
return true;


Re: linux-next: build failure after merge of the mm tree

2023-11-26 Thread Stephen Rothwell
Hi all,

Just cc'ing the PowerPC guys to see if my fix is sensible.

On Mon, 27 Nov 2023 13:28:09 +1100 Stephen Rothwell  
wrote:
>
> After merging the mm tree, today's linux-next build (powerpc64
> allnoconfig) failed like this:
> 
> arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 
> 'pmd_move_must_withdraw' [-Werror=missing-prototypes]
>   557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>   | ^~
> cc1: all warnings being treated as errors
> 
> Caused by commit
> 
>   c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
> 
> I have added the following patch for today (which could be applied to
> the mm or powerpc trees):
> 
> From 194805b44c11b4c0aa28bdcdc0bb0d82acef394c Mon Sep 17 00:00:00 2001
> From: Stephen Rothwell 
> Date: Mon, 27 Nov 2023 13:08:57 +1100
> Subject: [PATCH] powerpc: pmd_move_must_withdraw() is only needed for
>  CONFIG_TRANSPARENT_HUGEPAGE
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  arch/powerpc/mm/book3s64/pgtable.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index be229290a6a7..3438ab72c346 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -542,6 +542,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
> unsigned long addr,
>   set_pte_at(vma->vm_mm, addr, ptep, pte);
>  }
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
>   * For hash translation mode, we use the deposited table to store hash slot
>   * information and they are stored at PTRS_PER_PMD offset from related pmd
> @@ -563,6 +564,7 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>  
>   return true;
>  }
> +#endif
>  
>  /*
>   * Does the CPU support tlbie?
> -- 
> 2.40.1

-- 
Cheers,
Stephen Rothwell




[PATCH] powerpc/powernv: Add a null pointer check in opal_event_init

2023-11-26 Thread Kunwu Chan
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: 2717a33d6074 ("powerpc/opal-irqchip: Use interrupt names if present")
Signed-off-by: Kunwu Chan 
---
 arch/powerpc/platforms/powernv/opal-irqchip.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c 
b/arch/powerpc/platforms/powernv/opal-irqchip.c
index f9a7001dacb7..56a1f7ce78d2 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -275,6 +275,8 @@ int __init opal_event_init(void)
else
name = kasprintf(GFP_KERNEL, "opal");
 
+   if (!name)
+   continue;
/* Install interrupt handler */
rc = request_irq(r->start, opal_interrupt, r->flags & 
IRQD_TRIGGER_MASK,
 name, NULL);
-- 
2.34.1



linux-next: duplicate patch in the tty tree

2023-11-26 Thread Stephen Rothwell
Hi all,

The following commit is also in the powerpc tree as a different commit
(but the same patch):

  aa46b225ebbf ("tty: hvc: hvc_opal: Convert to platform remove callback 
returning void")

This is commit

  f99c0e0d0859 ("tty: hvc: hvc_opal: Convert to platform remove callback 
returning void")

in the powerpc tree.

-- 
Cheers,
Stephen Rothwell




linux-next: manual merge of the tty tree with the powerpc tree

2023-11-26 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tty tree got a conflict in:

  drivers/tty/hvc/hvc_console.h

between commit:

  c9e38dc90e1c ("tty: hvc: Make hvc_remove() return no value")

from the powerpc tree and commit:

  7f30c19caf94 ("tty: hvc: Make hvc_remove() return no value")

from the tty tree.

These are slightly different versions of the same patch.

I fixed it up (I used the former version (removed "extern" from a function
declaration)) and can carry the fix as necessary. This is now fixed as
far as linux-next is concerned, but any non trivial conflicts should be
mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

2023-11-26 Thread Linus Torvalds
On Sun, 26 Nov 2023 at 12:23, Linus Torvalds
 wrote:
>
> IOW, I might have messed up some "trivial cleanup" when prepping for
> sending it out...

Bah. Famous last words. One of the "trivial cleanups" made the code
more "obvious" by renaming the nospec mask as just "mask".

And that trivial rename broke that patch *entirely*, because now that
name shadowed the "fmode_t" mask argument.
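Schematically, the failure mode looks like this (illustration only, not
the actual patch; 'load_entry' and 'max_fds' are made-up stand-ins):

        static struct file *lookup(unsigned int fd, fmode_t mask)
        {
                for (;;) {
                        /* The rename turned the nospec mask into 'mask',
                         * which now shadows the fmode_t 'mask' parameter
                         * above. */
                        unsigned long mask = array_index_mask_nospec(fd, max_fds);
                        struct file *file = load_entry(fd & mask);

                        /* Meant to test the caller's fmode_t mask, but it
                         * now sees the ~0 nospec mask, so every valid file
                         * gets rejected. */
                        if (!file || (file->f_mode & mask))
                                return NULL;
                        return file;
                }
        }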

Don't even ask how long it took me to go from "I *tested* this,
dammit, now it doesn't work at all" to "Oh God, I'm so stupid".

So that nobody else would waste any time on this, attached is a new
attempt. This time actually tested *after* the changes.

  Linus
From 45f70b5413a654d20ead410c533ec1d604bdb1e2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Sun, 26 Nov 2023 12:24:38 -0800
Subject: [PATCH] Improve __fget_files_rcu() code generation (and thus __fget_light())

Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted.  Let's just
write that function better.

Reported-by: kernel test robot 
Cc: Christian Brauner 
Cc: Jann Horn 
Cc: Mateusz Guzik 
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.s...@intel.com
Signed-off-by: Linus Torvalds 
---
 fs/file.c   | 48 +
 include/linux/fdtable.h | 15 -
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..608b4802c214 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long nospec;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		nospec = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd&nospec);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(nospec & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
 
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *  file hasn't been reused yet or the file count
 		 *  isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-			continue;
-
-		/*
+		 *
 		 *  (b) the file table entry has changed under us.
 		 *   Note that we don't need to re-check the 'fdt->fd'
 		 *   pointer having changed, because it always goes
@@ -991,7 +1002,8 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+		if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+		unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}
@@ -1128,13 +1140,13 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
 	 * put_files_struct().
 	 */
-	if (atomic_read_acquire(&files->count) == 1) {
+	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget_files(files, fd, mask);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index bc4c3287a65e..80bd7789bab1 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -83,12 +83,17 @@ struct dentry;
 static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+	struct file *needs_masking;
 
-	if (fd < fdt->max_fds) {
-		fd = array_i

Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

2023-11-26 Thread Linus Torvalds
On Sun, 19 Nov 2023 at 23:11, kernel test robot  wrote:
>
> kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops 
> on:
>
> commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to 
> SLAB_TYPESAFE_BY_RCU")

Ok, so __fget_light() is one of our more performance-critical things,
and that commit ends up making it call a rather more expensive version
in __get_file_rcu(), so we have:

>  30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
>   0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu

and that "20% decrease balanced by 26% increase elsewhere" then
directly causes the ~3% regression.

I took a look at the code generation, and honestly, I think we're
better off just making __fget_files_rcu() have special logic for this
all, and not use __get_file_rcu().

The 'fd' case really is special because we need to do that
non-speculative pointer access.

Because it turns out that we already have to use array_index_nospec()
to safely generate that pointer to the fd entry, and once you have to
do that "use non-speculative accesses to generate a safe pointer", you
might as well just go whole hog.

So this takes a different approach, and uses the
array_index_mask_nospec() thing that we have exactly to generate that
non-speculative mask for these things:

/* Mask is a 0 for invalid fd's, ~0 for valid ones */
mask = array_index_mask_nospec(fd, fdt->max_fds);

and then it does something you can consider either horribly clever, or
horribly ugly (this first part is basically just
array_index_nospec()):

/* fdentry points to the 'fd' offset, or fdt->fd[0] */
fdentry = fdt->fd + (fd&mask);

and then we can just *unconditionally* do the load - but since we
might be loading fd[0] for an invalid case, we need to mask the result
too:

/* Do the load, then mask any invalid result */
file = rcu_dereference_raw(*fdentry);
file = (void *)(mask & (unsigned long)file);

but now we have done everything without any conditionals, and the only
conditional is now "did we load NULL" - which includes that "we masked
the bad value".

Then we just do that atomic_long_inc_not_zero() on the file, and
re-check the pointer chain we used.

I made files_lookup_fd_raw() do the same thing.

The end result is much nicer code generation at least from my quick
check. And I assume the regression will be gone, and hopefully even
turned into an improvement since this is so hot.

Comments? I also looked at that odd OPTIMIZER_HIDE_VAR() that
__get_file_rcu() does, and I don't get it. Both things come from
volatile accesses, I don't see the point of those games, but I also
didn't care, since it's no longer in a critical code path.

Christian?

NOTE! This patch is not well tested. I verified an earlier version of
this, but have been playing with it since, so caveat emptor.

IOW, I might have messed up some "trivial cleanup" when prepping for
sending it out...

  Linus
 fs/file.c   | 48 ++--
 include/linux/fdtable.h | 15 ++-
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..c74a6e8687d9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long mask;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		mask = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd&mask);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(mask & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
 
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *  file hasn't been reused yet or the file count
 		 *  isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS

Re: [PATCH v2 03/10] i2c: pasemi: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON

2023-11-26 Thread Andi Shyti
Hi Heiner,

On Fri, Nov 24, 2023 at 11:16:12AM +0100, Heiner Kallweit wrote:
> After removal of the legacy eeprom driver the only remaining I2C
> client device driver supporting I2C_CLASS_SPD is jc42. Because this
> driver also supports I2C_CLASS_HWMON, adapters don't have to
> declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
> It's one step towards getting rid of I2C_CLASS_SPD mid-term.
> 
> Series was created supported by Coccinelle and its splitpatch.
> 
> Signed-off-by: Heiner Kallweit 

Acked-by: Andi Shyti 

Thanks,
Andi


Re: [PATCH v2 4/7] kexec_file, arm64: print out debugging message if required

2023-11-26 Thread Baoquan He
On 11/26/23 at 05:26am, kernel test robot wrote:
> Hi Baoquan,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on tip/x86/core powerpc/next powerpc/fixes 
> linus/master v6.7-rc2 next-20231124]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/kexec_file-add-kexec_file-flag-to-control-debug-printing/20231124-113942
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
> for-next/core
> patch link:
> https://lore.kernel.org/r/20231124033642.520686-5-bhe%40redhat.com
> patch subject: [PATCH v2 4/7] kexec_file, arm64: print out debugging message 
> if required
> config: arm64-randconfig-001-20231126 
> (https://download.01.org/0day-ci/archive/20231126/202311260548.1haxcdne-...@intel.com/config)
> compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git 
> f28c006a5895fc0e329fe15fead81e37457cb1d1)
> reproduce (this is a W=1 build): 
> (https://download.01.org/0day-ci/archive/20231126/202311260548.1haxcdne-...@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version 
> of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202311260548.1haxcdne-...@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
> >> arch/arm64/kernel/machine_kexec.c:35:2: error: implicit declaration of 
> >> function 'kexec_dprintk' is invalid in C99 
> >> [-Werror,-Wimplicit-function-declaration]
>kexec_dprintk("%s:%d:\n", func, line);
>^
>1 error generated.

Thanks for reporting. The config has the kexec-related items below,
whereas kexec_dprintk() is only defined in the CONFIG_KEXEC_FILE
ifdeffery scope. Moving it into the CONFIG_KEXEC_CORE ifdeffery scope in
<linux/kexec.h> can fix it, as in the draft code below. Will update
patch 1 to include the code change.

===
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
# CONFIG_KEXEC_FILE is not set
CONFIG_CRASH_DUMP=y
===

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 66997efe36f1..b457b0d70f3f 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -265,13 +265,6 @@ arch_kexec_apply_relocations(struct purgatory_info *pi, 
Elf_Shdr *section,
 }
 #endif
 
-extern bool kexec_file_dbg_print;
-
-#define kexec_dprintk(fmt, ...)\
-   printk("%s" fmt,\
-  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
-  ##__VA_ARGS__)
-
 #endif /* CONFIG_KEXEC_FILE */
 
 #ifdef CONFIG_KEXEC_ELF
@@ -508,6 +501,13 @@ static inline int crash_hotplug_memory_support(void) { 
return 0; }
 static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
 #endif
 
+extern bool kexec_file_dbg_print;
+
+#define kexec_dprintk(fmt, ...)\
+   printk("%s" fmt,\
+  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
+  ##__VA_ARGS__)
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index be5642a4ec49..bddba29a1557 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -52,6 +52,8 @@ atomic_t __kexec_lock = ATOMIC_INIT(0);
 /* Flag to indicate we are going to kexec a new kernel */
 bool kexec_in_progress = false;
 
+bool kexec_file_dbg_print;
+
 int kexec_should_crash(struct task_struct *p)
 {
/*
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 7ae1b0901aa4..8f87644b4eec 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -38,8 +38,6 @@ void set_kexec_sig_enforced(void)
 }
 #endif
 
-bool kexec_file_dbg_print;
-
 static int kexec_calculate_store_digests(struct kimage *image);
 
 /* Maximum size in bytes for kernel/initrd files. */



[PATCH] powerpc/powernv: Fix null pointer dereference in opal_powercap_init

2023-11-26 Thread Kunwu Chan
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: b9ef7b4b867f ("powerpc: Convert to using %pOFn instead of 
device_node.name")
Signed-off-by: Kunwu Chan 
---
 arch/powerpc/platforms/powernv/opal-powercap.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c 
b/arch/powerpc/platforms/powernv/opal-powercap.c
index 7bfe4cbeb35a..ea917266aa17 100644
--- a/arch/powerpc/platforms/powernv/opal-powercap.c
+++ b/arch/powerpc/platforms/powernv/opal-powercap.c
@@ -196,6 +196,12 @@ void __init opal_powercap_init(void)
 
j = 0;
pcaps[i].pg.name = kasprintf(GFP_KERNEL, "%pOFn", node);
+   if (!pcaps[i].pg.name) {
+   kfree(pcaps[i].pattrs);
+   kfree(pcaps[i].pg.attrs);
+   goto out_pcaps_pattrs;
+   }
+
if (has_min) {
powercap_add_attr(min, "powercap-min",
  &pcaps[i].pattrs[j]);
-- 
2.34.1



[PATCH] powerpc/imc-pmu: Fix null pointer dereference in update_events_in_group

2023-11-26 Thread Kunwu Chan
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Kunwu Chan 
---
 arch/powerpc/perf/imc-pmu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 5d12ca386c1f..8664a7d297ad 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -299,6 +299,8 @@ static int update_events_in_group(struct device_node *node, 
struct imc_pmu *pmu)
attr_group->attrs = attrs;
do {
ev_val_str = kasprintf(GFP_KERNEL, "event=0x%x", 
pmu->events[i].value);
+   if (!ev_val_str)
+   continue;
dev_str = device_str_attr_create(pmu->events[i].name, 
ev_val_str);
if (!dev_str)
continue;
@@ -306,6 +308,8 @@ static int update_events_in_group(struct device_node *node, 
struct imc_pmu *pmu)
attrs[j++] = dev_str;
if (pmu->events[i].scale) {
ev_scale_str = kasprintf(GFP_KERNEL, "%s.scale", 
pmu->events[i].name);
+   if (!ev_scale_str)
+   continue;
dev_str = device_str_attr_create(ev_scale_str, 
pmu->events[i].scale);
if (!dev_str)
continue;
@@ -315,6 +319,8 @@ static int update_events_in_group(struct device_node *node, 
struct imc_pmu *pmu)
 
if (pmu->events[i].unit) {
ev_unit_str = kasprintf(GFP_KERNEL, "%s.unit", 
pmu->events[i].name);
+   if (!ev_unit_str)
+   continue;
dev_str = device_str_attr_create(ev_unit_str, 
pmu->events[i].unit);
if (!dev_str)
continue;
-- 
2.34.1