Re: [RFC PATCH v2 00/20] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)

2024-05-23 Thread Michael Ellerman
Hi Peter,

Peter Xu  writes:
> On Fri, May 17, 2024 at 08:59:54PM +0200, Christophe Leroy wrote:
>> This is the continuation of the RFC v1 series "Reimplement huge pages
>> without hugepd on powerpc 8xx". It now get rid of hugepd completely
>> after handling also e500 and book3s/64
>> 
>> Unlike most architectures, powerpc 8xx HW requires a two-level
>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>> is not feasible as such.

>
> Great to see this series, thanks again Christophe.
>
> I requested for help on the lsfmm hugetlb unification session, but
> unfortunately I don't think there were Power people around.. I'd like to
> request help from Power developers again here on the list: it will be very
> appreciated if you can help have a look at this series.

Christophe is a powerpc developer :)

I'll help where I can, but I don't know the hugepd code that well, I've
never really worked on it before. Nick will hopefully also be able to
help, he at least knows mm better than me, but he also has other work.

Hopefully we can make this series work, and replace hugepd. But if we
can't make that work then there is the possibility of just dropping
support for 16M/16G pages with HPT/4K pages.

> It's a direct dependent work to the hugetlb refactoring that we'll be
> working on, while it looks like the hugetlb refactoring is something the
> community as a whole would like to see in the near future.
>
> We don't want to add more Power-only CONFIG_ARCH_HAS_HUGEPD checks for
> hugetlb in any new code.

Yes I understand.

cheers


Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2024-05-23 Thread Guenter Roeck
Hi,

On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
> 
> Acked-by: Alex Deucher 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Samuel Holland 

With this patch in the mainline kernel, I get the following build error
when trying to build powerpc:ppc32_allmodconfig.

powerpc64-linux-ld: 
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard 
float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses 
soft float
powerpc64-linux-ld: failed to merge target specific data of file 
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o

[ repeated many times ]

The problem is no longer seen after reverting this patch.

Guenter


Re: [PATCH v2 4/6] KVM: Delete the now unused kvm_arch_sched_in()

2024-05-23 Thread maobibo




On 2024/5/22 上午9:40, Sean Christopherson wrote:

Delete kvm_arch_sched_in() now that all implementations are nops.

Signed-off-by: Sean Christopherson 
---
  arch/arm64/include/asm/kvm_host.h | 1 -
  arch/loongarch/include/asm/kvm_host.h | 1 -
  arch/mips/include/asm/kvm_host.h  | 1 -
  arch/powerpc/include/asm/kvm_host.h   | 1 -
  arch/riscv/include/asm/kvm_host.h | 1 -
  arch/s390/include/asm/kvm_host.h  | 1 -
  arch/x86/kvm/pmu.c| 6 +++---
  arch/x86/kvm/x86.c| 5 -
  include/linux/kvm_host.h  | 2 --
  virt/kvm/kvm_main.c   | 1 -
  10 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 8170c04fde91..615e7a2e5590 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1225,7 +1225,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
  }
  
  static inline void kvm_arch_sync_events(struct kvm *kvm) {}

-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  
  void kvm_arm_init_debug(void);

  void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index c87b6ea0ec47..4162a252cdf6 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -261,7 +261,6 @@ static inline bool kvm_is_ifetch_fault(struct kvm_vcpu_arch 
*arch)
  static inline void kvm_arch_hardware_unsetup(void) {}
  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 179f320cc231..6743a57c1ab4 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -890,7 +890,6 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_free_memslot(struct kvm *kvm,
 struct kvm_memory_slot *slot) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
  
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h

index 8abac532146e..c4fb6a27fb92 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -897,7 +897,6 @@ struct kvm_vcpu_arch {
  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
  static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
  
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h

index d96281278586..dd77c2db6819 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -286,7 +286,6 @@ struct kvm_vcpu_arch {
  };
  
  static inline void kvm_arch_sync_events(struct kvm *kvm) {}

-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  
  #define KVM_RISCV_GSTAGE_TLB_MIN_ORDER		12
  
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h

index 95990461888f..e9fcaf4607a6 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1045,7 +1045,6 @@ extern int kvm_s390_gisc_register(struct kvm *kvm, u32 
gisc);
  extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
  
  static inline void kvm_arch_sync_events(struct kvm *kvm) {}

-static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_free_memslot(struct kvm *kvm,
 struct kvm_memory_slot *slot) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a593b03c9aed..f9149c9fc275 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -521,9 +521,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
}
  
  	/*

-* Unused perf_events are only released if the corresponding MSRs
-* weren't accessed during the last vCPU time slice. kvm_arch_sched_in
-* triggers KVM_REQ_PMU if cleanup is needed.
+* Release unused perf_events 

Re: [Patch v2] mm/memblock: discard .text/.data if CONFIG_ARCH_KEEP_MEMBLOCK not set

2024-05-23 Thread Wei Yang
On Tue, May 21, 2024 at 10:21:52AM +0300, Mike Rapoport wrote:
>Hi,
>
>On Fri, May 10, 2024 at 02:04:22AM +, Wei Yang wrote:
>> When CONFIG_ARCH_KEEP_MEMBLOCK not set, we expect to discard related
>> code and data. But it doesn't until CONFIG_MEMORY_HOTPLUG not set
>> neither.
>> 
>> This patch puts memblock's .text/.data into its own section, so that it
>> only depends on CONFIG_ARCH_KEEP_MEMBLOCK to discard related code and
>> data.
>> 
>> After this, from the log message in mem_init_print_info(), init size
>> increase from 2420K to 2432K on arch x86.
>> 
>> Signed-off-by: Wei Yang 
>> 
>> ---
>> v2: fix orphan section for powerpc
>> ---
>>  arch/powerpc/kernel/vmlinux.lds.S |  1 +
>>  include/asm-generic/vmlinux.lds.h | 14 +-
>>  include/linux/memblock.h  |  8 
>>  3 files changed, 18 insertions(+), 5 deletions(-)
>>  
>> +#define __init_memblock__section(".mbinit.text") __cold notrace \
>> +  __latent_entropy
>> +#define __initdata_memblock__section(".mbinit.data")
>> +
>
>The new .mbinit.* sections should be added to scripts/mod/modpost.c
>alongside .meminit.* sections and then I expect modpost to report a bunch
>of section mismatches because many memblock functions are called on memory
>hotplug even on architectures that don't select ARCH_KEEP_MEMBLOCK.
>

I tried to add some code in modpost.c, "make all" looks good.

May I ask how can I trigger the "mismatch" warning?

BTW, if ARCH_KEEP_MEMBLOCK unset, we would discard memblock meta-data. If
hotplug would call memblock function, it would be dangerous?

The additional code I used is like below.

---
 scripts/mod/modpost.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 937294ff164f..c837e2882904 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -777,14 +777,14 @@ static void check_section(const char *modname, struct 
elf_info *elf,
 
 #define ALL_INIT_DATA_SECTIONS \
".init.setup", ".init.rodata", ".meminit.rodata", \
-   ".init.data", ".meminit.data"
+   ".init.data", ".meminit.data", "mbinit.data"
 
 #define ALL_PCI_INIT_SECTIONS  \
".pci_fixup_early", ".pci_fixup_header", ".pci_fixup_final", \
".pci_fixup_enable", ".pci_fixup_resume", \
".pci_fixup_resume_early", ".pci_fixup_suspend"
 
-#define ALL_XXXINIT_SECTIONS ".meminit.*"
+#define ALL_XXXINIT_SECTIONS ".meminit.*", "mbinit.*"
 
 #define ALL_INIT_SECTIONS INIT_SECTIONS, ALL_XXXINIT_SECTIONS
 #define ALL_EXIT_SECTIONS ".exit.*"
@@ -799,7 +799,7 @@ static void check_section(const char *modname, struct 
elf_info *elf,
 
 #define INIT_SECTIONS  ".init.*"
 
-#define ALL_TEXT_SECTIONS  ".init.text", ".meminit.text", ".exit.text", \
+#define ALL_TEXT_SECTIONS  ".init.text", ".meminit.text", ".mbinit.text", 
".exit.text", \
TEXT_SECTIONS, OTHER_TEXT_SECTIONS
 
 enum mismatch {
-- 
2.34.1


>>  #ifndef CONFIG_ARCH_KEEP_MEMBLOCK
>> -#define __init_memblock __meminit
>> -#define __initdata_memblock __meminitdata
>>  void memblock_discard(void);
>>  #else
>> -#define __init_memblock
>> -#define __initdata_memblock
>>  static inline void memblock_discard(void) {}
>>  #endif
>>  
>> -- 
>> 2.34.1
>> 
>> 
>
>-- 
>Sincerely yours,
>Mike.

-- 
Wei Yang
Help you, Help me


Re: [PATCH] PowerPC: Replace kretprobe with rethook

2024-05-23 Thread Google
On Thu, 16 May 2024 09:46:46 -0400
Abhishek Dubey  wrote:

> This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
> Replace kretprobe with rethook on x86") to Power.
> 
> Replaces the kretprobe code with rethook on Power. With this patch,
> kretprobe on Power uses the rethook instead of kretprobe specific
> trampoline code.
> 
> Reference to other archs:
> commit b57c2f124098 ("riscv: add riscv rethook implementation")
> commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook")
> 

Hi Abhishek,

Thanks for applying rethook, it looks good. I have comments below.

> diff --git a/arch/powerpc/kernel/stacktrace.c 
> b/arch/powerpc/kernel/stacktrace.c
> index e6a958a5da27..6de912cf198c 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -133,14 +134,13 @@ int __no_sanitize_address 
> arch_stack_walk_reliable(stack_trace_consume_fn consum
>* arch-dependent code, they are generic.
>*/
>   ip = ftrace_graph_ret_addr(task, _idx, ip, stack);
> -#ifdef CONFIG_KPROBES

This still needs to check CONFIG_RETHOOK.

> +
>   /*
>* Mark stacktraces with kretprobed functions on them
>* as unreliable.
>*/
> - if (ip == (unsigned long)__kretprobe_trampoline)
> + if (ip == (unsigned long)arch_rethook_trampoline)
>   return -EINVAL;
> -#endif
>  
>   if (!consume_entry(cookie, ip))
>   return -EINVAL;
> diff --git a/include/linux/rethook.h b/include/linux/rethook.h
> index ba60962805f6..9f2fb6abdc60 100644
> --- a/include/linux/rethook.h
> +++ b/include/linux/rethook.h
> @@ -65,7 +65,6 @@ void rethook_recycle(struct rethook_node *node);
>  void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool 
> mcount);
>  unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long 
> frame,
>   struct llist_node **cur);
> -

nit: removed unrelated line.

>  /* Arch dependent code must implement arch_* and trampoline code */
>  void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs, 
> bool mcount);
>  void arch_rethook_trampoline(void);
> -- 
> 2.44.0
> 

Thank you,

-- 
Masami Hiramatsu (Google) 


Re: [PATCH v2 3/6] KVM: x86: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()

2024-05-23 Thread Huang, Kai




On 22/05/2024 1:40 pm, Sean Christopherson wrote:

Fold the guts of kvm_arch_sched_in() into kvm_arch_vcpu_load(), keying
off the recently added kvm_vcpu.scheduled_out as appropriate.

Note, there is a very slight functional change, as PLE shrink updates will
now happen after blasting WBINVD, but that is quite uninteresting as the
two operations do not interact in any way.

Signed-off-by: Sean Christopherson 
---


Acked-by: Kai Huang 

[...]


@@ -1548,6 +1548,9 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct svm_cpu_data *sd = per_cpu_ptr(_data, cpu);
  
+	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))

+   shrink_ple_window(vcpu);
+


[...]


@@ -1517,6 +1517,9 @@ void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  {
struct vcpu_vmx *vmx = to_vmx(vcpu);
  
+	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))

+   shrink_ple_window(vcpu);
+


Nit:  Perhaps we need a kvm_x86_ops::shrink_ple_window()?  :-)


Re: [PATCH v2 0/6] KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()

2024-05-23 Thread Huang, Kai




On 22/05/2024 1:40 pm, Sean Christopherson wrote:

Drop kvm_arch_sched_in() and instead add and use kvm_vcpu.scheduled_out
to communicate to kvm_arch_vcpu_load() that the vCPU is being scheduling
back in.



For this series,

Acked-by: Kai Huang 


Re: [PATCH v2 5/6] KVM: x86: Unconditionally set l1tf_flush_l1d during vCPU load

2024-05-23 Thread Huang, Kai




On 22/05/2024 1:40 pm, Sean Christopherson wrote:

Always set l1tf_flush_l1d during kvm_arch_vcpu_load() instead of setting
it only when the vCPU is being scheduled back in.  The flag is processed
only when VM-Enter is imminent, and KVM obviously needs to load the vCPU
before VM-Enter, so attempting to precisely set l1tf_flush_l1d provides no
meaningful value.  I.e. the flag _will_ be set either way, it's simply a
matter of when.


Seems reasonable.



Signed-off-by: Sean Christopherson 
---


Acked-by: Kai Huang 


  arch/x86/kvm/x86.c | 11 +--
  1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59aa772af755..60fea297f91f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5006,12 +5006,11 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
  
-	if (vcpu->scheduled_out) {

-   vcpu->arch.l1tf_flush_l1d = true;
-   if (pmu->version && unlikely(pmu->event_count)) {
-   pmu->need_cleanup = true;
-   kvm_make_request(KVM_REQ_PMU, vcpu);
-   }
+   vcpu->arch.l1tf_flush_l1d = true;
+
+   if (vcpu->scheduled_out && pmu->version && pmu->event_count) {
+   pmu->need_cleanup = true;
+   kvm_make_request(KVM_REQ_PMU, vcpu);
}


Nit, the unlikely() is lost, but I guess it is OK?


Re: [RFC PATCH v2 00/20] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)

2024-05-23 Thread Peter Xu
On Fri, May 17, 2024 at 08:59:54PM +0200, Christophe Leroy wrote:
> This is the continuation of the RFC v1 series "Reimplement huge pages
> without hugepd on powerpc 8xx". It now get rid of hugepd completely
> after handling also e500 and book3s/64
> 
> Unlike most architectures, powerpc 8xx HW requires a two-level
> pagetable topology for all page sizes. So a leaf PMD-contig approach
> is not feasible as such.
> 
> Possible sizes are 4k, 16k, 512k and 8M.
> 
> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
> must point to a single entry level-2 page table. Until now that was
> done using hugepd. This series changes it to use standard page tables
> where the entry is replicated 1024 times on each of the two pagetables
> refered by the two associated PMD entries for that 8M page.
> 
> At the moment it has to look into each helper to know if the
> hugepage ptep is a PTE or a PMD in order to know it is a 8M page or
> a lower size. I hope this can me handled by core-mm in the future.
> 
> For e500 and book3s/64 there are less constraints because it is not
> tied to the HW assisted tablewalk like on 8xx, so it is easier to use
> leaf PMDs (and PUDs).
> 
> On e500 the supported page sizes are 4M, 16M, 64M, 256M and 1G. All at
> PMD level on e500/32 and mix of PMD and PUD for e500/64. We encode page
> size with 4 available bits in PTE entries. On e300/32 PGD entries size
> is increases to 64 bits in order to allow leaf-PMD entries because PTE
> are 64 bits on e500.
> 
> On book3s/64 only the hash-4k mode is concerned. It supports 16M pages
> as cont-PMD and 16G pages as cont-PUD. In other modes (radix-4k, radix-6k
> and hash-64k) the sizes match with PMD and PUD sizes so that's just leaf
> entries.
> 
> Christophe Leroy (20):
>   mm: Provide pagesize to pmd_populate()
>   mm: Provide page size to pte_alloc_huge()
>   mm: Provide pmd to pte_leaf_size()
>   mm: Provide mm_struct and address to huge_ptep_get()
>   powerpc/mm: Allow hugepages without hugepd
>   powerpc/8xx: Fix size given to set_huge_pte_at()
>   powerpc/8xx: Rework support for 8M pages using contiguous PTE entries
>   powerpc/8xx: Simplify struct mmu_psize_def
>   powerpc/mm: Remove _PAGE_PSIZE
>   powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries
>   powerpc/mm: Complement huge_pte_alloc() for all non HUGEPD setups
>   powerpc/64e: Remove unneeded #ifdef CONFIG_PPC_E500
>   powerpc/64e: Clean up impossible setups
>   powerpc/e500: Remove enc field from struct mmu_psize_def
>   powerpc/85xx: Switch to 64 bits PGD
>   powerpc/e500: Encode hugepage size in PTE bits
>   powerpc/e500: Use contiguous PMD instead of hugepd
>   powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD
>   powerpc/mm: Remove hugepd leftovers
>   mm: Remove CONFIG_ARCH_HAS_HUGEPD

Great to see this series, thanks again Christophe.

I requested for help on the lsfmm hugetlb unification session, but
unfortunately I don't think there were Power people around.. I'd like to
request help from Power developers again here on the list: it will be very
appreciated if you can help have a look at this series.

It's a direct dependent work to the hugetlb refactoring that we'll be
working on, while it looks like the hugetlb refactoring is something the
community as a whole would like to see in the near future.

We don't want to add more Power-only CONFIG_ARCH_HAS_HUGEPD checks for
hugetlb in any new code.

Currently Oscar offered help on that hugetlb project, and Oscar will start
to work on page_walk API refactoring.  I guess currently the simple way is
we'll work on top of Christophe's series.  Some proper review on this
series will definitely make it clearer on what we should do next.

Thanks,

-- 
Peter Xu



Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-23 Thread Peter Xu
On Thu, May 23, 2024 at 05:08:29AM +0200, Oscar Salvador wrote:
> On Wed, May 22, 2024 at 05:46:09PM -0400, Peter Xu wrote:
> > > Now, ProcessB still has the page mapped, so upon re-accessing it,
> > > it will trigger a new MCE event. memory-failure code will see that this
> > 
> > The question is why accessing that hwpoison entry from ProcB triggers an
> > MCE.  Logically that's a swap entry and it should generate a page fault
> > rather than MCE.  Then in the pgfault hanlder we don't need that encoded
> > pfn as we have vmf->address.
> 
> It would be a swap entry if we reach try_to_umap_one() without trouble.
> Then we have the code that converts it:
> 
>  ...
>  if (PageHWPoison(p))
>  pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
>set_{huge_}pte_at
>  ...
> 
> But maybe we could only do that for ProcA, while ProcB failed to do that,
> which means that for ProcA that is a hwpoisoned-swap-entry, but ProcB still
> has this page mapped as usual, so if ProcB re-access it, that will not
> trigger a fault (because the page is still mapped in its pagetables).

But in that case "whether encode pfn in hwpoison swap entry" doesn't matter
either.. as it's not yet converted to a swap entry, so the pfn is there.

Thanks,

-- 
Peter Xu



Re: [PATCH v2] mm/mm_init: use node's number of cpus in deferred_page_init_max_threads

2024-05-23 Thread Mike Rapoport
On Wed, May 22, 2024 at 04:38:01PM -0400, Eric Chanudet wrote:
> x86_64 is already using the node's cpu as maximum threads. Make that the
> default for all archs setting DEFERRED_STRUCT_PAGE_INIT.
> 
> This returns to the behavior prior making the function arch-specific
> with commit ecd096506922 ("mm: make deferred init's max threads
> arch-specific").
> 
> Signed-off-by: Eric Chanudet 
> 
> ---
> Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
> shows faster deferred_init_memmap completions:
> 
> | | x13s| SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
> | | Metal, 32GB | VM, 36GB | VM, 58GB| Metal, 128GB |
> | | 8cpus   | 8cpus| 8cpus   | 32cpus   |
> |-|-|--|-|--|
> | threads |  ms (%) | ms   (%) |  ms (%) |  ms  (%) |
> |-|-|--|-|--|
> | 1   | 108(0%) | 72  (0%) | 224(0%) | 324 (0%) |
> | cpus|  24  (-77%) | 36(-50%) |  40  (-82%) |  56   (-82%) |
> 
> - v1: 
> https://lore.kernel.org/linux-arm-kernel/20240520231555.395979-5-echan...@redhat.com
> - Changes since v1:
>  - Make the generic function return the number of cpus of the node as
>max threads limit instead overriding it for arm64.
> - Drop Baoquan He's R-b on v1 since the logic changed.
> - Add CCs according to patch changes (ppc and s390 set
>   DEFERRED_STRUCT_PAGE_INIT by default).
> 
>  arch/x86/mm/init_64.c | 12 
>  mm/mm_init.c  |  2 +-
>  2 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 7e177856ee4f..adec42928ec1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1354,18 +1354,6 @@ void __init mem_init(void)
>   preallocate_vmalloc_pages();
>  }
>  
> -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> -int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
> -{
> - /*
> -  * More CPUs always led to greater speedups on tested systems, up to
> -  * all the nodes' CPUs.  Use all since the system is otherwise idle
> -  * now.
> -  */
> - return max_t(int, cpumask_weight(node_cpumask), 1);
> -}
> -#endif
> -
>  int kernel_set_to_readonly;
>  
>  void mark_rodata_ro(void)
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index f72b852bd5b8..e0023aa68555 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2126,7 +2126,7 @@ deferred_init_memmap_chunk(unsigned long start_pfn, 
> unsigned long end_pfn,
>  __weak int __init

If s390 folks confirm there's no regression for them I think we can make
this static.

>  deferred_page_init_max_threads(const struct cpumask *node_cpumask)
>  {
> - return 1;
> + return max_t(int, cpumask_weight(node_cpumask), 1);
>  }
>  
>  /* Initialise remaining memory on a node */
> -- 
> 2.44.0
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-05-23 Thread Borislav Petkov
On Thu, May 23, 2024 at 11:57:00AM +0530, Balasubrmanian, Vignesh wrote:
> Currently, this enum is the same as XSAVE, but when we add other features, 
> this
> enum might have a different value of the XSAVE features and can be modified
> without disturbing the existing kernel code.

We will do that when we cross that bridge, right?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] sysfs: Unbreak the build around sysfs_bin_attr_simple_read()

2024-05-23 Thread Guenter Roeck
On Thu, May 23, 2024 at 01:00:00PM +0200, Lukas Wunner wrote:
> Günter reports build breakage for m68k "m5208evb_defconfig" plus
> CONFIG_BLK_DEV_INITRD=y caused by commit 66bc1a173328 ("treewide:
> Use sysfs_bin_attr_simple_read() helper").
> 
> The defconfig disables CONFIG_SYSFS, so sysfs_bin_attr_simple_read()
> is not compiled into the kernel.  But init/initramfs.c references
> that function in the initializer of a struct bin_attribute.
> 
> Add an empty static inline to avoid the build breakage.
> 
> Fixes: 66bc1a173328 ("treewide: Use sysfs_bin_attr_simple_read() helper")
> Reported-by: Guenter Roeck 
> Closes: 
> https://lore.kernel.org/r/e12b0027-b199-4de7-b83d-668171447...@roeck-us.net
> Signed-off-by: Lukas Wunner 

Tested-by: Guenter Roeck 

> ---
>  include/linux/sysfs.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index a7d725fbf739..c4e64dc11206 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -750,6 +750,15 @@ static inline int sysfs_emit_at(char *buf, int at, const 
> char *fmt, ...)
>  {
>   return 0;
>  }
> +
> +static inline ssize_t sysfs_bin_attr_simple_read(struct file *file,
> +  struct kobject *kobj,
> +  struct bin_attribute *attr,
> +  char *buf, loff_t off,
> +  size_t count)
> +{
> + return 0;
> +}
>  #endif /* CONFIG_SYSFS */
>  
>  static inline int __must_check sysfs_create_file(struct kobject *kobj,
> -- 
> 2.43.0
> 


Re: [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc

2024-05-23 Thread Athira Rajeev



> On 7 May 2024, at 3:22 PM, Christophe Leroy  
> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Add instruction tracking function "update_insn_state_powerpc" for
>> powerpc. Example sequence in powerpc:
>> 
>> ld  r10,264(r3)
>> mr  r31,r3
>> <
>> ld  r9,312(r31)
> 
> Your approach looks fragile.
> 
> mr is a simplified instruction which in fact is  "or r31, r3, r3"
> 
> By the way, the compiler sometimes does it different, like below:
> 
> lwz r10,264(r3)
> addi r31, r3, 312
> lwz r9, 0(r31)
> 
> And what about sequences with lwzu ?
Hi Christophe,

This patch added “mr”. In next patch in same series, add instruction also is 
added to instruction tracking.
I will be posting a V3 with some changes to the logic for annotating add 
instructions.

Thanks
Athira
> 
>> 
>> Consider ithe sample is pointing to: "ld r9,312(r31)".
>> Here the memory reference is hit at "312(r31)" where 312 is the offset
>> and r31 is the source register. Previous instruction sequence shows that
>> register state of r3 is moved to r31. So to identify the data type for r31
>> access, the previous instruction ("mr") needs to be tracked and the
>> state type entry has to be updated. Current instruction tracking support
>> in perf tools infrastructure is specific to x86. Patch adds this for
>> powerpc and adds "mr" instruction to be tracked.
>> 
>> Signed-off-by: Athira Rajeev 



Re: [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction

2024-05-23 Thread Athira Rajeev



> On 7 May 2024, at 3:28 PM, Christophe Leroy  
> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Update instruction tracking with add instruction. Apart from "mr"
>> instruction, the register state is carried on by other insns, ie,
>> "add, addi, addis". Since these are not memory instructions and doesn't
>> fall in the range of (32 to 63), add these as part of nmemonic table.
>> For now, add* instructions are added. There is possibility of getting
>> more added here. But to extract regs, still the binary code will be
>> used. So associate this with "load_store_ops" itself and no other
>> changes required.
> 
> Looks fragile.
> 
> How do you handle addc, adde, addic ?
> And also addme, addze, which only have two operands instead of 3 ?

Hi Christophe,

Thanks for pointing these cases. Will address these cases in next version

Thanks
Athira
> 
>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>>  .../perf/arch/powerpc/annotate/instructions.c | 21 +++
>>  tools/perf/util/disasm.c  |  1 +
>>  2 files changed, 22 insertions(+)
>> 
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c 
>> b/tools/perf/arch/powerpc/annotate/instructions.c
>> index cce7023951fe..1f35d8a65bb4 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -1,6 +1,17 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include 
>> 
>> +/*
>> + * powerpc instruction nmemonic table to associate load/store instructions 
>> with
>> + * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
>> + */
>> +static struct ins powerpc__instructions[] = {
>> + { .name = "mr", .ops = _store_ops,  },
>> + { .name = "addi",   .ops = _store_ops,   },
>> + { .name = "addis",  .ops = _store_ops,  },
>> + { .name = "add",.ops = _store_ops,  },
>> +};
>> +
>>  static struct ins_ops *powerpc__associate_instruction_ops(struct arch 
>> *arch, const char *name)
>>  {
>>   int i;
>> @@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state 
>> *state,
>>   if (annotate_get_insn_location(dloc->arch, dl, ) < 0)
>>   return;
>> 
>> + if (!strncmp(dl->ins.name, "add", 3))
>> + goto regs_check;
>> +
>>   if (strncmp(dl->ins.name, "mr", 2))
>>   return;
>> 
>> @@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state 
>> *state,
>>   dst->reg1 = src_reg;
>>   }
>> 
>> +regs_check:
>>   if (!has_reg_type(state, dst->reg1))
>>   return;
>> 
>> @@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state 
>> *state __maybe_unused, s
>>  static int powerpc__annotate_init(struct arch *arch, char *cpuid 
>> __maybe_unused)
>>  {
>>   if (!arch->initialized) {
>> + arch->nr_instructions = ARRAY_SIZE(powerpc__instructions);
>> + arch->instructions = calloc(arch->nr_instructions, sizeof(struct ins));
>> + if (!arch->instructions)
>> + return -ENOMEM;
>> + memcpy(arch->instructions, powerpc__instructions, sizeof(struct ins) * 
>> arch->nr_instructions);
>> + arch->nr_instructions_allocated = arch->nr_instructions;
>>   arch->initialized = true;
>>   arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>>   arch->objdump.comment_char  = '#';
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index ac6b8b8da38a..32cf506a9010 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -36,6 +36,7 @@ static struct ins_ops mov_ops;
>>  static struct ins_ops nop_ops;
>>  static struct ins_ops lock_ops;
>>  static struct ins_ops ret_ops;
>> +static struct ins_ops load_store_ops;
>> 
>>  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>> struct ins_operands *ops, int max_ins_name);



bnx2x: UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c

2024-05-23 Thread Michael Ellerman
Hi folks,

I'm seeing an UBSAN warning when loading the bnx2x module on my Power8 machine:

  [ cut here ]
  UBSAN: array-index-out-of-bounds in 
../drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
  index 20 is out of range for type 'stats_query_entry [19]'
  CPU: 1 PID: 3870 Comm: NetworkManager Not tainted 6.9.0-dirty #17
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf04 of:IBM,FW860.B3 
(SV860_245) hv:phyp pSeries
  Call Trace:
dump_stack_lvl+0x80/0xe8 (unreliable)
__ubsan_handle_out_of_bounds+0xc4/0x110
bnx2x_stats_init+0x6f0/0x724 [bnx2x]
bnx2x_post_irq_nic_init+0x2bc/0x51c [bnx2x]
bnx2x_nic_load+0xd74/0x2de0 [bnx2x]
bnx2x_open+0x194/0x310 [bnx2x]
__dev_open+0x16c/0x22c
__dev_change_flags+0x258/0x2f4
dev_change_flags+0x3c/0x9c
do_setlink+0x35c/0x13b4
__rtnl_newlink+0x9b8/0xd88
rtnl_newlink+0x70/0xac
rtnetlink_rcv_msg+0x380/0x578
netlink_rcv_skb+0x80/0x190
rtnetlink_rcv+0x28/0x3c
netlink_unicast+0x2bc/0x3d4
netlink_sendmsg+0x21c/0x54c
sys_sendmsg+0x28c/0x3c0
___sys_sendmsg+0xcc/0x128
__sys_sendmsg+0x94/0xf4
system_call_exception+0x174/0x320
system_call_common+0x160/0x2c4

It seems there's some confusion about how many queues there should be, earlier 
the driver prints:
 
  [  480.692141] bnx2x 0010:01:00.0: set number of queues to 21

But the stats array only has space for 19?

Loading the driver with num_queues=18 avoids the warning.

This naive patch does fix it, but is probably just papering over the issue:

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index e2a4e1088b7f..7fe3562fa8a9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1263,7 +1263,7 @@ enum {
 struct bnx2x_fw_stats_req {
struct stats_query_header hdr;
struct stats_query_entry query[FP_SB_MAX_E1x+
-   BNX2X_FIRST_QUEUE_QUERY_IDX];
+   BNX2X_FIRST_QUEUE_QUERY_IDX + 2];
 };
 
 struct bnx2x_fw_stats_data {


Full dmesg leading up to the UBSAN report below.

cheers


$ modprobe bnx2x
[  480.575366] bnx2x 0010:01:00.0: msix capability found
[  480.594616] bnx2x 0010:01:00.0: me reg PF num: 0
[  480.594747] bnx2x 0010:01:00.0: This is a physical function
[  480.594754] bnx2x 0010:01:00.0: Cnic support is on
[  480.594760] bnx2x 0010:01:00.0: Max num of status blocks 31
[  480.594766] bnx2x 0010:01:00.0: Allocated netdev with 91 tx and 31 rx queues
[  480.594781] bnx2x 0010:01:00.0: chip is in 2_PORT_MODE
[  480.594787] bnx2x 0010:01:00.0: pf_id: 0
[  480.594792] bnx2x 0010:01:00.0: chip ID is 0x168e1000
[  480.594802] bnx2x 0010:01:00.0: flash_size 0x20 (2097152)
[  480.594815] bnx2x 0010:01:00.0: shmem offset 0x3c6c80  shmem2 offset 0x3c575c
[  480.594824] bnx2x 0010:01:00.0: hw_config 0x000f0001
[  480.594831] bnx2x 0010:01:00.0: bc_ver 70A04
[  480.594939] bnx2x 0010:01:00.0: not WoL capable
[  480.594948] bnx2x 0010:01:00.0: part number 0-0-0-0
[  480.594963] bnx2x 0010:01:00.0: IGU Normal Mode
[  480.595077] bnx2x 0010:01:00.0: igu_dsb_id 0  igu_base_sb 1  igu_sb_cnt 31
   base_fw_ndsb 1
[  480.595086] bnx2x 0010:01:00.0: shmem2base 0x3c575c, size 412, mfcfg offset 
16
[  480.595096] bnx2x 0010:01:00.0: single function mode
[  480.595107] bnx2x 0010:01:00.0: lane_config 0x  speed_cap_mask0 
0x005c  link_config0 0x
[  480.595118] bnx2x: [bnx2x_phy_probe:12595(eth%d)]Begin phy probe
[  480.595124] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 0, actual_phy_idx 0
[  480.595137] bnx2x: [bnx2x_populate_int_phy:12217(eth%d)]:chip_id = 0x168e1000
[  480.595147] bnx2x: [bnx2x_populate_int_phy:12335(eth%d)]Internal phy port=0, 
addr=0x1, mdio_ctl=0x8000
[  480.595162] bnx2x: [bnx2x_phy_def_cfg:12505(eth%d)]Default config phy idx 0 
cfg 0x0 speed_cap_mask 0x5c
[  480.595171] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 1, actual_phy_idx 1
[  480.595184] bnx2x: [bnx2x_populate_ext_phy:12463(eth%d)]phy_type 0xd00 port 
0 found in index 1
[  480.595192] bnx2x: [bnx2x_populate_ext_phy:12465(eth%d)] 
addr=0x10, mdio_ctl=0x8000
[  480.595202] bnx2x: [bnx2x_phy_def_cfg:12505(eth%d)]Default config phy idx 1 
cfg 0x0 speed_cap_mask 0x5c
[  480.595210] bnx2x: [bnx2x_phy_probe:12608(eth%d)]phy_config_swapped 0, 
phy_index 2, actual_phy_idx 2
[  480.595218] bnx2x: [bnx2x_phy_probe:12658(eth%d)]End phy probe. #phys found 2
[  480.595226] bnx2x 0010:01:00.0: phy_addr 0x1
[  480.595231] bnx2x 0010:01:00.0: supported 0x70ec 0x0
[  480.595237] bnx2x 0010:01:00.0: req_line_speed 0  req_duplex 1 req_flow_ctrl 
0x0 advertising 0x70ec
[  480.595255] bnx2x 0010:01:00.0: max_iscsi_conn 0x0
[  480.595265] bnx2x 0010:01:00.0: max_fcoe_conn 0x0
[  480.595273] bnx2x 0010:01:00.0: msix_table_size 32
[  480.595278] bnx2x 0010:01:00.0: fp_array_size 31
[  480.595307] bnx2x 

Re: [PATCH next] arch: powerpc: platforms: Remove unnecessary call to of_node_get

2024-05-23 Thread Michael Ellerman
Prabhav Kumar Vaish  writes:
> `dev->of_node` has a pointer to device node, of_node_get call seems
> unnecessary.

Sorry but it is necessary.

> Signed-off-by: Prabhav Kumar Vaish 
> ---
>  arch/powerpc/platforms/cell/iommu.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/cell/iommu.c 
> b/arch/powerpc/platforms/cell/iommu.c
> index 4cd9c0de22c2..5b794ce08689 100644
> --- a/arch/powerpc/platforms/cell/iommu.c
> +++ b/arch/powerpc/platforms/cell/iommu.c
> @@ -780,14 +780,13 @@ static int __init cell_iommu_init_disabled(void)
>  static u64 cell_iommu_get_fixed_address(struct device *dev)
>  {
>   u64 cpu_addr, size, best_size, dev_addr = OF_BAD_ADDR;
> - struct device_node *np;
> + struct device_node *np = dev->of_node;
>   const u32 *ranges = NULL;
>   int i, len, best, naddr, nsize, pna, range_size;
>  
>   /* We can be called for platform devices that have no of_node */
> - np = of_node_get(dev->of_node);
>   if (!np)
> - goto out;
> + return dev_addr;
>  
>   while (1) {
>   naddr = of_n_addr_cells(np);

nsize = of_n_size_cells(np);
np = of_get_next_parent(np);
if (!np)
break;

of_get_next_parent() drops the reference of the node passed to it (np).

So if you actually tested your patch you should see a recount underflow.

cheers


Re: [PATCH v2] mm/mm_init: use node's number of cpus in deferred_page_init_max_threads

2024-05-23 Thread Michael Ellerman
Eric Chanudet  writes:
> x86_64 is already using the node's cpu as maximum threads. Make that the
> default for all archs setting DEFERRED_STRUCT_PAGE_INIT.
>
> This returns to the behavior prior making the function arch-specific
> with commit ecd096506922 ("mm: make deferred init's max threads
> arch-specific").
>
> Signed-off-by: Eric Chanudet 
>
> ---
> Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
> shows faster deferred_init_memmap completions:
>
> | | x13s| SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
> | | Metal, 32GB | VM, 36GB | VM, 58GB| Metal, 128GB |
> | | 8cpus   | 8cpus| 8cpus   | 32cpus   |
> |-|-|--|-|--|
> | threads |  ms (%) | ms   (%) |  ms (%) |  ms  (%) |
> |-|-|--|-|--|
> | 1   | 108(0%) | 72  (0%) | 224(0%) | 324 (0%) |
> | cpus|  24  (-77%) | 36(-50%) |  40  (-82%) |  56   (-82%) |
>
> - v1: 
> https://lore.kernel.org/linux-arm-kernel/20240520231555.395979-5-echan...@redhat.com
> - Changes since v1:
>  - Make the generic function return the number of cpus of the node as
>max threads limit instead overriding it for arm64.
> - Drop Baoquan He's R-b on v1 since the logic changed.
> - Add CCs according to patch changes (ppc and s390 set
>   DEFERRED_STRUCT_PAGE_INIT by default).
>
>  arch/x86/mm/init_64.c | 12 
>  mm/mm_init.c  |  2 +-
>  2 files changed, 1 insertion(+), 13 deletions(-)

On a machine here (1TB, 40 cores, 4KB pages) the existing code gives:

  [0.500124] node 2 deferred pages initialised in 210ms
  [0.515790] node 3 deferred pages initialised in 230ms
  [0.516061] node 0 deferred pages initialised in 230ms
  [0.516522] node 7 deferred pages initialised in 230ms
  [0.516672] node 4 deferred pages initialised in 230ms
  [0.516798] node 6 deferred pages initialised in 230ms
  [0.517051] node 5 deferred pages initialised in 230ms
  [0.523887] node 1 deferred pages initialised in 240ms

vs with the patch:

  [0.379613] node 0 deferred pages initialised in 90ms
  [0.380388] node 1 deferred pages initialised in 90ms
  [0.380540] node 4 deferred pages initialised in 100ms
  [0.390239] node 6 deferred pages initialised in 100ms
  [0.390249] node 2 deferred pages initialised in 100ms
  [0.390786] node 3 deferred pages initialised in 110ms
  [0.396721] node 5 deferred pages initialised in 110ms
  [0.397095] node 7 deferred pages initialised in 110ms

Which is a nice speedup.

Tested-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH] sysfs: Unbreak the build around sysfs_bin_attr_simple_read()

2024-05-23 Thread Rafael J. Wysocki
On Thu, May 23, 2024 at 1:00 PM Lukas Wunner  wrote:
>
> Günter reports build breakage for m68k "m5208evb_defconfig" plus
> CONFIG_BLK_DEV_INITRD=y caused by commit 66bc1a173328 ("treewide:
> Use sysfs_bin_attr_simple_read() helper").
>
> The defconfig disables CONFIG_SYSFS, so sysfs_bin_attr_simple_read()
> is not compiled into the kernel.  But init/initramfs.c references
> that function in the initializer of a struct bin_attribute.
>
> Add an empty static inline to avoid the build breakage.
>
> Fixes: 66bc1a173328 ("treewide: Use sysfs_bin_attr_simple_read() helper")
> Reported-by: Guenter Roeck 
> Closes: 
> https://lore.kernel.org/r/e12b0027-b199-4de7-b83d-668171447...@roeck-us.net
> Signed-off-by: Lukas Wunner 

Works for me.

Reviewed-by: Rafael J. Wysocki 

> ---
>  include/linux/sysfs.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index a7d725fbf739..c4e64dc11206 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -750,6 +750,15 @@ static inline int sysfs_emit_at(char *buf, int at, const 
> char *fmt, ...)
>  {
> return 0;
>  }
> +
> +static inline ssize_t sysfs_bin_attr_simple_read(struct file *file,
> +struct kobject *kobj,
> +struct bin_attribute *attr,
> +char *buf, loff_t off,
> +size_t count)
> +{
> +   return 0;
> +}
>  #endif /* CONFIG_SYSFS */
>
>  static inline int __must_check sysfs_create_file(struct kobject *kobj,
> --
> 2.43.0
>


[PATCH] sysfs: Unbreak the build around sysfs_bin_attr_simple_read()

2024-05-23 Thread Lukas Wunner
Günter reports build breakage for m68k "m5208evb_defconfig" plus
CONFIG_BLK_DEV_INITRD=y caused by commit 66bc1a173328 ("treewide:
Use sysfs_bin_attr_simple_read() helper").

The defconfig disables CONFIG_SYSFS, so sysfs_bin_attr_simple_read()
is not compiled into the kernel.  But init/initramfs.c references
that function in the initializer of a struct bin_attribute.

Add an empty static inline to avoid the build breakage.

Fixes: 66bc1a173328 ("treewide: Use sysfs_bin_attr_simple_read() helper")
Reported-by: Guenter Roeck 
Closes: 
https://lore.kernel.org/r/e12b0027-b199-4de7-b83d-668171447...@roeck-us.net
Signed-off-by: Lukas Wunner 
---
 include/linux/sysfs.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index a7d725fbf739..c4e64dc11206 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -750,6 +750,15 @@ static inline int sysfs_emit_at(char *buf, int at, const 
char *fmt, ...)
 {
return 0;
 }
+
+static inline ssize_t sysfs_bin_attr_simple_read(struct file *file,
+struct kobject *kobj,
+struct bin_attribute *attr,
+char *buf, loff_t off,
+size_t count)
+{
+   return 0;
+}
 #endif /* CONFIG_SYSFS */
 
 static inline int __must_check sysfs_create_file(struct kobject *kobj,
-- 
2.43.0



Re: [PATCH 2/2] treewide: Use sysfs_bin_attr_simple_read() helper

2024-05-23 Thread Greg Kroah-Hartman
On Wed, May 22, 2024 at 07:51:35PM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Sat, Apr 06, 2024 at 03:52:02PM +0200, Lukas Wunner wrote:
> > Deduplicate ->read() callbacks of bin_attributes which are backed by a
> > simple buffer in memory:
> > 
> > Use the newly introduced sysfs_bin_attr_simple_read() helper instead,
> > either by referencing it directly or by declaring such bin_attributes
> > with BIN_ATTR_SIMPLE_RO() or BIN_ATTR_SIMPLE_ADMIN_RO().
> > 
> > Aside from a reduction of LoC, this shaves off a few bytes from vmlinux
> > (304 bytes on an x86_64 allyesconfig).
> > 
> > No functional change intended.
> > 
> 
> Not really; see below.
> 
> > Signed-off-by: Lukas Wunner 
> > Acked-by: Michael Ellerman  (powerpc)
> > ---
> ...
> > index da79760..5193fae 100644
> > --- a/init/initramfs.c
> > +++ b/init/initramfs.c
> > @@ -575,15 +575,7 @@ static int __init initramfs_async_setup(char *str)
> >  #include 
> >  #include 
> >  
> > -static ssize_t raw_read(struct file *file, struct kobject *kobj,
> > -   struct bin_attribute *attr, char *buf,
> > -   loff_t pos, size_t count)
> > -{
> > -   memcpy(buf, attr->private + pos, count);
> > -   return count;
> > -}
> > -
> > -static BIN_ATTR(initrd, 0440, raw_read, NULL, 0);
> > +static BIN_ATTR(initrd, 0440, sysfs_bin_attr_simple_read, NULL, 0);
> >  
> 
> sysfs_bin_attr_simple_read is only declared and available if CONFIG_SYSFS=y.
> With m68k:m5208evb_defconfig + CONFIG_BLK_DEV_INITRD=y, this results in
> 
> /opt/buildbot/slave/qemu-m68k/build/init/initramfs.c:578:31:
>   error: 'sysfs_bin_attr_simple_read' undeclared here (not in a function)
> 
> This happens because CONFIG_SYSFS=n and there is no dummy function for
> sysfs_bin_attr_simple_read(). Presumably the problem will be seen for all
> configurations with CONFIG_BLK_DEV_INITRD=y and CONFIG_SYSFS=n.

Lukas, can you send a patch adding a dummy function?

thanks,

greg k-h