[PATCH] selftests/powerpc: Fix typo in spectre_v2

2021-06-07 Thread Russell Currey
Signed-off-by: Russell Currey 
---
 tools/testing/selftests/powerpc/security/spectre_v2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/security/spectre_v2.c b/tools/testing/selftests/powerpc/security/spectre_v2.c
index adc2b7294e5f..e66f66bc482e 100644
--- a/tools/testing/selftests/powerpc/security/spectre_v2.c
+++ b/tools/testing/selftests/powerpc/security/spectre_v2.c
@@ -209,7 +209,7 @@ int spectre_v2_test(void)
break;
case COUNT_CACHE_DISABLED:
if (miss_percent < 95) {
-   printf("Branch misses < 20%% unexpected in this 
configuration!\n");
+   printf("Branch misses < 95%% unexpected in this 
configuration!\n");
printf("Possible mis-match between reported & actual 
mitigation\n");
return 1;
}
-- 
2.32.0



Re: [PATCH v2 8/9] mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA

2021-06-07 Thread Mike Rapoport
Hi,

On Mon, Jun 07, 2021 at 10:53:08AM +0200, Geert Uytterhoeven wrote:
> Hi Mike,
> 
> On Fri, Jun 4, 2021 at 8:50 AM Mike Rapoport  wrote:
> > From: Mike Rapoport 
> >
> > After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
> > configuration options are equivalent.
> >
> > Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.
> >
> > Done with
> >
> > $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
> > $(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
> > $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
> > $(git grep -wl NEED_MULTIPLE_NODES)
> >
> > with manual tweaks afterwards.
> >
> > Signed-off-by: Mike Rapoport 
> 
> Thanks for your patch!
> 
> As you dropped the following hunk from v2 of PATCH 5/9, there's now
> one reference left to CONFIG_NEED_MULTIPLE_NODES
> (plus the discontigmem comment):

Aargh, indeed. Thanks for catching this.

And I wondered why you suggested fixing the spelling in the cover letter for v3 :)
 
> -diff --git a/mm/memory.c b/mm/memory.c
> -index f3ffab9b9e39157b..fd0ebb63be3304f5 100644
> ---- a/mm/memory.c
> -+++ b/mm/memory.c
> -@@ -90,8 +90,7 @@
> - #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
> - #endif
> -
> --#ifndef CONFIG_NEED_MULTIPLE_NODES
> --/* use the per-pgdat data instead for discontigmem - mbligh */
> -+#ifdef CONFIG_FLATMEM
> - unsigned long max_mapnr;
> - EXPORT_SYMBOL(max_mapnr);
> -
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds

-- 
Sincerely yours,
Mike.


Re: [PATCH v2] libnvdimm/pmem: Fix blk_cleanup_disk() usage

2021-06-07 Thread Sachin Sant


> Reported-by: Sachin Sant 
> Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
> Link: http://lore.kernel.org/r/dfb75ba8-603f-4a35-880b-c5b23ef8f...@linux.vnet.ibm.com
> Cc: Christoph Hellwig 
> Cc: Ulf Hansson 
> Cc: Jens Axboe 
> Signed-off-by: Dan Williams 
> ---

Thanks Dan. This patch fixes the reported crash for me.

Tested-by: Sachin Sant 
> 
> Changes in v2: Improve the changelog.
> 
> drivers/nvdimm/pmem.c |4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 31f3c4bd6f72..fc6b78dd2d24 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -337,8 +337,9 @@ static void pmem_pagemap_cleanup(struct dev_pagemap *pgmap)
> {
>   struct request_queue *q =
>   container_of(pgmap->ref, struct request_queue, q_usage_counter);

With this change, variable 'q' is no longer needed and can be removed.

drivers/nvdimm/pmem.c: In function 'pmem_pagemap_cleanup':
drivers/nvdimm/pmem.c:338:24: warning: unused variable 'q' [-Wunused-variable]
  struct request_queue *q =  
  ^
> + struct pmem_device *pmem = pgmap->owner;
> 
> - blk_cleanup_disk(queue_to_disk(q));
> + blk_cleanup_disk(pmem->disk);
> }
> 
> static void pmem_release_queue(void *pgmap)
> @@ -427,6 +428,7 @@ static int pmem_attach_disk(struct device *dev,
>   q = disk->queue;
> 
>   pmem->disk = disk;
> + pmem->pgmap.owner = pmem;
>   pmem->pfn_flags = PFN_DEV;
> 	pmem->pgmap.ref = &q->q_usage_counter;
>   if (is_nd_pfn(dev)) {
> 
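
Something like this on top of v2 would drop it (untested sketch, hunk
offsets approximate):

--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -336,8 +336,6 @@
 static void pmem_pagemap_cleanup(struct dev_pagemap *pgmap)
 {
-	struct request_queue *q =
-		container_of(pgmap->ref, struct request_queue, q_usage_counter);
 	struct pmem_device *pmem = pgmap->owner;
 
 	blk_cleanup_disk(pmem->disk);
 }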

Thanks
-Sachin

Re: [PATCH 20/30] nullb: use blk_mq_alloc_disk

2021-06-07 Thread Christoph Hellwig
On Thu, Jun 03, 2021 at 12:10:09AM +, Chaitanya Kulkarni wrote:
> > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> > index d8e098f1e5b5..74fb2ec63219 100644
> > --- a/drivers/block/null_blk/main.c
> > +++ b/drivers/block/null_blk/main.c
> > @@ -1851,13 +1851,12 @@ static int null_add_dev(struct nullb_device *dev)
> >  
> > rv = -ENOMEM;
> 
> Is the above initialization needed?

It isn't strictly required any more.


Re: [PATCH v2] libnvdimm/pmem: Fix blk_cleanup_disk() usage

2021-06-07 Thread Christoph Hellwig
Thanks Dan, this looks good to me:

Reviewed-by: Christoph Hellwig 

Jens, can you quickly pick this up?


Re: [PATCH] powerpc: Fix kernel-jump address for ppc64 wrapper boot

2021-06-07 Thread Oliver O'Halloran
On Fri, Jun 4, 2021 at 7:39 PM He Ying  wrote:
>
> From "64-bit PowerPC ELF Application Binary Interface Supplement 1.9",
> we know that the value of a function pointer in a language like C is
> the address of the function descriptor and the first doubleword
> of the function descriptor contains the address of the entry point
> of the function.
>
> So, when we want to jump to an address (e.g. addr) to execute for
> PPC-elf64abi, we should assign the address of addr *NOT* addr itself
> to the function pointer or system will jump to the wrong address.

How have you tested this?

IIRC the 64bit wrapper is only used for ppc64le builds. For that case
the current code works because the LE ABI (ABIv2) doesn't use
function descriptors. I think even for a BE kernel we need the current
behaviour because the vmlinux's entry point is screwed up (i.e. it
doesn't point at a descriptor) and tools in the wild (probably kexec)
expect it to be screwed up.

ABIv2 (LE) reference:
https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
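
For reference, a rough sketch (not the wrapper code) of what a function
pointer means under each 64-bit ELF ABI:

typedef void (*kernel_entry_t)(void);	/* parameters elided */

/* ELFv1 (BE): a function pointer is the address of a function
 * descriptor; the branch target is its first doubleword. */
struct func_desc {
	unsigned long entry;	/* code address actually jumped to */
	unsigned long toc;	/* TOC pointer loaded into r2 */
	unsigned long env;	/* environment pointer, unused by C */
};

/* ELFv2 (LE): no descriptors -- the function pointer is the code
 * address itself, so casting a raw entry address to kernel_entry_t
 * works, which is why the current code is fine for ppc64le. */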


Re: [PATCH v7 00/11] Speedup mremap on ppc64

2021-06-07 Thread Nicholas Piggin
Excerpts from Aneesh Kumar K.V's message of June 8, 2021 2:39 pm:
> On 6/7/21 3:40 PM, Nick Piggin wrote:
>> On Monday, 7 June 2021, Aneesh Kumar K.V wrote:
>> 
>> 
>> This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
>> the platform to support updating higher-level page tables without
>> updating page table entries. This also needs to invalidate the Page Walk
>> Cache on architecture supporting the same.
>> 
>> Changes from v6:
>> * Update ppc64 flush_tlb_range to invalidate page walk cache.
>> 
>> 
>> I'd really rather not do this; I'm not sure a microbenchmark captures 
>> everything.
>> 
>> Page tables coming from L2/L3 probably aren't the primary purpose or 
>> biggest benefit of intermediate level caches.
>> 
>> The situation on POWER with nest mmu (coherent accelerators) is 
>> magnified. They have huge page walk caches to make up for the fact they 
>> don't have data caches for walking page tables, which makes the 
>> invalidation more painful in terms of subsequent misses, but also 
>> latency to invalidate (can be on the order of microseconds, whereas a page 
>> invalidate is a couple of orders of magnitude faster).
>> 
> 
> If we are using NestMMU, we already upgrade that flush to invalidate 
> page walk cache, right? I.e., if we have a > PMD_SIZE range, we would upgrade 
> the invalidate to a pid flush via
> 
> flush_pid = nr_pages > tlb_single_page_flush_ceiling;

Not that we've tuned that parameter in a long time, certainly not with 
nMMU probably. Quite possibly it should be higher for nMMU because of 
the big TLBs they have. (And what about == PMD_SIZE?)

>   
> and if it is a PID flush and we are using NestMMU, we already upgrade a 
> RIC_FLUSH_TLB to RIC_FLUSH_ALL?

Does P10 still have that bug?

At any rate, the core MMU I think still has the same issues just less
pronounced. PWC invalidates take longer, and PWC should have most
benefit when CPU data caches are highly used and aren't filled with
page table entries.

Thanks,
Nick


Re: [PATCH] powerpc/kprobes: Pass ppc_inst as a pointer to emulate_step() on ppc32

2021-06-07 Thread Christophe Leroy




On 07/06/2021 19:36, Christophe Leroy wrote:



On 07/06/2021 16:31, Christophe Leroy wrote:



On 07/06/2021 13:34, Naveen N. Rao wrote:

Naveen N. Rao wrote:

Trying to use a kprobe on ppc32 results in the below splat:
    BUG: Unable to handle kernel data access on read at 0x7c0802a6
    Faulting instruction address: 0xc002e9f0
    Oops: Kernel access of bad area, sig: 11 [#1]
    BE PAGE_SIZE=4K PowerPC 44x Platform
    Modules linked in:
    CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
    NIP:  c002e9f0 LR: c0011858 CTR: 8a47
    REGS: c292fd50 TRAP: 0300   Not tainted  (5.13.0-rc1-01824-g3a81c0495fdb)
    MSR:  9000   CR: 24002002  XER: 2000
    DEAR: 7c0802a6 ESR: 
    
    NIP [c002e9f0] emulate_step+0x28/0x324
    LR [c0011858] optinsn_slot+0x128/0x1
    Call Trace:
 opt_pre_handler+0x7c/0xb4 (unreliable)
 optinsn_slot+0x128/0x1
 ret_from_syscall+0x0/0x28

The offending instruction is:
    81 24 00 00 lwz r9,0(r4)

Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.

Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


Christophe,
Can you confirm if this patch works for you? It would be good if this can go in 
v5.13.



I'm trying to use kprobes, but I must be missing something. I have tried to follow the example in 
the kernel's documentation:


  # echo 'p:myprobe do_sys_open dfd=%r3' > /sys/kernel/debug/tracing/kprobe_events

  # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable

  # cat /sys/kernel/debug/kprobes/list

  c00122e4  k  kretprobe_trampoline+0x0    [OPTIMIZED]
  c018a1b4  k  do_sys_open+0x0    [OPTIMIZED]

  # cat /sys/kernel/debug/tracing/tracing_on

  1

  # cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:1
#
#    _-=> irqs-off
#   / _=> need-resched
#  | / _---=> hardirq/softirq
#  || / _--=> preempt-depth
#  ||| / delay
#   TASK-PID CPU#     TIMESTAMP  FUNCTION
#  | | |     | |



So it looks like I get no event. I can't believe that do_sys_open() is never 
hit.

This is without your patch, so it should Oops ?


Then it looks like something is locked up somewhere, because I can't do 
anything else:

  # echo 'p:myprobe2 do_sys_openat2 dfd=%r3' >/sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

  # echo '-:myprobe' > /sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

  # echo > /sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy




Ok, did a new test. Seems like do_sys_open() is really never called.
I set the test at do_sys_openat2, it was not optimised and was working.
I set the test at do_sys_openat2+0x10, it was optimised and crashed.
Now I'm going to test the patch.

When I set an event, is it normal that it removes the previous one? Can we have 
only one event at a time? And then, when that event is enabled, do we get 
'Device or resource busy' when trying to add a new one?




I confirm it doesn't crash anymore and it now works with optimised probes.

Tested-by: Christophe Leroy 


Re: [PATCH] powerpc: Fix kernel-jump address for ppc64 wrapper boot

2021-06-07 Thread Christophe Leroy




On 04/06/2021 11:22, He Ying wrote:

 From "64-bit PowerPC ELF Application Binary Interface Supplement 1.9",
we know that the value of a function pointer in a language like C is
the address of the function descriptor and the first doubleword
of the function descriptor contains the address of the entry point
of the function.

So, when we want to jump to an address (e.g. addr) to execute for
PPC-elf64abi, we should assign the address of addr *NOT* addr itself
to the function pointer or system will jump to the wrong address.

Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES
Signed-off-by: He Ying 
---
  arch/powerpc/boot/main.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index cae31a6e8f02..50fd7f11b642 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -268,7 +268,16 @@ void start(void)
if (console_ops.close)
console_ops.close();
  
+#ifdef CONFIG_PPC64_BOOT_WRAPPER


This kind of need doesn't deserve an #ifdef; see 
https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation


You can do:


kentry = (kernel_entry_t)(IS_ENABLED(CONFIG_PPC64_BOOT_WRAPPER) ?
			  &vmlinux.addr : vmlinux.addr);


Or, if you prefer something less compact:


if (IS_ENABLED(CONFIG_PPC64_BOOT_WRAPPER))
	kentry = (kernel_entry_t)&vmlinux.addr;
else
	kentry = (kernel_entry_t)vmlinux.addr;



+   /*
+* For PPC-elf64abi, the value of a function pointer is the address
+* of the function descriptor. And the first doubleword of a function
+* descriptor contains the address of the entry point of the function.
+*/
+	kentry = (kernel_entry_t)&vmlinux.addr;
+#else
kentry = (kernel_entry_t) vmlinux.addr;
+#endif
if (ft_addr) {
if(platform_ops.kentry)
platform_ops.kentry(ft_addr, vmlinux.addr);



Re: [PATCH v7 00/11] Speedup mremap on ppc64

2021-06-07 Thread Aneesh Kumar K.V

On 6/7/21 3:40 PM, Nick Piggin wrote:
On Monday, 7 June 2021, Aneesh Kumar K.V wrote:



This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
the platform to support updating higher-level page tables without
updating page table entries. This also needs to invalidate the Page Walk
Cache on architecture supporting the same.

Changes from v6:
* Update ppc64 flush_tlb_range to invalidate page walk cache.


I'd really rather not do this; I'm not sure a microbenchmark captures 
everything.


Page tables coming from L2/L3 probably aren't the primary purpose or 
biggest benefit of intermediate level caches.


The situation on POWER with nest mmu (coherent accelerators) is 
magnified. They have huge page walk caches to make up for the fact they 
don't have data caches for walking page tables, which makes the 
invalidation more painful in terms of subsequent misses, but also 
latency to invalidate (can be on the order of microseconds, whereas a page 
invalidate is a couple of orders of magnitude faster).




If we are using NestMMU, we already upgrade that flush to invalidate 
page walk cache, right? I.e., if we have a > PMD_SIZE range, we would upgrade 
the invalidate to a pid flush via


flush_pid = nr_pages > tlb_single_page_flush_ceiling;

and if it is a PID flush and we are using NestMMU, we already upgrade a 
RIC_FLUSH_TLB to RIC_FLUSH_ALL?


Yes, it is a deficiency of the ppc invalidation architecture; we are 
aware and would like to improve it, but for now this is what we have.




-aneesh


Re: [PATCH v4 1/4] lazy tlb: introduce lazy mm refcount helper functions

2021-06-07 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of June 8, 2021 11:48 am:
> On Tue, 08 Jun 2021 11:39:56 +1000 Nicholas Piggin  wrote:
> 
>> > Looks like a functional change.  What's happening here?
>> 
>> That's kthread_use_mm being clever about the lazy tlb mm. If it happened 
>> that the kthread had inherited the lazy tlb mm that happens to be the 
>> one we want to use here, then we already have a refcount to it via the 
>> lazy tlb ref.
>> 
>> So then it doesn't have to touch the refcount, but rather just converts
>> it from the lazy tlb ref to the returned reference. If the lazy tlb mm
>> doesn't get a reference, we can't do that.
> 
> Please cover this in the changelog and perhaps a code comment.
> 

Yeah fair enough, I'll even throw in a bug fix as well (your nose was right, 
and it was too clever for me by half...)

Thanks,
Nick

--
Fix a refcounting bug in kthread_use_mm (the mm reference is increased
unconditionally now, but the lazy tlb refcount is still dropped only
if mm != active_mm).

And an update for the changelog:

If a kernel thread's current lazy tlb mm happens to be the one it wants to
use, then kthread_use_mm() cleverly transfers the mm refcount from the
lazy tlb mm reference to the returned reference. If the lazy tlb mm
reference is no longer identical to a normal reference, this trick does not
work, so that is changed to be explicit about the two references.

Signed-off-by: Nicholas Piggin 
---
 kernel/kthread.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index b70e28431a01..5e9797b2d06e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1314,6 +1314,11 @@ void kthread_use_mm(struct mm_struct *mm)
WARN_ON_ONCE(!(tsk->flags & PF_KTHREAD));
WARN_ON_ONCE(tsk->mm);
 
+   /*
+* It's possible that tsk->active_mm == mm here, but we must
+* still mmgrab(mm) and mmdrop_lazy_tlb(active_mm), because lazy
+* mm may not have its own refcount (see mmgrab/drop_lazy_tlb()).
+*/
mmgrab(mm);
 
task_lock(tsk);
@@ -1338,12 +1343,9 @@ void kthread_use_mm(struct mm_struct *mm)
 * memory barrier after storing to tsk->mm, before accessing
 * user-space memory. A full memory barrier for membarrier
 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
-* mmdrop(), or explicitly with smp_mb().
+* mmdrop_lazy_tlb().
 */
-   if (active_mm != mm)
-   mmdrop_lazy_tlb(active_mm);
-   else
-   smp_mb();
+   mmdrop_lazy_tlb(active_mm);
 
to_kthread(tsk)->oldfs = force_uaccess_begin();
 }
-- 
2.23.0



Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout

2021-06-07 Thread Hugh Dickins
On Mon, 7 Jun 2021, Aneesh Kumar K.V wrote:

> CPU 1                         CPU 2                           CPU 3
> 
> mremap(old_addr, new_addr)    page_shrinker/try_to_unmap_one
> 
> mmap_write_lock_killable()
> 
>                               addr = old_addr
>                               lock(pte_ptl)
> lock(pmd_ptl)
> pmd = *old_pmd
> pmd_clear(old_pmd)
> flush_tlb_range(old_addr)
> 
> *new_pmd = pmd
>                                                               *new_addr = 10; and fills
>                                                               TLB with new addr and
>                                                               old pfn
> 
> unlock(pmd_ptl)
>                               ptep_clear_flush()
>                               old pfn is free.
>                                                               Stale TLB entry
> 
> Fix this race by holding pmd lock in pageout. This still doesn't handle the
> race between MOVE_PUD and pageout.
> 
> Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
> Link: https://lore.kernel.org/linux-mm/CAHk-=wgxvr04ebntxqfevontwnp6fdm+oj5vauqxp3s-huw...@mail.gmail.com
> Signed-off-by: Aneesh Kumar K.V 

This seems very wrong to me, to require another level of locking in the
rmap lookup, just to fix some new pagetable games in mremap.

But Linus asked "Am I missing something?": neither of you have mentioned
mremap's take_rmap_locks(), so I hope that already meets your need.  And
if it needs to be called more often than before (see "need_rmap_locks"),
that's probably okay.
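
For reference, that helper is small -- roughly this (quoting
mm/mremap.c from memory):

static void take_rmap_locks(struct vm_area_struct *vma)
{
	if (vma->vm_file)
		i_mmap_lock_write(vma->vm_file->f_mapping);
	if (vma->anon_vma)
		anon_vma_lock_write(vma->anon_vma);
}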

Hugh

> ---
>  include/linux/rmap.h |  9 ++---
>  mm/page_vma_mapped.c | 36 ++--
>  2 files changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index def5c62c93b3..272ab0c2b60b 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -207,7 +207,8 @@ struct page_vma_mapped_walk {
>   unsigned long address;
>   pmd_t *pmd;
>   pte_t *pte;
> - spinlock_t *ptl;
> + spinlock_t *pte_ptl;
> + spinlock_t *pmd_ptl;
>   unsigned int flags;
>  };
>  
> @@ -216,8 +217,10 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
> 	/* HugeTLB pte is set to the relevant page table entry without pte_mapped. */
>   if (pvmw->pte && !PageHuge(pvmw->page))
>   pte_unmap(pvmw->pte);
> - if (pvmw->ptl)
> - spin_unlock(pvmw->ptl);
> + if (pvmw->pte_ptl)
> + spin_unlock(pvmw->pte_ptl);
> + if (pvmw->pmd_ptl)
> + spin_unlock(pvmw->pmd_ptl);
>  }
>  
>  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2cf01d933f13..87a2c94c7e27 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -47,8 +47,10 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
>   return false;
>   }
>   }
> - pvmw->ptl = pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd);
> - spin_lock(pvmw->ptl);
> + if (USE_SPLIT_PTE_PTLOCKS) {
> + pvmw->pte_ptl = pte_lockptr(pvmw->vma->vm_mm, pvmw->pmd);
> + spin_lock(pvmw->pte_ptl);
> + }
>   return true;
>  }
>  
> @@ -162,8 +164,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>   if (!pvmw->pte)
>   return false;
>  
> - pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> - spin_lock(pvmw->ptl);
> +	pvmw->pte_ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> + spin_lock(pvmw->pte_ptl);
>   if (!check_pte(pvmw))
>   return not_found(pvmw);
>   return true;
> @@ -179,6 +181,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>   if (!pud_present(*pud))
>   return false;
>   pvmw->pmd = pmd_offset(pud, pvmw->address);
> + pvmw->pmd_ptl = pmd_lock(mm, pvmw->pmd);
>   /*
>* Make sure the pmd value isn't cached in a register by the
>* compiler and used as a stale value after we've observed a
> @@ -186,7 +189,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>	 */
>*/
>   pmde = READ_ONCE(*pvmw->pmd);
>   if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>   if (likely(pmd_trans_huge(*pvmw->pmd))) {
>   if (pvmw->flags & PVMW_MIGRATION)
>   return not_found(pvmw);
> @@ -206,14 +208,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>   }
>   }
>   return not_found(pvmw);
> - } else {
> - /* THP pmd was split under us: handle on 

Re: [PATCH v4 3/4] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2021-06-07 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of June 5, 2021 11:42 am:
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
> 
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
> 
> Shootdown IPIs are some concern, but they have not been observed to be
> a big problem with this scheme (the powerpc implementation generated
> 314 additional interrupts on a 144 CPU system during a kernel compile).
> There are a number of strategies that could be employed to reduce IPIs
> if they turn out to be a problem for some workload.
> 
> Signed-off-by: Nicholas Piggin 
> ---

Update the comment to be clearer, and account for the improvement
to MMU_LAZY_TLB_REFCOUNT comment.

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 2ad1a505ca55..cf468c9777d8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -433,15 +433,16 @@ config MMU_LAZY_TLB_REFCOUNT
def_bool y
depends on !MMU_LAZY_TLB_SHOOTDOWN
 
-# Instead of refcounting the lazy mm struct for kernel thread references
-# (which can cause contention with multi-threaded apps on large multiprocessor
-# systems), this option causes __mmdrop to IPI all CPUs in the mm_cpumask and
-# switch to init_mm if they were using the to-be-freed mm as the lazy tlb. To
-# implement this, architectures must use _lazy_tlb variants of mm refcounting
-# when releasing kernel thread mm references, and mm_cpumask must include at
-# least all possible CPUs in which the mm might be lazy, at the time of the
-# final mmdrop. mmgrab/mmdrop in arch/ code must be switched to _lazy_tlb
-# postfix as necessary.
+# This option allows MMU_LAZY_TLB_REFCOUNT=n. It ensures no CPUs are using an
+# mm as a lazy tlb beyond its last reference count, by shooting down these
+# users before the mm is deallocated. __mmdrop() first IPIs all CPUs that may
+# be using the mm as a lazy tlb, so that they may switch themselves to using
+# init_mm for their active mm. mm_cpumask(mm) is used to determine which CPUs
+# may be using mm as a lazy tlb mm.
+#
+# To implement this, an arch must ensure mm_cpumask(mm) contains at least all
+# possible CPUs in which the mm is lazy, and it must meet the requirements for
+# MMU_LAZY_TLB_REFCOUNT=n (see above).
 config MMU_LAZY_TLB_SHOOTDOWN
bool
 
-- 
2.23.0



Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable

2021-06-07 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of June 5, 2021 11:42 am:
> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
> when it is context switched. This can be disabled by architectures that
> don't require this refcounting if they clean up lazy tlb mms when the
> last refcount is dropped. Currently this is always enabled, which is
> what existing code does, so the patch is effectively a no-op.
> 
> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
> 
> Signed-off-by: Nicholas Piggin 

Can I give you a couple of incremental patches for 2/4 and 3/4 to 
improve the implementation requirement comments a bit for benefit of 
other archs.

Thanks,
Nick
--

Explain the requirements for lazy tlb mm refcounting in the comment,
to help with archs that may want to disable this by some means other
than MMU_LAZY_TLB_SHOOTDOWN.

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1cff045cdde6..39d8c7dcf220 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -419,6 +419,16 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
  shootdowns should enable this.
 
 # Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example. arch code must also ensure the
+# _lazy_tlb variants of mmgrab/mmdrop are used when dropping the lazy reference
+# to a kthread ->active_mm (non-arch code has been converted already).
 config MMU_LAZY_TLB_REFCOUNT
def_bool y
 
-- 
2.23.0



Re: [PATCH] powerpc: Fix duplicate included _clear.h

2021-06-07 Thread Michael Ellerman
Jiapeng Chong  writes:
> Clean up the following includecheck warning:
>
> ./arch/powerpc/perf/req-gen/perf.h: _clear.h is included more than once.

That's by design.

See the error reported by the kbuild robot.
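
The req-gen headers rely on the multiple-inclusion (X-macro) trick: each
inclusion of _clear.h #undefs the generator macros so the next section
of perf.h can redefine them for another expansion pass. A stand-alone
illustration of the pattern (not the actual req-gen code):

#include <stdio.h>

#define REQUESTS(R) R(alpha, 1) R(beta, 2)

/* pass 1: expand the list into an enum */
#define R(name, val) name##_id = val,
enum req { REQUESTS(R) };
#undef R	/* <- the job _clear.h does between passes */

/* pass 2: expand the same list into name strings */
#define R(name, val) [val] = #name,
static const char *req_names[] = { [0] = "none", REQUESTS(R) };
#undef R

int main(void)
{
	printf("%d -> %s\n", alpha_id, req_names[alpha_id]);
	return 0;
}

Removing the #include "_clear.h" leaves the previous definitions in
place, which is exactly the "REQUEST_" redefined warning the robot hit.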

> No functional change.

Not true.

cheers

> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  arch/powerpc/perf/req-gen/perf.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/powerpc/perf/req-gen/perf.h b/arch/powerpc/perf/req-gen/perf.h
> index fa9bc80..59fa588 100644
> --- a/arch/powerpc/perf/req-gen/perf.h
> +++ b/arch/powerpc/perf/req-gen/perf.h
> @@ -51,7 +51,6 @@ enum CAT2(NAME_LOWER, _requests) {
>   *   r_fields
>   * };
>   */
> -#include "_clear.h"
>  #define STRUCT_NAME__(name_lower, r_name) name_lower ## _ ## r_name
>  #define STRUCT_NAME_(name_lower, r_name) STRUCT_NAME__(name_lower, r_name)
>  #define STRUCT_NAME(r_name) STRUCT_NAME_(NAME_LOWER, r_name)
> -- 
> 1.8.3.1


Re: [PATCH v4 4/4] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

2021-06-07 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of June 8, 2021 9:52 am:
> On Sat,  5 Jun 2021 11:42:16 +1000 Nicholas Piggin  wrote:
> 
>> On a 16-socket 192-core POWER8 system, a context switching benchmark
>> with as many software threads as CPUs (so each switch will go in and
>> out of idle), upstream can achieve a rate of about 1 million context
>> switches per second. After this patch it goes up to 118 million.
> 
> Nice.  Do we have a feel for the benefit on any real-world workloads?

Not really unfortunately. I think it's always been a "known" cacheline,
it just showed up badly on will-it-scale tests recently when Anton was
doing a sweep of low hanging scalability issues on big systems.

We have some very big systems running certain in-memory databases that 
get into very high contention conditions on mutexes that push context 
switch rates right up, with idle times pretty high. That would give a 
lot of parallel context switching between user and idle threads, so we 
might be getting a bit of this contention there.

It's not something at the top of profiles though. And on multi-threaded
workloads like this, the normal refcounting of the user mm still has
fundamental contention. It's tricky to get the change tested on these 
workloads (machine time is very limited and I can't drive the software).

I suspect it could also show in things that do high net or disk IO rates
(enough to need a lot of cores), and do some user processing steps along
the way. You'd potentially get a lot of idle switching.

> 
> Could any other architectures benefit from these changes?
> 

The cacheline is going to bounce in the same situations on other archs, 
so I would say yes. Rik at one stage had some patches to try avoid it
for x86 some years ago, I don't know what happened to those.

The way powerpc has to maintain mm_cpumask for its TLB flushing makes it
relatively easy to do this shootdown, and we decided the additional IPIs
were less of a concern than the bouncing. Others have different concerns,
but I tried to make it generic and add comments explaining what other
archs can do, or possibly different ways it might be achieved.
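
For reference, the core of the shootdown ends up roughly like this
(simplified sketch, not the exact patch):

/* Simplified sketch of the MMU_LAZY_TLB_SHOOTDOWN idea. */
static void do_shoot_lazy_tlb(void *arg)
{
	struct mm_struct *mm = arg;

	if (current->active_mm == mm) {
		WARN_ON_ONCE(current->mm);	/* kernel thread expected */
		current->active_mm = &init_mm;
		switch_mm(mm, &init_mm, current);
	}
}

static void cleanup_lazy_tlbs(struct mm_struct *mm)
{
	/* mm_cpumask must cover every CPU where mm may still be lazy */
	on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
}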

Thanks,
Nick


Re: [PATCH v4 1/4] lazy tlb: introduce lazy mm refcount helper functions

2021-06-07 Thread Andrew Morton
On Tue, 08 Jun 2021 11:39:56 +1000 Nicholas Piggin  wrote:

> > Looks like a functional change.  What's happening here?
> 
> That's kthread_use_mm being clever about the lazy tlb mm. If it happened 
> that the kthread had inherited the lazy tlb mm that happens to be the 
> one we want to use here, then we already have a refcount to it via the 
> lazy tlb ref.
> 
> So then it doesn't have to touch the refcount, but rather just converts
> it from the lazy tlb ref to the returned reference. If the lazy tlb mm
> doesn't get a reference, we can't do that.

Please cover this in the changelog and perhaps a code comment.


Re: [PATCH v4 1/4] lazy tlb: introduce lazy mm refcount helper functions

2021-06-07 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of June 8, 2021 9:49 am:
> On Sat,  5 Jun 2021 11:42:13 +1000 Nicholas Piggin  wrote:
> 
>> Add explicit _lazy_tlb annotated functions for lazy mm refcounting.
>> This makes lazy mm references more obvious, and allows explicit
>> refcounting to be removed if it is not used.
>> 
>> ...
>>
>> --- a/kernel/kthread.c
>> +++ b/kernel/kthread.c
>> @@ -1314,14 +1314,14 @@ void kthread_use_mm(struct mm_struct *mm)
>>  WARN_ON_ONCE(!(tsk->flags & PF_KTHREAD));
>>  WARN_ON_ONCE(tsk->mm);
>>  
>> +mmgrab(mm);
>> +
>>  task_lock(tsk);
>>  /* Hold off tlb flush IPIs while switching mm's */
>>  local_irq_disable();
>>  active_mm = tsk->active_mm;
>> -if (active_mm != mm) {
>> -mmgrab(mm);
>> +if (active_mm != mm)
>>  tsk->active_mm = mm;
>> -}
> 
> Looks like a functional change.  What's happening here?

That's kthread_use_mm being clever about the lazy tlb mm. If it happened 
that the kthread had inherited the lazy tlb mm that happens to be the 
one we want to use here, then we already have a refcount to it via the 
lazy tlb ref.

So then it doesn't have to touch the refcount, but rather just converts
it from the lazy tlb ref to the returned reference. If the lazy tlb mm
doesn't get a reference, we can't do that.

Thanks,
Nick


[PATCH -next] soc: fsl: dpio: use list_move_tail instead of list_del/list_add_tail

2021-06-07 Thread Zou Wei
Using list_move_tail() instead of list_del() + list_add_tail().
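
For reference, list_move_tail() in include/linux/list.h is exactly this
del + add pair:

static inline void list_move_tail(struct list_head *list,
				  struct list_head *head)
{
	__list_del_entry(list);
	list_add_tail(list, head);
}

so there is no behaviour change, just fewer lines.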

Reported-by: Hulk Robot 
Signed-off-by: Zou Wei 
---
 drivers/soc/fsl/dpio/dpio-service.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 7351f30..340c090 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -76,8 +76,7 @@ static inline struct dpaa2_io *service_select(struct dpaa2_io *d)
 
 	spin_lock(&dpio_list_lock);
 	d = list_entry(dpio_list.next, struct dpaa2_io, node);
-	list_del(&d->node);
-	list_add_tail(&d->node, &dpio_list);
+	list_move_tail(&d->node, &dpio_list);
 	spin_unlock(&dpio_list_lock);
 
return d;
-- 
2.6.2



[PATCH v2] libnvdimm/pmem: Fix blk_cleanup_disk() usage

2021-06-07 Thread Dan Williams
The queue_to_disk() helper can not be used after del_gendisk();
communicate @disk via the pgmap->owner instead.

Otherwise, queue_to_disk() returns NULL resulting in the splat below.

 Kernel attempted to read user page (330) - exploit attempt? (uid: 0)
 BUG: Kernel NULL pointer dereference on read at 0x0330
 Faulting instruction address: 0xc0906344
 Oops: Kernel access of bad area, sig: 11 [#1]
 [..]
 NIP [c0906344] pmem_pagemap_cleanup+0x24/0x40
 LR [c04701d4] memunmap_pages+0x1b4/0x4b0
 Call Trace:
 [c00022cbb9c0] [c09063c8] pmem_pagemap_kill+0x28/0x40 (unreliable)
 [c00022cbb9e0] [c04701d4] memunmap_pages+0x1b4/0x4b0
 [c00022cbba90] [c08b28a0] devm_action_release+0x30/0x50
 [c00022cbbab0] [c08b39c8] release_nodes+0x2f8/0x3e0
 [c00022cbbb60] [c08ac440] device_release_driver_internal+0x190/0x2b0
 [c00022cbbba0] [c08a8450] unbind_store+0x130/0x170

Reported-by: Sachin Sant 
Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
Link: 
http://lore.kernel.org/r/dfb75ba8-603f-4a35-880b-c5b23ef8f...@linux.vnet.ibm.com
Cc: Christoph Hellwig 
Cc: Ulf Hansson 
Cc: Jens Axboe 
Signed-off-by: Dan Williams 
---
Changes in v2: Improve the changelog.

 drivers/nvdimm/pmem.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 31f3c4bd6f72..fc6b78dd2d24 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -337,8 +337,9 @@ static void pmem_pagemap_cleanup(struct dev_pagemap *pgmap)
 {
struct request_queue *q =
container_of(pgmap->ref, struct request_queue, q_usage_counter);
+   struct pmem_device *pmem = pgmap->owner;
 
-   blk_cleanup_disk(queue_to_disk(q));
+   blk_cleanup_disk(pmem->disk);
 }
 
 static void pmem_release_queue(void *pgmap)
@@ -427,6 +428,7 @@ static int pmem_attach_disk(struct device *dev,
q = disk->queue;
 
pmem->disk = disk;
+   pmem->pgmap.owner = pmem;
pmem->pfn_flags = PFN_DEV;
	pmem->pgmap.ref = &q->q_usage_counter;
if (is_nd_pfn(dev)) {



Re: [PATCH v4 4/4] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

2021-06-07 Thread Andrew Morton
On Sat,  5 Jun 2021 11:42:16 +1000 Nicholas Piggin  wrote:

> On a 16-socket 192-core POWER8 system, a context switching benchmark
> with as many software threads as CPUs (so each switch will go in and
> out of idle), upstream can achieve a rate of about 1 million context
> switches per second. After this patch it goes up to 118 million.

Nice.  Do we have a feel for the benefit on any real-world workloads?

Could any other architectures benefit from these changes?


Re: [PATCH v4 1/4] lazy tlb: introduce lazy mm refcount helper functions

2021-06-07 Thread Andrew Morton
On Sat,  5 Jun 2021 11:42:13 +1000 Nicholas Piggin  wrote:

> Add explicit _lazy_tlb annotated functions for lazy mm refcounting.
> This makes lazy mm references more obvious, and allows explicit
> refcounting to be removed if it is not used.
> 
> ...
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1314,14 +1314,14 @@ void kthread_use_mm(struct mm_struct *mm)
>   WARN_ON_ONCE(!(tsk->flags & PF_KTHREAD));
>   WARN_ON_ONCE(tsk->mm);
>  
> + mmgrab(mm);
> +
>   task_lock(tsk);
>   /* Hold off tlb flush IPIs while switching mm's */
>   local_irq_disable();
>   active_mm = tsk->active_mm;
> - if (active_mm != mm) {
> - mmgrab(mm);
> + if (active_mm != mm)
>   tsk->active_mm = mm;
> - }

Looks like a functional change.  What's happening here?




[PATCH 1/2] selftests/powerpc: Add missing clobbered register to ptrace TM tests

2021-06-07 Thread Jordan Niethe
ISA v3.1 removes TM but includes a synthetic implementation for
backwards compatibility. With this implementation, the tests
ptrace-tm-spd-gpr and ptrace-tm-gpr should never be able to make any
forward progress and should eventually be killed by the timeout.
Instead, on a P10 running in P9 mode, ptrace_tm_gpr fails like so:

test: ptrace_tm_gpr
tags: git_version:unknown
Starting the child
...
...
GPR[27]: 1 Expected: 2
GPR[28]: 1 Expected: 2
GPR[29]: 1 Expected: 2
GPR[30]: 1 Expected: 2
GPR[31]: 1 Expected: 2
[FAIL] Test FAILED on line 98
failure: ptrace_tm_gpr
selftests:  ptrace-tm-gpr [FAIL]

The problem is in the inline assembly of the child. r0 is loaded with a
value in the child's transaction abort handler, but this register is not
included in the clobbers list. This means it is possible that this
statement:
cptr[1] = 0;
which is meant to signal the parent to wait, may actually use the value
placed into r0 by the inline assembly and incorrectly signal the parent to
continue.
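
A stand-alone illustration of why the clobber matters (not the selftest
code; powerpc inline asm, so it only builds there):

/* An asm statement that writes GPR0 must declare "r0" clobbered,
 * otherwise the compiler may keep a live value there and read back
 * garbage afterwards. */
int demo(void)
{
	int x = 2;

	asm volatile("li 0, 1"	/* writes r0 behind the compiler's back */
		     :		/* no outputs */
		     :		/* no inputs */
		     : "r0");	/* declare it, and 'x' stays intact */
	return x;		/* reliably 2 only with the clobber */
}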

By inspection the same problem is present in ptrace-tm-spd-gpr.

Adding r0 to the clobbers list makes the test fail correctly via a
timeout on a P10 running in P8/P9 compatibility mode.

Suggested-by: Michael Neuling 
Signed-off-by: Jordan Niethe 
---
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 82f7bdc2e5e6..7df7100a29be 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -57,7 +57,7 @@ void tm_gpr(void)
: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2),
[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (),
[flt_2] "b" (), [cptr1] "b" ([1])
-   : "memory", "r7", "r8", "r9", "r10",
+   : "memory", "r0", "r7", "r8", "r9", "r10",
"r11", "r12", "r13", "r14", "r15", "r16",
"r17", "r18", "r19", "r20", "r21", "r22",
"r23", "r24", "r25", "r26", "r27", "r28",
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index ad65be6e8e85..8706bea5d015 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -65,7 +65,7 @@ void tm_spd_gpr(void)
: [gpr_1]"i"(GPR_1), [gpr_2]"i"(GPR_2), [gpr_4]"i"(GPR_4),
[sprn_texasr] "i" (SPRN_TEXASR), [flt_1] "b" (),
[flt_4] "b" ()
-   : "memory", "r5", "r6", "r7",
+   : "memory", "r0", "r5", "r6", "r7",
"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
"r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
"r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31"
-- 
2.25.1



[PATCH 2/2] selftests: Skip TM tests on synthetic TM implementations

2021-06-07 Thread Jordan Niethe
Transactional Memory was removed from the architecture in ISA v3.1. For
threads running in P8/P9 compatibility mode on P10, a synthetic TM
implementation is provided. In this implementation, tbegin. always sets
cr0 eq, meaning the abort handler is always called. This is not an issue,
as users of TM are expected to have a fallback non-transactional way to
make forward progress in the abort handler.

As the TM self tests exist only to test TM, no alternative path forward
is provided, leading to them timing out and failing on the synthetic TM
implementation.

The TEXASR indicates if a transaction failure is due to a synthetic
implementation. Check for a synthetic implementation and skip the TM
tests if so.
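
The detection amounts to something like the following (an illustrative
sketch using GCC's -mhtm builtins, not the actual tm.h helper; the real
check inspects the TEXASR failure code rather than relying on retries
alone):

#include <stdbool.h>

#define TM_RETRIES 100

static bool tm_looks_synthetic(void)
{
	/* With the synthetic implementation tbegin. always fails, so if
	 * we can never enter a transaction, assume TM is synthetic.
	 * __builtin_tbegin() returns nonzero when a transaction starts. */
	for (int i = 0; i < TM_RETRIES; i++) {
		if (__builtin_tbegin(0)) {
			__builtin_tend(0);
			return false;	/* made forward progress: real TM */
		}
	}
	return true;
}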

Signed-off-by: Jordan Niethe 
---
 .../selftests/powerpc/ptrace/ptrace-tm-gpr.c  |  1 +
 .../powerpc/ptrace/ptrace-tm-spd-gpr.c|  1 +
 .../powerpc/ptrace/ptrace-tm-spd-tar.c|  1 +
 .../selftests/powerpc/ptrace/ptrace-tm-tar.c  |  1 +
 .../selftests/powerpc/tm/tm-resched-dscr.c|  1 +
 .../selftests/powerpc/tm/tm-signal-stack.c|  1 +
 .../testing/selftests/powerpc/tm/tm-syscall.c |  2 +-
 tools/testing/selftests/powerpc/tm/tm.h   | 36 +++
 8 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
index 7df7100a29be..67ca297c5cca 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c
@@ -113,6 +113,7 @@ int ptrace_tm_gpr(void)
int ret, status;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
shm_id = shmget(IPC_PRIVATE, sizeof(int) * 2, 0777|IPC_CREAT);
pid = fork();
if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
index 8706bea5d015..6f2bce1b6c5d 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c
@@ -119,6 +119,7 @@ int ptrace_tm_spd_gpr(void)
int ret, status;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
pid = fork();
if (pid < 0) {
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
index 2ecfa1158e2b..e112a34fbe59 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c
@@ -129,6 +129,7 @@ int ptrace_tm_spd_tar(void)
int ret, status;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
shm_id = shmget(IPC_PRIVATE, sizeof(int) * 3, 0777|IPC_CREAT);
pid = fork();
if (pid == 0)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c
index 46ef378a15ec..d0db6df0f0ea 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c
@@ -117,6 +117,7 @@ int ptrace_tm_tar(void)
int ret, status;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
shm_id = shmget(IPC_PRIVATE, sizeof(int) * 2, 0777|IPC_CREAT);
pid = fork();
if (pid == 0)
diff --git a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
index 4cdb83964bb3..85c940ae6ff8 100644
--- a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
@@ -40,6 +40,7 @@ int test_body(void)
uint64_t rv, dscr1 = 1, dscr2, texasr;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
 
printf("Check DSCR TM context switch: ");
fflush(stdout);
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-stack.c b/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
index cdcf8c5bbbc7..68807aac8dd3 100644
--- a/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
@@ -35,6 +35,7 @@ int tm_signal_stack()
int pid;
 
SKIP_IF(!have_htm());
+   SKIP_IF(htm_is_synthetic());
 
pid = fork();
if (pid < 0)
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index becb8207b432..467a6b3134b2 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -25,7 +25,6 @@ extern int getppid_tm_suspended(void);
 unsigned retries = 0;
 
 #define TEST_DURATION 10 /* seconds */
-#define TM_RETRIES 100
 
 pid_t getppid_tm(bool suspend)
 {
@@ -67,6 +66,7 @@ int tm_syscall(void)
struct timeval end, now;
 
SKIP_IF(!have_htm_nosc());
+	SKIP_IF(htm_is_synthetic());

[PATCH] libnvdimm/pmem: Fix blk_cleanup_disk() usage

2021-06-07 Thread Dan Williams
The queue_to_disk() helper can not be used after del_gendisk();
communicate @disk via the pgmap->owner instead.

Reported-by: Sachin Sant 
Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
Cc: Christoph Hellwig 
Cc: Ulf Hansson 
Cc: Jens Axboe 
Signed-off-by: Dan Williams 
---
Jens,

Please take or fold this into your tree after Sachin has a chance
to test it out. It's passing my tests.

 drivers/nvdimm/pmem.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 31f3c4bd6f72..fc6b78dd2d24 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -337,8 +337,9 @@ static void pmem_pagemap_cleanup(struct dev_pagemap *pgmap)
 {
struct request_queue *q =
container_of(pgmap->ref, struct request_queue, q_usage_counter);
+   struct pmem_device *pmem = pgmap->owner;
 
-   blk_cleanup_disk(queue_to_disk(q));
+   blk_cleanup_disk(pmem->disk);
 }
 
 static void pmem_release_queue(void *pgmap)
@@ -427,6 +428,7 @@ static int pmem_attach_disk(struct device *dev,
q = disk->queue;
 
pmem->disk = disk;
+   pmem->pgmap.owner = pmem;
pmem->pfn_flags = PFN_DEV;
	pmem->pgmap.ref = &q->q_usage_counter;
if (is_nd_pfn(dev)) {



Re: [PATCH v7 00/11] Speedup mremap on ppc64

2021-06-07 Thread Nick Piggin
On Monday, 7 June 2021, Aneesh Kumar K.V  wrote:

>
> This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
> the platform to support updating higher-level page tables without
> updating page table entries. This also needs to invalidate the Page Walk
> Cache on architecture supporting the same.
>
> Changes from v6:
> * Update ppc64 flush_tlb_range to invalidate page walk cache.


I'd really rather not do this; I'm not sure a microbenchmark captures
everything.

Page tables coming from L2/L3 probably aren't the primary purpose or
biggest benefit of intermediate level caches.

The situation on POWER with nest mmu (coherent accelerators) is magnified.
They have huge page walk caches to make up for the fact they don't have
data caches for walking page tables, which makes the invalidation more
painful in terms of subsequent misses, but also latency to invalidate (can
be on the order of microseconds, whereas a page invalidate is a couple of
orders of magnitude faster).

Yes, it is a deficiency of the ppc invalidation architecture; we are aware
and would like to improve it, but for now this is what we have.

Thanks,
Nick


> * Add patches to fix race between mremap and page out
> * Add patch to fix build error with page table levels 2
>
> Changes from v5:
> * Drop patch mm/mremap: Move TLB flush outside page table lock
> * Add fixes for race between optimized mremap and page out
>
> Changes from v4:
> * Change function name and arguments based on review feedback.
>
> Changes from v3:
> * Fix build error reported by kernel test robot
> * Address review feedback.
>
> Changes from v2:
> * switch from using mmu_gather to flush_pte_tlb_pwc_range()
>
> Changes from v1:
> * Rebase to recent upstream
> * Fix build issues with tlb_gather_mmu changes
>
>
> Aneesh Kumar K.V (11):
>   mm/mremap: Fix race between MOVE_PMD mremap and pageout
>   mm/mremap: Fix race between MOVE_PUD mremap and pageout
>   selftest/mremap_test: Update the test to handle pagesize other than 4K
>   selftest/mremap_test: Avoid crash with static build
>   mm/mremap: Convert huge PUD move to separate helper
>   mm/mremap: Don't enable optimized PUD move if page table levels is 2
>   mm/mremap: Use pmd/pud_poplulate to update page table entries
>   powerpc/mm/book3s64: Fix possible build error
>   mm/mremap: Allow arch runtime override
>   powerpc/book3s64/mm: Update flush_tlb_range to flush page walk cache
>   powerpc/mm: Enable HAVE_MOVE_PMD support
>
>  .../include/asm/book3s/64/tlbflush-radix.h|   2 +
>  arch/powerpc/include/asm/tlb.h|   6 +
>  arch/powerpc/mm/book3s64/radix_hugetlbpage.c  |   8 +-
>  arch/powerpc/mm/book3s64/radix_tlb.c  |  70 +++
>  arch/powerpc/platforms/Kconfig.cputype|   2 +
>  include/linux/rmap.h  |  13 +-
>  mm/mremap.c   | 104 +--
>  mm/page_vma_mapped.c  |  43 ---
>  tools/testing/selftests/vm/mremap_test.c  | 118 ++
>  9 files changed, 251 insertions(+), 115 deletions(-)
>
> --
> 2.31.1
>
>


Re: [PATCH] powerpc/stacktrace: fix raise_backtrace_ipi() logic

2021-06-07 Thread Nathan Lynch
Michael Ellerman  writes:
> Nathan Lynch  writes:
>> Hi Michael,
>>
>> Michael Ellerman  writes:
>>> Nathan Lynch  writes:
 When smp_send_safe_nmi_ipi() indicates that the target CPU has
 responded to the IPI, skip the remote paca inspection
 fallback. Otherwise both the sending and target CPUs attempt the
 backtrace, usually creating a misleading ("didn't respond to backtrace
 IPI" is wrong) and interleaved mess:
>>>
>>> Thanks for fixing my bugs for me :)
>>>
>>
>> Thanks for your review! I was beginning to think I had missed some
>> subtletly here, thanks for illustrating it.
>>
>> I'll run with your proposed change below for the problem I'm working.
>
> Thanks. I did test it a bit with the test_lockup module, but some real
> world testing would be good too.

Been running with this while working some LPM issues and can confirm it waits
the intended amount of time before falling back to a remote stack walk,
avoiding interleaved traces from source and target CPUs. You can add my
tested-by, thanks.


Re: [PATCH] powerpc: Fix duplicate included _clear.h

2021-06-07 Thread kernel test robot
Hi Jiapeng,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.13-rc5 next-20210607]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Jiapeng-Chong/powerpc-Fix-duplicate-included-_clear-h/20210607-182626
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/9ee542baa91ad7cfa1e498c539ffb42b8d7f07b0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiapeng-Chong/powerpc-Fix-duplicate-included-_clear-h/20210607-182626
        git checkout 9ee542baa91ad7cfa1e498c539ffb42b8d7f07b0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/perf/hv-gpci.h:29,
from arch/powerpc/perf/hv-gpci.c:18:
>> arch/powerpc/perf/req-gen/perf.h:57: warning: "REQUEST_" redefined
  57 | #define REQUEST_(r_name, r_value, r_idx_1, r_fields) \
 | 
   In file included from arch/powerpc/perf/hv-gpci.h:29,
from arch/powerpc/perf/hv-gpci.c:18:
    arch/powerpc/perf/req-gen/perf.h:42: note: this is the location of the previous definition
  42 | #define REQUEST_(r_name, r_value, r_idx_1, r_fields) \
 | 


vim +/REQUEST_ +57 arch/powerpc/perf/req-gen/perf.h

9e9f60108423f1 Cody P Schafer 2015-01-30  47  
9e9f60108423f1 Cody P Schafer 2015-01-30  48  /*
9e9f60108423f1 Cody P Schafer 2015-01-30  49   * For each request:
9e9f60108423f1 Cody P Schafer 2015-01-30  50   * struct <name_lower>_<r_name> {
9e9f60108423f1 Cody P Schafer 2015-01-30  51   *	r_fields
9e9f60108423f1 Cody P Schafer 2015-01-30  52   * };
9e9f60108423f1 Cody P Schafer 2015-01-30  53   */
9e9f60108423f1 Cody P Schafer 2015-01-30  54  #define STRUCT_NAME__(name_lower, r_name) name_lower ## _ ## r_name
9e9f60108423f1 Cody P Schafer 2015-01-30  55  #define STRUCT_NAME_(name_lower, r_name) STRUCT_NAME__(name_lower, r_name)
9e9f60108423f1 Cody P Schafer 2015-01-30  56  #define STRUCT_NAME(r_name) STRUCT_NAME_(NAME_LOWER, r_name)
9e9f60108423f1 Cody P Schafer 2015-01-30 @57  #define REQUEST_(r_name, r_value, r_idx_1, r_fields)	\
9e9f60108423f1 Cody P Schafer 2015-01-30  58  struct STRUCT_NAME(r_name) {				\
9e9f60108423f1 Cody P Schafer 2015-01-30  59  	r_fields					\
9e9f60108423f1 Cody P Schafer 2015-01-30  60  };
9e9f60108423f1 Cody P Schafer 2015-01-30  61  #define __field_(r_name, r_value, r_idx_1, f_offset, f_bytes, f_name) \
9e9f60108423f1 Cody P Schafer 2015-01-30  62  	BYTES_TO_BE_TYPE(f_bytes) f_name;
9e9f60108423f1 Cody P Schafer 2015-01-30  63  #define __count_(r_name, r_value, r_idx_1, f_offset, f_bytes, f_name) \
9e9f60108423f1 Cody P Schafer 2015-01-30  64  	__field_(r_name, r_value, r_idx_1, f_offset, f_bytes, f_name)
9e9f60108423f1 Cody P Schafer 2015-01-30  65  #define __array_(r_name, r_value, r_idx_1, a_offset, a_bytes, a_name) \
9e9f60108423f1 Cody P Schafer 2015-01-30  66  	__u8 a_name[a_bytes];
9e9f60108423f1 Cody P Schafer 2015-01-30  67  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[PATCH v1 05/12] mm/memory_hotplug: remove nid parameter from remove_memory() and friends

2021-06-07 Thread David Hildenbrand
There is only a single user remaining. We can simply try to offline all
online nodes - which is fast, because we usually span pages and can skip
such nodes right away.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Andrew Morton 
Cc: Nathan Lynch 
Cc: Laurent Dufour 
Cc: "Aneesh Kumar K.V" 
Cc: Scott Cheloha 
Cc: Anton Blanchard 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org
Cc: nvd...@lists.linux.dev
Signed-off-by: David Hildenbrand 
---
 .../platforms/pseries/hotplug-memory.c|  9 -
 drivers/acpi/acpi_memhotplug.c|  7 +--
 drivers/dax/kmem.c|  3 +--
 drivers/virtio/virtio_mem.c   |  4 ++--
 include/linux/memory_hotplug.h| 10 +-
 mm/memory_hotplug.c   | 20 +--
 6 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 8377f1f7c78e..4a9232ddbefe 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -286,7 +286,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_size)
 {
unsigned long block_sz, start_pfn;
int sections_per_block;
-   int i, nid;
+   int i;
 
start_pfn = base >> PAGE_SHIFT;
 
@@ -297,10 +297,9 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_size)
 
block_sz = pseries_memory_block_size();
sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
-   nid = memory_add_physaddr_to_nid(base);
 
for (i = 0; i < sections_per_block; i++) {
-   __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+   __remove_memory(base, MIN_MEMORY_BLOCK_SIZE);
base += MIN_MEMORY_BLOCK_SIZE;
}
 
@@ -386,7 +385,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 
block_sz = pseries_memory_block_size();
 
-   __remove_memory(mem_block->nid, lmb->base_addr, block_sz);
+   __remove_memory(lmb->base_addr, block_sz);
put_device(_block->dev);
 
/* Update memory regions for memory remove */
@@ -638,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
rc = dlpar_online_lmb(lmb);
if (rc) {
-   __remove_memory(nid, lmb->base_addr, block_sz);
+   __remove_memory(lmb->base_addr, block_sz);
invalidate_lmb_associativity_index(lmb);
} else {
lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8cc195c4c861..1d01d9414c40 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -239,19 +239,14 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 
 static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
 {
-   acpi_handle handle = mem_device->device->handle;
struct acpi_memory_info *info, *n;
-   int nid = acpi_get_node(handle);
 
	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (!info->enabled)
continue;
 
-   if (nid == NUMA_NO_NODE)
-   nid = memory_add_physaddr_to_nid(info->start_addr);
-
acpi_unbind_memory_blocks(info);
-   __remove_memory(nid, info->start_addr, info->length);
+   __remove_memory(info->start_addr, info->length);
list_del(>list);
kfree(info);
}
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index ac231cc36359..99e0f60c4c26 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -156,8 +156,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
if (rc)
continue;
 
-   rc = remove_memory(dev_dax->target_node, range.start,
-   range_len());
+   rc = remove_memory(range.start, range_len());
if (rc == 0) {
release_resource(data->res[i]);
kfree(data->res[i]);
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 10ec60d81e84..e327fb878143 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -673,7 +673,7 @@ static int virtio_mem_remove_memory(struct virtio_mem *vm, uint64_t addr,
 
	dev_dbg(&vm->vdev->dev, "removing memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
-   rc = remove_memory(vm->nid, addr, size);
+   rc = remove_memory(addr, size);
if (!rc) {
		atomic64_sub(size, &vm->offline_size);
/*
@@ -728,7 +728,7 @@ static int 

[PATCH v1 04/12] mm/memory_hotplug: remove nid parameter from arch_remove_memory()

2021-06-07 Thread David Hildenbrand
The parameter is unused, let's remove it.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: Anshuman Khandual 
Cc: Ard Biesheuvel 
Cc: Mike Rapoport 
Cc: Nicholas Piggin 
Cc: Pavel Tatashin 
Cc: Baoquan He 
Cc: Laurent Dufour 
Cc: Sergei Trofimovich 
Cc: Kefeng Wang 
Cc: Michel Lespinasse 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: Thiago Jung Bauermann 
Cc: Joe Perches 
Cc: Pierre Morel 
Cc: Jia He 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: David Hildenbrand 
---
 arch/arm64/mm/mmu.c| 3 +--
 arch/ia64/mm/init.c| 3 +--
 arch/powerpc/mm/mem.c  | 3 +--
 arch/s390/mm/init.c| 3 +--
 arch/sh/mm/init.c  | 3 +--
 arch/x86/mm/init_32.c  | 3 +--
 arch/x86/mm/init_64.c  | 3 +--
 include/linux/memory_hotplug.h | 3 +--
 mm/memory_hotplug.c| 4 ++--
 mm/memremap.c  | 5 +
 10 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 89b66ef43a0f..c7821013f551 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1502,8 +1502,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
 }
 
-void arch_remove_memory(int nid, u64 start, u64 size,
-   struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 064a967a7b6e..5c6da8d83c1a 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -484,8 +484,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
 }
 
-void arch_remove_memory(int nid, u64 start, u64 size,
-   struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 043bbeaf407c..fc5c36189c26 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -115,8 +115,7 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return rc;
 }
 
-void __ref arch_remove_memory(int nid, u64 start, u64 size,
- struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 8ac710de1ab1..d85bd7f5d8dc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -306,8 +306,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
 }
 
-void arch_remove_memory(int nid, u64 start, u64 size,
-   struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 168d7d4dd735..d74daf68e59e 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -414,8 +414,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
 }
 
-void arch_remove_memory(int nid, u64 start, u64 size,
-   struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 21ffb03f6c72..5e82aafb5b49 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -801,8 +801,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, params);
 }
 
-void arch_remove_memory(int nid, u64 start, u64 size,
-   struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e527d829e1ed..d0296c081607 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1254,8 +1254,7 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
remove_pagetable(start, end, true, NULL);
 }
 

Re: [PATCH] powerpc/kprobes: Pass ppc_inst as a pointer to emulate_step() on ppc32

2021-06-07 Thread Christophe Leroy




On 07/06/2021 at 16:31, Christophe Leroy wrote:



On 07/06/2021 at 13:34, Naveen N. Rao wrote:

Naveen N. Rao wrote:

Trying to use a kprobe on ppc32 results in the below splat:
    BUG: Unable to handle kernel data access on read at 0x7c0802a6
    Faulting instruction address: 0xc002e9f0
    Oops: Kernel access of bad area, sig: 11 [#1]
    BE PAGE_SIZE=4K PowerPC 44x Platform
    Modules linked in:
    CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
    NIP:  c002e9f0 LR: c0011858 CTR: 8a47
    REGS: c292fd50 TRAP: 0300   Not tainted  (5.13.0-rc1-01824-g3a81c0495fdb)
    MSR:  9000   CR: 24002002  XER: 2000
    DEAR: 7c0802a6 ESR: 
    
    NIP [c002e9f0] emulate_step+0x28/0x324
    LR [c0011858] optinsn_slot+0x128/0x1
    Call Trace:
 opt_pre_handler+0x7c/0xb4 (unreliable)
 optinsn_slot+0x128/0x1
 ret_from_syscall+0x0/0x28

The offending instruction is:
    81 24 00 00 lwz r9,0(r4)

Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.

Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


Christophe,
Can you confirm if this patch works for you? It would be good if this can go in 
v5.13.



I'm trying to use kprobes, but I must be missing something. I have tried to follow the example in
the kernel's documentation:


  # echo 'p:myprobe do_sys_open dfd=%r3' > /sys/kernel/debug/tracing/kprobe_events

  # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable

  # cat /sys/kernel/debug/kprobes/list

  c00122e4  k  kretprobe_trampoline+0x0    [OPTIMIZED]
  c018a1b4  k  do_sys_open+0x0    [OPTIMIZED]

  # cat /sys/kernel/debug/tracing/tracing_on

  1

  # cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |



So it looks like I get no event. I can't believe that do_sys_open() is never hit.

This is without your patch, so it should Oops?


Then it looks like something is locked up somewhere, because I can't do anything else:

  # echo 'p:myprobe2 do_sys_openat2 dfd=%r3' >/sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

  # echo '-:myprobe' > /sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

  # echo > /sys/kernel/debug/tracing/kprobe_events

  -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy




Ok, did a new test. Seems like do_sys_open() is really never called.
I set the test at do_sys_openat2 , it was not optimised and was working.
I set the test at do_sys_openat2+0x10 , it was optimised and crashed.
Now I'm going to test the patch.

When I set an event, is it normal that it removes the previous one? Can we then
have only one event at a time? And then, when that event is enabled, we get
'Device or resource busy' when trying to add a new one?
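
A minimal sketch of what should work, assuming the semantics described in
Documentation/trace/kprobetrace.rst: redirecting with '>' truncates the
file and clears every existing probe, '>>' appends a new probe, and a
probe that is still enabled reports EBUSY until it is disabled first:

  # echo 'p:myprobe2 do_sys_openat2 dfd=%r3' >> /sys/kernel/debug/tracing/kprobe_events
  # echo 0 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
  # echo '-:myprobe' >> /sys/kernel/debug/tracing/kprobe_events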


Christophe


Re: [PATCH v1 1/1] powerpc/prom_init: Move custom isspace() to its own namespace

2021-06-07 Thread Andy Shevchenko
On Mon, May 10, 2021 at 05:49:25PM +0300, Andy Shevchenko wrote:
> If by some reason any of the headers will include ctype.h
> we will have a name collision. Avoid this by moving isspace()
> to the dedicate namespace.
> 
> First appearance of the code is in the commit cf68787b68a2
> ("powerpc/prom_init: Evaluate mem kernel parameter for early allocation").

Any comments on this?

> Reported-by: kernel test robot 
> Signed-off-by: Andy Shevchenko 
> ---
>  arch/powerpc/kernel/prom_init.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 41ed7e33d897..6845cbbc0cd4 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -701,13 +701,13 @@ static int __init prom_setprop(phandle node, const char *nodename,
>  }
>  
>  /* We can't use the standard versions because of relocation headaches. */
> -#define isxdigit(c)  (('0' <= (c) && (c) <= '9') \
> -  || ('a' <= (c) && (c) <= 'f') \
> -  || ('A' <= (c) && (c) <= 'F'))
> +#define prom_isxdigit(c) (('0' <= (c) && (c) <= '9') \
> +  || ('a' <= (c) && (c) <= 'f') \
> +  || ('A' <= (c) && (c) <= 'F'))
>  
> -#define isdigit(c)   ('0' <= (c) && (c) <= '9')
> -#define islower(c)   ('a' <= (c) && (c) <= 'z')
> -#define toupper(c)   (islower(c) ? ((c) - 'a' + 'A') : (c))
> +#define prom_isdigit(c)  ('0' <= (c) && (c) <= '9')
> +#define prom_islower(c)  ('a' <= (c) && (c) <= 'z')
> +#define prom_toupper(c)  (prom_islower(c) ? ((c) - 'a' + 'A') : (c))
>  
>  static unsigned long prom_strtoul(const char *cp, const char **endp)
>  {
> @@ -716,14 +716,14 @@ static unsigned long prom_strtoul(const char *cp, const char **endp)
>   if (*cp == '0') {
>   base = 8;
>   cp++;
> - if (toupper(*cp) == 'X') {
> + if (prom_toupper(*cp) == 'X') {
>   cp++;
>   base = 16;
>   }
>   }
>  
> - while (isxdigit(*cp) &&
> -(value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < base) {
> + while (prom_isxdigit(*cp) &&
> +(value = prom_isdigit(*cp) ? *cp - '0' : prom_toupper(*cp) - 'A' + 10) < base) {
>   result = result * base + value;
>   cp++;
>   }
> -- 
> 2.30.2
> 

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH] Revert "powerpc: Switch to relative jump labels"

2021-06-07 Thread Greg Kurz
On Tue, 01 Jun 2021 17:36:15 +1000
Michael Ellerman  wrote:

> Roman Bolshakov  writes:
> > On Sat, May 29, 2021 at 09:39:49AM +1000, Michael Ellerman wrote:
> >> Roman Bolshakov  writes:
> >> > This reverts commit b0b3b2c78ec075cec4721986a95abbbac8c3da4f.
> >> >
> >> > Otherwise, direct kernel boot with initramfs no longer works in QEMU.
> >> > It's broken in some bizarre way because a valid initramfs is not
> >> > recognized anymore:
> >> >
> >> >   Found initrd at 0xc1f7:0xc3d61d64
> >> >   rootfs image is not initramfs (XZ-compressed data is corrupt); looks 
> >> > like an initrd
> >> >
> >> > The issue is observed on v5.13-rc3 if the kernel is built with
> >> > defconfig, GCC 7.5.0 and GNU ld 2.32.0.
> >> 
> >> Are you able to try a different compiler?
> >
> > Hi Michael,
> >
> > I've just tried GCC 9.3.1 and the result is the same.
> >
> > The offending patch has assembly inlines, they typically go through
> > binutils/GAS and it might also be a case when older binutils doesn't
> > implement something properly (i've seen this on x86 and arm).
> 
> Jump labels use asm goto, which is a compiler feature, but you're right
> that the binutils version could also be important.
> 
> What ld versions have you tried?
> 
> And are those the toolchains from kernel.org or somewhere else?
> 
> >> I test booting qemu constantly, but I don't use GCC 7.5.
> >>
> >> And what qemu version are you using?
> >> 
> >
> > QEMU 3.1.1, but I've also tried 6.0.50 (QEMU master, 62c0ac5041e913) and
> > it fails the same way.
> 
> OK.
> 
> >> I assume your initramfs is compressed with XZ? How large is it
> >> compressed?
> >> 
> >
> > Yes, XZ. initramfs size is 30 MB (around 100 MB cpio size).
> >
> > It's interesting that the issue doesn't happen if I pass initramfs from
> > host (11MB), then the initramfs can be recognized. It might be related
> > to initramfs size then and bigger initramfs that used to work no longer
> > work with v5.13-rc3.
> 
> Are you using qemu's -initrd option to pass the initramfs, or are you
> building the initramfs into the kernel?
> 

Hi Michael,

I'm hitting the same issue while trying to boot a RHEL9 guest with
the distro's default kernel/initramfs and grub.

Interestingly this doesn't happen with older QEMU, e.g. 4.2.0 that
is shipped with RHEL8. I've bisected to this commit from the
QEMU 5.0 era:


commit 8897ea5a9fc0aafa5ed7eee1e0c49893b91a2d87
Author: David Gibson 
Date:   Thu Nov 28 16:37:04 2019 +1100

spapr: Don't attempt to clamp RMA to VRMA constraint


This mostly changes how memory is presented in the FDT.

Before 8897ea5a9fc, for a VM with 1 gig of RAM, we had several nodes,
first one being the VRMA (limited to 256 megs).

memory@2000 {
ibm,associativity = <0x04 0x00 0x00 0x00 0x00>;
reg = <0x00 0x2000 0x00 0x2000>;
device_type = "memory";
};

memory@1000 {
ibm,associativity = <0x04 0x00 0x00 0x00 0x00>;
reg = <0x00 0x1000 0x00 0x1000>;
device_type = "memory";
};

memory@0 {
ibm,associativity = <0x04 0x00 0x00 0x00 0x00>;
reg = <0x00 0x00 0x00 0x1000>;
device_type = "memory";
};


Now we have a single node for all RAM:

memory@0 {
ibm,associativity = <0x04 0x00 0x00 0x00 0x00>;
reg = <0x00 0x00 0x00 0x4000>;
device_type = "memory";
};

If I set an arbitrary constraint again on the VRMA, I get the
multiple memory nodes back and, depending on the value, the
boot succeeds. In my 1 gig RHEL9 guest case, I need to set
a VRMA size <= 0x3200.

Not sure how this can relate to the initramfs though. I just see
that grub doesn't map it at the same place:

0x0310 when boot fails

0x0f00 when boot succeeds

In case this rings a bell...

> > So, I've created a small initramfs using only static busybox (2.7M
> > uncompressed, 960K compressed with xz). No error is produced and it
> > boots fine.
> >
> > If I add a dummy file (11M off /dev/urandom) to the small busybox
> > initramfs, it boots and the init is started but I'm seeing the error:
> >
> >   rootfs image is not initramfs (XZ-compressed data is corrupt); looks like 
> > an initrd
> >
> > sha1sum of the file inside initramfs doesn't match sha1sum on the host.
> >
> >   guest # sha1sum dummy
> >   407c347e671ddd00f69df12b3368048bad0ebf0c  dummy
> >   # QEMU: Terminated
> >   host $ sha1sum dummy
> >   ed8494b3eecab804960ceba2c497270eed0b0cd1  dummy
> >
> > sha1sum is the same in the guest and on the host for 10M dummy file:
> >
> >   guest # sha1sum dummy
> >   43855f7a772a28cce91da9eb8f86f53bc807631f  dummy
> >   # QEMU: Terminated
> >   host $ sha1sum dummy
> >   43855f7a772a28cce91da9eb8f86f53bc807631f  dummy
> >
> > That might explain why bigger initramfs (or initramfs with bigger files)
> > doesn't boot - 

Re: [PATCH] powerpc/kprobes: Pass ppc_inst as a pointer to emulate_step() on ppc32

2021-06-07 Thread Christophe Leroy




On 07/06/2021 at 13:34, Naveen N. Rao wrote:

Naveen N. Rao wrote:

Trying to use a kprobe on ppc32 results in the below splat:
    BUG: Unable to handle kernel data access on read at 0x7c0802a6
    Faulting instruction address: 0xc002e9f0
    Oops: Kernel access of bad area, sig: 11 [#1]
    BE PAGE_SIZE=4K PowerPC 44x Platform
    Modules linked in:
    CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
    NIP:  c002e9f0 LR: c0011858 CTR: 8a47
    REGS: c292fd50 TRAP: 0300   Not tainted  (5.13.0-rc1-01824-g3a81c0495fdb)
    MSR:  9000   CR: 24002002  XER: 2000
    DEAR: 7c0802a6 ESR: 
    
    NIP [c002e9f0] emulate_step+0x28/0x324
    LR [c0011858] optinsn_slot+0x128/0x1
    Call Trace:
 opt_pre_handler+0x7c/0xb4 (unreliable)
 optinsn_slot+0x128/0x1
 ret_from_syscall+0x0/0x28

The offending instruction is:
    81 24 00 00 lwz r9,0(r4)

Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.

Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


Christophe,
Can you confirm if this patch works for you? It would be good if this can go in 
v5.13.



I'm trying to use kprobes, but I must be missing something. I have tried to follow the example in
the kernel's documentation:


 # echo 'p:myprobe do_sys_open dfd=%r3' > /sys/kernel/debug/tracing/kprobe_events

 # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable

 # cat /sys/kernel/debug/kprobes/list

 c00122e4  k  kretprobe_trampoline+0x0[OPTIMIZED]
 c018a1b4  k  do_sys_open+0x0[OPTIMIZED]

 # cat /sys/kernel/debug/tracing/tracing_on

 1

 # cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |



So it looks like I get no event. I can't believe that do_sys_open() is never hit.

This is without your patch, so it should Oops?


Then it looks like something is locked up somewhere, because I can't do anything else:

 # echo 'p:myprobe2 do_sys_openat2 dfd=%r3' >/sys/kernel/debug/tracing/kprobe_events

 -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

 # echo '-:myprobe' > /sys/kernel/debug/tracing/kprobe_events

 -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy

 # echo > /sys/kernel/debug/tracing/kprobe_events

 -sh: can't create /sys/kernel/debug/tracing/kprobe_events: Device or resource busy


Christophe


Re: [PATCH] Fixup for "[v2] powerpc/8xx: Allow disabling KUAP at boot time"

2021-06-07 Thread Michael Ellerman
Christophe Leroy  writes:
> On 06/06/2021 at 19:43, Christophe Leroy wrote:
>
> Michael, I sent it as a Fixup because it's in next-test, but if you prefer I
> can send a v3.

That's fine, I squashed it in.

cheers


Re: [PATCH v3] powerpc: Fixup for v3 "powerpc/nohash: Refactor update of BDI2000 pointers in switch_mmu_context()" in next-test

2021-06-07 Thread Michael Ellerman
Christophe Leroy  writes:
> As mentioned in the history, v3 doesn't apply to book3s/32, so the hunk
> on head_book3s_32.S has to be dropped from the commit mentioned
> in the title.
>
> Signed-off-by: Christophe Leroy 
> ---
> Michael, tell me if you prefer a v4 of the series.

Nah that's OK, I squashed this in.

cheers


Re: [PATCH] powerpc/kprobes: Pass ppc_inst as a pointer to emulate_step() on ppc32

2021-06-07 Thread Naveen N. Rao

Naveen N. Rao wrote:

Trying to use a kprobe on ppc32 results in the below splat:
BUG: Unable to handle kernel data access on read at 0x7c0802a6
Faulting instruction address: 0xc002e9f0
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PowerPC 44x Platform
Modules linked in:
CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
NIP:  c002e9f0 LR: c0011858 CTR: 8a47
REGS: c292fd50 TRAP: 0300   Not tainted  (5.13.0-rc1-01824-g3a81c0495fdb)
MSR:  9000   CR: 24002002  XER: 2000
DEAR: 7c0802a6 ESR: 

NIP [c002e9f0] emulate_step+0x28/0x324
LR [c0011858] optinsn_slot+0x128/0x1
Call Trace:
 opt_pre_handler+0x7c/0xb4 (unreliable)
 optinsn_slot+0x128/0x1
 ret_from_syscall+0x0/0x28

The offending instruction is:
81 24 00 00 lwz r9,0(r4)

Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.
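
A minimal illustration of the calling-convention difference (hypothetical
names, not taken from the kernel tree):

    /* A small single-member struct, shaped like struct ppc_inst. */
    struct inst { unsigned int val; };

    extern void consume(struct inst x);

    void caller(struct inst *p)
    {
        /*
         * ppc64 ELFv2 passes the struct's bytes by value in a GPR.
         * The ppc32 SysV ABI instead makes a copy on the caller's
         * stack and passes a pointer to it, so the callee must
         * dereference the register; this is the faulting
         * "lwz r9,0(r4)" above, where a raw value rather than a
         * pointer was placed in the argument register.
         */
        consume(*p);
    }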

Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


Christophe,
Can you confirm if this patch works for you? It would be good if this 
can go in v5.13.



Thanks,
Naveen



Re: [PATCH] watchdog: Remove MV64x60 watchdog driver

2021-06-07 Thread Guenter Roeck
On Mon, Jun 07, 2021 at 11:43:26AM +1000, Michael Ellerman wrote:
> Guenter Roeck  writes:
> > On 5/17/21 4:17 AM, Michael Ellerman wrote:
> >> Guenter Roeck  writes:
> >>> On 3/18/21 10:25 AM, Christophe Leroy wrote:
>  Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
>  removed the last selector of CONFIG_MV64X60.
> 
>  Therefore CONFIG_MV64X60_WDT cannot be selected anymore and
>  can be removed.
> 
>  Signed-off-by: Christophe Leroy 
> >>>
> >>> Reviewed-by: Guenter Roeck 
> >>>
>  ---
>    drivers/watchdog/Kconfig   |   4 -
>    drivers/watchdog/Makefile  |   1 -
>    drivers/watchdog/mv64x60_wdt.c | 324 -
>    include/linux/mv643xx.h|   8 -
>    4 files changed, 337 deletions(-)
>    delete mode 100644 drivers/watchdog/mv64x60_wdt.c
> >> 
> >> I assumed this would go via the watchdog tree, but seems like I
> >> misinterpreted.
> >> 
> >
> > Wim didn't send a pull request this time around.
> >
> > Guenter
> >
> >> Should I take this via the powerpc tree for v5.14 ?
> 
> I still don't see this in the watchdog tree, should I take it?
> 
It is in my personal watchdog-next tree, but afaics Wim hasn't picked any
of it up yet. Wim ?

Thanks,
Guenter


[PATCH 3/3] powerpc/32s: Rename PTE_SIZE to PTE_T_SIZE

2021-06-07 Thread Christophe Leroy
PTE_SIZE means PTE page table size in most places, whereas
in hash_low.S it means the size of one entry in the table.

Rename it PTE_T_SIZE, and define it directly in hash_low.S
instead of going through asm-offsets.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/asm-offsets.c   | 2 --
 arch/powerpc/mm/book3s32/hash_low.S | 6 --
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f1b6ff14c8a0..2bd936ebcae8 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -361,8 +361,6 @@ int main(void)
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
 #endif
 
-   DEFINE(PTE_SIZE, sizeof(pte_t));
-
 #ifdef CONFIG_KVM
OFFSET(VCPU_HOST_STACK, kvm_vcpu, arch.host_stack);
OFFSET(VCPU_HOST_PID, kvm_vcpu, arch.host_pid);
diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index fb4233a5bdf7..6925ce998557 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -27,8 +27,10 @@
 #include 
 
 #ifdef CONFIG_PTE_64BIT
+#define PTE_T_SIZE 8
 #define PTE_FLAGS_OFFSET   4   /* offset of PTE flags, in bytes */
 #else
+#define PTE_T_SIZE 4
 #define PTE_FLAGS_OFFSET   0
 #endif
 
@@ -488,7 +490,7 @@ _GLOBAL(flush_hash_pages)
bne 2f
ble cr1,19f
	addi	r4,r4,0x1000
-	addi	r5,r5,PTE_SIZE
+	addi	r5,r5,PTE_T_SIZE
	addi	r6,r6,-1
	b	1b
 
@@ -573,7 +575,7 @@ _GLOBAL(flush_hash_pages)
 
 8: ble cr1,9f  /* if all ptes checked */
81:	addi	r6,r6,-1
-	addi	r5,r5,PTE_SIZE
+	addi	r5,r5,PTE_T_SIZE
	addi	r4,r4,0x1000
	lwz	r0,0(r5)	/* check next pte */
cmpwi   cr1,r6,1
-- 
2.25.0



[PATCH 2/3] powerpc: Define swapper_pg_dir[] in C

2021-06-07 Thread Christophe Leroy
Don't duplicate swapper_pg_dir[] in each platform's head.S

Define it in mm/pgtable.c

Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is
not a constant.
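
A sketch of what the C-side definition could look like (hedged: the
mm/pgtable.c hunk is not quoted below, and the attribute is an
assumption; head_64.S notes the pgd historically wanted PGD_TABLE_SIZE
(64K) alignment, so a stronger __aligned() may be needed there):

    /* Worst-case sized kernel page directory, in page-aligned BSS. */
    pgd_t swapper_pg_dir[MAX_PTRS_PER_PGD] __page_aligned_bss;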

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
 arch/powerpc/include/asm/pgtable.h   |  4 
 arch/powerpc/kernel/asm-offsets.c|  5 -
 arch/powerpc/kernel/head_40x.S   | 11 ---
 arch/powerpc/kernel/head_44x.S   | 17 +
 arch/powerpc/kernel/head_64.S| 15 ---
 arch/powerpc/kernel/head_8xx.S   | 12 
 arch/powerpc/kernel/head_book3s_32.S | 11 ---
 arch/powerpc/kernel/head_fsl_booke.S | 12 
 arch/powerpc/mm/pgtable.c|  2 ++
 10 files changed, 10 insertions(+), 82 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a666d561b44d..4d9941b2fe51 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -232,6 +232,9 @@ extern unsigned long __pmd_frag_size_shift;
 #define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
 
+#define MAX_PTRS_PER_PGD   (1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE ? \
+  H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
+
 /* PMD_SHIFT determines what a second-level page table entry can map */
 #define PMD_SHIFT  (PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PMD_SIZE   (1UL << PMD_SHIFT)
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index c6a676714f04..b9c8641654f4 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,10 @@ struct mm_struct;
 
 #ifndef __ASSEMBLY__
 
+#ifndef MAX_PTRS_PER_PGD
+#define MAX_PTRS_PER_PGD PTRS_PER_PGD
+#endif
+
 /* Keep these as a macros to avoid include dependency mess */
 #define pte_page(x)pfn_to_page(pte_pfn(x))
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0480f4006e0c..f1b6ff14c8a0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -361,11 +361,6 @@ int main(void)
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
-   DEFINE(PGD_TABLE_SIZE, (sizeof(pgd_t) << max(RADIX_PGD_INDEX_SIZE, H_PGD_INDEX_SIZE)));
-#else
-   DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE);
-#endif
DEFINE(PTE_SIZE, sizeof(pte_t));
 
 #ifdef CONFIG_KVM
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 92b6c7356161..7d72ee5ab387 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -701,14 +701,3 @@ _GLOBAL(abort)
 mfspr   r13,SPRN_DBCR0
 oris    r13,r13,DBCR0_RST_SYSTEM@h
 mtspr   SPRN_DBCR0,r13
-
-/* We put a few things here that have to be page-aligned. This stuff
- * goes at the beginning of the data segment, which is page-aligned.
- */
-   .data
-   .align  12
-   .globl  sdata
-sdata:
-   .globl  swapper_pg_dir
-swapper_pg_dir:
-   .space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index e037eb615757..ddc978a2d381 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1233,23 +1233,8 @@ head_start_common:
isync
blr
 
-/*
- * We put a few things here that have to be page-aligned. This stuff
- * goes at the beginning of the data segment, which is page-aligned.
- */
-   .data
-   .align  PAGE_SHIFT
-   .globl  sdata
-sdata:
-
-/*
- * To support >32-bit physical addresses, we use an 8KB pgdir.
- */
-   .globl  swapper_pg_dir
-swapper_pg_dir:
-   .space  PGD_TABLE_SIZE
-
 #ifdef CONFIG_SMP
+   .data
.align  12
 temp_boot_stack:
.space  1024
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 730838c7ca39..79f2d1e61abd 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -997,18 +997,3 @@ start_here_common:
 0: trap
EMIT_BUG_ENTRY 0b, __FILE__, __LINE__, 0
.previous
-
-/*
- * We put a few things here that have to be page-aligned.
- * This stuff goes at the beginning of the bss, which is page-aligned.
- */
-   .section ".bss"
-/*
- * pgd dir should be aligned to PGD_TABLE_SIZE which is 64K.
- * We will need to find a better way to fix this
- */
-   .align  16
-
-   .globl  swapper_pg_dir
-swapper_pg_dir:
-   .space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 5ce42dfac061..9bdb95f5694f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -786,15 +786,3 @@ _GLOBAL(mmu_pin_tlb)
mtspr   

[PATCH 1/3] powerpc: Define empty_zero_page[] in C

2021-06-07 Thread Christophe Leroy
At the time being, empty_zero_page[] is defined in each
platform head.S.

Define it in mm/mem.c instead, and put it in BSS section instead
of the DATA section. Commit 5227cfa71f9e ("arm64: mm: place
empty_zero_page in bss") explains why it is interesting to have
it in BSS.

Signed-off-by: Christophe Leroy 
---
Applies on top of powerpc next-test with the Fixup for v3 "powerpc/nohash: 
Refactor update of BDI2000 pointers in switch_mmu_context()".
The last patch of this small series can be applied independently at the cost of 
a trivial conflict in asm-offsets.c

 arch/powerpc/kernel/head_40x.S   | 4 
 arch/powerpc/kernel/head_44x.S   | 4 
 arch/powerpc/kernel/head_64.S| 5 -
 arch/powerpc/kernel/head_8xx.S   | 6 --
 arch/powerpc/kernel/head_book3s_32.S | 5 -
 arch/powerpc/kernel/head_fsl_booke.S | 4 
 arch/powerpc/mm/mem.c| 3 +++
 7 files changed, 3 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 2717aa860cae..92b6c7356161 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -709,10 +709,6 @@ _GLOBAL(abort)
.align  12
.globl  sdata
 sdata:
-   .globl  empty_zero_page
-empty_zero_page:
-   .space  4096
-EXPORT_SYMBOL(empty_zero_page)
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c4ffec027a2..e037eb615757 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1241,10 +1241,6 @@ head_start_common:
.align  PAGE_SHIFT
.globl  sdata
 sdata:
-   .globl  empty_zero_page
-empty_zero_page:
-   .space  PAGE_SIZE
-EXPORT_SYMBOL(empty_zero_page)
 
 /*
  * To support >32-bit physical addresses, we use an 8KB pgdir.
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index ece7f97bafff..730838c7ca39 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -1012,8 +1012,3 @@ start_here_common:
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
-
-   .globl  empty_zero_page
-empty_zero_page:
-   .space  PAGE_SIZE
-EXPORT_SYMBOL(empty_zero_page)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 817df9fe7fb3..5ce42dfac061 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -795,12 +795,6 @@ _GLOBAL(mmu_pin_tlb)
.data
.globl  sdata
 sdata:
-   .globl  empty_zero_page
-   .align  PAGE_SHIFT
-empty_zero_page:
-   .space  PAGE_SIZE
-EXPORT_SYMBOL(empty_zero_page)
-
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 326262030279..79c744afc6b6 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -1274,11 +1274,6 @@ setup_usbgecko_bat:
.data
.globl  sdata
 sdata:
-   .globl  empty_zero_page
-empty_zero_page:
-   .space  4096
-EXPORT_SYMBOL(empty_zero_page)
-
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index ab718fd5e2a2..f33bc5a8e73e 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1221,10 +1221,6 @@ _GLOBAL(restore_to_as0)
.align  12
.globl  sdata
 sdata:
-   .globl  empty_zero_page
-empty_zero_page:
-   .space  4096
-EXPORT_SYMBOL(empty_zero_page)
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 043bbeaf407c..b06e5d969051 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -27,6 +27,9 @@
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
+unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
+EXPORT_SYMBOL(empty_zero_page);
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
 {
-- 
2.25.0



[PATCH] powerpc: Fix duplicate included _clear.h

2021-06-07 Thread Jiapeng Chong
Clean up the following includecheck warning:

./arch/powerpc/perf/req-gen/perf.h: _clear.h is included more than once.

No functional change.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 arch/powerpc/perf/req-gen/perf.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/perf/req-gen/perf.h b/arch/powerpc/perf/req-gen/perf.h
index fa9bc80..59fa588 100644
--- a/arch/powerpc/perf/req-gen/perf.h
+++ b/arch/powerpc/perf/req-gen/perf.h
@@ -51,7 +51,6 @@ enum CAT2(NAME_LOWER, _requests) {
  * r_fields
  * };
  */
-#include "_clear.h"
 #define STRUCT_NAME__(name_lower, r_name) name_lower ## _ ## r_name
 #define STRUCT_NAME_(name_lower, r_name) STRUCT_NAME__(name_lower, r_name)
 #define STRUCT_NAME(r_name) STRUCT_NAME_(NAME_LOWER, r_name)
-- 
1.8.3.1



[PATCH] powerpc: Fix duplicate included linux/sched/clock.h

2021-06-07 Thread Jiapeng Chong
Clean up the following includecheck warning:

./arch/powerpc/kernel/time.c: linux/sched/clock.h is included more than
once.

No functional change.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 arch/powerpc/kernel/time.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b67d93a..2c87620 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -53,7 +53,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-- 
1.8.3.1



[PATCH v3] powerpc: Fixup for v3 "powerpc/nohash: Refactor update of BDI2000 pointers in switch_mmu_context()" in next-test

2021-06-07 Thread Christophe Leroy
As mentioned in the history, v3 doesn't apply to book3s/32, so the hunk
on head_book3s_32.S has to be dropped from the commit mentioned
in the title.

Signed-off-by: Christophe Leroy 
---
Michael, tell me if you prefer a v4 of the series.
---
 arch/powerpc/kernel/head_book3s_32.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 32c27dac9b80..326262030279 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -1282,3 +1282,9 @@ EXPORT_SYMBOL(empty_zero_page)
.globl  swapper_pg_dir
 swapper_pg_dir:
.space  PGD_TABLE_SIZE
+
+/* Room for two PTE pointers, usually the kernel and current user pointers
+ * to their respective root page table.
+ */
+abatron_pteptrs:
+   .space  8
-- 
2.25.0



Re: [PATCH v2 00/15] init_mm: cleanup ARCH's text/data/brk setup code

2021-06-07 Thread Russell King (Oracle)
On Mon, Jun 07, 2021 at 07:48:54AM +0200, Christophe Leroy wrote:
> Hi Kefeng,
> 
> What you could do is to define a __weak function that architectures can
> override and call that function from mm_init() as suggested by Mike,

The problem with weak functions is that they bloat the kernel. Each
time a weak function is overridden, it becomes dead unreachable code
within the kernel image.

At some point we're probably going to have to enable -ffunction-sections
to (hopefully) allow the dead code to be discarded.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH v2 8/9] mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA

2021-06-07 Thread Geert Uytterhoeven
Hi Mike,

On Fri, Jun 4, 2021 at 8:50 AM Mike Rapoport  wrote:
> From: Mike Rapoport 
>
> After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
> configuration options are equivalent.
>
> Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.
>
> Done with
>
> $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
> $(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
> $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
> $(git grep -wl NEED_MULTIPLE_NODES)
>
> with manual tweaks afterwards.
>
> Signed-off-by: Mike Rapoport 

Thanks for your patch!

As you dropped the following hunk from v2 of PATCH 5/9, there's now
one reference left of CONFIG_NEED_MULTIPLE_NODES
(plus the discontigmem comment):

-diff --git a/mm/memory.c b/mm/memory.c
-index f3ffab9b9e39157b..fd0ebb63be3304f5 100644
---- a/mm/memory.c
-+++ b/mm/memory.c
-@@ -90,8 +90,7 @@
- #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
- #endif
-
--#ifndef CONFIG_NEED_MULTIPLE_NODES
--/* use the per-pgdat data instead for discontigmem - mbligh */
-+#ifdef CONFIG_FLATMEM
- unsigned long max_mapnr;
- EXPORT_SYMBOL(max_mapnr);
-
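
(The stray reference is easy to confirm with the same kind of grep used
for the conversion, e.g.:

  $ git grep -wn CONFIG_NEED_MULTIPLE_NODES
)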

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v2 0/9] Remove DISCINTIGMEM memory model

2021-06-07 Thread Geert Uytterhoeven
Hi Mike,

You may want to fix the DISCINTIGMEM typo in the subject for v3, unless
you think that makes tracking series versions more complicated ;-)

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v2 00/15] init_mm: cleanup ARCH's text/data/brk setup code

2021-06-07 Thread Kefeng Wang



On 2021/6/7 13:48, Christophe Leroy wrote:

Hi Kefeng,

Le 07/06/2021 à 02:55, Kefeng Wang a écrit :


On 2021/6/7 5:29, Mike Rapoport wrote:

Hello Kefeng,

On Fri, Jun 04, 2021 at 03:06:18PM +0800, Kefeng Wang wrote:

Add setup_initial_init_mm() helper, then use it
to cleanup the text, data and brk setup code.

v2:
- change argument from "char *" to "void *" setup_initial_init_mm()
   suggested by Geert Uytterhoeven
- use NULL instead of (void *)0 on h8300 and m68k
- collect ACKs

Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-m...@lists.linux-m68k.org
Cc: openr...@lists.librecores.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Kefeng Wang (15):
   mm: add setup_initial_init_mm() helper
   arc: convert to setup_initial_init_mm()
   arm: convert to setup_initial_init_mm()
   arm64: convert to setup_initial_init_mm()
   csky: convert to setup_initial_init_mm()
   h8300: convert to setup_initial_init_mm()
   m68k: convert to setup_initial_init_mm()
   nds32: convert to setup_initial_init_mm()
   nios2: convert to setup_initial_init_mm()
   openrisc: convert to setup_initial_init_mm()
   powerpc: convert to setup_initial_init_mm()
   riscv: convert to setup_initial_init_mm()
   s390: convert to setup_initial_init_mm()
   sh: convert to setup_initial_init_mm()
   x86: convert to setup_initial_init_mm()
I might be missing something, but AFAIU the init_mm.start_code and other
fields are not used really early, so the new setup_initial_init_mm()
function can be called in the generic code outside setup_arch(), e.g. in
mm_init().


Hi Mike, each architecture has its own value, not the same, e.g. m68k and
h8300; also the name of the text/code/brk is different in some archs, so I
kept it unchanged.


What you could do is to define a __weak function that architectures can override and call that function from mm_init(), as suggested by Mike.


Something like:

void __weak setup_initial_init_mm(void)
{
	init_mm.start_code = (unsigned long)_stext;
	init_mm.end_code = (unsigned long)_etext;
	init_mm.end_data = (unsigned long)_edata;
	init_mm.brk = (unsigned long)_end;
}

Then only the few architectures that are different would override it.

I see a few architectures are using PAGE_OFFSET to set .start_code,
but it is likely that this is equivalent to _stext.



Yes, the __weak function is an option, but the change only covers 14
archs; there are 7 other archs (alpha, hexagon, ia64, microblaze, mips,
sparc, um, xtensa) without related setup code. Also, x86 has its own brk,
and maybe there are some other differences in some archs, so I think
let's keep it unchanged for now, thanks.




Christophe
.



Re: [PATCH 00/21] Rid W=1 warnings from IDE

2021-06-07 Thread Christoph Hellwig
Please don't touch this code as it is about to be removed entirely.


Re: [FSL P50x0] KVM HV doesn't work anymore

2021-06-07 Thread Christian Zigotzky

On 02 June 2021 at 01:26pm, Christian Zigotzky wrote:

On 20 May 2021 at 01:07am, Nicholas Piggin wrote:

Hmm, okay that probably rules out those notifier changes then.
Can you remind me were you able to rule these out as suspects?

8f6cc75a97d1 powerpc: move norestart trap flag to bit 0
8dc7f0229b78 powerpc: remove partial register save logic
c45ba4f44f6b powerpc: clean up do_page_fault
d738ee8d56de powerpc/64e/interrupt: handle bad_page_fault in C
ceff77efa4f8 powerpc/64e/interrupt: Use new interrupt context 
tracking scheme

097157e16cf8 powerpc/64e/interrupt: reconcile irq soft-mask state in C
3db8aa10de9a powerpc/64e/interrupt: NMI save irq soft-mask state in C
0c2472de23ae powerpc/64e/interrupt: use new interrupt return
dc6231821a14 powerpc/interrupt: update common interrupt code for
4228b2c3d20e powerpc/64e/interrupt: always save nvgprs on interrupt
5a5a893c4ad8 powerpc/syscall: switch user_exit_irqoff and 
trace_hardirqs_off order


Thanks,
Nick

Hi Nick,

I tested these commits above today and all works with -smp 4. [1]

-smp 4 still doesn't work with the RC4 of kernel 5.13 on quad core
e5500 CPUs with KVM HV. I use -smp 3 currently.


What shall I test next?

Thanks,
Christian

[1] https://forum.hyperion-entertainment.com/viewtopic.php?p=53367#p53367

Hi All,

I tested the RC5 of kernel 5.13 today. Unfortunately the KVM HV issue 
still exists.

I also figured out that '-smp 2' doesn't work either.

Summary:

-smp 1 -> works
-smp 2 -> doesn't work
-smp 3 -> works
-smp 4 -> doesn't work

Cheers,
Christian


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
This is already a custom built kernel with lots of debugging options turned on
(see bugzilla attached kernel .config). But of course I can add "debug" to the
other kernel command line parameters.

I'll report back when I get access to this G5 next time in about 2-3 weeks.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.