date:20210419

On Tue, 2021-04-20 at 15:18 +1000, Alexey Kardashevskiy wrote:
> 
> On 20/04/2021 14:54, Leonardo Bras wrote:
> > As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
> > possible to use direct DMA mapping even with pmem region.
> > 
> > But, if that happens, the window size (len) is set to
> > (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
> > pagesize times smaller DDW to be created, being insufficient for correct
> > usage.
> > 
> > Fix this so the correct window size is used in this case.
> 
> Good find indeed.
> 
> afaict this does not create a huge problem though as 
> query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - 
> page_shift) where it matters (phyp).
> 
> 
> Reviewed-by: Alexey Kardashevskiy 
> 

Thanks for reviewing!

Leonardo Bras

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-19 Thread Lakshmi Ramasubramanian


On 4/19/21 10:00 PM, Dan Carpenter wrote:

On Tue, Apr 20, 2021 at 09:30:16AM +1000, Michael Ellerman wrote:

Lakshmi Ramasubramanian  writes:

On 4/16/21 2:05 AM, Michael Ellerman wrote:


Daniel Axtens  writes:

On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:

Sorry - missed copying device-tree and powerpc mailing lists.


There are a few "goto out;" statements before the local variable "fdt"
is initialized through the call to of_kexec_alloc_and_setup_fdt() in
elf64_load(). This will result in an uninitialized "fdt" being passed
to kvfree() in this function if there is an error before the call to
of_kexec_alloc_and_setup_fdt().

Initialize the local variable "fdt" to NULL.


I'm a huge fan of initialising local variables! But I'm struggling to
find the code path that will lead to an uninit fdt being returned...

The out label reads in part:

/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
return ret ? ERR_PTR(ret) : fdt;

As far as I can tell, any time we get a non-zero ret, we're going to
return an error pointer rather than the uninitialised value...


As Dan pointed out, the new code is in linux-next.

I have copied the new one below - the function doesn't return fdt, but
instead sets it in the arch specific field (please see the link to the
updated elf_64.c below).

https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next



(btw, it does look like we might leak fdt if we have an error after we
successfully kmalloc it.)

Am I missing something? Can you link to the report for the kernel test
robot or from Dan?


/*
   * Once FDT buffer has been successfully passed to
kexec_add_buffer(),
   * the FDT buffer address is saved in image->arch.fdt. In that
case,
   * the memory cannot be freed here in case of any other error.
   */
  if (ret && !image->arch.fdt)
  kvfree(fdt);

  return ret ? ERR_PTR(ret) : NULL;

In case of an error, the memory allocated for fdt is freed unless it has
already been passed to kexec_add_buffer().


It feels like the root of the problem is that the kvfree of fdt is in
the wrong place. It's only allocated later in the function, so the error
path should reflect that. Something like the patch below.

cheers


diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 5a569bb51349..02662e72c53d 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -114,7 +114,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
  initrd_len, cmdline);
if (ret)
-   goto out;
+   goto out_free_fdt;
  
  	fdt_pack(fdt);
  
@@ -125,7 +125,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,

kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
if (ret)
-   goto out;
+   goto out_free_fdt;
  
  	/* FDT will be freed in arch_kimage_file_post_load_cleanup */

image->arch.fdt = fdt;
@@ -140,18 +140,14 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
pr_err("Error setting up the purgatory.\n");
  
+	goto out;


This will leak.  It would need to be something like:

if (ret) {
pr_err("Error setting up the purgatory.\n");
goto out_free_fdt;
}
Once "fdt" buffer is successfully passed to kexec_add_buffer() it cannot 
be freed here - it will be freed when the kexec cleanup function is called.




goto out;

But we should also fix the uninitialized variable of "elf_info" if
kexec_build_elf_info() fails.


kexec_build_elf_info() frees elf_info and zeroes it in error paths, 
except when elf_read_ehdr() fails. So, I think it is better to 
initialize the local variable "elf_info" before calling 
kexec_build_elf_info().


memset(&elf_info, 0, sizeof(elf_info));

thanks,
 -lakshmi




+
+out_free_fdt:
+   kvfree(fdt);
  out:
kfree(modified_cmdline);
kexec_free_elf_info(&elf_info);
  
-	/*

-* Once FDT buffer has been successfully passed to kexec_add_buffer(),
-* the FDT buffer address is saved in image->arch.fdt. In that case,
-* the memory cannot be freed here in case of any other error.
-*/
-   if (ret && !image->arch.fdt)
-   kvfree(fdt);
-
return ret ? ERR_PTR(ret) : NULL;
  }


regards,
dan carpenter

Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem

2021-04-19 Thread Alexey Kardashevskiy





On 20/04/2021 14:54, Leonardo Bras wrote:

As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
possible to use direct DMA mapping even with pmem region.

But, if that happens, the window size (len) is set to
(MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
pagesize times smaller DDW to be created, being insufficient for correct
usage.

Fix this so the correct window size is used in this case.


Good find indeed.

afaict this does not create a huge problem though as 
query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - 
page_shift) where it matters (phyp).



Reviewed-by: Alexey Kardashevskiy 



Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory 
present")
Signed-off-by: Leonardo Bras 
---
  arch/powerpc/platforms/pseries/iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..836cbbe0ecc5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
if (pmem_present) {
if (query.largest_available_block >=
(1ULL << (MAX_PHYSMEM_BITS - page_shift)))
-   len = MAX_PHYSMEM_BITS - page_shift;
+   len = MAX_PHYSMEM_BITS;
else
dev_info(&dev->dev, "Skipping ibm,pmemory");
}



--
Alexey

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-19 Thread Dan Carpenter

On Tue, Apr 20, 2021 at 09:30:16AM +1000, Michael Ellerman wrote:
> Lakshmi Ramasubramanian  writes:
> > On 4/16/21 2:05 AM, Michael Ellerman wrote:
> >
> >> Daniel Axtens  writes:
>  On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
> 
>  Sorry - missed copying device-tree and powerpc mailing lists.
> 
> > There are a few "goto out;" statements before the local variable "fdt"
> > is initialized through the call to of_kexec_alloc_and_setup_fdt() in
> > elf64_load(). This will result in an uninitialized "fdt" being passed
> > to kvfree() in this function if there is an error before the call to
> > of_kexec_alloc_and_setup_fdt().
> >
> > Initialize the local variable "fdt" to NULL.
> >
> >>> I'm a huge fan of initialising local variables! But I'm struggling to
> >>> find the code path that will lead to an uninit fdt being returned...
> >>>
> >>> The out label reads in part:
> >>>
> >>>   /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
> >>>   return ret ? ERR_PTR(ret) : fdt;
> >>>
> >>> As far as I can tell, any time we get a non-zero ret, we're going to
> >>> return an error pointer rather than the uninitialised value...
> >
> > As Dan pointed out, the new code is in linux-next.
> >
> > I have copied the new one below - the function doesn't return fdt, but 
> > instead sets it in the arch specific field (please see the link to the 
> > updated elf_64.c below).
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next
> >  
> >
> >>>
> >>> (btw, it does look like we might leak fdt if we have an error after we
> >>> successfully kmalloc it.)
> >>>
> >>> Am I missing something? Can you link to the report for the kernel test
> >>> robot or from Dan?
> >
> > /*
> >   * Once FDT buffer has been successfully passed to 
> > kexec_add_buffer(),
> >   * the FDT buffer address is saved in image->arch.fdt. In that 
> > case,
> >   * the memory cannot be freed here in case of any other error.
> >   */
> >  if (ret && !image->arch.fdt)
> >  kvfree(fdt);
> >
> >  return ret ? ERR_PTR(ret) : NULL;
> >
> > In case of an error, the memory allocated for fdt is freed unless it has 
> > already been passed to kexec_add_buffer().
> 
> It feels like the root of the problem is that the kvfree of fdt is in
> the wrong place. It's only allocated later in the function, so the error
> path should reflect that. Something like the patch below.
> 
> cheers
> 
> 
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index 5a569bb51349..02662e72c53d 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -114,7 +114,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
> initrd_len, cmdline);
>   if (ret)
> - goto out;
> + goto out_free_fdt;
>  
>   fdt_pack(fdt);
>  
> @@ -125,7 +125,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer(&kbuf);
>   if (ret)
> - goto out;
> + goto out_free_fdt;
>  
>   /* FDT will be freed in arch_kimage_file_post_load_cleanup */
>   image->arch.fdt = fdt;
> @@ -140,18 +140,14 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   if (ret)
>   pr_err("Error setting up the purgatory.\n");
>  
> + goto out;

This will leak.  It would need to be something like:

if (ret) {
pr_err("Error setting up the purgatory.\n");
goto out_free_fdt;
}

goto out;

But we should also fix the uninitialized variable of "elf_info" if
kexec_build_elf_info() fails.

> +
> +out_free_fdt:
> + kvfree(fdt);
>  out:
>   kfree(modified_cmdline);
>   kexec_free_elf_info(&elf_info);
>  
> - /*
> -  * Once FDT buffer has been successfully passed to kexec_add_buffer(),
> -  * the FDT buffer address is saved in image->arch.fdt. In that case,
> -  * the memory cannot be freed here in case of any other error.
> -  */
> - if (ret && !image->arch.fdt)
> - kvfree(fdt);
> -
>   return ret ? ERR_PTR(ret) : NULL;
>  }

regards,
dan carpenter

Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr





Le 19/04/2021 à 23:39, Randy Dunlap a écrit :

On 4/19/21 6:16 AM, Michael Ellerman wrote:

Randy Dunlap  writes:



Sure.  I'll post them later today.
They keep FPU and ALTIVEC as independent (build) features.


Those patches look OK.

But I don't think it makes sense to support that configuration, FPU=n
ALTVEC=y. No one is ever going to make a CPU like that. We have enough
testing surface due to configuration options, without adding artificial
combinations that no one is ever going to use.

IMHO :)

So I'd rather we just make ALTIVEC depend on FPU.


That's rather simple. See below.
I'm doing a bunch of randconfig builds with it now.

---
From: Randy Dunlap 
Subject: [PATCH] powerpc: make ALTIVEC depend PPC_FPU

On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled,
there are build errors:

drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 
'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration]
enable_kernel_fp();
../arch/powerpc/lib/sstep.c: In function 'do_vec_load':
../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 
'put_vr' [-Werror=implicit-function-declaration]
   637 |   put_vr(rn, &u.v);
   |   ^~
../arch/powerpc/lib/sstep.c: In function 'do_vec_store':
../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 
'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration]
   660 |   get_vr(rn, &u.v);
   |   ^~

In theory ALTIVEC is independent of PPC_FPU but in practice nobody
is going to build such a machine, so make ALTIVEC require PPC_FPU
by depending on PPC_FPU.

Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Christophe Leroy 
Cc: Segher Boessenkool 
Cc: l...@intel.com
---
  arch/powerpc/platforms/86xx/Kconfig|1 +
  arch/powerpc/platforms/Kconfig.cputype |2 ++
  2 files changed, 3 insertions(+)

--- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig
+++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig
@@ -4,6 +4,7 @@ menuconfig PPC_86xx
bool "86xx-based boards"
depends on PPC_BOOK3S_32
select FSL_SOC
+   select PPC_FPU
select ALTIVEC
help
  The Freescale E600 SoCs have 74xx cores.
--- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype
@@ -186,6 +186,7 @@ config E300C3_CPU
  config G4_CPU
bool "G4 (74xx)"
depends on PPC_BOOK3S_32
+   select PPC_FPU
select ALTIVEC
  
  endchoice

@@ -309,6 +310,7 @@ config PHYS_64BIT
  
  config ALTIVEC

bool "AltiVec Support"
+   depends on PPC_FPU


Shouldn't we do it the other way round ? In extenso make ALTIVEC select PPC_FPU and avoid the two 
selects that are above ?



depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64)
help
  This option enables kernel support for the Altivec extensions to the

[PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem

As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
possible to use direct DMA mapping even with pmem region.

But, if that happens, the window size (len) is set to
(MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
pagesize times smaller DDW to be created, being insufficient for correct
usage.

Fix this so the correct window size is used in this case.

Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory 
present")
Signed-off-by: Leonardo Bras 
---
 arch/powerpc/platforms/pseries/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..836cbbe0ecc5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
if (pmem_present) {
if (query.largest_available_block >=
(1ULL << (MAX_PHYSMEM_BITS - page_shift)))
-   len = MAX_PHYSMEM_BITS - page_shift;
+   len = MAX_PHYSMEM_BITS;
else
dev_info(&dev->dev, "Skipping ibm,pmemory");
}
-- 
2.30.2

Re: [PATCH v4 8/9] mm/mremap: Allow arch runtime override

2021-04-19 Thread Aneesh Kumar K.V


On 4/20/21 9:22 AM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

Architectures like ppc64 support faster mremap only with radix
translation. Hence allow a runtime check w.r.t support for fast mremap.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/tlb.h |  6 ++
  mm/mremap.c| 15 ++-
  2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 160422a439aa..058918a7cd3c 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -83,5 +83,11 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
  }
  #endif
  
+#define arch_supports_page_tables_move arch_supports_page_tables_move

+static inline bool arch_supports_page_tables_move(void)
+{
+   return radix_enabled();
+}


Not sure it's worth a respin on its own, but page table*s* move is
slightly strange phrasing.

arch_supports_move_page_tables() or arch_supports_page_table_move()
would be more typical.



I will switch to arch_supports_page_table_move()

-aneesh

[PATCH] powerpc/64s: Add FA_DUMP to defconfig

FA_DUMP (Firmware Assisted Dump) is a powerpc only feature that should
be enabled in our defconfig to get some build / test coverage.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index 4f05a6652478..72b235ef6f3b 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -50,6 +50,7 @@ CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y
 CONFIG_CRASH_DUMP=y
+CONFIG_FA_DUMP=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_PPC_64K_PAGES=y
 CONFIG_SCHED_SMT=y
-- 
2.25.1

Re: [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush

2021-04-19 Thread Aneesh Kumar K.V


On 4/20/21 9:17 AM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

Some architectures do have the concept of page walk cache which need
to be flush when updating higher levels of page tables. A fast mremap
that involves moving page table pages instead of copying pte entries
should flush page walk cache since the old translation cache is no more
valid.

Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
page walk cache where TLB entries are mapped with page size PAGE_SIZE.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/book3s/64/tlbflush.h | 11 +++
  mm/mremap.c   | 15 +--
  2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index f9f8a3a264f7..c236b66f490b 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -80,6 +80,17 @@ static inline void flush_hugetlb_tlb_range(struct 
vm_area_struct *vma,
return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
  }
  
+#define flush_pte_tlb_pwc_range flush_tlb_pwc_range

+static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long 
end,
+  bool also_pwc)


This still uses the also_pwc name, which is a bit inconsistent with the
previous patch.



will fix that.


But, does it even need to be a parameter? AFAICS you always pass true,
and pwc=true is sort of implied by the name isn't it?



I don't have strong opinion about that. I was wondering having flush_pwc 
explicitly called out is a better indication of we are flushing page 
walk cache. Will drop that in the next update.



-aneesh

Re: [PATCH 0/2] Change struct page layout for page_pool

"Matthew Wilcox (Oracle)"  writes:
> The first patch here fixes two bugs on ppc32, and mips32.  It fixes one
> bug on arc and arm32 (in certain configurations).  It probably makes
> sense to get it in ASAP through the networking tree.  I'd like to see
> testing on those four architectures if possible?

Sorry I don't have easy access to any hardware that fits the bill. At
some point I'll be able to go to the office and setup a machine that (I
think) can test these paths, but I don't have an ETA on that.

You and others seem to have done lots of analysis on this though, so I
think you should merge the fixes without waiting on ppc32 testing.

cheers


>
> The second patch enables new functionality.  It is much less urgent.
> I'd really like to see Mel & Michal's thoughts on it.
>
> I have only compile-tested these patches.
>
> Matthew Wilcox (Oracle) (2):
>   mm: Fix struct page layout on 32-bit systems
>   mm: Indicate pfmemalloc pages in compound_head
>
>  include/linux/mm.h   | 12 +++-
>  include/linux/mm_types.h |  9 -
>  include/net/page_pool.h  | 12 +++-
>  net/core/page_pool.c | 12 +++-
>  4 files changed, 29 insertions(+), 16 deletions(-)
>
> -- 
> 2.30.2

Re: [PATCH 2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

Daniel Henrique Barboza  writes:
> On 4/19/21 9:48 AM, Michael Ellerman wrote:
>> Daniel Henrique Barboza  writes:
>>> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
>>> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
>>> for both QEMU and phyp. This gives us an opportunity to use this
>>> behavior to signal the hypervisor layer when an error during device
>>> removal happens, allowing it to do a proper error handling, while not
>>> breaking QEMU/phyp implementations that don't have this support.
>>>
>>> This patch introduces this idea by unisolating all CPU DRCs that failed
>>> to be removed by dlpar_cpu_remove_by_index(), when handling the
>>> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
>>> only because its the only CPU removal event QEMU uses, and there's no
>>> need at this moment to add this mechanism for phyp only code.
>> 
>> Have you also confirmed that phyp is not bothered by it? ie. everything
>> seems to continue working when you trigger this path on phyp.
>
> Yes. Daniel Bueso (dbue...@us.ibm.com) from the partition firmware team
> helped me with that. We confirmed that phyp returns RTAS_OK under these
> conditions (Unisolating an unisolated/configured DRC).

Thanks.

cheers

Re: [PATCH v4 8/9] mm/mremap: Allow arch runtime override

"Aneesh Kumar K.V"  writes:
> Architectures like ppc64 support faster mremap only with radix
> translation. Hence allow a runtime check w.r.t support for fast mremap.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/tlb.h |  6 ++
>  mm/mremap.c| 15 ++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 160422a439aa..058918a7cd3c 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -83,5 +83,11 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
>  }
>  #endif
>  
> +#define arch_supports_page_tables_move arch_supports_page_tables_move
> +static inline bool arch_supports_page_tables_move(void)
> +{
> + return radix_enabled();
> +}

Not sure it's worth a respin on its own, but page table*s* move is
slightly strange phrasing.

arch_supports_move_page_tables() or arch_supports_page_table_move()
would be more typical.

cheers

Re: [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush

"Aneesh Kumar K.V"  writes:
> Some architectures do have the concept of page walk cache which need
> to be flush when updating higher levels of page tables. A fast mremap
> that involves moving page table pages instead of copying pte entries
> should flush page walk cache since the old translation cache is no more
> valid.
>
> Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
> page walk cache where TLB entries are mapped with page size PAGE_SIZE.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/tlbflush.h | 11 +++
>  mm/mremap.c   | 15 +--
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
> b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index f9f8a3a264f7..c236b66f490b 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -80,6 +80,17 @@ static inline void flush_hugetlb_tlb_range(struct 
> vm_area_struct *vma,
>   return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
>  }
>  
> +#define flush_pte_tlb_pwc_range flush_tlb_pwc_range
> +static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
> +unsigned long start, unsigned long 
> end,
> +bool also_pwc)

This still uses the also_pwc name, which is a bit inconsistent with the
previous patch.

But, does it even need to be a parameter? AFAICS you always pass true,
and pwc=true is sort of implied by the name isn't it?

cheers

> +{
> + if (radix_enabled())
> + return radix__flush_tlb_pwc_range_psize(vma->vm_mm, start,
> + end, mmu_virtual_psize, 
> also_pwc);
> + return hash__flush_tlb_range(vma, start, end);
> +}
> +
>  static inline void flush_tlb_range(struct vm_area_struct *vma,
>  unsigned long start, unsigned long end)
>  {
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 574287f9bb39..0e7b11daafee 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -210,6 +210,17 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t 
> *old_pmd,
>   drop_rmap_locks(vma);
>  }
>  
> +#ifndef flush_pte_tlb_pwc_range
> +#define flush_pte_tlb_pwc_range flush_pte_tlb_pwc_range
> +static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
> +unsigned long start,
> +unsigned long end,
> +bool also_pwc)
> +{
> + return flush_tlb_range(vma, start, end);
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_MOVE_PMD
>  static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long 
> old_addr,
> unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
> @@ -260,7 +271,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   VM_BUG_ON(!pmd_none(*new_pmd));
>   pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
>  
> - flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
> + flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE, true);
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);
>   spin_unlock(old_ptl);
> @@ -307,7 +318,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, 
> unsigned long old_addr,
>   VM_BUG_ON(!pud_none(*new_pud));
>  
>   pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
> - flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
> + flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE, true);
>   if (new_ptl != old_ptl)
>   spin_unlock(new_ptl);
>   spin_unlock(old_ptl);
> -- 
> 2.30.2

Re: [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error

"Aneesh Kumar K.V"  writes:
> Update _tlbiel_pid() such that we can avoid build errors like below when
> using this function in other places.
>
> arch/powerpc/mm/book3s64/radix_tlb.c: In function 
> ‘__radix__flush_tlb_range_psize’:
> arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably 
> does not match constraints
>   114 |  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
>   |  ^~~
> arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in 
> ‘asm’
> make[4]: *** [scripts/Makefile.build:271: 
> arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
> m
>
> With this fix, we can also drop the __always_inline in 
> __radix_flush_tlb_range_psize
> which was added by commit e12d6d7d46a6 ("powerpc/mm/radix: mark 
> __radix__flush_tlb_range_psize() as __always_inline")
>
> Reviewed-by: Christophe Leroy 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)

Acked-by: Michael Ellerman 

cheers

> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
> b/arch/powerpc/mm/book3s64/radix_tlb.c
> index 409e61210789..817a02ef6032 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -291,22 +291,30 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
>  /*
>   * We use 128 set in radix mode and 256 set in hpt mode.
>   */
> -static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
> +static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
>  {
>   int set;
>  
>   asm volatile("ptesync": : :"memory");
>  
> - /*
> -  * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
> -  * also flush the entire Page Walk Cache.
> -  */
> - __tlbiel_pid(pid, 0, ric);
> + switch (ric) {
> + case RIC_FLUSH_PWC:
>  
> - /* For PWC, only one flush is needed */
> - if (ric == RIC_FLUSH_PWC) {
> + /* For PWC, only one flush is needed */
> + __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
>   ppc_after_tlbiel_barrier();
>   return;
> + case RIC_FLUSH_TLB:
> + __tlbiel_pid(pid, 0, RIC_FLUSH_TLB);
> + break;
> + case RIC_FLUSH_ALL:
> + default:
> + /*
> +  * Flush the first set of the TLB, and if
> +  * we're doing a RIC_FLUSH_ALL, also flush
> +  * the entire Page Walk Cache.
> +  */
> + __tlbiel_pid(pid, 0, RIC_FLUSH_ALL);
>   }
>  
>   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
> @@ -1176,7 +1184,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
>   }
>  }
>  
> -static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct 
> *mm,
> +static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
>   unsigned long start, unsigned long end,
>   int psize, bool also_pwc)
>  {
> -- 
> 2.30.2

Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

Tyrel Datwyler  writes:
> On 4/17/21 5:30 AM, Michael Ellerman wrote:
>> Tyrel Datwyler  writes:
>>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
 Currently, neither the vio_bus or vio_driver structures provide support
 for a shutdown() routine.

 Add support for shutdown() by allowing drivers to provide a
 implementation via function pointer in their vio_driver struct and
 provide a proper implementation in the driver template for the vio_bus
 that calls a vio drivers shutdown() if defined.

 In the case that no shutdown() is defined by a vio driver and a kexec is
 in progress we implement a big hammer that calls remove() to ensure no
 further DMA for the devices is possible.

 Signed-off-by: Tyrel Datwyler 
 ---
>>>
>>> Ping... any comments, problems with this approach?
>> 
>> The kexec part seems like a bit of a hack.
>> 
>> It also doesn't help for kdump, when none of the shutdown code is run.
>
> If I understand correctly for kdump we have a reserved memory space where the
> kdump kernel is loaded, but for kexec the memory region isn't reserved ahead 
> of
> time meaning we can try and load the kernel over potential memory used for DMA
> by the current kernel.

That's correct.

>> How many drivers do we have? Can we just implement a proper shutdown for
>> them?
>
> Well that is the end goal. I just don't currently have the bandwidth to do 
> each
> driver myself with a proper shutdown sequence, and thought this was a 
> launching
> off point to at least introduce the shutdown callback to the VIO bus.

Fair enough.

> Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, 
> vmc,
> pseries_rng, nx, nx842, hvcs, hvc_vio.
>
> I can drop the kexec_in_progress hammer and just have each driver call 
> remove()
> themselves in their shutdown function. Leave it to each maintainer to decide 
> if
> remove() is enough or if there is a more lightweight quiesce sequence they
> choose to implement.

That's OK, you've convinced me. I'll take it as-is.

Eventually it would be good for drivers to implement shutdown in the
optimal way for their device, but that can be done incrementally.

cheers

Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-19 Thread Matthew Wilcox

On Tue, Apr 20, 2021 at 02:48:17AM +, Vineet Gupta wrote:
> > 32-bit architectures which expect 8-byte alignment for 8-byte integers
> > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> > page inadvertently expanded in 2019.
> 
> FWIW, ARC doesn't require 8 byte alignment for 8 byte integers. This is 
> only needed for 8-byte atomics due to the requirements of LLOCKD/SCOND 
> instructions.

Ah, like x86?  OK, great, I'll drop your arch from the list of
affected.  Thanks!

[PATCH V2 1/1] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

2021-04-19 Thread Athira Rajeev

Running perf fuzzer showed below in dmesg logs:
"Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.

Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq and doorbell exception, we
conditionally enable interrupts via may_hard_irq_enable(). This could
potentially create a window to generate a PMI. Since irq soft mask is set
to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before
perf interrupt is replayed and the PMU events could deleted or stopped.
This will change the PMU SPR values and resets the counters. Snippet of
ftrace log showing PMU callbacks invoked in "__do_irq":

-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
-0 [051] dns. 132025441306430: irq_enter <-__do_irq
-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
<<>>
-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt 
<-smp_ipi_demux_relaxed
-0 [051] dnH. 132025441307839: flush_smp_call_function_queue 
<-smp_ipi_demux_relaxed
-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
-0 [051] dnH. 132025441308787: power_pmu_event_idx 
<-perf_event_update_userpage
-0 [051] dnH. 132025441308859: rcu_read_unlock_strict 
<-perf_event_update_userpage
-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
<<>>
-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
-0 [051] dns. 132025441311319: performance_monitor_exception 
<-replay_soft_interrupts

Case 2: PMI's masked during local_* operations, example local_add.
If the local_add operation happens within a local_irq_save, replay of
PMI will be during local_irq_restore. Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.

Patch adds a fix to update the PMU callback functions (del,stop,enable) to
check for pending perf interrupt. If there is an overflown PMC and pending
perf interrupt indicated in Paca or by PMAO bit set in MMCR0, clear the PMI
bit in paca to drop that sample. Also clear the MMCR0 PMAO bit which
otherwise could lead to spurious interrupts in some corner cases. Example,
a timer after power_pmu_del which will re-enable interrupts since PMI is
cleared and triggers a PMI again since PMAO bit is still set. Another
condition occures if had disabled MSR[EE] right before perf interrupt
came in. Re-enabling interrupt will trigger PMI since PMAO is still set.
But fails to find valid overflow if PMC get cleared before enabling EE.

We can't just replay PMI any time. Hence this approach is preferred rather
than replaying PMI before resetting overflown PMC. Patch also documents
core-book3s on a race condition which can trigger these PMC messages during
idle path in PowerNV.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and 
replay them")
Reported-by: Nageswara R Sastry 
Suggested-by: Nicholas Piggin 
Suggested-by: Madhavan Srinivasan 
Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/hw_irq.h | 19 
 arch/powerpc/perf/core-book3s.c   | 77 +++
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 56a98936a6a9..7e192bd8253b 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -215,6 +215,23 @@ static inline bool arch_irqs_disabled(void)
return arch_irqs_disabled_flags(arch_local_save_flags());
 }
 
+static inline int get_clear_pmi_irq_pending(void)
+{
+   /*
+* Some corner cases could clear the PMU counter overflow
+* while a masked PMI is pending. One of such case is
+* when a PMI happens during interrupt replay and perf
+* counter values gets cleared by PMU callbacks before
+* replay. So the pending PMI must be cleared here.
+*/
+   if (get_paca()->irq_happened & PACA_IRQ_PMI) {
+   WARN_ON_ONCE(mfmsr() & MSR_EE);
+   get_paca()->irq_happened &= ~PACA_IRQ_PMI;
+   return 1;
+   }
+   return 0;
+}
+
 #ifdef CONFIG_PPC_BOOK3S
 /*
  * To support disabling and enabling of irq with PMI, set of
@@ -391,6 +408,8 @@ static inline bool arch_ir

[PATCH V2 0/1] powerpc/perf: Clear pending PMI in ppmu callbacks

2021-04-19 Thread Athira Rajeev

Running perf fuzzer testsuite popped up below messages
in the dmesg logs:

"Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. Perf interrupt handler checks
the PMC's to see which PMC has overflown and if none of the PMCs are
overflown ( counter value not >= 0x8000 ), it throws warning:
"Can't find PMC that caused IRQ".

Powerpc has capability to mask and replay a performance monitoring
interrupt (PMI). In case of replayed PMI, there are some corner cases
that clears the PMCs after masking. In such cases, the perf interrupt
handler will not find the active PMC values that had caused the overflow
and thus leading to this message. This patchset attempts to fix those
corner cases.

However there is one more case in PowerNV where these messages are
emitted during system wide profiling or when a specific CPU is monitored
for an event. That is, when a counter overflow just before entering idle
and a PMI gets triggered after wakeup from idle. Since PMCs
are not saved in the idle path, perf interrupt handler will not
find overflown counter value and emits the "Can't find PMC" messages.
This patch documents this race condition in powerpc core-book3s.

Patch fixes the ppmu callbacks to disable pending interrupt before clearing
the overflown PMC and documents the race condition in idle path.

Changelog:
Changes from v1 -> v2
   Addressed review comments from Nicholas Piggin
   - Moved the PMI pending check and clearing function
 to arch/powerpc/include/asm/hw_irq.h and renamed
 function to "get_clear_pmi_irq_pending"
   - Along with checking for pending PMI bit in Paca,
 look for PMAO bit in MMCR0 register to decide on
 pending PMI interrupt.

Athira Rajeev (1):
  powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting
an overflown PMC

 arch/powerpc/include/asm/hw_irq.h | 19 
 arch/powerpc/perf/core-book3s.c   | 77 +++
 2 files changed, 96 insertions(+)

-- 
2.26.2

[powerpc:next] BUILD SUCCESS cbd3d5ba46b68c033986a6087209930f001cbcca

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: cbd3d5ba46b68c033986a6087209930f001cbcca  powerpc/fadump: Fix 
compile error since trap type change

elapsed time: 725m

configs tested: 132
configs skipped: 5

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
riscvallmodconfig
i386 allyesconfig
riscvallyesconfig
m68k  atari_defconfig
arcvdk_hs38_smp_defconfig
powerpc kmeter1_defconfig
openrisc simple_smp_defconfig
powerpc mpc5200_defconfig
powerpc   motionpro_defconfig
sh ap325rxa_defconfig
nds32 allnoconfig
mips  pic32mzda_defconfig
sh  rts7751r2d1_defconfig
xtensa virt_defconfig
armmmp2_defconfig
arm   omap1_defconfig
armmulti_v5_defconfig
powerpc tqm8540_defconfig
arm64alldefconfig
powerpcfsp2_defconfig
shdreamcast_defconfig
powerpc ppa8548_defconfig
xtensa  cadence_csp_defconfig
sh   se7750_defconfig
sh apsh4a3a_defconfig
sparc   sparc32_defconfig
um   alldefconfig
m68km5272c3_defconfig
umallnoconfig
powerpc  ep88xc_defconfig
powerpcsocrates_defconfig
sparc   sparc64_defconfig
armmagician_defconfig
m68kmac_defconfig
powerpc linkstation_defconfig
sh  sdk7786_defconfig
armvexpress_defconfig
mips  fuloong2e_defconfig
pariscgeneric-64bit_defconfig
mipsmaltaup_defconfig
arm   h5000_defconfig
powerpc mpc83xx_defconfig
m68k amcore_defconfig
mipsjmr3927_defconfig
mips   capcella_defconfig
powerpcwarp_defconfig
alphaallyesconfig
mipsbcm47xx_defconfig
mips mpc30x_defconfig
powerpc   ppc64_defconfig
powerpc kilauea_defconfig
mips  maltaaprp_defconfig
armspear3xx_defconfig
sh   se7751_defconfig
m68k   m5275evb_defconfig
shsh7785lcr_defconfig
powerpcicon_defconfig
arm  gemini_defconfig
arm  exynos_defconfig
powerpc mpc834x_mds_defconfig
riscv  rv32_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a003-20210419
x86_64   randconfig-a001-20210419
x86_64   randconfig-a005-20210419
x86_64   randconfig-a002-20210419
x86_64

[powerpc:merge] BUILD SUCCESS 40f5c8e99b3f2f53db08055f415af2aac416360e

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 40f5c8e99b3f2f53db08055f415af2aac416360e  Automatic merge of 
'master' into merge (2021-04-19 12:37)

elapsed time: 1430m

configs tested: 131
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
riscvallmodconfig
i386 allyesconfig
riscvallyesconfig
powerpc  chrp32_defconfig
powerpc  ppc6xx_defconfig
armmini2440_defconfig
arm lpc32xx_defconfig
mips tb0226_defconfig
armmvebu_v5_defconfig
armrealview_defconfig
powerpc pseries_defconfig
arm  pxa255-idp_defconfig
arm   versatile_defconfig
arm   multi_v4t_defconfig
powerpc   motionpro_defconfig
sh ap325rxa_defconfig
nds32 allnoconfig
mips  pic32mzda_defconfig
arm64alldefconfig
powerpcfsp2_defconfig
shdreamcast_defconfig
powerpc ppa8548_defconfig
xtensa  cadence_csp_defconfig
sh   se7750_defconfig
sh apsh4a3a_defconfig
sparc   sparc32_defconfig
um   alldefconfig
m68km5272c3_defconfig
mips   gcw0_defconfig
arm assabet_defconfig
powerpc pq2fads_defconfig
m68k   m5475evb_defconfig
arc  axs103_smp_defconfig
mips  malta_kvm_defconfig
riscvalldefconfig
powerpc mpc85xx_cds_defconfig
ia64defconfig
sparc   defconfig
powerpc  ppc64e_defconfig
powerpc wii_defconfig
mipse55_defconfig
mips tb0287_defconfig
powerpc powernv_defconfig
m68km5407c3_defconfig
arm lubbock_defconfig
powerpc   ppc64_defconfig
m68k amcore_defconfig
mipsjmr3927_defconfig
mips   capcella_defconfig
powerpcwarp_defconfig
mipsbcm47xx_defconfig
alphaallyesconfig
sh   se7751_defconfig
m68k   m5275evb_defconfig
shsh7785lcr_defconfig
powerpcicon_defconfig
arm  gemini_defconfig
arm  exynos_defconfig
powerpc kmeter1_defconfig
powerpc mpc834x_mds_defconfig
riscv  rv32_defconfig
ia64 allmodconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a003-20210419
x86_64   randconfig-a001-20210419
x86_64   randconfig-a005-20210419
x86_64   randconfig-a002-20210419
x86_64   randconfig-a006-20210419
x86_64   randconfig-a004-202

Re: [PATCH 1/3] powerpc/8xx: Enhance readability of trap types

2021-04-19 Thread Xiongwei Song

On Mon, Apr 19, 2021 at 11:48 PM Christophe Leroy
 wrote:
>
> This patch makes use of trap types in head_8xx.S
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/interrupt.h | 29 
>  arch/powerpc/kernel/head_8xx.S   | 49 ++--
>  2 files changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/interrupt.h 
> b/arch/powerpc/include/asm/interrupt.h
> index ed2c4042c3d1..cf2c5c3ae716 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -2,13 +2,6 @@
>  #ifndef _ASM_POWERPC_INTERRUPT_H
>  #define _ASM_POWERPC_INTERRUPT_H
>
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
>  /* BookE/4xx */
>  #define INTERRUPT_CRITICAL_INPUT  0x100
>
> @@ -39,9 +32,11 @@
>  /* BookE/BookS/4xx/8xx */
>  #define INTERRUPT_DATA_STORAGE0x300
>  #define INTERRUPT_INST_STORAGE0x400
> +#define INTERRUPT_EXTERNAL 0x500
>  #define INTERRUPT_ALIGNMENT   0x600
>  #define INTERRUPT_PROGRAM 0x700
>  #define INTERRUPT_SYSCALL 0xc00
> +#define INTERRUPT_TRACE0xd00

The INTERRUPT_TRACE macro is defined in BookS section.
In BookE, 0xd00 stands for debug interrupt, so I defined it as
INTERRUPT_DEBUG.  I understand they are similar things,
but the terminologies are different in reference manuals.

Regards,
Xiongwei

Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

On Mon, 2021-04-19 at 20:39 -0500, Rob Herring wrote:
> On Mon, Apr 19, 2021 at 7:35 PM Leonardo Bras  wrote:
> > 
> > On Mon, 2021-04-19 at 10:44 -0500, Rob Herring wrote:
> > > On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras  wrote:
> > > > 
> > > > Hello Rob, thanks for this feedback!
> > > > 
> > > > On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> > > > > +PPC and PCI lists
> > > > > 
> > > > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  
> > > > > wrote:
> > > > > > 
> > > > > > Many other resource flag parsers already add this flag when the 
> > > > > > input
> > > > > > has bits 24 & 25 set, so update this one to do the same.
> > > > > 
> > > > > Many others? Looks like sparc and powerpc to me.
> > > > > 
> > > > 
> > > > s390 also does that, but it look like it comes from a device-tree.
> > > 
> > > I'm only looking at DT based platforms, and s390 doesn't use DT.
> > 
> > Correct.
> > Sorry, I somehow write above the opposite of what I was thinking.
> > 
> > > 
> > > > > Those would be the
> > > > > ones I worry about breaking. Sparc doesn't use of/address.c so it's
> > > > > fine. Powerpc version of the flags code was only fixed in 2019, so I
> > > > > don't think powerpc will care either.
> > > > 
> > > > In powerpc I reach this function with this stack, while configuring a
> > > > virtio-net device for a qemu/KVM pseries guest:
> > > > 
> > > > pci_process_bridge_OF_ranges+0xac/0x2d4
> > > > pSeries_discover_phbs+0xc4/0x158
> > > > discover_phbs+0x40/0x60
> > > > do_one_initcall+0x60/0x2d0
> > > > kernel_init_freeable+0x308/0x3a8
> > > > kernel_init+0x2c/0x168
> > > > ret_from_kernel_thread+0x5c/0x70
> > > > 
> > > > For this, both MMIO32 and MMIO64 resources will have flags 0x200.
> > > 
> > > Oh good, powerpc has 2 possible flags parsing functions. So in the
> > > above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64?
> > > 
> > > Does pci_parse_of_flags() get called in your case?
> > > 
> > 
> > It's called in some cases, but not for the device I am debugging
> > (virtio-net pci@8002000).
> > 
> > For the above device, here is an expanded stack trace:
> > 
> > of_bus_pci_get_flags() (from parser->bus->get_flags())
> > of_pci_range_parser_one() (from macro for_each_of_pci_range)
> > pci_process_bridge_OF_ranges+0xac/0x2d4
> > pSeries_discover_phbs+0xc4/0x158
> > discover_phbs+0x40/0x60
> > do_one_initcall+0x60/0x2d0
> > kernel_init_freeable+0x308/0x3a8
> > kernel_init+0x2c/0x168
> > ret_from_kernel_thread+0x5c/0x70
> > 
> > For other devices, I could also see the following stack trace:
> > ## device ethernet@8
> > 
> > pci_parse_of_flags()
> > of_create_pci_dev+0x7f0/0xa40
> > __of_scan_bus+0x248/0x320
> > pcibios_scan_phb+0x370/0x3b0
> > pcibios_init+0x8c/0x12c
> > do_one_initcall+0x60/0x2d0
> > kernel_init_freeable+0x308/0x3a8
> > kernel_init+0x2c/0x168
> > ret_from_kernel_thread+0x5c/0x70
> > 
> > Devices that get parsed with of_bus_pci_get_flags() appears first at
> > dmesg (around 0.015s in my test), while devices that get parsed by
> > pci_parse_of_flags() appears later (0.025s in my test).
> > 
> > I am not really used to this code, but having the term "discover phbs"
> > in the first trace and the term "scan phb" in the second, makes me
> > wonder if the first trace is seen on devices that are seen/described in
> > the device-tree and the second trace is seen in devices not present in
> > the device-tree and found scanning pci bus.
> 
> That was my guess as well. I think on pSeries that most PCI devices
> are in the DT whereas on Arm and other flattened DT (non OpenFirmware)
> platforms PCI devices are not in DT.
> 

It makes sense to me. 

>  Of course, for virtio devices,
> they would not be in DT in either case.

I don't get this part... in pseries it looks like virtio devices can be
in device-tree.

Oh, I think I get it... this pci@8002000 looks like a bus
(described in device-tree, so discovered), and then the devices are
inside it, getting scanned.

The virtio device gets the correct flags (from pci_parse_of_flags), but
the bus (pci@8002000) does not seem to get it correctly,
because it comes from of_bus_pci_get_flags() which makes sense
according to the name of the function.

(see lspci bellow, output without patch)


00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev
01)
Subsystem: Red Hat, Inc. Device 1100
Device tree node:
/sys/firmware/devicetree/base/pci@8002000/ethernet@8
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- 
BAR=0 offset= size=
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
BAR=4 offset=3000 size=1000 multiplier=0004
Capabilities: [60] Vendor Specific Information: VirtIO:
DeviceCfg
BAR=4 offset=2000 size=1000
Capabilities:

Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-04-19 Thread Rob Herring

On Mon, Apr 19, 2021 at 7:35 PM Leonardo Bras  wrote:
>
> On Mon, 2021-04-19 at 10:44 -0500, Rob Herring wrote:
> > On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras  wrote:
> > >
> > > Hello Rob, thanks for this feedback!
> > >
> > > On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> > > > +PPC and PCI lists
> > > >
> > > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  
> > > > wrote:
> > > > >
> > > > > Many other resource flag parsers already add this flag when the input
> > > > > has bits 24 & 25 set, so update this one to do the same.
> > > >
> > > > Many others? Looks like sparc and powerpc to me.
> > > >
> > >
> > > s390 also does that, but it look like it comes from a device-tree.
> >
> > I'm only looking at DT based platforms, and s390 doesn't use DT.
>
> Correct.
> Sorry, I somehow write above the opposite of what I was thinking.
>
> >
> > > > Those would be the
> > > > ones I worry about breaking. Sparc doesn't use of/address.c so it's
> > > > fine. Powerpc version of the flags code was only fixed in 2019, so I
> > > > don't think powerpc will care either.
> > >
> > > In powerpc I reach this function with this stack, while configuring a
> > > virtio-net device for a qemu/KVM pseries guest:
> > >
> > > pci_process_bridge_OF_ranges+0xac/0x2d4
> > > pSeries_discover_phbs+0xc4/0x158
> > > discover_phbs+0x40/0x60
> > > do_one_initcall+0x60/0x2d0
> > > kernel_init_freeable+0x308/0x3a8
> > > kernel_init+0x2c/0x168
> > > ret_from_kernel_thread+0x5c/0x70
> > >
> > > For this, both MMIO32 and MMIO64 resources will have flags 0x200.
> >
> > Oh good, powerpc has 2 possible flags parsing functions. So in the
> > above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64?
> >
> > Does pci_parse_of_flags() get called in your case?
> >
>
> It's called in some cases, but not for the device I am debugging
> (virtio-net pci@8002000).
>
> For the above device, here is an expanded stack trace:
>
> of_bus_pci_get_flags() (from parser->bus->get_flags())
> of_pci_range_parser_one() (from macro for_each_of_pci_range)
> pci_process_bridge_OF_ranges+0xac/0x2d4
> pSeries_discover_phbs+0xc4/0x158
> discover_phbs+0x40/0x60
> do_one_initcall+0x60/0x2d0
> kernel_init_freeable+0x308/0x3a8
> kernel_init+0x2c/0x168
> ret_from_kernel_thread+0x5c/0x70
>
> For other devices, I could also see the following stack trace:
> ## device ethernet@8
>
> pci_parse_of_flags()
> of_create_pci_dev+0x7f0/0xa40
> __of_scan_bus+0x248/0x320
> pcibios_scan_phb+0x370/0x3b0
> pcibios_init+0x8c/0x12c
> do_one_initcall+0x60/0x2d0
> kernel_init_freeable+0x308/0x3a8
> kernel_init+0x2c/0x168
> ret_from_kernel_thread+0x5c/0x70
>
> Devices that get parsed with of_bus_pci_get_flags() appears first at
> dmesg (around 0.015s in my test), while devices that get parsed by
> pci_parse_of_flags() appears later (0.025s in my test).
>
> I am not really used to this code, but having the term "discover phbs"
> in the first trace and the term "scan phb" in the second, makes me
> wonder if the first trace is seen on devices that are seen/described in
> the device-tree and the second trace is seen in devices not present in
> the device-tree and found scanning pci bus.

That was my guess as well. I think on pSeries that most PCI devices
are in the DT whereas on Arm and other flattened DT (non OpenFirmware)
platforms PCI devices are not in DT. Of course, for virtio devices,
they would not be in DT in either case.

> > > > I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> > > > the flags. AFAICT, that's not set anywhere outside of arch code. So
> > > > never for riscv, arm and arm64 at least. That leads me to
> > > > pci_std_update_resource() which is where the PCI code sets BARs and
> > > > just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> > > > IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> > > > neither is prefetch.
> > > >
> > >
> > > I am not sure if you mean here:
> > > a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect
> > > anything else, or
> > > b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64
> > > (or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since
> > > it's how it's added in powerpc/sparc, and else there is no point.
> >
> > I'm wondering if a) is incomplete and PCI_BASE_ADDRESS_MEM_TYPE_64
> > also needs to be set. The question is ultimately are BARs getting set
> > correctly for 64-bit? It looks to me like they aren't.
>
> I am not used to these terms, does BAR means 'Base Address Register'?

Yes. Standard PCI thing.

> If so, those are the addresses stored in pci->phb->mem_resources[i] and
> pci->phb->mem_offset[i], printed from enable_ddw() (which takes place a
> lot after discovering the device (0.17s in my run)).
>
> resource #1 pci@8002000: start=0x20008000
> end=0x2000 flags=0x200 desc=0x0 offset=0x2000
> resource #2 pci@8002000: start=0x2100
> end=0x21ff flags=0x200 desc=0x0 offset=0x0
>
> The messag

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-19 Thread Lakshmi Ramasubramanian


On 4/19/21 4:30 PM, Michael Ellerman wrote:

Lakshmi Ramasubramanian  writes:

On 4/16/21 2:05 AM, Michael Ellerman wrote:


Daniel Axtens  writes:

On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:

Sorry - missed copying device-tree and powerpc mailing lists.


There are a few "goto out;" statements before the local variable "fdt"
is initialized through the call to of_kexec_alloc_and_setup_fdt() in
elf64_load(). This will result in an uninitialized "fdt" being passed
to kvfree() in this function if there is an error before the call to
of_kexec_alloc_and_setup_fdt().

Initialize the local variable "fdt" to NULL.


I'm a huge fan of initialising local variables! But I'm struggling to
find the code path that will lead to an uninit fdt being returned...

The out label reads in part:

/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
return ret ? ERR_PTR(ret) : fdt;

As far as I can tell, any time we get a non-zero ret, we're going to
return an error pointer rather than the uninitialised value...


As Dan pointed out, the new code is in linux-next.

I have copied the new one below - the function doesn't return fdt, but
instead sets it in the arch specific field (please see the link to the
updated elf_64.c below).

https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next



(btw, it does look like we might leak fdt if we have an error after we
successfully kmalloc it.)

Am I missing something? Can you link to the report for the kernel test
robot or from Dan?


/*
   * Once FDT buffer has been successfully passed to
kexec_add_buffer(),
   * the FDT buffer address is saved in image->arch.fdt. In that
case,
   * the memory cannot be freed here in case of any other error.
   */
  if (ret && !image->arch.fdt)
  kvfree(fdt);

  return ret ? ERR_PTR(ret) : NULL;

In case of an error, the memory allocated for fdt is freed unless it has
already been passed to kexec_add_buffer().


It feels like the root of the problem is that the kvfree of fdt is in
the wrong place. It's only allocated later in the function, so the error
path should reflect that. Something like the patch below.

cheers


diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 5a569bb51349..02662e72c53d 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -114,7 +114,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
  initrd_len, cmdline);
if (ret)
-   goto out;
+   goto out_free_fdt;
  
  	fdt_pack(fdt);
  
@@ -125,7 +125,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,

kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
if (ret)
-   goto out;
+   goto out_free_fdt;
  
  	/* FDT will be freed in arch_kimage_file_post_load_cleanup */

image->arch.fdt = fdt;
@@ -140,18 +140,14 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
pr_err("Error setting up the purgatory.\n");
  
+	goto out;

+
+out_free_fdt:
+   kvfree(fdt);
  out:
kfree(modified_cmdline);
kexec_free_elf_info(&elf_info);
  
-	/*

-* Once FDT buffer has been successfully passed to kexec_add_buffer(),
-* the FDT buffer address is saved in image->arch.fdt. In that case,
-* the memory cannot be freed here in case of any other error.
-*/
-   if (ret && !image->arch.fdt)
-   kvfree(fdt);
-
return ret ? ERR_PTR(ret) : NULL;
  }
  



This looks good to me. Thanks Michael.

I'll post the updated patch shortly.

 -lakshmi

Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-19 Thread Dominique MARTINET

Arnd Bergmann wrote on Mon, Apr 19, 2021 at 02:16:36PM +0200:
> In some cases, you can use the device_link infrastructure to deal
> with dependencies between devices. Not sure if this would help
> in your case, but have a look at device_link_add() etc in drivers/base/core.c

I'll need to actually try to convince myself but if creating the link
forces driver registration then it should be workable.

> > In this particular case the problem is that since 7d981405d0fd ("soc:
> > imx8m: change to use platform driver") the soc probe tries to use the
> > nvmem driver for ocotp fuses for imx8m devices, which isn't ready yet.
> > So soc loading gets pushed back to the end of the list because it gets
> > defered and other drivers relying on soc_device_match get confused
> > because they wrongly think a device doesn't match a quirk when it
> > actually does.
> >
> > If there is a way to ensure the nvmem driver gets loaded before the soc,
> > that would also solve the problem nicely, and avoid the need to mess
> > with all the ~50 drivers which use it.
> >
> > Is there a way to control in what order drivers get loaded? Something in
> > the dtb perhaps?
> 
> For built-in drivers, load order depends on the initcall level and
> link order (how things are lined listed in the Makefile hierarchy).
> 
> For loadable modules, this is up to user space in the end.
> 
> Which of the drivers in this scenario are loadable modules?

All the drivers involved in my case are built-in (nvmem, soc and final
soc_device_match consumer e.g. caam_jr that crashes the kernel if soc is
not identified properly).

I frankly don't like the idea of moving nvmem/ above soc/ in
drivers/Makefile as a "solution" to this (especially as there is one
that seems to care about what soc they run on...), so I'll have a look
at links first, hopefully that will work out.


Thanks,
-- 
Dominique

Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

On Mon, 2021-04-19 at 10:44 -0500, Rob Herring wrote:
> On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras  wrote:
> > 
> > Hello Rob, thanks for this feedback!
> > 
> > On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> > > +PPC and PCI lists
> > > 
> > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  wrote:
> > > > 
> > > > Many other resource flag parsers already add this flag when the input
> > > > has bits 24 & 25 set, so update this one to do the same.
> > > 
> > > Many others? Looks like sparc and powerpc to me.
> > > 
> > 
> > s390 also does that, but it look like it comes from a device-tree.
> 
> I'm only looking at DT based platforms, and s390 doesn't use DT.

Correct. 
Sorry, I somehow write above the opposite of what I was thinking.

> 
> > > Those would be the
> > > ones I worry about breaking. Sparc doesn't use of/address.c so it's
> > > fine. Powerpc version of the flags code was only fixed in 2019, so I
> > > don't think powerpc will care either.
> > 
> > In powerpc I reach this function with this stack, while configuring a
> > virtio-net device for a qemu/KVM pseries guest:
> > 
> > pci_process_bridge_OF_ranges+0xac/0x2d4
> > pSeries_discover_phbs+0xc4/0x158
> > discover_phbs+0x40/0x60
> > do_one_initcall+0x60/0x2d0
> > kernel_init_freeable+0x308/0x3a8
> > kernel_init+0x2c/0x168
> > ret_from_kernel_thread+0x5c/0x70
> > 
> > For this, both MMIO32 and MMIO64 resources will have flags 0x200.
> 
> Oh good, powerpc has 2 possible flags parsing functions. So in the
> above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64?
> 
> Does pci_parse_of_flags() get called in your case?
> 

It's called in some cases, but not for the device I am debugging
(virtio-net pci@8002000). 

For the above device, here is an expanded stack trace:

of_bus_pci_get_flags() (from parser->bus->get_flags()) 
of_pci_range_parser_one() (from macro for_each_of_pci_range)
pci_process_bridge_OF_ranges+0xac/0x2d4
pSeries_discover_phbs+0xc4/0x158
discover_phbs+0x40/0x60
do_one_initcall+0x60/0x2d0
kernel_init_freeable+0x308/0x3a8
kernel_init+0x2c/0x168
ret_from_kernel_thread+0x5c/0x70

For other devices, I could also see the following stack trace:
## device ethernet@8

pci_parse_of_flags()
of_create_pci_dev+0x7f0/0xa40
__of_scan_bus+0x248/0x320
pcibios_scan_phb+0x370/0x3b0
pcibios_init+0x8c/0x12c
do_one_initcall+0x60/0x2d0
kernel_init_freeable+0x308/0x3a8
kernel_init+0x2c/0x168
ret_from_kernel_thread+0x5c/0x70

Devices that get parsed with of_bus_pci_get_flags() appears first at
dmesg (around 0.015s in my test), while devices that get parsed by
pci_parse_of_flags() appears later (0.025s in my test).

I am not really used to this code, but having the term "discover phbs"
in the first trace and the term "scan phb" in the second, makes me
wonder if the first trace is seen on devices that are seen/described in
the device-tree and the second trace is seen in devices not present in
the device-tree and found scanning pci bus.

> > > I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> > > the flags. AFAICT, that's not set anywhere outside of arch code. So
> > > never for riscv, arm and arm64 at least. That leads me to
> > > pci_std_update_resource() which is where the PCI code sets BARs and
> > > just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> > > IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> > > neither is prefetch.
> > > 
> > 
> > I am not sure if you mean here:
> > a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect
> > anything else, or
> > b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64
> > (or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since
> > it's how it's added in powerpc/sparc, and else there is no point.
> 
> I'm wondering if a) is incomplete and PCI_BASE_ADDRESS_MEM_TYPE_64
> also needs to be set. The question is ultimately are BARs getting set
> correctly for 64-bit? It looks to me like they aren't.

I am not used to these terms, does BAR means 'Base Address Register'?

If so, those are the addresses stored in pci->phb->mem_resources[i] and
pci->phb->mem_offset[i], printed from enable_ddw() (which takes place a
lot after discovering the device (0.17s in my run)).

resource #1 pci@8002000: start=0x20008000
end=0x2000 flags=0x200 desc=0x0 offset=0x2000
resource #2 pci@8002000: start=0x2100
end=0x21ff flags=0x200 desc=0x0 offset=0x0

The message above was printed without this patch.
With the patch, the flags for memory resource #2 gets ORed with 
0x0010.

Is it enough to know if BARs are correctly set for 64-bit?
If it's not, how can I check?

> 
> Rob

Thanks Rob!

Leonardo Brás

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

Lakshmi Ramasubramanian  writes:
> On 4/16/21 2:05 AM, Michael Ellerman wrote:
>
>> Daniel Axtens  writes:
 On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:

 Sorry - missed copying device-tree and powerpc mailing lists.

> There are a few "goto out;" statements before the local variable "fdt"
> is initialized through the call to of_kexec_alloc_and_setup_fdt() in
> elf64_load(). This will result in an uninitialized "fdt" being passed
> to kvfree() in this function if there is an error before the call to
> of_kexec_alloc_and_setup_fdt().
>
> Initialize the local variable "fdt" to NULL.
>
>>> I'm a huge fan of initialising local variables! But I'm struggling to
>>> find the code path that will lead to an uninit fdt being returned...
>>>
>>> The out label reads in part:
>>>
>>> /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
>>> return ret ? ERR_PTR(ret) : fdt;
>>>
>>> As far as I can tell, any time we get a non-zero ret, we're going to
>>> return an error pointer rather than the uninitialised value...
>
> As Dan pointed out, the new code is in linux-next.
>
> I have copied the new one below - the function doesn't return fdt, but 
> instead sets it in the arch specific field (please see the link to the 
> updated elf_64.c below).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next
>
>>>
>>> (btw, it does look like we might leak fdt if we have an error after we
>>> successfully kmalloc it.)
>>>
>>> Am I missing something? Can you link to the report for the kernel test
>>> robot or from Dan?
>
> /*
>   * Once FDT buffer has been successfully passed to 
> kexec_add_buffer(),
>   * the FDT buffer address is saved in image->arch.fdt. In that 
> case,
>   * the memory cannot be freed here in case of any other error.
>   */
>  if (ret && !image->arch.fdt)
>  kvfree(fdt);
>
>  return ret ? ERR_PTR(ret) : NULL;
>
> In case of an error, the memory allocated for fdt is freed unless it has 
> already been passed to kexec_add_buffer().

It feels like the root of the problem is that the kvfree of fdt is in
the wrong place. It's only allocated later in the function, so the error
path should reflect that. Something like the patch below.

cheers


diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 5a569bb51349..02662e72c53d 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -114,7 +114,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
  initrd_len, cmdline);
if (ret)
-   goto out;
+   goto out_free_fdt;
 
fdt_pack(fdt);
 
@@ -125,7 +125,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
if (ret)
-   goto out;
+   goto out_free_fdt;
 
/* FDT will be freed in arch_kimage_file_post_load_cleanup */
image->arch.fdt = fdt;
@@ -140,18 +140,14 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
pr_err("Error setting up the purgatory.\n");
 
+   goto out;
+
+out_free_fdt:
+   kvfree(fdt);
 out:
kfree(modified_cmdline);
kexec_free_elf_info(&elf_info);
 
-   /*
-* Once FDT buffer has been successfully passed to kexec_add_buffer(),
-* the FDT buffer address is saved in image->arch.fdt. In that case,
-* the memory cannot be freed here in case of any other error.
-*/
-   if (ret && !image->arch.fdt)
-   kvfree(fdt);
-
return ret ? ERR_PTR(ret) : NULL;
 }

[powerpc:next 231/236] arch/powerpc/kernel/fadump.c:731:28: error: 'INTERRUPT_SYSTEM_RESET' undeclared

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
head:   cbd3d5ba46b68c033986a6087209930f001cbcca
commit: 7153d4bf0b373428d0393c001019da4d0483fddb [231/236] powerpc/traps: 
Enhance readability for trap types
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=7153d4bf0b373428d0393c001019da4d0483fddb
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next
git checkout 7153d4bf0b373428d0393c001019da4d0483fddb
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

Note: the powerpc/next HEAD cbd3d5ba46b68c033986a6087209930f001cbcca builds 
fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/fadump.c:82:12: warning: no previous prototype for 
'fadump_cma_init' [-Wmissing-prototypes]
  82 | int __init fadump_cma_init(void)
 |^~~
   arch/powerpc/kernel/fadump.c: In function 'crash_fadump':
>> arch/powerpc/kernel/fadump.c:731:28: error: 'INTERRUPT_SYSTEM_RESET' 
>> undeclared (first use in this function)
 731 |  if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {
 |^~
   arch/powerpc/kernel/fadump.c:731:28: note: each undeclared identifier is 
reported only once for each function it appears in


vim +/INTERRUPT_SYSTEM_RESET +731 arch/powerpc/kernel/fadump.c

   679  
   680  void crash_fadump(struct pt_regs *regs, const char *str)
   681  {
   682  unsigned int msecs;
   683  struct fadump_crash_info_header *fdh = NULL;
   684  int old_cpu, this_cpu;
   685  /* Do not include first CPU */
   686  unsigned int ncpus = num_online_cpus() - 1;
   687  
   688  if (!should_fadump_crash())
   689  return;
   690  
   691  /*
   692   * old_cpu == -1 means this is the first CPU which has come 
here,
   693   * go ahead and trigger fadump.
   694   *
   695   * old_cpu != -1 means some other CPU has already on it's way
   696   * to trigger fadump, just keep looping here.
   697   */
   698  this_cpu = smp_processor_id();
   699  old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
   700  
   701  if (old_cpu != -1) {
   702  atomic_inc(&cpus_in_fadump);
   703  
   704  /*
   705   * We can't loop here indefinitely. Wait as long as 
fadump
   706   * is in force. If we race with fadump un-registration 
this
   707   * loop will break and then we go down to normal panic 
path
   708   * and reboot. If fadump is in force the first crashing
   709   * cpu will definitely trigger fadump.
   710   */
   711  while (fw_dump.dump_registered)
   712  cpu_relax();
   713  return;
   714  }
   715  
   716  fdh = __va(fw_dump.fadumphdr_addr);
   717  fdh->crashing_cpu = crashing_cpu;
   718  crash_save_vmcoreinfo();
   719  
   720  if (regs)
   721  fdh->regs = *regs;
   722  else
   723  ppc_save_regs(&fdh->regs);
   724  
   725  fdh->online_mask = *cpu_online_mask;
   726  
   727  /*
   728   * If we came in via system reset, wait a while for the 
secondary
   729   * CPUs to enter.
   730   */
 > 731  if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {
   732  msecs = CRASH_TIMEOUT;
   733  while ((atomic_read(&cpus_in_fadump) < ncpus) && 
(--msecs > 0))
   734  mdelay(1);
   735  }
   736  
   737  fw_dump.ops->fadump_trigger(fdh, str);
   738  }
   739  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: [PATCH] powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

2021-04-19 Thread Athira Rajeev

On 12-Apr-2021, at 12:49 PM, Athira Rajeev wrote:On 12-Apr-2021, at 8:38 AM, Nicholas Piggin wrote:Excerpts from Athira Rajeev's message of April 9, 2021 10:53 pm:On 09-Apr-2021, at 6:38 AM, Nicholas Piggin wrote:Hi Nick,Thanks for checking the patch and sharing review comments.I was going to nitpick "overflown" here as something birds do, but somesources says overflown is okay for past tense.You could use "overflowed" for that, but I understand the issue with the word: you are talking about counters that are currently in an "overflow" state, but the overflow occurred in the past and is not still happeningso you "overflowing" doesn't exactly fit either.overflown kind of works for some reason you can kind of use it forpresent tense!Ok sure, Yes counter is currently in an “overflow” state.Excerpts from Athira Rajeev's message of April 7, 2021 12:47 am:Running perf fuzzer showed below in dmesg logs:"Can't find PMC that caused IRQ"This means a PMU exception happened, but none of the PMC's (PerformanceMonitor Counter) were found to be overflown. There are some corner casesthat clears the PMCs after PMI gets masked. In such cases, the perfinterrupt handler will not find the active PMC values that had causedthe overflow and thus leads to this message while replaying.Case 1: PMU Interrupt happens during replay of other interrupts andcounter values gets cleared by PMU callbacks before replay:During replay of interrupts like timer, __do_irq and doorbell exception, weconditionally enable interrupts via may_hard_irq_enable(). This couldpotentially create a window to generate a PMI. Since irq soft mask is setto ALL_DISABLED, the PMI will get masked here.I wonder if may_hard_irq_enable shouldn't enable if PMI is softdisabled. And also maybe replay should not set ALL_DISABLED ifthere are no PMI interrupts pending.Still, I think those are a bit more tricky and might take a whileto get right or just not be worth while, so I think your patch isfine.Ok Nick.We could get IPIs run beforeperf interrupt is replayed and the PMU events could deleted or stopped.This will change the PMU SPR values and resets the counters. Snippet offtrace log showing PMU callbacks invoked in "__do_irq":-0 [051] dns. 132025441306354: __do_irq <-call_do_irq-0 [051] dns. 132025441306430: irq_enter <-__do_irq-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq<<>>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable<<>>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interruptsCase 2: PMI's masked during local_* operations, example local_add.If the local_add operation happens within a local_irq_save, replay ofPMI will be during local_irq_restore. Similar to case 1, this couldalso create a window before replay where PMU events gets deleted orstopped.Here as well perhaps PMIs should be replayed if they are unmaskedeven if other interrupts are still masked. Again that might be morecomplexity than it's worth.Ok..Patch adds a fix to update the PMU callback functions (del,stop,enable) tocheck for pending perf interrupt. If there is an overflown PMC and pendingperf interrupt indicated in Paca, clear the PMI bit in paca to drop thatsample. In case of power_pmu_del, also clear the MMCR0 PMAO bit whichotherwise could lead to spurious interrupts in some corner cases. Example,a timer after power_pmu_del which will re-enable interrupts since PMI iscleared and triggers a PMI again since PMAO bit is still set.We can't just replay PMI any time. Hence this approach is preferred ratherthan replaying PMI before resetting overflown PMC. Patch also documentscore-book3s on a race condition which can trigger these PMC messages duringidle path in PowerNV.Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")Reported-by: Nageswara R Sastry Suggested-by: Nicholas Piggin Suggested-by: Madhavan Srinivasan Signed-off-by: Athira Rajeev ---arch/powerpc/include/asm/pmc.h | 11 +arch/powerpc/perf/core-book3s.c | 55 +2 files changed, 66 insertions(+)diff --git a/arch/powerpc/include/asm/pmc.h b/arch/power

Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-19 Thread Guenter Roeck

On 4/18/21 9:27 PM, Alice Guo (OSS) wrote:
> From: Alice Guo 
> 
> Update all the code that use soc_device_match because add support for
> soc_device_match returning -EPROBE_DEFER.
> 
> Signed-off-by: Alice Guo 
> ---
[ ... ]
>  drivers/watchdog/renesas_wdt.c|  2 +-
>  48 files changed, 131 insertions(+), 52 deletions(-)
> 
[ ... ]
> diff --git a/drivers/watchdog/renesas_wdt.c b/drivers/watchdog/renesas_wdt.c
> index 5791198960e6..fdc534dc4024 100644
> --- a/drivers/watchdog/renesas_wdt.c
> +++ b/drivers/watchdog/renesas_wdt.c
> @@ -197,7 +197,7 @@ static bool rwdt_blacklisted(struct device *dev)
>   const struct soc_device_attribute *attr;
>  
>   attr = soc_device_match(rwdt_quirks_match);
> - if (attr && setup_max_cpus > (uintptr_t)attr->data) {
> + if (!IS_ERR(attr) && attr && setup_max_cpus > (uintptr_t)attr->data) {

This is wrong. We can not make the decision below without having access
to attr. The function may wrongly return false if soc_device_match()
returns an error.

Guenter

>   dev_info(dev, "Watchdog blacklisted on %s %s\n", attr->soc_id,
>attr->revision);
>   return true;
>

Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

2021-04-19 Thread Randy Dunlap

On 4/19/21 6:16 AM, Michael Ellerman wrote:
> Randy Dunlap  writes:

>> Sure.  I'll post them later today.
>> They keep FPU and ALTIVEC as independent (build) features.
> 
> Those patches look OK.
> 
> But I don't think it makes sense to support that configuration, FPU=n
> ALTVEC=y. No one is ever going to make a CPU like that. We have enough
> testing surface due to configuration options, without adding artificial
> combinations that no one is ever going to use.
> 
> IMHO :)
> 
> So I'd rather we just make ALTIVEC depend on FPU.

That's rather simple. See below.
I'm doing a bunch of randconfig builds with it now.

---
From: Randy Dunlap 
Subject: [PATCH] powerpc: make ALTIVEC depend PPC_FPU

On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled,
there are build errors:

drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 
'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration]
   enable_kernel_fp();
../arch/powerpc/lib/sstep.c: In function 'do_vec_load':
../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 
'put_vr' [-Werror=implicit-function-declaration]
  637 |   put_vr(rn, &u.v);
  |   ^~
../arch/powerpc/lib/sstep.c: In function 'do_vec_store':
../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 
'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration]
  660 |   get_vr(rn, &u.v);
  |   ^~

In theory ALTIVEC is independent of PPC_FPU but in practice nobody
is going to build such a machine, so make ALTIVEC require PPC_FPU
by depending on PPC_FPU.

Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Christophe Leroy 
Cc: Segher Boessenkool 
Cc: l...@intel.com
---
 arch/powerpc/platforms/86xx/Kconfig|1 +
 arch/powerpc/platforms/Kconfig.cputype |2 ++
 2 files changed, 3 insertions(+)

--- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig
+++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig
@@ -4,6 +4,7 @@ menuconfig PPC_86xx
bool "86xx-based boards"
depends on PPC_BOOK3S_32
select FSL_SOC
+   select PPC_FPU
select ALTIVEC
help
  The Freescale E600 SoCs have 74xx cores.
--- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype
@@ -186,6 +186,7 @@ config E300C3_CPU
 config G4_CPU
bool "G4 (74xx)"
depends on PPC_BOOK3S_32
+   select PPC_FPU
select ALTIVEC

 endchoice
@@ -309,6 +310,7 @@ config PHYS_64BIT

 config ALTIVEC
bool "AltiVec Support"
+   depends on PPC_FPU
depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64)
help
  This option enables kernel support for the Altivec extensions to the

Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-19 Thread Tyrel Datwyler

On 4/17/21 5:30 AM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
>>> Currently, neither the vio_bus or vio_driver structures provide support
>>> for a shutdown() routine.
>>>
>>> Add support for shutdown() by allowing drivers to provide a
>>> implementation via function pointer in their vio_driver struct and
>>> provide a proper implementation in the driver template for the vio_bus
>>> that calls a vio drivers shutdown() if defined.
>>>
>>> In the case that no shutdown() is defined by a vio driver and a kexec is
>>> in progress we implement a big hammer that calls remove() to ensure no
>>> further DMA for the devices is possible.
>>>
>>> Signed-off-by: Tyrel Datwyler 
>>> ---
>>
>> Ping... any comments, problems with this approach?
> 
> The kexec part seems like a bit of a hack.
> 
> It also doesn't help for kdump, when none of the shutdown code is run.

If I understand correctly for kdump we have a reserved memory space where the
kdump kernel is loaded, but for kexec the memory region isn't reserved ahead of
time meaning we can try and load the kernel over potential memory used for DMA
by the current kernel. Please correct me if I've got that wrong.

> 
> How many drivers do we have? Can we just implement a proper shutdown for
> them?

Well that is the end goal. I just don't currently have the bandwidth to do each
driver myself with a proper shutdown sequence, and thought this was a launching
off point to at least introduce the shutdown callback to the VIO bus.

Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, vmc,
pseries_rng, nx, nx842, hvcs, hvc_vio.

I can drop the kexec_in_progress hammer and just have each driver call remove()
themselves in their shutdown function. Leave it to each maintainer to decide if
remove() is enough or if there is a more lightweight quiesce sequence they
choose to implement.

-Tyrel

> 
> cheers
>

[powerpc:next 231/236] arch/powerpc/kernel/fadump.c:731:28: error: use of undeclared identifier 'INTERRUPT_SYSTEM_RESET'

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
head:   cbd3d5ba46b68c033986a6087209930f001cbcca
commit: 7153d4bf0b373428d0393c001019da4d0483fddb [231/236] powerpc/traps: 
Enhance readability for trap types
config: powerpc-randconfig-r016-20210419 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
2b50f5a4343f8fb06acaa5c36355bcf58092c9cd)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=7153d4bf0b373428d0393c001019da4d0483fddb
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next
git checkout 7153d4bf0b373428d0393c001019da4d0483fddb
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

Note: the powerpc/next HEAD cbd3d5ba46b68c033986a6087209930f001cbcca builds 
fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/fadump.c:731:28: error: use of undeclared identifier 
>> 'INTERRUPT_SYSTEM_RESET'
   if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {
 ^
   arch/powerpc/kernel/fadump.c:1703:22: error: no previous prototype for 
function 'arch_reserved_kernel_pages' [-Werror,-Wmissing-prototypes]
   unsigned long __init arch_reserved_kernel_pages(void)
^
   arch/powerpc/kernel/fadump.c:1703:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
   unsigned long __init arch_reserved_kernel_pages(void)
   ^
   static 
   2 errors generated.


vim +/INTERRUPT_SYSTEM_RESET +731 arch/powerpc/kernel/fadump.c

   679  
   680  void crash_fadump(struct pt_regs *regs, const char *str)
   681  {
   682  unsigned int msecs;
   683  struct fadump_crash_info_header *fdh = NULL;
   684  int old_cpu, this_cpu;
   685  /* Do not include first CPU */
   686  unsigned int ncpus = num_online_cpus() - 1;
   687  
   688  if (!should_fadump_crash())
   689  return;
   690  
   691  /*
   692   * old_cpu == -1 means this is the first CPU which has come 
here,
   693   * go ahead and trigger fadump.
   694   *
   695   * old_cpu != -1 means some other CPU has already on it's way
   696   * to trigger fadump, just keep looping here.
   697   */
   698  this_cpu = smp_processor_id();
   699  old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
   700  
   701  if (old_cpu != -1) {
   702  atomic_inc(&cpus_in_fadump);
   703  
   704  /*
   705   * We can't loop here indefinitely. Wait as long as 
fadump
   706   * is in force. If we race with fadump un-registration 
this
   707   * loop will break and then we go down to normal panic 
path
   708   * and reboot. If fadump is in force the first crashing
   709   * cpu will definitely trigger fadump.
   710   */
   711  while (fw_dump.dump_registered)
   712  cpu_relax();
   713  return;
   714  }
   715  
   716  fdh = __va(fw_dump.fadumphdr_addr);
   717  fdh->crashing_cpu = crashing_cpu;
   718  crash_save_vmcoreinfo();
   719  
   720  if (regs)
   721  fdh->regs = *regs;
   722  else
   723  ppc_save_regs(&fdh->regs);
   724  
   725  fdh->online_mask = *cpu_online_mask;
   726  
   727  /*
   728   * If we came in via system reset, wait a while for the 
secondary
   729   * CPUs to enter.
   730   */
 > 731  if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {
   732  msecs = CRASH_TIMEOUT;
   733  while ((atomic_read(&cpus_in_fadump) < ncpus) && 
(--msecs > 0))
   734  mdelay(1);
   735  }
   736  
   737  fw_dump.ops->fadump_trigger(fdh, str);
   738  }
   739  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

2021-04-19 Thread Randy Dunlap

On 4/19/21 6:16 AM, Michael Ellerman wrote:
> Randy Dunlap  writes:
>> On 4/18/21 10:46 AM, Segher Boessenkool wrote:
>>> On Sun, Apr 18, 2021 at 06:24:29PM +0200, Christophe Leroy wrote:
 Le 17/04/2021 à 22:17, Randy Dunlap a écrit :
> Should the code + Kconfigs/Makefiles handle that kind of
> kernel config or should ALTIVEC always mean PPC_FPU as well?

 As far as I understand, Altivec is completely independant of FPU in 
 Theory. 
>>>
>>> And, as far as the hardware is concerned, in practice as well.
>>>
 So it should be possible to use Altivec without using FPU.
>>>
>>> Yup.
>>>
 However, until recently, it was not possible to de-activate FPU support on 
 book3s/32. I made it possible in order to reduce unneccessary processing 
 on 
 processors like the 832x that has no FPU.
>>>
>>> The processor has to implement FP to be compliant to any version of
>>> PowerPC, as far as I know?  So that is all done by emulation, including
>>> all the registers?  Wow painful.
>>>
 As far as I can see in cputable.h/.c, 832x is the only book3s/32 without 
 FPU, and it doesn't have ALTIVEC either.
>>>
>>> 602 doesn't have double-precision hardware, also no 64-bit FP registers.
>>> But that CPU was never any widely used :-)
>>>
 So we can in the future ensure that Altivec can be used without FPU 
 support, but for the time being I think it is OK to force selection of FPU 
 when selecting ALTIVEC in order to avoid build failures.
>>>
>>> It is useful to allow MSR[VEC,FP]=1,0 but yeah there are no CPUs that
>>> have VMX (aka AltiVec) but that do not have FP.  I don't see how making
>>> that artificial dependency buys anything, but maybe it does?
>>>
> I have patches to fix the build errors with the config as
> reported but I don't know if that's the right thing to do...
>>>
>>> Neither do we, we cannot see those patches :-)
>>
>> Sure.  I'll post them later today.
>> They keep FPU and ALTIVEC as independent (build) features.
> 
> Those patches look OK.
> 
> But I don't think it makes sense to support that configuration, FPU=n
> ALTVEC=y. No one is ever going to make a CPU like that. We have enough

Agreed.

> testing surface due to configuration options, without adding artificial
> combinations that no one is ever going to use.
> 
> IMHO :)
> 
> So I'd rather we just make ALTIVEC depend on FPU.
> 
> cheers

Makes sense and sounds good to me.

thanks.
-- 
~Randy

Re: [PATCH v2] perf vendor events: Initial json/events list for power10 platform

2021-04-19 Thread Arnaldo Carvalho de Melo

Em Mon, Apr 19, 2021 at 10:38:46PM +1000, Michael Ellerman escreveu:
> Kajol Jain  writes:
> > Patch adds initial json/events for POWER10.
> 
> Acked-by: Michael Ellerman 

Thanks, applied.

- Arnaldo

 
> cheers
> 
> > Signed-off-by: Kajol Jain 
> > Tested-by: Paul A. Clarke 
> > Reviewed-by: Paul A. Clarke 
> > ---
> >  .../perf/pmu-events/arch/powerpc/mapfile.csv  |   1 +
> >  .../arch/powerpc/power10/cache.json   |  47 +++
> >  .../arch/powerpc/power10/floating_point.json  |   7 +
> >  .../arch/powerpc/power10/frontend.json| 217 +
> >  .../arch/powerpc/power10/locks.json   |  12 +
> >  .../arch/powerpc/power10/marked.json  | 147 +
> >  .../arch/powerpc/power10/memory.json  | 192 +++
> >  .../arch/powerpc/power10/others.json  | 297 ++
> >  .../arch/powerpc/power10/pipeline.json| 297 ++
> >  .../pmu-events/arch/powerpc/power10/pmc.json  |  22 ++
> >  .../arch/powerpc/power10/translation.json |  57 
> >  11 files changed, 1296 insertions(+)
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/powerpc/power10/translation.json
> >
> > ---
> > Changelog:
> > v1 -> v2
> > - Removed inconsistencies in "BriefDescription" field and make sure
> >   it will end with period without any space at the end.
> >   Suggested by : Paul A. Clarke  
> > - Added Tested-by and Reviewed-by tag.
> > ---
> > diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv 
> > b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > index 229150e7ab7d..4abdfc3f9692 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > +++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > @@ -15,3 +15,4 @@
> >  # Power8 entries
> >  004[bcd][[:xdigit:]]{4},1,power8,core
> >  004e[[:xdigit:]]{4},1,power9,core
> > +0080[[:xdigit:]]{4},1,power10,core
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json 
> > b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> > new file mode 100644
> > index ..95e33531fbc6
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> > @@ -0,0 +1,47 @@
> > +[
> > +  {
> > +"EventCode": "1003C",
> > +"EventName": "PM_EXEC_STALL_DMISS_L2L3",
> > +"BriefDescription": "Cycles in which the oldest instruction in the 
> > pipeline was waiting for a load miss to resolve from either the local L2 or 
> > local L3."
> > +  },
> > +  {
> > +"EventCode": "34056",
> > +"EventName": "PM_EXEC_STALL_LOAD_FINISH",
> > +"BriefDescription": "Cycles in which the oldest instruction in the 
> > pipeline was finishing a load after its data was reloaded from a data 
> > source beyond the local L1; cycles in which the LSU was processing an 
> > L1-hit; cycles in which the NTF instruction merged with another load in the 
> > LMQ."
> > +  },
> > +  {
> > +"EventCode": "3006C",
> > +"EventName": "PM_RUN_CYC_SMT2_MODE",
> > +"BriefDescription": "Cycles when this thread's run latch is set and 
> > the core is in SMT2 mode."
> > +  },
> > +  {
> > +"EventCode": "300F4",
> > +"EventName": "PM_RUN_INST_CMPL_CONC",
> > +"BriefDescription": "PowerPC instructions completed by this thread 
> > when all threads in the core had the run-latch set."
> > +  },
> > +  {
> > +"EventCode": "4C016",
> > +"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
> > +"BriefDescription": "Cycles in which the oldest instruction in the 
> > pipeline was waiting for a load miss to resolve from the local L2 or local 
> > L3, with a dispatch conflict."
> > +  },
> > +  {
> > +"EventCode": "4D014",
> > +"EventName": "PM_EXEC_STALL_LOAD",
> > +"BriefDescription": "Cycles in which the oldest instruction in the 
> > pipeline was a load instruction executing in the Load Store Unit."
> > +  },
> > +  {
> > +"EventCode": "4D016",
> > +"EventName": "PM_EXEC_STALL_PTESYNC",
> > +"BriefDescription": "Cycles in which the oldest instruction in the 
> > pipeline was a PTESYNC instruction executing in the Load Store Unit."
> > +  },
> > +  {
> > +"EventCode": "401EA",
> > +"EventName": "PM_THRESH_EXC_128",
> > +"BriefDescription": "Threshold counter exceeded a value of 128."
> > +  },
> > +  {
> > +

Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

Le 19/04/2021 à 16:00, Steven Price a écrit :

On 19/04/2021 14:14, Christophe Leroy wrote:

Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:

Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

To be honest I don't fully understand why powerpc requires the page_size - it appears to be
using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such
holes would occur.

I was indeed introduced for KASAN. We have a first commit
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a
KASAN like stuff.

Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the
problem was exactly, something around the use of hugepages for kernel memory, came as part of
the series
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/

Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN
output to x86.

Given the generic ptdump code has handling for KASAN already it should be possible to drop that
from the powerpc arch code, which I think means we don't actually need to provide page size to
notepage(). Hopefully that means more code to delete ;)

Looking at how the generic ptdump code handles KASAN, I'm a bit sceptic.

IIUC, it is checking that kasan_early_shadow_pte is in the same page as the pgtable referred by
the PMD entry. But what happens if that PMD entry is referring another pgtable which is inside the
same page as kasan_early_shadow_pte ?

Shouldn't the test be

if (pmd_page_vaddr(val) == lm_alias(kasan_early_shadow_pte))
return note_kasan_page_table(walk, addr);

Now I come to look at this code again, I think you're right. On arm64 this doesn't cause a problem -
page tables are page sized and page aligned, so there couldn't be any non-KASAN pgtables sharing the
page. Obviously that's not necessarily true of other architectures.

Feel free to add a patch to your series ;)

Ok.

I'll leave that outside of the series, it is not a show stopper because early shadow page
directories are all tagged __bss_page_aligned so we can't have two of them in the same page and it
is really unlikely that we'll have any other statically defined page directory in the same pages either.

And for the special case of powerpc 8xx which is the only one for which we have both KASAN and
HUGEPD at the time being, there are only two levels of page directories so no issue.

Christophe

[PATCH 2/3] powerpc/32s: Enhance readability of trap types

This patch makes use of trap types in head_book3s_32.S

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/interrupt.h |  6 
 arch/powerpc/kernel/head_book3s_32.S | 43 ++--
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index cf2c5c3ae716..8970990e3b08 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -27,6 +27,7 @@
 #ifdef CONFIG_PPC_BOOK3S
 #define INTERRUPT_DOORBELL0xa00
 #define INTERRUPT_PERFMON 0xf00
+#define INTERRUPT_ALTIVEC_UNAVAIL  0xf20
 #endif
 
 /* BookE/BookS/4xx/8xx */
@@ -57,6 +58,11 @@
 #define INTERRUPT_DATA_BREAKPOINT_8xx  0x1c00
 #define INTERRUPT_INST_BREAKPOINT_8xx  0x1d00
 
+/* 603 */
+#define INTERRUPT_INST_TLB_MISS_6030x1000
+#define INTERRUPT_DATA_LOAD_TLB_MISS_603   0x1100
+#define INTERRUPT_DATA_STORE_TLB_MISS_603  0x1200
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 18f4ae163f34..065178f19a3d 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "head_32.h"
 
@@ -239,7 +240,7 @@ __secondary_hold_acknowledge:
 /* System reset */
 /* core99 pmac starts the seconary here by changing the vector, and
putting it back to what it was (unknown_async_exception) when done.  */
-   EXCEPTION(0x100, Reset, unknown_async_exception)
+   EXCEPTION(INTERRUPT_SYSTEM_RESET, Reset, unknown_async_exception)
 
 /* Machine check */
 /*
@@ -255,7 +256,7 @@ __secondary_hold_acknowledge:
  * pointer when we take an exception from supervisor mode.)
  * -- paulus.
  */
-   START_EXCEPTION(0x200, MachineCheck)
+   START_EXCEPTION(INTERRUPT_MACHINE_CHECK, MachineCheck)
EXCEPTION_PROLOG_0
 #ifdef CONFIG_PPC_CHRP
mtspr   SPRN_SPRG_SCRATCH2,r1
@@ -276,7 +277,7 @@ __secondary_hold_acknowledge:
b   interrupt_return
 
 /* Data access exception. */
-   START_EXCEPTION(0x300, DataAccess)
+   START_EXCEPTION(INTERRUPT_DATA_STORAGE, DataAccess)
 #ifdef CONFIG_PPC_BOOK3S_604
 BEGIN_MMU_FTR_SECTION
mtspr   SPRN_SPRG_SCRATCH2,r10
@@ -297,7 +298,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
 #endif
 1: EXCEPTION_PROLOG_0 handle_dar_dsisr=1
EXCEPTION_PROLOG_1
-   EXCEPTION_PROLOG_2 0x300 DataAccess handle_dar_dsisr=1
+   EXCEPTION_PROLOG_2 INTERRUPT_DATA_STORAGE DataAccess handle_dar_dsisr=1
prepare_transfer_to_handler
lwz r5, _DSISR(r11)
andis.  r0, r5, DSISR_DABRMATCH@h
@@ -310,7 +311,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
 
 
 /* Instruction access exception. */
-   START_EXCEPTION(0x400, InstructionAccess)
+   START_EXCEPTION(INTERRUPT_INST_STORAGE, InstructionAccess)
mtspr   SPRN_SPRG_SCRATCH0,r10
mtspr   SPRN_SPRG_SCRATCH1,r11
mfspr   r10, SPRN_SPRG_THREAD
@@ -330,7 +331,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
andi.   r11, r11, MSR_PR
 
EXCEPTION_PROLOG_1
-   EXCEPTION_PROLOG_2 0x400 InstructionAccess
+   EXCEPTION_PROLOG_2 INTERRUPT_INST_STORAGE InstructionAccess
andis.  r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
stw r5, _DSISR(r11)
stw r12, _DAR(r11)
@@ -339,19 +340,19 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
b   interrupt_return
 
 /* External interrupt */
-   EXCEPTION(0x500, HardwareInterrupt, do_IRQ)
+   EXCEPTION(INTERRUPT_EXTERNAL, HardwareInterrupt, do_IRQ)
 
 /* Alignment exception */
-   START_EXCEPTION(0x600, Alignment)
-   EXCEPTION_PROLOG 0x600 Alignment handle_dar_dsisr=1
+   START_EXCEPTION(INTERRUPT_ALIGNMENT, Alignment)
+   EXCEPTION_PROLOG INTERRUPT_ALIGNMENT Alignment handle_dar_dsisr=1
prepare_transfer_to_handler
bl  alignment_exception
REST_NVGPRS(r1)
b   interrupt_return
 
 /* Program check exception */
-   START_EXCEPTION(0x700, ProgramCheck)
-   EXCEPTION_PROLOG 0x700 ProgramCheck
+   START_EXCEPTION(INTERRUPT_PROGRAM, ProgramCheck)
+   EXCEPTION_PROLOG INTERRUPT_PROGRAM ProgramCheck
prepare_transfer_to_handler
bl  program_check_exception
REST_NVGPRS(r1)
@@ -367,7 +368,7 @@ BEGIN_FTR_SECTION
  */
b   ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
-   EXCEPTION_PROLOG 0x800 FPUnavailable
+   EXCEPTION_PROLOG INTERRUPT_FP_UNAVAIL FPUnavailable
beq 1f
bl  load_up_fpu /* if from user, just load it up */
b   fast_exception_return
@@ -379,16 +380,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 #endif
 
 /* Decrementer */
-   EXCEPTION(0x900, Decrementer, timer_interrupt)
+   EXCEPTION(INTERRUPT_DECREMENTER, Dec

[PATCH 3/3] powerpc/irq: Enhance readability of trap types

This patch makes use of trap types in irq.c

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/interrupt.h |  1 +
 arch/powerpc/kernel/irq.c| 13 +
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 8970990e3b08..44cde2e129b8 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -23,6 +23,7 @@
 #define INTERRUPT_INST_SEGMENT0x480
 #define INTERRUPT_TRACE   0xd00
 #define INTERRUPT_H_DATA_STORAGE  0xe00
+#define INTERRUPT_HMI  0xe60
 #define INTERRUPT_H_FAC_UNAVAIL   0xf80
 #ifdef CONFIG_PPC_BOOK3S
 #define INTERRUPT_DOORBELL0xa00
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 893d3f8d6f47..72cb45393ef2 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -142,7 +142,7 @@ void replay_soft_interrupts(void)
 */
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & 
PACA_IRQ_HMI)) {
local_paca->irq_happened &= ~PACA_IRQ_HMI;
-   regs.trap = 0xe60;
+   regs.trap = INTERRUPT_HMI;
handle_hmi_exception(®s);
if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
hard_irq_disable();
@@ -150,7 +150,7 @@ void replay_soft_interrupts(void)
 
if (local_paca->irq_happened & PACA_IRQ_DEC) {
local_paca->irq_happened &= ~PACA_IRQ_DEC;
-   regs.trap = 0x900;
+   regs.trap = INTERRUPT_DECREMENTER;
timer_interrupt(®s);
if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
hard_irq_disable();
@@ -158,7 +158,7 @@ void replay_soft_interrupts(void)
 
if (local_paca->irq_happened & PACA_IRQ_EE) {
local_paca->irq_happened &= ~PACA_IRQ_EE;
-   regs.trap = 0x500;
+   regs.trap = INTERRUPT_EXTERNAL;
do_IRQ(®s);
if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
hard_irq_disable();
@@ -166,10 +166,7 @@ void replay_soft_interrupts(void)
 
if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (local_paca->irq_happened & 
PACA_IRQ_DBELL)) {
local_paca->irq_happened &= ~PACA_IRQ_DBELL;
-   if (IS_ENABLED(CONFIG_PPC_BOOK3E))
-   regs.trap = 0x280;
-   else
-   regs.trap = 0xa00;
+   regs.trap = INTERRUPT_DOORBELL;
doorbell_exception(®s);
if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
hard_irq_disable();
@@ -178,7 +175,7 @@ void replay_soft_interrupts(void)
/* Book3E does not support soft-masking PMI interrupts */
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & 
PACA_IRQ_PMI)) {
local_paca->irq_happened &= ~PACA_IRQ_PMI;
-   regs.trap = 0xf00;
+   regs.trap = INTERRUPT_PERFMON;
performance_monitor_exception(®s);
if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
hard_irq_disable();
-- 
2.25.0

[PATCH 1/3] powerpc/8xx: Enhance readability of trap types

This patch makes use of trap types in head_8xx.S

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/interrupt.h | 29 
 arch/powerpc/kernel/head_8xx.S   | 49 ++--
 2 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index ed2c4042c3d1..cf2c5c3ae716 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -2,13 +2,6 @@
 #ifndef _ASM_POWERPC_INTERRUPT_H
 #define _ASM_POWERPC_INTERRUPT_H
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 /* BookE/4xx */
 #define INTERRUPT_CRITICAL_INPUT  0x100
 
@@ -39,9 +32,11 @@
 /* BookE/BookS/4xx/8xx */
 #define INTERRUPT_DATA_STORAGE0x300
 #define INTERRUPT_INST_STORAGE0x400
+#define INTERRUPT_EXTERNAL 0x500
 #define INTERRUPT_ALIGNMENT   0x600
 #define INTERRUPT_PROGRAM 0x700
 #define INTERRUPT_SYSCALL 0xc00
+#define INTERRUPT_TRACE0xd00
 
 /* BookE/BookS/44x */
 #define INTERRUPT_FP_UNAVAIL  0x800
@@ -53,6 +48,24 @@
 #define INTERRUPT_PERFMON 0x0
 #endif
 
+/* 8xx */
+#define INTERRUPT_SOFT_EMU_8xx 0x1000
+#define INTERRUPT_INST_TLB_MISS_8xx0x1100
+#define INTERRUPT_DATA_TLB_MISS_8xx0x1200
+#define INTERRUPT_INST_TLB_ERROR_8xx   0x1300
+#define INTERRUPT_DATA_TLB_ERROR_8xx   0x1400
+#define INTERRUPT_DATA_BREAKPOINT_8xx  0x1c00
+#define INTERRUPT_INST_BREAKPOINT_8xx  0x1d00
+
+#ifndef __ASSEMBLY__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
 static inline void nap_adjust_return(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC_970_NAP
@@ -514,4 +527,6 @@ static inline void interrupt_cond_local_irq_enable(struct 
pt_regs *regs)
local_irq_enable();
 }
 
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ASM_POWERPC_INTERRUPT_H */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index e3b066703eab..7d445e4342c0 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Value for the bits that have fixed value in RPN entries.
@@ -118,49 +119,49 @@ instruction_counter:
 #endif
 
 /* System reset */
-   EXCEPTION(0x100, Reset, system_reset_exception)
+   EXCEPTION(INTERRUPT_SYSTEM_RESET, Reset, system_reset_exception)
 
 /* Machine check */
-   START_EXCEPTION(0x200, MachineCheck)
-   EXCEPTION_PROLOG 0x200 MachineCheck handle_dar_dsisr=1
+   START_EXCEPTION(INTERRUPT_MACHINE_CHECK, MachineCheck)
+   EXCEPTION_PROLOG INTERRUPT_MACHINE_CHECK MachineCheck handle_dar_dsisr=1
prepare_transfer_to_handler
bl  machine_check_exception
b   interrupt_return
 
 /* External interrupt */
-   EXCEPTION(0x500, HardwareInterrupt, do_IRQ)
+   EXCEPTION(INTERRUPT_EXTERNAL, HardwareInterrupt, do_IRQ)
 
 /* Alignment exception */
-   START_EXCEPTION(0x600, Alignment)
-   EXCEPTION_PROLOG 0x600 Alignment handle_dar_dsisr=1
+   START_EXCEPTION(INTERRUPT_ALIGNMENT, Alignment)
+   EXCEPTION_PROLOG INTERRUPT_ALIGNMENT Alignment handle_dar_dsisr=1
prepare_transfer_to_handler
bl  alignment_exception
REST_NVGPRS(r1)
b   interrupt_return
 
 /* Program check exception */
-   START_EXCEPTION(0x700, ProgramCheck)
-   EXCEPTION_PROLOG 0x700 ProgramCheck
+   START_EXCEPTION(INTERRUPT_PROGRAM, ProgramCheck)
+   EXCEPTION_PROLOG INTERRUPT_PROGRAM ProgramCheck
prepare_transfer_to_handler
bl  program_check_exception
REST_NVGPRS(r1)
b   interrupt_return
 
 /* Decrementer */
-   EXCEPTION(0x900, Decrementer, timer_interrupt)
+   EXCEPTION(INTERRUPT_DECREMENTER, Decrementer, timer_interrupt)
 
 /* System call */
-   START_EXCEPTION(0xc00, SystemCall)
-   SYSCALL_ENTRY   0xc00
+   START_EXCEPTION(INTERRUPT_SYSCALL, SystemCall)
+   SYSCALL_ENTRY   INTERRUPT_SYSCALL
 
 /* Single step - not used on 601 */
-   EXCEPTION(0xd00, SingleStep, single_step_exception)
+   EXCEPTION(INTERRUPT_TRACE, SingleStep, single_step_exception)
 
 /* On the MPC8xx, this is a software emulation interrupt.  It occurs
  * for all unimplemented and illegal instructions.
  */
-   START_EXCEPTION(0x1000, SoftEmu)
-   EXCEPTION_PROLOG 0x1000 SoftEmu
+   START_EXCEPTION(INTERRUPT_SOFT_EMU_8xx, SoftEmu)
+   EXCEPTION_PROLOG INTERRUPT_SOFT_EMU_8xx SoftEmu
prepare_transfer_to_handler
bl  emulation_assist_interrupt
REST_NVGPRS(r1)
@@ -187,7 +188,7 @@ instruction_counter:
 #define INVALIDATE_ADJACENT_PAGES_CPU15(addr, tmp)
 #endif
 
-   START_EXCEPTION(0x1100, InstructionTLBMiss)
+   START_EXCEPTION(INTERRUPT_INST_TLB_MISS_8xx, InstructionTLBMiss)
mtspr   SPRN_SPRG_SCRATCH2, r10
mtspr   SPRN_M_TW, r11
 
@@ -243,7 +244,7 @@ instruction_counter:

Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-04-19 Thread Rob Herring

On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras  wrote:
>
> Hello Rob, thanks for this feedback!
>
> On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> > +PPC and PCI lists
> >
> > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  wrote:
> > >
> > > Many other resource flag parsers already add this flag when the input
> > > has bits 24 & 25 set, so update this one to do the same.
> >
> > Many others? Looks like sparc and powerpc to me.
> >
>
> s390 also does that, but it look like it comes from a device-tree.

I'm only looking at DT based platforms, and s390 doesn't use DT.

> > Those would be the
> > ones I worry about breaking. Sparc doesn't use of/address.c so it's
> > fine. Powerpc version of the flags code was only fixed in 2019, so I
> > don't think powerpc will care either.
>
> In powerpc I reach this function with this stack, while configuring a
> virtio-net device for a qemu/KVM pseries guest:
>
> pci_process_bridge_OF_ranges+0xac/0x2d4
> pSeries_discover_phbs+0xc4/0x158
> discover_phbs+0x40/0x60
> do_one_initcall+0x60/0x2d0
> kernel_init_freeable+0x308/0x3a8
> kernel_init+0x2c/0x168
> ret_from_kernel_thread+0x5c/0x70
>
> For this, both MMIO32 and MMIO64 resources will have flags 0x200.

Oh good, powerpc has 2 possible flags parsing functions. So in the
above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64?

Does pci_parse_of_flags() get called in your case?

> > I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> > the flags. AFAICT, that's not set anywhere outside of arch code. So
> > never for riscv, arm and arm64 at least. That leads me to
> > pci_std_update_resource() which is where the PCI code sets BARs and
> > just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> > IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> > neither is prefetch.
> >
>
> I am not sure if you mean here:
> a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect
> anything else, or
> b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64
> (or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since
> it's how it's added in powerpc/sparc, and else there is no point.

I'm wondering if a) is incomplete and PCI_BASE_ADDRESS_MEM_TYPE_64
also needs to be set. The question is ultimately are BARs getting set
correctly for 64-bit? It looks to me like they aren't.

Rob

Re: [PATCH v2 1/4] mm: pagewalk: Fix walk for hugepage tables

2021-04-19 Thread Steven Price


On 19/04/2021 11:47, Christophe Leroy wrote:

Pagewalk ignores hugepd entries and walk down the tables
as if it was traditionnal entries, leading to crazy result.

Add walk_hugepd_range() and use it to walk hugepage tables.

Signed-off-by: Christophe Leroy 


Looks correct to me, sadly I don't have a suitable system to test it.

Reviewed-by: Steven Price 


---
v2:
- Add a guard for NULL ops->pte_entry
- Take mm->page_table_lock when walking hugepage table, as suggested by 
follow_huge_pd()
---
  mm/pagewalk.c | 58 ++-
  1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e81640d9f177..9b3db11a4d1d 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,6 +58,45 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, 
unsigned long end,
return err;
  }
  
+#ifdef CONFIG_ARCH_HAS_HUGEPD

+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+unsigned long end, struct mm_walk *walk, int 
pdshift)
+{
+   int err = 0;
+   const struct mm_walk_ops *ops = walk->ops;
+   int shift = hugepd_shift(*phpd);
+   int page_size = 1 << shift;
+
+   if (!ops->pte_entry)
+   return 0;
+
+   if (addr & (page_size - 1))
+   return 0;
+
+   for (;;) {
+   pte_t *pte;
+
+   spin_lock(&walk->mm->page_table_lock);
+   pte = hugepte_offset(*phpd, addr, pdshift);
+   err = ops->pte_entry(pte, addr, addr + page_size, walk);
+   spin_unlock(&walk->mm->page_table_lock);
+
+   if (err)
+   break;
+   if (addr >= end - page_size)
+   break;
+   addr += page_size;
+   }
+   return err;
+}
+#else
+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+unsigned long end, struct mm_walk *walk, int 
pdshift)
+{
+   return 0;
+}
+#endif
+
  static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
  struct mm_walk *walk)
  {
@@ -108,7 +147,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, 
unsigned long end,
goto again;
}
  
-		err = walk_pte_range(pmd, addr, next, walk);

+   if (is_hugepd(__hugepd(pmd_val(*pmd
+   err = walk_hugepd_range((hugepd_t *)pmd, addr, next, 
walk, PMD_SHIFT);
+   else
+   err = walk_pte_range(pmd, addr, next, walk);
if (err)
break;
} while (pmd++, addr = next, addr != end);
@@ -157,7 +199,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, 
unsigned long end,
if (pud_none(*pud))
goto again;
  
-		err = walk_pmd_range(pud, addr, next, walk);

+   if (is_hugepd(__hugepd(pud_val(*pud
+   err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
walk, PUD_SHIFT);
+   else
+   err = walk_pmd_range(pud, addr, next, walk);
if (err)
break;
} while (pud++, addr = next, addr != end);
@@ -189,7 +234,9 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, 
unsigned long end,
if (err)
break;
}
-   if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
+   if (is_hugepd(__hugepd(p4d_val(*p4d
+   err = walk_hugepd_range((hugepd_t *)p4d, addr, next, 
walk, P4D_SHIFT);
+   else if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
err = walk_pud_range(p4d, addr, next, walk);
if (err)
break;
@@ -224,8 +271,9 @@ static int walk_pgd_range(unsigned long addr, unsigned long 
end,
if (err)
break;
}
-   if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry ||
-   ops->pte_entry)
+   if (is_hugepd(__hugepd(pgd_val(*pgd
+   err = walk_hugepd_range((hugepd_t *)pgd, addr, next, 
walk, PGDIR_SHIFT);
+   else if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry || 
ops->pte_entry)
err = walk_p4d_range(pgd, addr, next, walk);
if (err)
break;

Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set

2021-04-19 Thread Segher Boessenkool

On Mon, Apr 19, 2021 at 03:38:02PM +0200, Christophe Leroy wrote:
> Le 19/04/2021 à 15:32, Segher Boessenkool a écrit :
> >On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote:
> >>Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add
> >>externs for get_vr() and put_vr() in lib/sstep.c to fix the
> >>build errors.
> >
> >>  obj-$(CONFIG_PPC_FPU) += ldstfp.o
> >>+obj-$(CONFIG_ALTIVEC)  += ldstfp.o
> >
> >It is probably a good idea to split ldstfp.S into two, one for each of
> >the two configuration options?
> >
> 
> Or we can build it all the time and #ifdef the FPU part.
> 
> Because it contains FPU, ALTIVEC and VSX stuff.

So it becomes an empty object file if none of the options are selected?
Good idea :-)


Segher

Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-19 Thread Steven Price

On 19/04/2021 14:14, Christophe Leroy wrote:

Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:

Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

To be honest I don't fully understand why powerpc requires the
page_size - it appears to be using it purely to find "holes" in the
calls to note_page(), but I haven't worked out why such holes would
occur.

I was indeed introduced for KASAN. We have a first commit
https://github.com/torvalds/linux/commit/cabe8138 which uses page
size to detect whether it is a KASAN like stuff.

Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a
fix. I can't remember what the problem was exactly, something around
the use of hugepages for kernel memory, came as part of the series
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/

Ah, that's useful context. So it looks like powerpc took a different
route to reducing the KASAN output to x86.

Given the generic ptdump code has handling for KASAN already it should
be possible to drop that from the powerpc arch code, which I think
means we don't actually need to provide page size to notepage().
Hopefully that means more code to delete ;)

Looking at how the generic ptdump code handles KASAN, I'm a bit sceptic.

IIUC, it is checking that kasan_early_shadow_pte is in the same page as
the pgtable referred by the PMD entry. But what happens if that PMD
entry is referring another pgtable which is inside the same page as
kasan_early_shadow_pte ?

Shouldn't the test be

if (pmd_page_vaddr(val) == lm_alias(kasan_early_shadow_pte))
return note_kasan_page_table(walk, addr);

Now I come to look at this code again, I think you're right. On arm64
this doesn't cause a problem - page tables are page sized and page
aligned, so there couldn't be any non-KASAN pgtables sharing the page.
Obviously that's not necessarily true of other architectures.

Feel free to add a patch to your series ;)

Steve

Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set





Le 19/04/2021 à 15:32, Segher Boessenkool a écrit :

Hi!

On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote:

Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add
externs for get_vr() and put_vr() in lib/sstep.c to fix the
build errors.



  obj-$(CONFIG_PPC_FPU) += ldstfp.o
+obj-$(CONFIG_ALTIVEC)  += ldstfp.o


It is probably a good idea to split ldstfp.S into two, one for each of
the two configuration options?



Or we can build it all the time and #ifdef the FPU part.

Because it contains FPU, ALTIVEC and VSX stuff.

Christophe

Re: [PATCH 2/2] powerpc: add ALTIVEC support to lib/ when PPC_FPU not set

2021-04-19 Thread Segher Boessenkool

Hi!

On Sun, Apr 18, 2021 at 01:17:26PM -0700, Randy Dunlap wrote:
> Add ldstfp.o to the Makefile for CONFIG_ALTIVEC and add
> externs for get_vr() and put_vr() in lib/sstep.c to fix the
> build errors.

>  obj-$(CONFIG_PPC_FPU)+= ldstfp.o
> +obj-$(CONFIG_ALTIVEC)+= ldstfp.o

It is probably a good idea to split ldstfp.S into two, one for each of
the two configuration options?


Segher

Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

Randy Dunlap  writes:
> On 4/18/21 10:46 AM, Segher Boessenkool wrote:
>> On Sun, Apr 18, 2021 at 06:24:29PM +0200, Christophe Leroy wrote:
>>> Le 17/04/2021 à 22:17, Randy Dunlap a écrit :
 Should the code + Kconfigs/Makefiles handle that kind of
 kernel config or should ALTIVEC always mean PPC_FPU as well?
>>>
>>> As far as I understand, Altivec is completely independant of FPU in Theory. 
>> 
>> And, as far as the hardware is concerned, in practice as well.
>> 
>>> So it should be possible to use Altivec without using FPU.
>> 
>> Yup.
>> 
>>> However, until recently, it was not possible to de-activate FPU support on 
>>> book3s/32. I made it possible in order to reduce unneccessary processing on 
>>> processors like the 832x that has no FPU.
>> 
>> The processor has to implement FP to be compliant to any version of
>> PowerPC, as far as I know?  So that is all done by emulation, including
>> all the registers?  Wow painful.
>> 
>>> As far as I can see in cputable.h/.c, 832x is the only book3s/32 without 
>>> FPU, and it doesn't have ALTIVEC either.
>> 
>> 602 doesn't have double-precision hardware, also no 64-bit FP registers.
>> But that CPU was never any widely used :-)
>> 
>>> So we can in the future ensure that Altivec can be used without FPU 
>>> support, but for the time being I think it is OK to force selection of FPU 
>>> when selecting ALTIVEC in order to avoid build failures.
>> 
>> It is useful to allow MSR[VEC,FP]=1,0 but yeah there are no CPUs that
>> have VMX (aka AltiVec) but that do not have FP.  I don't see how making
>> that artificial dependency buys anything, but maybe it does?
>> 
 I have patches to fix the build errors with the config as
 reported but I don't know if that's the right thing to do...
>> 
>> Neither do we, we cannot see those patches :-)
>
> Sure.  I'll post them later today.
> They keep FPU and ALTIVEC as independent (build) features.

Those patches look OK.

But I don't think it makes sense to support that configuration, FPU=n
ALTVEC=y. No one is ever going to make a CPU like that. We have enough
testing surface due to configuration options, without adding artificial
combinations that no one is ever going to use.

IMHO :)

So I'd rather we just make ALTIVEC depend on FPU.

cheers

Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:

Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

To be honest I don't fully understand why powerpc requires the page_size - it appears to be using
it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes
would occur.

I was indeed introduced for KASAN. We have a first commit
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a
KASAN like stuff.

Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the
problem was exactly, something around the use of hugepages for kernel memory, came as part of the
series
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/

Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN
output to x86.

Given the generic ptdump code has handling for KASAN already it should be possible to drop that from
the powerpc arch code, which I think means we don't actually need to provide page size to
notepage(). Hopefully that means more code to delete ;)

Looking at how the generic ptdump code handles KASAN, I'm a bit sceptic.

IIUC, it is checking that kasan_early_shadow_pte is in the same page as the pgtable referred by the
PMD entry. But what happens if that PMD entry is referring another pgtable which is inside the same
page as kasan_early_shadow_pte ?

Shouldn't the test be

if (pmd_page_vaddr(val) == lm_alias(kasan_early_shadow_pte))
return note_kasan_page_table(walk, addr);

Christophe

Re: [PATCH 2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

2021-04-19 Thread Daniel Henrique Barboza





On 4/19/21 9:48 AM, Michael Ellerman wrote:

Daniel Henrique Barboza  writes:

The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
for both QEMU and phyp. This gives us an opportunity to use this
behavior to signal the hypervisor layer when an error during device
removal happens, allowing it to do a proper error handling, while not
breaking QEMU/phyp implementations that don't have this support.

This patch introduces this idea by unisolating all CPU DRCs that failed
to be removed by dlpar_cpu_remove_by_index(), when handling the
PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
only because its the only CPU removal event QEMU uses, and there's no
need at this moment to add this mechanism for phyp only code.


Have you also confirmed that phyp is not bothered by it? ie. everything
seems to continue working when you trigger this path on phyp.


Yes. Daniel Bueso (dbue...@us.ibm.com) from the partition firmware team
helped me with that. We confirmed that phyp returns RTAS_OK under these
conditions (Unisolating an unisolated/configured DRC).


Thanks,


DHB



cheers


diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..ed66895c2f51 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
case PSERIES_HP_ELOG_ACTION_REMOVE:
if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
rc = dlpar_cpu_remove_by_count(count);
-   else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
+   else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
rc = dlpar_cpu_remove_by_index(drc_index);
+   /* Setting the isolation state of an 
UNISOLATED/CONFIGURED
+* device to UNISOLATE is a no-op, but the hypervison 
can
+* use it as a hint that the cpu removal failed.
+*/
+   if (rc)
+   dlpar_unisolate_drc(drc_index);
+   }
else
rc = -EINVAL;
break;
--
2.30.2

Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-19 Thread Arnd Bergmann

On Mon, Apr 19, 2021 at 11:33 AM Dominique MARTINET
 wrote:
> Geert Uytterhoeven wrote on Mon, Apr 19, 2021 at 11:03:24AM +0200:
>
> > soc_device_match() should only be used as a last resort, to identify
> > systems that cannot be identified otherwise.  Typically this is used for
> > quirks, which should only be enabled on a very specific subset of
> > systems.  IMHO such systems should make sure soc_device_match()
> > is available early, by registering their SoC device early.
>
> I definitely agree there, my suggestion to defer was only because I know
> of no other way to influence the ordering of drivers loading reliably
> and gave up on soc being init'd early.

In some cases, you can use the device_link infrastructure to deal
with dependencies between devices. Not sure if this would help
in your case, but have a look at device_link_add() etc in drivers/base/core.c

> In this particular case the problem is that since 7d981405d0fd ("soc:
> imx8m: change to use platform driver") the soc probe tries to use the
> nvmem driver for ocotp fuses for imx8m devices, which isn't ready yet.
> So soc loading gets pushed back to the end of the list because it gets
> defered and other drivers relying on soc_device_match get confused
> because they wrongly think a device doesn't match a quirk when it
> actually does.
>
> If there is a way to ensure the nvmem driver gets loaded before the soc,
> that would also solve the problem nicely, and avoid the need to mess
> with all the ~50 drivers which use it.
>
> Is there a way to control in what order drivers get loaded? Something in
> the dtb perhaps?

For built-in drivers, load order depends on the initcall level and
link order (how things are lined listed in the Makefile hierarchy).

For loadable modules, this is up to user space in the end.

Which of the drivers in this scenario are loadable modules?

Arnd

Re: [PATCH 2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

Daniel Henrique Barboza  writes:
> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
> for both QEMU and phyp. This gives us an opportunity to use this
> behavior to signal the hypervisor layer when an error during device
> removal happens, allowing it to do a proper error handling, while not
> breaking QEMU/phyp implementations that don't have this support.
>
> This patch introduces this idea by unisolating all CPU DRCs that failed
> to be removed by dlpar_cpu_remove_by_index(), when handling the
> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
> only because its the only CPU removal event QEMU uses, and there's no
> need at this moment to add this mechanism for phyp only code.

Have you also confirmed that phyp is not bothered by it? ie. everything
seems to continue working when you trigger this path on phyp.

cheers

> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 12cbffd3c2e3..ed66895c2f51 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>   case PSERIES_HP_ELOG_ACTION_REMOVE:
>   if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>   rc = dlpar_cpu_remove_by_count(count);
> - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
>   rc = dlpar_cpu_remove_by_index(drc_index);
> + /* Setting the isolation state of an 
> UNISOLATED/CONFIGURED
> +  * device to UNISOLATE is a no-op, but the hypervison 
> can
> +  * use it as a hint that the cpu removal failed.
> +  */
> + if (rc)
> + dlpar_unisolate_drc(drc_index);
> + }
>   else
>   rc = -EINVAL;
>   break;
> -- 
> 2.30.2

Re: [PATCH] powerpc/pseries/mce: Fix a typo in error type assignment

Ganesh  writes:
> On 4/17/21 6:06 PM, Michael Ellerman wrote:
>
>> Ganesh Goudar  writes:
>>> The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.
>> Do you mean "is ICACHE not DCACHE" ?
>
> Right :), Should I send v2 ?

No I can fix it up :)

cheers

>>> Signed-off-by: Ganesh Goudar 
>>> ---
>>>   arch/powerpc/platforms/pseries/ras.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/ras.c 
>>> b/arch/powerpc/platforms/pseries/ras.c
>>> index f8b390a9d9fb..9d4ef65da7f3 100644
>>> --- a/arch/powerpc/platforms/pseries/ras.c
>>> +++ b/arch/powerpc/platforms/pseries/ras.c
>>> @@ -699,7 +699,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
>>> mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
>>> break;
>>> case MC_ERROR_TYPE_I_CACHE:
>>> -   mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
>>> +   mce_err.error_type = MCE_ERROR_TYPE_ICACHE;
>>> break;
>>> case MC_ERROR_TYPE_UNKNOWN:
>>> default:
>>> -- 
>>> 2.26.2

Re: [PATCH v2] perf vendor events: Initial json/events list for power10 platform

Kajol Jain  writes:
> Patch adds initial json/events for POWER10.

Acked-by: Michael Ellerman 

cheers

> Signed-off-by: Kajol Jain 
> Tested-by: Paul A. Clarke 
> Reviewed-by: Paul A. Clarke 
> ---
>  .../perf/pmu-events/arch/powerpc/mapfile.csv  |   1 +
>  .../arch/powerpc/power10/cache.json   |  47 +++
>  .../arch/powerpc/power10/floating_point.json  |   7 +
>  .../arch/powerpc/power10/frontend.json| 217 +
>  .../arch/powerpc/power10/locks.json   |  12 +
>  .../arch/powerpc/power10/marked.json  | 147 +
>  .../arch/powerpc/power10/memory.json  | 192 +++
>  .../arch/powerpc/power10/others.json  | 297 ++
>  .../arch/powerpc/power10/pipeline.json| 297 ++
>  .../pmu-events/arch/powerpc/power10/pmc.json  |  22 ++
>  .../arch/powerpc/power10/translation.json |  57 
>  11 files changed, 1296 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json
>  create mode 100644 
> tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json
>  create mode 100644 
> tools/perf/pmu-events/arch/powerpc/power10/translation.json
>
> ---
> Changelog:
> v1 -> v2
> - Removed inconsistencies in "BriefDescription" field and make sure
>   it will end with period without any space at the end.
>   Suggested by : Paul A. Clarke  
> - Added Tested-by and Reviewed-by tag.
> ---
> diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv 
> b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> index 229150e7ab7d..4abdfc3f9692 100644
> --- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> @@ -15,3 +15,4 @@
>  # Power8 entries
>  004[bcd][[:xdigit:]]{4},1,power8,core
>  004e[[:xdigit:]]{4},1,power9,core
> +0080[[:xdigit:]]{4},1,power10,core
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json 
> b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> new file mode 100644
> index ..95e33531fbc6
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> @@ -0,0 +1,47 @@
> +[
> +  {
> +"EventCode": "1003C",
> +"EventName": "PM_EXEC_STALL_DMISS_L2L3",
> +"BriefDescription": "Cycles in which the oldest instruction in the 
> pipeline was waiting for a load miss to resolve from either the local L2 or 
> local L3."
> +  },
> +  {
> +"EventCode": "34056",
> +"EventName": "PM_EXEC_STALL_LOAD_FINISH",
> +"BriefDescription": "Cycles in which the oldest instruction in the 
> pipeline was finishing a load after its data was reloaded from a data source 
> beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles 
> in which the NTF instruction merged with another load in the LMQ."
> +  },
> +  {
> +"EventCode": "3006C",
> +"EventName": "PM_RUN_CYC_SMT2_MODE",
> +"BriefDescription": "Cycles when this thread's run latch is set and the 
> core is in SMT2 mode."
> +  },
> +  {
> +"EventCode": "300F4",
> +"EventName": "PM_RUN_INST_CMPL_CONC",
> +"BriefDescription": "PowerPC instructions completed by this thread when 
> all threads in the core had the run-latch set."
> +  },
> +  {
> +"EventCode": "4C016",
> +"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
> +"BriefDescription": "Cycles in which the oldest instruction in the 
> pipeline was waiting for a load miss to resolve from the local L2 or local 
> L3, with a dispatch conflict."
> +  },
> +  {
> +"EventCode": "4D014",
> +"EventName": "PM_EXEC_STALL_LOAD",
> +"BriefDescription": "Cycles in which the oldest instruction in the 
> pipeline was a load instruction executing in the Load Store Unit."
> +  },
> +  {
> +"EventCode": "4D016",
> +"EventName": "PM_EXEC_STALL_PTESYNC",
> +"BriefDescription": "Cycles in which the oldest instruction in the 
> pipeline was a PTESYNC instruction executing in the Load Store Unit."
> +  },
> +  {
> +"EventCode": "401EA",
> +"EventName": "PM_THRESH_EXC_128",
> +"BriefDescription": "Threshold counter exceeded a value of 128."
> +  },
> +  {
> +"EventCode": "400F6",
> +"EventName": "PM_BR_MPRED_CMPL",
> +"BriefDescription": "A mispredicted branch completed. Includes direction 
> and target."
> +  }
> +]
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json 
> b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
> new file mo

Re: linux-next: build failure after merge of the powerpc tree

Xiongwei Song  writes:
> Thank you so much Stephen. Sorry for my negligence.

My fault. I forgot to run allyesconfig.

> Should I fix this myself on powerpc tree?

I'll fix it up.

cheers

Re: [PATCH v4] powerpc/kexec_file: use current CPU info while setting up FDT

Hari Bathini  writes:
> On 19/04/21 2:06 pm, Sourabh Jain wrote:
>> kexec_file_load uses initial_boot_params in setting up the device-tree
>> for the kernel to be loaded. Though initial_boot_params holds info
>> about CPUs at the time of boot, it doesn't account for hot added CPUs.
>> 
>> So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
>> kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
>> kexec_file_load syscall and the system crashes on a hot added CPU,
>> capture kernel hangs failing to identify the boot CPU.
>> 
>>   Kernel panic - not syncing: sysrq triggered crash
>>   CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream 
>> #54
>>   Call Trace:
>>   [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
>>   [c000e590fb00] [c0145290] panic+0x16c/0x41c
>>   [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
>>   [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
>>   [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
>>   [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
>>   [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
>>   [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
>>   [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
>>   [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
>>   --- interrupt: c00 at 0x7fff905b9664
>>   NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
>>   REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
>>   MSR:  8280f033   CR: 28000242
>> XER: 
>>   IRQMASK: 0
>>   GPR00: 0004 75fedf30 7fff906a7300 0001
>>   GPR04: 01002a7355b0 0002 0001 75fef616
>>   GPR08: 0001   
>>   GPR12:  7fff9073a160  
>>   GPR16:    
>>   GPR20:  7fff906a4ee0 0002 0001
>>   GPR24: 7fff906a0898  0002 01002a7355b0
>>   GPR28: 0002 7fff906a1790 01002a7355b0 0002
>>   NIP [7fff905b9664] 0x7fff905b9664
>>   LR [7fff905320c4] 0x7fff905320c4
>>   --- interrupt: c00
>> 
>> To avoid this from happening, extract current CPU info from of_root
>> device node and use it for setting up the fdt in kexec_file_load case.
>> 
>> Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory 
>> reserve map")
>> 
>> Signed-off-by: Sourabh Jain 
>> Reviewed-by: Hari Bathini 
>> Cc: 
>> ---
>>   arch/powerpc/kexec/file_load_64.c | 98 +++
>>   1 file changed, 98 insertions(+)
>> 
>>   ---
>> Changelog:
>> 
>> v1 -> v3
>>- https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-April/227756.html
>> 
>> v3 -> v4
>>- Rearranged if-else statement in update_cpus_node function to avoid
>>  redundant checks for positive cpus_offset.
>>   ---
>> 
>> diff --git a/arch/powerpc/kexec/file_load_64.c 
>> b/arch/powerpc/kexec/file_load_64.c
>> index 02b9e4d0dc40..195ef303d530 100644
>> --- a/arch/powerpc/kexec/file_load_64.c
>> +++ b/arch/powerpc/kexec/file_load_64.c
>> @@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
>> *image)
>>  return fdt_size;
>>   }
>>   
>> +/**
>> + * add_node_prop - Read property from device node structure and add
>> + *  them to fdt.
>> + * @fdt:Flattened device tree of the kernel
>> + * @node_offset:offset of the node to add a property at
>> + * np:  device node pointer
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int add_node_prop(void *fdt, int node_offset, const struct 
>> device_node *np)
>> +{
>> +int ret = 0;
>> +struct property *pp;
>> +unsigned long flags;
>> +
>> +if (!np)
>> +return -EINVAL;
>> +
>> +raw_spin_lock_irqsave(&devtree_lock, flags);
>> +for (pp = np->properties; pp; pp = pp->next) {
>> +ret = fdt_setprop(fdt, node_offset, pp->name,
>> +  pp->value, pp->length);
>> +if (ret < 0) {
>> +pr_err("Unable to add %s property: %s\n",
>> +   pp->name, fdt_strerror(ret));
>> +goto out;
>> +}
>> +}
>> +out:
>> +raw_spin_unlock_irqrestore(&devtree_lock, flags);
>> +return ret;
>> +}
>> +
>> +/**
>> + * update_cpus_node - Update cpus node of flattened device-tree using 
>> of_root
>> + *  device node.
>> + * @fdt:Flattened device tree of the kernel.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int update_cpus_node(void *fdt)
>> +{
>> +struct device_node *cpu

[PATCH] powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabled

The changes to add KUAP support with the hash MMU broke booting of KVM
PR guests. The symptom is no visible progress of the guest, or possibly
just "SLOF" being printed to the qemu console.

Host code is still executing, but breaking into xmon might show a stack
trace such as:

  __might_fault+0x84/0xe0 (unreliable)
  kvm_read_guest+0x1c8/0x2f0 [kvm]
  kvmppc_ld+0x1b8/0x2d0 [kvm]
  kvmppc_load_last_inst+0x50/0xa0 [kvm]
  kvmppc_exit_pr_progint+0x178/0x220 [kvm_pr]
  kvmppc_handle_exit_pr+0x31c/0xe30 [kvm_pr]
  after_sprg3_load+0x80/0x90 [kvm_pr]
  kvmppc_vcpu_run_pr+0x104/0x260 [kvm_pr]
  kvmppc_vcpu_run+0x34/0x48 [kvm]
  kvm_arch_vcpu_ioctl_run+0x340/0x450 [kvm]
  kvm_vcpu_ioctl+0x2ac/0x8c0 [kvm]
  sys_ioctl+0x320/0x1060
  system_call_exception+0x160/0x270
  system_call_common+0xf0/0x27c

Bisect points to commit b2ff33a10c8b ("powerpc/book3s64/hash/kuap:
Enable kuap on hash"), but that's just the commit that enabled KUAP with
hash and made the bug visible.

The root cause seems to be that KVM PR is creating kernel mappings that
don't use the correct key, since we switched to using key 3.

We have a helper for adding the right key value, however it's designed
to take a pteflags variable, which the KVM code doesn't have. But we can
make it work by passing 0 for the pteflags, and tell it explicitly that
it should use the kernel key.

With that changed guests boot successfully.

Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with 
hash translation")
Cc: sta...@vger.kernel.org # v5.11+
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index e452158a18d7..5ac66be1cb3c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -133,6 +134,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct 
kvmppc_pte *orig_pte,
else
kvmppc_mmu_flush_icache(pfn);
 
+   rflags |= pte_to_hpte_pkey_bits(0, HPTE_USE_KERNEL_KEY);
rflags = (rflags & ~HPTE_R_WIMG) | orig_pte->wimg;
 
/*
-- 
2.25.1

Re: [PATCH v4] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-19 Thread Hari Bathini





On 19/04/21 2:06 pm, Sourabh Jain wrote:

kexec_file_load uses initial_boot_params in setting up the device-tree
for the kernel to be loaded. Though initial_boot_params holds info
about CPUs at the time of boot, it doesn't account for hot added CPUs.

So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
kexec_file_load syscall and the system crashes on a hot added CPU,
capture kernel hangs failing to identify the boot CPU.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
  Call Trace:
  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
  [c000e590fb00] [c0145290] panic+0x16c/0x41c
  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
  [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
  REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
  MSR:  8280f033   CR: 28000242
XER: 
  IRQMASK: 0
  GPR00: 0004 75fedf30 7fff906a7300 0001
  GPR04: 01002a7355b0 0002 0001 75fef616
  GPR08: 0001   
  GPR12:  7fff9073a160  
  GPR16:    
  GPR20:  7fff906a4ee0 0002 0001
  GPR24: 7fff906a0898  0002 01002a7355b0
  GPR28: 0002 7fff906a1790 01002a7355b0 0002
  NIP [7fff905b9664] 0x7fff905b9664
  LR [7fff905320c4] 0x7fff905320c4
  --- interrupt: c00

To avoid this from happening, extract current CPU info from of_root
device node and use it for setting up the fdt in kexec_file_load case.

Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve 
map")

Signed-off-by: Sourabh Jain 
Reviewed-by: Hari Bathini 
Cc: 
---
  arch/powerpc/kexec/file_load_64.c | 98 +++
  1 file changed, 98 insertions(+)

  ---
Changelog:

v1 -> v3
   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-April/227756.html

v3 -> v4
   - Rearranged if-else statement in update_cpus_node function to avoid
 redundant checks for positive cpus_offset.
  ---

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 02b9e4d0dc40..195ef303d530 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
*image)
return fdt_size;
  }
  
+/**

+ * add_node_prop - Read property from device node structure and add
+ * them to fdt.
+ * @fdt:   Flattened device tree of the kernel
+ * @node_offset:   offset of the node to add a property at
+ * np: device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_prop(void *fdt, int node_offset, const struct device_node 
*np)
+{
+   int ret = 0;
+   struct property *pp;
+   unsigned long flags;
+
+   if (!np)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&devtree_lock, flags);
+   for (pp = np->properties; pp; pp = pp->next) {
+   ret = fdt_setprop(fdt, node_offset, pp->name,
+ pp->value, pp->length);
+   if (ret < 0) {
+   pr_err("Unable to add %s property: %s\n",
+  pp->name, fdt_strerror(ret));
+   goto out;
+   }
+   }
+out:
+   raw_spin_unlock_irqrestore(&devtree_lock, flags);
+   return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device-tree using of_root
+ * device node.
+ * @fdt:   Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int update_cpus_node(void *fdt)
+{
+   struct device_node *cpus_node, *dn;
+   int cpus_offset, cpus_subnode_off, ret = 0;
+
+   cpus_offset = fdt_path_offset(fdt, "/cpus");
+   if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+   pr_err("Malformed device tree: error reading /cpus node: %s\n",
+  fdt_strerror(cpus_offse

[PATCH v2] perf vendor events: Initial json/events list for power10 platform

2021-04-19 Thread Kajol Jain

Patch adds initial json/events for POWER10.

Signed-off-by: Kajol Jain 
Tested-by: Paul A. Clarke 
Reviewed-by: Paul A. Clarke 
---
 .../perf/pmu-events/arch/powerpc/mapfile.csv  |   1 +
 .../arch/powerpc/power10/cache.json   |  47 +++
 .../arch/powerpc/power10/floating_point.json  |   7 +
 .../arch/powerpc/power10/frontend.json| 217 +
 .../arch/powerpc/power10/locks.json   |  12 +
 .../arch/powerpc/power10/marked.json  | 147 +
 .../arch/powerpc/power10/memory.json  | 192 +++
 .../arch/powerpc/power10/others.json  | 297 ++
 .../arch/powerpc/power10/pipeline.json| 297 ++
 .../pmu-events/arch/powerpc/power10/pmc.json  |  22 ++
 .../arch/powerpc/power10/translation.json |  57 
 11 files changed, 1296 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json
 create mode 100644 
tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/translation.json

---
Changelog:
v1 -> v2
- Removed inconsistencies in "BriefDescription" field and make sure
  it will end with period without any space at the end.
  Suggested by : Paul A. Clarke  
- Added Tested-by and Reviewed-by tag.
---
diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv 
b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
index 229150e7ab7d..4abdfc3f9692 100644
--- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv
+++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
@@ -15,3 +15,4 @@
 # Power8 entries
 004[bcd][[:xdigit:]]{4},1,power8,core
 004e[[:xdigit:]]{4},1,power9,core
+0080[[:xdigit:]]{4},1,power10,core
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json 
b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
new file mode 100644
index ..95e33531fbc6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
@@ -0,0 +1,47 @@
+[
+  {
+"EventCode": "1003C",
+"EventName": "PM_EXEC_STALL_DMISS_L2L3",
+"BriefDescription": "Cycles in which the oldest instruction in the 
pipeline was waiting for a load miss to resolve from either the local L2 or 
local L3."
+  },
+  {
+"EventCode": "34056",
+"EventName": "PM_EXEC_STALL_LOAD_FINISH",
+"BriefDescription": "Cycles in which the oldest instruction in the 
pipeline was finishing a load after its data was reloaded from a data source 
beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles 
in which the NTF instruction merged with another load in the LMQ."
+  },
+  {
+"EventCode": "3006C",
+"EventName": "PM_RUN_CYC_SMT2_MODE",
+"BriefDescription": "Cycles when this thread's run latch is set and the 
core is in SMT2 mode."
+  },
+  {
+"EventCode": "300F4",
+"EventName": "PM_RUN_INST_CMPL_CONC",
+"BriefDescription": "PowerPC instructions completed by this thread when 
all threads in the core had the run-latch set."
+  },
+  {
+"EventCode": "4C016",
+"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
+"BriefDescription": "Cycles in which the oldest instruction in the 
pipeline was waiting for a load miss to resolve from the local L2 or local L3, 
with a dispatch conflict."
+  },
+  {
+"EventCode": "4D014",
+"EventName": "PM_EXEC_STALL_LOAD",
+"BriefDescription": "Cycles in which the oldest instruction in the 
pipeline was a load instruction executing in the Load Store Unit."
+  },
+  {
+"EventCode": "4D016",
+"EventName": "PM_EXEC_STALL_PTESYNC",
+"BriefDescription": "Cycles in which the oldest instruction in the 
pipeline was a PTESYNC instruction executing in the Load Store Unit."
+  },
+  {
+"EventCode": "401EA",
+"EventName": "PM_THRESH_EXC_128",
+"BriefDescription": "Threshold counter exceeded a value of 128."
+  },
+  {
+"EventCode": "400F6",
+"EventName": "PM_BR_MPRED_CMPL",
+"BriefDescription": "A mispredicted branch completed. Includes direction 
and target."
+  }
+]
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json 
b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
new file mode 100644
index ..e9b92f282d3c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
@@ -0,0 +1,7 @@
+[
+  {
+"EventCode": "4016E",
+"EventName": "PM_THRESH_NOT_MET",
+"BriefDescription": "Threshold counter did not meet threshold."
+  }
+

Re: [PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER

2021-04-19 Thread Christoph Lameter

On Mon, 19 Apr 2021, Anshuman Khandual wrote:

> >> Unfortunately the build test fails on both the platforms (powerpc and ia64)
> >> which subscribe HUGETLB_PAGE_SIZE_VARIABLE and where this check would make
> >> sense. I some how overlooked the cross compile build failure that actually
> >> detected this problem.
> >>
> >> But wondering why this assert is not holding true ? and how these platforms
> >> do not see the warning during boot (or do they ?) at mm/vmscan.c:1092 like
> >> arm64 did.
> >>
> >> static int __fragmentation_index(unsigned int order, struct 
> >> contig_page_info *info)
> >> {
> >>  unsigned long requested = 1UL << order;
> >>
> >>  if (WARN_ON_ONCE(order >= MAX_ORDER))
> >>  return 0;
> >> 
> >>
> >> Can pageblock_order really exceed MAX_ORDER - 1 ?

You can have larger blocks but you would need to allocate multiple
contigous max order blocks or do it at boot time before the buddy
allocator is active.

What IA64 did was to do this at boot time thereby avoiding the buddy
lists. And it had a separate virtual address range and page table for the
huge pages.

Looks like the current code does these allocations via CMA which should
also bypass the buddy allocator.

> > }
> >
> >
> > But it's kind of weird, isn't it? Let's assume we have MAX_ORDER - 1 
> > correspond to 4 MiB and pageblock_order correspond to 8 MiB.
> >
> > Sure, we'd be grouping pages in 8 MiB chunks, however, we cannot even
> > allocate 8 MiB chunks via the buddy. So only alloc_contig_range()
> > could really grab them (IOW: gigantic pages).
>
> Right.

But then you can avoid the buddy allocator.

> > Further, we have code like deferred_free_range(), where we end up
> > calling __free_pages_core()->...->__free_one_page() with
> > pageblock_order. Wouldn't we end up setting the buddy order to
> > something > MAX_ORDER -1 on that path?
>
> Agreed.

We would need to return the supersized block to the huge page pool and not
to the buddy allocator. There is a special callback in the compound page
sos that you can call an alternate free function that is not the buddy
allocator.

>
> >
> > Having pageblock_order > MAX_ORDER feels wrong and looks shaky.
> >
> Agreed, definitely does not look right. Lets see what other folks
> might have to say on this.
>
> + Christoph Lameter 
>

It was done for a long time successfully and is running in numerous
configurations.

[PATCH v2 3/4] powerpc/mm: Properly coalesce pages in ptdump

Commit aaa229529244 ("powerpc/mm: Add physical address to Linux page
table dump") changed range coalescing to only combine ranges that are
both virtually and physically contiguous, in order to avoid erroneous
combination of unrelated mappings in IOREMAP space.

But in the VMALLOC space, mappings almost never have contiguous
physical pages, so the commit mentionned above leads to dumping one
line per page for vmalloc mappings.

Taking into account the vmalloc always leave a gap between two areas,
we never have two mappings dumped as a single combination even if they
have the exact same flags. The only space that may have encountered
such an issue was the early IOREMAP which is not using vmalloc engine.
But previous commits added gaps between early IO mappings, so it is
not an issue anymore.

That commit created some difficulties with KASAN mappings, see
commit cabe8138b23c ("powerpc: dump as a single line areas mapping a
single physical page.") and with huge page, see
commit b00ff6d8c1c3 ("powerpc/ptdump: Properly handle non standard
page size").

So, almost revert commit aaa229529244 to properly coalesce pages
mapped with the same flags as before, only keep the display of the
first physical address of the range, as it can be usefull especially
for IO mappings.

It brings back powerpc at the same level as other architectures and
simplifies the conversion to GENERIC PTDUMP.

With the patch:

---[ kasan shadow mem start ]---
0xf800-0xf8ff  0x070016M   hugerw   present 
  dirty  accessed
0xf900-0xf91f  0x01434000 2M   rpresent 
 accessed
0xf920-0xf95a  0x02104000  3776K   rw   present 
  dirty  accessed
0xfef5c000-0xfeff  0x01434000   656K   rpresent 
 accessed
---[ kasan shadow mem end ]---

Before:

---[ kasan shadow mem start ]---
0xf800-0xf8ff  0x070016M   hugerw   present 
  dirty  accessed
0xf900-0xf91f  0x0143400016K   rpresent 
 accessed
0xf920-0xf9203fff  0x0210400016K   rw   present 
  dirty  accessed
0xf9204000-0xf9207fff  0x0213c00016K   rw   present 
  dirty  accessed
0xf9208000-0xf920bfff  0x0217400016K   rw   present 
  dirty  accessed
0xf920c000-0xf920  0x0218800016K   rw   present 
  dirty  accessed
0xf921-0xf9213fff  0x021dc00016K   rw   present 
  dirty  accessed
0xf9214000-0xf9217fff  0x022216K   rw   present 
  dirty  accessed
0xf9218000-0xf921bfff  0x023c16K   rw   present 
  dirty  accessed
0xf921c000-0xf921  0x023d400016K   rw   present 
  dirty  accessed
0xf922-0xf9227fff  0x023ec00032K   rw   present 
  dirty  accessed
...
0xf93b8000-0xf93e3fff  0x02614000   176K   rw   present 
  dirty  accessed
0xf93e4000-0xf94c3fff  0x027c   896K   rw   present 
  dirty  accessed
0xf94c4000-0xf94c7fff  0x0236c00016K   rw   present 
  dirty  accessed
0xf94c8000-0xf94cbfff  0x041f16K   rw   present 
  dirty  accessed
0xf94cc000-0xf94c  0x029c16K   rw   present 
  dirty  accessed
0xf94d-0xf94d3fff  0x041ec00016K   rw   present 
  dirty  accessed
0xf94d4000-0xf94d7fff  0x0407c00016K   rw   present 
  dirty  accessed
0xf94d8000-0xf94f7fff  0x041c   128K   rw   present 
  dirty  accessed
...
0xf95ac000-0xf95a  0x042b16K   rw   present 
  dirty  accessed
0xfef5c000-0xfeff  0x0143400016K   rpresent 
 accessed
---[ kasan shadow mem end ]---

Signed-off-by: Christophe Leroy 
Cc: Oliver O'Halloran 
---
 arch/powerpc/mm/ptdump/ptdump.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index aca354fb670b..5062c58b1e5b 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -58,8 +58,6 @@ struct pg_state {
const struct addr_marker *marker;
unsigned long start_address;
unsigned long start_pa;
-   unsigned long last_pa;
-   unsigned long page_size;
unsigned int level;
u64 current_flags;
bool check_wx;
@@ -163,8 +161,6 @@ static void dump_flag_info(struct pg_state *st, const 
struct flag_info
 
 static void dump_addr(struct pg_state *st, unsigned long addr)
 {
-   unsigned long delta;
-
 #ifdef CONFIG_P

[PATCH v2 4/4] powerpc/mm: Convert powerpc to GENERIC_PTDUMP

This patch converts powerpc to the generic PTDUMP implementation.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/Kconfig.debug|  30 --
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/mmu_decl.h|   2 +-
 arch/powerpc/mm/ptdump/8xx.c  |   6 +-
 arch/powerpc/mm/ptdump/Makefile   |   9 +-
 arch/powerpc/mm/ptdump/book3s64.c |   6 +-
 arch/powerpc/mm/ptdump/ptdump.c   | 165 --
 arch/powerpc/mm/ptdump/shared.c   |   6 +-
 9 files changed, 68 insertions(+), 160 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 475d77a6ebbe..40259437a28f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -120,6 +120,7 @@ config PPC
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
+   select ARCH_HAS_DEBUG_WXif STRICT_KERNEL_RWX
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
@@ -177,6 +178,7 @@ config PPC
select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_PCI_IOMAPif PCI
+   select GENERIC_PTDUMP
select GENERIC_SMP_IDLE_THREAD
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6342f9da4545..05b1180ea502 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -360,36 +360,6 @@ config FAIL_IOMMU
 
  If you are unsure, say N.
 
-config PPC_PTDUMP
-   bool "Export kernel pagetable layout to userspace via debugfs"
-   depends on DEBUG_KERNEL && DEBUG_FS
-   help
- This option exports the state of the kernel pagetables to a
- debugfs file. This is only useful for kernel developers who are
- working in architecture specific areas of the kernel - probably
- not a good idea to enable this feature in a production kernel.
-
- If you are unsure, say N.
-
-config PPC_DEBUG_WX
-   bool "Warn on W+X mappings at boot"
-   depends on PPC_PTDUMP && STRICT_KERNEL_RWX
-   help
- Generate a warning if any W+X mappings are found at boot.
-
- This is useful for discovering cases where the kernel is leaving
- W+X mappings after applying NX, as such mappings are a security risk.
-
- Note that even if the check fails, your kernel is possibly
- still fine, as W+X mappings are not a security hole in
- themselves, what they do is that they make the exploitation
- of other unfixed kernel bugs easier.
-
- There is no runtime or memory usage effect of this option
- once the kernel has booted up - it's a one time check.
-
- If in doubt, say "Y".
-
 config PPC_FAST_ENDIAN_SWITCH
bool "Deprecated fast endian-switch syscall"
depends on DEBUG_KERNEL && PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index c3df3a8501d4..c90d58aaebe2 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_PPC_MM_SLICES)   += slice.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
-obj-$(CONFIG_PPC_PTDUMP)   += ptdump/
+obj-$(CONFIG_PTDUMP_CORE)  += ptdump/
 obj-$(CONFIG_KASAN)+= kasan/
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 7dac910c0b21..dd1cabc2ea0f 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -180,7 +180,7 @@ static inline void mmu_mark_rodata_ro(void) { }
 void __init mmu_mapin_immr(void);
 #endif
 
-#ifdef CONFIG_PPC_DEBUG_WX
+#ifdef CONFIG_DEBUG_WX
 void ptdump_check_wx(void);
 #else
 static inline void ptdump_check_wx(void) { }
diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index 86da2a669680..fac932eb8f9a 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -75,8 +75,10 @@ static const struct flag_info flag_array[] = {
 };
 
 struct pgtable_level pg_level[5] = {
-   {
-   }, { /* pgd */
+   { /* pgd */
+   .flag   = flag_array,
+   .num= ARRAY_SIZE(flag_array),
+   }, { /* p4d */
.flag   = flag_array,
.num= ARRAY_SIZE(flag_array),
}, { /* pud */
diff --git a/arch/powerpc/mm/ptdump/Makefile b/arch/powerpc/mm/ptdump/Makefile
index 712762be3cb1..4050cbb55acf 100644
--- a/arch/powerpc/mm/ptdump/Makefile
+++ b/arch/powerpc/mm/ptdump/Makefile
@@ -5,5 +5,10 @@ obj-y  += ptdump.o
 obj-$(CONFIG_4xx)  += shared.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3E_MMU)   += shared.o
-obj-$(CONFIG_PPC_BOOK3S_32)+= shared.o bats.o segment_regs.o
-obj-$(CONFIG_PPC_BOOK3S_64)+= book3s64.o hashpageta

[PATCH v2 1/4] mm: pagewalk: Fix walk for hugepage tables

Pagewalk ignores hugepd entries and walk down the tables
as if it was traditionnal entries, leading to crazy result.

Add walk_hugepd_range() and use it to walk hugepage tables.

Signed-off-by: Christophe Leroy 
---
v2:
- Add a guard for NULL ops->pte_entry
- Take mm->page_table_lock when walking hugepage table, as suggested by 
follow_huge_pd()
---
 mm/pagewalk.c | 58 ++-
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e81640d9f177..9b3db11a4d1d 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,6 +58,45 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, 
unsigned long end,
return err;
 }
 
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+unsigned long end, struct mm_walk *walk, int 
pdshift)
+{
+   int err = 0;
+   const struct mm_walk_ops *ops = walk->ops;
+   int shift = hugepd_shift(*phpd);
+   int page_size = 1 << shift;
+
+   if (!ops->pte_entry)
+   return 0;
+
+   if (addr & (page_size - 1))
+   return 0;
+
+   for (;;) {
+   pte_t *pte;
+
+   spin_lock(&walk->mm->page_table_lock);
+   pte = hugepte_offset(*phpd, addr, pdshift);
+   err = ops->pte_entry(pte, addr, addr + page_size, walk);
+   spin_unlock(&walk->mm->page_table_lock);
+
+   if (err)
+   break;
+   if (addr >= end - page_size)
+   break;
+   addr += page_size;
+   }
+   return err;
+}
+#else
+static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
+unsigned long end, struct mm_walk *walk, int 
pdshift)
+{
+   return 0;
+}
+#endif
+
 static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
  struct mm_walk *walk)
 {
@@ -108,7 +147,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, 
unsigned long end,
goto again;
}
 
-   err = walk_pte_range(pmd, addr, next, walk);
+   if (is_hugepd(__hugepd(pmd_val(*pmd
+   err = walk_hugepd_range((hugepd_t *)pmd, addr, next, 
walk, PMD_SHIFT);
+   else
+   err = walk_pte_range(pmd, addr, next, walk);
if (err)
break;
} while (pmd++, addr = next, addr != end);
@@ -157,7 +199,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, 
unsigned long end,
if (pud_none(*pud))
goto again;
 
-   err = walk_pmd_range(pud, addr, next, walk);
+   if (is_hugepd(__hugepd(pud_val(*pud
+   err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
walk, PUD_SHIFT);
+   else
+   err = walk_pmd_range(pud, addr, next, walk);
if (err)
break;
} while (pud++, addr = next, addr != end);
@@ -189,7 +234,9 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, 
unsigned long end,
if (err)
break;
}
-   if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
+   if (is_hugepd(__hugepd(p4d_val(*p4d
+   err = walk_hugepd_range((hugepd_t *)p4d, addr, next, 
walk, P4D_SHIFT);
+   else if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
err = walk_pud_range(p4d, addr, next, walk);
if (err)
break;
@@ -224,8 +271,9 @@ static int walk_pgd_range(unsigned long addr, unsigned long 
end,
if (err)
break;
}
-   if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry ||
-   ops->pte_entry)
+   if (is_hugepd(__hugepd(pgd_val(*pgd
+   err = walk_hugepd_range((hugepd_t *)pgd, addr, next, 
walk, PGDIR_SHIFT);
+   else if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry || 
ops->pte_entry)
err = walk_p4d_range(pgd, addr, next, walk);
if (err)
break;
-- 
2.25.0

[PATCH v2 0/4] Convert powerpc to GENERIC_PTDUMP

This series converts powerpc to generic PTDUMP.

For that, we first need to add missing hugepd support
to pagewalk and ptdump.

v2:
- Reworked the pagewalk modification to add locking and check ops->pte_entry
- Modified powerpc early IO mapping to have gaps between mappings
- Removed the logic that checked for contiguous physical memory
- Removed the articial level calculation in ptdump_pte_entry(), level 4 is ok 
for all.
- Removed page_size argument to note_page()

Christophe Leroy (4):
  mm: pagewalk: Fix walk for hugepage tables
  powerpc/mm: Leave a gap between early allocated IO areas
  powerpc/mm: Properly coalesce pages in ptdump
  powerpc/mm: Convert powerpc to GENERIC_PTDUMP

 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/Kconfig.debug|  30 -
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/ioremap_32.c  |   4 +-
 arch/powerpc/mm/ioremap_64.c  |   2 +-
 arch/powerpc/mm/mmu_decl.h|   2 +-
 arch/powerpc/mm/ptdump/8xx.c  |   6 +-
 arch/powerpc/mm/ptdump/Makefile   |   9 +-
 arch/powerpc/mm/ptdump/book3s64.c |   6 +-
 arch/powerpc/mm/ptdump/ptdump.c   | 187 --
 arch/powerpc/mm/ptdump/shared.c   |   6 +-
 mm/pagewalk.c |  58 -
 12 files changed, 127 insertions(+), 187 deletions(-)

-- 
2.25.0

[PATCH v2 2/4] powerpc/mm: Leave a gap between early allocated IO areas