Re: 回复: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 14/06/2022 at 08:34, Greg KH wrote:
> On Tue, Jun 14, 2022 at 06:09:35AM +, Wenhu Wang wrote:
>>>
>>> Odd indentation, did you use checkpatch.pl on your patch?
>>>
>>
>> Actually, I checked with the scripts, and there was no warning here.
>> I also checked in text editors and vim, if I translate tab with 4 spaces,
>> the "vma/mem" areas in the 5 lines were aligned.
> 
> Tabs in Linux are always 8 spaces wide.
> 

See 
https://docs.kernel.org/process/coding-style.html?highlight=coding+style#indentation

Tabs are 8 characters, and thus indentations are also 8 characters. 
There are heretic movements that try to make indentations 4 (or even 2!) 
characters deep, and that is akin to trying to define the value of PI to 
be 3.
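
To see why a line that looks aligned with 4-column tabs is not aligned at 8, here is a small userspace sketch. It is not kernel code, and visual_width() is a hypothetical helper; it only computes the display column a string reaches for a given tab width.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the display column reached after printing
 * `s`, with tab stops every `tab_width` columns.
 */
static size_t visual_width(const char *s, size_t tab_width)
{
	size_t col = 0;

	assert(tab_width > 0);
	for (; *s; s++) {
		if (*s == '\t')
			col += tab_width - (col % tab_width);	/* jump to next tab stop */
		else
			col++;
	}
	return col;
}
```

With this, "\tvma" ends at column 11 with 8-column tabs but at column 7 with 4-column tabs, so continuation lines padded by eye in a 4-column editor land in the wrong place for everyone else.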

Christophe

Re: 回复: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 14/06/2022 at 08:09, Wenhu Wang wrote:
>>> +static const struct vm_operations_struct uio_cache_sram_vm_ops = {
>>> +#ifdef CONFIG_HAVE_IOREMAP_PROT
>>
>> Same here.
>>
> 
> I tried to eliminate it in mainline
> See: [PATCH v2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files
> https://lkml.org/lkml/2022/6/10/695
> 

I looked at that patch.

I don't think you can just drop the #ifdef in function 
__access_remote_vm() in mm/memory.c

You have to replace it with something like:

	if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
		break;
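
As a rough illustration of why this works, here is a userspace re-creation of the preprocessor trick. The kernel's real macros live in include/linux/kconfig.h; every DEMO_* and CONFIG_DEMO_* name below is hypothetical, and demo_copy_chunks() is only a stand-in for a loop like the one in __access_remote_vm().

```c
#include <assert.h>

/* Simplified re-creation: expands to 1 if the option is defined as 1, else 0. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED_2(option)

#define CONFIG_DEMO_ON 1	/* stands in for an option set to y */
/* CONFIG_DEMO_OFF is deliberately left undefined */

/*
 * Count the chunks handled before bailing out. `fail_at` is the chunk
 * whose normal path fails (negative for none).
 */
static int demo_copy_chunks(int nchunks, int fail_at)
{
	int done = 0;

	assert(nchunks >= 0);
	for (int i = 0; i < nchunks; i++) {
		if (i == fail_at) {
			/* Normal path failed; without the option there is no fallback. */
			if (!DEMO_IS_ENABLED(CONFIG_DEMO_OFF))
				break;
			/* Fallback code would go here; it is compiled either way. */
		}
		done++;
	}
	return done;
}
```

Unlike #ifdef, the fallback code stays visible to the compiler and is type-checked in every configuration, while the constant condition lets the optimizer drop it when the option is off.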

Christophe

Re: 回复: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy




On 14/06/2022 at 09:18, Christophe Leroy wrote:
> 
> 
> On 14/06/2022 at 08:09, Wenhu Wang wrote:
>>>> +static const struct vm_operations_struct uio_cache_sram_vm_ops = {
>>>> +#ifdef CONFIG_HAVE_IOREMAP_PROT
>>>
>>> Same here.
>>>
>>
>> I tried to eliminate it in mainline
>> See: [PATCH v2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files
>> https://lkml.org/lkml/2022/6/10/695
>>
> 
> I looked at that patch.
> 
> I don't think you can just drop the #ifdef in function 
> __access_remote_vm() in mm/memory.c
> 
> You have to replace it with something like:
> 
>      if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>          break;
> 

Another thing in that patch:

By making generic_access_phys() a static inline, it means that every time 
you refer to the address of that function in a vm_operations_struct 
struct, the compiler has to provide an outlined instance of the 
function. It means you'll likely have several instances of 
generic_access_phys().


What you could do instead is to add the following at the start of 
generic_access_phys() in mm/memory.c :


	if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
		return 0;
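
A minimal userspace sketch of that shape, under stated assumptions: DEMO_HAVE_IOREMAP_PROT stands in for IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT), and struct demo_vm_ops and demo_access_phys() are hypothetical miniatures, not the real mm/memory.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT). */
#define DEMO_HAVE_IOREMAP_PROT 0

struct demo_vm_ops {
	int (*access)(unsigned long addr, void *buf, int len, int write);
};

/* One out-of-line definition, as it would live in a single .c file. */
int demo_access_phys(unsigned long addr, void *buf, int len, int write)
{
	assert(len >= 0);
	if (!DEMO_HAVE_IOREMAP_PROT)
		return 0;	/* nothing accessed when the feature is off */

	(void)addr;
	(void)buf;
	(void)write;
	return len;	/* placeholder for the real copy */
}

/* Every ops table refers to the same single instance. */
static const struct demo_vm_ops demo_ops_a = { .access = demo_access_phys };
static const struct demo_vm_ops demo_ops_b = { .access = demo_access_phys };
```

Because the function is defined once and only its address is taken, no per-user copies are emitted, and callers still see a return value of 0 when the option is off.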


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Wenhu Wang
>> >> +
>> >> +struct mpc85xx_l2ctlr {
>> >> + u32 ctl;/* 0x000 - L2 control */
>> >
>> >What is the endian of these u32 values?  You map them directly to
>> >memory, so they must be specified some way, right?  Please make it
>> >obvious what they are.
>> >
>>
>> Surely, the values should be u32 here, modified in v2
>> The controller info could be found in
>> "QorIQ™ P2020 Integrated Processor Reference Manual"
>> "Chapter 6 L2 Look-Aside Cache/SRAM"
>> See: http://m4udit.dinauz.org/P2020RM_rev0.pdf
>
>That's not the answer to my question :)
>
>These are big-endian, right?  Please mark them as such and access them
>properly with the correct functions.

Yes, they are big-endian.
Would it work to add comments (about byte order and access functions) 
ahead of the structure?
And to append a suffix like "_be", "_access_be" or "_big_endian"? (struct 
mpc85xx_l2ctlr_be {……};)

>Tabs in Linux are always 8 spaces wide.
>

I will re-check the indentation in v2.

>>
>> I looked at that patch.
>>
>> I don't think you can just drop the #ifdef in function
>> __access_remote_vm() in mm/memory.c
>>
>> You have to replace it with something like:
>>
>>  if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>>  break;
>>
>
>
>Another thing in that patch:
>
>By making generic_access_phys() a static inline, it means that everytime
>you refer to the address of that function in a vm_operations_struct
>struct, the compiler has to provide an outlined instance of the
>function. It means you'll likely have several instances of a
>generic_access_phys().
>
>What you could do instead is to add the following at the start of
>generic_access_phys() in mm/memory.c :
>
>if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>return 0;
>

It is really a better choice, thanks for the advice.
Multiple instances would exist, as you mentioned. The block returns 0, just as
the no-op instance did, so the function's return value is unchanged.

I will update the patch after re-checking.

Thanks,
Wenhu

Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Greg KH
On Tue, Jun 14, 2022 at 07:53:46AM +, Wenhu Wang wrote:
> >> >> +
> >> >> +struct mpc85xx_l2ctlr {
> >> >> + u32 ctl;/* 0x000 - L2 control */
> >> >
> >> >What is the endian of these u32 values?  You map them directly to
> >> >memory, so they must be specified some way, right?  Please make it
> >> >obvious what they are.
> >> >
> >>
> >> Surely, the values should be u32 here, modified in v2
> >> The controller info could be found in
> >> "QorIQ™ P2020 Integrated Processor Reference Manual"
> >> "Chapter 6 L2 Look-Aside Cache/SRAM"
> >> See: http://m4udit.dinauz.org/P2020RM_rev0.pdf
> >
> >That's not the answer to my question :)
> >
> >These are big-endian, right?  Please mark them as such and access them
> >properly with the correct functions.
> 
> > Yes, they are big-endian.
> > Does it work to add comments(about order and access functions) for the 
> > structure ahead of it?
> > And appending like "_be", "_access_be" or "_big_endian"? (struct 
> > mpc85xx_l2ctlr_be {……};

No, not comments, these should be of the type __be32, right?
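
As a rough userspace sketch of what typing the registers buys: in the kernel, __be32 is a sparse-checked type and MMIO is read through helpers such as in_be32() or ioread32be(). The demo_* names below are hypothetical stand-ins that only mimic the idea, namely that a big-endian value cannot be used as a plain integer without an explicit conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __be32: a distinct type, so raw use fails to compile. */
typedef struct { uint8_t b[4]; } demo_be32;

struct demo_l2ctlr {
	demo_be32 ctl;	/* 0x000 - L2 control, big-endian */
};

/* Decode a big-endian 32-bit value regardless of host byte order. */
static uint32_t demo_be32_to_cpu(demo_be32 v)
{
	return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
	       ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}

/* Encode a host value into big-endian byte order. */
static demo_be32 demo_cpu_to_be32(uint32_t x)
{
	demo_be32 v = { { (uint8_t)(x >> 24), (uint8_t)(x >> 16),
			  (uint8_t)(x >> 8), (uint8_t)x } };
	return v;
}
```

The byte order is then part of the declaration itself, which is what a comment or a "_be" suffix on the struct name cannot enforce.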

thanks,

greg k-h


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Wenhu Wang
>On Tue, Jun 14, 2022 at 07:53:46AM +, Wenhu Wang wrote:
>> >> >> +
>> >> >> +struct mpc85xx_l2ctlr {
>> >> >> + u32 ctl;/* 0x000 - L2 control */
>> >> >
>> >> >What is the endian of these u32 values?  You map them directly to
>> >> >memory, so they must be specified some way, right?  Please make it
>> >> >obvious what they are.
>> >> >
>> >>
>> >> Surely, the values should be u32 here, modified in v2
>> >> The controller info could be found in
>> >> "QorIQ P2020 Integrated Processor Reference Manual"
>> >> "Chapter 6 L2 Look-Aside Cache/SRAM"
>> >> See: http://m4udit.dinauz.org/P2020RM_rev0.pdf
>> >
>> >That's not the answer to my question :)
>> >
>> >These are big-endian, right?  Please mark them as such and access them
>> >properly with the correct functions.
>>
>> Yes, they are big-endian.
>> Does it work to add comments(about order and access functions) for the 
>> structure ahead of it?
>> And appending like "_be", "_access_be" or "_big_endian"? (struct 
>> mpc85xx_l2ctlr_be {...};
>
>No, not comments, these should be of the type __be32, right?
>

Yes, understood. That is clear and straightforward.
I will update those in patch v2.

Thanks,
Wenhu

Re: [PATCH] cxl: Fix refcount leak in cxl_calc_capp_routing

2022-06-14 Thread Andrew Donnellan
On Sun, 2022-06-05 at 10:00 +0400, Miaoqian Lin wrote:
> of_get_next_parent() returns a node pointer with refcount
> incremented,
> we should use of_node_put() on it when it is not needed anymore.
> This function only calls of_node_put() in normal path,
> missing it in the error path.
> Add missing of_node_put() to avoid refcount leak.
> 
> Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
> Signed-off-by: Miaoqian Lin 

Thanks!

Acked-by: Andrew Donnellan 

> ---
>  drivers/misc/cxl/pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 3de0aea62ade..62385a529d86 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -387,6 +387,7 @@ int cxl_calc_capp_routing(struct pci_dev *dev,
> u64 *chipid,
> rc = get_phb_index(np, phb_index);
> if (rc) {
> pr_err("cxl: invalid phb index\n");
> +   of_node_put(np);
> return rc;
> }
>  
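
The pattern the fix restores can be sketched in userspace as follows. struct demo_node and the demo_* functions are hypothetical miniatures, not the real device-tree API; the point is only that every exit path after the get must have a matching put.

```c
#include <assert.h>

/* Hypothetical miniature of a refcounted node; not the real struct device_node. */
struct demo_node {
	int refcount;
};

static struct demo_node *demo_node_get(struct demo_node *np)
{
	np->refcount++;
	return np;
}

static void demo_node_put(struct demo_node *np)
{
	assert(np->refcount > 0);
	np->refcount--;
}

/* Returns 0 on success or a negative value on error; never leaks a reference. */
static int demo_calc_routing(struct demo_node *parent, int index_rc)
{
	struct demo_node *np = demo_node_get(parent);	/* reference taken */

	if (index_rc) {
		demo_node_put(np);	/* the put that the error path was missing */
		return index_rc;
	}

	demo_node_put(np);	/* normal path */
	return 0;
}
```

Without the put in the error branch, each failed call would leave the count one higher than before.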




[PATCH 2/3] powerpc/32: Remove 'noltlbs' kernel parameter

2022-06-14 Thread Christophe Leroy
Mapping without large TLBs has no added value on the 8xx.

Mapping without large TLBs is still necessary on 40x when
selecting CONFIG_KFENCE or CONFIG_DEBUG_PAGEALLOC or
CONFIG_STRICT_KERNEL_RWX, but this is done automatically
and doesn't require user selection.

Remove 'noltlbs' kernel parameter, the user has no reason
to use it.

Signed-off-by: Christophe Leroy 
---
 Documentation/admin-guide/kernel-parameters.txt | 3 ---
 arch/powerpc/mm/init_32.c   | 3 ---
 arch/powerpc/mm/nohash/8xx.c| 9 -
 3 files changed, 15 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 96de3f1ece00..2322e429150d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3661,9 +3661,6 @@
 
nolapic_timer   [X86-32,APIC] Do not use the local APIC timer.
 
-   noltlbs [PPC] Do not use large page/tlb entries for kernel
-   lowmem mapping on PPC40x and PPC8xx
-
nomca   [IA-64] Disable machine check abort handling
 
nomce   [X86-32] Disable Machine Check Exception
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 321794747ea1..6f2e6210c273 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -79,9 +79,6 @@ unsigned long __max_low_memory = MAX_LOW_MEM;
  */
 static void __init MMU_setup(void)
 {
-   if (strstr(boot_command_line, "noltlbs")) {
-   __map_without_ltlbs = 1;
-   }
if (IS_ENABLED(CONFIG_PPC_8xx))
return;
 
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 27f9186ae374..6b668ccef836 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -14,8 +14,6 @@
 
 #define IMMR_SIZE (FIX_IMMR_SIZE << PAGE_SHIFT)
 
-extern int __map_without_ltlbs;
-
 static unsigned long block_mapped_ram;
 
 /*
@@ -28,8 +26,6 @@ phys_addr_t v_block_mapped(unsigned long va)
 
if (va >= VIRT_IMMR_BASE && va < VIRT_IMMR_BASE + IMMR_SIZE)
return p + va - VIRT_IMMR_BASE;
-   if (__map_without_ltlbs)
-   return 0;
if (va >= PAGE_OFFSET && va < PAGE_OFFSET + block_mapped_ram)
return __pa(va);
return 0;
@@ -45,8 +41,6 @@ unsigned long p_block_mapped(phys_addr_t pa)
 
if (pa >= p && pa < p + IMMR_SIZE)
return VIRT_IMMR_BASE + pa - p;
-   if (__map_without_ltlbs)
-   return 0;
if (pa < block_mapped_ram)
return (unsigned long)__va(pa);
return 0;
@@ -153,9 +147,6 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
mmu_mapin_immr();
 
-   if (__map_without_ltlbs)
-   return 0;
-
mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, true);
if (debug_pagealloc_enabled_or_kfence()) {
top = boundary;
-- 
2.36.1



[PATCH 3/3] powerpc/32: Remove __map_without_ltlbs

2022-06-14 Thread Christophe Leroy
__map_without_ltlbs is used only for 40x, and only when
STRICT_KERNEL_RWX, KFENCE or DEBUG_PAGEALLOC is active.

Do the verification directly in 40x version of mmu_mapin_ram()
and remove __map_without_ltlbs from core ppc32.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/init_32.c| 23 ---
 arch/powerpc/mm/nohash/40x.c |  9 +++--
 2 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 6f2e6210c273..62d9af6606cd 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -69,29 +69,9 @@ EXPORT_SYMBOL(agp_special_page);
 
 void MMU_init(void);
 
-int __map_without_ltlbs;
-
 /* max amount of low RAM to map in */
 unsigned long __max_low_memory = MAX_LOW_MEM;
 
-/*
- * Check for command-line options that affect what MMU_init will do.
- */
-static void __init MMU_setup(void)
-{
-   if (IS_ENABLED(CONFIG_PPC_8xx))
-   return;
-
-   if (IS_ENABLED(CONFIG_KFENCE))
-   __map_without_ltlbs = 1;
-
-   if (debug_pagealloc_enabled())
-   __map_without_ltlbs = 1;
-
-   if (strict_kernel_rwx_enabled())
-   __map_without_ltlbs = 1;
-}
-
 /*
  * MMU_init sets up the basic memory mappings for the kernel,
  * including both RAM and possibly some I/O regions,
@@ -102,9 +82,6 @@ void __init MMU_init(void)
if (ppc_md.progress)
ppc_md.progress("MMU:enter", 0x111);
 
-   /* parse args from command line */
-   MMU_setup();
-
/*
 * Reserve gigantic pages for hugetlb.  This MUST occur before
 * lowmem_end_addr is initialized below.
diff --git a/arch/powerpc/mm/nohash/40x.c b/arch/powerpc/mm/nohash/40x.c
index b32e465a3d52..3684d6e570fb 100644
--- a/arch/powerpc/mm/nohash/40x.c
+++ b/arch/powerpc/mm/nohash/40x.c
@@ -43,7 +43,6 @@
 
 #include 
 
-extern int __map_without_ltlbs;
 /*
  * MMU_init_hw does the chip-specific initialization of the MMU hardware.
  */
@@ -94,7 +93,13 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
p = 0;
s = total_lowmem;
 
-   if (__map_without_ltlbs)
+   if (IS_ENABLED(CONFIG_KFENCE))
+   return 0;
+
+   if (debug_pagealloc_enabled())
+   return 0;
+
+   if (strict_kernel_rwx_enabled())
return 0;
 
while (s >= LARGE_PAGE_SIZE_16M) {
-- 
2.36.1



[PATCH 1/3] powerpc/32: Remove the 'nobats' kernel parameter

2022-06-14 Thread Christophe Leroy
Mapping without BATs doesn't bring any added value to the user.

Remove that option.

Signed-off-by: Christophe Leroy 
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ---
 arch/powerpc/mm/book3s32/mmu.c  |  2 +-
 arch/powerpc/mm/init_32.c   | 11 ---
 arch/powerpc/mm/mmu_decl.h  |  1 -
 arch/powerpc/platforms/83xx/misc.c  | 14 ++
 5 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..96de3f1ece00 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3495,9 +3495,6 @@
 
noautogroup Disable scheduler automatic task group creation.
 
-   nobats  [PPC] Do not use BATs for mapping kernel lowmem
-   on "Classic" PPC cores.
-
nocache [ARM]
 
nodsp   [SH] Disable hardware DSP at boot time.
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 49a737fbbd18..1794132db31e 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -161,7 +161,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
 
 
-   if (debug_pagealloc_enabled_or_kfence() || __map_without_bats) {
+   if (debug_pagealloc_enabled_or_kfence()) {
pr_debug_once("Read-Write memory mapped without BATs\n");
if (base >= border)
return base;
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 693a3a7a9463..321794747ea1 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -69,12 +69,6 @@ EXPORT_SYMBOL(agp_special_page);
 
 void MMU_init(void);
 
-/*
- * this tells the system to map all of ram with the segregs
- * (i.e. page tables) instead of the bats.
- * -- Cort
- */
-int __map_without_bats;
 int __map_without_ltlbs;
 
 /* max amount of low RAM to map in */
@@ -85,11 +79,6 @@ unsigned long __max_low_memory = MAX_LOW_MEM;
  */
 static void __init MMU_setup(void)
 {
-   /* Check for nobats option (used in mapin_ram). */
-   if (strstr(boot_command_line, "nobats")) {
-   __map_without_bats = 1;
-   }
-
if (strstr(boot_command_line, "noltlbs")) {
__map_without_ltlbs = 1;
}
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 63c4b1a4d435..229c72e49198 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -92,7 +92,6 @@ extern void mapin_ram(void);
 extern void setbat(int index, unsigned long virt, phys_addr_t phys,
   unsigned int size, pgprot_t prot);
 
-extern int __map_without_bats;
 extern unsigned int rtas_data, rtas_size;
 
 struct hash_pte;
diff --git a/arch/powerpc/platforms/83xx/misc.c 
b/arch/powerpc/platforms/83xx/misc.c
index 3285dabcf923..2fb2a85d131f 100644
--- a/arch/powerpc/platforms/83xx/misc.c
+++ b/arch/powerpc/platforms/83xx/misc.c
@@ -121,17 +121,15 @@ void __init mpc83xx_setup_pci(void)
 
 void __init mpc83xx_setup_arch(void)
 {
+   phys_addr_t immrbase = get_immrbase();
+   int immrsize = IS_ALIGNED(immrbase, SZ_2M) ? SZ_2M : SZ_1M;
+   unsigned long va = fix_to_virt(FIX_IMMR_BASE);
+
if (ppc_md.progress)
ppc_md.progress("mpc83xx_setup_arch()", 0);
 
-   if (!__map_without_bats) {
-   phys_addr_t immrbase = get_immrbase();
-   int immrsize = IS_ALIGNED(immrbase, SZ_2M) ? SZ_2M : SZ_1M;
-   unsigned long va = fix_to_virt(FIX_IMMR_BASE);
-
-   setbat(-1, va, immrbase, immrsize, PAGE_KERNEL_NCG);
-   update_bats();
-   }
+   setbat(-1, va, immrbase, immrsize, PAGE_KERNEL_NCG);
+   update_bats();
 }
 
 int machine_check_83xx(struct pt_regs *regs)
-- 
2.36.1



[PATCH 1/2] powerpc/32: Call mmu_mark_initmem_nx() regardless of data block mapping.

2022-06-14 Thread Christophe Leroy
mark_initmem_nx() calls either mmu_mark_initmem_nx() or
set_memory_attr() based on return from v_block_mapped()
of _sinittext.

But we can now handle text and data independently, so that
text may be mapped by block even when data is mapped by pages.

On the 8xx for instance, at startup 32Mbytes of memory are
pinned in TLB. So the pinned entries need to go away for sinittext.

In the next patch, a BAT will be set to also cover sinittext on book3s/32.
So it will also be necessary to call mmu_mark_initmem_nx() even when
data above sinittext is not mapped with BATs.

As this is highly dependent on the platform, call mmu_mark_initmem_nx()
regardless of data block mapping. Then the platform will know what to
do.

Modify 8xx mmu_mark_initmem_nx() so that inittext mapping is modified
only when pagealloc debug and kfence are not active, otherwise inittext
is mapped with standard pages. And don't do anything on kernel text
which is already mapped with PAGE_KERNEL_TEXT.
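
The resulting control flow in mark_initmem_nx() can be sketched like this. The counters and demo_* functions are hypothetical stand-ins for the real MMU and page-table work; only the ordering of the calls mirrors the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical call counters standing in for the real MMU and page-table work. */
static int demo_platform_calls;
static int demo_page_calls;

static void demo_mmu_mark_initmem_nx(void)
{
	demo_platform_calls++;
}

static void demo_set_memory_nx_rw(void)
{
	demo_page_calls++;
}

/*
 * New flow: the platform hook always runs; page attributes are changed
 * only when init text is not block mapped.
 */
static void demo_mark_initmem_nx(bool sinittext_block_mapped)
{
	demo_mmu_mark_initmem_nx();

	if (!sinittext_block_mapped)
		demo_set_memory_nx_rw();
}
```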

Cc: Maxime Bizon 
Fixes: da1adea07576 ("powerpc/8xx: Allow STRICT_KERNEL_RwX with pinned TLB")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/nohash/8xx.c | 4 ++--
 arch/powerpc/mm/pgtable_32.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 27f9186ae374..1ee08c3efe5b 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -179,8 +179,8 @@ void mmu_mark_initmem_nx(void)
unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : 
etext8;
unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
 
-   mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
-   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
+   if (!debug_pagealloc_enabled_or_kfence())
+   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
 
mmu_pin_tlb(block_mapped_ram, false);
 }
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index a56ade39dc68..3ac73f9fb5d5 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -135,9 +135,9 @@ void mark_initmem_nx(void)
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 PFN_DOWN((unsigned long)_sinittext);
 
-   if (v_block_mapped((unsigned long)_sinittext)) {
-   mmu_mark_initmem_nx();
-   } else {
+   mmu_mark_initmem_nx();
+
+   if (!v_block_mapped((unsigned long)_sinittext)) {
set_memory_nx((unsigned long)_sinittext, numpages);
set_memory_rw((unsigned long)_sinittext, numpages);
}
-- 
2.36.1



[PATCH 2/2] powerpc/32: Set an IBAT covering up to _einittext during init

2022-06-14 Thread Christophe Leroy
Always set an IBAT covering up to _einittext during init because when
CONFIG_MODULES is not selected there is no reason to have an exception
handler for kernel instruction TLB misses.

It implies DBAT and IBAT are now totally independent: IBATs are set
by setibat() and DBATs by setbat().

This allows reverting commit 9bb162fa26ed ("powerpc/603: Fix
boot failure with DEBUG_PAGEALLOC and KFENCE")
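
Since a BAT size must be a power of two, the IBAT size is the offset of _einittext rounded up. A userspace sketch of that rounding, where demo_roundup_pow_of_two() is a hypothetical equivalent of the kernel's roundup_pow_of_two() and the 5 MiB figure is made up for illustration:

```c
#include <assert.h>

/* Hypothetical userspace equivalent of the kernel's roundup_pow_of_two(). */
static unsigned long demo_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);				/* undefined for 0 */
	assert(n <= (~0UL >> 1) + 1);		/* result must fit in unsigned long */
	while (p < n)
		p <<= 1;
	return p;
}
```

For example, if _einittext sat 5 MiB above PAGE_OFFSET, the IBAT would cover 8 MiB.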

Reported-by: Maxime Bizon 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_book3s_32.S |  4 ++--
 arch/powerpc/mm/book3s32/mmu.c   | 10 --
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 6c739beb938c..519b60695167 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -418,14 +418,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || 
defined(CONFIG_KFENCE)
+#ifdef CONFIG_MODULES
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
rlwinm  r2, r2, 28, 0xf000
-#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || 
defined(CONFIG_KFENCE)
+#ifdef CONFIG_MODULES
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 49a737fbbd18..40029280c320 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -159,7 +159,10 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 {
unsigned long done;
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
+   unsigned long size;
 
+   size = roundup_pow_of_two((unsigned long)_einittext - PAGE_OFFSET);
+   setibat(0, PAGE_OFFSET, 0, size, PAGE_KERNEL_X);
 
if (debug_pagealloc_enabled_or_kfence() || __map_without_bats) {
pr_debug_once("Read-Write memory mapped without BATs\n");
@@ -245,10 +248,9 @@ void mmu_mark_rodata_ro(void)
 }
 
 /*
- * Set up one of the I/D BAT (block address translation) register pairs.
+ * Set up one of the D BAT (block address translation) register pairs.
  * The parameters are not checked; in particular size must be a power
  * of 2 between 128k and 256M.
- * On 603+, only set IBAT when _PAGE_EXEC is set
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
   unsigned int size, pgprot_t prot)
@@ -284,10 +286,6 @@ void __init setbat(int index, unsigned long virt, 
phys_addr_t phys,
/* G bit must be zero in IBATs */
flags &= ~_PAGE_EXEC;
}
-   if (flags & _PAGE_EXEC)
-   bat[0] = bat[1];
-   else
-   bat[0].batu = bat[0].batl = 0;
 
bat_addrs[index].start = virt;
bat_addrs[index].limit = virt + ((bl + 1) << 17) - 1;
-- 
2.36.1



Re: [PATCH 1/2] powerpc:mm: export symbol ioremap_coherent

2022-06-14 Thread Michael Ellerman
Wang Wenhu  writes:
> The function ioremap_coherent may be called by modules such as
> fsl_85xx_cache_sram. So export it for access in other modules.

ioremap_coherent() is powerpc specific, and only has one other caller,
I'd like to remove it.

Does ioremap_cache() work for you?

cheers

> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> index 4f12504fb405..08a00dacef0b 100644
> --- a/arch/powerpc/mm/ioremap.c
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -40,6 +40,7 @@ void __iomem *ioremap_coherent(phys_addr_t addr, unsigned 
> long size)
>   return iowa_ioremap(addr, size, prot, caller);
>   return __ioremap_caller(addr, size, prot, caller);
>  }
> +EXPORT_SYMBOL(ioremap_coherent);
>  
>  void __iomem *ioremap_prot(phys_addr_t addr, unsigned long size, unsigned 
> long flags)
>  {
> -- 
> 2.25.1


Re: [PATCH 1/6] powerpc: Add ZERO_GPRS macros for register clears

2022-06-14 Thread Segher Boessenkool
On Tue, Jun 14, 2022 at 02:31:01PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Sat, Jun 11, 2022 at 08:42:27AM +, Christophe Leroy wrote:
> >> I'd have a preference for using a verb, for instance ZEROISE_REGS or 
> >> CLEAR_REGS
> >
> > "Zero" is a verb as well (as well as a noun and an adjective) :-)
> 
> And "clear" is also a verb and an adjective, though helpfully the noun
> is "clearing" :D

Yeah.  I don't like "clear" here because it isn't as clear what it
actually will do.  The purpose here is to actually make those registers
hold the number zero, it isn't just to make them harmless in some way.  Which
btw is the context in which "zeroisation" is normally used: in crypto
and other security stuff.

But "zero_regs" can be confusing in other ways, like, if it isn't clear
to the reader it is a verb here.

> We could use "nullify", that has some existing usage in the kernel,
> although I don't really mind, "zeroise" sounds kind of cool :)

That is a benefit yes :-)  And it won't be confusing what it does.


Segher


Re: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work

2022-06-14 Thread 'Sebastian Andrzej Siewior'
On 2022-06-09 15:46:04 [+], David Laight wrote:
> From: Sebastian Andrzej Siewior
> > Sent: 09 June 2022 16:03
> > 
> > On 2022-05-30 16:15:11 [-0700], Davidlohr Bueso wrote:
> > > Tasklets have long been deprecated as being too heavy on the system
> > > by running in irq context - and this is not a performance critical
> > > path. If a higher priority process wants to run, it must wait for
> > > the tasklet to finish before doing so.
> > >
> > > Process srps asynchronously in process context in a dedicated
> > > single threaded workqueue.
> > 
> > I would suggest threaded interrupts instead. The pattern here is the
> > same as in the previous driver except here is less locking.
> 
> How long do these actions runs for, and what is waiting for
> them to finish?

That is something only someone with the hardware and the workload can answer.

> These changes seem to drop the priority from above that of the
> highest priority RT process down to that of a default priority
> user process.
> There is no real guarantee that the latter will run 'any time soon'.

Not sure I can follow. Using threaded interrupts will run at FIFO-50 by
default. Workqueue however is SCHED_OTHER. But then it is not bound to
any CPU so it will run on an available CPU.

> Consider some workloads I'm setting up where most of the cpu are
> likely to spend 90%+ of the time running processes under the RT
> scheduler that are processing audio.
> 
> It is quite likely that a non-RT thread (especially one bound
> to a specific cpu) won't run for several milliseconds.
> (We have to go through 'hoops' to avoid dropping ethernet frames.)
> 
> I'd have thought that some of these kernel threads really
> need to run at a 'middling' RT priority.

The threaded interrupts do this by default. If you run your own RT
threads you need to decide if they are more or less important than the
interrupts.

>   David

Sebastian


RE: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work

2022-06-14 Thread David Laight
From: Sebastian Andrzej Siewior
> Sent: 14 June 2022 14:26
...
> > These changes seem to drop the priority from above that of the
> > highest priority RT process down to that of a default priority
> > user process.
> > There is no real guarantee that the latter will run 'any time soon'.
> 
> Not sure I can follow. Using threaded interrupts will run at FIFO-50 by
> default. Workqueue however is SCHED_OTHER. But then it is not bound to
> any CPU so it will run on an available CPU.

Ok, I'd only looked at normal workqueues, softints and napi.
They are all SCHED_OTHER.

Unbound FIFO is moderately ok - they are sticky but can move.
The only problem is that they won't move if a process is
spinning in kernel on the cpu they last ran on.

David



[PATCH] powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address

2022-06-14 Thread Andrew Donnellan
Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS
call through the RTAS filter if the buffer address is 0.

According to PAPR, ibm,platform-dump is called with a null buffer address
to notify the platform firmware that processing of a particular dump is
finished.

Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an
application such as rtas_errd that is attempting to retrieve a dump will
encounter an error at the end of the retrieval process.
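
A simplified userspace sketch of the check: the RMO window bounds and all demo_* names are hypothetical, and the real filter walks a table of per-call argument indexes rather than taking the buffer directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical RMO buffer window; the real bounds come from the platform. */
#define DEMO_RMO_BASE 0x1000UL
#define DEMO_RMO_SIZE 0x1000UL

static bool demo_in_rmo_buf(unsigned long base, unsigned long end)
{
	return base >= DEMO_RMO_BASE && base <= end &&
	       end < DEMO_RMO_BASE + DEMO_RMO_SIZE;
}

/* Return true when the call must be blocked. */
static bool demo_block_call(const char *name, unsigned long base,
			    unsigned long size)
{
	unsigned long end;

	assert(name != NULL);
	if (size == 0)
		size = 1;
	end = base + size - 1;

	/* NULL buffer on ibm,platform-dump signals end of dump processing. */
	if (!strcmp(name, "ibm,platform-dump") && base == 0)
		return false;

	return !demo_in_rmo_buf(base, end);
}
```

Only the named call with a zero address is let through; any other buffer outside the window is still rejected.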

Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: sta...@vger.kernel.org
Reported-by: Sathvika Vasireddy 
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/rtas.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index a6fce3106e02..693133972294 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1071,7 +1071,7 @@ static struct rtas_filter rtas_filters[] __ro_after_init 
= {
{ "get-time-of-day", -1, -1, -1, -1, -1 },
{ "ibm,get-vpd", -1, 0, -1, 1, 2 },
{ "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
-   { "ibm,platform-dump", -1, 4, 5, -1, -1 },
+   { "ibm,platform-dump", -1, 4, 5, -1, -1 },  /* Special 
cased */
{ "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
{ "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
{ "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
@@ -1120,6 +1120,15 @@ static bool block_rtas_call(int token, int nargs,
size = 1;
 
end = base + size - 1;
+
+   /*
+* Special case for ibm,platform-dump - NULL buffer
+* address is used to indicate end of dump processing
+*/
+   if (!strcmp(f->name, "ibm,platform-dump") &&
+   base == 0)
+   return false;
+
if (!in_rmo_buf(base, end))
goto err;
}
-- 
2.30.2



[PATCH v2 0/4] Extending NMI watchdog during LPM

2022-06-14 Thread Laurent Dufour
When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must be transferred from the
start node.

It depends on the activity in the partition, but the more CPUs the partition
has, the more memory is likely to remain to be transferred. This causes latency
when accessing pages that need to be transferred, and often, for large
partitions, it triggers the NMI watchdog.

The NMI watchdog dumps the stack of the CPU where it appears to be
stuck. In this case, the dump does not bring much information since it can
happen during any memory access of the kernel.

In addition, the NMI interrupt mechanism is not safe and can generate a
system dump if the interrupt is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be interesting to extend the
NMI watchdog timer during the LPM.

That's configurable through sysctl with the newly introduced variable
(specific to powerpc) lpm_nmi_watchdog_factor. This value represents the
percentage added to watchdog_thresh to set the NMI watchdog timeout during an
LPM.

Changes in v2:
 - introduce a timer factor.

v1:
[PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
https://lore.kernel.org/linuxppc-dev/20220601155315.35109-1-lduf...@linux.ibm.com/#r

Laurent Dufour (4):
  powerpc/mobility: Wait for memory transfer to complete
  watchdog: export watchdog_mutex and lockup_detector_reconfigure
  powerpc/watchdog: introduce a LPM factor
  pseries/mobility: Set NMI watchdog factor during LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h  |  2 +
 arch/powerpc/kernel/watchdog.c  | 22 -
 arch/powerpc/platforms/pseries/mobility.c   | 90 -
 include/linux/nmi.h |  3 +
 kernel/watchdog.c   |  6 +-
 6 files changed, 129 insertions(+), 6 deletions(-)

-- 
2.36.1



[PATCH v2 3/4] powerpc/watchdog: introduce a LPM factor

2022-06-14 Thread Laurent Dufour
Introduce a factor which would apply to the NMI watchdog timeout.

This factor is a percentage added to the watchdog_thresh value. The value is
set under the watchdog_mutex protection and lockup_detector_reconfigure()
is called to recompute wd_panic_timeout_tb.

Once the factor is set, it remains until it is set back to 0, which means
no impact.
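
The arithmetic can be sketched in userspace as below. demo_panic_timeout_tb() is a hypothetical helper mirroring the computation in watchdog_calc_timeouts(), and the timebase frequency used in the example is an assumed figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * thresh: watchdog threshold in seconds
 * factor: percentage added to the threshold (0 means no change)
 * tb_freq: timebase ticks per second
 */
static uint64_t demo_panic_timeout_tb(uint64_t thresh, uint64_t factor,
				      uint64_t tb_freq)
{
	uint64_t threshold = thresh;

	threshold += (factor * threshold) / 100;
	return threshold * tb_freq;
}
```

With a 10 second threshold and a factor of 200, the effective threshold becomes 30 seconds; at an assumed 512 MHz timebase that is 15360000000 ticks.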

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/nmi.h |  2 ++
 arch/powerpc/kernel/watchdog.c | 22 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index ea0e487f87b1..4eb894ef12a3 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,8 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
+void watchdog_nmi_set_lpm_factor(u64 factor);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
+static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..faaf5ba14d69 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
 static cpumask_t wd_smp_cpus_stuck;
 static u64 wd_smp_last_reset_tb;
 
+#ifdef CONFIG_PPC_PSERIES
+static u64 wd_factor;
+#endif
+
 /*
  * Try to take the exclusive watchdog action / NMI IPI / printing lock.
  * wd_smp_lock must be held. If this fails, we should return and wait
@@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
 
 static void watchdog_calc_timeouts(void)
 {
-   wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
+   u64 threshold = watchdog_thresh;
+
+#ifdef CONFIG_PPC_PSERIES
+   threshold += (wd_factor * threshold) / 100;
+#endif
+
+   wd_panic_timeout_tb = threshold * ppc_tb_freq;
 
/* Have the SMP detector trigger a bit later */
wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
@@ -570,3 +580,13 @@ int __init watchdog_nmi_probe(void)
}
return 0;
 }
+
+#ifdef CONFIG_PPC_PSERIES
+void watchdog_nmi_set_lpm_factor(u64 factor)
+{
+   mutex_lock(&watchdog_mutex);
+   wd_factor = factor;
+   lockup_detector_reconfigure();
+   mutex_unlock(&watchdog_mutex);
+}
+#endif
-- 
2.36.1
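
The update pattern used by watchdog_nmi_set_lpm_factor() (store the new
factor and recompute the derived timeout as one step under a mutex) can be
sketched in userspace with a pthread mutex. All names below are hypothetical
stand-ins for the kernel objects, and the sketch works in seconds rather than
timebase ticks.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
static pthread_mutex_t wd_mutex = PTHREAD_MUTEX_INITIALIZER; /* watchdog_mutex */
static uint64_t wd_thresh_s = 10;   /* watchdog_thresh, in seconds */
static uint64_t wd_factor_pct;      /* wd_factor, a percentage */
static uint64_t wd_timeout_s = 10;  /* derived value, like wd_panic_timeout_tb */

/* Recompute the derived timeout. The caller must hold wd_mutex. */
void wd_reconfigure_locked(void)
{
	wd_timeout_s = wd_thresh_s + (wd_factor_pct * wd_thresh_s) / 100;
}

/* Update the factor and the derived timeout as one step under the lock. */
void wd_set_lpm_factor(uint64_t factor_pct)
{
	pthread_mutex_lock(&wd_mutex);
	wd_factor_pct = factor_pct;
	wd_reconfigure_locked();
	pthread_mutex_unlock(&wd_mutex);
}

/* Read the derived timeout under the same lock. */
uint64_t wd_get_timeout(void)
{
	uint64_t t;

	pthread_mutex_lock(&wd_mutex);
	t = wd_timeout_s;
	pthread_mutex_unlock(&wd_mutex);
	return t;
}
```

Holding the lock across both the store and the recomputation means a reader
never observes a new factor paired with a stale timeout.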



[PATCH v2 4/4] pseries/mobility: Set NMI watchdog factor during LPM

2022-06-14 Thread Laurent Dufour
During an LPM, while the memory transfer is in progress on the arrival side,
some latency is generated when accessing pages that have not yet been
transferred. Thus, the NMI watchdog may be triggered too frequently, which
increases the risk of taking an NMI interrupt in a bad place in the kernel,
leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer completes could
be too strong a workaround; some users would want this timeout to eventually
trigger if the system is hanging, even during LPM.

Introduce a new sysctl variable lpm_nmi_watchdog_factor. It allows applying
a factor to the NMI watchdog timeout during an LPM. Just before the CPUs are
stopped for the switchover sequence, the NMI watchdog timer is set to
 watchdog_thresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the NMI
watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value).
Once the memory transfer is complete, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during a LPM.

Signed-off-by: Laurent Dufour 
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++
 arch/powerpc/platforms/pseries/mobility.c   | 48 +
 2 files changed, 60 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ddccd1077462..53701ed671de 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -485,6 +485,18 @@ When ``kptr_restrict`` is set to 2, kernel pointers printed using
 %pK will be replaced with 0s regardless of privileges.
 
 
+lpm_nmi_watchdog_factor (PPC only)
+==================================
+
+Factor applied to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during a
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200 meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
+
+
 modprobe
 
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 179bbd4ae881..4284ceaf9060 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -48,6 +48,39 @@ struct update_props_workarea {
 #define MIGRATION_SCOPE (1)
 #define PRRN_SCOPE -2
 
+#ifdef CONFIG_PPC_WATCHDOG
+static unsigned int lpm_nmi_wd_factor = 200;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table lpm_nmi_wd_factor_ctl_table[] = {
+   {
+   .procname   = "lpm_nmi_watchdog_factor",
+   .data   = &lpm_nmi_wd_factor,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_douintvec_minmax,
+   },
+   {}
+};
+static struct ctl_table lpm_nmi_wd_factor_sysctl_root[] = {
+   {
+   .procname   = "kernel",
+   .mode   = 0555,
+   .child  = lpm_nmi_wd_factor_ctl_table,
+   },
+   {}
+};
+
+static int __init register_lpm_nmi_wd_factor_sysctl(void)
+{
+   register_sysctl_table(lpm_nmi_wd_factor_sysctl_root);
+
+   return 0;
+}
+device_initcall(register_lpm_nmi_wd_factor_sysctl);
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_PPC_WATCHDOG */
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
int rc;
@@ -702,6 +735,7 @@ static int pseries_suspend(u64 handle)
 static int pseries_migrate_partition(u64 handle)
 {
int ret;
+   unsigned int factor = lpm_nmi_wd_factor;
 
ret = wait_for_vasi_session_suspending(handle);
if (ret)
@@ -709,6 +743,13 @@ static int pseries_migrate_partition(u64 handle)
 
vas_migration_handler(VAS_SUSPEND);
 
+#ifdef CONFIG_PPC_WATCHDOG
+   if (factor) {
+   pr_info("Set the NMI watchdog factor to %u%%\n", factor);
+   watchdog_nmi_set_lpm_factor(factor);
+   }
+#endif /* CONFIG_PPC_WATCHDOG */
+
ret = pseries_suspend(handle);
if (ret == 0) {
post_mobility_fixup();
@@ -716,6 +757,13 @@ static int pseries_migrate_partition(u64 handle)
} else
pseries_cancel_migration(handle, ret);
 
+#ifdef CONFIG_PPC_WATCHDOG
+   if (factor) {
+   pr_info("Restoring NMI watchdog timer\n");
+   watchdog_nmi_set_lpm_factor(0);
+   }
+#endif /* CONFIG_PPC_WATCHDOG */
+
vas_migration_handler(VAS_RESUME);
 
return ret;
-- 
2.36.1
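
From userspace, the new tunable would be set like any other sysctl. The
following is a minimal sketch, assuming the entry appears as
/proc/sys/kernel/lpm_nmi_watchdog_factor on a kernel carrying this patch (the
path is inferred from the procname and the "kernel" parent above). The helper
names are hypothetical, and the path is a parameter so the helpers can be
tried on an ordinary file.

```c
#include <stdio.h>

/*
 * Write a factor (a percentage) to a sysctl-style file.
 * Returns 0 on success, -1 on failure.
 */
int write_lpm_factor(const char *path, unsigned int factor_pct)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", factor_pct) > 0;
	/* fclose() flushes; a failed flush means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the value back. Returns 0 on success and stores it in *factor_pct. */
int read_lpm_factor(const char *path, unsigned int *factor_pct)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%u", factor_pct);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```

Since the factor is sampled at the start of pseries_migrate_partition(), a
value written while a migration is already in progress would presumably only
take effect for the next one.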



[PATCH v2 1/4] powerpc/mobility: Wait for memory transfer to complete

2022-06-14 Thread Laurent Dufour
In pseries_migrate_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit early,
allowing callbacks to be run only once the migration is fully completed.

If the VASI state is read after the hypervisor has completed the
migration, the HCALL returns H_PARAMETER. We can safely assume that
the memory transfer is complete if this happens.

This will also allow managing the NMI watchdog state in the next commits.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/mobility.c | 42 +--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..179bbd4ae881 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
return ret;
 }
 
+static void wait_for_vasi_session_completed(u64 handle)
+{
+   unsigned long state = 0;
+   int ret;
+
+   pr_info("waiting for memory transfer to complete...\n");
+   /*
+* Wait for transition from H_VASI_RESUMED to
+* H_VASI_COMPLETED. Treat anything else as an error.
+*/
+   while (true) {
+   ret = poll_vasi_state(handle, &state);
+
+   /*
+* If the memory transfer is already complete and the migration
+* has been cleaned up by the hypervisor, H_PARAMETER is returned,
+* which is translated to EINVAL by poll_vasi_state().
+*/
+   if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
+   pr_info("memory transfer completed.\n");
+   break;
+   }
+
+   if (ret) {
+   pr_err("H_VASI_STATE return error (%d)\n", ret);
+   break;
+   }
+
+   if (state != H_VASI_RESUMED) {
+   pr_err("unexpected H_VASI_STATE result %lu\n", state);
+   break;
+   }
+
+   msleep(500);
+   }
+}
+
 static void prod_single(unsigned int target_cpu)
 {
long hvrc;
@@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
vas_migration_handler(VAS_SUSPEND);
 
ret = pseries_suspend(handle);
-   if (ret == 0)
+   if (ret == 0) {
post_mobility_fixup();
-   else
+   wait_for_vasi_session_completed(handle);
+   } else
pseries_cancel_migration(handle, ret);
 
vas_migration_handler(VAS_RESUME);
-- 
2.36.1



[PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

2022-06-14 Thread Laurent Dufour
In some circumstances it may be useful to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may be helpful before and after an LPAR migration (LPM) is
initiated. Because LPM implies some latencies, the watchdog, and especially
the NMI watchdog, is expected to be triggered during this operation.
Reconfiguring the watchdog would prevent it from happening too frequently
during LPM.

The watchdog_mutex is exported to allow some variables to be changed under
its protection and prevent any conflict.
The lockup_detector_reconfigure() function is exported and is expected to
be called under the protection of watchdog_mutex.

Signed-off-by: Laurent Dufour 
---
 include/linux/nmi.h | 3 +++
 kernel/watchdog.c   | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..84300fb0f90a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -122,6 +122,9 @@ int watchdog_nmi_probe(void);
 int watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
+extern struct mutex watchdog_mutex;
+void lockup_detector_reconfigure(void);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..0a67a2dd1258 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 
-static DEFINE_MUTEX(watchdog_mutex);
+DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
 # define WATCHDOG_DEFAULT  (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
@@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
 }
 
-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
 {
cpus_read_lock();
watchdog_nmi_stop();
@@ -583,7 +583,7 @@ static __init void lockup_detector_setup(void)
 }
 
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
 {
cpus_read_lock();
watchdog_nmi_stop();
-- 
2.36.1



Re: [PATCH 2/6] powerpc: Provide syscall wrapper

2022-06-14 Thread Andrew Donnellan
On Fri, 2022-06-03 at 08:39 +, Christophe Leroy wrote:
> 
> 
> On 03/06/2022 at 09:09, Andrew Donnellan wrote:
> > On Fri, 2022-06-03 at 13:24 +1000, Rohan McLure wrote:
> > > The implementation of ppc_personality can be immediately reworked
> > > to
> > > call ksys_personality, but I can’t do the same for sys_old_select
> > > for
> > > example, which is required to implement ppc_select. As such we
> > > emit
> > > both
> > 
> > For ppc_select, I suggest we resurrect
> >  
> > https://lore.kernel.org/lkml/5811950d-ef14-d416-35e6-d694ef920...@csgroup.eu/T/#u
> > and just get rid of the hack.
> > 
> 
> Not sure I understand, what would you like to resurrect ? You want to
> resurrect the discussion, or revert commit fd69d544b0e7 
> ("powerpc/syscalls: Use sys_old_select() in ppc_select()") ?

We should get rid of ppc_select(), if we can. If Arnd's history is
correct, there's little reason to keep it around and we should just get
rid of it.


-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited



Re: [Bug 216095] New: sysfs: cannot create duplicate filename '/devices/platform/of-display'

2022-06-14 Thread Michael Ellerman
Hi Erhard,

This is presumably caused by:

  52b1b46c39ae ("of: Create platform devices for OF framebuffers")

Can you try the patch below?

cheers


diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3507095a69f6..a70ff9df5cb9 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -556,7 +556,7 @@ static int __init of_platform_default_populate_init(void)
if (!of_get_property(node, "linux,opened", NULL) ||
!of_get_property(node, "linux,boot-display", NULL))
continue;
-   dev = of_platform_device_create(node, "of-display", NULL);
+   dev = of_platform_device_create(node, NULL, NULL);
if (WARN_ON(!dev))
return -ENOMEM;
boot_display = node;
@@ -565,7 +565,7 @@ static int __init of_platform_default_populate_init(void)
for_each_node_by_type(node, "display") {
if (!of_get_property(node, "linux,opened", NULL) || node == boot_display)
continue;
-   of_platform_device_create(node, "of-display", NULL);
+   of_platform_device_create(node, NULL, NULL);
}
 
} else {


bugzilla-dae...@kernel.org writes:
> https://bugzilla.kernel.org/show_bug.cgi?id=216095
>
> Bug ID: 216095
>Summary: sysfs: cannot create duplicate filename
> '/devices/platform/of-display'
>Product: Platform Specific/Hardware
>Version: 2.5
> Kernel Version: 5.19-rc1
>   Hardware: PPC-32
> OS: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: PPC-32
>   Assignee: platform_ppc...@kernel-bugs.osdl.org
>   Reporter: erhar...@mailbox.org
> Regression: No
>
> Created attachment 301127
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301127&action=edit
> dmesg (5.19-rc1, PowerMac G4 DP)
>
> [...]
> sysfs: cannot create duplicate filename '/devices/platform/of-display'
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-PMacG4+ #3
> Call Trace:
> [e9025cc0] [c05984d0] dump_stack_lvl+0x60/0x90 (unreliable)
> [e9025ce0] [c02f043c] sysfs_warn_dup+0x64/0x84
> [e9025d00] [c02f05cc] sysfs_create_dir_ns+0xfc/0x118
> [e9025d30] [c059ffa4] kobject_add_internal+0x114/0x2f0
> [e9025d60] [c05a0790] kobject_add+0x80/0xf0
> [e9025da0] [c064c3d8] device_add+0x114/0x94c
> [e9025e10] [c06f197c] of_platform_device_create_pdata+0xb8/0x144
> [e9025e40] [c0c43bb4] of_platform_default_populate_init+0x284/0x2f4
> [e9025e70] [c0007a94] do_one_initcall+0x50/0x294
> [e9025ee0] [c0c03ff0] kernel_init_freeable+0x228/0x334
> [e9025f20] [c0007efc] kernel_init+0x28/0x144
> [e9025f40] [c0019334] ret_from_kernel_thread+0x5c/0x64
> kobject_add_internal failed for of-display with -EEXIST, don't try to register
> things with the same name in the same directory.
>
> -- 
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are watching the assignee of the bug.


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Wenhu Wang
>>>
>>> I looked at that patch.
>>>
>>> I don't think you can just drop the #ifdef in function
>>> __access_remote_vm() in mm/memory.c
>>>
>>> You have to replace it with something like:
>>>
>>>  if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>>>  break;
>>>
>>
>>
>>Another thing in that patch:
>>
>>By making generic_access_phys() a static inline, it means that every time
>>you refer to the address of that function in a vm_operations_struct
>>struct, the compiler has to provide an outlined instance of the
>>function. It means you'll likely have several instances of a
>>generic_access_phys().
>>
>>What you could do instead is to add the following at the start of
>>generic_access_phys() in mm/memory.c :
>>
>>    if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>>    return 0;
>>
>
>It is really a better choice, thanks for the advice.
>Multiple instances exist as you mentioned; the block returns 0 as a no-op
>instance, which makes no difference to the function return value.
>
>I will update the patch after re-confirming.
>

I tried as advised, but when CONFIG_HAVE_IOREMAP_PROT is not defined, build
errors happen on architectures such as arm64. The function
generic_access_phys() calls a lot of functions that become undefined if we
compile it with CONFIG_HAVE_IOREMAP_PROT disabled. The architectures that
support CONFIG_HAVE_IOREMAP_PROT are mips, x86, sh, arc, s390, loongarch and
powerpc.

So we may just define the function as static inline and add an IS_ENABLED()
condition branch in __access_remote_vm() in mm/memory.c. The execution path
breaks out if CONFIG_HAVE_IOREMAP_PROT is disabled, and never goes into the
static no-op function.

In short, the static inline no-op function would never be executed; the only
difference is that there would be a lot of function code in the compiled
target.
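
The build problem described above comes from a general property of the two
styles: a branch guarded by a plain if () is still parsed and type-checked, so
everything it calls must at least be declared, while an #if block is removed
before the compiler sees it. A minimal userspace sketch, using a hypothetical
0/1 macro in place of the kernel's IS_ENABLED() and hypothetical function
names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option; 0 means "disabled". Stands in for IS_ENABLED(...). */
#define CONFIG_DEMO_IOREMAP_PROT 0

/*
 * Must exist on every configuration for the if () form below to compile,
 * even though the call is never reached when the option is 0.
 */
int demo_access_phys(char *buf, size_t len)
{
	memset(buf, 0xAA, len);
	return (int)len;
}

/* if () form: the dead branch is still compiled, then optimized away. */
int access_with_if(char *buf, size_t len)
{
	if (!CONFIG_DEMO_IOREMAP_PROT)
		return 0;
	return demo_access_phys(buf, len);
}

/* #if form: the disabled branch is never seen by the compiler proper. */
int access_with_ifdef(char *buf, size_t len)
{
#if CONFIG_DEMO_IOREMAP_PROT
	return demo_access_phys(buf, len);
#else
	(void)buf;
	(void)len;
	return 0;
#endif
}
```

Both functions return 0 and leave the buffer untouched when the option is
off. Only the first requires demo_access_phys() to be visible, which is the
constraint that breaks architectures lacking the helpers generic_access_phys()
relies on.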

Thanks,
Wenhu

Re: [PATCH 1/2] powerpc:mm: export symbol ioremap_coherent

2022-06-14 Thread Christoph Hellwig
On Tue, Jun 14, 2022 at 08:45:25PM +1000, Michael Ellerman wrote:
> Wang Wenhu  writes:
> > The function ioremap_coherent may be called by modules such as
> > fsl_85xx_cache_sram. So export it for access in other modules.
> 
> ioremap_coherent() is powerpc specific, and only has one other caller,
> I'd like to remove it.
> 
> Does ioremap_cache() work for you?

Chances are that both are the wrong thing and this really wants
memremap, as SRAM tends to have memory and not MMIO semantics.


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christoph Hellwig
UIO seems like the wrong kind of interface for this.  Why isn't this
a simple character device?


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 14/06/2022 at 16:40, Wenhu Wang wrote:

>>>> I looked at that patch.
>>>>
>>>> I don't think you can just drop the #ifdef in function
>>>> __access_remote_vm() in mm/memory.c
>>>>
>>>> You have to replace it with something like:
>>>>
>>>>     if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>>>>     break;

>>>
>>>
>>> Another thing in that patch:
>>>
>>> By making generic_access_phys() a static inline, it means that every time
>>> you refer to the address of that function in a vm_operations_struct
>>> struct, the compiler has to provide an outlined instance of the
>>> function. It means you'll likely have several instances of a
>>> generic_access_phys().
>>>
>>> What you could do instead is to add the following at the start of
>>> generic_access_phys() in mm/memory.c :
>>>
>>>      if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
>>>      return 0;
>>>
>>
>> It is really a better choice, thanks for the advice.
>> Multiple instances exist as you mentioned; the block returns 0 as a no-op
>> instance, which makes no difference to the function return value.
>>
>> I will update the patch after re-confirming.
>>
> 
> I tried as advised, but when CONFIG_HAVE_IOREMAP_PROT is not defined, build
> errors happen on architectures such as arm64. The function
> generic_access_phys() calls a lot of functions that become undefined if we
> compile it with CONFIG_HAVE_IOREMAP_PROT disabled. The architectures that
> support CONFIG_HAVE_IOREMAP_PROT are mips, x86, sh, arc, s390, loongarch and
> powerpc.
> 
> So we may just define the function as static inline and add an IS_ENABLED()
> condition branch in __access_remote_vm() in mm/memory.c. The execution path
> breaks out if CONFIG_HAVE_IOREMAP_PROT is disabled, and never goes into the
> static no-op function.
> 
> In short, the static inline no-op function would never be executed; the only
> difference is that there would be a lot of function code in the compiled
> target.
> 

In that case all you have to do is:

diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..39b369fc77f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5413,6 +5413,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
return ret;
  }
  EXPORT_SYMBOL_GPL(generic_access_phys);
+#else
+int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
+   void *buf, int len, int write)
+{
+   return 0;
+}
+EXPORT_SYMBOL_GPL(generic_access_phys);
  #endif

  /*


Christophe
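
The #else stub suggested above keeps exactly one out-of-line definition of the
function on every configuration, so its address can be stored in an ops table
without each user emitting its own copy of a static inline. A userspace sketch
of that shape, with hypothetical names and a hypothetical 0/1 option macro:

```c
#include <string.h>

/* Hypothetical option; 0 selects the stub, 1 the "real" implementation. */
#define DEMO_HAVE_IOREMAP_PROT 0

/* Minimal stand-in for an ops table such as vm_operations_struct. */
struct demo_vm_ops {
	int (*access)(void *buf, int len, int write);
};

#if DEMO_HAVE_IOREMAP_PROT
/* "Real" variant: pretend to read by zero-filling the buffer. */
int demo_generic_access_phys(void *buf, int len, int write)
{
	if (!write)
		memset(buf, 0, (size_t)len);
	return len;
}
#else
/* Stub variant: same signature, one definition, always reports 0 bytes. */
int demo_generic_access_phys(void *buf, int len, int write)
{
	(void)buf;
	(void)len;
	(void)write;
	return 0;
}
#endif

/* Users take the address unconditionally; no #ifdef at the use site. */
const struct demo_vm_ops demo_ops = {
	.access = demo_generic_access_phys,
};
```

With this layout the driver's ops initializer needs no #ifdef, which is the
point raised about uio_cache_sram_vm_ops earlier in the thread.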

Re: [PATCH] powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address

2022-06-14 Thread Nathan Lynch
Andrew Donnellan  writes:
> Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS
> call through the RTAS filter if the buffer address is 0.
>
> According to PAPR, ibm,platform-dump is called with a null buffer address
> to notify the platform firmware that processing of a particular dump is
> finished.
>
> Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an
> application such as rtas_errd that is attempting to retrieve a dump will
> encounter an error at the end of the retrieval process.
>
> Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
> Cc: sta...@vger.kernel.org
> Reported-by: Sathvika Vasireddy 
> Signed-off-by: Andrew Donnellan 

I agree this allows ibm,platform-dump to work without weakening the
filter for other calls. Thanks.

Reviewed-by: Nathan Lynch 
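
The special case under review can be sketched as a small userspace filter. This
is an illustration only: the buffer bounds, function names, and 32-bit address
type are assumptions, and the real block_rtas_call() looks up per-call argument
indices in rtas_filters[] before reaching this check. The sketch assumes
size >= 1, as the kernel code substitutes 1 when no size argument exists.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds of the permitted buffer region (stand-in for RMO). */
#define DEMO_RMO_START 0x1000u
#define DEMO_RMO_END   0x1fffu

/* True when [base, end] lies entirely inside the permitted region. */
static bool demo_in_rmo_buf(uint32_t base, uint32_t end)
{
	return base >= DEMO_RMO_START && base <= end && end <= DEMO_RMO_END;
}

/* Returns true when the call must be blocked. */
bool demo_block_call(const char *name, uint32_t base, uint32_t size)
{
	uint32_t end = base + size - 1;

	/* A NULL buffer tells firmware that dump processing is finished. */
	if (strcmp(name, "ibm,platform-dump") == 0 && base == 0)
		return false;

	return !demo_in_rmo_buf(base, end);
}
```

The exemption is keyed on both the call name and a zero base address, so a
NULL buffer passed to any other call, or a non-NULL buffer outside the region
passed to ibm,platform-dump, is still rejected.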


Re: [PATCH] powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address

2022-06-14 Thread Tyrel Datwyler
On 6/14/22 06:49, Andrew Donnellan wrote:
> Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS
> call through the RTAS filter if the buffer address is 0.
> 
> According to PAPR, ibm,platform-dump is called with a null buffer address
> to notify the platform firmware that processing of a particular dump is
> finished.
> 
> Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an
> application such as rtas_errd that is attempting to retrieve a dump will
> encounter an error at the end of the retrieval process.
> 
> Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
> Cc: sta...@vger.kernel.org
> Reported-by: Sathvika Vasireddy 
> Signed-off-by: Andrew Donnellan 

Similar to what is done for ibm,configure-connector with idx_buf2 and a NULL
address.

Reviewed-by: Tyrel Datwyler 

> ---
>  arch/powerpc/kernel/rtas.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index a6fce3106e02..693133972294 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1071,7 +1071,7 @@ static struct rtas_filter rtas_filters[] __ro_after_init = {
>   { "get-time-of-day", -1, -1, -1, -1, -1 },
>   { "ibm,get-vpd", -1, 0, -1, 1, 2 },
>   { "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
> - { "ibm,platform-dump", -1, 4, 5, -1, -1 },
> + { "ibm,platform-dump", -1, 4, 5, -1, -1 },  /* Special cased */
>   { "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
>   { "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
>   { "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
> @@ -1120,6 +1120,15 @@ static bool block_rtas_call(int token, int nargs,
>   size = 1;
> 
>   end = base + size - 1;
> +
> + /*
> +  * Special case for ibm,platform-dump - NULL buffer
> +  * address is used to indicate end of dump processing
> +  */
> + if (!strcmp(f->name, "ibm,platform-dump") &&
> + base == 0)
> + return false;
> +
>   if (!in_rmo_buf(base, end))
>   goto err;
>   }



[PATCH] KVM: PPC: Book3S HV: tracing: Add missing hcall names

2022-06-14 Thread Fabiano Rosas
The kvm_trace_symbol_hcall macro is missing several of the hypercalls
defined in hvcall.h.

Add the most common ones that are issued during guest lifetime,
including the ones that are only used by QEMU and SLOF.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/hvcall.h |  8 
 arch/powerpc/kvm/trace_hv.h   | 21 -
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d92a20a85395..1d454c70e7c6 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,14 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS 0xf000
 
+/*
+ * Platform specific hcalls, used by QEMU/SLOF. These are ignored by
+ * KVM and only kept here so we can identify them during tracing.
+ */
+#define H_LOGICAL_MEMOP  0xF001
+#define H_CAS            0xF002
+#define H_UPDATE_DT      0xF003
+
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE 0xF078
 #define H_GET_24X7_DATA         0xF07C
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 32e2cb5811cc..8d57c8428531 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -94,6 +94,7 @@
{H_GET_HCA_INFO,"H_GET_HCA_INFO"}, \
{H_GET_PERF_COUNT,  "H_GET_PERF_COUNT"}, \
{H_MANAGE_TRACE,"H_MANAGE_TRACE"}, \
+   {H_GET_CPU_CHARACTERISTICS, "H_GET_CPU_CHARACTERISTICS"}, \
{H_FREE_LOGICAL_LAN_BUFFER, "H_FREE_LOGICAL_LAN_BUFFER"}, \
{H_QUERY_INT_STATE, "H_QUERY_INT_STATE"}, \
{H_POLL_PENDING,"H_POLL_PENDING"}, \
@@ -125,7 +126,25 @@
{H_COP, "H_COP"}, \
{H_GET_MPP_X,   "H_GET_MPP_X"}, \
{H_SET_MODE,"H_SET_MODE"}, \
-   {H_RTAS,"H_RTAS"}
+   {H_REGISTER_PROC_TBL,   "H_REGISTER_PROC_TBL"}, \
+   {H_QUERY_VAS_CAPABILITIES,  "H_QUERY_VAS_CAPABILITIES"}, \
+   {H_INT_GET_SOURCE_INFO, "H_INT_GET_SOURCE_INFO"}, \
+   {H_INT_SET_SOURCE_CONFIG,   "H_INT_SET_SOURCE_CONFIG"}, \
+   {H_INT_GET_QUEUE_INFO,  "H_INT_GET_QUEUE_INFO"}, \
+   {H_INT_SET_QUEUE_CONFIG,"H_INT_SET_QUEUE_CONFIG"}, \
+   {H_INT_ESB, "H_INT_ESB"}, \
+   {H_INT_RESET,   "H_INT_RESET"}, \
+   {H_RPT_INVALIDATE,  "H_RPT_INVALIDATE"}, \
+   {H_RTAS,"H_RTAS"}, \
+   {H_LOGICAL_MEMOP,   "H_LOGICAL_MEMOP"}, \
+   {H_CAS, "H_CAS"}, \
+   {H_UPDATE_DT,   "H_UPDATE_DT"}, \
+   {H_GET_PERF_COUNTER_INFO,   "H_GET_PERF_COUNTER_INFO"}, \
+   {H_SET_PARTITION_TABLE, "H_SET_PARTITION_TABLE"}, \
+   {H_ENTER_NESTED,"H_ENTER_NESTED"}, \
+   {H_TLB_INVALIDATE,  "H_TLB_INVALIDATE"}, \
+   {H_COPY_TOFROM_GUEST,   "H_COPY_TOFROM_GUEST"}
+
 
 #define kvm_trace_symbol_kvmret \
{RESUME_GUEST,  "RESUME_GUEST"}, \
-- 
2.35.3
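
The kvm_trace_symbol_hcall list is a table of {value, name} pairs that the
tracing code uses to print a name instead of a raw hcall number. A minimal
userspace sketch of that kind of lookup, using the four platform-specific
values that appear in this patch; the struct, array, and function names are
hypothetical:

```c
#include <stddef.h>

/* One {value, name} pair, in the same shape as the entries in the patch. */
struct demo_symbol {
	unsigned long value;
	const char *name;
};

/* Values taken from the hvcall.h hunk above. */
static const struct demo_symbol demo_hcalls[] = {
	{ 0xf000, "H_RTAS" },
	{ 0xf001, "H_LOGICAL_MEMOP" },
	{ 0xf002, "H_CAS" },
	{ 0xf003, "H_UPDATE_DT" },
};

/* Return the symbolic name for an hcall number, or NULL if it is unknown. */
const char *demo_hcall_name(unsigned long nr)
{
	size_t i;

	for (i = 0; i < sizeof(demo_hcalls) / sizeof(demo_hcalls[0]); i++) {
		if (demo_hcalls[i].value == nr)
			return demo_hcalls[i].name;
	}
	return NULL;
}
```

An hcall missing from the table has no name to print, which is why entries for
calls handled only by QEMU/SLOF are still worth listing for tracing purposes.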



[PATCH] arch/*: Disable softirq stacks on PREEMPT_RT.

2022-06-14 Thread Sebastian Andrzej Siewior
PREEMPT_RT preempts softirqs and the current implementation avoids
do_softirq_own_stack() and only uses __do_softirq().

Disable the unused softirq stacks on PREEMPT_RT to save some memory and
ensure that do_softirq_own_stack() is not used because it is not
expected.

Signed-off-by: Sebastian Andrzej Siewior 
---

Initially I aimed only for the asm-generic bits and arm since I have
most bits of the port ready. Arnd then suggested to do all arches at
once and here it is.
I tried to keep it minimal in the sense that I didn't remove the dedicated
softirq-stacks on parisc or powerpc for instance. That would add another
few ifdefs and I don't know if we manage to get it up and running on
parisc. I do have the missing bits for powerpc however ;)

 arch/arm/kernel/irq.c | 3 ++-
 arch/parisc/kernel/irq.c  | 2 ++
 arch/powerpc/kernel/irq.c | 4 
 arch/s390/include/asm/softirq_stack.h | 3 ++-
 arch/sh/kernel/irq.c  | 2 ++
 arch/sparc/kernel/irq_64.c| 2 ++
 include/asm-generic/softirq_stack.h   | 2 +-
 7 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 5c6f8d11a3ce5..034cb48c9eeb8 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -70,6 +70,7 @@ static void __init init_irq_stacks(void)
}
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static void do_softirq(void *arg)
 {
__do_softirq();
@@ -80,7 +81,7 @@ void do_softirq_own_stack(void)
call_with_stack(do_softirq, NULL,
__this_cpu_read(irq_stack_ptr));
 }
-
+#endif
 #endif
 
 int arch_show_interrupts(struct seq_file *p, int prec)
diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index 0fe2d79fb123f..eba193bcdab1b 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -480,10 +480,12 @@ static void execute_on_irq_stack(void *func, unsigned long param1)
*irq_stack_in_use = 1;
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
execute_on_irq_stack(__do_softirq, 0);
 }
+#endif
 #endif /* CONFIG_IRQSTACKS */
 
 /* ONLY called from entry.S:intr_extint() */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dd09919c3c668..0822a274a549c 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -611,6 +611,7 @@ static inline void check_stack_overflow(void)
}
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static __always_inline void call_do_softirq(const void *sp)
 {
/* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
@@ -629,6 +630,7 @@ static __always_inline void call_do_softirq(const void *sp)
   "r11", "r12"
);
 }
+#endif
 
 static __always_inline void call_do_irq(struct pt_regs *regs, void *sp)
 {
@@ -747,10 +749,12 @@ void *mcheckirq_ctx[NR_CPUS] __read_mostly;
 void *softirq_ctx[NR_CPUS] __read_mostly;
 void *hardirq_ctx[NR_CPUS] __read_mostly;
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
+#endif
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
 {
diff --git a/arch/s390/include/asm/softirq_stack.h b/arch/s390/include/asm/softirq_stack.h
index fd17f25704bd5..af68d6c1d5840 100644
--- a/arch/s390/include/asm/softirq_stack.h
+++ b/arch/s390/include/asm/softirq_stack.h
@@ -5,9 +5,10 @@
 #include 
 #include 
 
+#ifndef CONFIG_PREEMPT_RT
 static inline void do_softirq_own_stack(void)
 {
call_on_stack(0, S390_lowcore.async_stack, void, __do_softirq);
 }
-
+#endif
 #endif /* __ASM_S390_SOFTIRQ_STACK_H */
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index ef0f0827cf575..2d3eca8fee011 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -149,6 +149,7 @@ void irq_ctx_exit(int cpu)
hardirq_ctx[cpu] = NULL;
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
struct thread_info *curctx;
@@ -176,6 +177,7 @@ void do_softirq_own_stack(void)
  "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
);
 }
+#endif
 #else
 static inline void handle_one_irq(unsigned int irq)
 {
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index c8848bb681a11..41fa1be980a33 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -855,6 +855,7 @@ void __irq_entry handler_irq(int pil, struct pt_regs *regs)
set_irq_regs(old_regs);
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
void *orig_sp, *sp = softirq_stack[smp_processor_id()];
@@ -869,6 +870,7 @@ void do_softirq_own_stack(void)
__asm__ __volatile__("mov %0, %%sp"
 : : "r" (orig_sp));
 }
+#endif
 
 #ifdef CONFIG_HOTPLUG_CPU
 void fixup_irqs(void)
diff --git a/include/asm-generic/softirq_stack.h b/include/asm-generic/softirq_stack.h
index eceeecf6a5bd8..d3e2d81656e04 100644
--- a/include/asm-generic/softirq_stack.h
+++ b/include/asm-generic/softirq_stack.h

Re: [PATCH] tools/perf/tests: Fix session topology test comparison check

2022-06-14 Thread Arnaldo Carvalho de Melo
On Tue, Jun 14, 2022 at 07:38:55AM -0700, Ian Rogers wrote:
> On Fri, Jun 10, 2022 at 7:00 AM Athira Rajeev
>  wrote:
> >
> > commit cfd7092c31ae ("perf test session topology: Fix test to
> > skip the test in guest environment") added check to skip the
> > testcase if the socket_id can't be fetched from topology info.
> > But the condition check uses strncmp which should be changed to
> > !strncmp and to correctly match platform. Patch fixes this
> > condition check.
> >
> > Fixes: cfd7092c31ae ("perf test session topology: Fix test to skip the test in guest environment")
> > Reported-by: Thomas Richter 
> > Signed-off-by: Athira Rajeev 
> 
> Acked-by: Ian Rogers 

Thanks, applied.

- Arnaldo

 
> Thanks,
> Ian
> 
> > ---
> >  tools/perf/tests/topology.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
> > index d23a9e322ff5..0b4f61b6cc6b 100644
> > --- a/tools/perf/tests/topology.c
> > +++ b/tools/perf/tests/topology.c
> > @@ -115,7 +115,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
> >  * physical_package_id will be set to -1. Hence skip this
> >  * test if physical_package_id returns -1 for cpu from perf_cpu_map.
> >  */
> > -   if (strncmp(session->header.env.arch, "powerpc", 7)) {
> > +   if (!strncmp(session->header.env.arch, "ppc64le", 7)) {
> > if (cpu__get_socket_id(perf_cpu_map__cpu(map, 0)) == -1)
> > return TEST_SKIP;
> > }
> > --
> > 2.35.1
> >

-- 

- Arnaldo
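
As an aside, the condition being fixed above can be reproduced in a few lines of userspace C. This is only a sketch: old_check(), new_check() and topology_check_demo() are names made up for illustration, and nothing here comes from the perf sources beyond the two strncmp() expressions shown in the diff.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical name: the pre-patch condition. strncmp() returns 0 on a
 * match, so this is true for any arch string that does NOT start with
 * "powerpc".
 */
static int old_check(const char *arch)
{
	return strncmp(arch, "powerpc", 7) != 0;
}

/* Hypothetical name: the patched condition, true only for "ppc64le". */
static int new_check(const char *arch)
{
	return !strncmp(arch, "ppc64le", 7);
}

static void topology_check_demo(void)
{
	assert(old_check("x86_64"));	/* unrelated arch entered the skip path */
	assert(old_check("ppc64le"));	/* "ppc64le" is not "powerpc" either */
	assert(!new_check("x86_64"));	/* patched: other arches are left alone */
	assert(new_check("ppc64le"));	/* patched: only ppc64le is considered */
}
```

The point is only that a bare `if (strncmp(...))` reads as "if the strings differ", which is easy to get backwards.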


Re: [PATCH 29/36] cpuidle,xenpv: Make more PARAVIRT_XXL noinstr clean

2022-06-14 Thread Srivatsa S. Bhat
com, Arnd Bergmann , ulli.kr...@googlemail.com, 
vgu...@kernel.org, j...@joshtriplett.org, rost...@goodmis.org, 
r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, t...@atomide.com, 
amakha...@vmware.com, bjorn.anders...@linaro.org, h...@zytor.com, 
sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, yury.no...@gmail.com, rich...@nod.at, x...@kernel.org, 
li...@armlinux.org.uk, mi...@redhat.com, a...@eecs.berkeley.edu, 
paul...@kernel.org, h...@linux.ibm.com, stefan.kristians...@saunalahti.fi, 
openr...@lists.librecores.org, paul.walms...@sifive.com, 
linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
mon...@monstr.eu, linux-m...@vger.kernel.org, pal...@dabbelt.com, anup@brainfaul
 t.org, i...@jurassic.park.msu.ru, johan...@sipsolutions.net, 
linuxppc-dev@lists.ozlabs.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On 6/8/22 4:27 PM, Peter Zijlstra wrote:
> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section
> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section
> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Srivatsa S. Bhat (VMware) 


Regards,
Srivatsa
VMware Photon OS

> ---
>  arch/x86/include/asm/paravirt.h  |6 --
>  arch/x86/include/asm/special_insns.h |4 ++--
>  arch/x86/include/asm/xen/hypercall.h |2 +-
>  arch/x86/kernel/paravirt.c   |   14 --
>  arch/x86/xen/enlighten_pv.c  |2 +-
>  arch/x86/xen/irq.c   |2 +-
>  6 files changed, 21 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -168,7 +168,7 @@ static inline void __write_cr4(unsigned
>   PVOP_VCALL1(cpu.write_cr4, x);
>  }
>  
> -static inline void arch_safe_halt(void)
> +static __always_inline void arch_safe_halt(void)
>  {
>   PVOP_VCALL0(irq.safe_halt);
>  }
> @@ -178,7 +178,9 @@ static inline void halt(void)
>   PVOP_VCALL0(irq.halt);
>  }
>  
> -static inline void wbinvd(void)
> +extern noinstr void pv_native_wbinvd(void);
> +
> +static __always_inline void wbinvd(void)
>  {
>   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
>  }
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -115,7 +115,7 @@ static inline void wrpkru(u32 pkru)
>  }
>  #endif
>  
> -static inline void native_wbinvd(void)
> +static __always_inline void native_wbinvd(void)
>  {
>   asm volatile("wbinvd": : :"memory");
>  }
> @@ -179,7 +179,7 @@ static inline void __write_cr4(unsigned
>   native_write_cr4(x);
>  }
>  
> -static inline void wbinvd(void)
> +static __always_inline void wbinvd(void)
>  {
>   native_wbinvd();
>  }
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -382,7 +382,7 @@ MULTI_stack_switch(struct multicall_entr
>  }
>  #endif
>  
> -static inline int
> +static __always_inline int
>  HYPERVISOR_sched_op(int cmd, void *arg)
>  {
>   return _hypercall2(int, sched_op, cmd, arg);
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -233,6 +233,11 @@ static noinstr void pv_native_set_debugr
>   native_set_debugreg(regno, val);
>  }
>  
> +noinstr void pv_native_wbinvd(void)
> +{
> + native_wbinvd();
> +}
> +
>  static noinstr void pv_native_irq_enable(void)
>  {
>   native_irq_enable();
> @@ -242,6 +247,11 @@ static noinstr void pv_native_irq_disabl
>  {
>   native_irq_disable();
>  }
> +
> +static noinstr void pv_native_safe_halt(void)
> +{
> + native_safe_halt();
> +}
>  #endif
>  
>  enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
> @@ -273,7 +283,7 @@ struct paravirt_patch_template pv_ops =
>   .cpu.read_cr0   = native_read_cr0,
>   .cpu.write_cr0  = native_write_cr0,
>   .cpu.write_cr4  = native_write_cr4,
> - .cpu.wbinvd = native_wbinvd,
> + .cpu.wbinvd = pv_native_wbinvd,
>   .cpu.read_msr   = native_read_msr,
>   .cpu.write_msr  = native_write_msr,
>   .cpu.read_msr_safe  = native_read_msr_safe,
> @@ -307,7 +317,7 @@ struct paravirt_patch_template pv_ops =
>   .irq.save_fl= __PV_IS_CALLEE_SAVE(native_save_fl),
>   .irq.irq_disable= __PV_IS_CALLEE_SAVE(pv_native_irq_disable),
>   .irq.irq_enable = __PV_IS_CALLEE_SAVE(pv_native_irq_enable),
> - .irq.safe_halt  = native_safe_halt,

Re: [Pv-drivers] [PATCH 29/36] cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean

2022-06-14 Thread Nadav Amit

Re: [PATCH 10/36] cpuidle,omap3: Push RCU-idle into driver

2022-06-14 Thread Tony Lindgren

* Peter Zijlstra  [220608 14:42]:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again before going idle is daft.

Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 


Re: [PATCH 12/36] cpuidle,omap2: Push RCU-idle into driver

2022-06-14 Thread Tony Lindgren

* Peter Zijlstra  [220608 14:42]:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, some *four* times, before going idle is daft.

Maybe update the subject line with s/omap2/omap4/, other than that:

Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 


Re: [PATCH 34/36] cpuidle,omap3: Push RCU-idle into omap_sram_idle()

2022-06-14 Thread Tony Lindgren

* Peter Zijlstra  [220608 15:00]:
> On Wed, Jun 08, 2022 at 04:27:57PM +0200, Peter Zijlstra wrote:
> > @@ -254,11 +255,18 @@ void omap_sram_idle(void)
> >  */
> > if (save_state)
> > omap34xx_save_context(omap3_arm_context);
> > +
> > +   if (rcuidle)
> > +   cpuidle_rcu_enter();
> > +
> > if (save_state == 1 || save_state == 3)
> > cpu_suspend(save_state, omap34xx_do_sram_idle);
> > else
> > omap34xx_do_sram_idle(save_state);
> >  
> > +   if (rcuidle)
> > +   rcuidle_rcu_exit();
> 
> *sigh* so much for this having been exposed to the robots for >2 days :/

I tested your git branch of these patches, so:

Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 


Re: [PATCH 33/36] cpuidle,omap3: Use WFI for omap3_pm_idle()

2022-06-14 Thread Tony Lindgren

* Peter Zijlstra  [220608 14:52]:
> arch_cpu_idle() is a very simple idle interface and exposes only a
> single idle state and is expected to not require RCU and not do any
> tracing/instrumentation.
> 
> As such, omap_sram_idle() is not a valid implementation. Replace it
> with the simple (shallow) omap3_do_wfi() call. Leaving the more
> complicated idle states for the cpuidle driver.

Acked-by: Tony Lindgren 


[PATCH 34.5/36] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()

2022-06-14 Thread Tony Lindgren

OMAP4 uses full SoC suspend modes as idle states, as such it needs the
whole power-domain and clock-domain code from the idle path.

All that code is not suitable to run with RCU disabled, as such push
RCU-idle deeper still.

Signed-off-by: Tony Lindgren 
---

Peter here's one more for your series, looks like this is needed to avoid
warnings similar to what you did for omap3.

---
 arch/arm/mach-omap2/common.h  |  6 --
 arch/arm/mach-omap2/cpuidle44xx.c |  8 ++--
 arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +++-
 arch/arm/mach-omap2/pm44xx.c  |  2 +-
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mach-omap2/common.h b/arch/arm/mach-omap2/common.h
--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -284,11 +284,13 @@ extern u32 omap4_get_cpu1_ns_pa_addr(void);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PM)
 extern int omap4_mpuss_init(void);
-extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state);
+extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+   bool rcuidle);
 extern int omap4_hotplug_cpu(unsigned int cpu, unsigned int power_state);
 #else
 static inline int omap4_enter_lowpower(unsigned int cpu,
-   unsigned int power_state)
+   unsigned int power_state,
+   bool rcuidle)
 {
cpu_do_idle();
return 0;
diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,7 @@ static int omap_enter_idle_smp(struct cpuidle_device *dev,
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
-   cpuidle_rcu_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   cpuidle_rcu_exit();
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +184,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
}
}
 
-   cpuidle_rcu_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
cpu_done[dev->cpu] = true;
-   cpuidle_rcu_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
diff --git a/arch/arm/mach-omap2/omap-mpuss-lowpower.c b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
--- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c
+++ b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
@@ -33,6 +33,7 @@
  * and first to wake-up when MPUSS low power states are excercised
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -214,6 +215,7 @@ static void __init save_l2x0_context(void)
  * of OMAP4 MPUSS subsystem
  * @cpu : CPU ID
  * @power_state: Low power state.
+ * @rcuidle: RCU needs to be idled
  *
  * MPUSS states for the context save:
  * save_state =
@@ -222,7 +224,8 @@ static void __init save_l2x0_context(void)
  * 2 - CPUx L1 and logic lost + GIC lost: MPUSS OSWR
  * 3 - CPUx L1 and logic lost + GIC + L2 lost: DEVICE OFF
  */
-int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state)
+int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+bool rcuidle)
 {
struct omap4_cpu_pm_info *pm_info = &per_cpu(omap4_pm_info, cpu);
unsigned int save_state = 0, cpu_logic_state = PWRDM_POWER_RET;
@@ -268,6 +271,10 @@ int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state)
cpu_clear_prev_logic_pwrst(cpu);
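
The idea in the commit message above (pass an rcuidle flag down so that only the innermost low-power entry runs with RCU idle) can be modelled in userspace. This is a rough sketch under stated assumptions: every stub_* function and sketch_enter_lowpower() are hypothetical stand-ins, not the kernel's functions, and the exact placement of the enter/exit calls inside omap4_enter_lowpower() is an assumption, since that part of the diff is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Records the order of the steps so the bracketing can be checked. */
static char step_log[64];

static void log_step(const char *s)
{
	strcat(step_log, s);
}

/* Hypothetical stand-ins for the real power-domain and RCU-idle calls. */
static void stub_prepare_domains(void) { log_step("p"); }
static void stub_rcu_idle_enter(void)  { log_step("E"); }
static void stub_wait_for_wakeup(void) { log_step("w"); }
static void stub_rcu_idle_exit(void)   { log_step("X"); }
static void stub_restore_domains(void) { log_step("r"); }

/*
 * Sketch: the power/clock-domain work runs with RCU still watching;
 * only the final wait is bracketed, and only if the caller asked for it.
 */
static int sketch_enter_lowpower(bool rcuidle)
{
	stub_prepare_domains();
	if (rcuidle)
		stub_rcu_idle_enter();
	stub_wait_for_wakeup();
	if (rcuidle)
		stub_rcu_idle_exit();
	stub_restore_domains();
	return 0;
}

static void lowpower_demo(void)
{
	step_log[0] = '\0';
	sketch_enter_lowpower(true);
	assert(strcmp(step_log, "pEwXr") == 0);
}
```

A caller on the cpuidle path would pass true and other callers (the pm44xx.c change in the diffstat, presumably) would pass false; that split is a guess from the diffstat, not something visible in the quoted hunks.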

Re: ppc64le bzImage and Build_id elf note

2022-06-14 Thread Donald Zickus
On Wed, Jun 8, 2022 at 7:52 AM Michael Ellerman  wrote:

> Donald Zickus  writes:
> > Hi Michael,
> >
> > I am working on two packaging issues with Fedora and CKI that I am hoping
> > you can give me some guidance on.
> >
> > 1 - Fedora has always packaged an eu-strip'd vmlinux file for powerpc.
> > The other arches we support used native compressed images.  I was
> > looking into using powerpc's zImage (pseries) binary to remove the
> > powerpc workarounds in our rpm spec file.
>
> What's the motivation for using the zImage?
>
> My naive hope was that as more advanced boot loaders become the norm we
> could eventually get rid of the zImage.
>
> It's generally a pain to work with, and a bit crufty, it also doesn't
> get as much testing as booting the vmlinux, so I'd be a little wary of
> switching to it.
>

My motivation really is to remove some of the manual stripping from our
specfile, use something the community already builds in, and address my
second issue.

If zImage isn't the right path to pursue, that is fine by me.  Perhaps I
just want vmlinux.stripped.


> There's also multiple zImages (and others), although admittedly for the
> platforms that Fedora supports the zImage.pseries should work (I think).
>
> > However, the rpmbuild fails because it can't find a build-id with
> > eu-readelf -n zImage.  Sure enough the build-id is found in vmlinux and
> > vmlinux.stripped but disappears with vmlinux.stripped.gz.
>
> Looks like other arches use objcopy rather than strip, maybe that's it?
>
> > I had hoped
> > arch/powerpc/boot/addnote would stick it back in but it doesn't (I am
> > ignorant of how addnote works).
>
> addnote adds some notes that firmware needs to read, it doesn't do
> anything else, though maybe it could.
>
> > eu-readelf -n  data
> > vmlinux:
> >
> > Displaying notes found in: .notes
> >   Owner    Data size    Description
> >   GNU      0x0014       NT_GNU_BUILD_ID (unique build ID bitstring)
> >     Build ID: b4c026d72ead7b4316a221cddb7f2b10d75fb313
> >   Linux    0x0004       func
> >     description data: 00 00 00 00
> >   Linux    0x0001       OPEN
> >     description data: 00
> >   PowerPC  0x0004       NT_VERSION (version)
> >     description data: 01 00 00 00
> >
> > zImage:
> >
> > Displaying notes found at file offset 0x0158 with length 0x002c:
> >   Owner    Data size    Description
> >   PowerPC  0x0018       Unknown note type: (0x1275)
> >     description data: ff ff ff ff 02 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 40 00
> >
> > Displaying notes found at file offset 0x0184 with length 0x0044:
> >   Owner                 Data size    Description
> >   IBM,RPA-Client-[...]  0x0020       Unknown note type: (0x1275)
> >     description data: 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 28 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 01
> >
> > Is this something that can be addressed?  Or should I/we expect the
> > build-id to never make it into the zImage and just continue with our
> > current vmlinux process?
>
> Maybe :)
>
> Is it correct for the build-id to be copied into the zImage anyway? It's
> a different binary so shouldn't it have a different build-id?
>
> If you have a zImage and a vmlinux with the same build-id isn't that
> going to confuse debugging tools?
>

My understanding is the debug tools use vmlinux anyway (at least for
Fedora) which is stored in our -debuginfo package.  While the booted image
(compressed) is stored in the traditional kernel rpm.  My understanding is
the build is the same whether compressed or not so having a similar
build-id made sense.  But I am a little ignorant here.



> > 2 - CKI builds kernels using 'make targz-pkg'.  The arches we support
> > (x86_64, s390, aarch64) provide compressed binaries to package using
> > KBUILD_IMAGE or a specific entry in scripts/package/buildtar.  As a
> > result, because powerpc doesn't have a KBUILD_IMAGE variable defined,
> > the script builds vmlinux and cp's that to vmlinux-kbuild.  The problem
> > with powerpc is that vmlinux for us is huge ( >256MB) and cp'ing that to
> > vmlinux-kbuild occupies > 512MB of /boot and our distro runs out of disk
> > space on that partition.
>
> Is that just because it has debug info built in? I thought the distro
> solution for that was doing split debug info?
>

Yes, our rpm specfile stores vmlinux in -debuginfo and strips vmlinux to
store in the normal kernel rpm.

But our kernel CI service, CKI, uses upstream's 'make targz-pkg' for
easier consumption by the upstream community that may not use rpms.  That
Makefile target doesn't do any stripping.  I was hoping to add that.


>
> > I was hoping to add a patch to arch/powerpc/Makefile that defines
> > KBUILD_IMAGE:=$(boot)/zImage (mimicking arch/s390), which I believe would
> > solve our problem.  However, that circles back to our first problem
> > above.
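
For readers unfamiliar with what eu-readelf -n is looking for in the output quoted above: an ELF note is a small record (namesz, descsz, type, then the padded name and descriptor), and the build-id is the descriptor of the note whose name is "GNU" and whose type is NT_GNU_BUILD_ID (3). The following is a minimal sketch of walking such a buffer. It assumes little-endian fields (as on ppc64le) and 4-byte padding; find_build_id() and the sample bytes are made up for illustration, with the build-id shortened to 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NT_GNU_BUILD_ID 3u	/* same value as NT_GNU_BUILD_ID in <elf.h> */

static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static size_t pad4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/* Returns a pointer to the build-id bytes and stores their length, or NULL. */
static const unsigned char *find_build_id(const unsigned char *notes,
					  size_t size, size_t *len)
{
	size_t off = 0;

	while (off <= size && size - off >= 12) {
		uint32_t namesz = read_le32(notes + off);
		uint32_t descsz = read_le32(notes + off + 4);
		uint32_t type = read_le32(notes + off + 8);
		size_t name_off = off + 12;
		size_t desc_off;

		if (namesz > size || descsz > size)
			return NULL;		/* corrupt sizes */
		desc_off = name_off + pad4(namesz);
		if (desc_off > size || descsz > size - desc_off)
			return NULL;		/* truncated note */

		if (type == SKETCH_NT_GNU_BUILD_ID && namesz == 4 &&
		    memcmp(notes + name_off, "GNU", 4) == 0) {
			*len = descsz;
			return notes + desc_off;
		}
		off = desc_off + pad4(descsz);
	}
	return NULL;
}

static void build_id_demo(void)
{
	static const unsigned char notes[] = {
		/* "Linux" note: namesz=6, descsz=4, type=0x101 */
		6, 0, 0, 0,  4, 0, 0, 0,  0x01, 0x01, 0, 0,
		'L', 'i', 'n', 'u', 'x', 0, 0, 0,  0, 0, 0, 0,
		/* "GNU" note: namesz=4, descsz=4, type=3 (shortened id) */
		4, 0, 0, 0,  4, 0, 0, 0,  3, 0, 0, 0,
		'G', 'N', 'U', 0,  0xb4, 0xc0, 0x26, 0xd7,
	};
	size_t len = 0;
	const unsigned char *id = find_build_id(notes, sizeof(notes), &len);

	assert(id != NULL && len == 4 && id[0] == 0xb4 && id[3] == 0xd7);
}
```

A zImage whose notes contain only the firmware entries shown above would simply fall through to the NULL return, which matches the rpmbuild failure described.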

Re: [RFC PATCH 0/6] User pkey minor bug fixes

2022-06-14 Thread Sohil Mehta

On 6/10/2022 4:35 PM, ira.we...@intel.com wrote:



glibc says it returns ENOSYS if the system does not support pkeys but I don't
see where ENOSYS is returned?  AFAICS it just returns what the kernel returns.
So it is probably up to user of glibc.



Implementation of the pkeys system calls is arch specific and 
conditional. See kernel/sys_ni.c


glibc is probably talking about ENOSYS being returned when the 
architecture doesn't have support or the CONFIG option is disabled on 
supported architectures.


Thanks,
Sohil
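
To make the ENOSYS case above concrete, here is a small userspace probe. It is a sketch, not a statement of what glibc does internally: the enum and function names are made up, the raw syscall() path is used only to keep the example independent of the glibc wrapper, and the ENOSPC interpretation follows my reading of pkey_alloc(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum pkey_probe {
	PKEY_PROBE_OK,		/* a key was allocated */
	PKEY_PROBE_NO_SYSCALL,	/* ENOSYS: syscall not wired up or CONFIG off */
	PKEY_PROBE_NO_KEY,	/* ENOSPC: no key available (or no hw support) */
	PKEY_PROBE_OTHER,	/* anything else, e.g. EINVAL or EPERM */
};

static enum pkey_probe classify_pkey_alloc(long ret, int err)
{
	if (ret >= 0)
		return PKEY_PROBE_OK;
	if (err == ENOSYS)
		return PKEY_PROBE_NO_SYSCALL;
	if (err == ENOSPC)
		return PKEY_PROBE_NO_KEY;
	return PKEY_PROBE_OTHER;
}

/* Tries to allocate one key and releases it again on success. */
static enum pkey_probe probe_pkeys(void)
{
#ifdef SYS_pkey_alloc
	long ret;
	int err;

	errno = 0;
	ret = syscall(SYS_pkey_alloc, 0UL, 0UL);
	err = errno;		/* save before any further call */
	if (ret >= 0)
		syscall(SYS_pkey_free, ret);
	return classify_pkey_alloc(ret, err);
#else
	/* Headers too old to know the syscall number at all. */
	return PKEY_PROBE_NO_SYSCALL;
#endif
}
```

On a kernel where the pkey syscalls resolve to the sys_ni.c stubs, probe_pkeys() would be expected to return PKEY_PROBE_NO_SYSCALL.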


Re: [RFC PATCH 1/6] testing/pkeys: Add command line options

2022-06-14 Thread Sohil Mehta

On 6/10/2022 4:35 PM, ira.we...@intel.com wrote:


Add command line options for debug level and number of iterations.

$ ./protection_keys_64 -h
Usage: ./protection_keys_64 [-h,-d,-i ]
 --help,-h   This help
--debug,-d  Increase debug level for each -d


Is this mechanism (of counting d's) commonplace in other selftests as 
well? Looking at the test code for pkeys the debug levels run from 1-5. 
That feels like quite a few d's to input :)


Would it be easier to input the number in the command line directly?

Either way it would be useful to know the debug range in the help.
Maybe something like:
--debug,-d  Increase debug level for each -d (1-5)

The patch seems fine to me otherwise.


--iterations,-i   repeat test  times
default: 22



Thanks,
Sohil


Re: [RFC PATCH 2/6] testing/pkeys: Don't use uninitialized variable

2022-06-14 Thread Sohil Mehta

On 6/10/2022 4:35 PM, ira.we...@intel.com wrote:

diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index d0183c381859..43e47de19c0d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1225,9 +1225,9 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
int new_pkey;
dprintf1("%s() alloc loop: %d\n", __func__, i);
new_pkey = alloc_pkey();
-   dprintf4("%s()::%d, err: %d pkey_reg: 0x%016llx"
+   dprintf4("%s()::%d, errno: %d pkey_reg: 0x%016llx"


What is errno referring to over here? There are a few things happening 
in alloc_pkey(). I guess it would show the latest error that happened. 
Does errno need to be set to 0 before the call?


Also, would it be useful to print the return value (new_pkey) from 
alloc_pkey() here?



" shadow: 0x%016llx\n",
-   __func__, __LINE__, err, __read_pkey_reg(),
+   __func__, __LINE__, errno, __read_pkey_reg(),
shadow_pkey_reg);
read_pkey_reg(); /* for shadow checking */
dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, 
ENOSPC);


Sohil
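
On the errno question above: errno is only meaningful after a call that reported failure, and successful calls are not required to reset it, so a stale value can leak into a debug print. A minimal sketch of the usual pattern follows; stub_alloc_pkey() is a hypothetical stand-in for the selftest's alloc_pkey(), not its real implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: fails with ENOSPC on request, else returns key 1. */
static int stub_alloc_pkey(int fail)
{
	if (fail) {
		errno = ENOSPC;
		return -1;
	}
	return 1;	/* success: errno is deliberately left untouched */
}

/*
 * Clears errno before the call and captures it right after, so the value
 * printed later belongs to this call and not to an earlier failure.
 */
static int alloc_and_capture(int fail, int *saved_errno)
{
	int ret;

	errno = 0;
	ret = stub_alloc_pkey(fail);
	*saved_errno = errno;
	return ret;
}

static void errno_capture_demo(void)
{
	int saved = -1;

	errno = EINVAL;		/* stale value from some earlier call */
	assert(alloc_and_capture(0, &saved) == 1);
	assert(saved == 0);	/* the stale EINVAL did not leak through */
}
```

Printing both the return value and the captured errno, as suggested above, makes it clear which of the two the reader should trust.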


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-14 Thread Jarkko Sakkinen

On Thu, Jun 09, 2022 at 03:23:16PM +0200, Ard Biesheuvel wrote:
> On Thu, 9 Jun 2022 at 15:14, Jarkko Sakkinen  wrote:
> >
> > On Wed, Jun 08, 2022 at 09:12:34AM -0700, Song Liu wrote:
> > > On Wed, Jun 8, 2022 at 7:21 AM Masami Hiramatsu  
> > > wrote:
> > > >
> > > > Hi Jarkko,
> > > >
> > > > On Wed, 8 Jun 2022 08:25:38 +0300
> > > > Jarkko Sakkinen  wrote:
> > > >
> > > > > On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > > > > > .
> > > > > >
> > > > > > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  
> > > > > > wrote:
> > > > > > >
> > > > > > > > Tracing with kprobes while running a monolithic kernel is
> > > > > > > > currently impossible because CONFIG_KPROBES is dependent on
> > > > > > > > CONFIG_MODULES.  This dependency is a result of kprobes code
> > > > > > > > using the module allocator for the trampoline code.
> > > > > > > >
> > > > > > > > Detaching kprobes from modules helps to squeeze down the user
> > > > > > > > space, e.g. when developing new core kernel features, while
> > > > > > > > still having all the nice tracing capabilities.
> > > > > > > >
> > > > > > > > For kernel/ and arch/*, move module_alloc() and module_memfree()
> > > > > > > > to module_alloc.c, and compile as part of vmlinux when either
> > > > > > > > CONFIG_MODULES or CONFIG_KPROBES is enabled.  In addition, flag
> > > > > > > > kernel module specific code with CONFIG_MODULES.
> > > > > > >
> > > > > > > As the result, kprobes can be used with a monolithic kernel.
> > > > > > It's strange when MODULES is n, but vmlinux still obtains 
> > > > > > module_alloc.
> > > > > >
> > > > > > Maybe we need a kprobe_alloc, right?
> > > > >
> > > > > Perhaps not the best name but at least it documents the fact that
> > > > > they use the same allocator.
> > > > >
> > > > > Few years ago I carved up something "half-way there" for kprobes,
> > > > > and I used the name text_alloc() [*].
> > > > >
> > > > > [*] 
> > > > > https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
> > > >
> > > > Yeah, I remember that. Thank you for updating your patch!
> > > > I think the idea (split module_alloc() from CONFIG_MODULE) is good to 
> > > > me.
> > > > If module support maintainers think this name is not good, you may be
> > > > able to rename it as text_alloc() and make the module_alloc() as a
> > > > wrapper of it.
> > >
> > > IIUC, most users of module_alloc() use it to allocate memory for text,
> > > except that module code uses it for both text and data. Therefore, I
> > > guess calling it text_alloc() is not 100% accurate until we change the
> > > module code (to use a different API to allocate memory for data).
> >
> > After reading the feedback, I'd stay on using module_alloc() because
> > it has arch-specific quirks baked in. Easier to deal with them in one
> > place.
> >
> 
> In that case, please ensure that you enable this only on architectures
> where it is needed. arm64 implements alloc_insn_page() without relying
> on module_alloc() so I would not expect to see any changes there.

Right, got it, thanks for the remark.

BR, Jarkko
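
For the record, the text_alloc() wrapper arrangement floated earlier in this thread (and not adopted in the end) would look roughly like the following in userspace terms. It is purely illustrative: malloc() stands in for the arch-specific allocation, and the signatures are simplified compared with the kernel's module_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical generic allocator for executable text. In the kernel the
 * arch-specific quirks would live here; malloc() is only a placeholder.
 */
static void *text_alloc(size_t size)
{
	return size ? malloc(size) : NULL;
}

static void text_free(void *p)
{
	free(p);
}

/* The module-facing names stay, but become thin wrappers. */
static void *module_alloc(size_t size)
{
	return text_alloc(size);
}

static void module_memfree(void *p)
{
	text_free(p);
}

static void text_alloc_demo(void)
{
	void *insn_page = module_alloc(64);	/* e.g. a kprobes trampoline */

	assert(insn_page != NULL);
	module_memfree(insn_page);
	assert(module_alloc(0) == NULL);
}
```

The objection raised above still applies to such a split: as long as module code allocates data through the same path, the text_alloc() name is not strictly accurate.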


Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland

On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> Hi All! (omg so many)

Hi Peter,

Sorry for the delay; my plate has also been rather full recently. I'm beginning
to page this in now.

> These here few patches mostly clear out the utter mess that is cpuidle vs rcuidle.
> 
> At the end of the ride there's only 2 real RCU_NONIDLE() users left
> 
>   arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
>   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));

The latter of these is necessary because apparently PM notifiers are called
with RCU not watching. Is that still the case today (or at the end of this
series)? If so, that feels like fertile land for more issues (yaey...). If not,
we should be able to drop this.

I can go dig into that some more.

>   kernel/cfi.c:   RCU_NONIDLE({
> 
> (the CFI one is likely dead in the kCFI rewrite) and there's only a handful
> of trace_.*_rcuidle() left:
> 
>   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1);
>   kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1);
> 
> All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.

I think those are unused on arm64 too?

If not, I can go attack that.

> I've touched a _lot_ of code that I can't test and likely broken some of it :/
> In particular, the whole ARM cpuidle stuff was quite involved with OMAP being
> the absolute 'winner'.
> 
> I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to
> GENERIC_ENTRY.

Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
(refactoring both arm64 and the generic portion to be more amenable to each
other), but we can certainly move closer to that for the bits that matter here.

Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that
we can select regardless of GENERIC_ENTRY to make that easier.

> I've also got a note that says ARM64 can probably do a WFE based
> idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.

Possibly; I'm not sure how much of a win that'll be given that by default we'll
have a ~10KHz WFE wakeup from the timer, but we could take a peek.

Thanks,
Mark.


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-14 Thread Jarkko Sakkinen


On Thu, Jun 09, 2022 at 06:44:45AM -0700, Luis Chamberlain wrote:
> On Thu, Jun 09, 2022 at 08:47:38AM +0100, Russell King (Oracle) wrote:
> > On Wed, Jun 08, 2022 at 02:59:27AM +0300, Jarkko Sakkinen wrote:
> > > diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> > > index 553866751e1a..d2bb954cd54f 100644
> > > --- a/arch/arm/kernel/Makefile
> > > +++ b/arch/arm/kernel/Makefile
> > > @@ -44,6 +44,11 @@ obj-$(CONFIG_CPU_IDLE) += cpuidle.o
> > >  obj-$(CONFIG_ISA_DMA_API)+= dma.o
> > >  obj-$(CONFIG_FIQ)+= fiq.o fiqasm.o
> > >  obj-$(CONFIG_MODULES)+= armksyms.o module.o
> > > +ifeq ($(CONFIG_MODULES),y)
> > > +obj-y+= module_alloc.o
> > > +else
> > > +obj-$(CONFIG_KPROBES)+= module_alloc.o
> > > +endif
> > 
> > Doesn't:
> > 
> > obj-$(CONFIG_MODULES)   += module_alloc.o
> > obj-$(CONFIG_KPROBES)   += module_alloc.o
> 
> That just begs for a new kconfig symbol for the object, and for
> the object then to be built with it.
> 
> The archs which override the default can use ARCH_HAS_VM_ALLOC_EXEC.
> Please note that the respective free is important as well and its
> not clear if we need an another define for the free. Someone has
> to do that work. We want to ensure to noexec the code on free and
> this can vary on each arch.

Let me check if I understand this (not 100% sure).

So if an arch defines ARCH_HAS_VMALLOC_EXEC, then this would set the
config flag CONFIG_VMALLOC_EXEC, which would be used to include
the compilation unit?

BR, Jarkko
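
For readers following the Kconfig idea, here is a minimal userspace sketch of the pattern being asked about: an architecture selects a symbol, and generic code either calls the arch hook or falls back to a default. The names `DEMO_ARCH_HAS_EXEC_ALLOC`, `demo_arch_exec_alloc()` and `demo_exec_alloc()` are hypothetical; the thread has not settled on a symbol name, and `malloc()` only stands in for whatever the generic allocator would be.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a Kconfig symbol that an architecture would
 * select. It is not an existing kernel option.
 */
#ifndef DEMO_ARCH_HAS_EXEC_ALLOC
#define DEMO_ARCH_HAS_EXEC_ALLOC 0
#endif

#if DEMO_ARCH_HAS_EXEC_ALLOC
/* Would be provided by the architecture in its own compilation unit. */
void *demo_arch_exec_alloc(size_t size);
void demo_arch_exec_free(void *p);
#endif

/* Generic entry point: use the arch hook if selected, else the fallback. */
static void *demo_exec_alloc(size_t size)
{
#if DEMO_ARCH_HAS_EXEC_ALLOC
	return demo_arch_exec_alloc(size);
#else
	return malloc(size);	/* placeholder for a generic allocator */
#endif
}

/* The matching free path is selected by the same symbol. */
static void demo_exec_free(void *p)
{
#if DEMO_ARCH_HAS_EXEC_ALLOC
	demo_arch_exec_free(p);
#else
	free(p);
#endif
}
```

With the symbol left at 0 the fallback is compiled in and no arch object is needed, which is roughly the property that would let the compilation unit be built only where an arch asks for it.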


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-14 Thread Jarkko Sakkinen


On Sun, Jun 12, 2022 at 09:30:41PM +0900, Masami Hiramatsu wrote:
> On Wed, 8 Jun 2022 11:19:19 -0700
> Song Liu  wrote:
> 
> > On Wed, Jun 8, 2022 at 9:28 AM Ard Biesheuvel  wrote:
> > >
> > > Hello Jarkko,
> > >
> > > On Wed, 8 Jun 2022 at 02:02, Jarkko Sakkinen  wrote:
> > > >
> > > > Tracing with kprobes while running a monolithic kernel is currently
> > > > impossible because CONFIG_KPROBES is dependent on CONFIG_MODULES.  This
> > > > dependency is a result of kprobes code using the module allocator for the
> > > > trampoline code.
> > > >
> > > > Detaching kprobes from modules helps to squeeze down the user space,
> > > > e.g. when developing new core kernel features, while still having all
> > > > the nice tracing capabilities.
> > > >
> > > > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > > > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > > > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > > > code with CONFIG_MODULES.
> > > >
> > > > As a result, kprobes can be used with a monolithic kernel.
> > >
> > > I think I may have mentioned this the previous time as well, but I
> > > don't think this is the right approach.
> > >
> > > Kprobes uses alloc_insn_page() to allocate executable memory, but the
> > > requirements for this memory are radically different compared to
> > > loadable modules, which need to be within an arch-specific distance of
> > > the core kernel, need KASAN backing etc etc.
> > 
> > I think the distance of core kernel requirement is the same for kprobe
> > alloc_insn_page and modules, no?
> 
> This strongly depends on how kprobes (software breakpoint and
> single-step) is implemented on the arch. For example, x86 implements
> the so-called "kprobe-booster" which jumps back from the single
> stepping trampoline buffer. Then the buffer address must be within
> the range where it can jump to the original address.
> However, if the arch implements single-step as an instruction
> emulation, it has no such limitation. As far as I know, arm64
> will do emulation for the instructions which change PC register
> and will do direct execution with another software breakpoint
> for other instructions.
> 
> The reason I'm using module_alloc() for the generic function is that
> it covers these limitations most widely.
> Thus, if we have a CONFIG_ARCH_HAVE_ALLOC_INSN_PAGE flag and
> kprobes can check it instead of using a __weak function,
> kprobes may not need to depend on module_alloc() in general.

OK, I guess this is what Luis meant. 

I'll try to carve up something based on this.

BR, Jarkko
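
The range constraint Masami describes can be illustrated with a small check. This is only a sketch, assuming the jump back from the trampoline buffer is a 5-byte x86-64 `jmp rel32`, whose signed 32-bit displacement is relative to the end of the instruction. The helper name `demo_rel32_reachable()` is hypothetical and is not part of the kprobes code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of an x86-64 near jump: one opcode byte plus a 32-bit displacement. */
#define DEMO_JMP_REL32_LEN 5

/*
 * Return true if a `jmp rel32` placed at jmp_addr can reach target.
 * The displacement is measured from the address of the next instruction.
 */
static bool demo_rel32_reachable(uint64_t jmp_addr, uint64_t target)
{
	int64_t disp = (int64_t)(target - (jmp_addr + DEMO_JMP_REL32_LEN));

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

An architecture that emulates the instruction instead of executing a copy would not need such a check, which is why the placement requirement differs per arch.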


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-14 Thread jar...@kernel.org


On Thu, Jun 09, 2022 at 06:41:36PM +, Edgecombe, Rick P wrote:
> On Thu, 2022-06-09 at 06:24 -0700, Luis Chamberlain wrote:
> > On Thu, Jun 09, 2022 at 05:48:52AM +0200, Christoph Hellwig wrote:
> > > On Wed, Jun 08, 2022 at 01:26:19PM -0700, Luis Chamberlain wrote:
> > > > No, that was removed because it has only one user.
> > > 
> > > > That is only part of the story.  The other part is that the overall
> > > > kernel simply does not have any business allocating executable memory.
> > > > Executable memory is a very special concept for modules or module-like
> > > > code like kprobes, and should not be exposed as a general concept.
> > 
> > > It is not just modules and kprobes, it is also ftrace and bpf now.
> > > So while it should not be used everywhere, calling it module_alloc()
> > > is just confusing at this point. Likewise, module_alloc_huge() is
> > > being proposed too and I'd rather we deal with this properly in
> > > alignment with taking care of the rename as well.
> > > 
> > > If the concern is to restrict access we can use the module namespace
> > > stuff to ensure only intended users get access to it.
> 
> BPF even has multiple uses for text allocation. It has its own
> trampoline feature that puts different types of text in the allocation,
> with its own allocation routine. It looks like there are even more
> little allocators in there.
> 
> So yea, there seems to be a lot of the kernel in the business of
> dynamically generated text, for better or worse. I agree that it needs
> to be done carefully. However, these usages always seem to have the
> same problems (W^X, arch eccentricities, etc). So I don't think we
> should hide away the pieces. Instead we should have something with
> guard rails on it, so they can't get the allocation part wrong.
> 
> But I guess the question here is: what should we do in the meantime? It
> is kind of similar to the questions that came up around the bpf prog
> pack allocator. Should we hold up allocator related work until
> underlying problems are resolved and there is some mature core
> solution?
> 
> Personally I had thought we would need to do some clean switch to a
> much different interface. I still think someday it will be required,
> but it seems to be evolving naturally for the time being.
> 
> Like say for a next step we moved prog pack out of bpf into core code,
> gave it its own copy of module_alloc(), and then made kprobes use it.
> Then we would have something with improved W^X guard rails, and kprobes
> would not depend on modules anymore. I think maybe it's a step in the
> right direction, even if it's not perfect.

So you're saying that I should (as a first step) basically clone the
module_alloc() implementation for kprobes, and in the future for BPF
use, in order to get a clean starting point?

BR, Jarkko
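
As a rough illustration of the "guard rails" Rick mentions, the W^X discipline can be sketched in userspace with POSIX `mmap()` and `mprotect()`: the buffer is writable while it is filled and is switched to read+execute before anyone could run it, so it is never writable and executable at once. The helpers `demo_text_alloc()` and `demo_text_free()` are hypothetical names. This is not how module_alloc() or the bpf prog pack allocator work internally, only the shape of the contract.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy len bytes of code into fresh memory that is writable only while
 * it is being filled and executable only afterwards.
 * Returns NULL on failure.
 */
static void *demo_text_alloc(const void *code, size_t len)
{
	void *p;

	if (len == 0)
		return NULL;

	/* Step 1: writable, not executable. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, code, len);

	/* Step 2: drop write permission before granting execute. */
	if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}

/* Releasing the mapping also removes the executable permission. */
static void demo_text_free(void *p, size_t len)
{
	munmap(p, len);
}
```

Because callers only ever see the sealed buffer, they cannot get the permission ordering wrong, which is the property a shared core allocator would be expected to enforce.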


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-14 Thread Christophe Leroy

Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
>   * to avoid a deep idle transition as we are about to get the
>   * broadcast IPI right away.
>   */
> -int tick_check_broadcast_expired(void)
> +noinstr int tick_check_broadcast_expired(void)
>  {
> +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> + return arch_test_bit(smp_processor_id(), cpumask_bits(tick_broadcast_force_mask));
> +#else
>   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> +#endif
>  }

This is somewhat not-ideal. :/

Could we unconditionally do the arch_test_bit() variant, with a comment, or
does that not exist in some cases?

Thanks,
Mark.
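
For context on why the patch reaches for arch_test_bit(): a minimal userspace model of the split between a raw bit test and an instrumented wrapper is sketched below. All `demo_*` names are hypothetical, and the counter only stands in for whatever instrumentation hook the real wrapper would call. The point is that the raw variant does nothing but a load and a shift, so it is the one that can be used from noinstr code.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Counts how often the (modelled) instrumentation hook ran. */
static unsigned long demo_instrumentation_hits;

/* Raw variant: plain load and shift, no hooks. */
static inline bool demo_arch_test_bit(unsigned long nr,
				      const unsigned long *addr)
{
	return (addr[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Instrumented variant: runs the hook first, then the raw operation. */
static inline bool demo_test_bit(unsigned long nr, const unsigned long *addr)
{
	demo_instrumentation_hits++;	/* stands in for a sanitizer check */
	return demo_arch_test_bit(nr, addr);
}
```

Both variants return the same value. Only the wrapper has a side effect outside the bitmap, which is what makes it unsuitable where instrumentation must not run.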


Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Steven Rostedt


On Thu, 9 Jun 2022 15:02:20 +0200
Petr Mladek  wrote:

> > I'm somewhat curious whether we can actually remove that trace event.  
> 
> Good question.
> 
> Well, I think that it might be useful. It allows one to see trace and
> printk messages together.

Yes, people still use it. I was just asked about it at Kernel Recipes. That
is, someone wanted printk mixed in with the tracing, and I told them about
this event (which they didn't know about but were happy to hear that it
existed).

-- Steve


Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> All callers should still have RCU enabled.

IIUC with that true we should be able to drop the RCU_NONIDLE() from
drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
notifier.

I should be able to give that a spin on some hardware.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/cpu_pm.c |9 -
>  1 file changed, 9 deletions(-)
> 
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
>  {
>   int ret;
>  
> - /*
> -  * This introduces a RCU read critical section, which could be
> -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> -  * this.
> -  */
> - rcu_irq_enter_irqson();
>   rcu_read_lock();
>   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
>   rcu_read_unlock();
> - rcu_irq_exit_irqson();

To make this easier to debug, is it worth adding an assertion that RCU is
watching here? e.g.

RCU_LOCKDEP_WARN(!rcu_is_watching(),
 "cpu_pm_notify() used illegally from EQS");

>  
>   return notifier_to_errno(ret);
>  }
> @@ -49,11 +42,9 @@ static int cpu_pm_notify_robust(enum cpu
>   unsigned long flags;
>   int ret;
>  
> - rcu_irq_enter_irqson();
>   raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
>   ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, event_down, NULL);
>   raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
> - rcu_irq_exit_irqson();


... and likewise here?

Thanks,
Mark.

>  
>   return notifier_to_errno(ret);
>  }
> 
> 
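
The assertion Mark suggests can be modelled in userspace as follows. This is a sketch under stated assumptions: `demo_rcu_watching` stands in for rcu_is_watching(), the warning counter stands in for the lockdep splat, and `demo_cpu_pm_notify()` is a hypothetical simplification of the notifier call, not the kernel function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Models rcu_is_watching(): false while in an extended quiescent state. */
static bool demo_rcu_watching = true;

/* Number of times the modelled assertion fired. */
static unsigned int demo_warnings;

typedef int (*demo_notifier_fn)(int event);

/*
 * Warn (but carry on) if the notifier is invoked while RCU is not
 * watching, then call the notifier as before.
 */
static int demo_cpu_pm_notify(demo_notifier_fn fn, int event)
{
	if (!demo_rcu_watching) {
		demo_warnings++;
		fprintf(stderr,
			"demo_cpu_pm_notify() used illegally from EQS\n");
	}
	return fn(event);
}

/* Trivial notifier used to exercise the helper. */
static int demo_ok_notifier(int event)
{
	(void)event;
	return 0;
}
```

The check does not change behaviour for correct callers; it only makes a caller that still runs with RCU not watching show up early instead of failing later inside the read-side critical section.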


Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote:
> Current rcu_idle_exit() is terminally broken because it uses
> local_irq_{save,restore}(), which are traced which uses RCU.
> 
> However, now that all the callers are sure to have IRQs disabled, we
> can remove these calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Acked-by: Paul E. McKenney 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/rcu/tree.c |9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
>   * If you add or remove a call to rcu_idle_enter(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_enter(void)
> +void noinstr rcu_idle_enter(void)
>  {
>   lockdep_assert_irqs_disabled();
>   rcu_eqs_enter(false);
> @@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
>   * If you add or remove a call to rcu_idle_exit(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
>  {
> - unsigned long flags;
> -
> - local_irq_save(flags);
> + lockdep_assert_irqs_disabled();
>   rcu_eqs_exit(false);
> - local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>  
> 
> 


Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:43PM +0200, Peter Zijlstra wrote:
> Current arch_cpu_idle() is called with IRQs disabled, but will return
> with IRQs enabled.
> 
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
> 
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Nice!

  Acked-by: Mark Rutland  [arm64]

Mark.
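
The "enable-disable dance" from the commit message can be made concrete with a toy model that counts IRQ state changes under the old and new arch_cpu_idle() contracts. Everything here is a hypothetical userspace stand-in (`demo_*`), assuming an architecture that is able to idle with IRQs disabled.

```c
#include <stdbool.h>

static bool demo_irqs_enabled;		/* modelled IRQ state */
static unsigned int demo_irq_flips;	/* number of real state changes */

static void demo_irq_set(bool on)
{
	if (demo_irqs_enabled != on) {
		demo_irqs_enabled = on;
		demo_irq_flips++;
	}
}

/* Old contract: entered with IRQs off, must return with IRQs on. */
static void demo_arch_cpu_idle_old(void)
{
	/* the wait-for-interrupt instruction would go here */
	demo_irq_set(true);
}

static void demo_generic_idle_old(void)
{
	demo_irq_set(false);
	demo_arch_cpu_idle_old();
	demo_irq_set(false);	/* first thing generic code did afterwards */
}

/* New contract: entered and left with IRQs off. */
static void demo_arch_cpu_idle_new(void)
{
	/* the wait-for-interrupt instruction would go here */
}

static void demo_generic_idle_new(void)
{
	demo_irq_set(false);
	demo_arch_cpu_idle_new();
}
```

In this model the old path flips the IRQ state twice per idle call and the new path not at all. Architectures that must enable IRQs to idle would do the enable and disable inside their own idle function, as the arc and gemini hunks in the patch do.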

> ---
>  arch/alpha/kernel/process.c  |1 -
>  arch/arc/kernel/process.c|3 +++
>  arch/arm/kernel/process.c|1 -
>  arch/arm/mach-gemini/board-dt.c  |3 ++-
>  arch/arm64/kernel/idle.c |1 -
>  arch/csky/kernel/process.c   |1 -
>  arch/csky/kernel/smp.c   |2 +-
>  arch/hexagon/kernel/process.c|1 -
>  arch/ia64/kernel/process.c   |1 +
>  arch/microblaze/kernel/process.c |1 -
>  arch/mips/kernel/idle.c  |8 +++-
>  arch/nios2/kernel/process.c  |1 -
>  arch/openrisc/kernel/process.c   |1 +
>  arch/parisc/kernel/process.c |2 --
>  arch/powerpc/kernel/idle.c   |5 ++---
>  arch/riscv/kernel/process.c  |1 -
>  arch/s390/kernel/idle.c  |1 -
>  arch/sh/kernel/idle.c|1 +
>  arch/sparc/kernel/leon_pmc.c |4 
>  arch/sparc/kernel/process_32.c   |1 -
>  arch/sparc/kernel/process_64.c   |3 ++-
>  arch/um/kernel/process.c |1 -
>  arch/x86/coco/tdx/tdx.c  |3 +++
>  arch/x86/kernel/process.c|   15 ---
>  arch/xtensa/kernel/process.c |1 +
>  kernel/sched/idle.c  |2 --
>  26 files changed, 28 insertions(+), 37 deletions(-)
> 
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
>  void arch_cpu_idle(void)
>  {
>   wtint(0);
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_dead(void)
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -114,6 +114,8 @@ void arch_cpu_idle(void)
>   "sleep %0   \n"
>   :
>   :"I"(arg)); /* can't be "r" has to be embedded const */
> +
> + raw_local_irq_disable();
>  }
>  
>  #else/* ARC700 */
> @@ -122,6 +124,7 @@ void arch_cpu_idle(void)
>  {
>   /* sleep, but enable both set E1/E2 (levels of interrupts) before committing */
>   __asm__ __volatile__("sleep 0x3 \n");
> + raw_local_irq_disable();
>  }
>  
>  #endif
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -78,7 +78,6 @@ void arch_cpu_idle(void)
>   arm_pm_idle();
>   else
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_prepare(void)
> --- a/arch/arm/mach-gemini/board-dt.c
> +++ b/arch/arm/mach-gemini/board-dt.c
> @@ -42,8 +42,9 @@ static void gemini_idle(void)
>*/
>  
>   /* FIXME: Enabling interrupts here is racy! */
> - local_irq_enable();
> + raw_local_irq_enable();
>   cpu_do_idle();
> + raw_local_irq_disable();
>  }
>  
>  static void __init gemini_init_machine(void)
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
>* tricks
>*/
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -101,6 +101,5 @@ void arch_cpu_idle(void)
>  #ifdef CONFIG_CPU_PM_STOP
>   asm volatile("stop\n");
>  #endif
> - raw_local_irq_enable();
>  }
>  #endi

Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote:
> Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
> normal interrupts") this function is called in regular IRQ context.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

[adding Marc since he authored that commit]

Makes sense to me:

  Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kernel/smp.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
>   unsigned int cpu = smp_processor_id();
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_entry_rcuidle(ipi_types[ipinr]);
> + trace_ipi_entry(ipi_types[ipinr]);
>  
>   switch (ipinr) {
>   case IPI_RESCHEDULE:
> @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
>   }
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_exit_rcuidle(ipi_types[ipinr]);
> + trace_ipi_exit(ipi_types[ipinr]);
>  }
>  
>  static irqreturn_t ipi_handler(int irq, void *data)
> 
> 


Re: [PATCH 25/36] time/tick-broadcast: Remove RCU_NONIDLE usage

2022-06-14 Thread Mark Rutland


On Wed, Jun 08, 2022 at 04:27:48PM +0200, Peter Zijlstra wrote:
> No callers left that have already disabled RCU.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/time/tick-broadcast-hrtimer.c |   29 -
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> --- a/kernel/time/tick-broadcast-hrtimer.c
> +++ b/kernel/time/tick-broadcast-hrtimer.c
> @@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
>* hrtimer callback function is currently running, then
>* hrtimer_start() cannot move it and the timer stays on the CPU on
>* which it is assigned at the moment.
> +  */
> + hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> + /*
> +  * The core tick broadcast mode expects bc->bound_on to be set
> +  * correctly to prevent a CPU which has the broadcast hrtimer
> +  * armed from going deep idle.
>*
> -  * As this can be called from idle code, the hrtimer_start()
> -  * invocation has to be wrapped with RCU_NONIDLE() as
> -  * hrtimer_start() can call into tracing.
> +  * As tick_broadcast_lock is held, nothing can change the cpu
> +  * base which was just established in hrtimer_start() above. So
> +  * the below access is safe even without holding the hrtimer
> +  * base lock.
>*/
> - RCU_NONIDLE( {
> - hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> - /*
> -  * The core tick broadcast mode expects bc->bound_on to be set
> -  * correctly to prevent a CPU which has the broadcast hrtimer
> -  * armed from going deep idle.
> -  *
> -  * As tick_broadcast_lock is held, nothing can change the cpu
> -  * base which was just established in hrtimer_start() above. So
> -  * the below access is safe even without holding the hrtimer
> -  * base lock.
> -  */
> - bc->bound_on = bctimer.base->cpu_base->cpu;
> - } );
> + bc->bound_on = bctimer.base->cpu_base->cpu;
> +
>   return 0;
>  }
>  
> 
> 


Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Peter Zijlstra


On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> > --- a/kernel/time/tick-broadcast.c
> > +++ b/kernel/time/tick-broadcast.c
> > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
> >   * to avoid a deep idle transition as we are about to get the
> >   * broadcast IPI right away.
> >   */
> > -int tick_check_broadcast_expired(void)
> > +noinstr int tick_check_broadcast_expired(void)
> >  {
> > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> > +   return arch_test_bit(smp_processor_id(), cpumask_bits(tick_broadcast_force_mask));
> > +#else
> > return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> > +#endif
> >  }
> 
> This is somewhat not-ideal. :/

I'll say.

> Could we unconditionally do the arch_test_bit() variant, with a comment, or
> does that not exist in some cases?

Loads of build errors ensued, which is how I ended up with this mess ...


Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Peter Zijlstra


On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> > All callers should still have RCU enabled.
> 
> IIUC with that true we should be able to drop the RCU_NONIDLE() from
> drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
> notifier.
> 
> I should be able to give that a spin on some hardware.
> 
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  kernel/cpu_pm.c |9 -
> >  1 file changed, 9 deletions(-)
> > 
> > --- a/kernel/cpu_pm.c
> > +++ b/kernel/cpu_pm.c
> > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
> >  {
> > int ret;
> >  
> > -   /*
> > -* This introduces a RCU read critical section, which could be
> > -* disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> > -* this.
> > -*/
> > -   rcu_irq_enter_irqson();
> > rcu_read_lock();
> > ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
> > rcu_read_unlock();
> > -   rcu_irq_exit_irqson();
> 
> To make this easier to debug, is it worth adding an assertion that RCU is
> watching here? e.g.
> 
>   RCU_LOCKDEP_WARN(!rcu_is_watching(),
>"cpu_pm_notify() used illegally from EQS");
> 

My understanding is that rcu_read_lock() implies something along those
lines when PROVE_RCU.
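
As a rough illustration of that point, here is a small userspace model (not kernel code). It assumes, as suggested above, that a debug build makes the read-side lock itself complain when RCU is not watching, so the caller needs no separate assertion. Every `model_*` name and the `MODEL_PROVE_RCU` switch are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static bool model_rcu_watching = true;	/* false while "idle" (EQS) */
static int model_read_depth;		/* nesting of read-side sections */
static int model_warnings;		/* number of misuse reports */

/* Stand-in for a debug option such as PROVE_RCU. */
#define MODEL_PROVE_RCU 1

static void model_rcu_read_lock(void)
{
	/*
	 * With checking enabled, entering a read-side section while RCU
	 * is not watching is reported here, inside the lock primitive.
	 */
	if (MODEL_PROVE_RCU && !model_rcu_watching)
		model_warnings++;
	model_read_depth++;
}

static void model_rcu_read_unlock(void)
{
	model_read_depth--;
}

/* Same shape as cpu_pm_notify() after the patch: lock, call, unlock. */
static int model_notify(int (*fn)(void))
{
	int ret;

	model_rcu_read_lock();
	ret = fn();
	model_rcu_read_unlock();
	return ret;
}

/* Trivial notifier callback used for exercising the model. */
static int model_callback(void)
{
	return 0;
}
```

Calling `model_notify(model_callback)` with `model_rcu_watching` set to false bumps `model_warnings`, which is the effect the explicit RCU_LOCKDEP_WARN() was meant to provide.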


Re: [Pv-drivers] [PATCH 29/36] cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean

2022-06-14 Thread Peter Zijlstra


On Mon, Jun 13, 2022 at 07:23:13PM +, Nadav Amit wrote:
> On Jun 13, 2022, at 11:48 AM, Srivatsa S. Bhat  wrote:
> 
> > ⚠ External Email
> > 
> > On 6/8/22 4:27 PM, Peter Zijlstra wrote:
> >> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() 
> >> leaves .noinstr.text section
> >> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() 
> >> leaves .noinstr.text section
> >> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to 
> >> HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section
> >> 
> >> Signed-off-by: Peter Zijlstra (Intel) 
> > 
> > Reviewed-by: Srivatsa S. Bhat (VMware) 
> > 
> >> 
> >> -static inline void wbinvd(void)
> >> +extern noinstr void pv_native_wbinvd(void);
> >> +
> >> +static __always_inline void wbinvd(void)
> >> {
> >>  PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
> >> }
> 
> I guess it is yet another instance of wrong accounting of GCC for
> the assembly blocks’ weight. I guess it is not a solution for older
> GCCs, but presumably PVOP_ALT_CALL() and friends should have
> used asm_inline or some new “asm_volatile_inline” variant.

Partially, some of the *SAN options also generate a metric ton of
nonsense when enabled and skew the compilers towards not inlining
things.


Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland


On Tue, Jun 14, 2022 at 06:42:14PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> > > All callers should still have RCU enabled.
> > 
> > IIUC with that true we should be able to drop the RCU_NONIDLE() from
> > drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
> > notifier.
> > 
> > I should be able to give that a spin on some hardware.
> > 
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > >  kernel/cpu_pm.c |9 -
> > >  1 file changed, 9 deletions(-)
> > > 
> > > --- a/kernel/cpu_pm.c
> > > +++ b/kernel/cpu_pm.c
> > > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
> > >  {
> > >   int ret;
> > >  
> > > - /*
> > > -  * This introduces a RCU read critical section, which could be
> > > -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> > > -  * this.
> > > -  */
> > > - rcu_irq_enter_irqson();
> > >   rcu_read_lock();
> > >   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
> > >   rcu_read_unlock();
> > > - rcu_irq_exit_irqson();
> > 
> > To make this easier to debug, is it worth adding an assertion that RCU is
> > watching here? e.g.
> > 
> > RCU_LOCKDEP_WARN(!rcu_is_watching(),
> >  "cpu_pm_notify() used illegally from EQS");
> > 
> 
> My understanding is that rcu_read_lock() implies something along those
> lines when PROVE_RCU.

Ah, duh. Given that:

Acked-by: Mark Rutland 

Mark.


Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Peter Zijlstra


On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> > Hi All! (omg so many)
> 
> Hi Peter,
> 
> Sorry for the delay; my plate has also been rather full recently. I'm
> beginning to page this in now.

No worries; we all have too much to do ;-)

> > These here few patches mostly clear out the utter mess that is cpuidle vs 
> > rcuidle.
> > 
> > At the end of the ride there's only 2 real RCU_NONIDLE() users left
> > 
> >   arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
> >   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
> 
> The latter of these is necessary because apparently PM notifiers are called
> with RCU not watching. Is that still the case today (or at the end of this
> series)? If so, that feels like fertile land for more issues (yaey...). If
> not, we should be able to drop this.

That should be fixed; fingers crossed :-)

> >   kernel/cfi.c:   RCU_NONIDLE({
> > 
> > (the CFI one is likely dead in the kCFI rewrite) and there's only a handful
> > of trace_.*_rcuidle() left:
> > 
> >   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> >   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> >   kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1);
> >   kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1);
> > 
> > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.
> 
> I think those are also unused on arm64 too?
> 
> If not, I can go attack that.

My grep spots:

arch/arm64/kernel/entry-common.c:   trace_hardirqs_on();
arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();
arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();

The _on thing should be replaced with something like:

	trace_hardirqs_on_prepare();
	lockdep_hardirqs_on_prepare();
	instrumentation_end();
	rcu_irq_exit();
	lockdep_hardirqs_on(CALLER_ADDR0);

(as I think you know, since you have some of that already). And
something similar for the _off thing, but with _off_finish().
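
To make the ordering constraint concrete, here is a toy userspace model (not kernel code). It assumes that the first two steps are instrumentable and therefore need RCU watching, while everything from instrumentation_end() onwards does not. All `seq_*` names are hypothetical; the model only logs one letter per step and counts steps that run in the wrong state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of the ordering; not kernel code. */
static char seq_log[16];		/* one letter per executed step */
static bool seq_rcu_watching = true;
static bool seq_instr_allowed = true;
static int seq_violations;

static void seq_record(char step, bool instrumented)
{
	size_t n = strlen(seq_log);

	/* An instrumented step needs RCU watching and an open window. */
	if (instrumented && !(seq_rcu_watching && seq_instr_allowed))
		seq_violations++;
	if (n + 1 < sizeof(seq_log)) {
		seq_log[n] = step;
		seq_log[n + 1] = '\0';
	}
}

/* Stand-ins for the five calls in the sequence, in the same order. */
static void seq_trace_on_prepare(void)   { seq_record('T', true); }
static void seq_lockdep_on_prepare(void) { seq_record('P', true); }

static void seq_instrumentation_end(void)
{
	seq_instr_allowed = false;
	seq_record('E', false);
}

static void seq_rcu_irq_exit(void)
{
	seq_rcu_watching = false;
	seq_record('R', false);
}

static void seq_lockdep_on(void) { seq_record('L', false); }

static void seq_irqs_on(void)
{
	seq_trace_on_prepare();
	seq_lockdep_on_prepare();
	seq_instrumentation_end();
	seq_rcu_irq_exit();
	seq_lockdep_on();
}
```

Running `seq_irqs_on()` leaves "TPERL" in `seq_log` with no violations; moving the trace step after `seq_rcu_irq_exit()` would be counted as one.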

> > I've touched a _lot_ of code that I can't test and likely broken some of it :/
> > In particular, the whole ARM cpuidle stuff was quite involved with OMAP being
> > the absolute 'winner'.
> > 
> > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to
> > GENERIC_ENTRY.
> 
> Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
> (refactoring both arm64 and the generic portion to be more amenable to each
> other), but we can certainly move closer to that for the bits that matter 
> here.

I know ... been there etc.. :-)

> Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff
> that we can select regardless of GENERIC_ENTRY to make that easier.

Possible yeah.

> > I've also got a note that says ARM64 can probably do a WFE based
> > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.
> 
> Possibly; I'm not sure how much of a win that'll be given that by default
> we'll have a ~10KHz WFE wakeup from the timer, but we could take a peek.

Ohh.. I didn't know it woke up *that* often. I just know Will made use
of it in things like smp_cond_load_relaxed() which would be somewhat
similar to a ver

Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland


On Tue, Jun 14, 2022 at 06:40:53PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> > > --- a/kernel/time/tick-broadcast.c
> > > +++ b/kernel/time/tick-broadcast.c
> > > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
> > >   * to avoid a deep idle transition as we are about to get the
> > >   * broadcast IPI right away.
> > >   */
> > > -int tick_check_broadcast_expired(void)
> > > +noinstr int tick_check_broadcast_expired(void)
> > >  {
> > > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> > > + return arch_test_bit(smp_processor_id(), cpumask_bits(tick_broadcast_force_mask));
> > > +#else
> > >   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> > > +#endif
> > >  }
> > 
> > This is somewhat not-ideal. :/
> 
> I'll say.
> 
> > Could we unconditionally do the arch_test_bit() variant, with a comment, or
> > does that not exist in some cases?
> 
> Loads of build errors ensued, which is how I ended up with this mess ...

Yaey :(

I see the same is true for the thread flag manipulation too.

I'll take a look and see if we can layer things so that we can use the arch_*()
helpers and wrap those consistently so that we don't have to check the CPP
guard.
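
The layering could look roughly like the userspace sketch below (not the kernel's actual bitops headers). It assumes a raw helper that never instruments and a wrapper that always calls an instrumentation hook before delegating; the `layer_*` names are hypothetical, and the hook merely counts calls where a real one would report the access to a sanitizer.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define LAYER_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical hook; a real one would report the access to a sanitizer. */
static int layer_instrument_calls;

static void layer_instrument_read(const void *addr, unsigned long size)
{
	(void)addr;
	(void)size;
	layer_instrument_calls++;
}

/* Raw layer: no instrumentation, usable from noinstr-style code. */
static inline bool layer_arch_test_bit(unsigned long nr,
				       const unsigned long *addr)
{
	return (addr[nr / LAYER_BITS_PER_LONG] >>
		(nr % LAYER_BITS_PER_LONG)) & 1UL;
}

/* Instrumented layer: always wraps the raw helper in the same way. */
static inline bool layer_test_bit(unsigned long nr,
				  const unsigned long *addr)
{
	layer_instrument_read(addr + nr / LAYER_BITS_PER_LONG,
			      sizeof(unsigned long));
	return layer_arch_test_bit(nr, addr);
}
```

With both layers always present, a noinstr caller picks `layer_arch_test_bit()` directly and no preprocessor guard is needed at the call site.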

Ideally we'd have a better language that allows us to make some
context-sensitive decisions, then we could hide all this gunk in the lower
levels with something like:

	if (!THIS_IS_A_NOINSTR_FUNCTION()) {
		explicit_instrumentation(...);
	}

... ho hum.

Mark.


Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland


On Tue, Jun 14, 2022 at 06:58:30PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> > > Hi All! (omg so many)
> > 
> > Hi Peter,
> > 
> > Sorry for the delay; my plate has also been rather full recently. I'm
> > beginning to page this in now.
> 
> No worries; we all have too much to do ;-)
> 
> > > These here few patches mostly clear out the utter mess that is cpuidle vs 
> > > rcuidle.
> > > 
> > > At the end of the ride there's only 2 real RCU_NONIDLE() users left
> > > 
> > >   arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit());
> > >   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
> > 
> > The latter of these is necessary because apparently PM notifiers are called
> > with RCU not watching. Is that still the case today (or at the end of this
> > series)? If so, that feels like fertile land for more issues (yaey...). If
> > not, we should be able to drop this.
> 
> That should be fixed; fingers crossed :-)

Cool; I'll try to give that a spin when I'm sat next to some relevant
hardware. :)

> > >   kernel/cfi.c:   RCU_NONIDLE({
> > > 
> > > (the CFI one is likely dead in the kCFI rewrite) and there's only a handful
> > > of trace_.*_rcuidle() left:
> > > 
> > >   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1);
> > >   kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1);
> > > 
> > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.
> > I think those are also unused on arm64 too?
> > 
> > If not, I can go attack that.
> 
> My grep spots:
> 
> arch/arm64/kernel/entry-common.c:   trace_hardirqs_on();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();

Ah; I hadn't realised those used trace_.*_rcuidle() behind the scenes.

That affects local_irq_{enable,disable,restore}() too (which is what the
daifflags.h bits are emulating), and also the generic entry code's
irqentry_exit().

So it feels to me like we should be fixing those more generally? e.g. say that
with a new STRICT_ENTRY[_RCU], we can only call trace_hardirqs_{on,off}() with
RCU watching, and alter the definition of those?

> The _on thing should be replaced with something like:
> 
>   trace_hardirqs_on_prepare();
>   lockdep_hardirqs_on_prepare();
>   instrumentation_end();
>   rcu_irq_exit();
>   lockdep_hardirqs_on(CALLER_ADDR0);
> 
> (as I think you know, since you have some of that already). And
> something similar for the _off thing, but with _off_finish().

Sure; I knew that was necessary for the outermost parts of entry (and I think
that's all handled), I just hadn't realised that trace_hardirqs_{on,off} did
the rcuidle thing in the middle.

It'd be nice to not have to open-code the whole sequence everywhere for the
portions which run after entry and are instrumentable, so (as above) I reckon
we want to make trace_hardirqs_{on,off}() not do the rcuidle part
unnecessarily (which IIUC is an end-goal anyway)?

> > > I've touched a _l

Re: [PATCH 2/6] powerpc: Provide syscall wrapper

2022-06-14 Thread Rohan McLure
> On 3 Jun 2022, at 7:04 pm, Arnd Bergmann  wrote:
> 
> On Wed, Jun 1, 2022 at 7:48 AM Rohan McLure  wrote:
>> 
>> Syscall wrapper implemented as per s390, x86, arm64, providing the
>> option for gprs to be cleared on entry to the kernel, reducing caller
>> influence on speculation within syscall routine. The wrapper
>> is a macro that emits syscall handler implementations with parameters
>> passed by stack pointer.
>> 
>> For platforms supporting this syscall wrapper, emit symbols with usual
>> in-register parameters (`sys...`) to support calls to syscall handlers
>> from within the kernel.
> 
> Nice work!
> 
>> Syscalls are wrapped on all platforms except the Cell processor. SPUs require
>> access to syscall prototypes, which are omitted with ARCH_HAS_SYSCALL_WRAPPER
>> enabled.
> 
> Right, I think it's ok to leave out the SPU side. In the long run, I would
> like to go back to requiring the prototypes for everything on all
> architectures, to enforce type checking, but that's a separate piece of work.
> 
>> +/*
>> + * For PowerPC specific syscall implementations, wrapper takes exact name and
>> + * return type for a given function.
>> + */
>> +
>> +#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
>> +#define PPC_SYSCALL_DEFINE(x, type, name, ...)                      \
>> +   asmlinkage type __powerpc_##name(const struct pt_regs *regs);    \
>> +   ALLOW_ERROR_INJECTION(__powerpc_##name, ERRNO);                  \
>> +   type sys_##name(__MAP(x,__SC_DECL,__VA_ARGS__));                 \
>> +   static type __se_##name(__MAP(x,__SC_LONG,__VA_ARGS__));         \
>> +   static inline type __do_##name(__MAP(x,__SC_DECL,__VA_ARGS__));  \
> 
> What is the benefit of having a separate set of macros for this? I think that
> adds more complexity than it saves in the end.
> 
>> @@ -68,52 +69,63 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
>> #define merge_64(high, low) ((u64)high << 32) | low
>> #endif
>> 
>> -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
>> -u32 reg6, u32 pos1, u32 pos2)
>> +PPC_SYSCALL_DEFINE(6, compat_ssize_t, compat_sys_pread64,
>> +  unsigned int, fd,
>> +  char __user *, ubuf, compat_size_t, count,
>> +  u32, reg6, u32, pos1, u32, pos2)
>> {
>>   return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
>> }
> 
> We now have generalized versions of most of these system calls, as of 5.19-rc1
> with the addition of the riscv compat support. I think it would be best to
> try removing the powerpc private versions wherever possible and use the
> common version, modifying it further where necessary.
> 
> If this doesn't work for some of the syscalls, can you use the
> COMPAT_SYSCALL_DEFINE for those in place of PPC_SYSCALL_DEFINE?
> 
>Arnd

Hi Arnd,

Thanks for your comments. 

> What is the benefit of having a separate set of macros for this? I think that
> adds more complexity than it saves in the end.

I was unsure whether the exact return types needed to be respected for syscall
handlers or not. I realise that under the existing behaviour,
system_call_exception performs an indirect call, the return type of which is
interpreted as a long, so the return type should be irrelevant. On inspection
PPC_SYSCALL_DEFINE is readily replaceable with COMPAT_SYSCALL_DEFINE as you
have suggested.
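
For readers unfamiliar with the pattern, the following userspace sketch (not the kernel macros) shows why the declared return type stops mattering once every handler is reached through a table of pointers that return long. `struct demo_regs`, `DEMO_SYSCALL_DEFINE2` and the other `demo_*` names are hypothetical; reading the first two arguments from gpr[3] and gpr[4] is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; only GPRs are modelled. */
struct demo_regs {
	unsigned long gpr[32];
};

/* Every table entry has the same type, so the dispatcher sees long. */
typedef long (*demo_syscall_fn)(const struct demo_regs *regs);

/*
 * Emits a register-unpacking wrapper plus the typed body.
 * Assumption: the first two arguments arrive in gpr[3] and gpr[4].
 */
#define DEMO_SYSCALL_DEFINE2(name, t1, a1, t2, a2)			\
	static long demo_do_##name(t1 a1, t2 a2);			\
	static long demo_wrapped_##name(const struct demo_regs *regs)	\
	{								\
		return demo_do_##name((t1)regs->gpr[3],			\
				      (t2)regs->gpr[4]);		\
	}								\
	static long demo_do_##name(t1 a1, t2 a2)

DEMO_SYSCALL_DEFINE2(add, int, a, int, b)
{
	return (long)a + b;
}

DEMO_SYSCALL_DEFINE2(len_plus, unsigned int, n, unsigned int, extra)
{
	return (long)(n + extra);
}

static const demo_syscall_fn demo_table[] = {
	demo_wrapped_add,
	demo_wrapped_len_plus,
};

/* Indirect call through the table; -1 marks an unknown number. */
static long demo_dispatch(size_t nr, const struct demo_regs *regs)
{
	if (nr >= sizeof(demo_table) / sizeof(demo_table[0]))
		return -1;
	return demo_table[nr](regs);
}
```

Both handlers are called through the same `demo_syscall_fn` type, so a per-syscall return type in the definition macro would add nothing for the dispatcher.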

Before resubmitting this series, I will try for a patch series which modernises
syscall handlers in arch/powerpc, and inspect where powerpc private versions
are strictly necessary, using __ARCH_WANT_... wherever possible.

Rohan

Re: [PATCH 1/2] powerpc:mm: export symbol ioremap_coherent

2022-06-14 Thread Wenhu Wang
>From: Michael Ellerman
>Sent: 14 June 2022 18:45
>To: Wang Wenhu; gre...@linuxfoundation.org; christophe.le...@csgroup.eu
>Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; Wang Wenhu
>Subject: Re: [PATCH 1/2] powerpc:mm: export symbol ioremap_coherent
> 
>Wang Wenhu  writes:
>> The function ioremap_coherent may be called by modules such as
>> fsl_85xx_cache_sram. So export it for access in other modules.
>
>ioremap_coherent() is powerpc specific, and only has one other caller,
>I'd like to remove it.
>
>Does ioremap_cache() work for you?
>

Yes, it works. I will switch to ioremap_cache in v2.
I tested and compared the results of ioremap_cache and ioremap_coherent,
and found that they produced the same values.

Thanks,
Wenhu

>
>> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
>> index 4f12504fb405..08a00dacef0b 100644
>> --- a/arch/powerpc/mm/ioremap.c
>> +++ b/arch/powerpc/mm/ioremap.c
>> @@ -40,6 +40,7 @@ void __iomem *ioremap_coherent(phys_addr_t addr, unsigned long size)
>>    return iowa_ioremap(addr, size, prot, caller);
>>    return __ioremap_caller(addr, size, prot, caller);
>>  }
>> +EXPORT_SYMBOL(ioremap_coherent);
>>  
>>  void __iomem *ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
>  {
> -- 
> 2.25.1

[PATCHv2 0/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver

2022-06-14 Thread Wang Wenhu
This series adds a UIO driver for Freescale MPC85xx that configures the
l2-cache-sram as a block of SRAM and gives user-level applications access
to that SRAM.

1/2: Coding-style cleanup of the macro CONFIG_HAVE_IOREMAP_PROT;
2/2: Implementation of the uio driver.

This is the second version, which addresses review comments:
1. Use __be32 instead of u32 for the big-endian data declarations;
2. Remove the "static inline" definition of generic_access_phys from the .h file
and provide a no-op definition in mm/memory.c instead;
3. Use the generic ioremap_cache instead of ioremap_coherent.

For v1, see:
1/2: https://lore.kernel.org/all/20220610144348.GA595923@bhelgaas/T/
2/2: https://lore.kernel.org/lkml/yqhy1uxwclljm...@kroah.com/

Wang Wenhu (2):
  mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files
  uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

 drivers/char/mem.c                    |   2 -
 drivers/fpga/dfl-afu-main.c           |   2 -
 drivers/pci/mmap.c                    |   2 -
 drivers/uio/Kconfig                   |  14 ++
 drivers/uio/Makefile                  |   1 +
 drivers/uio/uio.c                     |   2 -
 drivers/uio/uio_fsl_85xx_cache_sram.c | 288 ++
 mm/memory.c                           |  13 +-
 8 files changed, 312 insertions(+), 12 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.25.1



[PATCHv2 1/2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files

2022-06-14 Thread Wang Wenhu
The "Conditional Compilation" chapter of the kernel coding-style
documentation recommends avoiding preprocessor conditionals in .c files
wherever possible.

This is a good opportunity to eliminate CONFIG_HAVE_IOREMAP_PROT from the
.c files that reference it and to restrict its use to mm/memory.c.
HAVE_IOREMAP_PROT is supported by some architectures, such as powerpc and
x86, but not by others, such as arm. Functions that depend on it therefore
need a no-op version. Currently that is generic_access_phys, which is
referenced by several other modules.
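
The no-op fallback described above can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not the kernel code:
HAVE_FAST_PATH stands in for the Kconfig symbol, and access_region,
region_ops and demo_access are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig symbol; build with -DHAVE_FAST_PATH=1 to enable. */
#ifndef HAVE_FAST_PATH
#define HAVE_FAST_PATH 0
#endif

#if HAVE_FAST_PATH
/* Real implementation: copies len bytes and reports how many were copied. */
int access_region(const char *src, char *dst, int len)
{
	for (int i = 0; i < len; i++)
		dst[i] = src[i];
	return len;
}
#else
/* No-op fallback with the same prototype: reports zero bytes accessed. */
int access_region(const char *src, char *dst, int len)
{
	(void)src;
	(void)dst;
	(void)len;
	return 0;
}
#endif

/* Callers reference the symbol unconditionally, with no #ifdef of their own. */
struct region_ops {
	int (*access)(const char *src, char *dst, int len);
};

static const struct region_ops demo_ops = {
	.access = access_region,
};

/* Small helper so the ops table can be exercised from outside this file. */
int demo_access(const char *src, char *dst, int len)
{
	return demo_ops.access(src, dst, len);
}
```

The conditional lives in exactly one file, and every ops table that points
at the function compiles unchanged whether or not the feature is built in.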

Signed-off-by: Wang Wenhu 
---
v2:
 - Added IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT) condition in __access_remote_vm
 - Added a no-op generic_access_phys() in mm/memory.c instead of the
   former "static inline" one in include/linux/mm.h
Former: https://lore.kernel.org/linux-mm/yqmrtwah5fiws...@kroah.com/T/
---
 drivers/char/mem.c          |  2 --
 drivers/fpga/dfl-afu-main.c |  2 --
 drivers/pci/mmap.c          |  2 --
 drivers/uio/uio.c           |  2 --
 mm/memory.c                 | 13 +
 5 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 84ca98ed1dad..40186a441e38 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -354,9 +354,7 @@ static inline int private_mapping_ok(struct vm_area_struct *vma)
 #endif
 
 static const struct vm_operations_struct mmap_mem_ops = {
-#ifdef CONFIG_HAVE_IOREMAP_PROT
 	.access = generic_access_phys
-#endif
 };
 
 static int mmap_mem(struct file *file, struct vm_area_struct *vma)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 7f621e96d3b8..833e14806c7a 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -797,9 +797,7 @@ static long afu_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 }
 
 static const struct vm_operations_struct afu_vma_ops = {
-#ifdef CONFIG_HAVE_IOREMAP_PROT
 	.access = generic_access_phys,
-#endif
 };
 
 static int afu_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/drivers/pci/mmap.c b/drivers/pci/mmap.c
index b8c9011987f4..1dcfabf80453 100644
--- a/drivers/pci/mmap.c
+++ b/drivers/pci/mmap.c
@@ -35,9 +35,7 @@ int pci_mmap_page_range(struct pci_dev *pdev, int bar,
 #endif
 
 static const struct vm_operations_struct pci_phys_vm_ops = {
-#ifdef CONFIG_HAVE_IOREMAP_PROT
 	.access = generic_access_phys,
-#endif
 };
 
 int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 43afbb7c5ab9..c9205a121007 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -719,9 +719,7 @@ static int uio_mmap_logical(struct vm_area_struct *vma)
 }
 
 static const struct vm_operations_struct uio_physical_vm_ops = {
-#ifdef CONFIG_HAVE_IOREMAP_PROT
 	.access = generic_access_phys,
-#endif
 };
 
 static int uio_mmap_physical(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..7c0e59085456 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5413,6 +5413,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
+#else
+int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
+			void *buf, int len, int write)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
 
 /*
@@ -5437,9 +5444,8 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
 		ret = get_user_pages_remote(mm, addr, 1,
 				gup_flags, &page, &vma, NULL);
 		if (ret <= 0) {
-#ifndef CONFIG_HAVE_IOREMAP_PROT
-			break;
-#else
+			if (!IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT))
+				break;
 			/*
 			 * Check if this is a VM_IO | VM_PFNMAP VMA, which
 			 * we can access using slightly different code.
@@ -5453,7 +5459,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
 			if (ret <= 0)
 				break;
 			bytes = ret;
-#endif
 		} else {
 			bytes = len;
 			offset = addr & (PAGE_SIZE-1);
-- 
2.25.1



[PATCHv2 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Wang Wenhu
The Freescale mpc85xx L2 cache can optionally be configured, partly or
fully, as SRAM. Users can treat it as an independent block of memory for
special purposes, such as storing debugging data or other critical status
information that stays intact even when the whole system crashes.
Applications can use the UIO driver to access the SRAM from user level.
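
For illustration, user-level access through a UIO device generally comes
down to an mmap() on the device node. The sketch below is a minimal example
under stated assumptions, not part of the patch: uio_map_region is a
hypothetical helper, and the device path and region size are supplied by the
caller (UIO normally reports the size under
/sys/class/uio/uioX/maps/mapN/size).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map memory region `index` of a UIO device into this process.
 * Returns NULL on failure. Hypothetical helper for illustration only.
 */
void *uio_map_region(const char *dev_path, unsigned int index, size_t size)
{
	int fd = open(dev_path, O_RDWR | O_SYNC);
	void *p;

	if (fd < 0)
		return NULL;

	/* UIO selects mapping N through an mmap offset of N * page size. */
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		 (off_t)index * sysconf(_SC_PAGESIZE));

	/* The mapping stays valid after the descriptor is closed. */
	close(fd);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass something like "/dev/uio0", index 0 and the size read
from sysfs, then read and write the SRAM through the returned pointer and
release it with munmap().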

There used to be another driver that provided access to the l2-cache-sram
from kernel space. It was removed recently.
See: 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dc21ed2aef4150fc2fcf58227a4ff24502015c03

Signed-off-by: Wang Wenhu 
---
v2:
 - Use __be32 instead of u32 for big-endian data declarations;
 - Use generic ioremap_cache instead of ioremap_coherent;
 - Support both 32-bit and 64-bit physical addresses;
 - Addressed some other comments from Greg.
---
 drivers/uio/Kconfig                   |  14 ++
 drivers/uio/Makefile                  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 288 ++
 3 files changed, 303 insertions(+)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 2e16c5338e5b..f7604584a12c 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,20 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
 
+config UIO_FSL_85XX_CACHE_SRAM
+   tristate "Freescale 85xx L2-Cache-SRAM UIO driver"
+   depends on FSL_SOC_BOOKE && PPC32
+   help
+ Driver for user level access of freescale mpc85xx l2-cache-sram.
+
+ Freescale's mpc85xx provides an option of configuring a part of
+ (or full) cache memory as SRAM. The driver does this configuring
+ work and exports SRAM to user-space for access from user level.
+ This is extremely helpful for user applications that require
+ high performance memory accesses.
+
+ If you don't know what to do here, say N.
+
 config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index f2f416a14228..1ba07d92a1b1 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)	+= uio_fsl_elbc_gpcm.o
 obj-$(CONFIG_UIO_HV_GENERIC)	+= uio_hv_generic.o
 obj-$(CONFIG_UIO_DFL)		+= uio_dfl.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)	+= uio_fsl_85xx_cache_sram.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index 000000000000..6f91b0aa946b
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,288 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Wang Wenhu 
+ * All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME	"uio_mpc85xx_cache_sram"
+#define UIO_INFO_VER	"0.0.1"
+#define UIO_NAME	"uio_cache_sram"
+
+#define L2CR_L2FI		0x40000000	/* L2 flash invalidate */
+#define L2CR_L2IO		0x00200000	/* L2 instruction only */
+#define L2CR_SRAM_ZERO		0x00000000	/* L2SRAM zero size */
+#define L2CR_SRAM_FULL		0x00010000	/* L2SRAM full size */
+#define L2CR_SRAM_HALF		0x00020000	/* L2SRAM half size */
+#define L2CR_SRAM_TWO_HALFS	0x00030000	/* L2SRAM two half sizes */
+#define L2CR_SRAM_QUART		0x00040000	/* L2SRAM one quarter size */
+#define L2CR_SRAM_TWO_QUARTS	0x00050000	/* L2SRAM two quarter size */
+#define L2CR_SRAM_EIGHTH	0x00060000	/* L2SRAM one eighth size */
+#define L2CR_SRAM_TWO_EIGHTH	0x00070000	/* L2SRAM two eighth size */
+
+#define L2SRAM_OPTIMAL_SZ_SHIFT	0x00000003	/* Optimum size for L2SRAM */
+
+#define L2SRAM_BAR_MSK_LO18	0xFFFFC000	/* Lower 18 bits */
+#define L2SRAM_BARE_MSK_HI4	0x0000000F	/* Upper 4 bits */
+
+enum cache_sram_lock_ways {
+	LOCK_WAYS_ZERO		= 0,
+	LOCK_WAYS_EIGHTH	= 1,
+	LOCK_WAYS_TWO_EIGHTH	= 2,
+	LOCK_WAYS_HALF		= 4,
+	LOCK_WAYS_FULL		= 8,
+};
+
+struct mpc85xx_l2ctlr {
+	__be32	ctl;		/* 0x000 - L2 control */
+	u8	res1[0xC];
+	__be32	ewar0;		/* 0x010 - External write address 0 */
+	__be32	ewarea0;	/* 0x014 - External write address extended 0 */
+	__be32	ewcr0;		/* 0x018 - External write ctrl */
+	u8	res2[4];
+	__be32	ewar1;		/* 0x020 - External write address 1 */
+	__be32	ewarea1;	/* 0x024 - External write address extended 1 */
+	__be32	ewcr1;		/* 0x028 - External write ctrl 1 */
+	u8	res3[4];
+	__be32	ewar2;		/* 0x030 - External write address 2 */
+	__be32	ewarea2;

Re: Reply: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 14/06/2022 at 08:09, Wenhu Wang wrote:
>>> +
>>> +static const struct vm_operations_struct uio_cache_sram_vm_ops = {
>>> +#ifdef CONFIG_HAVE_IOREMAP_PROT
>>
>> Same here.
>>
> 
> I tried to eliminate it in mainline
> See: [PATCH v2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files
> https://lkml.org/lkml/2022/6/10/695
> 
>>> + .access = generic_access_phys,
>>> +#endif
>>> +};

Another solution is to do:


static const struct vm_operations_struct uio_cache_sram_vm_ops = {
	.access = IS_ENABLED(CONFIG_HAVE_IOREMAP_PROT) ? generic_access_phys : NULL,
};
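
The same idea can be tried outside the kernel. The sketch below is a
user-space approximation under stated assumptions: DEMO_FEATURE is a
hypothetical 0/1 constant playing the role of IS_ENABLED(), and real_access,
demo_ops and call_access are made-up names. It relies on the compiler
folding the constant conditional in a static initializer, which GCC and
Clang do.

```c
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_...), which is 0 or 1 at compile time. */
#ifndef DEMO_FEATURE
#define DEMO_FEATURE 0
#endif

typedef int (*access_fn)(int value);

/* Always compiled and type-checked, even when the feature is off. */
int real_access(int value)
{
	return value + 1;
}

struct demo_ops {
	access_fn access;
};

/* The hook is either the real function or NULL; no #ifdef is needed. */
static const struct demo_ops ops = {
	.access = DEMO_FEATURE ? real_access : NULL,
};

/* Callers check for NULL, as users of optional hooks usually do. */
int call_access(int value)
{
	if (!ops.access)
		return -1;
	return ops.access(value);
}
```

Compared with #ifdef, the disabled branch is still seen by the compiler, so
a prototype mismatch shows up in every configuration. The catch is that the
referenced function must at least be declared in every configuration too.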


Christophe

Re: [PATCHv2 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christoph Hellwig
As pointed out last time:  uio is the wrong interface to expose sram,
and any kind of ioremap is the wrong way to map it.


Re: [PATCHv2 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 15/06/2022 at 07:57, Wang Wenhu wrote:
> The Freescale mpc85xx L2 cache can optionally be configured, partly or
> fully, as SRAM. Users can treat it as an independent block of memory for
> special purposes, such as storing debugging data or other critical status
> information that stays intact even when the whole system crashes.
> Applications can use the UIO driver to access the SRAM from user level.
> 
> There used to be another driver that provided access to the l2-cache-sram
> from kernel space. It was removed recently.
> See: 
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dc21ed2aef4150fc2fcf58227a4ff24502015c03
> 
> Signed-off-by: Wang Wenhu 
> ---
> v2:
>   - Use __be32 instead of u32 for big-endian data declarations;
>   - Use generic ioremap_cache instead of ioremap_coherent;
>   - Support both 32-bit and 64-bit physical addresses;
>   - Addressed some other comments from Greg.
> ---
>   drivers/uio/Kconfig   |  14 ++
>   drivers/uio/Makefile  |   1 +
>   drivers/uio/uio_fsl_85xx_cache_sram.c | 288 ++
>   3 files changed, 303 insertions(+)
>   create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c
> 
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 2e16c5338e5b..f7604584a12c 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -105,6 +105,20 @@ config UIO_NETX
> To compile this driver as a module, choose M here; the module
> will be called uio_netx.
>   
> +config UIO_FSL_85XX_CACHE_SRAM
> + tristate "Freescale 85xx L2-Cache-SRAM UIO driver"
> + depends on FSL_SOC_BOOKE && PPC32
> + help
> +   Driver for user level access of freescale mpc85xx l2-cache-sram.
> +
> +   Freescale's mpc85xx provides an option of configuring a part of
> +   (or full) cache memory as SRAM. The driver does this configuring
> +   work and exports SRAM to user-space for access from user level.
> +   This is extremely helpful for user applications that require
> +   high performance memory accesses.
> +
> +   If you don't know what to do here, say N.
> +
>   config UIO_FSL_ELBC_GPCM
>   tristate "eLBC/GPCM driver"
>   depends on FSL_LBC
> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index f2f416a14228..1ba07d92a1b1 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_UIO_MF624) += uio_mf624.o
>   obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
>   obj-$(CONFIG_UIO_HV_GENERIC)+= uio_hv_generic.o
>   obj-$(CONFIG_UIO_DFL)   += uio_dfl.o
> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)+= uio_fsl_85xx_cache_sram.o
> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c b/drivers/uio/uio_fsl_85xx_cache_sram.c
> new file mode 100644
> index 000000000000..6f91b0aa946b
> --- /dev/null
> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
> @@ -0,0 +1,288 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Wang Wenhu 
> + * All rights reserved.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_NAME  "uio_mpc85xx_cache_sram"
> +#define UIO_INFO_VER "0.0.1"
> +#define UIO_NAME "uio_cache_sram"
> +
> +#define L2CR_L2FI		0x40000000	/* L2 flash invalidate */
> +#define L2CR_L2IO		0x00200000	/* L2 instruction only */
> +#define L2CR_SRAM_ZERO		0x00000000	/* L2SRAM zero size */
> +#define L2CR_SRAM_FULL		0x00010000	/* L2SRAM full size */
> +#define L2CR_SRAM_HALF		0x00020000	/* L2SRAM half size */
> +#define L2CR_SRAM_TWO_HALFS	0x00030000	/* L2SRAM two half sizes */
> +#define L2CR_SRAM_QUART		0x00040000	/* L2SRAM one quarter size */
> +#define L2CR_SRAM_TWO_QUARTS	0x00050000	/* L2SRAM two quarter size */
> +#define L2CR_SRAM_EIGHTH	0x00060000	/* L2SRAM one eighth size */
> +#define L2CR_SRAM_TWO_EIGHTH	0x00070000	/* L2SRAM two eighth size */
> +
> +#define L2SRAM_OPTIMAL_SZ_SHIFT	0x00000003	/* Optimum size for L2SRAM */
> +
> +#define L2SRAM_BAR_MSK_LO18	0xFFFFC000	/* Lower 18 bits */
> +#define L2SRAM_BARE_MSK_HI4	0x0000000F	/* Upper 4 bits */
> +
> +enum cache_sram_lock_ways {
> + LOCK_WAYS_ZERO  = 0,
> + LOCK_WAYS_EIGHTH= 1,
> + LOCK_WAYS_TWO_EIGHTH= 2,
> + LOCK_WAYS_HALF  = 4,
> + LOCK_WAYS_FULL  = 8,
> +};
> +
> +struct mpc85xx_l2ctlr {
> + __be32  ctl;/* 0x000 - L2 control */
> + u8  res1[0xC];
> + __be32  ewar0;  /* 0x010 - External write address 0 */
> + __be32  ewarea0;/* 0x014 - External write address extended 0 */
> + __be32  ewcr0;  /* 0x018 - External write ctrl */
> + u8  res2[4];
> + __be32  ewar1;  /* 0x020 - External write address 1 */
> + __be32  ewarea1;  

Re: [PATCHv2 1/2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files

2022-06-14 Thread Christoph Hellwig
Did you verify that all architectures actually provide an ioremap_prot
prototype?
The header situation for ioremap* is a mess, unfortunately.


Re: [PATCHv2 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christophe Leroy


On 15/06/2022 at 07:57, Wang Wenhu wrote:
> The Freescale mpc85xx L2 cache can optionally be configured, partly or
> fully, as SRAM. Users can treat it as an independent block of memory for
> special purposes, such as storing debugging data or other critical status
> information that stays intact even when the whole system crashes.
> Applications can use the UIO driver to access the SRAM from user level.
> 
> There used to be another driver that provided access to the l2-cache-sram
> from kernel space. It was removed recently.
> See: 
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dc21ed2aef4150fc2fcf58227a4ff24502015c03
> 
> Signed-off-by: Wang Wenhu 
> ---
> v2:
>   - Use __be32 instead of u32 for big-endian data declarations;

I get the following warnings with 'make drivers/uio/uio_fsl_85xx_cache_sram.o C=2':

   CHECK   drivers/uio/uio_fsl_85xx_cache_sram.c
drivers/uio/uio_fsl_85xx_cache_sram.c:96:19: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:96:19:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:96:19:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:100:27: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:100:27:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:100:27:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9:    expected unsigned int const volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:102:9:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17:    expected unsigned int const volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:106:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17:    expected unsigned int const volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:110:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17:    expected unsigned int const volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:114:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17:    got restricted __be32 [noderef] __iomem *
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17: warning: incorrect type in argument 1 (different base types)
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17:    expected unsigned int const volatile [noderef] [usertype] __iomem *addr
drivers/uio/uio_fsl_85xx_cache_sram.c:119:17:    got restricted __be32 [noderef] __iomem *


>   - Use generic ioremap_cache instead of ioremap_coherent;
>   - Support both 32-bit and 64-bit physical addresses;
>   - Addressed some other comments from Greg.
> ---
>   drivers/uio/Kconfig   |  14 ++
>   drivers/uio/Makefile  

Re: [PATCH] arch/*: Disable softirq stacks on PREEMPT_RT.

2022-06-14 Thread Christoph Hellwig
On Tue, Jun 14, 2022 at 08:18:14PM +0200, Sebastian Andrzej Siewior wrote:
> Disable the unused softirqs stacks on PREEMPT_RT to safe some memory and

s/safe/save/