Re: [PATCH 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix

2022-09-19 Thread Michael Ellerman
Nicholas Miehlbradt  writes:
> There is support for DEBUG_PAGEALLOC on hash but not on radix.
> Add support on radix.
>
> Signed-off-by: Nicholas Miehlbradt 
> ---
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index db2f3d193448..483c99bfbde5 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -503,6 +504,9 @@ static unsigned long __init radix_memory_block_size(void)
>  {
>   unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
>  
> + if (debug_pagealloc_enabled())
> + return PAGE_SIZE;
> +
>   /*
>* OPAL firmware feature is set by now. Hence we are ok
>* to test OPAL feature.
> @@ -519,6 +523,9 @@ static unsigned long __init radix_memory_block_size(void)
>  
>  static unsigned long __init radix_memory_block_size(void)
>  {
> + if (debug_pagealloc_enabled())
> + return PAGE_SIZE;
> +
>   return 1UL * 1024 * 1024 * 1024;
>  }
  
This value ends up in radix_mem_block_size, which is returned by 
pnv_memory_block_size(), which is wired up as ppc_md.memory_block_size,
and that's called by memory_block_size_bytes().

And I thought that value had to be >= MIN_MEMORY_BLOCK_SIZE.

#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
#define SECTION_SIZE_BITS   24


I would expect us to hit the panic in memory_dev_init().

So that's odd.

I suspect you need to leave radix_memory_block_size() alone, or at least
make sure you return MIN_MEMORY_BLOCK_SIZE when debug page alloc is
enabled.

We probably need a separate variable that holds the max page size used
for the linear mapping, and that would then be 1G in the normal case or
PAGE_SIZE in the debug page alloc case.

cheers


Re: [PATCH 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix

2022-09-19 Thread Christophe Leroy


Le 19/09/2022 à 09:00, Michael Ellerman a écrit :
> Nicholas Miehlbradt  writes:
>> There is support for DEBUG_PAGEALLOC on hash but not on radix.
>> Add support on radix.
>>
>> Signed-off-by: Nicholas Miehlbradt 
>> ---
>>   arch/powerpc/mm/book3s64/radix_pgtable.c | 16 +++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
>> b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index db2f3d193448..483c99bfbde5 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -30,6 +30,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   
>>   #include 
>>   
>> @@ -503,6 +504,9 @@ static unsigned long __init radix_memory_block_size(void)
>>   {
>>  unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
>>   
>> +if (debug_pagealloc_enabled())
>> +return PAGE_SIZE;
>> +
>>  /*
>>   * OPAL firmware feature is set by now. Hence we are ok
>>   * to test OPAL feature.
>> @@ -519,6 +523,9 @@ static unsigned long __init radix_memory_block_size(void)
>>   
>>   static unsigned long __init radix_memory_block_size(void)
>>   {
>> +if (debug_pagealloc_enabled())
>> +return PAGE_SIZE;
>> +
>>  return 1UL * 1024 * 1024 * 1024;
>>   }
>
> This value ends up in radix_mem_block_size, which is returned by
> pnv_memory_block_size(), which is wired up as ppc_md.memory_block_size,
> and that's called by memory_block_size_bytes().
> 
> And I thought that value had to be >= MIN_MEMORY_BLOCK_SIZE.
> 
> #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
> #define SECTION_SIZE_BITS   24
> 
> 
> I would expect us to hit the panic in memory_dev_init().
> 
> So that's odd.
> 
> I suspect you need to leave radix_memory_block_size() alone, or at least
> make sure you return MIN_MEMORY_BLOCK_SIZE when debug page alloc is
> enabled.
> 
> We probably need a separate variable that holds the max page size used
> for the linear mapping, and that would then be 1G in the normal case or
> PAGE_SIZE in the debug page alloc case.
> 

I don't know the details of PPC64, but as you mention linear mapping, be 
aware that you don't need to map everything with pages. Only data. You 
can keep text mapped by blocks, it's what is done on 8xx and book3s/32.

Christophe

Re: [PATCH 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-19 Thread Christophe Leroy


Le 19/09/2022 à 08:49, Benjamin Gray a écrit :
> On Mon, 2022-09-19 at 06:04 +, Christophe Leroy wrote:
>> With CONFIG_STRICT_KERNEL_RWX, this patches causes a 15% time
>> increase
>> for activation/deactivation of ftrace.
> 
> It's possible that the new alignment check is the cause. I'll see
> 
> 
>> Without CONFIG_STRICT_KERNEL_RWX, it doesn't build.
> 
> Yup, fixed for v2
> 
>>> +static int __patch_text(void *dest, const void *src, size_t size,
>>> bool is_exec, void *exec_addr)
>>
>> Is 'text' a good name ? For me text mean executable code. Should it
>> be
>> __patch_memory() ?
> 
> Well, patching regular memory is just a normal store. Text to me implies
> it's non-writeable. Though __patch_memory would be fine.
> 
>> Why pass src as a void * ? This forces data to go via the stack.
>> Can't
>> you pass it as a 'long' ?
> 
> Probably, I wasn't aware that it would make a difference. I prefer
> pointers in general for their semantic meaning, but will change if it
> affects param passing.
> 
>>> +   if (virt_to_pfn(dest) != virt_to_pfn(dest + size - 1))
>>> +   return -EFAULT;
>>
>> Why do you need that new check ?
> 
> If the patch crosses a page boundary then letting it happen is
> unpredictable. Though perhaps this requirement can just be put as a
> comment, or require that patches be aligned to the patch size.

Why would it be unpredictable ? Only one page is mapped. If it crosses 
the boundary, __put_kernel_nofault() will fail in a controlled manner. 
I see no point in doing the check before every write.

Requiring an alignment to the patch size may be problematic when 
patching prefixed instructions (8 bytes). Although they are guaranteed 
to lie in a single cache line and hence a single page, they are not 
guaranteed to be aligned to more than 4 bytes.

And while you are thinking about alignment, don't forget that dcbst and 
icbi apply on a given cacheline. If your memory crosses a cacheline you 
may have a problem.

> 
>>> +   case 8:
>>> +   __put_kernel_nofault(dest, src, u64,
>>> failed);
>>> +   break;
>>
>> Is case 8 needed for PPC32 ?
> 
> I don't have a particular need for it, but the underlying
> __put_kernel_nofault is capable of it so I included it.

Well, not including it will allow you to pass the source as a 'long' as 
mentioned above.

> 
>>> +   }
>>
>> Do you catch it when size if none of 1,2,4,8 ?
>>
> 
> Not yet. Perhaps I should wrap patch_text_data in a macro that checks
> the size with BUILD_BUG_ON? I'd rather not check at runtime.

Not necessarily a macro. Would be better as a static __always_inline in 
code-patching.h

> 
>>> +
>>> +   asm ("dcbst 0, %0; sync" :: "r" (dest));
>>
>> Maybe write it in C:
>>
>>  dcbst(dest);
>>  mb(); /* sync */
>>
>>> +
>>> +   if (is_exec)
>>> +   asm ("icbi 0,%0; sync; isync" :: "r" (exec_addr));
>>
>> Same, can be:
>>
>>  if (is_exec) {
>>  icbi(exec_addr);
>>  mb(); /* sync */
>>  isync();
>>  }
>>
>> Or keep it flat:
>>
>>  if (!is_exec)
>>  return 0;
>>
>>  icbi(exec_addr);
>>  mb(); /* sync */
>>  isync();
>>
>>  return 0;
> 
> Will try this.
> 
>>> +static int do_patch_text(void *dest, const void *src, size_t size,
>>> bool is_exec)
>>> +{
>>> +   int err;
>>> +   pte_t *pte;
>>> +   u32 *patch_addr;
>>> +
>>> +   pte = start_text_patch(dest, &patch_addr);
>>> +   err = __patch_text(patch_addr, src, size, is_exec, dest);
>>> +   finish_text_patch(pte);
>>
>> Why do you need to split this function in three parts ? I can't see
>> the
>> added value, all it does is reduce readability.
> 
> It made it more readable to me, so the __patch_text didn't get buried.
> It also made it easier to do the refactoring, and potentially add code
> patching variants that use the poke area but not __patch_text. I'll
> remove it for v2 though given this is the only use right now.
> 
>> Did you check the impact of calling __this_cpu_read() twice ?
> 
> I wasn't concerned about performance, but given I'll merge it back
> again it will only be read once in v2 again.
> 
>>> +void *patch_memory(void *dest, const void *src, size_t size)
>>
>> What is this function used for ?
>>
> 
> Build failure apparently :)
> 
> It's removed in v2.
>>

Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask

2022-09-19 Thread David Hildenbrand

On 19.09.22 04:13, Mike Kravetz wrote:

During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
 if (p?d_huge())
 page = follow_huge_p?d();
the second check is of the form:
 if (is_hugepd())
 page = follow_huge_pd().

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] 
https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 
---


Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [Intel-wired-lan] [PATCH v2 3/3] net: ethernet: move from strlcpy with unused retval to strscpy

2022-09-19 Thread naamax.meir

On 8/30/2022 23:14, Wolfram Sang wrote:

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: 
https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=v6a6g1ouzcprm...@mail.gmail.com/
Signed-off-by: Wolfram Sang 
Reviewed-by: Petr Machata  # For 
drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand  # For ps3_gelic_net and 
spider_net_ethtool
Acked-by: Tom Lendacky  # For 
drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas  # For 
drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky  # For 
drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin  # For 
drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa  # For IXP4xx Ethernet
---

Changes since v1:
* split into smaller patches
* added given tags

  drivers/net/ethernet/3com/3c509.c|  2 +-
  drivers/net/ethernet/3com/3c515.c|  2 +-
  drivers/net/ethernet/3com/3c589_cs.c |  2 +-
  drivers/net/ethernet/3com/3c59x.c|  6 +++---
  drivers/net/ethernet/3com/typhoon.c  |  8 
  drivers/net/ethernet/8390/ax88796.c  |  6 +++---
  drivers/net/ethernet/8390/etherh.c   |  6 +++---
  drivers/net/ethernet/adaptec/starfire.c  |  4 ++--
  drivers/net/ethernet/aeroflex/greth.c|  4 ++--
  drivers/net/ethernet/agere/et131x.c  |  4 ++--
  drivers/net/ethernet/alacritech/slicoss.c|  4 ++--
  drivers/net/ethernet/allwinner/sun4i-emac.c  |  4 ++--
  drivers/net/ethernet/alteon/acenic.c |  4 ++--
  drivers/net/ethernet/amazon/ena/ena_ethtool.c|  4 ++--
  drivers/net/ethernet/amazon/ena/ena_netdev.c |  2 +-
  drivers/net/ethernet/amd/amd8111e.c  |  4 ++--
  drivers/net/ethernet/amd/au1000_eth.c|  2 +-
  drivers/net/ethernet/amd/nmclan_cs.c |  2 +-
  drivers/net/ethernet/amd/pcnet32.c   |  4 ++--
  drivers/net/ethernet/amd/sunlance.c  |  2 +-
  drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 ++--
  .../net/ethernet/aquantia/atlantic/aq_ethtool.c  |  2 +-
  drivers/net/ethernet/arc/emac_main.c |  2 +-
  drivers/net/ethernet/atheros/ag71xx.c|  4 ++--
  .../net/ethernet/atheros/atl1c/atl1c_ethtool.c   |  4 ++--
  .../net/ethernet/atheros/atl1e/atl1e_ethtool.c   |  6 +++---
  drivers/net/ethernet/atheros/atlx/atl1.c |  4 ++--
  drivers/net/ethernet/atheros/atlx/atl2.c |  6 +++---
  drivers/net/ethernet/broadcom/b44.c  |  6 +++---
  drivers/net/ethernet/broadcom/bcm63xx_enet.c |  4 ++--
  drivers/net/ethernet/broadcom/bcmsysport.c   |  4 ++--
  drivers/net/ethernet/broadcom/bgmac.c|  6 +++---
  drivers/net/ethernet/broadcom/bnx2.c |  6 +++---
  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  2 +-
  .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c  |  6 +++---
  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
  .../net/ethernet/broadcom/bnx2x/bnx2x_sriov.h|  2 +-
  drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c |  2 +-
  .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c|  8 
  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c|  2 +-
  drivers/net/ethernet/broadcom/genet/bcmgenet.c   |  2 +-
  drivers/net/ethernet/broadcom/tg3.c  |  6 +++---
  drivers/net/ethernet/brocade/bna/bnad_ethtool.c  |  6 +++---
  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c |  2 +-
  .../net/ethernet/cavium/thunder/nicvf_ethtool.c  |  4 ++--
  drivers/net/ethernet/chelsio/cxgb/cxgb2.c|  4 ++--
  drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c  |  4 ++--
  .../net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c   |  4 ++--
  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  4 ++--
  .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c  |  4 ++--
  .../chelsio/inline_crypto/chtls/chtls_main.c |  2 +-
  drivers/net/ethernet/cirrus/ep93xx_eth.c |  2 +-
  drivers/net/ethernet/cisco/enic/enic_ethtool.c   |  6 +++---
  drivers/net/ethernet/davicom/dm9000.c|  4 ++--
  drivers/net/ethernet/dec/tulip/de2104x.c |  4 ++--
  drivers/net/ethernet/dec/tulip/dmfe.c|  4 ++--
  drivers/net/ethernet/dec/tulip/tulip_core.c  |  4 ++--
  drivers/net/ethernet/dec/tulip/uli526x.c |  4 ++--
  drivers/net/ethernet/dec/tulip/winbond-840.c |  4 ++--
  drivers/net/ethernet/dlink/dl2k.c|  4 ++--
  drivers/net/ethernet/dlink/sundance.c|  4 ++--
  drivers/net/ethernet/dnet.c  |  4 ++--
  drivers/net/ethernet/emulex/benet/be_cmds.c  | 12 ++--
  drivers/net/ethernet/emulex/benet/be_ethtool.c   |  6 +++---
  drivers/net/ethernet/faraday/ftgmac100.c |  4 ++--
  drivers/net/ethernet/faraday/ftmac100.c  |  4 ++--
  drivers/net/ethernet/fealnx.c|  4 ++--
  .../net/ethernet/fr

Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

2022-09-19 Thread Olof Johansson
On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
 wrote:
>
> On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > These patches are on top of driver-core-next.
> >
> > Even if stdout-path isn't set in DT, this patch should take console
> > probe times back to how they were before the deferred_probe_timeout
> > clean up series[1].
>
> Now dropped from my queue due to lack of a response to other reviewer's
> questions.

What happened to this patch? I have a 10 second timeout on console
probe on my SiFive Unmatched, and I don't see this flag being set for
the serial driver. In fact, I don't see it anywhere in-tree. I can't
seem to locate another patchset from Saravana around this though, so
I'm not sure where to look for a missing piece for the sifive serial
driver.

This is the second boot time regression (this one not fatal, unlike
the Layerscape PCIe one) from the fw_devlink patchset.

Greg, can you revert the whole set for 6.0, please? It's obviously
nowhere near tested enough to go in and I expect we'll see a bunch of
-stable fixups due to this if we let it remain in.

This seems to be one of the worst releases I've encountered in recent
years on my hardware here due to this patchset. :-(


-Olof


Re: [PACTH v2] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-19 Thread Ganesh

On 9/7/22 09:49, Nicholas Piggin wrote:


On Mon Sep 5, 2022 at 4:38 PM AEST, Ganesh Goudar wrote:

Part of machine check error handling is done in realmode. As of now,
instrumentation is not possible for any code that runs in realmode.
When an MCE is injected on a KASAN-enabled kernel, a crash is
observed. Hence, force inline or mark as not instrumented the
functions which can run in realmode, to avoid KASAN instrumentation.

Signed-off-by: Ganesh Goudar
---
v2: Force inline few more functions.
---
  arch/powerpc/include/asm/hw_irq.h| 8 
  arch/powerpc/include/asm/interrupt.h | 2 +-
  arch/powerpc/include/asm/rtas.h  | 4 ++--
  arch/powerpc/kernel/rtas.c   | 4 ++--
  4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 26ede09c521d..3264991fe524 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -111,7 +111,7 @@ static inline void __hard_RI_enable(void)
  #ifdef CONFIG_PPC64
  #include 
  
-static inline notrace unsigned long irq_soft_mask_return(void)

+static __always_inline notrace unsigned long irq_soft_mask_return(void)
  {
return READ_ONCE(local_paca->irq_soft_mask);
  }
@@ -121,7 +121,7 @@ static inline notrace unsigned long 
irq_soft_mask_return(void)
   * for the critical section and as a clobber because
   * we changed paca->irq_soft_mask
   */
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+static __always_inline notrace void irq_soft_mask_set(unsigned long mask)
  {
/*
 * The irq mask must always include the STD bit if any are set.

This doesn't give a reason why it's __always_inline, and having the
notrace attribute makes it possibly confusing. I think it would be easy
for someone to break without realising. Could you add a noinstr to these
instead / as well?


Yeah, we can add noinstr. I missed your comment, sorry for the delayed reply



What about adding a 'realmode' function annotation that includes noinstr?


You mean to define a new function annotation?



[Bug 216504] New: no audio on iBook G4 (powerbook6,5)

2022-09-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216504

Bug ID: 216504
   Summary: no audio on iBook G4 (powerbook6,5)
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.19
  Hardware: PPC-32
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-32
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: thieso...@me.com
Regression: No

Created attachment 301831
  --> https://bugzilla.kernel.org/attachment.cgi?id=301831&action=edit
Patch to fix audio issue

The audio on my iBook G4 (powerbook6,5) has not been working. After some time of
googling I have found out that there has been a patch posted to the debian
mailing lists here:
https://lists.debian.org/debian-powerpc/2013/09/msg00031.html
Unfortunately, this has never made it upstream.
I have adapted the patch in the link to the current state of the kernel and can
confirm that audio now works (even though I had to play with alsamixer a bit).

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks

2022-09-19 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 16/09/2022 à 07:05, Samuel Holland a écrit :
>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>> with userspace access enabled until after the next IRQ or privilege
>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>> switching back to the original task, the userspace access would fault:
>
> This is not supposed to happen. You never switch away from a task 
> magically. Task switch will always happen in an interrupt, that means 
> copy_{from,to}_user() get interrupted.

Unfortunately this isn't true when CONFIG_PREEMPT=y.

We can switch away without an interrupt via:
  __copy_tofrom_user()
-> __copy_tofrom_user_power7()
   -> exit_vmx_usercopy()
  -> preempt_enable()
 -> __preempt_schedule()
-> preempt_schedule()
   -> preempt_schedule_common()
  -> __schedule()

I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are
all on Power8, which is a bit of an oversight on my part.

And clearly no one else tests it, until now :)

I think the root of our problem is that our KUAP lock/unlock is at too
high a level, ie. we do it in C around the low-level copy to/from.

eg:

static inline unsigned long
raw_copy_to_user(void __user *to, const void *from, unsigned long n)
{
unsigned long ret;

allow_write_to_user(to, n);
ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
prevent_write_to_user(to, n);
return ret;
}

There's a reason we did that, which is that we have various different
KUAP methods on different platforms, not a simple instruction like other
arches.

But that means we have that exit_vmx_usercopy() being called deep in the
guts of __copy_tofrom_user(), with KUAP disabled, and then we call into
the preempt machinery and eventually schedule.

I don't see an easy way to fix that "properly", it would be a big change
to all platforms to push the KUAP save/restore down into the low level
asm code.

But I think the patch below does fix it, although it abuses things a
little. Namely it only works because the 64s KUAP code can handle a
double call to prevent, and doesn't need the addresses or size for the
allow.

Still I think it might be our best option for an easy fix.

Samuel, can you try this on your system and check it works for you?

cheers


diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 97a77b37daa3..c50080c6a136 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -432,6 +432,7 @@ int speround_handler(struct pt_regs *regs);
 /* VMX copying */
 int enter_vmx_usercopy(void);
 int exit_vmx_usercopy(void);
+void exit_vmx_usercopy_continue(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);
 
diff --git a/arch/powerpc/lib/copyuser_power7.S 
b/arch/powerpc/lib/copyuser_power7.S
index 28f0be523c06..77804860383c 100644
--- a/arch/powerpc/lib/copyuser_power7.S
+++ b/arch/powerpc/lib/copyuser_power7.S
@@ -47,7 +47,7 @@
ld  r15,STK_REG(R15)(r1)
ld  r14,STK_REG(R14)(r1)
 .Ldo_err3:
-   bl  exit_vmx_usercopy
+   bl  exit_vmx_usercopy_continue
ld  r0,STACKFRAMESIZE+16(r1)
	mtlr	r0
b   .Lexit
diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c
index f76a50291fd7..78a18b8384ff 100644
--- a/arch/powerpc/lib/vmx-helper.c
+++ b/arch/powerpc/lib/vmx-helper.c
@@ -8,6 +8,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 
 int enter_vmx_usercopy(void)
@@ -34,12 +35,19 @@ int enter_vmx_usercopy(void)
  */
 int exit_vmx_usercopy(void)
 {
+   prevent_user_access(KUAP_READ_WRITE);
disable_kernel_altivec();
pagefault_enable();
preempt_enable();
return 0;
 }
 
+void exit_vmx_usercopy_continue(void)
+{
+   exit_vmx_usercopy();
+   allow_read_write_user(NULL, NULL, 0);
+}
+
 int enter_vmx_ops(void)
 {
if (in_interrupt())



Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks

2022-09-19 Thread Christophe Leroy


Le 19/09/2022 à 14:37, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> Le 16/09/2022 à 07:05, Samuel Holland a écrit :
>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>>> with userspace access enabled until after the next IRQ or privilege
>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>>> switching back to the original task, the userspace access would fault:
>>
>> This is not supposed to happen. You never switch away from a task
>> magically. Task switch will always happen in an interrupt, that means
>> copy_{from,to}_user() get interrupted.
> 
> Unfortunately this isn't true when CONFIG_PREEMPT=y.

Argh, yes, I wrote the above with the assumption that we properly follow 
the main principle that no complex function is to be used while KUAP is 
open ... Which is apparently not true here. x86 would have detected it 
with objtool, but we don't have it yet in powerpc.

> 
> We can switch away without an interrupt via:
>__copy_tofrom_user()
>  -> __copy_tofrom_user_power7()
> -> exit_vmx_usercopy()
>-> preempt_enable()
>   -> __preempt_schedule()
>  -> preempt_schedule()
> -> preempt_schedule_common()
>-> __schedule()


Should we use preempt_enable_no_resched() to avoid that ?


> 
> I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are
> all on Power8, which is a bit of an oversight on my part.
> 
> And clearly no one else tests it, until now :)
> 
> I think the root of our problem is that our KUAP lock/unlock is at too
> high a level, ie. we do it in C around the low-level copy to/from.
> 
> eg:
> 
> static inline unsigned long
> raw_copy_to_user(void __user *to, const void *from, unsigned long n)
> {
>   unsigned long ret;
> 
>   allow_write_to_user(to, n);
>   ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
>   prevent_write_to_user(to, n);
>   return ret;
> }
> 
> There's a reason we did that, which is that we have various different
> KUAP methods on different platforms, not a simple instruction like other
> arches.
> 
> But that means we have that exit_vmx_usercopy() being called deep in the
> guts of __copy_tofrom_user(), with KUAP disabled, and then we call into
> the preempt machinery and eventually schedule.
> 
> I don't see an easy way to fix that "properly", it would be a big change
> to all platforms to push the KUAP save/restore down into the low level
> asm code.
> 
> But I think the patch below does fix it, although it abuses things a
> little. Namely it only works because the 64s KUAP code can handle a
> double call to prevent, and doesn't need the addresses or size for the
> allow.
> 
> Still I think it might be our best option for an easy fix.

Wouldn't it be even easier and less abusive to use 
preempt_enable_no_resched() ? Or is there definitely a good reason to 
resched after a VMX copy while we don't with regular copies ?

Christophe

Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-19 Thread Nathan Lynch
Nathan Lynch  writes:
> "Nicholas Piggin"  writes:
>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>>> > Leonardo Brás  writes:
>>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>>> > > > Leonardo Brás  writes:
>>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
>>> > > > > > thought
>>> > > > > > I had confirmed -- with PowerVM partition firmware development 
>>> > > > > > that
>>> > > > > > the following RTAS functions:
>>> > > > > > 
>>> > > > > > - ibm,get-xive
>>> > > > > > - ibm,int-off
>>> > > > > > - ibm,int-on
>>> > > > > > - ibm,set-xive
>>> > > > > > 
>>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary 
>>> > > > > > other
>>> > > > > > RTAS calls:
>>> > > > > > 
>>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
>>> > > > > > 
>>> > > > > > Recent discussion with firmware development makes it clear that 
>>> > > > > > this
>>> > > > > > is not true, and that the code in commit b664db8e3f97 
>>> > > > > > ("powerpc/rtas:
>>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining 
>>> > > > > > several
>>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
>>> > > > > > state
>>> > > > > > can be corrupted by the concurrent use of the "reentrant" 
>>> > > > > > functions,
>>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>>> > > > > 
>>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>>> > > > 
>>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>>> > > > property described in the spec applies only to the individual RTAS
>>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
>>> > > > CPUs
>>> > > > simultaneously, but it must adhere to the more general requirement to
>>> > > > serialize with other RTAS functions.
>>> > > > 
>>> > > 
>>> > > I see. Thanks for explaining that part!
>>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>>> > > previously thought.
>>> > > 
>>> > > OTOH, I think that instead of reverting the change, we could make use 
>>> > > of the
>>> > > correct information and fix the current implementation. (This could 
>>> > > help when we
>>> > > do the same rtas call in multiple cpus)
>>> > 
>>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>>> > that. I'm not seeing the need.
>>> > 
>>> > > I have an idea of a patch to fix this.
>>> > > Do you think it would be ok if I sent that, as a prospective
>>> > > alternative to this reversion?
>>> > 
>>> > It is my preference, and I believe it is more common, to revert to the
>>> > well-understood prior state, imperfect as it may be. The revert can be
>>> > backported to -stable and distros while development and review of
>>> > another approach proceeds.
>>>
>>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>>
>>> FWIW:
>>> Reviewed-by: Leonardo Bras 
>>
>> A shame. I guess a reader/writer lock would not be much help because
>> the crash is probably more likely to hit longer running rtas calls?
>>
>> Alternative is just cheat and do this...?

[...]

>
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Regardless, I request that we proceed with the revert while the crash
path hardening gets sorted out. If I understand the motivation behind
commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
then it looks like it was incomplete anyway? rtas_os_term() still takes
the lock when calling ibm,os-term.


[RFC PATCH 0/7] powerpc: first hack at pcrel addressing

2022-09-19 Thread Nicholas Piggin
pcrel surprisingly didn't take much to get working, at least if
we ignore the hard bits (modules, ftrace, kprobes...). I'd like
to get it merged so we can incrementally fix the missing
bits. The series is functional but not quite polished, so this
is a good point to see if people agree with the approach.

Aside from polishing, the major bit missing before merge is Kconfig
detection of compiler pcrel feature.

Thanks,
Nick

Nicholas Piggin (7):
  powerpc: use 16-bit immediate for STACK_FRAME_REGS_MARKER
  powerpc/64: abstract asm global variable declaration and access
  powerpc/64: provide a helper macro to load r2 with the kernel TOC
  powerpc: add CFUNC assembly label annotation
  powerpc/64s: update generic cpu option name and compiler flags
  powerpc/64s: POWER10 CPU Kconfig build option
  powerpc/64s: Add option to build vmlinux with pcrel addressing

 arch/powerpc/Makefile |  22 ++-
 arch/powerpc/boot/opal-calls.S|   6 +-
 arch/powerpc/boot/ppc_asm.h   |   4 +
 arch/powerpc/include/asm/atomic.h |  20 ++-
 arch/powerpc/include/asm/io.h |  36 
 arch/powerpc/include/asm/ppc_asm.h| 157 +-
 arch/powerpc/include/asm/ptrace.h |   6 +-
 arch/powerpc/include/asm/uaccess.h|  22 +++
 arch/powerpc/kernel/entry_32.S|   9 +-
 arch/powerpc/kernel/exceptions-64e.S  |  12 +-
 arch/powerpc/kernel/exceptions-64s.S  | 116 ++---
 arch/powerpc/kernel/head_32.h |   3 +-
 arch/powerpc/kernel/head_64.S |  58 +--
 arch/powerpc/kernel/head_booke.h  |   3 +-
 arch/powerpc/kernel/interrupt_64.S|  56 +++
 arch/powerpc/kernel/irq.c |   4 +
 arch/powerpc/kernel/misc_64.S |   2 +-
 arch/powerpc/kernel/optprobes_head.S  |   2 +-
 arch/powerpc/kernel/swsusp_asm64.S|  22 +--
 arch/powerpc/kernel/trace/ftrace_mprofile.S   |   7 +-
 arch/powerpc/kernel/vdso/gettimeofday.S   |   2 +-
 arch/powerpc/kernel/vector.S  |  41 ++---
 arch/powerpc/kernel/vmlinux.lds.S |   6 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  16 +-
 arch/powerpc/lib/copypage_64.S|  11 +-
 arch/powerpc/lib/copypage_power7.S|   4 +-
 arch/powerpc/lib/copyuser_power7.S|   8 +-
 arch/powerpc/lib/hweight_64.S |   8 +-
 arch/powerpc/lib/memcmp_64.S  |   4 +-
 arch/powerpc/lib/memcpy_power7.S  |   6 +-
 arch/powerpc/lib/string_64.S  |   9 +-
 arch/powerpc/perf/bhrb.S  |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|  30 +++-
 .../powerpc/platforms/powernv/opal-wrappers.S |   2 +-
 arch/powerpc/platforms/pseries/hvCall.S   |  14 +-
 arch/powerpc/xmon/spr_access.S|   4 +-
 36 files changed, 502 insertions(+), 232 deletions(-)

-- 
2.37.2



[RFC PATCH 1/7] powerpc: use 16-bit immediate for STACK_FRAME_REGS_MARKER

2022-09-19 Thread Nicholas Piggin
Using a 16-bit constant for this marker allows it to be loaded with
a single 'li' instruction. On 64-bit this avoids a TOC entry and a
TOC load that depends on the r2 value that has just been loaded from
the PACA.

XXX: this should probably be a 64-bit change only, using the
2-instruction sequence that 32-bit uses, to avoid false positives.
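One subtlety worth noting (an editorial illustration, not part of the patch): `li` sign-extends its 16-bit immediate, so the value that actually lands in the register — and hence on the stack — for 0xdead is the sign-extended pattern. A plain-C model of what `li` does:

```c
#include <assert.h>
#include <stdint.h>

/* Model of PowerPC 'li rD,IMM': the 16-bit immediate is
 * sign-extended to the full register width. */
static int64_t li_immediate(uint16_t imm)
{
	return (int64_t)(int16_t)imm;
}
```

So the unwinder's comparison sees 0x...ffffdead rather than a bare 0xdead, and a 16-bit marker has far more accidental collisions than the old 64-bit "regshere" ASCII constant — which is what the XXX note above is worried about.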

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h| 6 +++---
 arch/powerpc/kernel/entry_32.S   | 9 -
 arch/powerpc/kernel/exceptions-64e.S | 8 +---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 arch/powerpc/kernel/head_32.h| 3 +--
 arch/powerpc/kernel/head_64.S| 7 ---
 arch/powerpc/kernel/head_booke.h | 3 +--
 arch/powerpc/kernel/interrupt_64.S   | 6 +++---
 8 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index a03403695cd4..f47066f7878e 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -115,10 +115,10 @@ struct pt_regs
 
 #define STACK_FRAME_OVERHEAD   112 /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE2   /* Location of LR in stack frame */
-#define STACK_FRAME_REGS_MARKERASM_CONST(0x7265677368657265)
+#define STACK_FRAME_REGS_MARKERASM_CONST(0xdead)
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + \
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
-#define STACK_FRAME_MARKER 12
+#define STACK_FRAME_MARKER 1   /* Reuse CR+reserved word */
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
 #define STACK_FRAME_MIN_SIZE   32
@@ -136,7 +136,7 @@ struct pt_regs
 #define KERNEL_REDZONE_SIZE0
 #define STACK_FRAME_OVERHEAD   16  /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE1   /* Location of LR in stack frame */
-#define STACK_FRAME_REGS_MARKERASM_CONST(0x72656773)
+#define STACK_FRAME_REGS_MARKERASM_CONST(0xba51)
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_FRAME_MARKER 2
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1d599df6f169..c221e764cefd 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -108,9 +108,8 @@ transfer_to_syscall:
 #ifdef CONFIG_BOOKE_OR_40x
rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
 #endif
-   lis r12,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
+   li  r12,STACK_FRAME_REGS_MARKER /* exception frame marker */
SAVE_GPR(2, r1)
-   addir12,r12,STACK_FRAME_REGS_MARKER@l
stw r9,_MSR(r1)
li  r2, INTERRUPT_SYSCALL
stw r12,8(r1)
@@ -265,7 +264,7 @@ fast_exception_return:
mtcrr10
lwz r10,_LINK(r11)
mtlrr10
-   /* Clear the exception_marker on the stack to avoid confusing stacktrace */
+   /* Clear the STACK_FRAME_REGS_MARKER on the stack to avoid confusing stacktrace */
li  r10, 0
stw r10, 8(r11)
REST_GPR(10, r11)
@@ -322,7 +321,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
li  r0,0
 
/*
-* Leaving a stale exception_marker on the stack can confuse
+* Leaving a stale STACK_FRAME_REGS_MARKER on the stack can confuse
 * the reliable stack unwinder later on. Clear it.
 */
stw r0,8(r1)
@@ -374,7 +373,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
mtspr   SPRN_XER,r5
 
/*
-* Leaving a stale exception_marker on the stack can confuse
+* Leaving a stale STACK_FRAME_REGS_MARKER on the stack can confuse
 * the reliable stack unwinder later on. Clear it.
 */
stw r0,8(r1)
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 67dc4e3179a0..08b7d6bd4da6 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -389,7 +389,7 @@ exc_##n##_common:   
\
ld  r9,excf+EX_R1(r13); /* load orig r1 back from PACA */   \
lwz r10,excf+EX_CR(r13);/* load orig CR back from PACA  */  \
lbz r11,PACAIRQSOFTMASK(r13); /* get current IRQ softe */   \
-   ld  r12,exception_marker@toc(r2);   \
+   li  r12,STACK_FRAME_REGS_MARKER;\
li  r0,0;   \
std r3,GPR10(r1);   /* save r10 to stackframe */\
std r4,GPR11(r1);   /* save r11 to stackframe */\
@@ -470,12 +470,6 @@ exc_##n##_bad_stack:   
\
bl  hdlr;  

[RFC PATCH 2/7] powerpc/64: abstract asm global variable declaration and access

2022-09-19 Thread Nicholas Piggin
Use asm helpers to access global variables and to define them in asm.
Stop using GOT addressing and use the more common @toc offsets. 32-bit
already does this, so it should be unchanged.
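For flavour, the swap made throughout the series replaces a single load through a GOT entry with a two-instruction address computation from the TOC base (syntax as in the patch, `opal` used as the example symbol):

```asm
# before: one memory load through a GOT entry
	ld	r11,opal@got(r2)

# after: addis/addi computing the address arithmetically from r2
	addis	r11,r2,opal@toc@ha
	addi	r11,r11,opal@toc@l
```

The @toc form costs one extra instruction but avoids the GOT entry and the dependent load from it.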

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/boot/opal-calls.S  |  6 +-
 arch/powerpc/boot/ppc_asm.h |  4 ++
 arch/powerpc/include/asm/ppc_asm.h  | 72 -
 arch/powerpc/kernel/interrupt_64.S  | 10 ---
 arch/powerpc/kernel/swsusp_asm64.S  | 22 ---
 arch/powerpc/kernel/trace/ftrace_mprofile.S |  3 +-
 arch/powerpc/kernel/vector.S| 41 
 arch/powerpc/lib/copypage_64.S  |  9 ++-
 arch/powerpc/lib/string_64.S|  9 ++-
 arch/powerpc/perf/bhrb.S|  2 +-
 arch/powerpc/platforms/pseries/hvCall.S | 10 +--
 arch/powerpc/xmon/spr_access.S  |  4 +-
 12 files changed, 118 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/boot/opal-calls.S b/arch/powerpc/boot/opal-calls.S
index ad0e15d930c4..1f2f330a459e 100644
--- a/arch/powerpc/boot/opal-calls.S
+++ b/arch/powerpc/boot/opal-calls.S
@@ -16,7 +16,7 @@ opal_kentry:
li  r5, 0
li  r6, 0
li  r7, 0
-   ld  r11,opal@got(r2)
+   LOAD_REG_ADDR(r11, opal)
ld  r8,0(r11)
ld  r9,8(r11)
bctr
@@ -35,7 +35,7 @@ opal_call:
mr  r13,r2
 
/* Set opal return address */
-   ld  r11,opal_return@got(r2)
+   LOAD_REG_ADDR(r11, opal_return)
mtlrr11
mfmsr   r12
 
@@ -45,7 +45,7 @@ opal_call:
mtspr   SPRN_HSRR1,r12
 
/* load the opal call entry point and base */
-   ld  r11,opal@got(r2)
+   LOAD_REG_ADDR(r11, opal)
ld  r12,8(r11)
ld  r2,0(r11)
mtspr   SPRN_HSRR0,r12
diff --git a/arch/powerpc/boot/ppc_asm.h b/arch/powerpc/boot/ppc_asm.h
index 192b97523b05..ea290bf78fb2 100644
--- a/arch/powerpc/boot/ppc_asm.h
+++ b/arch/powerpc/boot/ppc_asm.h
@@ -84,4 +84,8 @@
 #define MFTBU(dest)mfspr dest, SPRN_TBRU
 #endif
 
+#define LOAD_REG_ADDR(reg,name)\
+   addis   reg,r2,name@toc@ha; \
+   addireg,reg,name@toc@l
+
 #endif /* _PPC64_PPC_ASM_H */
diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 83c02f5a7f2a..520c4c9caf7f 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -303,6 +303,75 @@ GLUE(.,name):
.endif
 .endm
 
+.macro declare_var name, align
+   .type \name,@object
+#  .section ".toc","aw"
+   .data
+   .balign \align
+\name\():
+.endm
+
+.macro declare_var_end
+   .previous
+.endm
+
+.macro load_var_addr reg, name
+   addis   \reg,%r2,\name\()@toc@ha
+   addi\reg,\reg,\name\()@toc@l
+.endm
+
+.macro load_var reg, name, size
+   addis   \reg,%r2,\name\()@toc@ha
+   .if \size == 1
+   lbz \reg,\name\()@toc@l(\reg)
+   .elseif \size == 2
+   lhz \reg,\name\()@toc@l(\reg)
+   .elseif \size == 4
+   lwz \reg,\name\()@toc@l(\reg)
+   .elseif \size == 8
+   ld  \reg,\name\()@toc@l(\reg)
+   .else
+   .error "bad size"
+   .endif
+.endm
+
+.macro store_var reg, name, size
+   addis   \reg,%r2,\name\()@toc@ha
+   .if \size == 1
+   pstb\reg,\name\()@toc@l(\reg)
+   .elseif \size == 2
+   psth\reg,\name\()@toc@l(\reg)
+   .elseif \size == 4
+   pstw\reg,\name\()@toc@l(\reg)
+   .elseif \size == 8
+   pstd\reg,\name\()@toc@l(\reg)
+   .else
+   .error "bad size"
+   .endif
+.endm
+
+.macro fload_var reg, tmpreg, name, size
+   addis   \tmpreg,%r2,\name\()@toc@ha
+   .if \size == 4
+   lfs \reg,\name\()@toc@l(\tmpreg)
+   .elseif \size == 8
+   lfd \reg,\name\()@toc@l(\tmpreg)
+   .else
+   .error "bad size"
+   .endif
+.endm
+
+.macro fstore_var reg, tmpreg, name, size
+   addis   \tmpreg,%r2,\name\()@toc@ha
+   .if \size == 4
+   stfs\reg,\name\()@toc@l(\tmpreg)
+   .elseif \size == 8
+   stfd\reg,\name\()@toc@l(\tmpreg)
+   .else
+   .error "bad size"
+   .endif
+.endm
+
 #ifdef __powerpc64__
 
 #define LOAD_REG_IMMEDIATE(reg, expr) __LOAD_REG_IMMEDIATE reg, expr
@@ -315,7 +384,8 @@ GLUE(.,name):
rldimi  reg, tmp, 32, 0
 
 #define LOAD_REG_ADDR(reg,name)\
-   ld  reg,name@got(r2)
+   addis   reg,r2,name@toc@ha; \
+   addireg,reg,name@toc@l
 
 #define LOAD_REG_ADDRBASE(reg,name)LOAD_REG_ADDR(reg,name)
 #define ADDROFF(name)  0
diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 14c409fd4c38..e95911f49eb8 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -13,16 +13,6 @@
 #include 
 #include 
 
-   .sectio

[RFC PATCH 3/7] powerpc/64: provide a helper macro to load r2 with the kernel TOC

2022-09-19 Thread Nicholas Piggin
A later change stops the kernel using r2 and loads it with a poison
value.  Provide a PACATOC loading abstraction which can hide this
detail.

XXX: 64e, KVM, ftrace not entirely done

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h |  3 +++
 arch/powerpc/kernel/exceptions-64e.S   |  4 ++--
 arch/powerpc/kernel/exceptions-64s.S   |  6 +++---
 arch/powerpc/kernel/head_64.S  |  4 ++--
 arch/powerpc/kernel/interrupt_64.S | 12 ++--
 arch/powerpc/kernel/optprobes_head.S   |  2 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S|  4 ++--
 arch/powerpc/platforms/powernv/opal-wrappers.S |  2 +-
 8 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 520c4c9caf7f..c0848303151c 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -374,6 +374,9 @@ GLUE(.,name):
 
 #ifdef __powerpc64__
 
+#define LOAD_PACA_TOC()\
+   ld  r2,PACATOC(r13)
+
 #define LOAD_REG_IMMEDIATE(reg, expr) __LOAD_REG_IMMEDIATE reg, expr
 
 #define LOAD_REG_IMMEDIATE_SYM(reg, tmp, expr) \
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 08b7d6bd4da6..bc76950201b6 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -382,7 +382,7 @@ exc_##n##_common:   
\
ld  r4,excf+EX_R11(r13);/* get back r11 */  \
mfspr   r5,scratch; /* get back r13 */  \
std r12,GPR12(r1);  /* save r12 in stackframe */\
-   ld  r2,PACATOC(r13);/* get kernel TOC into r2 */\
+   LOAD_PACA_TOC();/* get kernel TOC into r2 */\
mflrr6; /* save LR in stackframe */ \
mfctr   r7; /* save CTR in stackframe */\
mfspr   r8,SPRN_XER;/* save XER in stackframe */\
@@ -1073,7 +1073,7 @@ bad_stack_book3e:
std r11,0(r1)
li  r12,0
std r12,0(r11)
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
 1: addir3,r1,STACK_FRAME_OVERHEAD
bl  kernel_bad_stack
b   1b
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 5c110e5e5819..9a06f2c8e326 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -580,7 +580,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
std r2,GPR2(r1) /* save r2 in stackframe*/
SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
mflrr9  /* Get LR, later save to stack  */
-   ld  r2,PACATOC(r13) /* get kernel TOC into r2   */
+   LOAD_PACA_TOC() /* get kernel TOC into r2   */
std r9,_LINK(r1)
lbz r10,PACAIRQSOFTMASK(r13)
mfspr   r11,SPRN_XER/* save XER in stackframe   */
@@ -610,7 +610,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 .macro SEARCH_RESTART_TABLE
 #ifdef CONFIG_RELOCATABLE
mr  r12,r2
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
LOAD_REG_ADDR(r9, __start___restart_table)
LOAD_REG_ADDR(r10, __stop___restart_table)
mr  r2,r12
@@ -640,7 +640,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 .macro SEARCH_SOFT_MASK_TABLE
 #ifdef CONFIG_RELOCATABLE
mr  r12,r2
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
LOAD_REG_ADDR(r9, __start___soft_mask_table)
LOAD_REG_ADDR(r10, __stop___soft_mask_table)
mr  r2,r12
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index cac3e1b58360..80106aaf0b7a 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -841,7 +841,7 @@ __secondary_start:
  * before going into C code.
  */
 start_secondary_prolog:
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
li  r3,0
std r3,0(r1)/* Zero the stack frame pointer */
bl  start_secondary
@@ -981,7 +981,7 @@ start_here_common:
std r1,PACAKSAVE(r13)
 
/* Load the TOC (virtual address) */
-   ld  r2,PACATOC(r13)
+   LOAD_PACA_TOC()
 
/* Mark interrupts soft and hard disabled (they might be enabled
 * in the PACA when doing hotplug)
diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index e95911f49eb8..6d5c105457dd 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -57,7 +57,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
std r0,GPR0(r1)
std r10,GPR1(r1)
std r2,GPR2(r1)
-   ld  r2,P

[RFC PATCH 4/7] powerpc: add CFUNC assembly label annotation

2022-09-19 Thread Nicholas Piggin
This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.
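Presumably the pcrel patch later in the series would make the macro conditional, along these lines (a sketch of the header fragment, not the final code):

```c
/* Sketch only: with pcrel enabled, calls to C functions with
 * localentry 1 need the @notoc suffix instead of a trailing nop. */
#ifdef CONFIG_PPC_KERNEL_PCREL
#define CFUNC(name)	name@notoc
#else
#define CFUNC(name)	name
#endif
```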

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h  |   5 ++
 arch/powerpc/kernel/exceptions-64s.S| 108 
 arch/powerpc/kernel/head_64.S   |  12 +--
 arch/powerpc/kernel/interrupt_64.S  |  28 +++---
 arch/powerpc/kernel/misc_64.S   |   2 +-
 arch/powerpc/kernel/vdso/gettimeofday.S |   2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  16 ++--
 arch/powerpc/lib/copypage_power7.S  |   4 +-
 arch/powerpc/lib/copyuser_power7.S  |   8 +-
 arch/powerpc/lib/hweight_64.S   |   8 +-
 arch/powerpc/lib/memcmp_64.S|   4 +-
 arch/powerpc/lib/memcpy_power7.S|   6 +-
 arch/powerpc/platforms/pseries/hvCall.S |   4 +-
 13 files changed, 106 insertions(+), 101 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index c0848303151c..ab8adf2b833f 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -139,6 +139,11 @@
 
 #ifdef __KERNEL__
 
+/*
+ * Used to name C functions called from asm
+ */
+#define CFUNC(name) name
+
 /*
  * We use __powerpc64__ here because we want the compat VDSO to use the 32-bit
  * version below in the else case of the ifdef.
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9a06f2c8e326..08d322ab5980 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -997,7 +997,7 @@ EXC_COMMON_BEGIN(system_reset_common)
__GEN_COMMON_BODY system_reset
 
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  system_reset_exception
+   bl  CFUNC(system_reset_exception)
 
/* Clear MSR_RI before setting SRR0 and SRR1. */
li  r9,0
@@ -1143,7 +1143,7 @@ BEGIN_FTR_SECTION
bl  enable_machine_check
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  machine_check_early
+   bl  CFUNC(machine_check_early)
std r3,RESULT(r1)   /* Save result */
ld  r12,_MSR(r1)
 
@@ -1204,7 +1204,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 * Queue up the MCE event so that we can log it later, while
 * returning from kernel or opal call.
 */
-   bl  machine_check_queue_event
+   bl  CFUNC(machine_check_queue_event)
MACHINE_CHECK_HANDLER_WINDUP
RFI_TO_KERNEL
 
@@ -1230,7 +1230,7 @@ EXC_COMMON_BEGIN(machine_check_common)
 */
GEN_COMMON machine_check
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  machine_check_exception_async
+   bl  CFUNC(machine_check_exception_async)
b   interrupt_return_srr
 
 
@@ -1240,7 +1240,7 @@ EXC_COMMON_BEGIN(machine_check_common)
  * done. Queue the event then call the idle code to do the wake up.
  */
 EXC_COMMON_BEGIN(machine_check_idle_common)
-   bl  machine_check_queue_event
+   bl  CFUNC(machine_check_queue_event)
 
/*
 * GPR-loss wakeups are relatively straightforward, because the
@@ -1279,7 +1279,7 @@ EXC_COMMON_BEGIN(unrecoverable_mce)
 BEGIN_FTR_SECTION
li  r10,0 /* clear MSR_RI */
mtmsrd  r10,1
-   bl  disable_machine_check
+   bl  CFUNC(disable_machine_check)
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
ld  r10,PACAKMSR(r13)
li  r3,MSR_ME
@@ -1296,14 +1296,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 * the early handler which is a true NMI.
 */
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  machine_check_exception
+   bl  CFUNC(machine_check_exception)
 
/*
 * We will not reach here. Even if we did, there is no way out.
 * Call unrecoverable_exception and die.
 */
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  unrecoverable_exception
+   bl  CFUNC(unrecoverable_exception)
b   .
 
 
@@ -1358,16 +1358,16 @@ EXC_COMMON_BEGIN(data_access_common)
bne-1f
 #ifdef CONFIG_PPC_64S_HASH_MMU
 BEGIN_MMU_FTR_SECTION
-   bl  do_hash_fault
+   bl  CFUNC(do_hash_fault)
 MMU_FTR_SECTION_ELSE
-   bl  do_page_fault
+   bl  CFUNC(do_page_fault)
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 #else
-   bl  do_page_fault
+   bl  CFUNC(do_page_fault)
 #endif
b   interrupt_return_srr
 
-1: bl  do_break
+1: bl  CFUNC(do_break)
/*
 * do_break() may have changed the NV GPRS while handling a breakpoint.
 * If so, we need to restore them with their updated values.
@@ -1411,7 +1411,7 @@ EXC_COMMON_BEGIN(data_access_slb_common)
 BEGIN_MMU_FTR_SECTION
/* HPT case, do SLB 

[RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags

2022-09-19 Thread Nicholas Piggin
Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
make that clear in the option name.

-mtune= values older than power8 are dropped because the minimum
supported gcc version can tune for power8, and tuning is made
consistent between big and little endian.

Big endian drops -mcpu=power4 in favour of power5. Effectively the
minimum compiler version means power5 was always being selected here,
so this should not change anything. 970 / G5 code generation does not
seem to have been a problem with -mcpu=power5, but it's possible we
should go back to power4 to be really safe.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Makefile  | 8 +---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 02742facf895..471ef14f8574 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -149,11 +149,13 @@ CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)
 ifdef CONFIG_PPC_BOOK3S_64
 ifdef CONFIG_CPU_LITTLE_ENDIAN
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
-CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power9,-mtune=power8)
 else
-CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,$(call cc-option,-mtune=power5))
-CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mcpu=power5,-mcpu=power4)
+# -mcpu=power5 should generate 970 compatible kernel code
+CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power5
 endif
+CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power10,   \
+ $(call cc-option,-mtune=power9,   \
+ $(call cc-option,-mtune=power8)))
 else ifdef CONFIG_PPC_BOOK3E_64
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64
 endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 5185d942b455..4bf9af6a6eb5 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -125,7 +125,7 @@ choice
  If unsure, select Generic.
 
 config GENERIC_CPU
-   bool "Generic (POWER4 and above)"
+   bool "Generic (POWER5 / PPC970 and above)"
depends on PPC_BOOK3S_64 && !CPU_LITTLE_ENDIAN
select PPC_64S_HASH_MMU
 
-- 
2.37.2



[RFC PATCH 6/7] powerpc/64s: POWER10 CPU Kconfig build option

2022-09-19 Thread Nicholas Piggin
This adds a basic POWER10_CPU option, which builds with -mcpu=power10.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Makefile  | 7 ++-
 arch/powerpc/platforms/Kconfig.cputype | 8 +++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 471ef14f8574..8c233f0894ba 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -193,9 +193,14 @@ ifdef CONFIG_476FPE_ERR46
-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
 endif
 
-# No AltiVec or VSX instructions when building kernel
+# No prefix or pcrel
+KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
+KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
+
+# No AltiVec or VSX or MMA instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
 KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
+KBUILD_CFLAGS += $(call cc-option,-mno-mma)
 
 # No SPE instruction when building kernel
 # (We use all available options to help semi-broken compilers)
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 4bf9af6a6eb5..9d4d54fea081 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -84,7 +84,7 @@ choice
help
  There are two families of 64 bit PowerPC chips supported.
  The most common ones are the desktop and server CPUs
- (POWER5, 970, POWER5+, POWER6, POWER7, POWER8, POWER9 ...)
+ (POWER5, 970, POWER5+, POWER6, POWER7, POWER8, POWER9, ...)
 
  The other are the "embedded" processors compliant with the
  "Book 3E" variant of the architecture
@@ -171,6 +171,11 @@ config POWER9_CPU
depends on PPC_BOOK3S_64
select ARCH_HAS_FAST_MULTIPLIER
 
+config POWER10_CPU
+   bool "POWER10"
+   depends on PPC_BOOK3S_64
+   select ARCH_HAS_FAST_MULTIPLIER
+
 config E5500_CPU
bool "Freescale e5500"
depends on PPC64 && E500
@@ -239,6 +244,7 @@ config TARGET_CPU
default "power7" if POWER7_CPU
default "power8" if POWER8_CPU
default "power9" if POWER9_CPU
+   default "power10" if POWER10_CPU
default "405" if 405_CPU
default "440" if 440_CPU
default "464" if 464_CPU
-- 
2.37.2



[RFC PATCH 7/7] powerpc/64s: Add option to build vmlinux with pcrel addressing

2022-09-19 Thread Nicholas Piggin
The main trick to this is that the kernel no longer picks up the
PAGE_OFFSET (0xc load address) from the absolute-address TOC load in
early boot, so it doesn't load all addresses with the proper PAGE_OFFSET
as a matter of course, and has to be moved there by hand.

- Extended inline asm memory addresses grow some complexity, so this
  uses a dumb base addressing for now.

- Modules are still using TOC addressing. Building modules with pcrel
  will, I think, require the loader to add pc-relative plt stubs, and a
  GOT section for global variable access.  Would be nice to add though.

- definitely ftrace and probes, and possibly BPF and KVM, have some
  breakage.  I haven't looked closely yet.

- copypage_64.S has an interesting problem: prefixed instructions have
  alignment restrictions, so the linker can change their size, so the
  difference between two local labels may no longer be constant at
  assembly time. Even aligning the prefixed instruction can't generally
  solve it. Fortunately it's only one place in the kernel so far.

This reduces kernel text size by about 6%.
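The text-size win comes from collapsing two-instruction TOC accesses into single prefixed instructions. A hedged sketch of the difference for a global load (`var` is an assumed symbol, ISA 3.1 pcrel syntax):

```asm
# TOC addressing: two instructions, and r2 must hold the TOC base
	addis	r9,r2,var@toc@ha
	lwz	r3,var@toc@l(r9)

# pcrel addressing (ISA 3.1): one prefixed instruction, no r2 needed
	plwz	r3,var@pcrel(0),1
```

The prefixed form is 8 bytes versus 8 bytes here, but it frees r2, removes the addis in chains of accesses, and drops the TOC entries themselves — which is presumably where most of the ~6% comes from.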

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Makefile  |  7 +++
 arch/powerpc/include/asm/atomic.h  | 20 +--
 arch/powerpc/include/asm/io.h  | 36 
 arch/powerpc/include/asm/ppc_asm.h | 77 ++
 arch/powerpc/include/asm/uaccess.h | 22 
 arch/powerpc/kernel/head_64.S  | 35 
 arch/powerpc/kernel/irq.c  |  4 ++
 arch/powerpc/kernel/vmlinux.lds.S  |  6 ++
 arch/powerpc/lib/copypage_64.S |  4 +-
 arch/powerpc/platforms/Kconfig.cputype | 20 +++
 10 files changed, 226 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8c233f0894ba..a33ce1cf75ce 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -107,6 +107,9 @@ LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
 LDFLAGS_vmlinux:= $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
+ifdef CONFIG_PPC_KERNEL_PCREL
+   KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-pcrel)
+endif
 ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
@@ -194,8 +197,12 @@ ifdef CONFIG_476FPE_ERR46
 endif
 
 # No prefix or pcrel
+ifndef CONFIG_PPC_KERNEL_PREFIXED
 KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
+endif
+ifndef CONFIG_PPC_KERNEL_PCREL
 KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
+endif
 
 # No AltiVec or VSX or MMA instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 486ab7889121..4124e5795872 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -27,14 +27,20 @@ static __inline__ int arch_atomic_read(const atomic_t *v)
 {
int t;
 
-   __asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m<>"(v->counter));
+   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+           __asm__ __volatile__("lwz %0,0(%1)" : "=r"(t) : "b"(&v->counter));
+   else
+           __asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m<>"(v->counter));
 
return t;
 }
 
 static __inline__ void arch_atomic_set(atomic_t *v, int i)
 {
-   __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m<>"(v->counter) : "r"(i));
+   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+           __asm__ __volatile__("stw %1,0(%2)" : "=m"(v->counter) : "r"(i), "b"(&v->counter));
+   else
+           __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m<>"(v->counter) : "r"(i));
 }
 
 #define ATOMIC_OP(op, asm_op, suffix, sign, ...)   \
@@ -226,14 +232,20 @@ static __inline__ s64 arch_atomic64_read(const atomic64_t *v)
 {
s64 t;
 
-   __asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m<>"(v->counter));
+   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+           __asm__ __volatile__("ld %0,0(%1)" : "=r"(t) : "b"(&v->counter));
+   else
+           __asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m<>"(v->counter));
 
return t;
 }
 
 static __inline__ void arch_atomic64_set(atomic64_t *v, s64 i)
 {
-   __asm__ __volatile__("std%U0%X0 %1,%0" : "=m<>"(v->counter) : "r"(i));
+   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL))
+           __asm__ __volatile__("std %1,0(%2)" : "=m"(v->counter) : "r"(i), "b"(&v->counter));
+   else
+           __asm__ __volatile__("std%U0%X0 %1,%0" : "=m<>"(v->counter) : "r"(i));
 }
 
 #define ATOMIC64_OP(op, asm_op)
\
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index fc112a91d0c2..4dc95872bffc 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -97,6 +97,41 @@ extern bool isa_io_special;
  *
  */
 
+#ifdef CONFIG_PPC_KERNEL_PCREL
+#de

[PATCH v2 01/19] powerpc/Kconfig: Fix non existing CONFIG_PPC_FSL_BOOKE

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_FSL_BOOKE doesn't exist. Should be CONFIG_FSL_BOOKE.

Fixes: 49e3d8ea6248 ("powerpc/fsl_booke: Enable STRICT_KERNEL_RWX")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..cbe7bb029aec 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -828,7 +828,7 @@ config DATA_SHIFT
default 24 if STRICT_KERNEL_RWX && PPC64
	range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32
	range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_8xx
-	range 20 24 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && PPC_FSL_BOOKE
+	range 20 24 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && FSL_BOOKE
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 18 if (DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32
default 23 if STRICT_KERNEL_RWX && PPC_8xx
-- 
2.37.1



[PATCH v2 02/19] powerpc/64e: Tie PPC_BOOK3E_64 to PPC_E500MC

2022-09-19 Thread Christophe Leroy
The only 64-bit Book3E CPUs we support require the selection
of CONFIG_PPC_E500MC.

However our Kconfig allows configuring a kernel that has 64-bit
Book3E support, but without CONFIG_PPC_E500MC enabled. Such a kernel
would never boot; it doesn't know about any CPUs.

To fix this, force CONFIG_PPC_E500MC to be selected whenever we are
building a 64-bit Book3E kernel.

And add a test to detect future situations where cpu_specs is empty.

Signed-off-by: Christophe Leroy 
---
v2: Replaced e500mc by CONFIG_PPC_E500MC in commit description.
---
 arch/powerpc/kernel/cputable.c | 2 ++
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index d8e42ef750f1..2829ea537277 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2018,6 +2018,8 @@ struct cpu_spec * __init identify_cpu(unsigned long 
offset, unsigned int pvr)
struct cpu_spec *s = cpu_specs;
int i;
 
+   BUILD_BUG_ON(!ARRAY_SIZE(cpu_specs));
+
s = PTRRELOC(s);
 
for (i = 0; i < ARRAY_SIZE(cpu_specs); i++,s++) {
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 5185d942b455..19fd95a06352 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -108,6 +108,8 @@ config PPC_BOOK3S_64
 config PPC_BOOK3E_64
bool "Embedded processors"
select PPC_FSL_BOOK3E
+   select E500
+   select PPC_E500MC
select PPC_FPU # Make it a choice ?
select PPC_SMP_MUXED_IPI
select PPC_DOORBELL
-- 
2.37.1



[PATCH v2 15/19] powerpc: Remove CONFIG_PPC_BOOK3E_MMU

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_BOOK3E_MMU is redundant with CONFIG_PPC_E500.

Remove it.

Also rename mmu-book3e.h to mmu-e500.h

Signed-off-by: Christophe Leroy 
---
 .../include/asm/nohash/{mmu-book3e.h => mmu-e500.h}   | 0
 arch/powerpc/include/asm/nohash/mmu.h | 4 ++--
 arch/powerpc/kernel/cpu_setup_e500.S  | 2 +-
 arch/powerpc/kernel/entry_32.S| 2 +-
 arch/powerpc/kernel/head_booke.h  | 4 ++--
 arch/powerpc/kernel/kvm.c | 8 
 arch/powerpc/kvm/e500.h   | 2 +-
 arch/powerpc/mm/nohash/tlb.c  | 4 ++--
 arch/powerpc/mm/ptdump/Makefile   | 2 +-
 arch/powerpc/platforms/Kconfig.cputype| 4 
 10 files changed, 14 insertions(+), 18 deletions(-)
 rename arch/powerpc/include/asm/nohash/{mmu-book3e.h => mmu-e500.h} (100%)

diff --git a/arch/powerpc/include/asm/nohash/mmu-book3e.h 
b/arch/powerpc/include/asm/nohash/mmu-e500.h
similarity index 100%
rename from arch/powerpc/include/asm/nohash/mmu-book3e.h
rename to arch/powerpc/include/asm/nohash/mmu-e500.h
diff --git a/arch/powerpc/include/asm/nohash/mmu.h 
b/arch/powerpc/include/asm/nohash/mmu.h
index edc793e5f08f..e264be219fdb 100644
--- a/arch/powerpc/include/asm/nohash/mmu.h
+++ b/arch/powerpc/include/asm/nohash/mmu.h
@@ -8,9 +8,9 @@
 #elif defined(CONFIG_44x)
 /* 44x-style software loaded TLB */
 #include 
-#elif defined(CONFIG_PPC_BOOK3E_MMU)
+#elif defined(CONFIG_PPC_E500)
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#include 
+#include 
 #elif defined (CONFIG_PPC_8xx)
 /* Motorola/Freescale 8xx software loaded TLB */
 #include 
diff --git a/arch/powerpc/kernel/cpu_setup_e500.S 
b/arch/powerpc/kernel/cpu_setup_e500.S
index 058336079069..2ab25161b0ad 100644
--- a/arch/powerpc/kernel/cpu_setup_e500.S
+++ b/arch/powerpc/kernel/cpu_setup_e500.S
@@ -12,7 +12,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e6d5fe3a8585..2b5b0677d36c 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -488,7 +488,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
mtspr   SPRN_##exc_lvl_srr0,r9; \
mtspr   SPRN_##exc_lvl_srr1,r10;
 
-#if defined(CONFIG_PPC_BOOK3E_MMU)
+#if defined(CONFIG_PPC_E500)
 #ifdef CONFIG_PHYS_64BIT
 #defineRESTORE_MAS7
\
lwz r11,MAS7(r1);   \
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 1047dc053b47..1cb9d0f7cbf2 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -242,7 +242,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 
 
 .macro SAVE_MMU_REGS
-#ifdef CONFIG_PPC_BOOK3E_MMU
+#ifdef CONFIG_PPC_E500
mfspr   r0,SPRN_MAS0
stw r0,MAS0(r1)
mfspr   r0,SPRN_MAS1
@@ -257,7 +257,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
mfspr   r0,SPRN_MAS7
stw r0,MAS7(r1)
 #endif /* CONFIG_PHYS_64BIT */
-#endif /* CONFIG_PPC_BOOK3E_MMU */
+#endif /* CONFIG_PPC_E500 */
 #ifdef CONFIG_44x
mfspr   r0,SPRN_MMUCR
stw r0,MMUCR(r1)
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 6568823cf306..5b3c093611ba 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -455,7 +455,7 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt);
break;
 
-#ifdef CONFIG_PPC_BOOK3E_MMU
+#ifdef CONFIG_PPC_E500
case KVM_INST_MFSPR(SPRN_MAS0):
if (features & KVM_MAGIC_FEAT_MAS0_TO_SPRG7)
kvm_patch_ins_lwz(inst, magic_var(mas0), inst_rt);
@@ -484,7 +484,7 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
if (features & KVM_MAGIC_FEAT_MAS0_TO_SPRG7)
kvm_patch_ins_lwz(inst, magic_var(mas7_3), inst_rt);
break;
-#endif /* CONFIG_PPC_BOOK3E_MMU */
+#endif /* CONFIG_PPC_E500 */
 
case KVM_INST_MFSPR(SPRN_SPRG4):
 #ifdef CONFIG_BOOKE
@@ -557,7 +557,7 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
case KVM_INST_MTSPR(SPRN_DSISR):
kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt);
break;
-#ifdef CONFIG_PPC_BOOK3E_MMU
+#ifdef CONFIG_PPC_E500
case KVM_INST_MTSPR(SPRN_MAS0):
if (features & KVM_MAGIC_FEAT_MAS0_TO_SPRG7)
kvm_patch_ins_stw(inst, magic_var(mas0), inst_rt);
@@ -586,7 +586,7 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
if (features & KVM_MAGIC_FEAT_MAS0_TO_SPRG7)
kvm_patch_ins_stw(inst, magic_var(mas7_3), inst_rt);

[PATCH v2 14/19] powerpc: Remove CONFIG_PPC_FSL_BOOK3E

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Remove it.

And rename five files accordingly.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/barrier.h   |  2 +-
 arch/powerpc/include/asm/hugetlb.h   |  4 ++--
 arch/powerpc/include/asm/kvm_host.h  |  2 +-
 arch/powerpc/include/asm/mmu.h   |  2 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  2 +-
 .../nohash/{hugetlb-book3e.h => hugetlb-e500.h}  |  2 +-
 arch/powerpc/include/asm/nohash/pgtable.h|  2 +-
 .../asm/nohash/{pte-book3e.h => pte-e500.h}  |  0
 arch/powerpc/include/asm/page.h  |  2 +-
 arch/powerpc/include/asm/ppc_asm.h   |  6 +++---
 arch/powerpc/include/asm/setup.h |  2 +-
 arch/powerpc/kernel/Makefile |  2 +-
 arch/powerpc/kernel/asm-offsets.c|  4 ++--
 .../{cpu_setup_fsl_booke.S => cpu_setup_e500.S}  |  0
 arch/powerpc/kernel/head_booke.h |  2 +-
 arch/powerpc/kernel/interrupt_64.S   |  2 +-
 arch/powerpc/kernel/security.c   | 10 +-
 arch/powerpc/kernel/smp.c|  2 +-
 arch/powerpc/kernel/sysfs.c  |  6 +++---
 arch/powerpc/kernel/vmlinux.lds.S|  2 +-
 arch/powerpc/lib/feature-fixups.c|  4 ++--
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/mm/mem.c|  2 +-
 arch/powerpc/mm/mmu_decl.h   |  4 ++--
 arch/powerpc/mm/nohash/Makefile  |  6 +++---
 arch/powerpc/mm/nohash/{fsl_book3e.c => e500.c}  |  0
 .../{book3e_hugetlbpage.c => e500_hugetlbpage.c} |  0
 arch/powerpc/mm/nohash/tlb.c | 16 
 arch/powerpc/mm/nohash/tlb_low.S |  2 +-
 arch/powerpc/platforms/Kconfig.cputype   | 16 
 32 files changed, 52 insertions(+), 60 deletions(-)
 rename arch/powerpc/include/asm/nohash/{hugetlb-book3e.h => hugetlb-e500.h} 
(95%)
 rename arch/powerpc/include/asm/nohash/{pte-book3e.h => pte-e500.h} (100%)
 rename arch/powerpc/kernel/{cpu_setup_fsl_booke.S => cpu_setup_e500.S} (100%)
 rename arch/powerpc/mm/nohash/{fsl_book3e.c => e500.c} (100%)
 rename arch/powerpc/mm/nohash/{book3e_hugetlbpage.c => e500_hugetlbpage.c} 
(100%)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 94a614bb1581..38f36eb4d96c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -290,7 +290,7 @@ config PPC_LONG_DOUBLE_128
 config PPC_BARRIER_NOSPEC
bool
default y
-   depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+   depends on PPC_BOOK3S_64 || PPC_E500
 
 config EARLY_PRINTK
bool
diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index ef2d8b15eaab..e80b2c0e9315 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -86,7 +86,7 @@ do {  
\
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #define NOSPEC_BARRIER_SLOT   nop
-#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#elif defined(CONFIG_PPC_E500)
 #define NOSPEC_BARRIER_SLOT   nop; nop
 #endif
 
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 32ce0fb7548f..ea71f7245a63 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -7,8 +7,8 @@
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
-#elif defined(CONFIG_PPC_FSL_BOOK3E)
-#include 
+#elif defined(CONFIG_PPC_E500)
+#include 
 #elif defined(CONFIG_PPC_8xx)
 #include 
 #endif /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index c2b003550dc9..caea15dcb91d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -443,7 +443,7 @@ struct kvmppc_passthru_irqmap {
 };
 #endif
 
-# ifdef CONFIG_PPC_FSL_BOOK3E
+# ifdef CONFIG_PPC_E500
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
 # else
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 5b46da9ba7f6..39057320e436 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -141,7 +141,7 @@
 
 typedef pte_t *pgtable_t;
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_E500
 #include 
 DECLARE_PER_CPU(int, next_tlbcam_idx);
 #endif
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 197e7552d9f6..0d40b33184eb 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -131,7 +131,7 @@ void unmap_kernel_page(unsigned long va);
 #elif defined(CONFIG_44x)
 #include 
 #elif defined(CONFIG_PPC_85xx) && defined(CONFIG_PTE_64BIT)
-#include 
+#include 
 #elif defin

[PATCH v2 03/19] powerpc/64e: Remove unnecessary #ifdef CONFIG_PPC_FSL_BOOK3E

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_BOOK3E_64 implies CONFIG_PPC_FSL_BOOK3E, so there is no need
for additional #ifdefs in files built exclusively for CONFIG_PPC_BOOK3E_64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/exceptions-64e.S | 8 
 arch/powerpc/mm/nohash/tlb_low_64e.S | 6 --
 2 files changed, 14 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 67dc4e3179a0..3afba070a5d8 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -291,7 +291,6 @@ ret_from_mc_except:
 #define SPRN_MC_SRR0   SPRN_MCSRR0
 #define SPRN_MC_SRR1   SPRN_MCSRR1
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
 #define GEN_BTB_FLUSH  \
START_BTB_FLUSH_SECTION \
beq 1f; \
@@ -307,13 +306,6 @@ ret_from_mc_except:
 #define DBG_BTB_FLUSH CRIT_BTB_FLUSH
 #define MC_BTB_FLUSH CRIT_BTB_FLUSH
 #define GDBELL_BTB_FLUSH GEN_BTB_FLUSH
-#else
-#define GEN_BTB_FLUSH
-#define CRIT_BTB_FLUSH
-#define DBG_BTB_FLUSH
-#define MC_BTB_FLUSH
-#define GDBELL_BTB_FLUSH
-#endif
 
 #define NORMAL_EXCEPTION_PROLOG(n, intnum, addition)   \
EXCEPTION_PROLOG(n, intnum, GEN, addition##_GEN(n))
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S 
b/arch/powerpc/mm/nohash/tlb_low_64e.S
index 68ffbfdba894..be26f33a6ac0 100644
--- a/arch/powerpc/mm/nohash/tlb_low_64e.S
+++ b/arch/powerpc/mm/nohash/tlb_low_64e.S
@@ -61,7 +61,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
ld  r14,PACAPGD(r13)
std r15,EX_TLB_R15(r12)
std r10,EX_TLB_CR(r12)
-#ifdef CONFIG_PPC_FSL_BOOK3E
 START_BTB_FLUSH_SECTION
mfspr r11, SPRN_SRR1
andi. r10,r11,MSR_PR
@@ -70,14 +69,11 @@ START_BTB_FLUSH_SECTION
 1:
 END_BTB_FLUSH_SECTION
std r7,EX_TLB_R7(r12)
-#endif
 .endm
 
 .macro tlb_epilog_bolted
ld  r14,EX_TLB_CR(r12)
-#ifdef CONFIG_PPC_FSL_BOOK3E
ld  r7,EX_TLB_R7(r12)
-#endif
ld  r10,EX_TLB_R10(r12)
ld  r11,EX_TLB_R11(r12)
ld  r13,EX_TLB_R13(r12)
@@ -248,7 +244,6 @@ itlb_miss_fault_bolted:
beq tlb_miss_user_bolted
b   itlb_miss_kernel_bolted
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
 /*
  * TLB miss handling for e6500 and derivatives, using hardware tablewalk.
  *
@@ -515,7 +510,6 @@ dtlb_miss_fault_e6500:
 itlb_miss_fault_e6500:
tlb_epilog_bolted
b   exc_instruction_storage_book3e
-#endif /* CONFIG_PPC_FSL_BOOK3E */
 
 /**
  **
-- 
2.37.1



[PATCH v2 08/19] powerpc/cputable: Split cpu_specs[] for mpc85xx and e500mc

2022-09-19 Thread Christophe Leroy
e500v1/v2 and e500mc are said to be mutually exclusive in Kconfig.

Split the e500 cpu_specs[] and then restrict the non-e500mc entries to
PPC32, which is then 85xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cpu_specs.h   |  6 +-
 arch/powerpc/kernel/cpu_specs_85xx.h  | 60 +++
 .../{cpu_specs_e500.h => cpu_specs_e500mc.h}  | 57 --
 3 files changed, 64 insertions(+), 59 deletions(-)
 create mode 100644 arch/powerpc/kernel/cpu_specs_85xx.h
 rename arch/powerpc/kernel/{cpu_specs_e500.h => cpu_specs_e500mc.h} (58%)

diff --git a/arch/powerpc/kernel/cpu_specs.h b/arch/powerpc/kernel/cpu_specs.h
index 3de0b70d7203..2f5168c09be1 100644
--- a/arch/powerpc/kernel/cpu_specs.h
+++ b/arch/powerpc/kernel/cpu_specs.h
@@ -14,8 +14,10 @@
 #include "cpu_specs_8xx.h"
 #endif
 
-#ifdef CONFIG_E500
-#include "cpu_specs_e500.h"
+#ifdef CONFIG_PPC_E500MC
+#include "cpu_specs_e500mc.h"
+#elif defined(CONFIG_PPC_85xx)
+#include "cpu_specs_85xx.h"
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_32
diff --git a/arch/powerpc/kernel/cpu_specs_85xx.h 
b/arch/powerpc/kernel/cpu_specs_85xx.h
new file mode 100644
index ..f5534311cfc0
--- /dev/null
+++ b/arch/powerpc/kernel/cpu_specs_85xx.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ *  Copyright (C) 2001 Ben. Herrenschmidt (b...@kernel.crashing.org)
+ */
+
+#define COMMON_USER_BOOKE  (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \
+PPC_FEATURE_BOOKE)
+
+static struct cpu_spec __initdata cpu_specs[] = {
+   {   /* e500 */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x8020,
+   .cpu_name   = "e500",
+   .cpu_features   = CPU_FTRS_E500,
+   .cpu_user_features  = COMMON_USER_BOOKE |
+   PPC_FEATURE_HAS_SPE_COMP |
+   PPC_FEATURE_HAS_EFP_SINGLE_COMP,
+   .cpu_user_features2 = PPC_FEATURE2_ISEL,
+   .mmu_features   = MMU_FTR_TYPE_FSL_E,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .num_pmcs   = 4,
+   .cpu_setup  = __setup_cpu_e500v1,
+   .machine_check  = machine_check_e500,
+   .platform   = "ppc8540",
+   },
+   {   /* e500v2 */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x8021,
+   .cpu_name   = "e500v2",
+   .cpu_features   = CPU_FTRS_E500_2,
+   .cpu_user_features  = COMMON_USER_BOOKE |
+   PPC_FEATURE_HAS_SPE_COMP |
+   PPC_FEATURE_HAS_EFP_SINGLE_COMP |
+   PPC_FEATURE_HAS_EFP_DOUBLE_COMP,
+   .cpu_user_features2 = PPC_FEATURE2_ISEL,
+   .mmu_features   = MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .num_pmcs   = 4,
+   .cpu_setup  = __setup_cpu_e500v2,
+   .machine_check  = machine_check_e500,
+   .platform   = "ppc8548",
+   .cpu_down_flush = cpu_down_flush_e500v2,
+   },
+   {   /* default match */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x,
+   .cpu_name   = "(generic E500 PPC)",
+   .cpu_features   = CPU_FTRS_E500,
+   .cpu_user_features  = COMMON_USER_BOOKE |
+   PPC_FEATURE_HAS_SPE_COMP |
+   PPC_FEATURE_HAS_EFP_SINGLE_COMP,
+   .mmu_features   = MMU_FTR_TYPE_FSL_E,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .machine_check  = machine_check_e500,
+   .platform   = "powerpc",
+   }
+};
diff --git a/arch/powerpc/kernel/cpu_specs_e500.h 
b/arch/powerpc/kernel/cpu_specs_e500mc.h
similarity index 58%
rename from arch/powerpc/kernel/cpu_specs_e500.h
rename to arch/powerpc/kernel/cpu_specs_e500mc.h
index 92d165741efc..2f6586f04cef 100644
--- a/arch/powerpc/kernel/cpu_specs_e500.h
+++ b/arch/powerpc/kernel/cpu_specs_e500mc.h
@@ -16,44 +16,6 @@
 
 static struct cpu_spec __initdata cpu_specs[] = {
 #ifdef CONFIG_PPC32
-#ifndef CONFIG_PPC_E500MC
-   {   /* e500 */
-   .pvr_mask   = 0x,
-   .pvr_value  = 0x8020,
-   .cpu_name   = "e500",
-   .cpu_features   = CPU_FTRS_E500,
-   .cpu_user_features  = COMMON_USER_BOOKE |
-   PPC_FEATURE_HAS_SPE_COMP |
-   PPC_FEATURE_HAS_

[PATCH v2 04/19] powerpc/cputable: Remove __machine_check_early_realmode_p{7/8/9} prototypes

2022-09-19 Thread Christophe Leroy
__machine_check_early_realmode_p{7/8/9} are already declared in mce.h,
which is included. Remove the duplicate prototypes from cputable.c.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cputable.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2829ea537277..5ace97cccad8 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -64,9 +64,6 @@ extern void __setup_cpu_ppc970MP(unsigned long offset, struct 
cpu_spec* spec);
 extern void __setup_cpu_pa6t(unsigned long offset, struct cpu_spec* spec);
 extern void __restore_cpu_pa6t(void);
 extern void __restore_cpu_ppc970(void);
-extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
-extern long __machine_check_early_realmode_p8(struct pt_regs *regs);
-extern long __machine_check_early_realmode_p9(struct pt_regs *regs);
 #endif /* CONFIG_PPC64 */
 #if defined(CONFIG_E500)
 extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
-- 
2.37.1



[PATCH v2 17/19] powerpc: Simplify redundant Kconfig tests

2022-09-19 Thread Christophe Leroy
PPC_85xx implies PPC32 so no need to check PPC32 in addition.

PPC64 && !PPC_BOOK3E_64 means PPC_BOOK3S_64.

PPC_BOOK3E_64 implies PPC_E500.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 arch/powerpc/xmon/xmon.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1e6346dcb1b4..da9bd5db1643 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -606,7 +606,7 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
-   depends on (PPC_85xx && FLATMEM && PPC32)
+   depends on PPC_85xx && FLATMEM
depends on RELOCATABLE
help
  Randomizes the virtual address at which the kernel image is
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 6a216e88423b..51059af63856 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -314,7 +314,7 @@ config 4xx
 
 config BOOKE
bool
-   depends on PPC_E500 || 44x || PPC_BOOK3E_64
+   depends on PPC_E500 || 44x
default y
 
 config BOOKE_OR_40x
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e6d678d27b0f..f51c882bf902 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -288,7 +288,7 @@ Commands:\n\
   tprint backtrace\n\
   xexit monitor and recover\n\
   Xexit monitor and don't recover\n"
-#if defined(CONFIG_PPC64) && !defined(CONFIG_PPC_BOOK3E_64)
+#if defined(CONFIG_PPC_BOOK3S_64)
 "  u   dump segment table or SLB\n"
 #elif defined(CONFIG_PPC_BOOK3S_32)
 "  u   dump segment registers\n"
-- 
2.37.1
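The identity behind the xmon hunk above, PPC64 && !PPC_BOOK3E_64 == PPC_BOOK3S_64, can be brute-forced. The sketch below is a simplified model that assumes, per Kconfig.cputype, that Book3S-64 and Book3E-64 are the only 64-bit families and are mutually exclusive:

```python
from itertools import product

def book3s64_identity_holds():
    """Check PPC64 && !PPC_BOOK3E_64 <=> PPC_BOOK3S_64 over every
    consistent assignment of the two 64-bit family symbols."""
    for book3s_64, book3e_64 in product([False, True], repeat=2):
        if book3s_64 and book3e_64:
            continue  # the two 64-bit families are mutually exclusive
        ppc64 = book3s_64 or book3e_64  # the only ways to be 64-bit
        if (ppc64 and not book3e_64) != book3s_64:
            return False
    return True
```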



[PATCH v2 16/19] powerpc: Replace PPC_85xx || PPC_BOOK3E_64 by PPC_E500

2022-09-19 Thread Christophe Leroy
PPC_E500 is the same as PPC_85xx || PPC_BOOK3E_64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/platforms/85xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 38f36eb4d96c..1e6346dcb1b4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -548,7 +548,7 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 
 config KEXEC
bool "kexec system call"
-   depends on (PPC_BOOK3S || PPC_85xx || (44x && !SMP)) || PPC_BOOK3E_64
+   depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
select KEXEC_CORE
help
  kexec is a system call that implements the ability to shutdown your
diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index 63fec86e41b4..b92cb2b4d54d 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 menuconfig FSL_SOC_BOOKE
bool "Freescale Book-E Machine Type"
-   depends on PPC_85xx || PPC_BOOK3E_64
+   depends on PPC_E500
select FSL_SOC
select PPC_UDBG_16550
select MPIC
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 1746d19d058f..6a216e88423b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -487,7 +487,7 @@ config FORCE_SMP
select SMP
 
 config SMP
-   depends on PPC_BOOK3S || PPC_BOOK3E_64 || PPC_85xx || PPC_47x
+   depends on PPC_BOOK3S || PPC_E500 || PPC_47x
select GENERIC_IRQ_MIGRATION
bool "Symmetric multi-processing support" if !FORCE_SMP
help
-- 
2.37.1



[PATCH v2 07/19] powerpc: Remove CONFIG_FSL_BOOKE

2022-09-19 Thread Christophe Leroy
PPC_85xx is PPC32 only.
PPC_85xx always selects E500 and is the only PPC32 that
selects E500.
FSL_BOOKE is selected when E500 and PPC32 are selected.

So FSL_BOOKE is redundant with PPC_85xx.

Remove FSL_BOOKE.

And rename four files accordingly.

cpu_setup_fsl_booke.S is not renamed because it is linked to
PPC_FSL_BOOK3E and not to FSL_BOOKE as suggested by its name.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  | 28 +--
 arch/powerpc/Makefile |  2 +-
 arch/powerpc/include/asm/kexec.h  |  2 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h  |  6 ++--
 .../nohash/32/{pte-fsl-booke.h => pte-85xx.h} |  6 ++--
 arch/powerpc/include/asm/nohash/tlbflush.h|  2 +-
 ...e_entry_mapping.S => 85xx_entry_mapping.S} |  0
 arch/powerpc/kernel/Makefile  |  6 ++--
 .../kernel/{head_fsl_booke.S => head_85xx.S}  |  4 +--
 arch/powerpc/kernel/kgdb.c| 12 
 .../kernel/{swsusp_booke.S => swsusp_85xx.S}  |  0
 arch/powerpc/kernel/traps.c   |  4 +--
 arch/powerpc/kexec/core_32.c  |  2 +-
 arch/powerpc/kexec/relocate_32.S  |  4 +--
 arch/powerpc/kvm/booke_interrupts.S   |  4 +--
 arch/powerpc/mm/init_32.c |  4 +--
 arch/powerpc/mm/mmu_decl.h|  4 +--
 arch/powerpc/mm/nohash/fsl_book3e.c   |  2 +-
 arch/powerpc/mm/nohash/tlb.c  |  2 +-
 arch/powerpc/mm/nohash/tlb_low.S  |  2 +-
 arch/powerpc/platforms/Kconfig.cputype| 11 ++--
 21 files changed, 51 insertions(+), 56 deletions(-)
 rename arch/powerpc/include/asm/nohash/32/{pte-fsl-booke.h => pte-85xx.h} (94%)
 rename arch/powerpc/kernel/{fsl_booke_entry_mapping.S => 85xx_entry_mapping.S} 
(100%)
 rename arch/powerpc/kernel/{head_fsl_booke.S => head_85xx.S} (99%)
 rename arch/powerpc/kernel/{swsusp_booke.S => swsusp_85xx.S} (100%)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cbe7bb029aec..7fe522b0946b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,7 @@ config PPC
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC_BOOK3S || PPC_8xx || 
40x) && !HIBERNATION
-   select ARCH_HAS_STRICT_KERNEL_RWX   if FSL_BOOKE && !HIBERNATION && 
!RANDOMIZE_BASE
+   select ARCH_HAS_STRICT_KERNEL_RWX   if PPC_85xx && !HIBERNATION && 
!RANDOMIZE_BASE
select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
@@ -548,7 +548,7 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 
 config KEXEC
bool "kexec system call"
-   depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E
+   depends on (PPC_BOOK3S || PPC_85xx || (44x && !SMP)) || PPC_BOOK3E
select KEXEC_CORE
help
  kexec is a system call that implements the ability to shutdown your
@@ -583,7 +583,7 @@ config ARCH_HAS_KEXEC_PURGATORY
 
 config RELOCATABLE
bool "Build a relocatable kernel"
-   depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
+   depends on PPC64 || (FLATMEM && (44x || PPC_85xx))
select NONSTATIC_KERNEL
help
  This builds a kernel image that is capable of running at the
@@ -606,7 +606,7 @@ config RELOCATABLE
 
 config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
-   depends on (FSL_BOOKE && FLATMEM && PPC32)
+   depends on (PPC_85xx && FLATMEM && PPC32)
depends on RELOCATABLE
help
  Randomizes the virtual address at which the kernel image is
@@ -625,8 +625,8 @@ config RELOCATABLE_TEST
 
 config CRASH_DUMP
bool "Build a dump capture kernel"
-   depends on PPC64 || PPC_BOOK3S_32 || FSL_BOOKE || (44x && !SMP)
-   select RELOCATABLE if PPC64 || 44x || FSL_BOOKE
+   depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
+   select RELOCATABLE if PPC64 || 44x || PPC_85xx
help
  Build a kernel suitable for use as a dump capture kernel.
  The same kernel binary can be used as production kernel and dump
@@ -815,7 +815,7 @@ config DATA_SHIFT_BOOL
depends on ADVANCED_OPTIONS
depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE
depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && 
!STRICT_KERNEL_RWX) || \
-  FSL_BOOKE
+  PPC_85xx
help
  This option allows you to set the kernel data alignment. When
  RAM is mapped by blocks, the alignment needs to fit the size and
@@ -828,13 +828,13 @@ config DATA_SHIFT
default 24 if STRICT_KERNEL_RWX && PPC64
range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && 
PPC_BOOK3S_32
range 19 23 if
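The redundancy argument in this commit message chains three Kconfig facts; it can be checked exhaustively with a small model (an illustrative sketch, assuming the facts exactly as stated above):

```python
from itertools import product

def fsl_booke_equals_85xx():
    """Enumerate all assignments consistent with the stated facts:
    PPC_85xx is PPC32-only and selects E500, it is the only PPC32
    platform selecting E500, and FSL_BOOKE is E500 && PPC32."""
    for ppc32, ppc_85xx, e500 in product([False, True], repeat=3):
        if ppc_85xx != (ppc32 and e500):
            continue  # inconsistent with the commit message's facts
        fsl_booke = e500 and ppc32
        if fsl_booke != ppc_85xx:
            return False  # would mean FSL_BOOKE is not redundant
    return True
```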

[PATCH v2 06/19] powerpc/cputable: Split cpu_specs[] out of cputable.h

2022-09-19 Thread Christophe Leroy
cpu_specs[] is full of #ifdefs depending on the different
types of CPU.

CPU families are mutually exclusive, so it is possible to split
cpu_specs[] into smaller, more readable pieces.

Create cpu_specs_XXX.h files, each dedicated to one of the
following mutually exclusive families:
- 40x
- 44x
- 47x
- 8xx
- e500
- book3s/32
- book3s/64

In book3s/32, the block for 603 has been moved in front in order
to not have two 604 blocks.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cpu_specs.h   |   27 +
 arch/powerpc/kernel/cpu_specs_40x.h   |  280 +++
 arch/powerpc/kernel/cpu_specs_44x.h   |  304 
 arch/powerpc/kernel/cpu_specs_47x.h   |   78 +
 arch/powerpc/kernel/cpu_specs_8xx.h   |   21 +
 arch/powerpc/kernel/cpu_specs_book3s_32.h |  607 +++
 arch/powerpc/kernel/cpu_specs_book3s_64.h |  488 ++
 arch/powerpc/kernel/cpu_specs_e500.h  |  135 ++
 arch/powerpc/kernel/cputable.c| 1877 +
 9 files changed, 1941 insertions(+), 1876 deletions(-)
 create mode 100644 arch/powerpc/kernel/cpu_specs.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_40x.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_44x.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_47x.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_8xx.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_book3s_32.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_book3s_64.h
 create mode 100644 arch/powerpc/kernel/cpu_specs_e500.h

diff --git a/arch/powerpc/kernel/cpu_specs.h b/arch/powerpc/kernel/cpu_specs.h
new file mode 100644
index ..3de0b70d7203
--- /dev/null
+++ b/arch/powerpc/kernel/cpu_specs.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifdef CONFIG_40x
+#include "cpu_specs_40x.h"
+#endif
+
+#ifdef CONFIG_47x
+#include "cpu_specs_47x.h"
+#elif defined(CONFIG_44x)
+#include "cpu_specs_44x.h"
+#endif
+
+#ifdef CONFIG_PPC_8xx
+#include "cpu_specs_8xx.h"
+#endif
+
+#ifdef CONFIG_E500
+#include "cpu_specs_e500.h"
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_32
+#include "cpu_specs_book3s_32.h"
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+#include "cpu_specs_book3s_64.h"
+#endif
diff --git a/arch/powerpc/kernel/cpu_specs_40x.h 
b/arch/powerpc/kernel/cpu_specs_40x.h
new file mode 100644
index ..3dfe6a4df49d
--- /dev/null
+++ b/arch/powerpc/kernel/cpu_specs_40x.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ *  Copyright (C) 2001 Ben. Herrenschmidt (b...@kernel.crashing.org)
+ */
+
+static struct cpu_spec __initdata cpu_specs[] = {
+   {   /* STB 04xxx */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x4181,
+   .cpu_name   = "STB04xxx",
+   .cpu_features   = CPU_FTRS_40X,
+   .cpu_user_features  = PPC_FEATURE_32 |
+   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+   .mmu_features   = MMU_FTR_TYPE_40x,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .machine_check  = machine_check_4xx,
+   .platform   = "ppc405",
+   },
+   {   /* NP405L */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x4161,
+   .cpu_name   = "NP405L",
+   .cpu_features   = CPU_FTRS_40X,
+   .cpu_user_features  = PPC_FEATURE_32 |
+   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+   .mmu_features   = MMU_FTR_TYPE_40x,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .machine_check  = machine_check_4xx,
+   .platform   = "ppc405",
+   },
+   {   /* NP4GS3 */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x40B1,
+   .cpu_name   = "NP4GS3",
+   .cpu_features   = CPU_FTRS_40X,
+   .cpu_user_features  = PPC_FEATURE_32 |
+   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+   .mmu_features   = MMU_FTR_TYPE_40x,
+   .icache_bsize   = 32,
+   .dcache_bsize   = 32,
+   .machine_check  = machine_check_4xx,
+   .platform   = "ppc405",
+   },
+   {   /* NP405H */
+   .pvr_mask   = 0x,
+   .pvr_value  = 0x4141,
+   .cpu_name   = "NP405H",
+   .cpu_features   = CPU_FTRS_40X,
+   .cpu_user_features  = PPC_FEATURE_32 |
+   PPC_FEATURE_HAS_MMU | PPC_FEATURE_HAS_4xxMAC,
+   .mmu_features   = MMU_FTR_TYPE_40x,
+   .icache_bsize

[PATCH v2 10/19] powerpc: Remove redundant selection of E500 and E500MC

2022-09-19 Thread Christophe Leroy
PPC_85xx and PPC_BOOK3E_64 already select E500, so there is no need
for PPC_QEMU_E500 and CORENET_GENERIC to select it again, as they
depend on PPC_85xx || PPC_BOOK3E_64.

PPC_BOOK3E_64 already selects E500MC, so there is no need for
PPC_QEMU_E500 to select it again when PPC64: PPC_BOOK3E_64 is the
only way into PPC_QEMU_E500 with PPC64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/85xx/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index 069628670a0c..63fec86e41b4 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -241,8 +241,6 @@ endif # PPC32
 config PPC_QEMU_E500
bool "QEMU generic e500 platform"
select DEFAULT_UIMAGE
-   select E500
-   select PPC_E500MC if PPC64
help
  This option enables support for running as a QEMU guest using
  QEMU's generic e500 machine.  This is not required if you're
@@ -258,7 +256,6 @@ config PPC_QEMU_E500
 config CORENET_GENERIC
bool "Freescale CoreNet Generic"
select DEFAULT_UIMAGE
-   select E500
select PPC_E500MC
select PHYS_64BIT
select SWIOTLB
-- 
2.37.1



[PATCH v2 12/19] Documentation: Rename PPC_FSL_BOOK3E to PPC_E500

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Rename it so that CONFIG_PPC_FSL_BOOK3E can be removed later.

Signed-off-by: Christophe Leroy 
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7172a91539f2..07a134d8388b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3626,7 +3626,7 @@
(bounds check bypass). With this option data leaks are
possible in the system.
 
-   nospectre_v2[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
+   nospectre_v2[X86,PPC_E500,ARM64] Disable all mitigations for
the Spectre variant 2 (indirect branch prediction)
vulnerability. System may allow data leaks with this
option.
-- 
2.37.1



[PATCH v2 18/19] powerpc: Cleanup idle for e500

2022-09-19 Thread Christophe Leroy
e500 idle setup is a bit messy.

e500_idle() is used for PPC32 while book3e_idle() is used for PPC64.
As they are mutually exclusive, call them all e500_idle().

Use CONFIG_PPC_85xx instead of PPC32 + E500 in the Makefile and rename
idle_e500.S to idle_85xx.S.

Rename idle_book3e.S to idle_64e.S and remove the #ifdef CONFIG_PPC64
in it, as it's only built for PPC64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/machdep.h| 1 -
 arch/powerpc/kernel/Makefile  | 6 ++
 arch/powerpc/kernel/{idle_book3e.S => idle_64e.S} | 8 ++--
 arch/powerpc/kernel/{idle_e500.S => idle_85xx.S}  | 0
 arch/powerpc/platforms/85xx/corenet_generic.c | 4 
 arch/powerpc/platforms/85xx/qemu_e500.c   | 4 
 6 files changed, 4 insertions(+), 19 deletions(-)
 rename arch/powerpc/kernel/{idle_book3e.S => idle_64e.S} (93%)
 rename arch/powerpc/kernel/{idle_e500.S => idle_85xx.S} (100%)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 8cb83600c434..378b8d5836a7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -204,7 +204,6 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void ppc6xx_idle(void);
-extern void book3e_idle(void);
 
 /*
  * ppc_md contains a copy of the machine description structure for the
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 658c4dffaa56..1f121c188805 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -81,7 +81,7 @@ obj-$(CONFIG_PPC_DAWR)+= dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
-obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_book3e.o
+obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_64e.o
 obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
 obj-$(CONFIG_PPC64)+= vdso64_wrapper.o
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
@@ -100,9 +100,7 @@ obj-$(CONFIG_GENERIC_TBSYNC)+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump.o
 obj-$(CONFIG_FA_DUMP)  += fadump.o
 obj-$(CONFIG_PRESERVE_FA_DUMP) += fadump.o
-ifdef CONFIG_PPC32
-obj-$(CONFIG_PPC_E500) += idle_e500.o
-endif
+obj-$(CONFIG_PPC_85xx) += idle_85xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o
 obj-$(CONFIG_TAU)  += tau_6xx.o
 obj-$(CONFIG_HIBERNATION)  += swsusp.o suspend.o
diff --git a/arch/powerpc/kernel/idle_book3e.S b/arch/powerpc/kernel/idle_64e.S
similarity index 93%
rename from arch/powerpc/kernel/idle_book3e.S
rename to arch/powerpc/kernel/idle_64e.S
index cc008de58b05..1736aad2afe9 100644
--- a/arch/powerpc/kernel/idle_book3e.S
+++ b/arch/powerpc/kernel/idle_64e.S
@@ -2,7 +2,7 @@
 /*
  * Copyright 2010 IBM Corp, Benjamin Herrenschmidt 
  *
- * Generic idle routine for Book3E processors
+ * Generic idle routine for 64 bits e500 processors
  */
 
 #include 
@@ -16,8 +16,6 @@
 #include 
 
 /* 64-bit version only for now */
-#ifdef CONFIG_PPC64
-
 .macro BOOK3E_IDLE name loop
 _GLOBAL(\name)
/* Save LR for later */
@@ -98,6 +96,4 @@ epapr_ev_idle_start:
 
 BOOK3E_IDLE epapr_ev_idle EPAPR_EV_IDLE_LOOP
 
-BOOK3E_IDLE book3e_idle BOOK3E_IDLE_LOOP
-
-#endif /* CONFIG_PPC64 */
+BOOK3E_IDLE e500_idle BOOK3E_IDLE_LOOP
diff --git a/arch/powerpc/kernel/idle_e500.S b/arch/powerpc/kernel/idle_85xx.S
similarity index 100%
rename from arch/powerpc/kernel/idle_e500.S
rename to arch/powerpc/kernel/idle_85xx.S
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index 28d6b36f1ccd..2c539de2d629 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -200,9 +200,5 @@ define_machine(corenet_generic) {
 #endif
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
-#ifdef CONFIG_PPC64
-   .power_save = book3e_idle,
-#else
.power_save = e500_idle,
-#endif
 };
diff --git a/arch/powerpc/platforms/85xx/qemu_e500.c 
b/arch/powerpc/platforms/85xx/qemu_e500.c
index 64109ad6736c..1639e222cc33 100644
--- a/arch/powerpc/platforms/85xx/qemu_e500.c
+++ b/arch/powerpc/platforms/85xx/qemu_e500.c
@@ -68,9 +68,5 @@ define_machine(qemu_e500) {
.get_irq= mpic_get_coreint_irq,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
-#ifdef CONFIG_PPC64
-   .power_save = book3e_idle,
-#else
.power_save = e500_idle,
-#endif
 };
-- 
2.37.1



[PATCH v2 11/19] powerpc: Change CONFIG_E500 to CONFIG_PPC_E500

2022-09-19 Thread Christophe Leroy
It will be used outside arch/powerpc, so make it clear it's a
powerpc configuration item.

And we already have CONFIG_PPC_E500MC, so that will make
it more consistent.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile |  2 +-
 arch/powerpc/include/asm/cputable.h   |  4 ++--
 arch/powerpc/include/asm/kgdb.h   |  2 +-
 arch/powerpc/include/asm/mmu.h|  4 ++--
 arch/powerpc/include/asm/reg_booke.h  |  6 +++---
 arch/powerpc/include/asm/synch.h  |  2 +-
 arch/powerpc/include/asm/vdso/timebase.h  |  2 +-
 arch/powerpc/kernel/Makefile  |  2 +-
 arch/powerpc/kernel/cpu_setup_fsl_booke.S |  4 ++--
 arch/powerpc/kernel/entry_32.S|  4 ++--
 arch/powerpc/kernel/head_85xx.S   |  4 ++--
 arch/powerpc/kernel/head_booke.h  |  2 +-
 arch/powerpc/kernel/setup_32.c|  2 +-
 arch/powerpc/kernel/traps.c   |  2 +-
 arch/powerpc/kvm/Kconfig  |  4 ++--
 arch/powerpc/platforms/Kconfig.cputype| 26 +++
 arch/powerpc/sysdev/fsl_pci.c |  2 +-
 arch/powerpc/sysdev/fsl_rio.c |  2 +-
 18 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f6d477c4aa64..cb01832385d0 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -210,7 +210,7 @@ KBUILD_CFLAGS   += $(call cc-option,-mno-string)
 cpu-as-$(CONFIG_40x)   += -Wa,-m405
 cpu-as-$(CONFIG_44x)   += -Wa,-m440
 cpu-as-$(CONFIG_ALTIVEC)   += $(call as-option,-Wa$(comma)-maltivec)
-cpu-as-$(CONFIG_E500)  += -Wa,-me500
+cpu-as-$(CONFIG_PPC_E500)  += -Wa,-me500
 
 # When using '-many -mpower4' gas will first try and find a matching power4
 # mnemonic and failing that it will allow any valid mnemonic that GAS knows
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 27875f0b7bc7..757dbded11dc 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -510,7 +510,7 @@ enum {
 #elif defined(CONFIG_44x)
CPU_FTRS_44X | CPU_FTRS_440x6 |
 #endif
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
CPU_FTRS_E500 | CPU_FTRS_E500_2 |
 #endif
 #ifdef CONFIG_PPC_E500MC
@@ -584,7 +584,7 @@ enum {
 #elif defined(CONFIG_44x)
CPU_FTRS_44X & CPU_FTRS_440x6 &
 #endif
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
CPU_FTRS_E500 & CPU_FTRS_E500_2 &
 #endif
 #ifdef CONFIG_PPC_E500MC
diff --git a/arch/powerpc/include/asm/kgdb.h b/arch/powerpc/include/asm/kgdb.h
index a9e098a3b881..715c18b75334 100644
--- a/arch/powerpc/include/asm/kgdb.h
+++ b/arch/powerpc/include/asm/kgdb.h
@@ -52,7 +52,7 @@ static inline void arch_kgdb_breakpoint(void)
 /* On non-E500 family PPC32 we determine the size by picking the last
  * register we need, but on E500 we skip sections so we list what we
  * need to store, and add it up. */
-#ifndef CONFIG_E500
+#ifndef CONFIG_PPC_E500
 #define MAXREG (PT_FPSCR+1)
 #else
 /* 32 GPRs (8 bytes), nip, msr, ccr, link, ctr, xer, acc (8 bytes), spefscr*/
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 860d0290ca4d..5b46da9ba7f6 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -162,7 +162,7 @@ enum {
 #elif defined(CONFIG_44x)
MMU_FTR_TYPE_44x |
 #endif
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS | MMU_FTR_USE_TLBILX |
 #endif
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -211,7 +211,7 @@ enum {
 #elif defined(CONFIG_44x)
 #define MMU_FTRS_ALWAYSMMU_FTR_TYPE_44x
 #endif
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
 #define MMU_FTRS_ALWAYSMMU_FTR_TYPE_FSL_E
 #endif
 
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 17b8dcd9a40d..af56980b6cdb 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -246,7 +246,7 @@
 #define PPC47x_MCSR_FPR0x0080 /* FPR parity error */
 #define PPC47x_MCSR_IPR0x0040 /* Imprecise Machine Check Exception 
*/
 
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
 /* All e500 */
 #define MCSR_MCP   0x8000UL /* Machine Check Input Pin */
 #define MCSR_ICPERR0x4000UL /* I-Cache Parity Error */
@@ -282,7 +282,7 @@
 #endif
 
 /* Bit definitions for the HID1 */
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
 /* e500v1/v2 */
 #define HID1_PLL_CFG_MASK 0xfc00   /* PLL_CFG input pins */
 #define HID1_RFXE  0x0002  /* Read fault exception enable */
@@ -545,7 +545,7 @@
 #define TCR_FIE0x0080  /* FIT Interrupt Enable */
 #define TCR_ARE0x0040  /* Auto Reload Enable */
 
-#ifdef CONFIG_E500
+#ifdef CONFIG_PPC_E500
 #define TCR_GET_WP(tcr)  ((((tcr) & 0xC0000000) >> 30) | \
			  (((tcr) & 0x1E0000) >> 15))

[PATCH v2 09/19] powerpc: Remove CONFIG_PPC_BOOK3E

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64.

The latter is more explicit about the fact that it's a 64-bit target.

Remove CONFIG_PPC_BOOK3E.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/include/asm/cputable.h   |  4 +--
 arch/powerpc/include/asm/interrupt.h  |  2 +-
 arch/powerpc/include/asm/nohash/pgalloc.h |  2 +-
 arch/powerpc/include/asm/paca.h   |  8 ++---
 arch/powerpc/include/asm/ppc_asm.h|  4 +--
 arch/powerpc/kernel/asm-offsets.c |  6 ++--
 arch/powerpc/kernel/entry_64.S|  6 ++--
 arch/powerpc/kernel/head_64.S | 40 +++
 arch/powerpc/kernel/misc_64.S |  6 ++--
 arch/powerpc/kernel/paca.c|  6 ++--
 arch/powerpc/kernel/setup.h   |  2 +-
 arch/powerpc/kernel/setup_64.c|  8 ++---
 arch/powerpc/kernel/vmlinux.lds.S |  2 +-
 arch/powerpc/kexec/core_64.c  |  2 +-
 arch/powerpc/mm/mmu_decl.h|  6 ++--
 arch/powerpc/mm/nohash/tlb_low.S  |  2 +-
 arch/powerpc/platforms/85xx/Kconfig   |  2 +-
 arch/powerpc/platforms/Kconfig.cputype| 10 ++
 arch/powerpc/xmon/xmon.c  | 16 -
 20 files changed, 66 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7fe522b0946b..94a614bb1581 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -548,7 +548,7 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 
 config KEXEC
bool "kexec system call"
-   depends on (PPC_BOOK3S || PPC_85xx || (44x && !SMP)) || PPC_BOOK3E
+   depends on (PPC_BOOK3S || PPC_85xx || (44x && !SMP)) || PPC_BOOK3E_64
select KEXEC_CORE
help
  kexec is a system call that implements the ability to shutdown your
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index ae8c3e13cfce..27875f0b7bc7 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -463,7 +463,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_COMPATIBLE(CPU_FTR_PPCAS_ARCH_V2)
 
 #ifdef CONFIG_PPC64
-#ifdef CONFIG_PPC_BOOK3E
+#ifdef CONFIG_PPC_BOOK3E_64
 #define CPU_FTRS_POSSIBLE  (CPU_FTRS_E6500 | CPU_FTRS_E5500)
 #else
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
@@ -521,7 +521,7 @@ enum {
 #endif /* __powerpc64__ */
 
 #ifdef CONFIG_PPC64
-#ifdef CONFIG_PPC_BOOK3E
+#ifdef CONFIG_PPC_BOOK3E_64
 #define CPU_FTRS_ALWAYS(CPU_FTRS_E6500 & CPU_FTRS_E5500)
 #else
 
diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 8069dbc4b8d1..84a1cdc3204c 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -281,7 +281,7 @@ static inline bool nmi_disables_ftrace(struct pt_regs *regs)
if (TRAP(regs) == INTERRUPT_PERFMON)
   return false;
}
-   if (IS_ENABLED(CONFIG_PPC_BOOK3E)) {
+   if (IS_ENABLED(CONFIG_PPC_BOOK3E_64)) {
if (TRAP(regs) == INTERRUPT_PERFMON)
return false;
}
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h 
b/arch/powerpc/include/asm/nohash/pgalloc.h
index 29c43665a753..4b62376318e1 100644
--- a/arch/powerpc/include/asm/nohash/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -15,7 +15,7 @@ static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
 {
 
 }
-#endif /* !CONFIG_PPC_BOOK3E */
+#endif /* !CONFIG_PPC_BOOK3E_64 */
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 3537b0500f4d..09f1790d0ae1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -18,7 +18,7 @@
 #include 
 #include 
 #include 
-#ifdef CONFIG_PPC_BOOK3E
+#ifdef CONFIG_PPC_BOOK3E_64
 #include 
 #else
 #include 
@@ -127,7 +127,7 @@ struct paca_struct {
 #endif
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-#ifdef CONFIG_PPC_BOOK3E
+#ifdef CONFIG_PPC_BOOK3E_64
u64 exgen[8] __aligned(0x40);
/* Keep pgd in the same cacheline as the start of extlb */
pgd_t *pgd __aligned(0x40); /* Current PGD */
@@ -151,7 +151,7 @@ struct paca_struct {
void *dbg_kstack;
 
struct tlb_core_data tcd;
-#endif /* CONFIG_PPC_BOOK3E */
+#endif /* CONFIG_PPC_BOOK3E_64 */
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
unsigned char mm_ctx_low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
@@ -168,7 +168,7 @@ struct paca_struct {
 #ifdef CONFIG_PPC64
u64 exit_save_r1;   /* Syscall/interrupt R1 save */
 #endif
-#ifdef CONFIG_PPC_BOOK3E
+#ifdef CONFIG_PPC_BOOK3E_64
u16 trap_save;  /* Used when bad stack is encountered */
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 83c02f5a7f2a..55149a0384db 100644
--- a/arch/powerpc/i

[PATCH v2 05/19] powerpc/cputable: Move __cpu_setup() prototypes out of cputable.h

2022-09-19 Thread Christophe Leroy
Move all prototypes out of cputable.h

For that rename cpu_setup_power.h to cpu_setup.h and move all
prototypes in it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cpu_setup.h   | 49 ++
 arch/powerpc/include/asm/cpu_setup_power.h | 12 --
 arch/powerpc/kernel/cpu_setup_power.c  |  2 +-
 arch/powerpc/kernel/cputable.c | 38 +
 4 files changed, 51 insertions(+), 50 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpu_setup.h
 delete mode 100644 arch/powerpc/include/asm/cpu_setup_power.h

diff --git a/arch/powerpc/include/asm/cpu_setup.h 
b/arch/powerpc/include/asm/cpu_setup.h
new file mode 100644
index ..d497b051598f
--- /dev/null
+++ b/arch/powerpc/include/asm/cpu_setup.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 IBM Corporation
+ */
+
+#ifndef _ASM_POWERPC_CPU_SETUP_H
+#define _ASM_POWERPC_CPU_SETUP_H
+void __setup_cpu_power7(unsigned long offset, struct cpu_spec *spec);
+void __setup_cpu_power8(unsigned long offset, struct cpu_spec *spec);
+void __setup_cpu_power9(unsigned long offset, struct cpu_spec *spec);
+void __setup_cpu_power10(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power7(void);
+void __restore_cpu_power8(void);
+void __restore_cpu_power9(void);
+void __restore_cpu_power10(void);
+
+void __setup_cpu_e500v1(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_e500v2(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_e500mc(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440ep(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440epx(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440gx(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440grx(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440spe(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_440x5(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_460ex(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_460gt(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_460sx(unsigned long offset, struct cpu_spec *spec);
+void __setup_cpu_apm821xx(unsigned long offset, struct cpu_spec *spec);
+void __setup_cpu_603(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_604(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_750(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_750cx(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_750fx(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_7400(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_7410(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_745x(unsigned long offset, struct cpu_spec* spec);
+
+void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_ppc970MP(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_pa6t(unsigned long offset, struct cpu_spec* spec);
+void __restore_cpu_pa6t(void);
+void __restore_cpu_ppc970(void);
+
+void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
+void __setup_cpu_e6500(unsigned long offset, struct cpu_spec* spec);
+void __restore_cpu_e5500(void);
+void __restore_cpu_e6500(void);
+#endif /* _ASM_POWERPC_CPU_SETUP_H */
diff --git a/arch/powerpc/include/asm/cpu_setup_power.h 
b/arch/powerpc/include/asm/cpu_setup_power.h
deleted file mode 100644
index 24be9131f803..
--- a/arch/powerpc/include/asm/cpu_setup_power.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Copyright (C) 2020 IBM Corporation
- */
-void __setup_cpu_power7(unsigned long offset, struct cpu_spec *spec);
-void __restore_cpu_power7(void);
-void __setup_cpu_power8(unsigned long offset, struct cpu_spec *spec);
-void __restore_cpu_power8(void);
-void __setup_cpu_power9(unsigned long offset, struct cpu_spec *spec);
-void __restore_cpu_power9(void);
-void __setup_cpu_power10(unsigned long offset, struct cpu_spec *spec);
-void __restore_cpu_power10(void);
diff --git a/arch/powerpc/kernel/cpu_setup_power.c 
b/arch/powerpc/kernel/cpu_setup_power.c
index 3dc61e203f37..097c033668f0 100644
--- a/arch/powerpc/kernel/cpu_setup_power.c
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -11,7 +11,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 /* Disable CPU_FTR_HVMODE and return false if MSR:HV is not set */
 static bool init_hvmode_206(struct cpu_spec *t)
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 5ace97cccad8..9229e0930332 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct cpu_spec the_cpu_spec __read_mostly;
 
@@ -34,43 +35,6 @@ const char *powerpc_base_platform;
  * part of the cputable though. That has to be fixed for both ppc32
  * and ppc6

[PATCH v2 13/19] watchdog: booke_wdt: Replace PPC_FSL_BOOK3E by PPC_E500

2022-09-19 Thread Christophe Leroy
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Replace it so that CONFIG_PPC_FSL_BOOK3E can be removed later.

Signed-off-by: Christophe Leroy 
Cc: Wim Van Sebroeck 
Acked-by: Guenter Roeck 
Cc: linux-watch...@vger.kernel.org
---
 drivers/watchdog/Kconfig | 8 
 drivers/watchdog/booke_wdt.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 9295492d24f7..b7c03c600567 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -1935,10 +1935,10 @@ config BOOKE_WDT
 config BOOKE_WDT_DEFAULT_TIMEOUT
int "PowerPC Book-E Watchdog Timer Default Timeout"
depends on BOOKE_WDT
-   default 38 if PPC_FSL_BOOK3E
-   range 0 63 if PPC_FSL_BOOK3E
-   default 3 if !PPC_FSL_BOOK3E
-   range 0 3 if !PPC_FSL_BOOK3E
+   default 38 if PPC_E500
+   range 0 63 if PPC_E500
+   default 3 if !PPC_E500
+   range 0 3 if !PPC_E500
help
  Select the default watchdog timer period to be used by the PowerPC
  Book-E watchdog driver.  A watchdog "event" occurs when the bit
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index 75da5cd02615..932a03f4436a 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -27,7 +27,7 @@
  */
 
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_E500
 #define WDTP(x)		((((x)&0x3)<<30)|(((x)&0x3c)<<15))
 #define WDTP_MASK  (WDTP(0x3f))
 #else
@@ -45,7 +45,7 @@ MODULE_PARM_DESC(nowayout,
"Watchdog cannot be stopped once started (default="
__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
 
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_E500
 
 /* For the specified period, determine the number of seconds
  * corresponding to the reset time.  There will be a watchdog
@@ -88,7 +88,7 @@ static unsigned int sec_to_period(unsigned int secs)
 
 #define MAX_WDT_TIMEOUTperiod_to_sec(1)
 
-#else /* CONFIG_PPC_FSL_BOOK3E */
+#else /* CONFIG_PPC_E500 */
 
 static unsigned long long period_to_sec(unsigned int period)
 {
@@ -102,7 +102,7 @@ static unsigned int sec_to_period(unsigned int secs)
 
 #define MAX_WDT_TIMEOUT3   /* from Kconfig */
 
-#endif /* !CONFIG_PPC_FSL_BOOK3E */
+#endif /* !CONFIG_PPC_E500 */
 
 static void __booke_wdt_set(void *data)
 {
-- 
2.37.1



[PATCH v2 19/19] powerpc: Remove impossible mmu_psize_defs[] on nohash

2022-09-19 Thread Christophe Leroy
Today there is:

  if e500 or 8xx
if e500
  mmu_psize_defs[] =
else if 8xx
  mmu_psize_defs[] =
else
  mmu_psize_defs[] =
endif
  endif

The else leg is a dead definition.

Drop that else leg and rewrite as:

  if e500
mmu_psize_defs[] =
  endif
  if 8xx
mmu_psize_defs[] =
  endif

Signed-off-by: Christophe Leroy 
---
v2: Fix build failure
---
 arch/powerpc/mm/nohash/tlb.c | 64 +---
 1 file changed, 15 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index fac59fbd475a..2c15c86c7015 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -49,7 +49,6 @@
  * other sizes not listed here.   The .ind field is only used on MMUs that have
  * indirect page table entries.
  */
-#if defined(CONFIG_PPC_E500) || defined(CONFIG_PPC_8xx)
 #ifdef CONFIG_PPC_E500
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
[MMU_PAGE_4K] = {
@@ -81,7 +80,20 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.enc= BOOK3E_PAGESZ_1GB,
},
 };
-#elif defined(CONFIG_PPC_8xx)
+
+static inline int mmu_get_tsize(int psize)
+{
+   return mmu_psize_defs[psize].enc;
+}
+#else
+static inline int mmu_get_tsize(int psize)
+{
+   /* This isn't used on !Book3E for now */
+   return 0;
+}
+#endif
+
+#ifdef CONFIG_PPC_8xx
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
[MMU_PAGE_4K] = {
.shift  = 12,
@@ -96,53 +108,7 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 23,
},
 };
-#else
-struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
-   [MMU_PAGE_4K] = {
-   .shift  = 12,
-   .ind= 20,
-   .enc= BOOK3E_PAGESZ_4K,
-   },
-   [MMU_PAGE_16K] = {
-   .shift  = 14,
-   .enc= BOOK3E_PAGESZ_16K,
-   },
-   [MMU_PAGE_64K] = {
-   .shift  = 16,
-   .ind= 28,
-   .enc= BOOK3E_PAGESZ_64K,
-   },
-   [MMU_PAGE_1M] = {
-   .shift  = 20,
-   .enc= BOOK3E_PAGESZ_1M,
-   },
-   [MMU_PAGE_16M] = {
-   .shift  = 24,
-   .ind= 36,
-   .enc= BOOK3E_PAGESZ_16M,
-   },
-   [MMU_PAGE_256M] = {
-   .shift  = 28,
-   .enc= BOOK3E_PAGESZ_256M,
-   },
-   [MMU_PAGE_1G] = {
-   .shift  = 30,
-   .enc= BOOK3E_PAGESZ_1GB,
-   },
-};
-#endif /* CONFIG_PPC_85xx */
-
-static inline int mmu_get_tsize(int psize)
-{
-   return mmu_psize_defs[psize].enc;
-}
-#else
-static inline int mmu_get_tsize(int psize)
-{
-   /* This isn't used on !Book3E for now */
-   return 0;
-}
-#endif /* CONFIG_PPC_E500 */
+#endif
 
 /* The variables below are currently only used on 64-bit Book3E
  * though this will probably be made common with other nohash
-- 
2.37.1



Re: [PATCH 2/2] Discard .note.gnu.property sections in generic NOTES

2022-09-19 Thread Omar Sandoval
On Sat, Sep 17, 2022 at 06:31:20AM +, Christophe Leroy wrote:
> 
> 
> Le 16/09/2022 à 21:40, Omar Sandoval a écrit :
> > 
> > On Tue, Apr 28, 2020 at 06:21:05AM -0700, H.J. Lu wrote:
> >> With the command-line option, -mx86-used-note=yes, the x86 assembler
> >> in binutils 2.32 and above generates a program property note in a note
> >> section, .note.gnu.property, to encode used x86 ISAs and features.  But
> >> kernel linker script only contains a single NOTE segment:
> >>
> >> PHDRS {
> >>   text PT_LOAD FLAGS(5);
> >>   data PT_LOAD FLAGS(6);
> >>   percpu PT_LOAD FLAGS(6);
> >>   init PT_LOAD FLAGS(7);
> >>   note PT_NOTE FLAGS(0);
> >> }
> >> SECTIONS
> >> {
> >> ...
> >>   .notes : AT(ADDR(.notes) - 0x8000) { __start_notes = .; 
> >> KEEP(*(.not
> >> e.*)) __stop_notes = .; } :text :note
> >> ...
> >> }
> >>
> >> The NOTE segment generated by kernel linker script is aligned to 4 bytes.
> >> But .note.gnu.property section must be aligned to 8 bytes on x86-64 and
> >> we get
> >>
> >> [hjl@gnu-skx-1 linux]$ readelf -n vmlinux
> >>
> >> Displaying notes found in: .notes
> >>OwnerData size Description
> >>Xen  0x0006 Unknown note type: (0x0006)
> >> description data: 6c 69 6e 75 78 00
> >>Xen  0x0004 Unknown note type: (0x0007)
> >> description data: 32 2e 36 00
> >>xen-3.0  0x0005 Unknown note type: (0x006e6558)
> >> description data: 08 00 00 00 03
> >> readelf: Warning: note with invalid namesz and/or descsz found at offset 
> >> 0x50
> >> readelf: Warning:  type: 0x, namesize: 0x006e6558, descsize:
> >> 0x8000, alignment: 8
> >> [hjl@gnu-skx-1 linux]$
> >>
> >> Since note.gnu.property section in kernel image is never used, this patch
> >> discards .note.gnu.property sections in kernel linker script by adding
> >>
> >> /DISCARD/ : {
> >>*(.note.gnu.property)
> >> }
> >>
> >> before kernel NOTE segment in generic NOTES.
> >>
> >> Signed-off-by: H.J. Lu 
> >> Reviewed-by: Kees Cook 
> >> ---
> >>   include/asm-generic/vmlinux.lds.h | 7 +++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/include/asm-generic/vmlinux.lds.h 
> >> b/include/asm-generic/vmlinux.lds.h
> >> index 71e387a5fe90..95cd678428f4 100644
> >> --- a/include/asm-generic/vmlinux.lds.h
> >> +++ b/include/asm-generic/vmlinux.lds.h
> >> @@ -833,7 +833,14 @@
> >>   #define TRACEDATA
> >>   #endif
> >>
> >> +/*
> >> + * Discard .note.gnu.property sections which are unused and have
> >> + * different alignment requirement from kernel note sections.
> >> + */
> >>   #define NOTES
> >> \
> >> + /DISCARD/ : {   \
> >> + *(.note.gnu.property)   \
> >> + }   \
> >>.notes : AT(ADDR(.notes) - LOAD_OFFSET) {   \
> >>__start_notes = .;  \
> >>KEEP(*(.note.*))\
> >> --
> >> 2.25.4
> >>
> > 
> > Hi, H.J.,
> > 
> > I recently ran into this same .notes corruption when building kernels on
> > Arch Linux.
> > 
> > What ended up happening to this patch? It doesn't appear to have been
> > merged, and I couldn't find any further discussion about it. I'm happy
> > to resend it for you if you need a hand.
> 
> As far as I can see, ARM64 is doing something with that section, see 
> arch/arm64/include/asm/assembler.h
> 
> Instead of discarding that section, would it be enough to force 
> alignment of .notes to 8 bytes ?
> 
> Thanks
> Christophe

Unfortunately, "alignment requirement" here isn't just the starting
alignment of the .notes section; it also refers to internal padding in
the note metadata to keep things aligned. Changing this would break
anyone who parses /sys/kernel/notes (e.g., perf).

Here is a little more context around this mess:

The System V gABI [1] says that the note header and descriptor should be
aligned to 4 bytes for 32-bit files and 8 bytes for 64-bit files.
However, Linux never followed this, and 4-byte alignment is used for
both 32-bit and 64-bit files; see elf(5) [2].

The only exception as of 2022 is
.note.gnu.property/NT_GNU_PROPERTY_TYPE_0, which is defined to follow
the gABI alignment. There was a long thread discussing this back in 2018
with the subject "PT_NOTE alignment, NT_GNU_PROPERTY_TYPE_0, glibc and
gold" [3].

According to the gABI Linux Extensions [4], consumers are now supposed
to use the p_align of the PT_NOTE segment instead of assuming an
alignment.

There are a few issues with this for the kernel:

* The vmlinux linker script squishes together all 

Re: [PATCH 2/2] Discard .note.gnu.property sections in generic NOTES

2022-09-19 Thread Mark Brown
On Mon, Sep 19, 2022 at 10:26:17AM -0700, Omar Sandoval wrote:

In general if you're going to CC someone into a thread please put
a note at the start of your mail explaining why; many of us get
copied on a lot of irrelevant things for no apparent reason, so
if it's not immediately obvious why we were sent a mail there's
every chance it'll just be deleted.

> I'm not sure what exactly arch/arm64/include/asm/assembler.h is doing
> with this file. Perhaps the author, Mark Brown, can clarify?

I don't understand the question, what file are you talking about
here?  arch/arm64/include/asm/assembler.h is itself a file and I
couldn't find anything nearby in your mail talking about a file...


signature.asc
Description: PGP signature


Re: [PATCH 2/2] Discard .note.gnu.property sections in generic NOTES

2022-09-19 Thread Omar Sandoval
On Mon, Sep 19, 2022 at 06:33:40PM +0100, Mark Brown wrote:
> On Mon, Sep 19, 2022 at 10:26:17AM -0700, Omar Sandoval wrote:
> 
> In general if you're going to CC someone into a thread please put
> a note at the start of your mail explaining why; many of us get
> copied on a lot of irrelevant things for no apparent reason, so
> if it's not immediately obvious why we were sent a mail there's
> every chance it'll just be deleted.

Sorry about that.

> > I'm not sure what exactly arch/arm64/include/asm/assembler.h is doing
> > with this file. Perhaps the author, Mark Brown, can clarify?
> 
> I don't understand the question, what file are you talking about
> here?  arch/arm64/include/asm/assembler.h is itself a file and I
> couldn't find anything nearby in your mail talking about a file...

Oops, that was a typo, I meant to say "I'm not sure what
arch/arm64/include/asm/assembler.h is doing with this *note*". To be
more explicit: does ARM64 need .note.gnu.property/NT_GNU_PROPERTY_TYPE_0
in vmlinux?

Thanks,
Omar


Re: [PATCH 2/2] Discard .note.gnu.property sections in generic NOTES

2022-09-19 Thread Mark Brown
On Mon, Sep 19, 2022 at 10:40:36AM -0700, Omar Sandoval wrote:
> On Mon, Sep 19, 2022 at 06:33:40PM +0100, Mark Brown wrote:

> > I don't understand the question, what file are you talking about
> > here?  arch/arm64/include/asm/assembler.h is itself a file and I
> > couldn't find anything nearby in your mail talking about a file...

> Oops, that was a typo, I meant to say "I'm not sure what
> arch/arm64/include/asm/assembler.h is doing with this *note*". To be
> more explicit: does ARM64 need .note.gnu.property/NT_GNU_PROPERTY_TYPE_0
> in vmlinux?

It needs it in at least the vDSO which gets built into vmlinux.
AFAIR we don't use it in normal kernel code.


signature.asc
Description: PGP signature


Re: [PATCH] net: ethernet: remove fs_mii_disconnect and fs_mii_connect declarations

2022-09-19 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski :

On Fri, 9 Sep 2022 14:29:59 +0800 you wrote:
> fs_mii_disconnect and fs_mii_connect have been removed since
> commit 5b4b8454344a ("[PATCH] FS_ENET: use PAL for mii management"),
> so remove them.
> 
> Signed-off-by: Gaosheng Cui 
> ---
>  drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 3 ---
>  1 file changed, 3 deletions(-)

Here is the summary with links:
  - net: ethernet: remove fs_mii_disconnect and fs_mii_connect declarations
https://git.kernel.org/netdev/net-next/c/feceb24ed79a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

2022-09-19 Thread Greg Kroah-Hartman
On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > These patches are on top of driver-core-next.
> > >
> > > Even if stdout-path isn't set in DT, this patch should take console
> > > probe times back to how they were before the deferred_probe_timeout
> > > clean up series[1].
> >
> > Now dropped from my queue due to lack of a response to other reviewer's
> > questions.
> 
> What happened to this patch? I have a 10 second timeout on console
> probe on my SiFive Unmatched, and I don't see this flag being set for
> the serial driver. In fact, I don't see it anywhere in-tree. I can't
> seem to locate another patchset from Saravana around this though, so
> I'm not sure where to look for a missing piece for the sifive serial
> driver.
> 
> This is the second boot time regression (this one not fatal, unlike
> the Layerscape PCIe one) from the fw_devlink patchset.
> 
> Greg, can you revert the whole set for 6.0, please? It's obviously
> nowhere near tested enough to go in and I expect we'll see a bunch of
> -stable fixups due to this if we let it remain in.

What exactly is "the whole set"?  I have the default option fix queued
up and will send that to Linus later this week (am traveling back from
Plumbers still), but have not heard about any other issues at all
other than your report.

thanks,

greg k-h


[PATCH v2 05/44] cpuidle,riscv: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-riscv-sbi.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/cpuidle/cpuidle-riscv-sbi.c
+++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
@@ -116,12 +116,12 @@ static int __sbi_enter_domain_idle_state
return -1;
 
/* Do runtime PM to manage a hierarchical CPU toplogy. */
-   ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
-   ct_irq_exit_irqson();
+
+   ct_idle_enter();
 
if (sbi_is_domain_state_available())
state = sbi_get_domain_state();
@@ -130,12 +130,12 @@ static int __sbi_enter_domain_idle_state
 
ret = sbi_suspend(state) ? -1 : idx;
 
-   ct_irq_enter_irqson();
+   ct_idle_exit();
+
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
-   ct_irq_exit_irqson();
 
cpu_pm_exit();
 
@@ -246,6 +246,7 @@ static int sbi_dt_cpu_init_topology(stru
 * of a shared state for the domain, assumes the domain states are all
 * deeper states.
 */
+   drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
drv->states[state_count - 1].enter = sbi_enter_domain_idle_state;
drv->states[state_count - 1].enter_s2idle =
sbi_enter_s2idle_domain_idle_state;




[PATCH v2 07/44] cpuidle,psci: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-psci.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -69,12 +69,12 @@ static int __psci_enter_domain_idle_stat
return -1;
 
/* Do runtime PM to manage a hierarchical CPU toplogy. */
-   ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
-   ct_irq_exit_irqson();
+
+   ct_idle_enter();
 
state = psci_get_domain_state();
if (!state)
@@ -82,12 +82,12 @@ static int __psci_enter_domain_idle_stat
 
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
 
-   ct_irq_enter_irqson();
+   ct_idle_exit();
+
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
-   ct_irq_exit_irqson();
 
cpu_pm_exit();
 
@@ -240,6 +240,7 @@ static int psci_dt_cpu_init_topology(str
 * of a shared state for the domain, assumes the domain states are all
 * deeper states.
 */
+   drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].enter_s2idle = 
psci_enter_s2idle_domain_idle_state;
psci_cpuidle_use_cpuhp = true;




[PATCH v2 06/44] cpuidle,tegra: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-tegra.c |   21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle-tegra.c
+++ b/drivers/cpuidle/cpuidle-tegra.c
@@ -180,9 +180,11 @@ static int tegra_cpuidle_state_enter(str
}
 
local_fiq_disable();
-   RCU_NONIDLE(tegra_pm_set_cpu_in_lp2());
+   tegra_pm_set_cpu_in_lp2();
cpu_pm_enter();
 
+   ct_idle_enter();
+
switch (index) {
case TEGRA_C7:
err = tegra_cpuidle_c7_enter();
@@ -197,8 +199,10 @@ static int tegra_cpuidle_state_enter(str
break;
}
 
+   ct_idle_exit();
+
cpu_pm_exit();
-   RCU_NONIDLE(tegra_pm_clear_cpu_in_lp2());
+   tegra_pm_clear_cpu_in_lp2();
local_fiq_enable();
 
return err ?: index;
@@ -226,6 +230,7 @@ static int tegra_cpuidle_enter(struct cp
   struct cpuidle_driver *drv,
   int index)
 {
+   bool do_rcu = drv->states[index].flags & CPUIDLE_FLAG_RCU_IDLE;
unsigned int cpu = cpu_logical_map(dev->cpu);
int ret;
 
@@ -233,9 +238,13 @@ static int tegra_cpuidle_enter(struct cp
if (dev->states_usage[index].disable)
return -1;
 
-   if (index == TEGRA_C1)
+   if (index == TEGRA_C1) {
+   if (do_rcu)
+   ct_idle_enter();
ret = arm_cpuidle_simple_enter(dev, drv, index);
-   else
+   if (do_rcu)
+   ct_idle_exit();
+   } else
ret = tegra_cpuidle_state_enter(dev, index, cpu);
 
if (ret < 0) {
@@ -285,7 +294,8 @@ static struct cpuidle_driver tegra_idle_
.exit_latency   = 2000,
.target_residency   = 2200,
.power_usage= 100,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C7",
.desc   = "CPU core powered off",
},
@@ -295,6 +305,7 @@ static struct cpuidle_driver tegra_idle_
.target_residency   = 1,
.power_usage= 0,
.flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE   |
  CPUIDLE_FLAG_COUPLED,
.name   = "CC6",
.desc   = "CPU cluster powered off",




[PATCH v2 03/44] cpuidle/poll: Ensure IRQ state is invariant

2022-09-19 Thread Peter Zijlstra
cpuidle_state::enter() methods should be IRQ invariant

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Rafael J. Wysocki 
---
 drivers/cpuidle/poll_state.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -17,7 +17,7 @@ static int __cpuidle poll_idle(struct cp
 
dev->poll_time_limit = false;
 
-   local_irq_enable();
+   raw_local_irq_enable();
if (!current_set_polling_and_test()) {
unsigned int loop_count = 0;
u64 limit;
@@ -36,6 +36,8 @@ static int __cpuidle poll_idle(struct cp
}
}
}
+   raw_local_irq_disable();
+
current_clr_polling();
 
return index;




[PATCH v2 10/44] cpuidle,armada: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-mvebu-v7.c |7 +++
 1 file changed, 7 insertions(+)

--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -36,7 +36,10 @@ static int mvebu_v7_enter_idle(struct cp
if (drv->states[index].flags & MVEBU_V7_FLAG_DEEP_IDLE)
deepidle = true;
 
+   ct_idle_enter();
ret = mvebu_v7_cpu_suspend(deepidle);
+   ct_idle_exit();
+
cpu_pm_exit();
 
if (ret)
@@ -49,6 +52,7 @@ static struct cpuidle_driver armadaxp_id
.name   = "armada_xp_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 100,
.power_usage= 50,
@@ -57,6 +61,7 @@ static struct cpuidle_driver armadaxp_id
.desc   = "CPU power down",
},
.states[2]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 1000,
.power_usage= 5,
@@ -72,6 +77,7 @@ static struct cpuidle_driver armada370_i
.name   = "armada_370_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 100,
.power_usage= 5,
@@ -87,6 +93,7 @@ static struct cpuidle_driver armada38x_i
.name   = "armada_38x_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 10,
.power_usage= 5,




[PATCH v2 02/44] x86/idle: Replace x86_idle with a static_call

2022-09-19 Thread Peter Zijlstra
Typical boot time setup; no need to suffer an indirect call for that.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Frederic Weisbecker 
Reviewed-by: Rafael J. Wysocki 
---
 arch/x86/kernel/process.c |   50 +-
 1 file changed, 28 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-static void (*x86_idle)(void);
+/*
+ * We use this if we don't have any better idle routine..
+ */
+void __cpuidle default_idle(void)
+{
+   raw_safe_halt();
+}
+#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
+EXPORT_SYMBOL(default_idle);
+#endif
+
+DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
+
+static bool x86_idle_set(void)
+{
+   return !!static_call_query(x86_idle);
+}
 
 #ifndef CONFIG_SMP
 static inline void play_dead(void)
@@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
 /*
  * Called from the generic idle code.
  */
-void arch_cpu_idle(void)
-{
-   x86_idle();
-}
-
-/*
- * We use this if we don't have any better idle routine..
- */
-void __cpuidle default_idle(void)
+void __cpuidle arch_cpu_idle(void)
 {
-   raw_safe_halt();
+   static_call(x86_idle)();
 }
-#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
-EXPORT_SYMBOL(default_idle);
-#endif
 
 #ifdef CONFIG_XEN
 bool xen_set_default_idle(void)
 {
-   bool ret = !!x86_idle;
+   bool ret = x86_idle_set();
 
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 
return ret;
 }
@@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
-   if (x86_idle || boot_option_idle_override == IDLE_POLL)
+   if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
return;
 
if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
pr_info("using AMD E400 aware idle routine\n");
-   x86_idle = amd_e400_idle;
+   static_call_update(x86_idle, amd_e400_idle);
} else if (prefer_mwait_c1_over_halt(c)) {
pr_info("using mwait in idle threads\n");
-   x86_idle = mwait_idle;
+   static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
-   x86_idle = tdx_safe_halt;
+   static_call_update(x86_idle, tdx_safe_halt);
} else
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 }
 
 void amd_e400_c1e_apic_setup(void)
@@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
 * To continue to load the CPU idle driver, don't touch
 * the boot_option_idle_override.
 */
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
boot_option_idle_override = IDLE_HALT;
} else if (!strcmp(str, "nomwait")) {
/*




[PATCH v2 09/44] cpuidle,omap3: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 
---
 arch/arm/mach-omap2/cpuidle34xx.c |   16 
 1 file changed, 16 insertions(+)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,7 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
+   ct_idle_enter();
omap_sram_idle();
+   ct_idle_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
@@ -265,6 +267,7 @@ static struct cpuidle_driver omap3_idle_
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2 + 2,
.target_residency = 5,
@@ -272,6 +275,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 10 + 10,
.target_residency = 30,
@@ -279,6 +283,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 50 + 50,
.target_residency = 300,
@@ -286,6 +291,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1500 + 1800,
.target_residency = 4000,
@@ -293,6 +299,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2500 + 7500,
.target_residency = 12000,
@@ -300,6 +307,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 3000 + 8500,
.target_residency = 15000,
@@ -307,6 +315,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1 + 3,
.target_residency = 3,
@@ -328,6 +337,7 @@ static struct cpuidle_driver omap3430_id
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 110 + 162,
.target_residency = 5,
@@ -335,6 +345,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 106 + 180,
.target_residency = 309,
@@ -342,6 +353,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 107 + 410,
.target_residency = 46057,
@@ -349,6 +361,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU RET + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 121 + 3374,
.target_residency = 46057,
@@ -356,6 +369,7 @@ stat

[PATCH v2 11/44] cpuidle,omap4: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, some *four* times, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 
---
 arch/arm/mach-omap2/cpuidle44xx.c |   29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,7 +105,9 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
+   ct_idle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
+   ct_idle_exit();
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -151,10 +153,10 @@ static int omap_enter_idle_coupled(struc
 (cx->mpu_logic_state == PWRDM_POWER_OFF);
 
/* Enter broadcast mode for periodic timers */
-   RCU_NONIDLE(tick_broadcast_enable());
+   tick_broadcast_enable();
 
/* Enter broadcast mode for one-shot timers */
-   RCU_NONIDLE(tick_broadcast_enter());
+   tick_broadcast_enter();
 
/*
 * Call idle CPU PM enter notifier chain so that
@@ -166,7 +168,7 @@ static int omap_enter_idle_coupled(struc
 
if (dev->cpu == 0) {
pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state);
-   RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, cx->mpu_state));
+   omap_set_pwrdm_state(mpu_pd, cx->mpu_state);
 
/*
 * Call idle CPU cluster PM enter notifier chain
@@ -178,14 +180,16 @@ static int omap_enter_idle_coupled(struc
index = 0;
cx = state_ptr + index;
			pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state);
-   RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, cx->mpu_state));
+   omap_set_pwrdm_state(mpu_pd, cx->mpu_state);
mpuss_can_lose_context = 0;
}
}
}
 
+   ct_idle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
cpu_done[dev->cpu] = true;
+   ct_idle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
@@ -194,9 +198,9 @@ static int omap_enter_idle_coupled(struc
mpuss_can_lose_context)
gic_dist_disable();
 
-   RCU_NONIDLE(clkdm_deny_idle(cpu_clkdm[1]));
-   RCU_NONIDLE(omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON));
-   RCU_NONIDLE(clkdm_allow_idle(cpu_clkdm[1]));
+   clkdm_deny_idle(cpu_clkdm[1]);
+   omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON);
+   clkdm_allow_idle(cpu_clkdm[1]);
 
if (IS_PM44XX_ERRATUM(PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD) &&
mpuss_can_lose_context) {
@@ -222,7 +226,7 @@ static int omap_enter_idle_coupled(struc
cpu_pm_exit();
 
 cpu_pm_out:
-   RCU_NONIDLE(tick_broadcast_exit());
+   tick_broadcast_exit();
 
 fail:
cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
@@ -247,7 +251,8 @@ static struct cpuidle_driver omap4_idle_
/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
.exit_latency = 328 + 440,
.target_residency = 960,
-   .flags = CPUIDLE_FLAG_COUPLED,
+   .flags = CPUIDLE_FLAG_COUPLED |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = omap_enter_idle_coupled,
.name = "C2",
.desc = "CPUx OFF, MPUSS CSWR",
@@ -256,7 +261,8 @@ static struct cpuidle_driver omap4_idle_
/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
.exit_latency = 460 + 518,
.target_residency = 1100,
-   .flags = CPUIDLE_FLAG_COUPLED,
+   .flags = CPUIDLE_FLAG_COUPLED |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = omap_enter_idle_coupled,
.name = "C3",
.desc = "CPUx OFF, MPUSS OSWR",
@@ -282,7 +288,8 @@ static struct cpuidle_driver omap5_idle_
/* C2 - CPU0 RET + CPU1 RET + MPU CSWR */
.exit_latency = 48 + 60,
.target_residency = 100,
-   .flags = CPUIDLE_FLAG_TIMER_STOP,
+   .flags = CPUIDLE_FLAG_TIMER_STOP |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = omap_enter_idle_smp,
.name = "C2",
.desc = "CPUx CSWR, MPUSS CSWR",




[PATCH v2 16/44] cpuidle: Annotate poll_idle()

2022-09-19 Thread Peter Zijlstra
The __cpuidle functions will become a noinstr class, as such they need
explicit annotations.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Rafael J. Wysocki 
---
 drivers/cpuidle/poll_state.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,7 +13,10 @@
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
   struct cpuidle_driver *drv, int index)
 {
-   u64 time_start = local_clock();
+   u64 time_start;
+
+   instrumentation_begin();
+   time_start = local_clock();
 
dev->poll_time_limit = false;
 
@@ -39,6 +42,7 @@ static int __cpuidle poll_idle(struct cp
raw_local_irq_disable();
 
current_clr_polling();
+   instrumentation_end();
 
return index;
 }




[PATCH v2 15/44] acpi_idle: Remove tracing

2022-09-19 Thread Peter Zijlstra
All the idle routines are called with RCU disabled, as such there must
not be any tracing inside.

While there; clean-up the io-port idle thing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/acpi/processor_idle.c |   24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -108,8 +108,8 @@ static const struct dmi_system_id proces
 static void __cpuidle acpi_safe_halt(void)
 {
if (!tif_need_resched()) {
-   safe_halt();
-   local_irq_disable();
+   raw_safe_halt();
+   raw_local_irq_disable();
}
 }
 
@@ -524,16 +524,21 @@ static int acpi_idle_bm_check(void)
return bm_status;
 }
 
-static void wait_for_freeze(void)
+static __cpuidle void io_idle(unsigned long addr)
 {
+   /* IO port based C-state */
+   inb(addr);
+
 #ifdef CONFIG_X86
/* No delay is needed if we are in guest */
if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
return;
 #endif
-   /* Dummy wait op - must do something useless after P_LVL2 read
-  because chipsets cannot guarantee that STPCLK# signal
-  gets asserted in time to freeze execution properly. */
+   /*
+* Dummy wait op - must do something useless after P_LVL2 read
+* because chipsets cannot guarantee that STPCLK# signal
+* gets asserted in time to freeze execution properly.
+*/
inl(acpi_gbl_FADT.xpm_timer_block.address);
 }
 
@@ -553,9 +558,7 @@ static void __cpuidle acpi_idle_do_entry
} else if (cx->entry_method == ACPI_CSTATE_HALT) {
acpi_safe_halt();
} else {
-   /* IO port based C-state */
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
}
 
perf_lopwr_cb(false);
@@ -577,8 +580,7 @@ static int acpi_idle_play_dead(struct cp
if (cx->entry_method == ACPI_CSTATE_HALT)
safe_halt();
else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
} else
return -ENODEV;
 




[PATCH v2 25/44] printk: Remove trace_.*_rcuidle() usage

2022-09-19 Thread Peter Zijlstra
The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
tracepoint"), was printk usage from the cpuidle path where RCU was
already disabled.

Per the patches earlier in this series, this is no longer the case.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Sergey Senozhatsky 
Acked-by: Petr Mladek 
---
 kernel/printk/printk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16
}
}
 
-   trace_console_rcuidle(text, text_len);
+   trace_console(text, text_len);
 
return text_len;
 }




[PATCH v2 22/44] x86/tdx: Remove TDX_HCALL_ISSUE_STI

2022-09-19 Thread Peter Zijlstra
Now that arch_cpu_idle() is expected to return with IRQs disabled,
avoid the useless STI/CLI dance.

Per the specs this is supposed to work, but nobody has yet relied upon
this behaviour, so broken implementations are possible.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/coco/tdx/tdcall.S|   13 -
 arch/x86/coco/tdx/tdx.c   |   23 ---
 arch/x86/include/asm/shared/tdx.h |1 -
 3 files changed, 4 insertions(+), 33 deletions(-)

--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -139,19 +139,6 @@ SYM_FUNC_START(__tdx_hypercall)
 
movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
-   /*
-* For the idle loop STI needs to be called directly before the TDCALL
-* that enters idle (EXIT_REASON_HLT case). STI instruction enables
-* interrupts only one instruction later. If there is a window between
-* STI and the instruction that emulates the HALT state, there is a
-* chance for interrupts to happen in this window, which can delay the
-* HLT operation indefinitely. Since this is the not the desired
-* result, conditionally call STI before TDCALL.
-*/
-   testq $TDX_HCALL_ISSUE_STI, %rsi
-   jz .Lskip_sti
-   sti
-.Lskip_sti:
tdcall
 
/*
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -169,7 +169,7 @@ static int ve_instr_len(struct ve_info *
}
 }
 
-static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
+static u64 __cpuidle __halt(const bool irq_disabled)
 {
struct tdx_hypercall_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -189,20 +189,14 @@ static u64 __cpuidle __halt(const bool i
 * can keep the vCPU in virtual HLT, even if an IRQ is
 * pending, without hanging/breaking the guest.
 */
-   return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
+   return __tdx_hypercall(&args, 0);
 }
 
 static int handle_halt(struct ve_info *ve)
 {
-   /*
-* Since non safe halt is mainly used in CPU offlining
-* and the guest will always stay in the halt state, don't
-* call the STI instruction (set do_sti as false).
-*/
const bool irq_disabled = irqs_disabled();
-   const bool do_sti = false;
 
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
return -EIO;
 
return ve_instr_len(ve);
@@ -210,22 +204,13 @@ static int handle_halt(struct ve_info *v
 
 void __cpuidle tdx_safe_halt(void)
 {
-/*
- * For do_sti=true case, __tdx_hypercall() function enables
- * interrupts using the STI instruction before the TDCALL. So
- * set irq_disabled as false.
- */
const bool irq_disabled = false;
-   const bool do_sti = true;
 
/*
 * Use WARN_ONCE() to report the failure.
 */
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
WARN_ONCE(1, "HLT instruction emulation failed\n");
-
-   /* XXX I can't make sense of what @do_sti actually does */
-   raw_local_irq_disable();
 }
 
 static int read_msr(struct pt_regs *regs, struct ve_info *ve)
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -8,7 +8,6 @@
 #define TDX_HYPERCALL_STANDARD  0
 
 #define TDX_HCALL_HAS_OUTPUT   BIT(0)
-#define TDX_HCALL_ISSUE_STIBIT(1)
 
 #define TDX_CPUID_LEAF_ID  0x21
 #define TDX_IDENT  "IntelTDX"




[PATCH v2 37/44] arm,omap2: Use WFI for omap2_pm_idle()

2022-09-19 Thread Peter Zijlstra
arch_cpu_idle() is a very simple idle interface and exposes only a
single idle state and is expected to not require RCU and not do any
tracing/instrumentation.

As such, omap2_pm_idle() is not a valid implementation. Replace it
with a simple (shallow) omap2_do_wfi() call.

Omap2 doesn't have a cpuidle driver, but adding one would be the
recourse to (re)gain the other idle states.

Suggested-by: Tony Lindgren 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/pm24xx.c |   51 +--
 1 file changed, 2 insertions(+), 49 deletions(-)

--- a/arch/arm/mach-omap2/pm24xx.c
+++ b/arch/arm/mach-omap2/pm24xx.c
@@ -116,50 +116,12 @@ static int omap2_enter_full_retention(vo
 
 static int sti_console_enabled;
 
-static int omap2_allow_mpu_retention(void)
-{
-   if (!omap2xxx_cm_mpu_retention_allowed())
-   return 0;
-   if (sti_console_enabled)
-   return 0;
-
-   return 1;
-}
-
-static void omap2_enter_mpu_retention(void)
+static void omap2_do_wfi(void)
 {
const int zero = 0;
 
-   /* The peripherals seem not to be able to wake up the MPU when
-* it is in retention mode. */
-   if (omap2_allow_mpu_retention()) {
-   /* REVISIT: These write to reserved bits? */
-   omap_prm_clear_mod_irqs(CORE_MOD, PM_WKST1, ~0);
-   omap_prm_clear_mod_irqs(CORE_MOD, OMAP24XX_PM_WKST2, ~0);
-   omap_prm_clear_mod_irqs(WKUP_MOD, PM_WKST, ~0);
-
-   /* Try to enter MPU retention */
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_RET);
-
-   } else {
-   /* Block MPU retention */
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
-   }
-
/* WFI */
asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
-
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
-}
-
-static int omap2_can_sleep(void)
-{
-   if (omap2xxx_cm_fclks_active())
-   return 0;
-   if (__clk_is_enabled(osc_ck))
-   return 0;
-
-   return 1;
 }
 
 static void omap2_pm_idle(void)
@@ -169,16 +131,7 @@ static void omap2_pm_idle(void)
if (omap_irq_pending())
return;
 
-   error = cpu_cluster_pm_enter();
-   if (error || !omap2_can_sleep()) {
-   omap2_enter_mpu_retention();
-   goto out_cpu_cluster_pm;
-   }
-
-   omap2_enter_full_retention();
-
-out_cpu_cluster_pm:
-   cpu_cluster_pm_exit();
+   omap2_do_wfi();
 }
 
 static void __init prcm_setup_regs(void)




[PATCH v2 29/44] cpuidle,tdx: Make tdx noinstr clean

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: __halt+0x2c: call to hcall_func.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: __halt+0x3f: call to __tdx_hypercall() leaves .noinstr.text section
vmlinux.o: warning: objtool: __tdx_hypercall+0x66: call to __tdx_hypercall_failed() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/boot/compressed/vmlinux.lds.S |1 +
 arch/x86/coco/tdx/tdcall.S |2 ++
 arch/x86/coco/tdx/tdx.c|5 +++--
 3 files changed, 6 insertions(+), 2 deletions(-)

--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -34,6 +34,7 @@ SECTIONS
_text = .;  /* Text */
*(.text)
*(.text.*)
+   *(.noinstr.text)
_etext = . ;
}
.rodata : {
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -31,6 +31,8 @@
  TDX_R12 | TDX_R13 | \
  TDX_R14 | TDX_R15 )
 
+.section .noinstr.text, "ax"
+
 /*
  * __tdx_module_call()  - Used by TDX guests to request services from
  * the TDX module (does not include VMM services) using TDCALL instruction.
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -53,8 +53,9 @@ static inline u64 _tdx_hypercall(u64 fn,
 }
 
 /* Called from __tdx_hypercall() for unrecoverable failure */
-void __tdx_hypercall_failed(void)
+noinstr void __tdx_hypercall_failed(void)
 {
+   instrumentation_begin();
panic("TDVMCALL failed. TDX module bug?");
 }
 
@@ -64,7 +65,7 @@ void __tdx_hypercall_failed(void)
  * Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
  * guest sides of these calls.
  */
-static u64 hcall_func(u64 exit_reason)
+static __always_inline u64 hcall_func(u64 exit_reason)
 {
return exit_reason;
 }




[PATCH v2 33/44] ftrace: WARN on rcuidle

2022-09-19 Thread Peter Zijlstra
CONFIG_GENERIC_ENTRY disallows any and all tracing when RCU isn't
enabled.

XXX if s390 (the only other GENERIC_ENTRY user as of this writing)
isn't comfortable with this, we could switch to
HAVE_NOINSTR_VALIDATION which is x86_64 only atm.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/tracepoint.h |   13 -
 kernel/trace/trace.c   |3 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -178,6 +178,16 @@ static inline struct tracepoint *tracepo
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
 /*
+ * CONFIG_GENERIC_ENTRY archs are expected to have sanitized entry and idle
+ * code that disallow any/all tracing/instrumentation when RCU isn't watching.
+ */
+#ifdef CONFIG_GENERIC_ENTRY
+#define RCUIDLE_COND(rcuidle)  (rcuidle)
+#else
+#define RCUIDLE_COND(rcuidle)  (rcuidle && in_nmi())
+#endif
+
+/*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
  */
@@ -189,7 +199,8 @@ static inline struct tracepoint *tracepo
return; \
\
/* srcu can't be used from NMI */   \
-   WARN_ON_ONCE(rcuidle && in_nmi());  \
+   if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle)))\
+   return; \
\
/* keep srcu and sched-rcu usage consistent */  \
preempt_disable_notrace();  \
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3104,6 +3104,9 @@ void __trace_stack(struct trace_array *t
return;
}
 
+   if (WARN_ON_ONCE(IS_ENABLED(CONFIG_GENERIC_ENTRY)))
+   return;
+
/*
 * When an NMI triggers, RCU is enabled via ct_nmi_enter(),
 * but if the above rcu_is_watching() failed, then the NMI




[PATCH v2 36/44] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()

2022-09-19 Thread Peter Zijlstra
From: Tony Lindgren 

OMAP4 uses full SoC suspend modes as idle states, as such it needs the
whole power-domain and clock-domain code from the idle path.

All that code is not suitable to run with RCU disabled, as such push
RCU-idle deeper still.

Signed-off-by: Tony Lindgren 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/yqcv6crsnkusw...@atomide.com
---
 arch/arm/mach-omap2/common.h  |6 --
 arch/arm/mach-omap2/cpuidle44xx.c |8 ++--
 arch/arm/mach-omap2/omap-mpuss-lowpower.c |   12 +++-
 arch/arm/mach-omap2/pm44xx.c  |2 +-
 4 files changed, 18 insertions(+), 10 deletions(-)

--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -284,11 +284,13 @@ extern u32 omap4_get_cpu1_ns_pa_addr(voi
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PM)
 extern int omap4_mpuss_init(void);
-extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state);
+extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+   bool rcuidle);
 extern int omap4_hotplug_cpu(unsigned int cpu, unsigned int power_state);
 #else
 static inline int omap4_enter_lowpower(unsigned int cpu,
-   unsigned int power_state)
+   unsigned int power_state,
+   bool rcuidle)
 {
cpu_do_idle();
return 0;
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,7 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
-   ct_cpuidle_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   ct_cpuidle_exit();
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +184,8 @@ static int omap_enter_idle_coupled(struc
}
}
 
-   ct_cpuidle_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
cpu_done[dev->cpu] = true;
-   ct_cpuidle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
--- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c
+++ b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
@@ -33,6 +33,7 @@
  * and first to wake-up when MPUSS low power states are excercised
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -214,6 +215,7 @@ static void __init save_l2x0_context(voi
  * of OMAP4 MPUSS subsystem
  * @cpu : CPU ID
  * @power_state: Low power state.
+ * @rcuidle: RCU needs to be idled
  *
  * MPUSS states for the context save:
  * save_state =
@@ -222,7 +224,8 @@ static void __init save_l2x0_context(voi
  * 2 - CPUx L1 and logic lost + GIC lost: MPUSS OSWR
  * 3 - CPUx L1 and logic lost + GIC + L2 lost: DEVICE OFF
  */
-int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state)
+int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+bool rcuidle)
 {
struct omap4_cpu_pm_info *pm_info = &per_cpu(omap4_pm_info, cpu);
unsigned int save_state = 0, cpu_logic_state = PWRDM_POWER_RET;
@@ -268,6 +271,10 @@ int omap4_enter_lowpower(unsigned int cp
cpu_clear_prev_logic_pwrst(cpu);
pwrdm_set_next_pwrst(pm_info->pwrdm, power_state);
pwrdm_set_logic_retst(pm_info->pwrdm, cpu_logic_state);
+
+   if (rcuidle)
+   ct_cpuidle_enter();
+
set_cpu_wakeup_addr(cpu, __pa_symbol(omap_pm_ops.resume));
omap_pm_ops.scu_prepare(cpu, power_state);
l2x0_pwrst_prepare(cpu, save_state);
@@ -283,6 +290,9 @@ int omap4_enter_lowpower(unsigned int cp
if (IS_PM44XX_ERRATUM(PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD) && cpu)
gic_dist_enable();
 
+   if (rcuidle)
+   ct_cpuidle_exit();
+
/*
 * Restore the CPUx power state to ON otherwise CPUx
 * power domain can transitions to programmed low power
--- a/arch/arm/mach-omap2/pm44xx.c
+++ b/arch/arm/mach-omap2/pm44xx.c
@@ -76,7 +76,7 @@ static int omap4_pm_suspend(void)
 * domain CSWR is not supported by hardware.
 * More details can be found in OMAP4430 TRM section 4.3.4.2.
 */
-   omap4_enter_lowpower(cpu_id, cpu_suspend_state);
+   omap4_enter_lowpower(cpu_id, cpu_suspend_state, false);
 
/* Restore next powerdomain state */
list_for_each_entry(pwrst, &pwrst_list, node) {




[PATCH v2 39/44] cpuidle,clk: Remove trace_.*_rcuidle()

2022-09-19 Thread Peter Zijlstra
OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/clk/clk.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -978,12 +978,12 @@ static void clk_core_disable(struct clk_
if (--core->enable_count > 0)
return;
 
-   trace_clk_disable_rcuidle(core);
+   trace_clk_disable(core);
 
if (core->ops->disable)
core->ops->disable(core->hw);
 
-   trace_clk_disable_complete_rcuidle(core);
+   trace_clk_disable_complete(core);
 
clk_core_disable(core->parent);
 }
@@ -1037,12 +1037,12 @@ static int clk_core_enable(struct clk_co
if (ret)
return ret;
 
-   trace_clk_enable_rcuidle(core);
+   trace_clk_enable(core);
 
if (core->ops->enable)
ret = core->ops->enable(core->hw);
 
-   trace_clk_enable_complete_rcuidle(core);
+   trace_clk_enable_complete(core);
 
if (ret) {
clk_core_disable(core->parent);




[PATCH v2 27/44] cpuidle,sched: Remove annotations from TIF_{POLLING_NRFLAG,NEED_RESCHED}

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: mwait_idle+0x5: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to 
current_set_polling_and_test() leaves .noinstr.text section

vmlinux.o: warning: objtool: intel_idle+0xa6: call to current_clr_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xbf: call to current_clr_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xa1: call to 
current_clr_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0xe: call to __current_set_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to 
__current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to __current_set_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to 
__current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to 
__current_set_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0x73: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x91: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x78: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_safe_halt+0xf: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/sched/idle.h  |   40 ++--
 include/linux/thread_info.h |   18 +-
 2 files changed, 47 insertions(+), 11 deletions(-)

--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -23,12 +23,37 @@ static inline void wake_up_if_idle(int c
  */
 #ifdef TIF_POLLING_NRFLAG
 
-static inline void __current_set_polling(void)
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H
+
+static __always_inline void __current_set_polling(void)
 {
-   set_thread_flag(TIF_POLLING_NRFLAG);
+   arch_set_bit(TIF_POLLING_NRFLAG,
+                (unsigned long *)(&current_thread_info()->flags));
 }
 
-static inline bool __must_check current_set_polling_and_test(void)
+static __always_inline void __current_clr_polling(void)
+{
+   arch_clear_bit(TIF_POLLING_NRFLAG,
+                  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline void __current_set_polling(void)
+{
+   set_bit(TIF_POLLING_NRFLAG,
+           (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void __current_clr_polling(void)
+{
+   clear_bit(TIF_POLLING_NRFLAG,
+                 (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H */
+
+static __always_inline bool __must_check current_set_polling_and_test(void)
 {
__current_set_polling();
 
@@ -41,12 +66,7 @@ static inline bool __must_check current_
return unlikely(tif_need_resched());
 }
 
-static inline void __current_clr_polling(void)
-{
-   clear_thread_flag(TIF_POLLING_NRFLAG);
-}
-
-static inline bool __must_check current_clr_polling_and_test(void)
+static __always_inline bool __must_check current_clr_polling_and_test(void)
 {
__current_clr_polling();
 
@@ -73,7 +93,7 @@ static inline bool __must_check current_
 }
 #endif
 
-static inline void current_clr_polling(void)
+static __always_inline void current_clr_polling(void)
 {
__current_clr_polling();
 
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -177,7 +177,23 @@ static __always_inline unsigned long rea
clear_ti_thread_flag(task_thread_info(t), TIF_##fl)
 #endif /* !CONFIG_GENERIC_ENTRY */
 
-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
+
+static __always_inline bool tif_need_resched(void)
+{
+   return arch_test_bit(TIF_NEED_RESCHED,
+                (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline bool tif_need_resched(void)

[PATCH v2 32/44] cpuidle,acpi: Make noinstr clean

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: io_idle+0xc: call to __inb.isra.0() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0xfe: call to num_online_cpus() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0x115: call to 
acpi_idle_fallback_to_c1.isra.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
---
 arch/x86/include/asm/shared/io.h |4 ++--
 drivers/acpi/processor_idle.c|2 +-
 include/linux/cpumask.h  |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/shared/io.h
+++ b/arch/x86/include/asm/shared/io.h
@@ -5,13 +5,13 @@
 #include 
 
 #define BUILDIO(bwl, bw, type) \
-static inline void __out##bwl(type value, u16 port)\
+static __always_inline void __out##bwl(type value, u16 port)   \
 {  \
asm volatile("out" #bwl " %" #bw "0, %w1"   \
 : : "a"(value), "Nd"(port));   \
 }  \
\
-static inline type __in##bwl(u16 port) \
+static __always_inline type __in##bwl(u16 port)                        \
 {  \
type value; \
asm volatile("in" #bwl " %w1, %" #bw "0"\
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -593,7 +593,7 @@ static int acpi_idle_play_dead(struct cp
return 0;
 }
 
-static bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
+static __always_inline bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
 {
return IS_ENABLED(CONFIG_HOTPLUG_CPU) && !pr->flags.has_cst &&
!(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED);
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -908,9 +908,9 @@ static inline const struct cpumask *get_
  * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
  * region.
  */
-static inline unsigned int num_online_cpus(void)
+static __always_inline unsigned int num_online_cpus(void)
 {
-   return atomic_read(&__num_online_cpus);
+   return arch_atomic_read(&__num_online_cpus);
 }
 #define num_possible_cpus()cpumask_weight(cpu_possible_mask)
 #define num_present_cpus() cpumask_weight(cpu_present_mask)




[PATCH v2 38/44] cpuidle,powerdomain: Remove trace_.*_rcuidle()

2022-09-19 Thread Peter Zijlstra
OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/powerdomain.c |   10 +-
 drivers/base/power/runtime.c  |   24 
 2 files changed, 17 insertions(+), 17 deletions(-)

--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -187,9 +187,9 @@ static int _pwrdm_state_switch(struct po
trace_state = (PWRDM_TRACE_STATES_FLAG |
   ((next & OMAP_POWERSTATE_MASK) << 8) |
   ((prev & OMAP_POWERSTATE_MASK) << 0));
-   trace_power_domain_target_rcuidle(pwrdm->name,
- trace_state,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name,
+ trace_state,
+ raw_smp_processor_id());
}
break;
default:
@@ -541,8 +541,8 @@ int pwrdm_set_next_pwrst(struct powerdom
 
if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
/* Trace the pwrdm desired target state */
-   trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name, pwrst,
+ raw_smp_processor_id());
/* Program the pwrdm desired target state */
ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
}
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -442,7 +442,7 @@ static int rpm_idle(struct device *dev,
int (*callback)(struct device *);
int retval;
 
-   trace_rpm_idle_rcuidle(dev, rpmflags);
+   trace_rpm_idle(dev, rpmflags);
retval = rpm_check_suspend_allowed(dev);
if (retval < 0)
;   /* Conditions are wrong. */
@@ -481,7 +481,7 @@ static int rpm_idle(struct device *dev,
dev->power.request_pending = true;
queue_work(pm_wq, &dev->power.work);
}
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, 0);
+   trace_rpm_return_int(dev, _THIS_IP_, 0);
return 0;
}
 
@@ -493,7 +493,7 @@ static int rpm_idle(struct device *dev,
wake_up_all(&dev->power.wait_queue);
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
return retval ? retval : rpm_suspend(dev, rpmflags | RPM_AUTO);
 }
 
@@ -557,7 +557,7 @@ static int rpm_suspend(struct device *de
struct device *parent = NULL;
int retval;
 
-   trace_rpm_suspend_rcuidle(dev, rpmflags);
+   trace_rpm_suspend(dev, rpmflags);
 
  repeat:
retval = rpm_check_suspend_allowed(dev);
@@ -708,7 +708,7 @@ static int rpm_suspend(struct device *de
}
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;
 
@@ -760,7 +760,7 @@ static int rpm_resume(struct device *dev
struct device *parent = NULL;
int retval = 0;
 
-   trace_rpm_resume_rcuidle(dev, rpmflags);
+   trace_rpm_resume(dev, rpmflags);
 
  repeat:
if (dev->power.runtime_error) {
@@ -925,7 +925,7 @@ static int rpm_resume(struct device *dev
spin_lock_irq(&dev->power.lock);
}
 
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;
 }
@@ -1081,7 +1081,7 @@ int __pm_runtime_idle(struct device *dev
if (retval < 0) {
return retval;
} else if (retval > 0) {
-   trace_rpm_usage_rcuidle(dev, rpmflags);
+   trace_rpm_usage(dev, rpmflags);
return 0;
}
}
@@ -1119,7 +1119,7 @@ int __pm_runtime_suspend(struct device *
if (retval < 0) {
return retval;
} else if (retval > 0) {
-   trace_rpm_usage_rcuidle(dev, rpmflags);
+   trace_rpm_usage(dev, rpmflags);
return 0;
}
}
@@ -1202,7 +1202,7 @@ int pm_runtime_get_if_active(struct devi
} else {
retval = atomic_inc_not_zero(&dev->power.usage_count);
}
-   trace_rpm_usage_rcuidle(dev, 0);
+   trace_rpm_usage(dev, 0);
spin_unlock_irqrestore(&dev->power.lock, flags);
 
return retval;
@@ -1566,7 +1566,7 @@ void pm_runtime_allow(struct device *dev
if (ret == 0)
rpm_idle(dev, RPM_AUTO | RPM_ASYNC);
else if (ret > 0)
-

[PATCH v2 13/44] cpuidle: Fix ct_idle_*() usage

2022-09-19 Thread Peter Zijlstra
The whole disable-RCU, enable-IRQS dance is very intricate since
changing IRQ state is traced, which depends on RCU.

Add two helpers for the cpuidle case that mirror the entry code.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-imx/cpuidle-imx6q.c|4 +--
 arch/arm/mach-imx/cpuidle-imx6sx.c   |4 +--
 arch/arm/mach-omap2/cpuidle34xx.c|4 +--
 arch/arm/mach-omap2/cpuidle44xx.c|8 +++---
 drivers/acpi/processor_idle.c|8 --
 drivers/cpuidle/cpuidle-big_little.c |4 +--
 drivers/cpuidle/cpuidle-mvebu-v7.c   |4 +--
 drivers/cpuidle/cpuidle-psci.c   |4 +--
 drivers/cpuidle/cpuidle-riscv-sbi.c  |4 +--
 drivers/cpuidle/cpuidle-tegra.c  |8 +++---
 drivers/cpuidle/cpuidle.c|   11 
 include/linux/cpuidle.h  |   38 ++---
 kernel/sched/idle.c  |   45 ++-
 kernel/time/tick-broadcast.c |6 +++-
 14 files changed, 86 insertions(+), 66 deletions(-)

--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -25,9 +25,9 @@ static int imx6q_enter_wait(struct cpuid
imx6_set_lpm(WAIT_UNCLOCKED);
raw_spin_unlock(&cpuidle_lock);
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
cpu_do_idle();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
raw_spin_lock(&cpuidle_lock);
if (num_idle_cpus-- == num_online_cpus())
--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,9 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
cpu_suspend(0, imx6sx_idle_finish);
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap_sram_idle();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,9 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +186,10 @@ static int omap_enter_idle_coupled(struc
}
}
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
cpu_done[dev->cpu] = true;
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -627,6 +627,8 @@ static int __cpuidle acpi_idle_enter_bm(
 */
bool dis_bm = pr->flags.bm_control;
 
+   instrumentation_begin();
+
/* If we can skip BM, demote to a safe state. */
if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
dis_bm = false;
@@ -648,11 +650,11 @@ static int __cpuidle acpi_idle_enter_bm(
raw_spin_unlock(&c3_lock);
}
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
 
acpi_idle_do_entry(cx);
 
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/* Re-enable bus master arbitration */
if (dis_bm) {
@@ -662,6 +664,8 @@ static int __cpuidle acpi_idle_enter_bm(
raw_spin_unlock(&c3_lock);
}
 
+   instrumentation_end();
+
return index;
 }
 
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -126,13 +126,13 @@ static int bl_enter_powerdown(struct cpu
struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
-   ct_idle_enter();
+   ct_cpuidle_enter();
 
cpu_suspend(0, bl_powerdown_finisher);
 
/* signals the MCPM core that CPU is out of low power state */
mcpm_cpu_powered_up();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
cpu_pm_exit();
 
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -36,9 +36,9 @@ static int mvebu_v7_enter_idle(struct cp
if (drv->states[index].flags & MVEBU_V7_FLAG_DEEP_IDLE)
deepidle = true;
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
ret = mvebu_v7_cpu_suspend(deepidle);
-   ct_idle_exit();
+  

[PATCH v2 17/44] objtool/idle: Validate __cpuidle code as noinstr

2022-09-19 Thread Peter Zijlstra
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Geert Uytterhoeven 
---
 arch/alpha/kernel/vmlinux.lds.S  |1 -
 arch/arc/kernel/vmlinux.lds.S|1 -
 arch/arm/include/asm/vmlinux.lds.h   |1 -
 arch/arm64/kernel/vmlinux.lds.S  |1 -
 arch/csky/kernel/vmlinux.lds.S   |1 -
 arch/hexagon/kernel/vmlinux.lds.S|1 -
 arch/ia64/kernel/vmlinux.lds.S   |1 -
 arch/loongarch/kernel/vmlinux.lds.S  |1 -
 arch/m68k/kernel/vmlinux-nommu.lds   |1 -
 arch/m68k/kernel/vmlinux-std.lds |1 -
 arch/m68k/kernel/vmlinux-sun3.lds|1 -
 arch/microblaze/kernel/vmlinux.lds.S |1 -
 arch/mips/kernel/vmlinux.lds.S   |1 -
 arch/nios2/kernel/vmlinux.lds.S  |1 -
 arch/openrisc/kernel/vmlinux.lds.S   |1 -
 arch/parisc/kernel/vmlinux.lds.S |1 -
 arch/powerpc/kernel/vmlinux.lds.S|1 -
 arch/riscv/kernel/vmlinux-xip.lds.S  |1 -
 arch/riscv/kernel/vmlinux.lds.S  |1 -
 arch/s390/kernel/vmlinux.lds.S   |1 -
 arch/sh/kernel/vmlinux.lds.S |1 -
 arch/sparc/kernel/vmlinux.lds.S  |1 -
 arch/um/kernel/dyn.lds.S |1 -
 arch/um/kernel/uml.lds.S |1 -
 arch/x86/include/asm/irqflags.h  |   11 ---
 arch/x86/include/asm/mwait.h |2 +-
 arch/x86/kernel/vmlinux.lds.S|1 -
 arch/xtensa/kernel/vmlinux.lds.S |1 -
 include/asm-generic/vmlinux.lds.h|9 +++--
 include/linux/compiler_types.h   |8 ++--
 include/linux/cpu.h  |3 ---
 tools/objtool/check.c|   13 +
 32 files changed, 27 insertions(+), 45 deletions(-)

--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -27,7 +27,6 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -85,7 +85,6 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -96,7 +96,6 @@
SOFTIRQENTRY_TEXT   \
TEXT_TEXT   \
SCHED_TEXT  \
-   CPUIDLE_TEXT\
LOCK_TEXT   \
KPROBES_TEXT\
ARM_STUBS_TEXT  \
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -163,7 +163,6 @@ SECTIONS
ENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
HYPERVISOR_TEXT
--- a/arch/csky/kernel/vmlinux.lds.S
+++ b/arch/csky/kernel/vmlinux.lds.S
@@ -38,7 +38,6 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -41,7 +41,6 @@ SECTIONS
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -51,7 +51,6 @@ SECTIONS {
__end_ivt_text = .;
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
--- a/arch/loongarch/kernel/vmlinux.lds.S
+++ b/arch/loongarch/kernel/vmlinux.lds.S
@@ -41,7 +41,6 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -48,7 +48,6 @@ SECTIONS {
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
. = ALIGN(16);
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -19,7 +19,6 @@ SECTIONS
IRQENTRY_TEXT
SOFTIRQEN

[PATCH v2 43/44] sched: Always inline __this_cpu_preempt_check()

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: in_entry_stack+0x9: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: default_do_nmi+0x10: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x41: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: kvm_read_and_reset_apf_flags+0x1: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_on+0xb0: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_off+0xae: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_nmi_enter+0x69: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_nmi_exit+0x32: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0x9: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0x43: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0x45: call to 
__this_cpu_preempt_check() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/percpu-defs.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -310,7 +310,7 @@ extern void __bad_size_call_parameter(vo
 #ifdef CONFIG_DEBUG_PREEMPT
 extern void __this_cpu_preempt_check(const char *op);
 #else
-static inline void __this_cpu_preempt_check(const char *op) { }
+static __always_inline void __this_cpu_preempt_check(const char *op) { }
 #endif
 
 #define __pcpu_size_call_return(stem, variable)                        \




[PATCH v2 20/44] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IBRS

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() 
leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/kernel/cpu/bugs.c |2 +-
 drivers/idle/intel_idle.c  |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -79,7 +79,7 @@ void write_spec_ctrl_current(u64 val, bo
wrmsrl(MSR_IA32_SPEC_CTRL, val);
 }
 
-u64 spec_ctrl_current(void)
+noinstr u64 spec_ctrl_current(void)
 {
return this_cpu_read(x86_spec_ctrl_current);
 }
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -181,12 +181,12 @@ static __cpuidle int intel_idle_ibrs(str
int ret;
 
if (smt_active)
-   wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+   native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
 
ret = __intel_idle(dev, drv, index);
 
if (smt_active)
-   wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+   native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
 
return ret;
 }




[PATCH v2 28/44] cpuidle,mwait: Make noinstr clean

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: intel_idle_s2idle+0x6e: call to 
__monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x8c: call to 
__monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x73: call to __monitor.constprop.0() 
leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0x88: call to clflush() leaves 
.noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/mwait.h |   12 ++--
 arch/x86/include/asm/special_insns.h |2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -25,7 +25,7 @@
 #define TPAUSE_C01_STATE   1
 #define TPAUSE_C02_STATE   0
 
-static inline void __monitor(const void *eax, unsigned long ecx,
+static __always_inline void __monitor(const void *eax, unsigned long ecx,
 unsigned long edx)
 {
/* "monitor %eax, %ecx, %edx;" */
@@ -33,7 +33,7 @@ static inline void __monitor(const void
 :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
-static inline void __monitorx(const void *eax, unsigned long ecx,
+static __always_inline void __monitorx(const void *eax, unsigned long ecx,
  unsigned long edx)
 {
/* "monitorx %eax, %ecx, %edx;" */
@@ -41,7 +41,7 @@ static inline void __monitorx(const void
 :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
-static inline void __mwait(unsigned long eax, unsigned long ecx)
+static __always_inline void __mwait(unsigned long eax, unsigned long ecx)
 {
mds_idle_clear_cpu_buffers();
 
@@ -76,8 +76,8 @@ static inline void __mwait(unsigned long
  * EAX (logical) address to monitor
  * ECX #GP if not zero
  */
-static inline void __mwaitx(unsigned long eax, unsigned long ebx,
-   unsigned long ecx)
+static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
+unsigned long ecx)
 {
/* No MDS buffer clear as this is AMD/HYGON only */
 
@@ -86,7 +86,7 @@ static inline void __mwaitx(unsigned lon
 :: "a" (eax), "b" (ebx), "c" (ecx));
 }
 
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
+static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
mds_idle_clear_cpu_buffers();
/* "mwait %eax, %ecx;" */
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -196,7 +196,7 @@ static inline void load_gs_index(unsigne
 
 #endif /* CONFIG_PARAVIRT_XXL */
 
-static inline void clflush(volatile void *__p)
+static __always_inline void clflush(volatile void *__p)
 {
asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
 }




[PATCH v2 34/44] cpuidle,omap3: Use WFI for omap3_pm_idle()

2022-09-19 Thread Peter Zijlstra
arch_cpu_idle() is a very simple idle interface that exposes only a
single idle state; it is expected not to require RCU and not to do any
tracing/instrumentation.

As such, omap_sram_idle() is not a valid implementation. Replace it
with the simple (shallow) omap3_do_wfi() call, leaving the more
complicated idle states to the cpuidle driver.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Tony Lindgren 
---
 arch/arm/mach-omap2/pm34xx.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -294,7 +294,7 @@ static void omap3_pm_idle(void)
if (omap_irq_pending())
return;
 
-   omap_sram_idle();
+   omap3_do_wfi();
 }
 
 #ifdef CONFIG_SUSPEND




[PATCH v2 12/44] cpuidle,dt: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Notably: this converts all dt_init_idle_driver() and
__CPU_PM_CPU_IDLE_ENTER() users for they are inextricably intertwined.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/cpuidle34xx.c|4 ++--
 drivers/acpi/processor_idle.c|2 ++
 drivers/cpuidle/cpuidle-arm.c|1 +
 drivers/cpuidle/cpuidle-big_little.c |8 ++--
 drivers/cpuidle/cpuidle-psci.c   |1 +
 drivers/cpuidle/cpuidle-qcom-spm.c   |1 +
 drivers/cpuidle/cpuidle-riscv-sbi.c  |1 +
 drivers/cpuidle/dt_idle_states.c |2 +-
 include/linux/cpuidle.h  |4 
 9 files changed, 19 insertions(+), 5 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1200,6 +1200,8 @@ static int acpi_processor_setup_lpi_stat
state->target_residency = lpi->min_residency;
if (lpi->arch_flags)
state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+   if (lpi->entry_method == ACPI_CSTATE_FFH)
+   state->flags |= CPUIDLE_FLAG_RCU_IDLE;
state->enter = acpi_idle_lpi_enter;
drv->safe_state_index = i;
}
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -53,6 +53,7 @@ static struct cpuidle_driver arm_idle_dr
 * handler for idle state index 0.
 */
.states[0] = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = arm_enter_idle_state,
.exit_latency   = 1,
.target_residency   = 1,
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -64,7 +64,8 @@ static struct cpuidle_driver bl_idle_lit
.enter  = bl_enter_powerdown,
.exit_latency   = 700,
.target_residency   = 2500,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM little-cluster power down",
},
@@ -85,7 +86,8 @@ static struct cpuidle_driver bl_idle_big
.enter  = bl_enter_powerdown,
.exit_latency   = 500,
.target_residency   = 2000,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM big-cluster power down",
},
@@ -124,11 +126,13 @@ static int bl_enter_powerdown(struct cpu
struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
+   ct_idle_enter();
 
cpu_suspend(0, bl_powerdown_finisher);
 
/* signals the MCPM core that CPU is out of low power state */
mcpm_cpu_powered_up();
+   ct_idle_exit();
 
cpu_pm_exit();
 
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -357,6 +357,7 @@ static int psci_idle_init_cpu(struct dev
 * PSCI idle states relies on architectural WFI to be represented as
 * state index 0.
 */
+   drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
drv->states[0].enter = psci_enter_idle_state;
drv->states[0].exit_latency = 1;
drv->states[0].target_residency = 1;
--- a/drivers/cpuidle/cpuidle-qcom-spm.c
+++ b/drivers/cpuidle/cpuidle-qcom-spm.c
@@ -72,6 +72,7 @@ static struct cpuidle_driver qcom_spm_id
.owner = THIS_MODULE,
.states[0] = {
.enter  = spm_enter_idle_state,
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.exit_latency   = 1,
.target_residency   = 1,
.power_usage= UINT_MAX,
--- a/drivers/cpuidle/cpuidle-riscv-sbi.c
+++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
@@ -332,6 +332,7 @@ static int sbi_cpuidle_init_cpu(struct d
drv->cpumask = (struct cpumask *)cpumask_of(cpu);
 
/* RISC-V architectural WFI to be represented as state index 0. */
+   drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
drv->states[0].enter = sbi_cpuidle_enter_state;
drv->states[0].exit_latency = 1;
drv->states[0].target_residency = 1;
--- a/drivers/cpuidle/dt_idle_states.c
+++ b/drivers/cpuidle/dt_idle_states.c
@@ -77,7 +77,7 @@ static int init_state_node(struct cpuidl
if (err)
desc = state_node->name;
 
-   idle_state->flags = 0;
+   idle_state->flags = CPUIDLE_FLAG_RCU_IDLE;
if (of_property_read_bool(state_node, "loca

[PATCH v2 35/44] cpuidle,omap3: Push RCU-idle into omap_sram_idle()

2022-09-19 Thread Peter Zijlstra
OMAP3 uses full SoC suspend modes as idle states, as such it needs the
whole power-domain and clock-domain code from the idle path.

All that code is not suitable to run with RCU disabled, as such push
RCU-idle deeper still.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Lindgren 
Tested-by: Tony Lindgren 
---
 arch/arm/mach-omap2/cpuidle34xx.c |4 +---
 arch/arm/mach-omap2/pm.h  |2 +-
 arch/arm/mach-omap2/pm34xx.c  |   12 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,7 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   ct_cpuidle_enter();
-   omap_sram_idle();
-   ct_cpuidle_exit();
+   omap_sram_idle(true);
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -29,7 +29,7 @@ static inline int omap4_idle_init(void)
 
 extern void *omap3_secure_ram_storage;
 extern void omap3_pm_off_mode_enable(int);
-extern void omap_sram_idle(void);
+extern void omap_sram_idle(bool rcuidle);
 extern int omap_pm_clkdms_setup(struct clockdomain *clkdm, void *unused);
 
 #if defined(CONFIG_PM_OPP)
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -174,7 +175,7 @@ static int omap34xx_do_sram_idle(unsigne
return 0;
 }
 
-void omap_sram_idle(void)
+void omap_sram_idle(bool rcuidle)
 {
/* Variable to tell what needs to be saved and restored
 * in omap_sram_idle*/
@@ -254,11 +255,18 @@ void omap_sram_idle(void)
 */
if (save_state)
omap34xx_save_context(omap3_arm_context);
+
+   if (rcuidle)
+   ct_cpuidle_enter();
+
if (save_state == 1 || save_state == 3)
cpu_suspend(save_state, omap34xx_do_sram_idle);
else
omap34xx_do_sram_idle(save_state);
 
+   if (rcuidle)
+   ct_cpuidle_exit();
+
/* Restore normal SDRC POWER settings */
if (cpu_is_omap3430() && omap_rev() >= OMAP3430_REV_ES3_0 &&
(omap_type() == OMAP2_DEVICE_TYPE_EMU ||
@@ -316,7 +324,7 @@ static int omap3_pm_suspend(void)
 
omap3_intc_suspend();
 
-   omap_sram_idle();
+   omap_sram_idle(false);
 
 restore:
/* Restore next_pwrsts */




[PATCH v2 44/44] arm64,riscv,perf: Remove RCU_NONIDLE() usage

2022-09-19 Thread Peter Zijlstra
The PM notifiers should no longer be run with RCU disabled (per the
previous patches), so this hack is no longer required either.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/perf/arm_pmu.c   |   11 +--
 drivers/perf/riscv_pmu_sbi.c |8 +---
 2 files changed, 2 insertions(+), 17 deletions(-)

--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -762,17 +762,8 @@ static void cpu_pm_pmu_setup(struct arm_
case CPU_PM_ENTER_FAILED:
 /*
  * Restore and enable the counter.
- * armpmu_start() indirectly calls
- *
- * perf_event_update_userpage()
- *
- * that requires RCU read locking to be functional,
- * wrap the call within RCU_NONIDLE to make the
- * RCU subsystem aware this cpu is not idle from
- * an RCU perspective for the armpmu_start() call
- * duration.
  */
-   RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
+   armpmu_start(event, PERF_EF_RELOAD);
break;
default:
break;
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -747,14 +747,8 @@ static int riscv_pm_pmu_notify(struct no
case CPU_PM_ENTER_FAILED:
/*
 * Restore and enable the counter.
-*
-* Requires RCU read locking to be functional,
-* wrap the call within RCU_NONIDLE to make the
-* RCU subsystem aware this cpu is not idle from
-* an RCU perspective for the riscv_pmu_start() call
-* duration.
 */
-   RCU_NONIDLE(riscv_pmu_start(event, PERF_EF_RELOAD));
+   riscv_pmu_start(event, PERF_EF_RELOAD);
break;
default:
break;




[PATCH v2 01/44] x86/perf/amd: Remove tracing from perf_lopwr_cb()

2022-09-19 Thread Peter Zijlstra
perf_lopwr_cb() is called from the idle routines; RCU is not watching
there, so we must not enter tracing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/events/amd/brs.c |   13 +
 arch/x86/include/asm/perf_event.h |2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -41,18 +41,15 @@ static inline unsigned int brs_to(int id
return MSR_AMD_SAMP_BR_FROM + 2 * idx + 1;
 }
 
-static inline void set_debug_extn_cfg(u64 val)
+static __always_inline void set_debug_extn_cfg(u64 val)
 {
/* bits[4:3] must always be set to 11b */
-   wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+   __wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
 }
 
-static inline u64 get_debug_extn_cfg(void)
+static __always_inline u64 get_debug_extn_cfg(void)
 {
-   u64 val;
-
-   rdmsrl(MSR_AMD_DBG_EXTN_CFG, val);
-   return val;
+   return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
 }
 
 static bool __init amd_brs_detect(void)
@@ -338,7 +335,7 @@ void amd_pmu_brs_sched_task(struct perf_
  * called from ACPI processor_idle.c or acpi_pad.c
  * with interrupts disabled
  */
-void perf_amd_brs_lopwr_cb(bool lopwr_in)
+void noinstr perf_amd_brs_lopwr_cb(bool lopwr_in)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
union amd_debug_extn_cfg cfg;
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -554,7 +554,7 @@ extern void perf_amd_brs_lopwr_cb(bool l
 
 DECLARE_STATIC_CALL(perf_lopwr_cb, perf_amd_brs_lopwr_cb);
 
-static inline void perf_lopwr_cb(bool lopwr_in)
+static __always_inline void perf_lopwr_cb(bool lopwr_in)
 {
static_call_mod(perf_lopwr_cb)(lopwr_in);
 }




[PATCH v2 23/44] arm,smp: Remove trace_.*_rcuidle() usage

2022-09-19 Thread Peter Zijlstra
None of these functions should ever be run with RCU disabled anymore.

Specifically, do_handle_IPI() is only called from handle_IPI() which
explicitly does irq_enter()/irq_exit() which ensures RCU is watching.

The problem with smp_cross_call() was, per commit 7c64cc0531fa ("arm: Use
_rcuidle for smp_cross_call() tracepoints"), that
cpuidle_enter_state_coupled() already had RCU disabled, but that's
long been fixed by commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
deeper into the idle path").

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/kernel/smp.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -639,7 +639,7 @@ static void do_handle_IPI(int ipinr)
unsigned int cpu = smp_processor_id();
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
+   trace_ipi_entry(ipi_types[ipinr]);
 
switch (ipinr) {
case IPI_WAKEUP:
@@ -686,7 +686,7 @@ static void do_handle_IPI(int ipinr)
}
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
+   trace_ipi_exit(ipi_types[ipinr]);
 }
 
 /* Legacy version, should go away once all irqchips have been converted */
@@ -709,7 +709,7 @@ static irqreturn_t ipi_handler(int irq,
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 {
-   trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
+   trace_ipi_raise(target, ipi_types[ipinr]);
__ipi_send_mask(ipi_desc[ipinr], target);
 }
 




[PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
+   ct_idle_enter();
cpu_suspend(0, imx6sx_idle_finish);
+   ct_idle_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
@@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
 */
.exit_latency = 300,
.target_residency = 500,
-   .flags = CPUIDLE_FLAG_TIMER_STOP,
+   .flags = CPUIDLE_FLAG_TIMER_STOP |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = imx6sx_enter_wait,
.name = "LOW-POWER-IDLE",
.desc = "ARM power off",




[PATCH v2 42/44] entry,kasan,x86: Disallow overriding mem*() functions

2022-09-19 Thread Peter Zijlstra
KASAN cannot just hijack the mem*() functions; it needs to emit
__asan_mem*() variants if it wants instrumentation (other sanitizers
already do this).

vmlinux.o: warning: objtool: sync_regs+0x24: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: vc_switch_off_ist+0xbe: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret+0x36: call to memset() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_get_ghcb+0xa0: call to memcpy() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_put_ghcb+0x35: call to memcpy() leaves .noinstr.text section

Remove the weak aliases to ensure nobody hijacks these functions and
add them to the noinstr section.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/lib/memcpy_64.S  |5 ++---
 arch/x86/lib/memmove_64.S |4 +++-
 arch/x86/lib/memset_64.S  |4 +++-
 mm/kasan/kasan.h  |4 
 mm/kasan/shadow.c |   38 ++
 tools/objtool/check.c |3 +++
 6 files changed, 53 insertions(+), 5 deletions(-)

--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-.pushsection .noinstr.text, "ax"
+.section .noinstr.text, "ax"
 
 /*
  * We build a jump to memcpy_orig by default which gets NOPped out on
@@ -42,7 +42,7 @@ SYM_FUNC_START(__memcpy)
 SYM_FUNC_END(__memcpy)
 EXPORT_SYMBOL(__memcpy)
 
-SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
+SYM_FUNC_ALIAS(memcpy, __memcpy)
 EXPORT_SYMBOL(memcpy)
 
 /*
@@ -183,4 +183,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
RET
 SYM_FUNC_END(memcpy_orig)
 
-.popsection
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -13,6 +13,8 @@
 
 #undef memmove
 
+.section .noinstr.text, "ax"
+
 /*
  * Implement memmove(). This can handle overlap between src and dst.
  *
@@ -213,5 +215,5 @@ SYM_FUNC_START(__memmove)
 SYM_FUNC_END(__memmove)
 EXPORT_SYMBOL(__memmove)
 
-SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
+SYM_FUNC_ALIAS(memmove, __memmove)
 EXPORT_SYMBOL(memmove)
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+.section .noinstr.text, "ax"
+
 /*
  * ISO C memset - set a memory block to a byte value. This function uses fast
  * string to get better performance than the original function. The code is
@@ -43,7 +45,7 @@ SYM_FUNC_START(__memset)
 SYM_FUNC_END(__memset)
 EXPORT_SYMBOL(__memset)
 
-SYM_FUNC_ALIAS_WEAK(memset, __memset)
+SYM_FUNC_ALIAS(memset, __memset)
 EXPORT_SYMBOL(memset)
 
 /*
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -551,6 +551,10 @@ void __asan_set_shadow_f3(const void *ad
 void __asan_set_shadow_f5(const void *addr, size_t size);
 void __asan_set_shadow_f8(const void *addr, size_t size);
 
+void *__asan_memset(void *addr, int c, size_t len);
+void *__asan_memmove(void *dest, const void *src, size_t len);
+void *__asan_memcpy(void *dest, const void *src, size_t len);
+
 void __hwasan_load1_noabort(unsigned long addr);
 void __hwasan_store1_noabort(unsigned long addr);
 void __hwasan_load2_noabort(unsigned long addr);
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -38,6 +38,12 @@ bool __kasan_check_write(const volatile
 }
 EXPORT_SYMBOL(__kasan_check_write);
 
+#ifndef CONFIG_GENERIC_ENTRY
+/*
+ * CONFIG_GENERIC_ENTRY relies on compiler emitted mem*() calls to not be
+ * instrumented. KASAN enabled toolchains should emit __asan_mem*() functions
+ * for the sites they want to instrument.
+ */
 #undef memset
 void *memset(void *addr, int c, size_t len)
 {
@@ -68,6 +74,38 @@ void *memcpy(void *dest, const void *src
 
return __memcpy(dest, src, len);
 }
+#endif
+
+void *__asan_memset(void *addr, int c, size_t len)
+{
+   if (!kasan_check_range((unsigned long)addr, len, true, _RET_IP_))
+   return NULL;
+
+   return __memset(addr, c, len);
+}
+EXPORT_SYMBOL(__asan_memset);
+
+#ifdef __HAVE_ARCH_MEMMOVE
+void *__asan_memmove(void *dest, const void *src, size_t len)
+{
+   if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
+   !kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
+   return NULL;
+
+   return __memmove(dest, src, len);
+}
+EXPORT_SYMBOL(__asan_memmove);
+#endif
+
+void *__asan_memcpy(void *dest, const void *src, size_t len)
+{
+   if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
+   !kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
+   return NULL;
+
+   return __memcpy(dest, src, len);
+}
+EXPORT_SYMBOL(__asan_memcpy);
 
 void kasan_poison(const void *addr, size_t size, u8 value, bool init)
 {
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -956,6 +956,9 @@ static const char *uaccess_safe_builtin[
"__asan_store16_noabort",
"__kasan_check_read",
"__kasan_check_write",
+   "__asan_memset",
+   "__asan_memmove",
+   "__asan_memcpy",
/* KASAN in-line */
"__as

[PATCH v2 24/44] arm64,smp: Remove trace_.*_rcuidle() usage

2022-09-19 Thread Peter Zijlstra
Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
normal interrupts") this function is called in regular IRQ context.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Mark Rutland 
Acked-by: Marc Zyngier 
---
 arch/arm64/kernel/smp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
unsigned int cpu = smp_processor_id();
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
+   trace_ipi_entry(ipi_types[ipinr]);
 
switch (ipinr) {
case IPI_RESCHEDULE:
@@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
}
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
+   trace_ipi_exit(ipi_types[ipinr]);
 }
 
 static irqreturn_t ipi_handler(int irq, void *data)




[PATCH v2 18/44] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*

2022-09-19 Thread Peter Zijlstra
  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751ea ("cpuidle,intel_idle: Fix
CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until the splat disappears.
This straight up reverts commit d295ad34f236 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/idle/intel_idle.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -168,13 +168,7 @@ static __cpuidle int intel_idle_irq(stru
 
raw_local_irq_enable();
ret = __intel_idle(dev, drv, index);
-
-   /*
-* The lockdep hardirqs state may be changed to 'on' with timer
-* tick interrupt followed by __do_softirq(). Use local_irq_disable()
-* to keep the hardirqs state correct.
-*/
-   local_irq_disable();
+   raw_local_irq_disable();
 
return ret;
 }




[PATCH v2 26/44] time/tick-broadcast: Remove RCU_NONIDLE usage

2022-09-19 Thread Peter Zijlstra
No callers left that have already disabled RCU.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Mark Rutland 
---
 kernel/time/tick-broadcast-hrtimer.c |   29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
 * hrtimer callback function is currently running, then
 * hrtimer_start() cannot move it and the timer stays on the CPU on
 * which it is assigned at the moment.
+*/
+   hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+   /*
+* The core tick broadcast mode expects bc->bound_on to be set
+* correctly to prevent a CPU which has the broadcast hrtimer
+* armed from going deep idle.
 *
-* As this can be called from idle code, the hrtimer_start()
-* invocation has to be wrapped with RCU_NONIDLE() as
-* hrtimer_start() can call into tracing.
+* As tick_broadcast_lock is held, nothing can change the cpu
+* base which was just established in hrtimer_start() above. So
+* the below access is safe even without holding the hrtimer
+* base lock.
 */
-   RCU_NONIDLE( {
-   hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
-   /*
-* The core tick broadcast mode expects bc->bound_on to be set
-* correctly to prevent a CPU which has the broadcast hrtimer
-* armed from going deep idle.
-*
-* As tick_broadcast_lock is held, nothing can change the cpu
-* base which was just established in hrtimer_start() above. So
-* the below access is safe even without holding the hrtimer
-* base lock.
-*/
-   bc->bound_on = bctimer.base->cpu_base->cpu;
-   } );
+   bc->bound_on = bctimer.base->cpu_base->cpu;
+
return 0;
 }
 




[PATCH v2 00/44] cpuidle,rcu: Clean up the mess

2022-09-19 Thread Peter Zijlstra
Hi All!

At long last, a respin of the cpuidle vs rcu cleanup patches.

v1: https://lkml.kernel.org/r/20220608142723.103523...@infradead.org

These here patches clean up the mess that is cpuidle vs rcuidle.

At the end of the ride there's only one RCU_NONIDLE user left:

  arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());

and 'one' trace_*_rcuidle() user:

  kernel/trace/trace_preemptirq.c:
trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
  kernel/trace/trace_preemptirq.c:
trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
  kernel/trace/trace_preemptirq.c:
trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
  kernel/trace/trace_preemptirq.c:
trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
  kernel/trace/trace_preemptirq.c:
trace_preempt_enable_rcuidle(a0, a1);
  kernel/trace/trace_preemptirq.c:
trace_preempt_disable_rcuidle(a0, a1);

However this last is all in deprecated code that should be unused for 
GENERIC_ENTRY.

I've touched a lot of code that I can't test and I might've broken something by
accident. In particular the whole ARM cpuidle stuff was quite involved.

Please all; have a look where you haven't already.


New since v1:

 - rebase on top of Frederic's rcu-context-tracking rename fest
 - more omap goodness as per the last discussion (thanks Tony!)
 - removed one more RCU_NONIDLE() from arm64/risc-v perf code
 - ubsan/kasan fixes
 - intel_idle module-param for testing
 - a bunch of extra __always_inline, because compilers are silly.

---
 arch/alpha/kernel/process.c   |  1 -
 arch/alpha/kernel/vmlinux.lds.S   |  1 -
 arch/arc/kernel/process.c |  3 ++
 arch/arc/kernel/vmlinux.lds.S |  1 -
 arch/arm/include/asm/vmlinux.lds.h|  1 -
 arch/arm/kernel/process.c |  1 -
 arch/arm/kernel/smp.c |  6 +--
 arch/arm/mach-gemini/board-dt.c   |  3 +-
 arch/arm/mach-imx/cpuidle-imx6q.c |  4 +-
 arch/arm/mach-imx/cpuidle-imx6sx.c|  5 ++-
 arch/arm/mach-omap2/common.h  |  6 ++-
 arch/arm/mach-omap2/cpuidle34xx.c | 16 +++-
 arch/arm/mach-omap2/cpuidle44xx.c | 29 +++---
 arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +-
 arch/arm/mach-omap2/pm.h  |  2 +-
 arch/arm/mach-omap2/pm24xx.c  | 51 +---
 arch/arm/mach-omap2/pm34xx.c  | 14 +--
 arch/arm/mach-omap2/pm44xx.c  |  2 +-
 arch/arm/mach-omap2/powerdomain.c | 10 ++---
 arch/arm64/kernel/idle.c  |  1 -
 arch/arm64/kernel/smp.c   |  4 +-
 arch/arm64/kernel/vmlinux.lds.S   |  1 -
 arch/csky/kernel/process.c|  1 -
 arch/csky/kernel/smp.c|  2 +-
 arch/csky/kernel/vmlinux.lds.S|  1 -
 arch/hexagon/kernel/process.c |  1 -
 arch/hexagon/kernel/vmlinux.lds.S |  1 -
 arch/ia64/kernel/process.c|  1 +
 arch/ia64/kernel/vmlinux.lds.S|  1 -
 arch/loongarch/kernel/idle.c  |  1 +
 arch/loongarch/kernel/vmlinux.lds.S   |  1 -
 arch/m68k/kernel/vmlinux-nommu.lds|  1 -
 arch/m68k/kernel/vmlinux-std.lds  |  1 -
 arch/m68k/kernel/vmlinux-sun3.lds |  1 -
 arch/microblaze/kernel/process.c  |  1 -
 arch/microblaze/kernel/vmlinux.lds.S  |  1 -
 arch/mips/kernel/idle.c   |  8 ++--
 arch/mips/kernel/vmlinux.lds.S|  1 -
 arch/nios2/kernel/process.c   |  1 -
 arch/nios2/kernel/vmlinux.lds.S   |  1 -
 arch/openrisc/kernel/process.c|  1 +
 arch/openrisc/kernel/vmlinux.lds.S|  1 -
 arch/parisc/kernel/process.c  |  2 -
 arch/parisc/kernel/vmlinux.lds.S  |  1 -
 arch/powerpc/kernel/idle.c|  5 +--
 arch/powerpc/kernel/vmlinux.lds.S |  1 -
 arch/riscv/kernel/process.c   |  1 -
 arch/riscv/kernel/vmlinux-xip.lds.S   |  1 -
 arch/riscv/kernel/vmlinux.lds.S   |  1 -
 arch/s390/kernel/idle.c   |  1 -
 arch/s390/kernel/vmlinux.lds.S|  1 -
 arch/sh/kernel/idle.c |  1 +
 arch/sh/kernel/vmlinux.lds.S  |  1 -
 arch/sparc/kernel/leon_pmc.c  |  4 ++
 arch/sparc/kernel/process_32.c|  1 -
 arch/sparc/kernel/process_64.c|  3 +-
 arch/sparc/kernel/vmlinux.lds.S   |  1 -
 arch/um/kernel/dyn.lds.S  |  1 -
 arch/um/kernel/process.c  |  1 -
 arch/um/kernel/uml.lds.S  |  1 -
 arch/x86/boot/compressed/vmlinux.lds.S|  1 +
 arch/x86/coco/tdx/tdcall.S| 15 +--
 arch/x86/coco/tdx/tdx.c   | 25 
 arch/x86/events/amd/brs.c | 13 +++
 arch/x86/include/asm/fpu/xcr.h|  4 +-
 ar

[PATCH v2 40/44] ubsan: Fix objtool UACCESS warns

2022-09-19 Thread Peter Zijlstra
clang-14 allyesconfig gives:

vmlinux.o: warning: objtool: emulator_cmpxchg_emulated+0x705: call to __ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: paging64_update_accessed_dirty_bits+0x39e: call to __ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: paging32_update_accessed_dirty_bits+0x390: call to __ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: ept_update_accessed_dirty_bits+0x43f: call to __ubsan_handle_load_invalid_value() with UACCESS enabled

Add the required eflags save/restore and whitelist the thing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 lib/ubsan.c   |5 -
 tools/objtool/check.c |1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -340,9 +340,10 @@ void __ubsan_handle_load_invalid_value(v
 {
struct invalid_value_data *data = _data;
char val_str[VALUE_LENGTH];
+   unsigned long ua_flags = user_access_save();
 
if (suppress_report(&data->location))
-   return;
+   goto out;
 
ubsan_prologue(&data->location, "invalid-load");
 
@@ -352,6 +353,8 @@ void __ubsan_handle_load_invalid_value(v
val_str, data->type->type_name);
 
ubsan_epilogue();
+out:
+   user_access_restore(ua_flags);
 }
 EXPORT_SYMBOL(__ubsan_handle_load_invalid_value);
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1068,6 +1068,7 @@ static const char *uaccess_safe_builtin[
"__ubsan_handle_type_mismatch",
"__ubsan_handle_type_mismatch_v1",
"__ubsan_handle_shift_out_of_bounds",
+   "__ubsan_handle_load_invalid_value",
/* misc */
"csum_partial_copy_generic",
"copy_mc_fragile",




[PATCH v2 19/44] cpuidle,intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE

2022-09-19 Thread Peter Zijlstra
vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/fpu/xcr.h   |4 ++--
 arch/x86/include/asm/special_insns.h |2 +-
 arch/x86/kernel/fpu/core.c   |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/fpu/xcr.h
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -5,7 +5,7 @@
 #define XCR_XFEATURE_ENABLED_MASK  0x
 #define XCR_XFEATURE_IN_USE_MASK   0x0001
 
-static inline u64 xgetbv(u32 index)
+static __always_inline u64 xgetbv(u32 index)
 {
u32 eax, edx;
 
@@ -27,7 +27,7 @@ static inline void xsetbv(u32 index, u64
  *
  * Callers should check X86_FEATURE_XGETBV1.
  */
-static inline u64 xfeatures_in_use(void)
+static __always_inline u64 xfeatures_in_use(void)
 {
return xgetbv(XCR_XFEATURE_IN_USE_MASK);
 }
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -295,7 +295,7 @@ static inline int enqcmds(void __iomem *
return 0;
 }
 
-static inline void tile_release(void)
+static __always_inline void tile_release(void)
 {
/*
 * Instruction opcode for TILERELEASE; supported in binutils
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -856,12 +856,12 @@ int fpu__exception_code(struct fpu *fpu,
  * Initialize register state that may prevent from entering low-power idle.
  * This function will be invoked from the cpuidle driver only when needed.
  */
-void fpu_idle_fpregs(void)
+noinstr void fpu_idle_fpregs(void)
 {
/* Note: AMX_TILE being enabled implies XGETBV1 support */
if (cpu_feature_enabled(X86_FEATURE_AMX_TILE) &&
(xfeatures_in_use() & XFEATURE_MASK_XTILE)) {
tile_release();
-   fpregs_deactivate(¤t->thread.fpu);
+   __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
 }




[PATCH v2 41/44] intel_idle: Add force_irq_on module param

2022-09-19 Thread Peter Zijlstra
For testing purposes.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/idle/intel_idle.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1787,6 +1787,9 @@ static bool __init intel_idle_verify_cst
return true;
 }
 
+static bool force_irq_on __read_mostly;
+module_param(force_irq_on, bool, 0444);
+
 static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 {
int cstate;
@@ -1838,8 +1841,10 @@ static void __init intel_idle_init_cstat
/* Structure copy. */
drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
-   if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE)
+   if ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on) {
+   printk("intel_idle: forced intel_idle_irq for state %d\n", cstate);
drv->states[drv->state_count].enter = intel_idle_irq;
+   }
 
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) {




[PATCH v2 04/44] cpuidle: Move IRQ state validation

2022-09-19 Thread Peter Zijlstra
Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -236,7 +236,11 @@ int cpuidle_enter_state(struct cpuidle_d
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
ct_idle_enter();
+
entered_state = target_state->enter(dev, drv, index);
+   if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", 
target_state->enter))
+   raw_local_irq_disable();
+
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
ct_idle_exit();
start_critical_timings();
@@ -248,12 +252,8 @@ int cpuidle_enter_state(struct cpuidle_d
/* The cpu is no longer idle or about to enter idle. */
sched_idle_set_state(NULL);
 
-   if (broadcast) {
-   if (WARN_ON_ONCE(!irqs_disabled()))
-   local_irq_disable();
-
+   if (broadcast)
tick_broadcast_exit();
-   }
 
if (!cpuidle_state_is_coupled(drv, index))
local_irq_enable();




[PATCH v2 21/44] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-09-19 Thread Peter Zijlstra
Currently, arch_cpu_idle() is called with IRQs disabled, but returns
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Mark Rutland  [arm64]
Acked-by: Rafael J. Wysocki 
---
 arch/alpha/kernel/process.c  |1 -
 arch/arc/kernel/process.c|3 +++
 arch/arm/kernel/process.c|1 -
 arch/arm/mach-gemini/board-dt.c  |3 ++-
 arch/arm64/kernel/idle.c |1 -
 arch/csky/kernel/process.c   |1 -
 arch/csky/kernel/smp.c   |2 +-
 arch/hexagon/kernel/process.c|1 -
 arch/ia64/kernel/process.c   |1 +
 arch/loongarch/kernel/idle.c |1 +
 arch/microblaze/kernel/process.c |1 -
 arch/mips/kernel/idle.c  |8 +++-
 arch/nios2/kernel/process.c  |1 -
 arch/openrisc/kernel/process.c   |1 +
 arch/parisc/kernel/process.c |2 --
 arch/powerpc/kernel/idle.c   |5 ++---
 arch/riscv/kernel/process.c  |1 -
 arch/s390/kernel/idle.c  |1 -
 arch/sh/kernel/idle.c|1 +
 arch/sparc/kernel/leon_pmc.c |4 
 arch/sparc/kernel/process_32.c   |1 -
 arch/sparc/kernel/process_64.c   |3 ++-
 arch/um/kernel/process.c |1 -
 arch/x86/coco/tdx/tdx.c  |3 +++
 arch/x86/kernel/process.c|   15 ---
 arch/xtensa/kernel/process.c |1 +
 kernel/sched/idle.c  |2 --
 27 files changed, 29 insertions(+), 37 deletions(-)

--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
wtint(0);
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
"sleep %0   \n"
:
:"I"(arg)); /* can't be "r" has to be embedded const */
+
+   raw_local_irq_disable();
 }
 
 #else  /* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
/* sleep, but enable both set E1/E2 (levels of interrupts) before 
committing */
__asm__ __volatile__("sleep 0x3 \n");
+   raw_local_irq_disable();
 }
 
 #endif
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -78,7 +78,6 @@ void arch_cpu_idle(void)
arm_pm_idle();
else
cpu_do_idle();
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 */
 
/* FIXME: Enabling interrupts here is racy! */
-   local_irq_enable();
+   raw_local_irq_enable();
cpu_do_idle();
+   raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
 * tricks
 */
cpu_do_idle();
-   raw_local_irq_enable();
 }
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -100,6 +100,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
asm volatile("stop\n");
 #endif
-   raw_local_irq_enable();
 }
 #endif
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -309,7 +309,7 @@ void arch_cpu_idle_dead(void)
while (!secondary_stack)
arch_cpu_idle();
 
-   local_irq_disable();
+   raw_local_irq_disable();
 
asm volatile(
"movsp, %0\n"
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,6 @@ void arch_cpu_idle(void)
 {
__vmwait();
/*  interrupts wake us up, but irqs are still disabled */
-   raw_local_irq_enable();
 }
 
 /*
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -242,6 +242,7 @@ void arch_cpu_idle(void)
(*mark_idle)(1);
 
raw_safe_halt();
+   raw_local_irq_disable();
 
if (mark_idle)
(*mark_idle)(0);
--- a/arch/loongarch/kernel/idle.c
+++ b/arch/loongarch/kernel/idle.c
@@ -13,4 +13,5 @@ void __cpuidle arch_cpu_idle(void)
 {
raw_local_irq_enable();
__arch_cpu_idle(); /* idle instruction needs irq enabled */
+   raw_local_irq_disable();
 }
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -140,5 +140,4 @@ int dump_fpu(struct pt_regs *regs, elf_f
 
 void arch_cpu_idle(void)
 {
-   raw_local_irq_enable();
 }
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,13 +33,13 @@ static void __cpuid

Re: [PATCH v2 30/44] cpuidle,xenpv: Make more PARAVIRT_XXL noinstr clean

2022-09-19 Thread Juergen Gross

On 19.09.22 12:00, Peter Zijlstra wrote:

vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section
vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section
vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Srivatsa S. Bhat (VMware) 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2 05/44] cpuidle,riscv: Push RCU-idle into driver

2022-09-19 Thread Anup Patel
On Mon, Sep 19, 2022 at 3:47 PM Peter Zijlstra  wrote:
>
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
>
> Signed-off-by: Peter Zijlstra (Intel) 

Looks good to me.

For RISC-V cpuidle:
Reviewed-by: Anup Patel 

Regards,
Anup


> ---
>  drivers/cpuidle/cpuidle-riscv-sbi.c |9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> @@ -116,12 +116,12 @@ static int __sbi_enter_domain_idle_state
> return -1;
>
> /* Do runtime PM to manage a hierarchical CPU toplogy. */
> -   ct_irq_enter_irqson();
> if (s2idle)
> dev_pm_genpd_suspend(pd_dev);
> else
> pm_runtime_put_sync_suspend(pd_dev);
> -   ct_irq_exit_irqson();
> +
> +   ct_idle_enter();
>
> if (sbi_is_domain_state_available())
> state = sbi_get_domain_state();
> @@ -130,12 +130,12 @@ static int __sbi_enter_domain_idle_state
>
> ret = sbi_suspend(state) ? -1 : idx;
>
> -   ct_irq_enter_irqson();
> +   ct_idle_exit();
> +
> if (s2idle)
> dev_pm_genpd_resume(pd_dev);
> else
> pm_runtime_get_sync(pd_dev);
> -   ct_irq_exit_irqson();
>
> cpu_pm_exit();
>
> @@ -246,6 +246,7 @@ static int sbi_dt_cpu_init_topology(stru
>  * of a shared state for the domain, assumes the domain states are all
>  * deeper states.
>  */
> +   drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
> drv->states[state_count - 1].enter = sbi_enter_domain_idle_state;
> drv->states[state_count - 1].enter_s2idle =
> sbi_enter_s2idle_domain_idle_state;
>
>


Re: [PATCH v2 03/44] cpuidle/poll: Ensure IRQ state is invariant

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:42AM +0200, Peter Zijlstra wrote:
> cpuidle_state::enter() methods should be IRQ invariant

Got a bit confused by the invariant thing, since the first chunk I
see in this patch is a conversion to a non-traceable local_irq_enable().

Maybe just add a short mention about that and why?

Thanks.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Rafael J. Wysocki 
> ---
>  drivers/cpuidle/poll_state.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -17,7 +17,7 @@ static int __cpuidle poll_idle(struct cp
>  
>   dev->poll_time_limit = false;
>  
> - local_irq_enable();
> + raw_local_irq_enable();
>   if (!current_set_polling_and_test()) {
>   unsigned int loop_count = 0;
>   u64 limit;
> @@ -36,6 +36,8 @@ static int __cpuidle poll_idle(struct cp
>   }
>   }
>   }
> + raw_local_irq_disable();
> +
>   current_clr_polling();
>  
>   return index;
> 
> 


Re: [PATCH v2 05/44] cpuidle,riscv: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:44AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 06/44] cpuidle,tegra: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:45AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 07/44] cpuidle,psci: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:46AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:47AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.

Hmm, what ends up calling RCU_NONIDLE() here? Also what about
cpu_do_idle()?

Thanks.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- a/arch/arm/mach-imx/cpuidle-imx6sx.c
> +++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
> @@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
>   cpu_pm_enter();
>   cpu_cluster_pm_enter();
>  
> + ct_idle_enter();
>   cpu_suspend(0, imx6sx_idle_finish);
> + ct_idle_exit();
>  
>   cpu_cluster_pm_exit();
>   cpu_pm_exit();
> @@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
>*/
>   .exit_latency = 300,
>   .target_residency = 500,
> - .flags = CPUIDLE_FLAG_TIMER_STOP,
> + .flags = CPUIDLE_FLAG_TIMER_STOP |
> +  CPUIDLE_FLAG_RCU_IDLE,
>   .enter = imx6sx_enter_wait,
>   .name = "LOW-POWER-IDLE",
>   .desc = "ARM power off",
> 
> 


RE: [PATCH] powerpc/83xx: update kmeter1 defconfig and dts

2022-09-19 Thread Holger Brunck
> > Le 16/12/2019 à 10:50, Holger Brunck a écrit :
> >> From: Matteo Ghidoni 
> >>
> >> The defconfig is synchronized and the missing MTD_PHYSMAP,
> DEVTMPFS
> >> and I2C MUX support are switched on.
> >>
> >> Additionally the I2C mux device is added to the DTS with its attached
> >> temperature sensors and I2C clock frequency is lowered.
> >
> > This patch doesn't apply.
> >
> > Is it still relevant ?
> 
> If so it should be split into two patches.
> 

Ok, then you can abandon this one. If I find the time I will split it up,
rebase it, and send it as a new patch series with two patches.

Best regards
Holger



Re: [PATCH v2 09/44] cpuidle,omap3: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:48AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again before going idle is daft.

That doesn't say where those calls are, though.

Thanks.


Re: [PATCH v2 10/44] cpuidle,armada: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:49AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again before going idle is daft.

Ah wait, now I see: the culprits are cpu_pm_enter()/cpu_pm_exit() ->
cpu_pm_notify*().
Might be worth adding a short note about that in your changelogs.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  drivers/cpuidle/cpuidle-mvebu-v7.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> --- a/drivers/cpuidle/cpuidle-mvebu-v7.c
> +++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
> @@ -36,7 +36,10 @@ static int mvebu_v7_enter_idle(struct cp
>   if (drv->states[index].flags & MVEBU_V7_FLAG_DEEP_IDLE)
>   deepidle = true;
>  
> + ct_idle_enter();
>   ret = mvebu_v7_cpu_suspend(deepidle);
> + ct_idle_exit();

And then yes of course:

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 09/44] cpuidle,omap3: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:48AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again before going idle is daft.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Tony Lindgren 
> Tested-by: Tony Lindgren 

Ok, now with the cpu_pm_*() information that makes sense:

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-19 Thread Frederic Weisbecker
On Mon, Sep 19, 2022 at 11:59:47AM +0200, Peter Zijlstra wrote:
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- a/arch/arm/mach-imx/cpuidle-imx6sx.c
> +++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
> @@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
>   cpu_pm_enter();
>   cpu_cluster_pm_enter();
>  
> + ct_idle_enter();
>   cpu_suspend(0, imx6sx_idle_finish);
> + ct_idle_exit();
>  
>   cpu_cluster_pm_exit();
>   cpu_pm_exit();
> @@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
>*/
>   .exit_latency = 300,
>   .target_residency = 500,
> - .flags = CPUIDLE_FLAG_TIMER_STOP,
> + .flags = CPUIDLE_FLAG_TIMER_STOP |
> +  CPUIDLE_FLAG_RCU_IDLE,
>   .enter = imx6sx_enter_wait,

There is a second one below that also uses imx6sx_enter_wait.

Thanks.

>   .name = "LOW-POWER-IDLE",
>   .desc = "ARM power off",
> 
> 


Re: [PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
On Mon, Sep 19, 2022 at 04:21:23PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 11:59:47AM +0200, Peter Zijlstra wrote:
> > Doing RCU-idle outside the driver, only to then temporarily enable it
> > again, at least twice, before going idle is daft.
> 
> Hmm, what ends up calling RCU_NONIDLE() here? Also what about
> cpu_do_idle()?

Both cpu_pm_enter() and cpu_cluster_pm_enter() use ct_irq_enter_irqson()
which is another way to spell RCU_NONIDLE().



Re: [PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-19 Thread Peter Zijlstra
On Mon, Sep 19, 2022 at 04:49:41PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 11:59:47AM +0200, Peter Zijlstra wrote:
> > Doing RCU-idle outside the driver, only to then temporarily enable it
> > again, at least twice, before going idle is daft.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > --- a/arch/arm/mach-imx/cpuidle-imx6sx.c
> > +++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
> > @@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
> > cpu_pm_enter();
> > cpu_cluster_pm_enter();
> >  
> > +   ct_idle_enter();
> > cpu_suspend(0, imx6sx_idle_finish);
> > +   ct_idle_exit();
> >  
> > cpu_cluster_pm_exit();
> > cpu_pm_exit();
> > @@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
> >  */
> > .exit_latency = 300,
> > .target_residency = 500,
> > -   .flags = CPUIDLE_FLAG_TIMER_STOP,
> > +   .flags = CPUIDLE_FLAG_TIMER_STOP |
> > +CPUIDLE_FLAG_RCU_IDLE,
> > .enter = imx6sx_enter_wait,
> 
> There is a second one below that also uses imx6sx_enter_wait.

Duh, thanks!

