Re: [PATCH v2] serial/pmac_zilog: Remove flawed mitigation for rx irq flood

2024-04-05 Thread Finn Thain
On Fri, 5 Apr 2024, Michael Ellerman wrote:

> I assume you have tested this on an actual pmac, as well as qemu?
> 

I tested the patched driver and its console functionality using Zilog SCC 
hardware in a Mac IIci, as well as QEMU's q800 virtual machine.

That should suffice from a code coverage point-of-view, since 
pmz_receive_chars() is portable and independent of CONFIG_PPC_PMAC.

Moreover, I don't know how to get my PowerMac G3 to execute the kludge 
that's to be removed here. I can't prove it's impossible, though.


Re: [PATCH v3] scsi: sg: Avoid race in error handling & drop bogus warn

2024-04-05 Thread Martin K. Petersen
On Mon, 01 Apr 2024 21:10:38 +0200, Alexander Wetzel wrote:

> commit 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race")
> introduced an incorrect WARN_ON_ONCE() and missed a sequence where
> sg_device_destroy() was used after scsi_device_put().
> 
> sg_device_destroy() is accessing the parent scsi_device request_queue which
> will already be set to NULL when the preceding call to scsi_device_put()
> removed the last reference to the parent scsi_device.
> 
> [...]

Applied to 6.9/scsi-fixes, thanks!

[1/1] scsi: sg: Avoid race in error handling & drop bogus warn
  https://git.kernel.org/mkp/scsi/c/d4e655c49f47

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2

2024-04-05 Thread Peter Xu
On Fri, Apr 05, 2024 at 03:16:33PM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 04, 2024 at 05:48:03PM -0400, Peter Xu wrote:
> > On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> > > The more I look at this the more I think we need to get to Matthew's
> > > idea of having some kind of generic page table API that is not tightly
> > > tied to level. Replacing the hugetlb trick of 'everything is a PTE'
> > > with 5 special cases in every place seems just horrible.
> > > 
> > >struct mm_walk_ops {
> > >int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk 
> > > *walk);
> > >}
> > > 
> > > And many cases really want something like:
> > >struct mm_walk_state state;
> > > 
> > >if (!mm_walk_seek_leaf(state, mm, address))
> > >   goto no_present
> > >if (mm_walk_is_write(state)) ..
> > > 
> > > And detailed walking:
> > >for_each_pt_leaf(state, mm, address) {
> > >if (mm_walk_is_write(state)) ..
> > >}
> > > 
> > > Replacing it with a mm_walk_state that retains the level or otherwise
> > > to allow decoding any entry composes a lot better. Forced Loop
> > > unrolling can get back to the current code gen in a lot of places.
> > > 
> > > It also makes the power stuff a bit nicer as the mm_walk_state could
> > > automatically retain back pointers to the higher levels in the state
> > > struct too...
> > > 
> > > The puzzle is how to do it and still get reasonable efficient codegen,
> > > many operations are going to end up switching on some state->level to
> > > know how to decode the entry.
> > 
> > These discussions are definitely constructive, thanks Jason.  Very helpful.
> > 
> > I thought about this last week but got interrupted.  It does make sense to
> > me; it looks pretty generic and it is flexible enough as a top design.  At
> > least that's what I thought.
> 
> Yeah, exactly..
> 
> > However now when I rethink about it, and look more into the code when I got
> > the chance, it turns out this will be a major rewrite of mostly every
> > walkers..  
> 
> Indeed, it is why it may not be reasonable.
> 
> > Consider that what we (or.. I) want to teach the pXd layers are two things
> > right now: (1) hugetlb mappings (2) MMIO (PFN) mappings.  That mostly
> > shares the generic concept when working on the mm walkers no matter which
> > way to go, just different treatment on different type of mem.  (2) is on
> > top of current code and new stuff, while (1) is a refactoring to drop
> > hugetlb_entry() hook point as the goal.
> 
> Right, I view this as a two pronged attack
> 
> On one front you teach the generic pXX_* macros to process huge pages
> and push that around to the performance walkers like GUP
> 
> On another front you want to replace use of the hugepte with the new
> walkers.
> 
> The challenge with the hugepte code is that it is all structured to
> assume one API that works at all levels and that may be a hard fit to
> replace with pXX_* functions.

That's the core of the problem; otherwise I feel like I might be doing
something else already.  I have a feeling that even if it's currently
efficient for hugetlb, we'll drop that sooner or later.

The issue is that, at least, the hugetlb pgtable format is exactly the same
as the rest, so as large folio usage grows we'll reach the point where we
complain more than before about hugetlb doing its smart things on its own.

> 
> The places that are easy to switch from hugetlb to pXX_* may as well
> do so.
> 
> Other places maybe need a hugetlb replacement that has a similar
> abstraction level of pointing to any page table level.

IMHO this depends.

Per my current knowledge, hugetlb is only special in three forms:

- huge mapping (well, this isn't that special..)
- cont pte/pmd/...
- hugepd

The fanciest one is actually hugepd.. but I plan to put that temporarily
aside - I haven't looked at Christophe's series yet; however, I think we're
going to solve orthogonal issues, his work will definitely help me reach
my goal, and I want to think things through on my end first to have a
plan.  If hugepd gets its chance to be generalized, the worst case is
I'll leverage CONFIG_ARCH_HAS_HUGEPD and only keep hugetlb_entry() for them
until hugepd becomes some cont-pXX variant.  Then, with HAS_HUGEPD put
aside, I don't think it's very complicated, perhaps?

In short, hugetlb mappings shouldn't be special compared to other huge pXd
and large folio (cont-pXd) mappings for most of the walkers in my mind, if
not all.  I need to look at all the walkers and there can be some tricky
ones, but I believe that applies in general.  It's actually similar to what
I did with slow gup here.

Like this series, for cont-pXd we'll need multiple walks compared to
before (with hugetlb_entry()), but for that part I'll provide some
performance tests too, and we also have a fallback plan, which is to detect
cont-pXd existence, which will also work for large folios.

> 
> I think if you do the easy places for pXX conversion you 

Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-05 Thread Nathan Lynch
Shrikanth Hegde  writes:
> On 4/5/24 6:19 PM, Nathan Lynch wrote:
>> Shrikanth Hegde  writes:
>
> Hi Nathan, Thanks for reviewing this.
>
>>> When there are no options specified for lparstat, it is expected to
>>> give reports since LPAR(Logical Partition) boot. App is an indicator
>>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>>> derived using pool_idle_time which is obtained using H_PIC call.
>> 
>> If "App" is short for "Available Processor Pool" then it should be
>> written "APP" and it should be introduced and defined more clearly
>> than this.
>> 
>
> Ok. I reworded it for v2.
>
> Yes, APP is Available Processor Pool.
>
>> 
>>> The interval based reports show correct App value while since boot
>>> report shows very high App values. This happens because in that case app
>>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>>> time is reported by the PowerVM hypervisor since its boot, it need not
>>> align with LPAR boot. This leads to large App values.
>>>
>>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>>> use this info to derive App as below for since boot reports.
>>>
>>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>>
>>> Results:: Observe app values.
>>> == Shared LPAR 
>>> lparstat
>>> System Configuration
>>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>>
>>> reboot
>>> stress-ng --cpu=$(nproc) -t 600
>>> sleep 600
>>> So in this case app is expected to be close to 37-6=31.
>>>
>>> == 6.9-rc1 and lparstat 1.3.10  =
>>> %user  %sys %wait %idle physc %entc lbusy      app     vcsw phint
>>> ----- ----- ----- ----- ----- ----- ----- -------- -------- -----
>>> 47.48  0.01  0.00 52.51  0.00  0.00 47.49 69099.72 54154721
>>>
>>> === With this patch and powerpc-utils patch to do the above equation ===
>>> %user  %sys %wait %idle physc %entc lbusy      app     vcsw phint
>>> ----- ----- ----- ----- ----- ----- ----- -------- -------- -----
>>> 47.48  0.01  0.00 52.51  5.73 47.75 47.49    31.21 54175321
>>> =
>>>
>>> Note: physc, purr/idle purr being inaccurate is being handled in a
>>> separate patch in powerpc-utils tree.
>>>
>>> Signed-off-by: Shrikanth Hegde 
>>> ---
>>> Note:
>>>
>>> This patch needs to merged first in the kernel for the powerpc-utils
>>> patches to work. powerpc-utils patches will be posted to its mailing
>>> list and link would be found in the reply to this patch if available.
>>>
>>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>>> b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index f73c4d1c26af..8df4e7c529d7 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>> return rc;
>>>  }
>>>
>>> +unsigned long boot_pool_idle_time;
>> 
>> Should be static, and u64. Better to use explicitly sized types for data
>> at the kernel-hypervisor boundary.
>
> Current usage of h_pic doesn't follow this either. Are you suggesting we
> change that as well?

Yes pretty much. h_pic() as currently written and used has some
problems:

  static unsigned h_pic(unsigned long *pool_idle_time,
unsigned long *num_procs)
  {
  unsigned long rc;
  unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
  
  rc = plpar_hcall(H_PIC, retbuf);
  
  *pool_idle_time = retbuf[0];
  *num_procs = retbuf[1];
  
  return rc;
  }

* Coerces the return value of plpar_hcall() to unsigned -- hcall
  errors are negative.
* Assigns *pool_idle_time and *num_procs using uninitialized
  data when H_PIC is unsuccessful.
* Assigns the outparams unconditionally; would be nicer if it allowed
  callers to pass NULL so they don't have to provide dummy inputs that
  aren't even used, as in your change.
* Should follow Linux -errno return value convention in the absence
  of a need for the specific hcall status in its callers.
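
For illustration only, the four points above could be addressed along
these lines. This is a userspace sketch, not the actual lparcfg.c change:
plpar_hcall() is replaced by a two-argument stub, the H_PIC opcode value
and the fake_hcall_status knob are placeholders, and mapping every hcall
failure to -EIO is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins so the sketch builds outside the kernel. The opcode value,
 * buffer size and stub behaviour are assumptions, not the pseries ones.
 */
#define H_SUCCESS		0
#define H_PIC			0x1	/* placeholder opcode */
#define PLPAR_HCALL_BUFSIZE	4

static long fake_hcall_status = H_SUCCESS;	/* hypothetical test knob */

static long plpar_hcall(unsigned long opcode, unsigned long *retbuf)
{
	(void)opcode;
	if (fake_hcall_status != H_SUCCESS)
		return fake_hcall_status;	/* retbuf untouched on failure */
	retbuf[0] = 123456;			/* made-up pool idle time */
	retbuf[1] = 37;				/* made-up pool processor count */
	return H_SUCCESS;
}

/*
 * Returns 0 on success or a negative errno. Out parameters are written
 * only on success, and either of them may be NULL.
 */
static int h_pic(uint64_t *pool_idle_time, uint64_t *num_procs)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
	long rc;

	rc = plpar_hcall(H_PIC, retbuf);	/* keep the status signed */
	if (rc != H_SUCCESS)
		return -EIO;

	if (pool_idle_time)
		*pool_idle_time = retbuf[0];
	if (num_procs)
		*num_procs = retbuf[1];

	return 0;
}
```

With that shape, a caller that only wants the idle time can pass NULL for
num_procs and skip publishing the value when h_pic() returns non-zero.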

>>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>> printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>> return -EIO;
>>> }
>>> +
>>> +   h_pic(&boot_pool_idle_time, &num_procs);
>> 
>> h_pic() can fail, leaving the out parameters uninitialized.
>
> Naveen pointed this out to me a while ago, but I forgot.
>
> Currently the h_pic return value is not checked at all, either at boot or
> at runtime.
> When it fails, should we re-try or just print a kernel debug? What would be
> the expected behavior? If it fails, it would anyway result in wrong values
> of app even if the variables are initialized to 0.

There's nothing in the spec for H_PIC that suggests retrying on failure.
I'm 

Re: [PATCH] powerpc: Fix fatal warnings flag for LLVM's integrated assembler

2024-04-05 Thread Justin Stitt
On Fri, Apr 5, 2024 at 12:31 PM Nathan Chancellor  wrote:
>
> When building with LLVM_IAS=1, there is an error because
> '-fatal-warnings' is not recognized as a valid flag:
>
>   clang: error: unsupported argument '-fatal-warnings' to option '-Wa,'
>
> Use the double hyphen version of the flag, '--fatal-warnings', which
> works with both the GNU assembler and LLVM's integrated assembler.
>
> Fixes: 608d4a5ca563 ("powerpc: Error on assembly warnings")
> Signed-off-by: Nathan Chancellor 

Nice catch.

Reviewed-by: Justin Stitt 

> ---
>  arch/powerpc/Kbuild | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild
> index da862e9558bc..571f260b0842 100644
> --- a/arch/powerpc/Kbuild
> +++ b/arch/powerpc/Kbuild
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
> -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,-fatal-warnings
> -subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,-fatal-warnings
> +subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,--fatal-warnings
> +subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,--fatal-warnings
>
>  obj-y += kernel/
>  obj-y += mm/
>
> ---
> base-commit: bfe51886ca544956eb4ff924d1937ac01d0ca9c8
> change-id: 20240405-ppc-fix-wa-fatal-warnings-clang-603f0ebb0133
>
> Best regards,
> --
> Nathan Chancellor 
>


[PATCH] powerpc: Fix fatal warnings flag for LLVM's integrated assembler

2024-04-05 Thread Nathan Chancellor
When building with LLVM_IAS=1, there is an error because
'-fatal-warnings' is not recognized as a valid flag:

  clang: error: unsupported argument '-fatal-warnings' to option '-Wa,'

Use the double hyphen version of the flag, '--fatal-warnings', which
works with both the GNU assembler and LLVM's integrated assembler.

Fixes: 608d4a5ca563 ("powerpc: Error on assembly warnings")
Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/Kbuild | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild
index da862e9558bc..571f260b0842 100644
--- a/arch/powerpc/Kbuild
+++ b/arch/powerpc/Kbuild
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
-subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,-fatal-warnings
-subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,-fatal-warnings
+subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,--fatal-warnings
+subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,--fatal-warnings
 
 obj-y += kernel/
 obj-y += mm/

---
base-commit: bfe51886ca544956eb4ff924d1937ac01d0ca9c8
change-id: 20240405-ppc-fix-wa-fatal-warnings-clang-603f0ebb0133

Best regards,
-- 
Nathan Chancellor 



Re: [PATCH v3 13/15] sh: Move defines needed for suppressing warning backtraces

2024-04-05 Thread Simon Horman
On Wed, Apr 03, 2024 at 06:19:34AM -0700, Guenter Roeck wrote:
> Declaring the defines needed for suppressing warning inside
> '#ifdef CONFIG_DEBUG_BUGVERBOSE' results in a kerneldoc warning.
> 
> .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY().
>   Prototype was for HAVE_BUG_FUNCTION() instead
> 
> Move the defines above the kerneldoc entry for _EMIT_BUG_ENTRY
> to make kerneldoc happy.
> 
> Reported-by: Simon Horman 
> Cc: Simon Horman 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: John Paul Adrian Glaubitz 
> Signed-off-by: Guenter Roeck 
> ---
> v3: Added patch. Possibly squash into previous patch.

FWIW, this looks good to me.

>  arch/sh/include/asm/bug.h | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
> index 470ce6567d20..bf4947d51d69 100644
> --- a/arch/sh/include/asm/bug.h
> +++ b/arch/sh/include/asm/bug.h
> @@ -11,6 +11,15 @@
>  #define HAVE_ARCH_BUG
>  #define HAVE_ARCH_WARN_ON
>  
> +#ifdef CONFIG_DEBUG_BUGVERBOSE
> +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR  "\t.long %O2\n"
> +#else
> +# define __BUG_FUNC_PTR
> +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> +#endif /* CONFIG_DEBUG_BUGVERBOSE */
> +
>  /**
>   * _EMIT_BUG_ENTRY
>   * %1 - __FILE__
> @@ -25,13 +34,6 @@
>   */
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
>  
> -#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> -# define HAVE_BUG_FUNCTION
> -# define __BUG_FUNC_PTR  "\t.long %O2\n"
> -#else
> -# define __BUG_FUNC_PTR
> -#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> -
>  #define _EMIT_BUG_ENTRY  \
>   "\t.pushsection __bug_table,\"aw\"\n"   \
>   "2:\t.long 1b, %O1\n"   \
> -- 
> 2.39.2
> 


Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2

2024-04-05 Thread Jason Gunthorpe
On Thu, Apr 04, 2024 at 05:48:03PM -0400, Peter Xu wrote:
> On Tue, Mar 26, 2024 at 11:02:52AM -0300, Jason Gunthorpe wrote:
> > The more I look at this the more I think we need to get to Matthew's
> > idea of having some kind of generic page table API that is not tightly
> > tied to level. Replacing the hugetlb trick of 'everything is a PTE'
> > with 5 special cases in every place seems just horrible.
> > 
> >struct mm_walk_ops {
> >int (*leaf_entry)(struct mm_walk_state *state, struct mm_walk *walk);
> >}
> > 
> > And many cases really want something like:
> >struct mm_walk_state state;
> > 
> >if (!mm_walk_seek_leaf(state, mm, address))
> >   goto no_present
> >if (mm_walk_is_write(state)) ..
> > 
> > And detailed walking:
> >for_each_pt_leaf(state, mm, address) {
> >if (mm_walk_is_write(state)) ..
> >}
> > 
> > Replacing it with a mm_walk_state that retains the level or otherwise
> > to allow decoding any entry composes a lot better. Forced Loop
> > unrolling can get back to the current code gen in a lot of places.
> > 
> > It also makes the power stuff a bit nicer as the mm_walk_state could
> > automatically retain back pointers to the higher levels in the state
> > struct too...
> > 
> > The puzzle is how to do it and still get reasonable efficient codegen,
> > many operations are going to end up switching on some state->level to
> > know how to decode the entry.
> 
> These discussions are definitely constructive, thanks Jason.  Very helpful.
> 
> I thought about this last week but got interrupted.  It does make sense to
> me; it looks pretty generic and it is flexible enough as a top design.  At
> least that's what I thought.

Yeah, exactly..

> However now when I rethink about it, and look more into the code when I got
> the chance, it turns out this will be a major rewrite of mostly every
> walkers..  

Indeed, it is why it may not be reasonable.

> Consider that what we (or.. I) want to teach the pXd layers are two things
> right now: (1) hugetlb mappings (2) MMIO (PFN) mappings.  That mostly
> shares the generic concept when working on the mm walkers no matter which
> way to go, just different treatment on different type of mem.  (2) is on
> top of current code and new stuff, while (1) is a refactoring to drop
> hugetlb_entry() hook point as the goal.

Right, I view this as a two pronged attack

On one front you teach the generic pXX_* macros to process huge pages
and push that around to the performance walkers like GUP

On another front you want to replace use of the hugepte with the new
walkers.

The challenge with the hugepte code is that it is all structured to
assume one API that works at all levels and that may be a hard fit to
replace with pXX_* functions.

The places that are easy to switch from hugetlb to pXX_* may as well
do so.

Other places maybe need a hugetlb replacement that has a similar
abstraction level of pointing to any page table level.

I think if you do the easy places for pXX conversion you will have a
good idea about what is needed for the hard places.
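
As a rough illustration of "pointing to any page table level", a handle
could carry the level next to the raw entry and decode by switching on it.
This is a toy userspace model only: struct pt_leaf, the helper names, the
write bit and the shifts (chosen to resemble x86-64 with 4K base pages)
are all assumptions, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical level tag; real architectures differ in count and size. */
enum pt_level { PT_PTE, PT_PMD, PT_PUD, PT_P4D, PT_PGD };

/* A leaf entry together with the level it was found at. */
struct pt_leaf {
	enum pt_level level;
	uint64_t entry;
};

#define TOY_WRITE_BIT	(1ULL << 1)	/* made-up permission bit */

/* Decoding depends on the level, hence the switch on every lookup. */
static unsigned int pt_leaf_shift(const struct pt_leaf *leaf)
{
	switch (leaf->level) {
	case PT_PTE: return 12;
	case PT_PMD: return 21;
	case PT_PUD: return 30;
	case PT_P4D: return 39;
	case PT_PGD: return 48;
	}
	return 12;
}

/* Size in bytes of the range mapped by this leaf. */
static uint64_t pt_leaf_size(const struct pt_leaf *leaf)
{
	return 1ULL << pt_leaf_shift(leaf);
}

/* Level-independent query, in the spirit of mm_walk_is_write(). */
static bool pt_leaf_is_write(const struct pt_leaf *leaf)
{
	return (leaf->entry & TOY_WRITE_BIT) != 0;
}
```

The switch in pt_leaf_shift() is the codegen cost in question: a
level-specific pXX_* helper knows the answer at compile time, while the
generic handle has to branch unless the compiler can see a constant level.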

> Now the important question I'm asking myself is: do we really need huge p4d
> or even bigger?

Do we need huge p4d support with folios? Probably not..

huge p4d support for pfnmap, eg in VFIO. Yes I think that is possibly
interesting - but I wouldn't ask anyone to do the work :)

But then again we come back to power and its big list of page sizes
and variety :( Looks like some there have huge sizes at the pgd level
at least.

> So, can we over-engineer too much if we go the generic route now?

Yes we can, and it will probably be slow as well. The pXX macros are
the most efficient if code can be adapted to use them.

> Considering that we already have most of pmd/pud entries around in the mm
> walker ops.

Yeah, so you add pgd and maybe p4d and then we don't need any
generic thing. If it is easy it would be nice.

Jason


Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-05 Thread Shrikanth Hegde
On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde  writes:

Hi Nathan, Thanks for reviewing this.

>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
> 
> If "App" is short for "Available Processor Pool" then it should be
> written "APP" and it should be introduced and defined more clearly
> than this.
> 

Ok. I reworded it for v2.

Yes, APP is Available Processor Pool.

> 
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> == Shared LPAR 
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to be close to 37-6=31.
>>
>> == 6.9-rc1 and lparstat 1.3.10  =
>> %user  %sys %wait %idle physc %entc lbusy      app     vcsw phint
>> ----- ----- ----- ----- ----- ----- ----- -------- -------- -----
>> 47.48  0.01  0.00 52.51  0.00  0.00 47.49 69099.72 54154721
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user  %sys %wait %idle physc %entc lbusy      app     vcsw phint
>> ----- ----- ----- ----- ----- ----- ----- -------- -------- -----
>> 47.48  0.01  0.00 52.51  5.73 47.75 47.49    31.21 54175321
>> =
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde 
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>> b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>  return rc;
>>  }
>>
>> +unsigned long boot_pool_idle_time;
> 
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.

Current usage of h_pic doesn't follow this either. Are you suggesting we change
that as well? Or is this applicable to only boot_pool_idle_time?

For example in parse_ppp_data: 

	if (lppaca_shared_proc()) {
		unsigned long pool_idle_time, pool_procs;

		h_pic(&pool_idle_time, &pool_procs);
		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);

> 
>> +
>>  /*
>>   * parse_ppp_data
>>   * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>>  h_pic(&pool_idle_time, &pool_procs);
>>  seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>>  seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> +seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
> 
> If boot_pool_idle_time is unsigned then the format string should be %lu
> or similar, not %ld.
> 
>>  }
>>
>>  seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>>  static int __init lparcfg_init(void)
>>  {
>>  umode_t mode = 0444;
>> +unsigned long num_procs;
>>
>>  /* Allow writing if we have FW_FEATURE_SPLPAR */
>>  if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>  printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>  return -EIO;
>>  }
>> +
>> +h_pic(&boot_pool_idle_time, &num_procs);
> 
> h_pic() can fail, leaving the out parameters uninitialized.

Naveen pointed this out to me a while ago, but I forgot.

Currently the h_pic return value is not checked at all, either at boot or at
runtime.
When it fails, should we re-try or just print a kernel debug? What would 

Re: [Intel-gfx] [PATCH v5 0/7] Introduce __xchg, non-atomic xchg

2024-04-05 Thread Andrzej Hajda
On 05.04.2024 16:47, Jani Nikula wrote:
> On Mon, 27 Feb 2023, Peter Zijlstra  wrote:
>> On Thu, Feb 23, 2023 at 10:24:19PM +0100, Andrzej Hajda wrote:
>>> On 22.02.2023 18:04, Peter Zijlstra wrote:
>>>> On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
>>>>
>>>>> Andrzej Hajda (7):
>>>>>   arch: rename all internal names __xchg to __arch_xchg
>>>>>   linux/include: add non-atomic version of xchg
>>>>>   arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>>>>>   llist: simplify __llist_del_all
>>>>>   io_uring: use __xchg if possible
>>>>>   qed: use __xchg if possible
>>>>>   drm/i915/gt: use __xchg instead of internal helper
>>>>
>>>> Nothing crazy in here I suppose, I somewhat wonder why you went through
>>>> the trouble, but meh.
>>>
>>> If you are asking why I have proposed this patchset, then the answer is
>>> simple, 1st I've tried to find a way to move internal i915 helper to core
>>> (see patch 7).
>>> Then I was looking for possible other users of this helper. And apparently
>>> there are many of them, patches 3-7 shows some.
>>>
>>>> You want me to take this through the locking tree (for the next cycle,
>>>> not this one) where I normally take atomic things or does someone else
>>>> want this?
>>>
>>> If you could take it I will be happy.
>>
>> OK, I'll go queue it in tip/locking/core after -rc1. Thanks!
>
> Is this where the series fell between the cracks, or was there some
> follow-up that I missed?
>
> I think this would still be useful. Andrzej, would you mind rebasing and
> resending if there are no objections?


The patchset was rejected/dropped by Linus at the pull-request stage.
He disliked several things, most of all the __xchg name. However, he was
quite positive about the i915 name fetch_and_zero.
I can try to revive the patchset with fetch_and_zero, and maybe
fetch_and_set, instead of __xchg.
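
To make the naming concrete, here is a minimal userspace sketch of what
such non-atomic helpers could look like. It assumes GNU C statement
expressions and __typeof__; the request struct and take_pending() are
made up for the example, and this is not the code from the series.

```c
#include <stddef.h>

/*
 * Non-atomic "store a new value, return the old one". Plain loads and
 * stores only, so the caller must provide any locking that is needed.
 */
#define fetch_and_set(ptr, val) ({			\
	__typeof__(*(ptr)) old_ = *(ptr);		\
	*(ptr) = (val);					\
	old_;						\
})

/* Same idea with the new value fixed to zero (or NULL for pointers). */
#define fetch_and_zero(ptr) \
	fetch_and_set(ptr, (__typeof__(*(ptr)))0)

struct request {
	int id;
};

/* Typical use: take ownership of a pending pointer, leaving NULL behind. */
static struct request *take_pending(struct request **slot)
{
	return fetch_and_zero(slot);
}
```

Nothing here is atomic, which is why a name that does not resemble xchg()
is arguably less misleading.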


Regards
Andrzej

> BR,
> Jani.





Re: [PATCH v1 1/1] powerpc/52xx: Replace of_gpio.h by proper one

2024-04-05 Thread Andy Shevchenko
On Fri, Apr 05, 2024 at 10:58:55AM +1100, Michael Ellerman wrote:
> Andy Shevchenko  writes:
> > On Wed, Mar 13, 2024 at 03:56:45PM +0200, Andy Shevchenko wrote:
> >> of_gpio.h is deprecated and subject to remove.
> >> The driver doesn't use it directly, replace it
> >> with what is really being used.
> >
> > Any comments on this?
> 
> No comment :)
> 
> I'll take it for 6.10, or do you want me to sneak it in as a fix for 6.9?

Up to you and thank you!

-- 
With Best Regards,
Andy Shevchenko




Re: [Intel-gfx] [PATCH v5 0/7] Introduce __xchg, non-atomic xchg

2024-04-05 Thread Jani Nikula
On Mon, 27 Feb 2023, Peter Zijlstra  wrote:
> On Thu, Feb 23, 2023 at 10:24:19PM +0100, Andrzej Hajda wrote:
>> On 22.02.2023 18:04, Peter Zijlstra wrote:
>> > On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
>> > 
>> > > Andrzej Hajda (7):
>> > >arch: rename all internal names __xchg to __arch_xchg
>> > >linux/include: add non-atomic version of xchg
>> > >arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>> > >llist: simplify __llist_del_all
>> > >io_uring: use __xchg if possible
>> > >qed: use __xchg if possible
>> > >drm/i915/gt: use __xchg instead of internal helper
>> > 
>> > Nothing crazy in here I suppose, I somewhat wonder why you went through
>> > the trouble, but meh.
>> 
>> If you are asking why I have proposed this patchset, then the answer is
>> simple, 1st I've tried to find a way to move internal i915 helper to core
>> (see patch 7).
>> Then I was looking for possible other users of this helper. And apparently
>> there are many of them, patches 3-7 shows some.
>> 
>> 
>> > 
>> > You want me to take this through the locking tree (for the next cycle,
>> > not this one) where I normally take atomic things or does someone else
>> > want this?
>> 
>> If you could take it I will be happy.
>
> OK, I'll go queue it in tip/locking/core after -rc1. Thanks!

Is this where the series fell between the cracks, or was there some
follow-up that I missed?

I think this would still be useful. Andrzej, would you mind rebasing and
resending if there are no objections?

BR,
Jani.


-- 
Jani Nikula, Intel


Re: [kvm-unit-tests PATCH v8 13/35] doc: start documentation directory with unittests.cfg doc

2024-04-05 Thread Andrew Jones
On Fri, Apr 05, 2024 at 06:35:14PM +1000, Nicholas Piggin wrote:
> Consolidate unittests.cfg documentation in one place.
> 
> Suggested-by: Andrew Jones 
> Signed-off-by: Nicholas Piggin 
> ---
>  arm/unittests.cfg | 26 ++---
>  docs/unittests.txt| 89 +++
>  powerpc/unittests.cfg | 25 ++--
>  riscv/unittests.cfg   | 26 ++---
>  s390x/unittests.cfg   | 18 ++---
>  x86/unittests.cfg | 26 ++---
>  6 files changed, 107 insertions(+), 103 deletions(-)
>  create mode 100644 docs/unittests.txt

This is really nice. I only found one thing, which I point out below.

> 
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index fe601cbb1..54cedea28 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -1,28 +1,10 @@
>  
> ##
>  # unittest configuration
>  #
> -# [unittest_name]
> -# file = .flat # Name of the flat file to be used.
> -# smp  =# Number of processors the VM will use
> -## during this test. Use $MAX_SMP to use
> -## the maximum the host supports. Defaults
> -## to one.
> -# extra_params = -append  # Additional parameters used.
> -# arch = arm|arm64   # Select one if the test case is
> -## specific to only one.
> -# groups =   ...   # Used to identify test cases
> -## with run_tests -g ...
> -## Specify group_name=nodefault
> -## to have test not run by
> -## default
> -# accel = kvm|tcg# Optionally specify if test must run with
> -## kvm or tcg. If not specified, then kvm will
> -## be used when available.
> -# timeout =# Optionally specify a timeout.
> -# check = = # check a file for a particular value before running
> -## a test. The check line can contain multiple files
> -## to check separated by a space but each check
> -## parameter needs to be of the form =
> +# arm specifics:
> +#
> +# file = .flat# arm uses .flat files
> +# arch = arm|arm64
>  
> ##
>  
>  #
> diff --git a/docs/unittests.txt b/docs/unittests.txt
> new file mode 100644
> index 0..53e02077c
> --- /dev/null
> +++ b/docs/unittests.txt
> @@ -0,0 +1,89 @@
> +unittests
> +*
> +
> +run_tests.sh is driven by the /unittests.cfg file. That file defines
> +test cases by specifying an executable (target image) under the /
> +directory, and how to run it. This way, for example, a single file can
> +provide multiple test cases by being run with different host configurations
> +and/or different parameters passed to it.
> +
> +Detailed output from run_tests.sh unit tests are stored in files under
> +the logs/ directory.
> +
> +unittests.cfg format
> +
> +
> +# is the comment symbol, all following contents of the line is ignored.
> +
> +Each unit test is defined as with a [unit-test-name] line, followed by

s/ as//

Otherwise,

Reviewed-by: Andrew Jones 

Thanks,
drew


Re: [kvm-unit-tests PATCH v8 04/35] (arm|s390): Use migrate_skip in test cases

2024-04-05 Thread Andrew Jones
On Fri, Apr 05, 2024 at 06:35:05PM +1000, Nicholas Piggin wrote:
> Have tests use the new migrate_skip command in skip paths, rather than
> calling migrate_once to prevent harness reporting an error.
> 
> s390x/migration.c adds a new command that looks like it was missing
> previously.
> 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Nicholas Piggin 
> ---
>  arm/gic.c  | 21 -
>  s390x/migration-cmm.c  |  8 
>  s390x/migration-skey.c |  4 +++-
>  s390x/migration.c  |  1 +
>  4 files changed, 20 insertions(+), 14 deletions(-)
>

Acked-by: Andrew Jones 


[PATCH] powerpc/eeh: Permanently disable the removed device

2024-04-05 Thread Ganesh Goudar
When a device is hot removed on powernv, the hotplug
driver clears the device's state. However, on pseries,
if a device is removed by phyp after reaching the error
threshold, the kernel remains unaware, leading to the
device not being torn down. This prevents necessary
remediation actions like failover.

Permanently disable the device if the presence check
fails.

Signed-off-by: Ganesh Goudar 
---
 arch/powerpc/kernel/eeh.c| 4 +++-
 arch/powerpc/kernel/eeh_driver.c | 8 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ab316e155ea9..8d1606406d3f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -508,7 +508,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * state, PE is in good state.
 */
if ((ret < 0) ||
-   (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
+   (ret == EEH_STATE_NOT_SUPPORT &&
+dev->error_state == pci_channel_io_perm_failure) ||
+   eeh_state_active(ret)) {
eeh_stats.false_positives++;
pe->false_positives++;
rc = 0;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 48773d2d9be3..10317badf471 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -867,7 +867,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
if (!devices) {
pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
pe->phb->global_number, pe->addr);
-   goto out; /* nothing to recover */
+   /*
+* The device has been removed, so tear down its state.
+* On powernv the hotplug driver takes care of this,
+* but not on pseries. Permanently disable the card,
+* as it has been hot removed.
+*/
+   goto recover_failed;
}
 
/* Log the event */
-- 
2.44.0



Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-05 Thread Nathan Lynch
Shrikanth Hegde  writes:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in a Shared Processor LPAR (SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.

If "App" is short for "Available Processor Pool" then it should be
written "APP" and it should be introduced and defined more clearly
than this.


> The interval based reports show correct App value while since boot
> report shows very high App values. This happens because in that case app
> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
> time is reported by the PowerVM hypervisor since its boot, it need not
> align with LPAR boot. This leads to large App values.
>
> To fix that, export the boot pool idle time in lparcfg; powerpc-utils will
> use this info to derive App as below for since-boot reports.
>
> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>
> Results:: Observe app values.
> == Shared LPAR 
> lparstat
> System Configuration
> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>
> reboot
> stress-ng --cpu=$(nproc) -t 600
> sleep 600
> So in this case app is expected to be close to 37-6=31.
>
> == 6.9-rc1 and lparstat 1.3.10  =
> %user  %sys %wait %idle physc %entc lbusy   app  vcsw phint
> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00 52.51  0.00  0.00 47.49 69099.72 54154721
>
> === With this patch and powerpc-utils patch to do the above equation ===
> %user  %sys %wait %idle physc %entc lbusy   app  vcsw phint
> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00 52.51  5.73 47.75 47.49 31.21 54175321
> =
>
> Note: physc, purr/idle purr being inaccurate is being handled in a
> separate patch in powerpc-utils tree.
>
> Signed-off-by: Shrikanth Hegde 
> ---
> Note:
>
> This patch needs to be merged first in the kernel for the powerpc-utils
> patches to work. powerpc-utils patches will be posted to its mailing
> list and link would be found in the reply to this patch if available.
>
> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
> b/arch/powerpc/platforms/pseries/lparcfg.c
> index f73c4d1c26af..8df4e7c529d7 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>   return rc;
>  }
>
> +unsigned long boot_pool_idle_time;

Should be static, and u64. Better to use explicitly sized types for data
at the kernel-hypervisor boundary.

> +
>  /*
>   * parse_ppp_data
>   * Parse out the data returned from h_get_ppp and h_pic
> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>   h_pic(&pool_idle_time, &pool_procs);
>   seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>   seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
> + seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);

If boot_pool_idle_time is unsigned then the format string should be %lu
or similar, not %ld.
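
For illustration only, a minimal userspace sketch of matching the
conversion specifier to the argument type. Plain snprintf() stands in
for seq_printf() here, and the helper name format_boot_pool_idle_time
is hypothetical, not something from the patch:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats the value as the lparcfg line would,
 * using %lu because the argument is an unsigned long.
 * Returns what snprintf() returns (characters that would be written).
 */
static int format_boot_pool_idle_time(char *buf, size_t len, unsigned long val)
{
	return snprintf(buf, len, "boot_pool_idle_time=%lu\n", val);
}
```

With %ld, a value above LONG_MAX would be printed as a negative number;
%lu keeps it unsigned.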

>   }
>
>   seq_printf(m, "unallocated_capacity_weight=%d\n",
> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>  static int __init lparcfg_init(void)
>  {
>   umode_t mode = 0444;
> + unsigned long num_procs;
>
>   /* Allow writing if we have FW_FEATURE_SPLPAR */
>   if (firmware_has_feature(FW_FEATURE_SPLPAR))
> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>   printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>   return -EIO;
>   }
> +
> + h_pic(&boot_pool_idle_time, &num_procs);

h_pic() can fail, leaving the out parameters uninitialized.
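
A rough userspace sketch of one way to handle that, under stated
assumptions: fake_h_pic() is a hypothetical stand-in for h_pic() (the
real hcall is not modelled), and the values 12345 and 4 are made up.
The idea is to give the outputs defined values and only trust them
when the call reports success:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for h_pic(): returns 0 on success and nonzero
 * on failure. On failure the out parameters are left untouched.
 */
static long fake_h_pic(int fail, uint64_t *pool_idle_time, uint64_t *num_procs)
{
	if (fail)
		return -1;
	*pool_idle_time = 12345;	/* made-up sample value */
	*num_procs = 4;			/* made-up sample value */
	return 0;
}

/* Record the boot-time value, falling back to 0 if the call failed. */
static uint64_t read_boot_pool_idle_time(int fail)
{
	uint64_t idle = 0, procs = 0;	/* defined values even on failure */

	if (fake_h_pic(fail, &idle, &procs) != 0)
		return 0;
	return idle;
}
```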

> +
>   return 0;
>  }
>  machine_device_initcall(pseries, lparcfg_init);
> --
> 2.39.3


[PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at()

2024-04-05 Thread Paolo Bonzini
With the demise of the .change_pte() MMU notifier callback, there is no
notification happening in set_pte_at_notify().  It is a synonym of
set_pte_at() and can be replaced with it.

Signed-off-by: Paolo Bonzini 
---
 include/linux/mmu_notifier.h | 2 --
 kernel/events/uprobes.c  | 5 ++---
 mm/ksm.c | 4 ++--
 mm/memory.c  | 7 +--
 mm/migrate_device.c  | 8 ++--
 5 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 8c72bf651606..d39ebb10caeb 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -657,6 +657,4 @@ static inline void mmu_notifier_synchronize(void)
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
-#define set_pte_at_notify set_pte_at
-
 #endif /* _LINUX_MMU_NOTIFIER_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e4834d23e1d1..f4523b95c945 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include /* anon_vma_prepare */
-#include /* set_pte_at_notify */
 #include /* folio_free_swap */
 #include   /* user_enable_single_step */
 #include   /* notifier mechanism */
@@ -195,8 +194,8 @@ static int __replace_page(struct vm_area_struct *vma, 
unsigned long addr,
flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte)));
ptep_clear_flush(vma, addr, pvmw.pte);
if (new_page)
-   set_pte_at_notify(mm, addr, pvmw.pte,
- mk_pte(new_page, vma->vm_page_prot));
+   set_pte_at(mm, addr, pvmw.pte,
+  mk_pte(new_page, vma->vm_page_prot));
 
folio_remove_rmap_pte(old_folio, old_page, vma);
if (!folio_mapped(old_folio))
diff --git a/mm/ksm.c b/mm/ksm.c
index 8c001819cf10..108a4d167824 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1345,7 +1345,7 @@ static int write_protect_page(struct vm_area_struct *vma, 
struct page *page,
if (pte_write(entry))
entry = pte_wrprotect(entry);
 
-   set_pte_at_notify(mm, pvmw.address, pvmw.pte, entry);
+   set_pte_at(mm, pvmw.address, pvmw.pte, entry);
}
*orig_pte = entry;
err = 0;
@@ -1447,7 +1447,7 @@ static int replace_page(struct vm_area_struct *vma, 
struct page *page,
 * See Documentation/mm/mmu_notifier.rst
 */
ptep_clear_flush(vma, addr, ptep);
-   set_pte_at_notify(mm, addr, ptep, newpte);
+   set_pte_at(mm, addr, ptep, newpte);
 
folio = page_folio(page);
folio_remove_rmap_pte(folio, page, vma);
diff --git a/mm/memory.c b/mm/memory.c
index f2bc6dd15eb8..9a6f4d8aa379 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3327,13 +3327,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
ptep_clear_flush(vma, vmf->address, vmf->pte);
folio_add_new_anon_rmap(new_folio, vma, vmf->address);
folio_add_lru_vma(new_folio, vma);
-   /*
-* We call the notify macro here because, when using secondary
-* mmu page tables (such as kvm shadow page tables), we want the
-* new page to be mapped directly into the secondary page table.
-*/
BUG_ON(unshare && pte_write(entry));
-   set_pte_at_notify(mm, vmf->address, vmf->pte, entry);
+   set_pte_at(mm, vmf->address, vmf->pte, entry);
update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
if (old_folio) {
/*
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index b6c27c76e1a0..66206734b1b9 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -664,13 +664,9 @@ static void migrate_vma_insert_page(struct migrate_vma 
*migrate,
if (flush) {
flush_cache_page(vma, addr, pte_pfn(orig_pte));
ptep_clear_flush(vma, addr, ptep);
-   set_pte_at_notify(mm, addr, ptep, entry);
-   update_mmu_cache(vma, addr, ptep);
-   } else {
-   /* No need to invalidate - it was non-present before */
-   set_pte_at(mm, addr, ptep, entry);
-   update_mmu_cache(vma, addr, ptep);
}
+   set_pte_at(mm, addr, ptep, entry);
+   update_mmu_cache(vma, addr, ptep);
 
pte_unmap_unlock(ptep, ptl);
*src = MIGRATE_PFN_MIGRATE;
-- 
2.43.0



[PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range()

2024-04-05 Thread Paolo Bonzini
The only user was kvm_mmu_notifier_change_pte(), which is now gone.

Signed-off-by: Paolo Bonzini 
---
 virt/kvm/kvm_main.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2fcd9979752a..970111ad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -595,8 +595,6 @@ static void kvm_null_fn(void)
 }
 #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
 
-static const union kvm_mmu_notifier_arg KVM_MMU_NOTIFIER_NO_ARG;
-
 /* Iterate over each memslot intersecting [start, last] (inclusive) range */
 #define kvm_for_each_memslot_in_hva_range(node, slots, start, last) \
for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \
@@ -682,14 +680,12 @@ static __always_inline kvm_mn_ret_t 
__kvm_handle_hva_range(struct kvm *kvm,
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
-   union kvm_mmu_notifier_arg arg,
gfn_handler_t handler)
 {
struct kvm *kvm = mmu_notifier_to_kvm(mn);
const struct kvm_mmu_notifier_range range = {
.start  = start,
.end= end,
-   .arg= arg,
.handler= handler,
.on_lock= (void *)kvm_null_fn,
.flush_on_ret   = true,
@@ -880,8 +876,7 @@ static int kvm_mmu_notifier_clear_flush_young(struct 
mmu_notifier *mn,
 {
trace_kvm_age_hva(start, end);
 
-   return kvm_handle_hva_range(mn, start, end, KVM_MMU_NOTIFIER_NO_ARG,
-   kvm_age_gfn);
+   return kvm_handle_hva_range(mn, start, end, kvm_age_gfn);
 }
 
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
-- 
2.43.0




[PATCH 3/4] mmu_notifier: remove the .change_pte() callback

2024-04-05 Thread Paolo Bonzini
The scope of set_pte_at_notify() has reduced more and more through the
years.  Initially, it was meant for when the change to the PTE was
not bracketed by mmu_notifier_invalidate_range_{start,end}().  However,
that has not been so for over ten years.  During all this period
the only implementation of .change_pte() was KVM and it
had no actual functionality, because it was called after
mmu_notifier_invalidate_range_start() zapped the secondary PTE.

Now that this (nonfunctional) user of the .change_pte() callback is
gone, the whole callback can be removed.  For now, leave in place
set_pte_at_notify() even though it is just a synonym for set_pte_at().

Signed-off-by: Paolo Bonzini 
---
 include/linux/mmu_notifier.h | 46 ++--
 mm/mmu_notifier.c| 17 -
 2 files changed, 2 insertions(+), 61 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f349e08a9dfe..8c72bf651606 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -122,15 +122,6 @@ struct mmu_notifier_ops {
  struct mm_struct *mm,
  unsigned long address);
 
-   /*
-* change_pte is called in cases that pte mapping to page is changed:
-* for example, when ksm remaps pte to point to a new shared page.
-*/
-   void (*change_pte)(struct mmu_notifier *subscription,
-  struct mm_struct *mm,
-  unsigned long address,
-  pte_t pte);
-
/*
 * invalidate_range_start() and invalidate_range_end() must be
 * paired and are called only when the mmap_lock and/or the
@@ -392,8 +383,6 @@ extern int __mmu_notifier_clear_young(struct mm_struct *mm,
  unsigned long end);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
 unsigned long address);
-extern void __mmu_notifier_change_pte(struct mm_struct *mm,
- unsigned long address, pte_t pte);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
 extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r);
 extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
@@ -439,13 +428,6 @@ static inline int mmu_notifier_test_young(struct mm_struct 
*mm,
return 0;
 }
 
-static inline void mmu_notifier_change_pte(struct mm_struct *mm,
-  unsigned long address, pte_t pte)
-{
-   if (mm_has_notifiers(mm))
-   __mmu_notifier_change_pte(mm, address, pte);
-}
-
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
@@ -581,26 +563,6 @@ static inline void mmu_notifier_range_init_owner(
__young;\
 })
 
-/*
- * set_pte_at_notify() sets the pte _after_ running the notifier.
- * This is safe to start by updating the secondary MMUs, because the primary 
MMU
- * pte invalidate must have already happened with a ptep_clear_flush() before
- * set_pte_at_notify() has been invoked.  Updating the secondary MMUs first is
- * required when we change both the protection of the mapping from read-only to
- * read-write and the pfn (like during copy on write page faults). Otherwise 
the
- * old page would remain mapped readonly in the secondary MMUs after the new
- * page is already writable by some CPU through the primary MMU.
- */
-#define set_pte_at_notify(__mm, __address, __ptep, __pte)  \
-({ \
-   struct mm_struct *___mm = __mm; \
-   unsigned long ___address = __address;   \
-   pte_t ___pte = __pte;   \
-   \
-   mmu_notifier_change_pte(___mm, ___address, ___pte); \
-   set_pte_at(___mm, ___address, __ptep, ___pte);  \
-})
-
 #else /* CONFIG_MMU_NOTIFIER */
 
 struct mmu_notifier_range {
@@ -650,11 +612,6 @@ static inline int mmu_notifier_test_young(struct mm_struct 
*mm,
return 0;
 }
 
-static inline void mmu_notifier_change_pte(struct mm_struct *mm,
-  unsigned long address, pte_t pte)
-{
-}
-
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
@@ -693,7 +650,6 @@ static inline void 
mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
 #defineptep_clear_flush_notify ptep_clear_flush
 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
 #define pudp_huge_clear_flush_notify pudp_huge_clear_flush
-#define set_pte_at_notify set_pte_at
 
 static inline void mmu_notifier_synchronize(void)
 {
@@ -701,4 +657,6 @@ static 

[PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-05 Thread Paolo Bonzini
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini 
---
 arch/arm64/kvm/mmu.c  | 34 -
 arch/loongarch/include/asm/kvm_host.h |  1 -
 arch/loongarch/kvm/mmu.c  | 32 
 arch/mips/kvm/mmu.c   | 30 ---
 arch/powerpc/include/asm/kvm_ppc.h|  1 -
 arch/powerpc/kvm/book3s.c |  5 ---
 arch/powerpc/kvm/book3s.h |  1 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
 arch/powerpc/kvm/book3s_hv.c  |  1 -
 arch/powerpc/kvm/book3s_pr.c  |  7 
 arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
 arch/riscv/kvm/mmu.c  | 20 --
 arch/x86/kvm/mmu/mmu.c| 54 +--
 arch/x86/kvm/mmu/spte.c   | 16 
 arch/x86/kvm/mmu/spte.h   |  2 -
 arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
 arch/x86/kvm/mmu/tdp_mmu.h|  1 -
 include/linux/kvm_host.h  |  2 -
 include/trace/events/kvm.h| 15 
 virt/kvm/kvm_main.c   | 43 -
 20 files changed, 2 insertions(+), 327 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index dc04bc767865..ff17849be9f4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
kvm_gfn_range *range)
return false;
 }
 
-bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
-{
-   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
-
-   if (!kvm->arch.mmu.pgt)
-   return false;
-
-   WARN_ON(range->end - range->start != 1);
-
-   /*
-* If the page isn't tagged, defer to user_mem_abort() for sanitising
-* the MTE tags. The S2 pte should have been unmapped by
-* mmu_notifier_invalidate_range_end().
-*/
-   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
-   return false;
-
-   /*
-* We've moved a page around, probably through CoW, so let's treat
-* it just like a translation fault and the map handler will clean
-* the cache to the PoC.
-*
-* The MMU notifiers will have unmapped a huge PMD before calling
-* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
-* therefore we never need to clear out a huge PMD through this
-* calling path and a memcache is not required.
-*/
-   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
-  PAGE_SIZE, __pfn_to_phys(pfn),
-  KVM_PGTABLE_PROT_R, NULL, 0);
-
-   return false;
-}
-
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
u64 size = (range->end - range->start) << PAGE_SHIFT;
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..69305441f40d 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -203,7 +203,6 @@ void kvm_flush_tlb_all(void);
 void kvm_flush_tlb_gpa(struct kvm_vcpu *vcpu, unsigned long gpa);
 int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long badv, bool write);
 
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end, bool blockable);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index a556cff35740..98883aa23ab8 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -494,38 +494,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
kvm_gfn_range *range)
range->end << 

[PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()

2024-04-05 Thread Paolo Bonzini
The .change_pte() MMU notifier callback was intended as an optimization
and for this reason it was initially called without a surrounding
mmu_notifier_invalidate_range_{start,end}() pair.  It was only ever
implemented by KVM (which was also the original user of MMU notifiers)
and the rules on when to call set_pte_at_notify() rather than set_pte_at()
have always been pretty obscure.

It may seem a miracle that it has never caused any hard-to-trigger
bugs, but there's a good reason for that: KVM's implementation has
been nonfunctional for a good part of its existence.  Already in
2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with
invalidate_range_start and invalidate_range_end", 2012-10-09) changed the
.change_pte() callback to occur within an invalidate_range_start/end()
pair; and because KVM unmaps the sPTEs during .invalidate_range_start(),
.change_pte() has no hope of finding a sPTE to change.

Therefore, all the code for .change_pte() can be removed from both KVM
and mm/, and set_pte_at_notify() can be replaced with just set_pte_at().
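
To make the argument concrete, here is a toy userspace model in C. It
is only a sketch: every name in it (toy_spte, toy_change_pte, and so
on) is hypothetical and none of it is KVM code. It shows that once the
range-start hook has zapped the secondary entry, a change_pte-style
hook that only updates present entries has nothing left to do:

```c
#include <stdbool.h>

/* Toy secondary PTE: just a present bit and a frame number. */
struct toy_spte {
	bool present;
	unsigned long pfn;
};

/* Models .invalidate_range_start(): the secondary mapping is zapped. */
static void toy_invalidate_range_start(struct toy_spte *spte)
{
	spte->present = false;
	spte->pfn = 0;
}

/*
 * Models .change_pte(): it only retargets an existing mapping.
 * Returns true if an entry was actually changed.
 */
static bool toy_change_pte(struct toy_spte *spte, unsigned long new_pfn)
{
	if (!spte->present)
		return false;
	spte->pfn = new_pfn;
	return true;
}

/*
 * Models the bracketed ordering: range start first, then change_pte.
 * Returns whether change_pte found anything to change.
 */
static bool toy_bracketed_update(struct toy_spte *spte, unsigned long new_pfn)
{
	toy_invalidate_range_start(spte);
	return toy_change_pte(spte, new_pfn);
}
```

In this model toy_change_pte() succeeds when called on its own, but
always returns false when called through toy_bracketed_update().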

Please review!  Also feel free to take the KVM patches through the mm
tree, as I don't expect any conflicts.

Thanks,

Paolo

Paolo Bonzini (4):
  KVM: delete .change_pte MMU notifier callback
  KVM: remove unused argument of kvm_handle_hva_range()
  mmu_notifier: remove the .change_pte() callback
  mm: replace set_pte_at_notify() with just set_pte_at()

 arch/arm64/kvm/mmu.c  | 34 -
 arch/loongarch/include/asm/kvm_host.h |  1 -
 arch/loongarch/kvm/mmu.c  | 32 
 arch/mips/kvm/mmu.c   | 30 ---
 arch/powerpc/include/asm/kvm_ppc.h|  1 -
 arch/powerpc/kvm/book3s.c |  5 ---
 arch/powerpc/kvm/book3s.h |  1 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
 arch/powerpc/kvm/book3s_hv.c  |  1 -
 arch/powerpc/kvm/book3s_pr.c  |  7 
 arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
 arch/riscv/kvm/mmu.c  | 20 --
 arch/x86/kvm/mmu/mmu.c| 54 +--
 arch/x86/kvm/mmu/spte.c   | 16 
 arch/x86/kvm/mmu/spte.h   |  2 -
 arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
 arch/x86/kvm/mmu/tdp_mmu.h|  1 -
 include/linux/kvm_host.h  |  2 -
 include/linux/mmu_notifier.h  | 44 --
 include/trace/events/kvm.h| 15 
 kernel/events/uprobes.c   |  5 +--
 mm/ksm.c  |  4 +-
 mm/memory.c   |  7 +---
 mm/migrate_device.c   |  8 +---
 mm/mmu_notifier.c | 17 -
 virt/kvm/kvm_main.c   | 50 +
 26 files changed, 10 insertions(+), 411 deletions(-)

-- 
2.43.0



Re: [PATCH 9/9] mmc: Convert from tasklet to BH workqueue

2024-04-05 Thread Michał Mirosław
On Wed, Mar 27, 2024 at 04:03:14PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/mmc/* from tasklet to BH workqueue.
> 
> Based on the work done by Tejun Heo 
> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> 
> Signed-off-by: Allen Pais 
> ---
[...]
>  drivers/mmc/host/cb710-mmc.c  | 15 ++--
>  drivers/mmc/host/cb710-mmc.h  |  3 +-
[...]

Acked-by: Michał Mirosław 


Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-05 Thread Shrikanth Hegde



On 4/5/24 3:43 PM, Shrikanth Hegde wrote:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in a Shared Processor LPAR (SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.
> 
powerpc-utils link: 
https://groups.google.com/g/powerpc-utils-devel/c/WZFxrm2aCzU 



[powerpc:merge] BUILD SUCCESS 5a53fade16482b0ce6b3973714cd40fc5bf6b7ef

2024-04-05 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 5a53fade16482b0ce6b3973714cd40fc5bf6b7ef  Automatic merge of 'master' into merge (2024-04-05 00:04)

elapsed time: 1228m

configs tested: 181
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha       allnoconfig                         gcc
alpha       allyesconfig                        gcc
alpha       defconfig                           gcc
arc         allmodconfig                        gcc
arc         allnoconfig                         gcc
arc         allyesconfig                        gcc
arc         defconfig                           gcc
arc         randconfig-001-20240405             gcc
arc         randconfig-002-20240405             gcc
arm         allmodconfig                        gcc
arm         allnoconfig                         clang
arm         allyesconfig                        gcc
arm         defconfig                           clang
arm         neponset_defconfig                  gcc
arm         randconfig-001-20240405             gcc
arm         randconfig-002-20240405             clang
arm         randconfig-003-20240405             gcc
arm         randconfig-004-20240405             clang
arm64       allmodconfig                        clang
arm64       allnoconfig                         gcc
arm64       defconfig                           gcc
arm64       randconfig-001-20240405             gcc
arm64       randconfig-002-20240405             gcc
arm64       randconfig-003-20240405             gcc
arm64       randconfig-004-20240405             gcc
csky        allmodconfig                        gcc
csky        allnoconfig                         gcc
csky        allyesconfig                        gcc
csky        defconfig                           gcc
csky        randconfig-001-20240405             gcc
csky        randconfig-002-20240405             gcc
hexagon     allmodconfig                        clang
hexagon     allnoconfig                         clang
hexagon     allyesconfig                        clang
hexagon     defconfig                           clang
hexagon     randconfig-001-20240405             clang
hexagon     randconfig-002-20240405             clang
i386        allmodconfig                        gcc
i386        allnoconfig                         gcc
i386        allyesconfig                        gcc
i386        buildonly-randconfig-001-20240405   gcc
i386        buildonly-randconfig-002-20240405   gcc
i386        buildonly-randconfig-003-20240405   clang
i386        buildonly-randconfig-004-20240405   gcc
i386        buildonly-randconfig-005-20240405   clang
i386        buildonly-randconfig-006-20240405   clang
i386        defconfig                           clang
i386        randconfig-001-20240405             clang
i386        randconfig-002-20240405             gcc
i386        randconfig-003-20240405             clang
i386        randconfig-004-20240405             clang
i386        randconfig-005-20240405             clang
i386        randconfig-006-20240405             gcc
i386        randconfig-011-20240405             clang
i386        randconfig-012-20240405             gcc
i386        randconfig-013-20240405             gcc
i386        randconfig-014-20240405             gcc
i386        randconfig-015-20240405             gcc
i386        randconfig-016-20240405             clang
loongarch   allmodconfig                        gcc
loongarch   allnoconfig                         gcc
loongarch   allyesconfig                        gcc
loongarch   defconfig                           gcc
loongarch   randconfig-001-20240405             gcc
loongarch   randconfig-002-20240405             gcc
m68k        allmodconfig                        gcc
m68k        allnoconfig                         gcc
m68k        allyesconfig                        gcc
m68k        amcore_defconfig                    gcc
m68k        defconfig                           gcc
microblaze  allmodconfig                        gcc
microblaze  allnoconfig                         gcc
microblaze  allyesconfig                        gcc
microblaze  defconfig                           gcc
mips        allmodconfig                        gcc
mips        allnoconfig                         gcc
mips        allyesconfig                        gcc
mips        decstation_r4k_defconfig            gcc
nios2       allmodconfig                        gcc
nios2       allnoconfig                         gcc
nios2       allyesconfig                        gcc
nios2       defconfig                           gcc
nios2       randconfig-001-20240405             gcc
nios2       randconfig-002-20240405             gcc
openrisc    allmodconfig                        gcc
openrisc

[PATCH] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-05 Thread Shrikanth Hegde
When there are no options specified for lparstat, it is expected to
give reports since LPAR(Logical Partition) boot. App is an indicator
for available processor pool in a Shared Processor LPAR (SPLPAR). App is
derived using pool_idle_time which is obtained using H_PIC call.

The interval based reports show correct App value while since boot
report shows very high App values. This happens because in that case app
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot. This leads to large App values.

To fix that, export the boot pool idle time in lparcfg; powerpc-utils will
use this info to derive App as below for since-boot reports.

App = (pool idle time - boot pool idle time) / (uptime * timebase)
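
As a rough sketch of the arithmetic above in userspace C: the function
name app_since_boot and its parameter names are illustrative
assumptions, not taken from lparstat. It assumes both idle times are in
timebase ticks, uptime is in seconds, and timebase is in ticks per
second:

```c
#include <stdint.h>

/*
 * Since-boot App: idle pool ticks accumulated since LPAR boot,
 * divided by the ticks elapsed on one processor over the same period.
 * Returns 0.0 for a non-positive uptime or timebase.
 */
static double app_since_boot(uint64_t pool_idle_time,
			     uint64_t boot_pool_idle_time,
			     double uptime_secs, double timebase_hz)
{
	if (uptime_secs <= 0.0 || timebase_hz <= 0.0)
		return 0.0;
	return (double)(pool_idle_time - boot_pool_idle_time) /
	       (uptime_secs * timebase_hz);
}
```

For example, with an assumed 512 MHz timebase and 600 seconds of
uptime, a delta of 31 * 600 * 512000000 ticks gives 31.0, while
dividing the raw pool idle time instead would also count idle time
accumulated before the LPAR booted.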

Results:: Observe app values.
== Shared LPAR 
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to be close to 37-6=31.

== 6.9-rc1 and lparstat 1.3.10  =
%user  %sys %wait %idle physc %entc lbusy   app  vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48  0.01  0.00 52.51  0.00  0.00 47.49 69099.72 54154721

=== With this patch and powerpc-utils patch to do the above equation ===
%user  %sys %wait %idle physc %entc lbusy   app  vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48  0.01  0.00 52.51  5.73 47.75 47.49 31.21 54175321
=

Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in powerpc-utils tree.

Signed-off-by: Shrikanth Hegde 
---
Note:

This patch needs to be merged first in the kernel for the powerpc-utils
patches to work. powerpc-utils patches will be posted to its mailing
list and link would be found in the reply to this patch if available.

arch/powerpc/platforms/pseries/lparcfg.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index f73c4d1c26af..8df4e7c529d7 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
return rc;
 }

+unsigned long boot_pool_idle_time;
+
 /*
  * parse_ppp_data
  * Parse out the data returned from h_get_ppp and h_pic
@@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
h_pic(&pool_idle_time, &pool_procs);
seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+   seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
}

seq_printf(m, "unallocated_capacity_weight=%d\n",
@@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
 static int __init lparcfg_init(void)
 {
umode_t mode = 0444;
+   unsigned long num_procs;

/* Allow writing if we have FW_FEATURE_SPLPAR */
if (firmware_has_feature(FW_FEATURE_SPLPAR))
@@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
return -EIO;
}
+
+   h_pic(&boot_pool_idle_time, &num_procs);
+
return 0;
 }
 machine_device_initcall(pseries, lparcfg_init);
--
2.39.3



Re: [PATCH v3 0/3] arch: Remove fbdev dependency from video helpers

2024-04-05 Thread Thomas Zimmermann

Hi,

if there are no further comments, can this series be merged through 
asm-generic?


Best regards
Thomas

On 29.03.24 at 21:32, Thomas Zimmermann wrote:

Make architecture helpers for display functionality depend on general
video functionality instead of fbdev. This avoids the dependency on
fbdev and makes the functionality available for non-fbdev code.

Patch 1 replaces the variety of Kconfig options that control the
Makefiles with CONFIG_VIDEO. More fine-grained control of the build
can then be done within each video/ directory; see parisc for an
example.

Patch 2 replaces fb_is_primary_device() with video_is_primary_device(),
which has no dependencies on fbdev. The implementation remains identical
on all affected platforms. There's one minor change in fbcon, which is
the only caller of fb_is_primary_device().

Patch 3 renames the source and header files from fbdev to video.

v3:
- arc, arm, arm64, sh, um: generate asm/video.h (Sam, Helge, Arnd)
- fix typos (Sam)
v2:
- improve cover letter
- rebase onto v6.9-rc1

Thomas Zimmermann (3):
   arch: Select fbdev helpers with CONFIG_VIDEO
   arch: Remove struct fb_info from video helpers
   arch: Rename fbdev header and source files

  arch/arc/include/asm/fb.h|  8 --
  arch/arm/include/asm/fb.h|  6 -
  arch/arm64/include/asm/fb.h  | 10 
  arch/loongarch/include/asm/{fb.h => video.h} |  8 +++---
  arch/m68k/include/asm/{fb.h => video.h}  |  8 +++---
  arch/mips/include/asm/{fb.h => video.h}  | 12 -
  arch/parisc/Makefile |  2 +-
  arch/parisc/include/asm/fb.h | 14 ---
  arch/parisc/include/asm/video.h  | 16 
  arch/parisc/video/Makefile   |  2 +-
  arch/parisc/video/{fbdev.c => video-sti.c}   |  9 ---
  arch/powerpc/include/asm/{fb.h => video.h}   |  8 +++---
  arch/powerpc/kernel/pci-common.c |  2 +-
  arch/sh/include/asm/fb.h |  7 --
  arch/sparc/Makefile  |  4 +--
  arch/sparc/include/asm/{fb.h => video.h} | 15 +--
  arch/sparc/video/Makefile|  2 +-
  arch/sparc/video/fbdev.c | 26 
  arch/sparc/video/video.c | 25 +++
  arch/um/include/asm/Kbuild   |  2 +-
  arch/x86/Makefile|  2 +-
  arch/x86/include/asm/fb.h| 19 --
  arch/x86/include/asm/video.h | 21 
  arch/x86/video/Makefile  |  3 ++-
  arch/x86/video/{fbdev.c => video.c}  | 21 +++-
  drivers/video/fbdev/core/fbcon.c |  2 +-
  include/asm-generic/Kbuild   |  2 +-
  include/asm-generic/{fb.h => video.h}| 17 +++--
  include/linux/fb.h   |  2 +-
  29 files changed, 124 insertions(+), 151 deletions(-)
  delete mode 100644 arch/arc/include/asm/fb.h
  delete mode 100644 arch/arm/include/asm/fb.h
  delete mode 100644 arch/arm64/include/asm/fb.h
  rename arch/loongarch/include/asm/{fb.h => video.h} (86%)
  rename arch/m68k/include/asm/{fb.h => video.h} (86%)
  rename arch/mips/include/asm/{fb.h => video.h} (76%)
  delete mode 100644 arch/parisc/include/asm/fb.h
  create mode 100644 arch/parisc/include/asm/video.h
  rename arch/parisc/video/{fbdev.c => video-sti.c} (78%)
  rename arch/powerpc/include/asm/{fb.h => video.h} (76%)
  delete mode 100644 arch/sh/include/asm/fb.h
  rename arch/sparc/include/asm/{fb.h => video.h} (75%)
  delete mode 100644 arch/sparc/video/fbdev.c
  create mode 100644 arch/sparc/video/video.c
  delete mode 100644 arch/x86/include/asm/fb.h
  create mode 100644 arch/x86/include/asm/video.h
  rename arch/x86/video/{fbdev.c => video.c} (66%)
  rename include/asm-generic/{fb.h => video.h} (89%)



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



Re: [PATCH 00/64] i2c: reword i2c_algorithm according to newest specification

2024-04-05 Thread Wolfram Sang
Hi Andi, hi everyone,

thank you for reviewing and waiting. I had a small personal hiatus over
Easter but now I am back. This series needs another cycle, so no need to
hurry. I will address some of the review comments but not all. The
conversion (and API improvements) are some bigger tasks, so
inconsistencies in between can't be avoided AFAICS.

I'll keep you updated.

Happy hacking,

   Wolfram





[kvm-unit-tests PATCH v8 35/35] powerpc: gitlab CI update

2024-04-05 Thread Nicholas Piggin
This adds testing for the powernv machine, and adds a gitlab-ci test
group instead of specifying all tests in .gitlab-ci.yml.
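
The new CI pass check in the diff below can be read as "at least one
PASS and no FAIL". A minimal C sketch of that predicate follows.
results_ok() is a hypothetical name used only for illustration; the real
check is the pair of grep commands in .gitlab-ci.yml.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of kvm-unit-tests. It mirrors
 *   grep -q PASS results.txt && ! grep -q FAIL results.txt
 * on an in-memory copy of the results text.
 */
bool results_ok(const char *results)
{
	assert(results != NULL);

	/* Unanchored substring match, like grep without ^ or $. */
	return strstr(results, "PASS") != NULL &&
	       strstr(results, "FAIL") == NULL;
}
```

Unlike the previous "if grep -q FAIL ...; then exit 1; fi" step, this
form also rejects an empty results file, because it contains no PASS.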

Signed-off-by: Nicholas Piggin 
---
 .gitlab-ci.yml| 30 --
 powerpc/unittests.cfg | 14 --
 2 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 60b3cdfd2..e3638b088 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -97,17 +97,10 @@ build-ppc64be:
  - cd build
  - ../configure --arch=ppc64 --endian=big --cross-prefix=powerpc64-linux-gnu-
  - make -j2
- - ACCEL=tcg ./run_tests.sh
-  selftest-setup
-  selftest-migration
-  selftest-migration-skip
-  spapr_hcall
-  rtas-get-time-of-day
-  rtas-get-time-of-day-base
-  rtas-set-time-of-day
-  emulator
-  | tee results.txt
- - if grep -q FAIL results.txt ; then exit 1 ; fi
+ - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
+ - grep -q PASS results.txt && ! grep -q FAIL results.txt
+ - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee results.txt
+ - grep -q PASS results.txt && ! grep -q FAIL results.txt
 
 build-ppc64le:
  extends: .intree_template
@@ -115,17 +108,10 @@ build-ppc64le:
  - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu nmap-ncat
  - ./configure --arch=ppc64 --endian=little --cross-prefix=powerpc64-linux-gnu-
  - make -j2
- - ACCEL=tcg ./run_tests.sh
-  selftest-setup
-  selftest-migration
-  selftest-migration-skip
-  spapr_hcall
-  rtas-get-time-of-day
-  rtas-get-time-of-day-base
-  rtas-set-time-of-day
-  emulator
-  | tee results.txt
- - if grep -q FAIL results.txt ; then exit 1 ; fi
+ - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
+ - grep -q PASS results.txt && ! grep -q FAIL results.txt
+ - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee results.txt
+ - grep -q PASS results.txt && ! grep -q FAIL results.txt
 
 # build-riscv32:
 # Fedora doesn't package a riscv32 compiler for QEMU. Oh, well.
diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index 379aa166b..f6ddc4a7f 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -16,12 +16,12 @@
 file = selftest.elf
 smp = 2
 extra_params = -m 1g -append 'setup smp=2 mem=1024'
-groups = selftest
+groups = selftest gitlab-ci
 
 [selftest-migration]
 file = selftest-migration.elf
 machine = pseries
-groups = selftest migration
+groups = selftest migration gitlab-ci
 # TODO: Remove accel=kvm once the following TCG migration fix has been merged:
 # https://lore.kernel.org/qemu-devel/20240219061731.232570-1-npig...@gmail.com/
 accel = kvm
@@ -29,7 +29,7 @@ accel = kvm
 [selftest-migration-skip]
 file = selftest-migration.elf
 machine = pseries
-groups = selftest migration
+groups = selftest migration gitlab-ci
 extra_params = -append "skip"
 
 # This fails due to a QEMU TCG bug so KVM-only until QEMU is fixed upstream
@@ -42,6 +42,7 @@ groups = migration
 [spapr_hcall]
 file = spapr_hcall.elf
 machine = pseries
+groups = gitlab-ci
 
 [spapr_vpa]
 file = spapr_vpa.elf
@@ -52,24 +53,25 @@ file = rtas.elf
 machine = pseries
 timeout = 5
 extra_params = -append "get-time-of-day date=$(date +%s)"
-groups = rtas
+groups = rtas gitlab-ci
 
 [rtas-get-time-of-day-base]
 file = rtas.elf
 machine = pseries
 timeout = 5
 extra_params = -rtc base="2006-06-17" -append "get-time-of-day date=$(date --date="2006-06-17 UTC" +%s)"
-groups = rtas
+groups = rtas gitlab-ci
 
 [rtas-set-time-of-day]
 file = rtas.elf
 machine = pseries
 extra_params = -append "set-time-of-day"
 timeout = 5
-groups = rtas
+groups = rtas gitlab-ci
 
 [emulator]
 file = emulator.elf
+groups = gitlab-ci
 
 [interrupts]
 file = interrupts.elf
-- 
2.43.0



[kvm-unit-tests PATCH v8 34/35] powerpc: Remove remnants of ppc64 directory and build structure

2024-04-05 Thread Nicholas Piggin
This merges the ppc64 directories and files into powerpc, and
merges the 3 makefiles into one.

The configure --arch=powerpc option is aliased to ppc64 for
good measure.

Signed-off-by: Nicholas Piggin 
---
 MAINTAINERS|   1 -
 configure  |   3 +-
 lib/{ppc64 => powerpc}/asm-offsets.c   |   0
 lib/{ppc64 => powerpc}/asm/asm-offsets.h   |   0
 lib/{ppc64 => powerpc}/asm/atomic.h|   0
 lib/{ppc64 => powerpc}/asm/barrier.h   |   4 +-
 lib/{ppc64 => powerpc}/asm/bitops.h|   4 +-
 lib/{ppc64 => powerpc}/asm/io.h|   4 +-
 lib/{ppc64 => powerpc}/asm/mmu.h   |   0
 lib/{ppc64 => powerpc}/asm/opal.h  |   4 +-
 lib/{ppc64 => powerpc}/asm/page.h  |   6 +-
 lib/{ppc64 => powerpc}/asm/pgtable-hwdef.h |   6 +-
 lib/{ppc64 => powerpc}/asm/pgtable.h   |   2 +-
 lib/{ppc64 => powerpc}/asm/ptrace.h|   6 +-
 lib/{ppc64 => powerpc}/asm/spinlock.h  |   6 +-
 lib/powerpc/asm/stack.h|   3 +
 lib/{ppc64 => powerpc}/asm/vpa.h   |   0
 lib/{ppc64 => powerpc}/mmu.c   |   0
 lib/{ppc64 => powerpc}/opal-calls.S|   0
 lib/{ppc64 => powerpc}/opal.c  |   0
 lib/{ppc64 => powerpc}/stack.c |   0
 lib/ppc64/.gitignore   |   1 -
 lib/ppc64/asm/handlers.h   |   1 -
 lib/ppc64/asm/hcall.h  |   1 -
 lib/ppc64/asm/memory_areas.h   |   6 --
 lib/ppc64/asm/ppc_asm.h|   1 -
 lib/ppc64/asm/processor.h  |   1 -
 lib/ppc64/asm/reg.h|   1 -
 lib/ppc64/asm/rtas.h   |   1 -
 lib/ppc64/asm/setup.h  |   1 -
 lib/ppc64/asm/smp.h|   1 -
 lib/ppc64/asm/stack.h  |  11 --
 powerpc/Makefile   | 111 -
 powerpc/Makefile.common|  95 --
 powerpc/Makefile.ppc64 |  31 --
 35 files changed, 136 insertions(+), 176 deletions(-)
 rename lib/{ppc64 => powerpc}/asm-offsets.c (100%)
 rename lib/{ppc64 => powerpc}/asm/asm-offsets.h (100%)
 rename lib/{ppc64 => powerpc}/asm/atomic.h (100%)
 rename lib/{ppc64 => powerpc}/asm/barrier.h (83%)
 rename lib/{ppc64 => powerpc}/asm/bitops.h (69%)
 rename lib/{ppc64 => powerpc}/asm/io.h (50%)
 rename lib/{ppc64 => powerpc}/asm/mmu.h (100%)
 rename lib/{ppc64 => powerpc}/asm/opal.h (90%)
 rename lib/{ppc64 => powerpc}/asm/page.h (94%)
 rename lib/{ppc64 => powerpc}/asm/pgtable-hwdef.h (93%)
 rename lib/{ppc64 => powerpc}/asm/pgtable.h (99%)
 rename lib/{ppc64 => powerpc}/asm/ptrace.h (89%)
 rename lib/{ppc64 => powerpc}/asm/spinlock.h (54%)
 rename lib/{ppc64 => powerpc}/asm/vpa.h (100%)
 rename lib/{ppc64 => powerpc}/mmu.c (100%)
 rename lib/{ppc64 => powerpc}/opal-calls.S (100%)
 rename lib/{ppc64 => powerpc}/opal.c (100%)
 rename lib/{ppc64 => powerpc}/stack.c (100%)
 delete mode 100644 lib/ppc64/.gitignore
 delete mode 100644 lib/ppc64/asm/handlers.h
 delete mode 100644 lib/ppc64/asm/hcall.h
 delete mode 100644 lib/ppc64/asm/memory_areas.h
 delete mode 100644 lib/ppc64/asm/ppc_asm.h
 delete mode 100644 lib/ppc64/asm/processor.h
 delete mode 100644 lib/ppc64/asm/reg.h
 delete mode 100644 lib/ppc64/asm/rtas.h
 delete mode 100644 lib/ppc64/asm/setup.h
 delete mode 100644 lib/ppc64/asm/smp.h
 delete mode 100644 lib/ppc64/asm/stack.h
 delete mode 100644 powerpc/Makefile.common
 delete mode 100644 powerpc/Makefile.ppc64

diff --git a/MAINTAINERS b/MAINTAINERS
index a2fa437da..1309863f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -92,7 +92,6 @@ S: Maintained
 L: linuxppc-dev@lists.ozlabs.org
 F: powerpc/
 F: lib/powerpc/
-F: lib/ppc64/
 
 RISCV
 M: Andrew Jones 
diff --git a/configure b/configure
index a1308db8e..8508396af 100755
--- a/configure
+++ b/configure
@@ -215,6 +215,7 @@ fi
 
 arch_name=$arch
 [ "$arch" = "aarch64" ] && arch="arm64"
+[ "$arch" = "powerpc" ] && arch="ppc64"
 [ "$arch_name" = "arm64" ] && arch_name="aarch64"
 
 if [ "$arch" = "riscv" ]; then
@@ -337,7 +338,7 @@ elif [ "$arch" = "arm" ] || [ "$arch" = "arm64" ]; then
 fi
 elif [ "$arch" = "ppc64" ]; then
 testdir=powerpc
-arch_libdir=ppc64
+arch_libdir=powerpc
 firmware="$testdir/boot_rom.bin"
 if [ "$endian" != "little" ] && [ "$endian" != "big" ]; then
 echo "You must provide endianness (big or little)!"
diff --git a/lib/ppc64/asm-offsets.c b/lib/powerpc/asm-offsets.c
similarity index 100%
rename from lib/ppc64/asm-offsets.c
rename to lib/powerpc/asm-offsets.c
diff --git a/lib/ppc64/asm/asm-offsets.h b/lib/powerpc/asm/asm-offsets.h
similarity index 100%
rename from lib/ppc64/asm/asm-offsets.h
rename to lib/powerpc/asm/asm-offsets.h
diff --git a/lib/ppc64/asm/atomic.h b/lib/powerpc/asm/atomic.h
similarity index 100%
rename from lib/ppc64/asm/atomic.h
rename to lib/powerpc/asm/atomic.h
diff --git 

[kvm-unit-tests PATCH v8 33/35] configure: Make arch_libdir a first-class entity

2024-04-05 Thread Nicholas Piggin
arch_libdir was brought in to improve the heuristic determination of
the lib/ directory based on arch and testdir names, but it did not
entirely clean that mess up.

Remove the arch_libdir->arch->testdir heuristic and require every
architecture to set arch_libdir correctly. Fail if the lib/arch or
lib/arch/asm directories cannot be found.
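
As an illustration only, the explicit assignments made by the configure
hunks below can be restated as a lookup table with no fallback. This is
a C sketch under that reading; arch_libdirs[] and arch_libdir_for() are
hypothetical names, and the real logic stays in the configure shell
script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table restating the arch -> arch_libdir assignments
 * visible in this patch's configure hunks.
 */
static const struct {
	const char *arch;
	const char *libdir;
} arch_libdirs[] = {
	{ "i386",    "x86"   },
	{ "x86_64",  "x86"   },
	{ "arm",     "arm"   },
	{ "arm64",   "arm64" },
	{ "ppc64",   "ppc64" },
	{ "riscv32", "riscv" },
	{ "riscv64", "riscv" },
	{ "s390x",   "s390x" },
};

/* Returns the lib/ subdirectory for arch, or NULL if unsupported. */
const char *arch_libdir_for(const char *arch)
{
	size_t i;

	assert(arch != NULL);
	for (i = 0; i < sizeof(arch_libdirs) / sizeof(arch_libdirs[0]); i++) {
		if (strcmp(arch, arch_libdirs[i].arch) == 0)
			return arch_libdirs[i].libdir;
	}
	return NULL; /* no heuristic fallback: the caller must fail */
}
```

The point of the sketch is the NULL return: once the heuristic is gone,
a missing entry is an error to report, not something to guess from the
test directory name.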

Cc: Alexandru Elisei 
Cc: Claudio Imbrenda 
Cc: David Hildenbrand 
Cc: Eric Auger 
Cc: Janosch Frank 
Cc: Laurent Vivier 
Cc: Nico Böhr 
Cc: Paolo Bonzini 
Cc: Thomas Huth 
Cc: k...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: kvm...@lists.linux.dev
Cc: kvm-ri...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
 Makefile  |  2 +-
 configure | 18 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 4f35fffc6..4e0f54543 100644
--- a/Makefile
+++ b/Makefile
@@ -10,7 +10,7 @@ include config.mak
 VPATH = $(SRCDIR)
 
 libdirs-get = $(shell [ -d "lib/$(1)" ] && echo "lib/$(1) lib/$(1)/asm")
-ARCH_LIBDIRS := $(call libdirs-get,$(ARCH_LIBDIR)) $(call libdirs-get,$(TEST_DIR))
+ARCH_LIBDIRS := $(call libdirs-get,$(ARCH_LIBDIR))
 OBJDIRS := $(ARCH_LIBDIRS)
 
 DESTDIR := $(PREFIX)/share/kvm-unit-tests/
diff --git a/configure b/configure
index e19ba6f0c..a1308db8e 100755
--- a/configure
+++ b/configure
@@ -216,7 +216,6 @@ fi
 arch_name=$arch
 [ "$arch" = "aarch64" ] && arch="arm64"
 [ "$arch_name" = "arm64" ] && arch_name="aarch64"
-arch_libdir=$arch
 
 if [ "$arch" = "riscv" ]; then
 echo "riscv32 or riscv64 must be specified"
@@ -286,8 +285,10 @@ fi
 
 if [ "$arch" = "i386" ] || [ "$arch" = "x86_64" ]; then
 testdir=x86
+arch_libdir=x86
 elif [ "$arch" = "arm" ] || [ "$arch" = "arm64" ]; then
 testdir=arm
+arch_libdir=$arch
 if [ "$target" = "qemu" ]; then
 arm_uart_early_addr=0x0900
 elif [ "$target" = "kvmtool" ]; then
@@ -336,6 +337,7 @@ elif [ "$arch" = "arm" ] || [ "$arch" = "arm64" ]; then
 fi
 elif [ "$arch" = "ppc64" ]; then
 testdir=powerpc
+arch_libdir=ppc64
 firmware="$testdir/boot_rom.bin"
 if [ "$endian" != "little" ] && [ "$endian" != "big" ]; then
 echo "You must provide endianness (big or little)!"
@@ -346,6 +348,7 @@ elif [ "$arch" = "riscv32" ] || [ "$arch" = "riscv64" ]; then
 arch_libdir=riscv
 elif [ "$arch" = "s390x" ]; then
 testdir=s390x
+arch_libdir=s390x
 else
 echo "arch $arch is not supported!"
 arch=
@@ -355,6 +358,10 @@ if [ ! -d "$srcdir/$testdir" ]; then
 echo "$srcdir/$testdir does not exist!"
 exit 1
 fi
+if [ ! -d "$srcdir/lib/$arch_libdir" ]; then
+echo "$srcdir/lib/$arch_libdir does not exist!"
+exit 1
+fi
 
 if [ "$efi" = "y" ] && [ -f "$srcdir/$testdir/efi/run" ]; then
 ln -fs "$srcdir/$testdir/efi/run" $testdir-run
@@ -417,10 +424,11 @@ fi
 # link lib/asm for the architecture
 rm -f lib/asm
 asm="asm-generic"
-if [ -d "$srcdir/lib/$arch/asm" ]; then
-   asm="$srcdir/lib/$arch/asm"
-elif [ -d "$srcdir/lib/$testdir/asm" ]; then
-   asm="$srcdir/lib/$testdir/asm"
+if [ -d "$srcdir/lib/$arch_libdir/asm" ]; then
+asm="$srcdir/lib/$arch_libdir/asm"
+else
+echo "$srcdir/lib/$arch_libdir/asm does not exist"
+exit 1
 fi
 mkdir -p lib
 ln -sf "$asm" lib/asm
-- 
2.43.0



[kvm-unit-tests PATCH v8 32/35] powerpc: add pmu tests

2024-04-05 Thread Nicholas Piggin
Add some initial PMU testing.

- PMC5/6 tests
- PMAE / PMI test
- BHRB basic tests

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h |   2 +
 lib/powerpc/asm/reg.h   |   9 +
 lib/powerpc/asm/setup.h |   1 +
 lib/powerpc/setup.c |  23 ++
 powerpc/Makefile.common |   3 +-
 powerpc/pmu.c   | 405 
 powerpc/unittests.cfg   |   3 +
 7 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 powerpc/pmu.c

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index 749155696..28239c610 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -14,6 +14,8 @@ extern bool cpu_has_hv;
 extern bool cpu_has_power_mce;
 extern bool cpu_has_siar;
 extern bool cpu_has_heai;
+extern bool cpu_has_bhrb;
+extern bool cpu_has_p10_bhrb;
 extern bool cpu_has_radix;
 extern bool cpu_has_prefix;
 extern bool cpu_has_sc_lev;
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index 69ef21adb..602fba1b6 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -40,10 +40,19 @@
 #define SPR_LPIDR  0x13f
 #define SPR_HEIR   0x153
 #define SPR_PTCR   0x1d0
+#define SPR_MMCRA  0x312
+#define   MMCRA_BHRBRD UL(0x0020)
+#define   MMCRA_IFM_MASK   UL(0xc000)
+#define SPR_PMC5   0x317
+#define SPR_PMC6   0x318
 #define SPR_MMCR0  0x31b
 #define   MMCR0_FC UL(0x8000)
+#define   MMCR0_FCPUL(0x2000)
 #define   MMCR0_PMAE   UL(0x0400)
+#define   MMCR0_BHRBA  UL(0x0020)
+#define   MMCR0_FCPC   UL(0x1000)
 #define   MMCR0_PMAO   UL(0x0080)
+#define   MMCR0_FC56   UL(0x0010)
 #define SPR_SIAR   0x31c
 
 /* Machine State Register definitions: */
diff --git a/lib/powerpc/asm/setup.h b/lib/powerpc/asm/setup.h
index 9ca318ce6..8f0b58ed0 100644
--- a/lib/powerpc/asm/setup.h
+++ b/lib/powerpc/asm/setup.h
@@ -10,6 +10,7 @@
 #define NR_CPUS8   /* arbitrarily set for now */
 
 extern uint64_t tb_hz;
+extern uint64_t cpu_hz;
 
 #define NR_MEM_REGIONS 8
 #define MR_F_PRIMARY   (1U << 0)
diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index da56cb369..b56a1981a 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -33,6 +33,7 @@ u32 initrd_size;
 u32 cpu_to_hwid[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) };
 int nr_cpus_present;
 uint64_t tb_hz;
+uint64_t cpu_hz;
 
 struct mem_region mem_regions[NR_MEM_REGIONS];
 phys_addr_t __physical_start, __physical_end;
@@ -42,6 +43,7 @@ struct cpu_set_params {
unsigned icache_bytes;
unsigned dcache_bytes;
uint64_t tb_hz;
+   uint64_t cpu_hz;
 };
 
 static void cpu_set(int fdtnode, u64 regval, void *info)
@@ -95,6 +97,22 @@ static void cpu_set(int fdtnode, u64 regval, void *info)
data = (u32 *)prop->data;
params->tb_hz = fdt32_to_cpu(*data);
 
+   prop = fdt_get_property(dt_fdt(), fdtnode,
+   "ibm,extended-clock-frequency", NULL);
+   if (prop) {
+   data = (u32 *)prop->data;
+   params->cpu_hz = fdt32_to_cpu(*data);
+   params->cpu_hz <<= 32;
+   data = (u32 *)prop->data + 1;
+   params->cpu_hz |= fdt32_to_cpu(*data);
+   } else {
+   prop = fdt_get_property(dt_fdt(), fdtnode,
+   "clock-frequency", NULL);
+   assert(prop != NULL);
+   data = (u32 *)prop->data;
+   params->cpu_hz = fdt32_to_cpu(*data);
+   }
+
read_common_info = true;
}
 }
@@ -103,6 +121,8 @@ bool cpu_has_hv;
 bool cpu_has_power_mce; /* POWER CPU machine checks */
 bool cpu_has_siar;
 bool cpu_has_heai;
+bool cpu_has_bhrb;
+bool cpu_has_p10_bhrb;
 bool cpu_has_radix;
 bool cpu_has_prefix;
 bool cpu_has_sc_lev; /* sc interrupt has LEV field in SRR1 */
@@ -119,12 +139,14 @@ static void cpu_init_params(void)
__icache_bytes = params.icache_bytes;
__dcache_bytes = params.dcache_bytes;
tb_hz = params.tb_hz;
+   cpu_hz = params.cpu_hz;
 
switch (mfspr(SPR_PVR) & PVR_VERSION_MASK) {
case PVR_VER_POWER10:
cpu_has_prefix = true;
cpu_has_sc_lev = true;
cpu_has_pause_short = true;
+   cpu_has_p10_bhrb = true;
case PVR_VER_POWER9:
cpu_has_radix = true;
case PVR_VER_POWER8E:
@@ -133,6 +155,7 @@ static void cpu_init_params(void)
cpu_has_power_mce = true;
cpu_has_heai = true;
cpu_has_siar = true;
+   cpu_has_bhrb = true;
break;
default:
break;
diff --git a/powerpc/Makefile.common 

[kvm-unit-tests PATCH v8 31/35] powerpc: add usermode support

2024-04-05 Thread Nicholas Piggin
The biggest difficulty for user mode is MMU support. Otherwise it is
a simple matter of setting and clearing MSR[PR] with rfid and sc
respectively.

Some common harness operations will fail in usermode, so some workarounds
are required (e.g., puts() can't be used directly).
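
The puts() workaround in the lib/powerpc/io.c hunk below follows a
"leave user mode, do the privileged work, go back" pattern. Here is a
host-side C sketch of that pattern, assuming a plain flag in place of
the per-CPU in_user field; every sim_* name is hypothetical and nothing
here touches MSR[PR].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU in_user flag. */
static bool sim_in_user;

static bool sim_in_usermode(void)
{
	return sim_in_user;
}

static void sim_enter_usermode(void)
{
	assert(!sim_in_user);
	sim_in_user = true;
}

static void sim_exit_usermode(void)
{
	assert(sim_in_user);
	sim_in_user = false;
}

/* Models a privileged console write: must never run in user mode. */
static int sim_console_write(const char *s)
{
	assert(!sim_in_usermode());
	return fputs(s, stdout);
}

/* Same shape as the patched puts(): drop to privileged, write, restore. */
int sim_puts(const char *s)
{
	bool user = sim_in_usermode();
	int ret;

	if (user)
		sim_exit_usermode();
	ret = sim_console_write(s);
	if (user)
		sim_enter_usermode();
	return ret;
}
```

Saving the mode in a local before switching keeps the wrapper usable
from both privileged and user callers, since the caller's mode is
restored on the way out.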

A usermode privileged instruction interrupt test is added.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h |  9 +
 lib/powerpc/asm/reg.h   |  1 +
 lib/powerpc/asm/smp.h   |  1 +
 lib/powerpc/io.c|  7 +++
 lib/powerpc/processor.c | 38 +
 lib/powerpc/rtas.c  |  3 +++
 lib/powerpc/setup.c |  8 ++--
 lib/powerpc/spinlock.c  |  4 
 lib/ppc64/mmu.c |  2 ++
 powerpc/interrupts.c| 28 +++
 10 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index d348239c5..749155696 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -19,6 +19,8 @@ extern bool cpu_has_prefix;
 extern bool cpu_has_sc_lev;
 extern bool cpu_has_pause_short;
 
+bool in_usermode(void);
+
 static inline uint64_t mfspr(int nr)
 {
uint64_t ret;
@@ -51,6 +53,8 @@ static inline void local_irq_enable(void)
 {
unsigned long msr;
 
+   assert(!in_usermode());
+
asm volatile(
 "  mfmsr   %0  \n \
ori %0,%0,%1\n \
@@ -62,6 +66,8 @@ static inline void local_irq_disable(void)
 {
unsigned long msr;
 
+   assert(!in_usermode());
+
asm volatile(
 "  mfmsr   %0  \n \
andc%0,%0,%1\n \
@@ -90,4 +96,7 @@ static inline bool machine_is_pseries(void)
 void enable_mcheck(void);
 void disable_mcheck(void);
 
+void enter_usermode(void);
+void exit_usermode(void);
+
 #endif /* _ASMPOWERPC_PROCESSOR_H_ */
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index b2fab4313..69ef21adb 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -58,5 +58,6 @@
 #define MSR_SE UL(0x0400)  /* Single Step Enable */
 #define MSR_EE UL(0x8000)
 #define MSR_ME UL(0x1000)
+#define MSR_PR UL(0x4000)
 
 #endif
diff --git a/lib/powerpc/asm/smp.h b/lib/powerpc/asm/smp.h
index 820c05e9e..b96a55903 100644
--- a/lib/powerpc/asm/smp.h
+++ b/lib/powerpc/asm/smp.h
@@ -11,6 +11,7 @@ struct cpu {
unsigned long server_no;
unsigned long stack;
unsigned long exception_stack;
+   bool in_user;
secondary_entry_fn entry;
pgd_t *pgtable;
 } __attribute__((packed)); /* used by asm */
diff --git a/lib/powerpc/io.c b/lib/powerpc/io.c
index cb7f2f050..5c2810884 100644
--- a/lib/powerpc/io.c
+++ b/lib/powerpc/io.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "io.h"
 
 static struct spinlock print_lock;
@@ -41,10 +42,16 @@ void io_init(void)
 
 void puts(const char *s)
 {
+   bool user = in_usermode();
+
+   if (user)
+   exit_usermode();
spin_lock(&print_lock);
while (*s)
putchar(*s++);
spin_unlock(&print_lock);
+   if (user)
+   enter_usermode();
 }
 
 /*
diff --git a/lib/powerpc/processor.c b/lib/powerpc/processor.c
index 09f6bb9d8..6c3000d5c 100644
--- a/lib/powerpc/processor.c
+++ b/lib/powerpc/processor.c
@@ -47,6 +47,8 @@ void do_handle_exception(struct pt_regs *regs)
unsigned char v;
 
__current_cpu = (struct cpu *)mfspr(SPR_SPRG0);
+   if (in_usermode())
+   current_cpu()->in_user = false;
 
/*
 * We run with AIL=0, so interrupts taken with MMU disabled.
@@ -60,6 +62,8 @@ void do_handle_exception(struct pt_regs *regs)
 
if (v < 128 && handlers[v].func) {
handlers[v].func(regs, handlers[v].data);
+   if (regs->msr & MSR_PR)
+   current_cpu()->in_user = true;
return;
}
 
@@ -169,3 +173,37 @@ void disable_mcheck(void)
 {
rfid_msr(mfmsr() & ~MSR_ME);
 }
+
+bool in_usermode(void)
+{
+   return current_cpu()->in_user;
+}
+
+static void usermode_sc_handler(struct pt_regs *regs, void *data)
+{
+   regs->msr &= ~(MSR_PR|MSR_EE);
+   /* Interrupt return handler will keep in_user clear */
+}
+
+void enter_usermode(void)
+{
+   assert_msg(!in_usermode(), "enter_usermode called with in_usermode");
+   /* mfmsr would fault in usermode anyway */
+   assert_msg(!(mfmsr() & MSR_PR), "enter_usermode called from user mode");
+   assert_msg(!(mfmsr() & MSR_EE), "enter_usermode called with interrupts enabled");
+   assert_msg((mfmsr() & (MSR_IR|MSR_DR)) == (MSR_IR|MSR_DR),
+   "enter_usermode called with virtual memory disabled");
+
+   handle_exception(0xc00, usermode_sc_handler, NULL);
+   rfid_msr(mfmsr() | (MSR_PR|MSR_IR|MSR_DR|MSR_EE));
+   current_cpu()->in_user = 

[kvm-unit-tests PATCH v8 30/35] powerpc: Add sieve.c common test

2024-04-05 Thread Nicholas Piggin
Now that sieve copes with lack of MMU support, it can be run by
powerpc.

Signed-off-by: Nicholas Piggin 
---
 powerpc/Makefile.common | 1 +
 powerpc/sieve.c | 1 +
 powerpc/unittests.cfg   | 3 +++
 3 files changed, 5 insertions(+)
 create mode 12 powerpc/sieve.c

diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index 5871da47a..410a675d9 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -8,6 +8,7 @@ tests-common = \
$(TEST_DIR)/selftest.elf \
$(TEST_DIR)/selftest-migration.elf \
$(TEST_DIR)/memory-verify.elf \
+   $(TEST_DIR)/sieve.elf \
$(TEST_DIR)/spapr_hcall.elf \
$(TEST_DIR)/rtas.elf \
$(TEST_DIR)/emulator.elf \
diff --git a/powerpc/sieve.c b/powerpc/sieve.c
new file mode 12
index 0..fe299f309
--- /dev/null
+++ b/powerpc/sieve.c
@@ -0,0 +1 @@
+../common/sieve.c
\ No newline at end of file
diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index 0be787f67..351da46a6 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -121,3 +121,6 @@ file = sprs.elf
 machine = pseries
 extra_params = -append '-w'
 groups = migration
+
+[sieve]
+file = sieve.elf
-- 
2.43.0



[kvm-unit-tests PATCH v8 29/35] common/sieve: Support machines without MMU

2024-04-05 Thread Nicholas Piggin
Not all powerpc CPUs provide MMU support. Define vm_available(), which
returns true by default and which archs can override. Use it to decide
whether to run the VM tests.
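
The override mechanism is a weak symbol. The following C sketch shows
the default side only, assuming a GCC or Clang toolchain;
sketch_setup_vm() and sketch_sieve_passes() are hypothetical helpers
that only model the control flow of common/sieve.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Weak default, as in the lib/vmalloc.c hunk: an architecture can
 * supply a normal (strong) definition in another file, and the linker
 * will prefer it over this one.
 */
bool __attribute__((__weak__)) vm_available(void)
{
	return true;
}

/* Hypothetical stand-in for setup_vm(): refuses to run without VM. */
static void sketch_setup_vm(void)
{
	assert(vm_available());
}

/*
 * Counts the sieve passes a test would run: one static pass always,
 * plus one mapped and three virtual passes when VM is usable.
 */
int sketch_sieve_passes(void)
{
	int passes = 1;

	if (vm_available()) {
		sketch_setup_vm();
		passes += 1 + 3;
	}
	return passes;
}
```

With no strong definition linked in, the weak default applies and all
five passes run; a strong vm_available() that returns false would leave
only the static pass.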

Cc: Paolo Bonzini 
Cc: Thomas Huth 
Cc: k...@vger.kernel.org
Reviewed-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
 common/sieve.c  | 14 --
 lib/ppc64/asm/mmu.h |  1 -
 lib/ppc64/mmu.c |  2 +-
 lib/vmalloc.c   |  7 +++
 lib/vmalloc.h   |  2 ++
 5 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/common/sieve.c b/common/sieve.c
index 8fe05ef13..db084691a 100644
--- a/common/sieve.c
+++ b/common/sieve.c
@@ -40,12 +40,14 @@ int main(void)
 
 printf("starting sieve\n");
 test_sieve("static", static_data, STATIC_SIZE);
-setup_vm();
-test_sieve("mapped", static_data, STATIC_SIZE);
-for (i = 0; i < 3; ++i) {
-   v = malloc(VSIZE);
-   test_sieve("virtual", v, VSIZE);
-   free(v);
+if (vm_available()) {
+   setup_vm();
+   test_sieve("mapped", static_data, STATIC_SIZE);
+   for (i = 0; i < 3; ++i) {
+   v = malloc(VSIZE);
+   test_sieve("virtual", v, VSIZE);
+   free(v);
+   }
 }
 
 return 0;
diff --git a/lib/ppc64/asm/mmu.h b/lib/ppc64/asm/mmu.h
index fadeee4bc..eaff0f1f7 100644
--- a/lib/ppc64/asm/mmu.h
+++ b/lib/ppc64/asm/mmu.h
@@ -3,7 +3,6 @@
 
 #include 
 
-bool vm_available(void);
 bool mmu_enabled(void);
 void mmu_enable(pgd_t *pgtable);
 void mmu_disable(void);
diff --git a/lib/ppc64/mmu.c b/lib/ppc64/mmu.c
index 5307cd862..84be31752 100644
--- a/lib/ppc64/mmu.c
+++ b/lib/ppc64/mmu.c
@@ -23,7 +23,7 @@
 
 static pgd_t *identity_pgd;
 
-bool vm_available(void)
+bool vm_available(void) /* weak override */
 {
return cpu_has_radix;
 }
diff --git a/lib/vmalloc.c b/lib/vmalloc.c
index 572682576..cf2ef7a70 100644
--- a/lib/vmalloc.c
+++ b/lib/vmalloc.c
@@ -206,10 +206,17 @@ void init_alloc_vpage(void *top)
spin_unlock();
 }
 
+bool __attribute__((__weak__)) vm_available(void)
+{
+   return true;
+}
+
 void __setup_vm(void *opaque)
 {
phys_addr_t base, top;
 
+   assert_msg(vm_available(), "Virtual memory not available. Must check vm_available() before calling setup_vm()");
+
if (alloc_ops == _ops)
return;
 
diff --git a/lib/vmalloc.h b/lib/vmalloc.h
index 0269fdde9..e81be39f4 100644
--- a/lib/vmalloc.h
+++ b/lib/vmalloc.h
@@ -17,6 +17,8 @@ extern void setup_vm(void);
 /* As above, plus passes an opaque value to setup_mmu(). */
 extern void __setup_vm(void *opaque);
 
+/* common/ tests must check availability before calling setup_vm() */
+extern bool vm_available(void);
 /* Set up paging */
 extern void *setup_mmu(phys_addr_t top, void *opaque);
 /* Walk the page table and resolve the virtual address to a physical address */
-- 
2.43.0



[kvm-unit-tests PATCH v8 28/35] common/sieve: Use vmalloc.h for setup_mmu definition

2024-04-05 Thread Nicholas Piggin
There is no good reason to put setup_vm in libcflat.h when it's
defined in vmalloc.h.

Cc: Paolo Bonzini 
Cc: Thomas Huth 
Cc: Janosch Frank 
Cc: Claudio Imbrenda 
Cc: Nico Böhr 
Cc: David Hildenbrand 
Cc: k...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Acked-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
 common/sieve.c | 1 +
 lib/libcflat.h | 2 --
 lib/s390x/io.c | 1 +
 lib/s390x/uv.h | 1 +
 lib/x86/vm.h   | 1 +
 s390x/mvpg.c   | 1 +
 s390x/selftest.c   | 1 +
 x86/pmu.c  | 1 +
 x86/pmu_lbr.c  | 1 +
 x86/vmexit.c   | 1 +
 x86/vmware_backdoors.c | 1 +
 11 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/common/sieve.c b/common/sieve.c
index 8150f2d98..8fe05ef13 100644
--- a/common/sieve.c
+++ b/common/sieve.c
@@ -1,5 +1,6 @@
 #include "alloc.h"
 #include "libcflat.h"
+#include "vmalloc.h"
 
 static int sieve(char* data, int size)
 {
diff --git a/lib/libcflat.h b/lib/libcflat.h
index 700f43527..8c8dd0286 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -152,8 +152,6 @@ do {
\
 void binstr(unsigned long x, char out[BINSTR_SZ]);
 void print_binstr(unsigned long x);
 
-extern void setup_vm(void);
-
 #endif /* !__ASSEMBLY__ */
 
 #define SZ_256 (1 << 8)
diff --git a/lib/s390x/io.c b/lib/s390x/io.c
index fb7b7ddaa..2b28ccaa0 100644
--- a/lib/s390x/io.c
+++ b/lib/s390x/io.c
@@ -10,6 +10,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/lib/s390x/uv.h b/lib/s390x/uv.h
index 286933caa..00a370410 100644
--- a/lib/s390x/uv.h
+++ b/lib/s390x/uv.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 bool uv_os_is_guest(void);
 bool uv_os_is_host(void);
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index 4b714bad7..cf39787aa 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -2,6 +2,7 @@
 #define _X86_VM_H_
 
 #include "processor.h"
+#include "vmalloc.h"
 #include "asm/page.h"
 #include "asm/io.h"
 #include "asm/bitops.h"
diff --git a/s390x/mvpg.c b/s390x/mvpg.c
index 296338d4f..a0cfc575a 100644
--- a/s390x/mvpg.c
+++ b/s390x/mvpg.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/s390x/selftest.c b/s390x/selftest.c
index 92ed4e5d3..3eaae9b06 100644
--- a/s390x/selftest.c
+++ b/s390x/selftest.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/x86/pmu.c b/x86/pmu.c
index 47a1a602a..7062c1ad9 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -6,6 +6,7 @@
 #include "x86/apic.h"
 #include "x86/desc.h"
 #include "x86/isr.h"
+#include "vmalloc.h"
 #include "alloc.h"
 
 #include "libcflat.h"
diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index 40b63fa3d..c6f010847 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -2,6 +2,7 @@
 #include "x86/processor.h"
 #include "x86/pmu.h"
 #include "x86/desc.h"
+#include "vmalloc.h"
 
 #define N 100
 
diff --git a/x86/vmexit.c b/x86/vmexit.c
index eb5d3023a..48a38f60f 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -1,6 +1,7 @@
 #include "libcflat.h"
 #include "acpi.h"
 #include "smp.h"
+#include "vmalloc.h"
 #include "pci.h"
 #include "x86/vm.h"
 #include "x86/desc.h"
diff --git a/x86/vmware_backdoors.c b/x86/vmware_backdoors.c
index bc1002056..f8cf7ecb1 100644
--- a/x86/vmware_backdoors.c
+++ b/x86/vmware_backdoors.c
@@ -6,6 +6,7 @@
 #include "x86/desc.h"
 #include "x86/isr.h"
 #include "alloc.h"
+#include "vmalloc.h"
 #include "setjmp.h"
 #include "usermode.h"
 #include "fault_test.h"
-- 
2.43.0



[kvm-unit-tests PATCH v8 27/35] powerpc: Add MMU support

2024-04-05 Thread Nicholas Piggin
Add support for radix MMU, 4kB and 64kB pages.

This also adds MMU interrupt test cases, and runs the interrupts
test entirely with MMU enabled if it is available (aside from
machine check tests).
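
The configure changes below accept --page-size with an optional K or k
suffix and allow only 4096 and 65536 on ppc64. A small C sketch of that
rule, for illustration; parse_page_size() and ppc64_page_size_ok() are
hypothetical names, and the authoritative logic is the shell code in
configure.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse "4096", "64K" or "64k" into bytes; returns 0 on malformed input. */
unsigned long parse_page_size(const char *s)
{
	char *end;
	unsigned long v;

	assert(s != NULL);
	if (!isdigit((unsigned char)s[0]))
		return 0;

	v = strtoul(s, &end, 10);
	if (*end == 'K' || *end == 'k') {
		v *= 1024;
		end++;
	}
	return *end == '\0' ? v : 0;
}

/* ppc64 radix support in this series: 4kB and 64kB pages only. */
bool ppc64_page_size_ok(unsigned long bytes)
{
	return bytes == 4096 || bytes == 65536;
}
```

A 16K request, which arm64 accepts, is rejected for ppc64 under this
rule.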

Acked-by: Andrew Jones  (configure changes)
Signed-off-by: Nicholas Piggin 
---
 configure |  39 +++--
 lib/powerpc/asm/hcall.h   |   6 +
 lib/powerpc/asm/processor.h   |   1 +
 lib/powerpc/asm/reg.h |   3 +
 lib/powerpc/asm/smp.h |   2 +
 lib/powerpc/processor.c   |   9 ++
 lib/powerpc/setup.c   |   9 +-
 lib/ppc64/asm/mmu.h   |  11 ++
 lib/ppc64/asm/page.h  |  66 +++-
 lib/ppc64/asm/pgtable-hwdef.h |  66 
 lib/ppc64/asm/pgtable.h   | 125 +++
 lib/ppc64/mmu.c   | 281 ++
 lib/ppc64/opal-calls.S|   4 +-
 powerpc/Makefile.common   |   2 +
 powerpc/Makefile.ppc64|   1 +
 powerpc/interrupts.c  |  96 ++--
 16 files changed, 692 insertions(+), 29 deletions(-)
 create mode 100644 lib/ppc64/asm/mmu.h
 create mode 100644 lib/ppc64/asm/pgtable-hwdef.h
 create mode 100644 lib/ppc64/asm/pgtable.h
 create mode 100644 lib/ppc64/mmu.c

diff --git a/configure b/configure
index 49f047cb2..e19ba6f0c 100755
--- a/configure
+++ b/configure
@@ -245,29 +245,35 @@ fi
 if [ -z "$page_size" ]; then
 if [ "$efi" = 'y' ] && [ "$arch" = "arm64" ]; then
 page_size="4096"
-elif [ "$arch" = "arm64" ]; then
+elif [ "$arch" = "arm64" ] || [ "$arch" = "ppc64" ]; then
 page_size="65536"
 elif [ "$arch" = "arm" ]; then
 page_size="4096"
 fi
 else
-if [ "$arch" != "arm64" ]; then
-echo "--page-size is not supported for $arch"
-usage
-fi
-
 if [ "${page_size: -1}" = "K" ] || [ "${page_size: -1}" = "k" ]; then
 page_size=$(( ${page_size%?} * 1024 ))
 fi
-if [ "$page_size" != "4096" ] && [ "$page_size" != "16384" ] &&
-   [ "$page_size" != "65536" ]; then
-echo "arm64 doesn't support page size of $page_size"
+
+if [ "$arch" = "arm64" ]; then
+if [ "$page_size" != "4096" ] && [ "$page_size" != "16384" ] &&
+   [ "$page_size" != "65536" ]; then
+echo "arm64 doesn't support page size of $page_size"
+usage
+fi
+if [ "$efi" = 'y' ] && [ "$page_size" != "4096" ]; then
+echo "efi must use 4K pages"
+exit 1
+fi
+elif [ "$arch" = "ppc64" ]; then
+if [ "$page_size" != "4096" ] && [ "$page_size" != "65536" ]; then
+echo "ppc64 doesn't support page size of $page_size"
+usage
+fi
+else
+echo "--page-size is not supported for $arch"
 usage
 fi
-if [ "$efi" = 'y' ] && [ "$page_size" != "4096" ]; then
-echo "efi must use 4K pages"
-exit 1
-fi
 fi
 
 [ -z "$processor" ] && processor="$arch"
@@ -472,6 +478,13 @@ cat <<EOF >> lib/config.h
 
 #define CONFIG_UART_EARLY_BASE ${arm_uart_early_addr}
 #define CONFIG_ERRATA_FORCE ${errata_force}
+
+EOF
+fi
+
+if [ "$arch" = "arm" ] || [ "$arch" = "arm64" ] || [ "$arch" = "ppc64" ]; then
+cat <<EOF >> lib/config.h
+
 #define CONFIG_PAGE_SIZE _AC(${page_size}, UL)
 
 EOF
diff --git a/lib/powerpc/asm/hcall.h b/lib/powerpc/asm/hcall.h
index e0f5009e3..3b44dd204 100644
--- a/lib/powerpc/asm/hcall.h
+++ b/lib/powerpc/asm/hcall.h
@@ -24,6 +24,12 @@
 #define H_PUT_TERM_CHAR0x58
 #define H_RANDOM   0x300
 #define H_SET_MODE 0x31C
+#define H_REGISTER_PROCESS_TABLE   0x37C
+
+#define PTBL_NEW   0x18
+#define PTBL_UNREGISTER0x10
+#define PTBL_RADIX 0x04
+#define PTBL_GTSE  0x01
 
 #define KVMPPC_HCALL_BASE  0xf000
 #define KVMPPC_H_RTAS  (KVMPPC_HCALL_BASE + 0x0)
diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index a3859b5d4..d348239c5 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -14,6 +14,7 @@ extern bool cpu_has_hv;
 extern bool cpu_has_power_mce;
 extern bool cpu_has_siar;
 extern bool cpu_has_heai;
+extern bool cpu_has_radix;
 extern bool cpu_has_prefix;
 extern bool cpu_has_sc_lev;
 extern bool cpu_has_pause_short;
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index 12f9e8ac6..b2fab4313 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -11,6 +11,7 @@
 #define SPR_SRR0   0x01a
 #define SPR_SRR1   0x01b
 #define   SRR1_PREFIX  UL(0x2000)
+#define SPR_PIDR   0x030
 #define SPR_FSCR   0x099
 #define   FSCR_PREFIX  UL(0x2000)
 #define SPR_HFSCR  0x0be
@@ -36,7 +37,9 @@
 #define SPR_LPCR   0x13e
 #define   LPCR_HDICE   UL(0x1)
 #define   LPCR_LD  UL(0x2)
+#define SPR_LPIDR  0x13f
 #define SPR_HEIR   0x153
+#define SPR_PTCR   0x1d0
 #define SPR_MMCR0  0x31b
 #define   MMCR0_FC UL(0x8000)
 #define   

[kvm-unit-tests PATCH v8 26/35] powerpc: Add timebase tests

2024-04-05 Thread Nicholas Piggin
This has a known failure on QEMU TCG machines where the decrementer
interrupt is not lowered when the DEC wraps from -ve to +ve.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/reg.h   |   1 +
 powerpc/Makefile.common |   1 +
 powerpc/timebase.c  | 329 
 powerpc/unittests.cfg   |   8 +
 4 files changed, 339 insertions(+)
 create mode 100644 powerpc/timebase.c

diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index d2ca964c4..12f9e8ac6 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -35,6 +35,7 @@
 #define SPR_HSRR1  0x13b
 #define SPR_LPCR   0x13e
 #define   LPCR_HDICE   UL(0x1)
+#define   LPCR_LD  UL(0x2)
 #define SPR_HEIR   0x153
 #define SPR_MMCR0  0x31b
 #define   MMCR0_FC UL(0x8000)
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index b6f9b3b85..1348f658b 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -15,6 +15,7 @@ tests-common = \
$(TEST_DIR)/tm.elf \
$(TEST_DIR)/smp.elf \
$(TEST_DIR)/sprs.elf \
+   $(TEST_DIR)/timebase.elf \
$(TEST_DIR)/interrupts.elf
 
 tests-all = $(tests-common) $(tests)
diff --git a/powerpc/timebase.c b/powerpc/timebase.c
new file mode 100644
index 0..1908ca838
--- /dev/null
+++ b/powerpc/timebase.c
@@ -0,0 +1,329 @@
+/* SPDX-License-Identifier: LGPL-2.0-only */
+/*
+ * Test Timebase
+ *
+ * Copyright 2024 Nicholas Piggin, IBM Corp.
+ *
+ * This contains tests of timebase facility, TB, DEC, etc.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int dec_bits = 0;
+
+static void cpu_dec_bits(int fdtnode, u64 regval __unused, void *arg __unused)
+{
+   const struct fdt_property *prop;
+   int plen;
+
+   prop = fdt_get_property(dt_fdt(), fdtnode, "ibm,dec-bits", &plen);
+   if (!prop) {
+   dec_bits = 32;
+   return;
+   }
+
+   /* Sanity check for the property layout (first two bytes are header) */
+   assert(plen == 4);
+
+   dec_bits = fdt32_to_cpu(*(uint32_t *)prop->data);
+}
+
+/* Find the decrementer width in bits from the CPU device tree nodes */
+static int find_dec_bits(void)
+{
+   int ret;
+
+   ret = dt_for_each_cpu_node(cpu_dec_bits, NULL);
+   if (ret < 0)
+   return ret;
+
+   return dec_bits;
+}
+
+
+static bool do_migrate = false;
+static volatile bool got_interrupt;
+static volatile struct pt_regs recorded_regs;
+
+static uint64_t dec_max;
+static uint64_t dec_min;
+
+static void test_tb(int argc, char **argv)
+{
+   uint64_t tb;
+
+   tb = get_tb();
+   if (do_migrate)
+   migrate();
+   report(get_tb() >= tb, "timebase is incrementing");
+}
+
+static void dec_stop_handler(struct pt_regs *regs, void *data)
+{
+   mtspr(SPR_DEC, dec_max);
+}
+
+static void dec_handler(struct pt_regs *regs, void *data)
+{
+   got_interrupt = true;
+   memcpy((void *)&recorded_regs, regs, sizeof(struct pt_regs));
+   regs->msr &= ~MSR_EE;
+}
+
+static void test_dec(int argc, char **argv)
+{
+   uint64_t tb1, tb2, dec;
+   int i;
+
+   handle_exception(0x900, &dec_handler, NULL);
+
+   for (i = 0; i < 100; i++) {
+   tb1 = get_tb();
+   mtspr(SPR_DEC, dec_max);
+   dec = mfspr(SPR_DEC);
+   tb2 = get_tb();
+   if (tb2 - tb1 < dec_max - dec)
+   break;
+   }
+   report(tb2 - tb1 >= dec_max - dec, "decrementer remains within TB after mtDEC");
+
+   tb1 = get_tb();
+   mtspr(SPR_DEC, dec_max);
+   mdelay(1000);
+   dec = mfspr(SPR_DEC);
+   tb2 = get_tb();
+   report(tb2 - tb1 >= dec_max - dec, "decrementer remains within TB after 1s");
+
+   mtspr(SPR_DEC, dec_max);
+   local_irq_enable();
+   local_irq_disable();
+   if (mfspr(SPR_DEC) <= dec_max) {
+   report(!got_interrupt, "no interrupt on decrementer positive");
+   }
+   got_interrupt = false;
+
+   mtspr(SPR_DEC, 1);
+   mdelay(100); /* Give the timer a chance to run */
+   if (do_migrate)
+   migrate();
+   local_irq_enable();
+   local_irq_disable();
+   report(got_interrupt, "interrupt on decrementer underflow");
+   got_interrupt = false;
+
+   if (do_migrate)
+   migrate();
+   local_irq_enable();
+   local_irq_disable();
+   report(got_interrupt, "interrupt on decrementer still underflown");
+   got_interrupt = false;
+
+   mtspr(SPR_DEC, 0);
+   mdelay(100); /* Give the timer a chance to run */
+   if (do_migrate)
+   migrate();
+   local_irq_enable();
+   local_irq_disable();
+   report(got_interrupt, "DEC deal with set to 0");
+   got_interrupt = false;
+
+   /* Test for level-triggered decrementer */
+   mtspr(SPR_DEC, -1ULL);
+   if (do_migrate)
+

[kvm-unit-tests PATCH v8 25/35] powerpc: Add atomics tests

2024-04-05 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 powerpc/Makefile.common |   1 +
 powerpc/atomics.c   | 374 
 powerpc/unittests.cfg   |   9 +
 3 files changed, 384 insertions(+)
 create mode 100644 powerpc/atomics.c

diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index 02af54b83..b6f9b3b85 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -11,6 +11,7 @@ tests-common = \
$(TEST_DIR)/spapr_hcall.elf \
$(TEST_DIR)/rtas.elf \
$(TEST_DIR)/emulator.elf \
+   $(TEST_DIR)/atomics.elf \
$(TEST_DIR)/tm.elf \
$(TEST_DIR)/smp.elf \
$(TEST_DIR)/sprs.elf \
diff --git a/powerpc/atomics.c b/powerpc/atomics.c
new file mode 100644
index 0..c3d1cef52
--- /dev/null
+++ b/powerpc/atomics.c
@@ -0,0 +1,374 @@
+/* SPDX-License-Identifier: LGPL-2.0-only */
+/*
+ * Test some powerpc instructions
+ *
+ * Copyright 2024 Nicholas Piggin, IBM Corp.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static bool do_migrate;
+static bool do_record;
+
+#define RSV_SIZE 128
+
+static uint8_t granule[RSV_SIZE] __attribute((__aligned__(RSV_SIZE)));
+
+static void spin_lock(unsigned int *lock)
+{
+   unsigned int old;
+
+   asm volatile ("1:"
+ "lwarx    %0,0,%2;"
+ "cmpwi    %0,0;"
+ "bne  1b;"
+ "stwcx.   %1,0,%2;"
+ "bne- 1b;"
+ "lwsync;"
+ : "=&r"(old) : "r"(1), "r"(lock) : "cr0", "memory");
+}
+
+static void spin_unlock(unsigned int *lock)
+{
+   asm volatile("lwsync;"
+"stw   %1,%0;"
+: "+m"(*lock) : "r"(0) : "memory");
+}
+
+static volatile bool got_interrupt;
+static volatile struct pt_regs recorded_regs;
+
+static void interrupt_handler(struct pt_regs *regs, void *opaque)
+{
+   assert(!got_interrupt);
+   got_interrupt = true;
+   memcpy((void *)&recorded_regs, regs, sizeof(struct pt_regs));
+   regs_advance_insn(regs);
+}
+
+static void test_lwarx_stwcx(int argc, char *argv[])
+{
+   unsigned int *var = (unsigned int *)granule;
+   unsigned int old;
+   unsigned int result;
+
+   *var = 0;
+   asm volatile ("1:"
+ "lwarx    %0,0,%2;"
+ "stwcx.   %1,0,%2;"
+ "bne- 1b;"
+ : "=&r"(old) : "r"(1), "r"(var) : "cr0", "memory");
+   report(old == 0 && *var == 1, "simple update");
+
+   *var = 0;
+   asm volatile ("li   %0,0;"
+ "stwcx.   %1,0,%2;"
+ "stwcx.   %1,0,%2;"
+ "bne- 1f;"
+ "li   %0,1;"
+ "1:"
+ : "=&r"(result)
+ : "r"(1), "r"(var) : "cr0", "memory");
+   report(result == 0 && *var == 0, "failed stwcx. (no reservation)");
+
+   *var = 0;
+   asm volatile ("li   %0,0;"
+ "lwarx    %1,0,%4;"
+ "stw  %3,0(%4);"
+ "stwcx.   %2,0,%4;"
+ "bne- 1f;"
+ "li   %0,1;"
+ "1:"
+ : "=&r"(result), "=&r"(old)
+ : "r"(1), "r"(2), "r"(var) : "cr0", "memory");
+   /* This is implementation specific, so don't fail */
+   if (result == 0 && *var == 2)
+   report(true, "failed stwcx. (intervening store)");
+   else
+   report(true, "succeeded stwcx. (intervening store)");
+
+   handle_exception(0x600, interrupt_handler, NULL);
+   handle_exception(0x700, interrupt_handler, NULL);
+
+   /* Implementations may not necessarily invoke the alignment interrupt */
+   old = 10;
+   *var = 0;
+   asm volatile (
+ "lwarx    %0,0,%1;"
+ : "+&r"(old) : "r"((char *)var + 1));
+   report(old == 10 && got_interrupt && recorded_regs.trap == 0x600, "unaligned lwarx causes fault");
+   got_interrupt = false;
+
+   /*
+* Unaligned stwcx. is more difficult to test, at least under QEMU,
+* the store does not proceed if there is no matching reservation, so
+* the alignment handler does not get invoked. This is okay according
+* to the Power ISA (unalignment does not necessarily invoke the
+* alignment interrupt). But POWER CPUs do cause alignment interrupt.
+*/
+   *var = 0;
+   asm volatile (
+ "lwarx    %0,0,%2;"
+ "stwcx.   %1,0,%3;"
+ : "=&r"(old) : "r"(1), "r"(var), "r"((char *)var+1) : "cr0", "memory");
+   report(old == 0 && *var == 0 && got_interrupt && recorded_regs.trap == 0x600, "unaligned stwcx. causes fault");
+   got_interrupt = false;
+
+   handle_exception(0x600, NULL, NULL);
+
+}
+
+static void 

[kvm-unit-tests PATCH v8 24/35] powerpc: Avoid using larx/stcx. in spinlocks when only one CPU is running

2024-04-05 Thread Nicholas Piggin
The test harness uses spinlocks. If they are implemented with larx/stcx.,
they can prevent some test scenarios, such as testing migration of a
reservation.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/smp.h|  1 +
 lib/powerpc/smp.c|  5 +
 lib/powerpc/spinlock.c   | 29 +
 lib/ppc64/asm/spinlock.h |  7 ++-
 powerpc/Makefile.common  |  1 +
 5 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 lib/powerpc/spinlock.c

diff --git a/lib/powerpc/asm/smp.h b/lib/powerpc/asm/smp.h
index 4519e5436..6ef3ae521 100644
--- a/lib/powerpc/asm/smp.h
+++ b/lib/powerpc/asm/smp.h
@@ -15,6 +15,7 @@ struct cpu {
 
 extern int nr_cpus_present;
 extern int nr_cpus_online;
+extern bool multithreaded;
 extern struct cpu cpus[];
 
 register struct cpu *__current_cpu asm("r13");
diff --git a/lib/powerpc/smp.c b/lib/powerpc/smp.c
index a3bf85d44..f3b2a3faf 100644
--- a/lib/powerpc/smp.c
+++ b/lib/powerpc/smp.c
@@ -276,6 +276,8 @@ static void start_each_secondary(int fdtnode, u64 regval __unused, void *info)
start_core(fdtnode, datap->entry);
 }
 
+bool multithreaded = false;
+
 /*
  * Start all stopped cpus on the guest at entry with register 3 set to r3
  * We expect that we come in with only one thread currently started
@@ -290,6 +292,7 @@ bool start_all_cpus(secondary_entry_fn entry)
 
assert(nr_cpus_online == 1);
assert(nr_started == 1);
+   multithreaded = true;
ret = dt_for_each_cpu_node(start_each_secondary, );
assert(ret == 0);
assert(nr_started == nr_cpus_present);
@@ -308,8 +311,10 @@ bool start_all_cpus(secondary_entry_fn entry)
 
 void stop_all_cpus(void)
 {
+   assert(multithreaded);
while (nr_cpus_online > 1)
cpu_relax();
mb();
nr_started = 1;
+   multithreaded = false;
 }
diff --git a/lib/powerpc/spinlock.c b/lib/powerpc/spinlock.c
new file mode 100644
index 0..623a1f2c1
--- /dev/null
+++ b/lib/powerpc/spinlock.c
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: LGPL-2.0 */
+#include 
+#include 
+
+/*
+ * Skip the atomic when single-threaded, which helps avoid larx/stcx. in
+ * the harness when testing tricky larx/stcx. sequences (e.g., migration
+ * vs reservation).
+ */
+void spin_lock(struct spinlock *lock)
+{
+   if (!multithreaded) {
+   assert(lock->v == 0);
+   lock->v = 1;
+   } else {
+   while (__sync_lock_test_and_set(&lock->v, 1))
+   ;
+   }
+}
+
+void spin_unlock(struct spinlock *lock)
+{
+   assert(lock->v == 1);
+   if (!multithreaded) {
+   lock->v = 0;
+   } else {
+   __sync_lock_release(&lock->v);
+   }
+}
diff --git a/lib/ppc64/asm/spinlock.h b/lib/ppc64/asm/spinlock.h
index f59eed191..b952386da 100644
--- a/lib/ppc64/asm/spinlock.h
+++ b/lib/ppc64/asm/spinlock.h
@@ -1,6 +1,11 @@
 #ifndef _ASMPPC64_SPINLOCK_H_
 #define _ASMPPC64_SPINLOCK_H_
 
-#include 
+struct spinlock {
+   unsigned int v;
+};
+
+void spin_lock(struct spinlock *lock);
+void spin_unlock(struct spinlock *lock);
 
 #endif /* _ASMPPC64_SPINLOCK_H_ */
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index 744dfc1f7..02af54b83 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -48,6 +48,7 @@ cflatobjs += lib/powerpc/rtas.o
 cflatobjs += lib/powerpc/processor.o
 cflatobjs += lib/powerpc/handlers.o
 cflatobjs += lib/powerpc/smp.o
+cflatobjs += lib/powerpc/spinlock.o
 
 OBJDIRS += lib/powerpc
 
-- 
2.43.0
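
For readers following the thread, here is a minimal standalone sketch of the
single-threaded bypass described in the patch above. It is not the harness
code: the names sketch_spinlock, sketch_multithreaded, sketch_spin_lock and
sketch_spin_unlock are hypothetical stand-ins, and it assumes a GCC or Clang
compiler that provides the __sync builtins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the harness's lock type and SMP flag. */
struct sketch_spinlock {
	unsigned int v;
};

static bool sketch_multithreaded;

static void sketch_spin_lock(struct sketch_spinlock *lock)
{
	if (!sketch_multithreaded) {
		/* Only one CPU is running, so a plain store cannot race. */
		assert(lock->v == 0);
		lock->v = 1;
	} else {
		/* Atomic exchange with acquire semantics; spin while held. */
		while (__sync_lock_test_and_set(&lock->v, 1))
			;
	}
}

static void sketch_spin_unlock(struct sketch_spinlock *lock)
{
	assert(lock->v == 1);
	if (!sketch_multithreaded)
		lock->v = 0;
	else
		__sync_lock_release(&lock->v);	/* store 0 with release semantics */
}
```

The point of the plain-store path is that no load-reserve/store-conditional
pair is issued by the harness, so a test that holds its own reservation is
not disturbed by a lock taken inside a library call.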



[kvm-unit-tests PATCH v8 23/35] powerpc: Permit ACCEL=tcg,thread=single

2024-04-05 Thread Nicholas Piggin
Modify run script to permit single vs mttcg threading, add a
thread=single smp case to unittests.cfg.

Signed-off-by: Nicholas Piggin 
---
 powerpc/run   | 4 ++--
 powerpc/unittests.cfg | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/powerpc/run b/powerpc/run
index 172f32a46..27abf1ef6 100755
--- a/powerpc/run
+++ b/powerpc/run
@@ -36,8 +36,8 @@ if ! $qemu -machine '?' 2>&1 | grep $MACHINE > /dev/null; then
exit 2
 fi
 
+A="-accel $ACCEL$ACCEL_PROPS"
 M="-machine $MACHINE"
-M+=",accel=$ACCEL$ACCEL_PROPS"
 B=""
 D=""
 
@@ -54,7 +54,7 @@ if [[ "$MACHINE" == "powernv"* ]] ; then
D+="-device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10"
 fi
 
-command="$qemu -nodefaults $M $B $D"
+command="$qemu -nodefaults $A $M $B $D"
 command+=" -display none -serial stdio -kernel"
 command="$(migration_cmd) $(timeout_cmd) $command"
 
diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index ddce409a8..71bfc935d 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -82,6 +82,12 @@ smp = 2
 file = smp.elf
 smp = 8,threads=4
 
+# mttcg is the default most places, so add a thread=single test
+[smp-thread-single]
+file = smp.elf
+smp = 8,threads=4
+accel = tcg,thread=single
+
 [h_cede_tm]
 file = tm.elf
 machine = pseries
-- 
2.43.0



[kvm-unit-tests PATCH v8 22/35] powerpc: add SMP and IPI support

2024-04-05 Thread Nicholas Piggin
powerpc SMP support is very primitive and does not set up a first-class
runtime environment for secondary CPUs.

This reworks SMP support, and provides a complete C and harness
environment for the secondaries, including interrupt handling, as well
as IPI support.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h |  23 +++
 lib/powerpc/asm/reg.h   |   1 +
 lib/powerpc/asm/setup.h |   2 -
 lib/powerpc/asm/smp.h   |  46 +++--
 lib/powerpc/io.c|  15 +-
 lib/powerpc/processor.c |   7 +-
 lib/powerpc/setup.c |  90 +++---
 lib/powerpc/smp.c   | 282 +
 lib/ppc64/asm-offsets.c |   7 +
 lib/ppc64/asm/atomic.h  |   6 +
 lib/ppc64/asm/barrier.h |   3 +
 lib/ppc64/asm/opal.h|   7 +
 powerpc/Makefile.common |   1 +
 powerpc/cstart64.S  |  49 -
 powerpc/selftest.c  |   4 +-
 powerpc/smp.c   | 348 
 powerpc/tm.c|   4 +-
 powerpc/unittests.cfg   |   8 +
 18 files changed, 818 insertions(+), 85 deletions(-)
 create mode 100644 lib/ppc64/asm/atomic.h
 create mode 100644 powerpc/smp.c

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index eed37d1f4..a3859b5d4 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -16,6 +16,7 @@ extern bool cpu_has_siar;
 extern bool cpu_has_heai;
 extern bool cpu_has_prefix;
 extern bool cpu_has_sc_lev;
+extern bool cpu_has_pause_short;
 
 static inline uint64_t mfspr(int nr)
 {
@@ -45,6 +46,28 @@ static inline void mtmsr(uint64_t msr)
asm volatile ("mtmsrd %[msr]" :: [msr] "r" (msr) : "memory");
 }
 
+static inline void local_irq_enable(void)
+{
+   unsigned long msr;
+
+   asm volatile(
+"  mfmsr   %0  \n \
+   ori %0,%0,%1\n \
+   mtmsrd  %0,1"
+   : "=r"(msr) : "i"(MSR_EE): "memory");
+}
+
+static inline void local_irq_disable(void)
+{
+   unsigned long msr;
+
+   asm volatile(
+"  mfmsr   %0  \n \
+   andc%0,%0,%1\n \
+   mtmsrd  %0,1"
+   : "=r"(msr) : "r"(MSR_EE): "memory");
+}
+
 /*
  * This returns true on PowerNV / OPAL machines which run in hypervisor
  * mode. False on pseries / PAPR machines that run in guest mode.
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index d6097f48f..d2ca964c4 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -19,6 +19,7 @@
 #define SPR_SPRG1  0x111
 #define SPR_SPRG2  0x112
 #define SPR_SPRG3  0x113
+#define SPR_TBU40  0x11e
 #define SPR_PVR0x11f
 #define   PVR_VERSION_MASK UL(0x)
 #define   PVR_VER_970  UL(0x0039)
diff --git a/lib/powerpc/asm/setup.h b/lib/powerpc/asm/setup.h
index cc7cf5e25..9ca318ce6 100644
--- a/lib/powerpc/asm/setup.h
+++ b/lib/powerpc/asm/setup.h
@@ -8,8 +8,6 @@
 #include 
 
 #define NR_CPUS8   /* arbitrarily set for now */
-extern u32 cpus[NR_CPUS];
-extern int nr_cpus;
 
 extern uint64_t tb_hz;
 
diff --git a/lib/powerpc/asm/smp.h b/lib/powerpc/asm/smp.h
index 21940b4bc..4519e5436 100644
--- a/lib/powerpc/asm/smp.h
+++ b/lib/powerpc/asm/smp.h
@@ -2,21 +2,45 @@
 #define _ASMPOWERPC_SMP_H_
 
 #include 
+#include 
 
-extern int nr_threads;
+typedef void (*secondary_entry_fn)(int cpu_id);
 
-struct start_threads {
-   int nr_threads;
-   int nr_started;
-};
+struct cpu {
+   unsigned long server_no;
+   unsigned long stack;
+   unsigned long exception_stack;
+   secondary_entry_fn entry;
+} __attribute__((packed)); /* used by asm */
 
-typedef void (*secondary_entry_fn)(void);
+extern int nr_cpus_present;
+extern int nr_cpus_online;
+extern struct cpu cpus[];
 
-extern void halt(void);
+register struct cpu *__current_cpu asm("r13");
+static inline struct cpu *current_cpu(void)
+{
+   return __current_cpu;
+}
 
-extern int start_thread(int cpu_id, secondary_entry_fn entry, uint32_t r3);
-extern struct start_threads start_cpu(int cpu_node, secondary_entry_fn entry,
- uint32_t r3);
-extern bool start_all_cpus(secondary_entry_fn entry, uint32_t r3);
+static inline int smp_processor_id(void)
+{
+   return current_cpu()->server_no;
+}
+
+void cpu_init(struct cpu *cpu, int cpu_id);
+
+extern void halt(int cpu_id);
+
+extern bool start_all_cpus(secondary_entry_fn entry);
+extern void stop_all_cpus(void);
+
+struct pt_regs;
+void register_ipi(void (*fn)(struct pt_regs *, void *), void *data);
+void unregister_ipi(void);
+void cpu_init_ipis(void);
+void local_ipi_enable(void);
+void local_ipi_disable(void);
+void send_ipi(int cpu_id);
 
 #endif /* _ASMPOWERPC_SMP_H_ */
diff --git a/lib/powerpc/io.c b/lib/powerpc/io.c
index ab7bb843c..cb7f2f050 100644
--- a/lib/powerpc/io.c
+++ b/lib/powerpc/io.c
@@ -10,6 +10,7 @@
 #include 
 

[kvm-unit-tests PATCH v8 21/35] powerpc: Remove broken SMP exception stack setup

2024-04-05 Thread Nicholas Piggin
The exception stack setup does not work correctly for SMP, because
it is the boot processor that calls cpu_set() which sets SPRG2 to
the exception stack, not the target CPU itself. So secondaries
never got their SPRG2 set to a valid exception stack.

Remove the SMP code and just set an exception stack for the boot
processor. Make the stack 64kB while we're here, to match the
size of the regular stack.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/setup.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index 9b665f59c..496af40f8 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -42,10 +42,6 @@ struct cpu_set_params {
uint64_t tb_hz;
 };
 
-#define EXCEPTION_STACK_SIZE   (32*1024) /* 32kB */
-
-static char exception_stack[NR_CPUS][EXCEPTION_STACK_SIZE];
-
 static void cpu_set(int fdtnode, u64 regval, void *info)
 {
static bool read_common_info = false;
@@ -56,10 +52,6 @@ static void cpu_set(int fdtnode, u64 regval, void *info)
 
cpus[cpu] = regval;
 
-   /* set exception stack address for this CPU (in SPGR0) */
-   asm volatile ("mtsprg0 %[addr]" ::
- [addr] "r" (exception_stack[cpu + 1]));
-
if (!read_common_info) {
const struct fdt_property *prop;
u32 *data;
@@ -180,6 +172,10 @@ static void mem_init(phys_addr_t freemem_start)
 ? __icache_bytes : __dcache_bytes);
 }
 
+#define EXCEPTION_STACK_SIZE   SZ_64K
+
+static char boot_exception_stack[EXCEPTION_STACK_SIZE];
+
 void setup(const void *fdt)
 {
void *freemem = 
@@ -189,6 +185,10 @@ void setup(const void *fdt)
 
cpu_has_hv = !!(mfmsr() & (1ULL << MSR_HV_BIT));
 
+   /* set exception stack address for this CPU (in SPRG0) */
+   asm volatile ("mtsprg0 %[addr]" ::
+ [addr] "r" (boot_exception_stack));
+
enable_mcheck();
 
/*
-- 
2.43.0



[kvm-unit-tests PATCH v8 20/35] powerpc: Add rtas stop-self support

2024-04-05 Thread Nicholas Piggin
In preparation for improved SMP support, add stop-self support to the
harness. This is non-trivial because it requires an unlocked rtas
call: a CPU can't be holding a spin lock when it goes offline or it
will deadlock other CPUs. rtas permits stop-self to be called without
serialising all other rtas operations.

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/rtas.h |  2 ++
 lib/powerpc/rtas.c | 78 +-
 2 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/lib/powerpc/asm/rtas.h b/lib/powerpc/asm/rtas.h
index 6fb407a18..364bf9355 100644
--- a/lib/powerpc/asm/rtas.h
+++ b/lib/powerpc/asm/rtas.h
@@ -23,8 +23,10 @@ struct rtas_args {
 extern void rtas_init(void);
 extern int rtas_token(const char *service, uint32_t *token);
 extern int rtas_call(int token, int nargs, int nret, int *outputs, ...);
+extern int rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, int *outputs, ...);
 
 extern void rtas_power_off(void);
+extern void rtas_stop_self(void);
 #endif /* __ASSEMBLY__ */
 
 #define RTAS_MSR_MASK 0xfffe
diff --git a/lib/powerpc/rtas.c b/lib/powerpc/rtas.c
index 41c0a243e..b477a38e0 100644
--- a/lib/powerpc/rtas.c
+++ b/lib/powerpc/rtas.c
@@ -87,40 +87,86 @@ int rtas_token(const char *service, uint32_t *token)
return 0;
 }
 
-int rtas_call(int token, int nargs, int nret, int *outputs, ...)
+static void __rtas_call(struct rtas_args *args)
 {
-   va_list list;
-   int ret, i;
+   enter_rtas(__pa(args));
+}
 
-   spin_lock(&rtas_lock);
+static int rtas_call_unlocked_va(struct rtas_args *args,
+ int token, int nargs, int nret, int *outputs,
+ va_list list)
+{
+   int ret, i;
 
-   rtas_args.token = cpu_to_be32(token);
-   rtas_args.nargs = cpu_to_be32(nargs);
-   rtas_args.nret = cpu_to_be32(nret);
-   rtas_args.rets = &rtas_args.args[nargs];
+   args->token = cpu_to_be32(token);
+   args->nargs = cpu_to_be32(nargs);
+   args->nret = cpu_to_be32(nret);
+   args->rets = &args->args[nargs];
 
-   va_start(list, outputs);
for (i = 0; i < nargs; ++i)
-   rtas_args.args[i] = cpu_to_be32(va_arg(list, u32));
-   va_end(list);
+   args->args[i] = cpu_to_be32(va_arg(list, u32));
 
for (i = 0; i < nret; ++i)
-   rtas_args.rets[i] = 0;
+   args->rets[i] = 0;
 
-   enter_rtas(__pa(&rtas_args));
+   __rtas_call(args);
 
if (nret > 1 && outputs != NULL)
for (i = 0; i < nret - 1; ++i)
-   outputs[i] = be32_to_cpu(rtas_args.rets[i + 1]);
+   outputs[i] = be32_to_cpu(args->rets[i + 1]);
+
+   ret = nret > 0 ? be32_to_cpu(args->rets[0]) : 0;
+
+   return ret;
+}
+
+int rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, int *outputs, ...)
+{
+   va_list list;
+   int ret;
 
-   ret = nret > 0 ? be32_to_cpu(rtas_args.rets[0]) : 0;
+   va_start(list, outputs);
+   ret = rtas_call_unlocked_va(args, token, nargs, nret, outputs, list);
+   va_end(list);
+
+   return ret;
+}
+
+int rtas_call(int token, int nargs, int nret, int *outputs, ...)
+{
+   va_list list;
+   int ret;
+
+   spin_lock(&rtas_lock);
+
+   va_start(list, outputs);
+   ret = rtas_call_unlocked_va(&rtas_args, token, nargs, nret, outputs, list);
+   va_end(list);
 
    spin_unlock(&rtas_lock);
+
return ret;
 }
 
+void rtas_stop_self(void)
+{
+   struct rtas_args args;
+   uint32_t token;
+   int ret;
+
+   ret = rtas_token("stop-self", &token);
+   if (ret) {
+   puts("RTAS stop-self not available\n");
+   return;
+   }
+
+   ret = rtas_call_unlocked(&args, token, 0, 1, NULL);
+   printf("RTAS stop-self returned %d\n", ret);
+}
+
 void rtas_power_off(void)
 {
+   struct rtas_args args;
uint32_t token;
int ret;
 
@@ -130,6 +176,6 @@ void rtas_power_off(void)
return;
}
 
-   ret = rtas_call(token, 2, 1, NULL, -1, -1);
+   ret = rtas_call_unlocked(&args, token, 2, 1, NULL, -1, -1);
printf("RTAS power-off returned %d\n", ret);
 }
-- 
2.43.0
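
The refactoring in the patch above follows a common C pattern: one worker
that takes a va_list, plus two thin variadic wrappers, a locked one using a
shared argument buffer and an unlocked one using a caller-supplied buffer.
A rough sketch of that pattern follows. Everything in it is illustrative:
struct sketch_args, sketch_enter_firmware (which only sums its arguments),
and the lock counter are hypothetical and do not reflect the real RTAS
argument layout or firmware entry.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

#define SKETCH_MAX_ARGS 16

/* Hypothetical argument buffer; the real struct rtas_args differs. */
struct sketch_args {
	uint32_t nargs;
	uint32_t args[SKETCH_MAX_ARGS];
	uint32_t ret;
};

static int sketch_lock_depth;		/* stand-in for a real spinlock */
static struct sketch_args shared_args;	/* protected by the "lock" */

/* Stand-in for entering firmware: here it just sums the arguments. */
static void sketch_enter_firmware(struct sketch_args *a)
{
	uint32_t i, sum = 0;

	for (i = 0; i < a->nargs; i++)
		sum += a->args[i];
	a->ret = sum;
}

/* Core worker: takes a va_list so that both wrappers can share it. */
static int sketch_call_va(struct sketch_args *a, int nargs, va_list list)
{
	int i;

	assert(nargs >= 0 && nargs <= SKETCH_MAX_ARGS);
	a->nargs = (uint32_t)nargs;
	for (i = 0; i < nargs; i++)
		a->args[i] = va_arg(list, uint32_t);
	sketch_enter_firmware(a);
	return (int)a->ret;
}

/* Unlocked variant: the caller supplies a buffer, e.g. on its own stack. */
static int sketch_call_unlocked(struct sketch_args *a, int nargs, ...)
{
	va_list list;
	int ret;

	va_start(list, nargs);
	ret = sketch_call_va(a, nargs, list);
	va_end(list);
	return ret;
}

/* Locked variant: serialises use of the single shared buffer. */
static int sketch_call(int nargs, ...)
{
	va_list list;
	int ret;

	sketch_lock_depth++;		/* spin_lock() in a real harness */
	va_start(list, nargs);
	ret = sketch_call_va(&shared_args, nargs, list);
	va_end(list);
	sketch_lock_depth--;		/* spin_unlock() in a real harness */
	return ret;
}
```

Because the unlocked variant touches only the caller's buffer, a CPU that
never returns from the call (as with stop-self) does not leave a shared lock
held behind it.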



[kvm-unit-tests PATCH v8 19/35] powerpc: general interrupt tests

2024-04-05 Thread Nicholas Piggin
Add basic testing of various kinds of interrupts, machine check,
page fault, illegal, decrementer, trace, syscall, etc.

This has a known failure on QEMU TCG pseries machines where MSR[ME]
can be incorrectly set to 0.

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h |   4 +
 lib/powerpc/asm/reg.h   |  17 ++
 lib/powerpc/setup.c |  11 +
 lib/ppc64/asm/ptrace.h  |  16 ++
 powerpc/Makefile.common |   3 +-
 powerpc/interrupts.c| 414 
 powerpc/unittests.cfg   |   3 +
 7 files changed, 467 insertions(+), 1 deletion(-)
 create mode 100644 powerpc/interrupts.c

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index cf1b9d8ff..eed37d1f4 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -11,7 +11,11 @@ void do_handle_exception(struct pt_regs *regs);
 #endif /* __ASSEMBLY__ */
 
 extern bool cpu_has_hv;
+extern bool cpu_has_power_mce;
+extern bool cpu_has_siar;
 extern bool cpu_has_heai;
+extern bool cpu_has_prefix;
+extern bool cpu_has_sc_lev;
 
 static inline uint64_t mfspr(int nr)
 {
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index 782e75527..d6097f48f 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -5,8 +5,15 @@
 
 #define UL(x) _AC(x, UL)
 
+#define SPR_DSISR  0x012
+#define SPR_DAR0x013
+#define SPR_DEC0x016
 #define SPR_SRR0   0x01a
 #define SPR_SRR1   0x01b
+#define   SRR1_PREFIX  UL(0x2000)
+#define SPR_FSCR   0x099
+#define   FSCR_PREFIX  UL(0x2000)
+#define SPR_HFSCR  0x0be
 #define SPR_TB 0x10c
 #define SPR_SPRG0  0x110
 #define SPR_SPRG1  0x111
@@ -22,12 +29,17 @@
 #define   PVR_VER_POWER8   UL(0x004d)
 #define   PVR_VER_POWER9   UL(0x004e)
 #define   PVR_VER_POWER10  UL(0x0080)
+#define SPR_HDEC   0x136
 #define SPR_HSRR0  0x13a
 #define SPR_HSRR1  0x13b
+#define SPR_LPCR   0x13e
+#define   LPCR_HDICE   UL(0x1)
+#define SPR_HEIR   0x153
 #define SPR_MMCR0  0x31b
 #define   MMCR0_FC UL(0x8000)
 #define   MMCR0_PMAE   UL(0x0400)
 #define   MMCR0_PMAO   UL(0x0080)
+#define SPR_SIAR   0x31c
 
 /* Machine State Register definitions: */
 #define MSR_LE_BIT 0
@@ -35,6 +47,11 @@
 #define MSR_HV_BIT 60  /* Hypervisor mode */
 #define MSR_SF_BIT 63  /* 64-bit mode */
 
+#define MSR_DR UL(0x0010)
+#define MSR_IR UL(0x0020)
+#define MSR_BE UL(0x0200)  /* Branch Trace Enable */
+#define MSR_SE UL(0x0400)  /* Single Step Enable */
+#define MSR_EE UL(0x8000)
 #define MSR_ME UL(0x1000)
 
 #endif
diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index 3c81aee9e..9b665f59c 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -87,7 +87,11 @@ static void cpu_set(int fdtnode, u64 regval, void *info)
 }
 
 bool cpu_has_hv;
+bool cpu_has_power_mce; /* POWER CPU machine checks */
+bool cpu_has_siar;
 bool cpu_has_heai;
+bool cpu_has_prefix;
+bool cpu_has_sc_lev; /* sc interrupt has LEV field in SRR1 */
 
 static void cpu_init(void)
 {
@@ -112,15 +116,22 @@ static void cpu_init(void)
 
switch (mfspr(SPR_PVR) & PVR_VERSION_MASK) {
case PVR_VER_POWER10:
+   cpu_has_prefix = true;
+   cpu_has_sc_lev = true;
case PVR_VER_POWER9:
case PVR_VER_POWER8E:
case PVR_VER_POWER8NVL:
case PVR_VER_POWER8:
+   cpu_has_power_mce = true;
cpu_has_heai = true;
+   cpu_has_siar = true;
break;
default:
break;
}
+
+   if (!cpu_has_hv) /* HEIR is HV register */
+   cpu_has_heai = false;
 }
 
 static void mem_init(phys_addr_t freemem_start)
diff --git a/lib/ppc64/asm/ptrace.h b/lib/ppc64/asm/ptrace.h
index 12de7499b..db263a59e 100644
--- a/lib/ppc64/asm/ptrace.h
+++ b/lib/ppc64/asm/ptrace.h
@@ -5,6 +5,9 @@
 #define STACK_FRAME_OVERHEAD112 /* size of minimum stack frame */
 
 #ifndef __ASSEMBLY__
+
+#include 
+
 struct pt_regs {
unsigned long gpr[32];
unsigned long nip;
@@ -17,6 +20,19 @@ struct pt_regs {
unsigned long _pad; /* stack must be 16-byte aligned */
 };
 
+static inline bool regs_is_prefix(volatile struct pt_regs *regs)
+{
+   return regs->msr & SRR1_PREFIX;
+}
+
+static inline void regs_advance_insn(struct pt_regs *regs)
+{
+   if (regs_is_prefix(regs))
+   regs->nip += 8;
+   else
+   regs->nip += 4;
+}
+
 #define STACK_INT_FRAME_SIZE(sizeof(struct pt_regs) + \
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
 
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index 1e181da69..68165fc25 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -12,7 +12,8 @@ 

[kvm-unit-tests PATCH v8 18/35] powerpc/sprs: Test hypervisor registers on powernv machine

2024-04-05 Thread Nicholas Piggin
This enables HV privilege registers to be tested with the powernv
machine.

Acked-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 powerpc/sprs.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index cb1d6c980..0a82418d6 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -199,16 +199,16 @@ static const struct spr sprs_power_common[1024] = {
 [190] = { "HFSCR", 64, HV_RW, },
 [256] = { "VRSAVE",32, RW, },
 [259] = { "SPRG3", 64, RO, },
-[284] = { "TBL",   32, HV_WO, },
-[285] = { "TBU",   32, HV_WO, },
-[286] = { "TBU40", 64, HV_WO, },
+[284] = { "TBL",   32, HV_WO, }, /* Things can go a bit wonky with */
+[285] = { "TBU",   32, HV_WO, }, /* Timebase changing. Should save */
+[286] = { "TBU40", 64, HV_WO, }, /* and restore it. */
 [304] = { "HSPRG0",64, HV_RW, },
 [305] = { "HSPRG1",64, HV_RW, },
 [306] = { "HDSISR",32, HV_RW,  SPR_INT, },
 [307] = { "HDAR",  64, HV_RW,  SPR_INT, },
 [308] = { "SPURR", 64, HV_RW | OS_RO,  SPR_ASYNC, },
 [309] = { "PURR",  64, HV_RW | OS_RO,  SPR_ASYNC, },
-[313] = { "HRMOR", 64, HV_RW, },
+[313] = { "HRMOR", 64, HV_RW,  SPR_HARNESS, }, /* Harness can't cope with HRMOR changing */
 [314] = { "HSRR0", 64, HV_RW,  SPR_INT, },
 [315] = { "HSRR1", 64, HV_RW,  SPR_INT, },
 [318] = { "LPCR",  64, HV_RW, },
@@ -306,7 +306,7 @@ static const struct spr sprs_power9_10[1024] = {
 [921] = { "TSCR",  32, HV_RW, },
 [922] = { "TTR",   64, HV_RW, },
 [1006]= { "TRACE", 64, WO, },
-[1008]= { "HID",   64, HV_RW, },
+[1008]= { "HID",   64, HV_RW,  SPR_HARNESS, }, /* HILE would be unhelpful to change */
 };
 
 /* This covers POWER8 and POWER9 PMUs */
@@ -350,6 +350,22 @@ static const struct spr sprs_power10_pmu[1024] = {
 
 static struct spr sprs[1024];
 
+static bool spr_read_perms(int spr)
+{
+   if (cpu_has_hv)
+   return !!(sprs[spr].access & SPR_HV_READ);
+   else
+   return !!(sprs[spr].access & SPR_OS_READ);
+}
+
+static bool spr_write_perms(int spr)
+{
+   if (cpu_has_hv)
+   return !!(sprs[spr].access & SPR_HV_WRITE);
+   else
+   return !!(sprs[spr].access & SPR_OS_WRITE);
+}
+
 static void setup_sprs(void)
 {
int i;
@@ -461,7 +477,7 @@ static void get_sprs(uint64_t *v)
int i;
 
for (i = 0; i < 1024; i++) {
-   if (!(sprs[i].access & SPR_OS_READ))
+   if (!spr_read_perms(i))
continue;
v[i] = __mfspr(i);
}
@@ -472,8 +488,9 @@ static void set_sprs(uint64_t val)
int i;
 
for (i = 0; i < 1024; i++) {
-   if (!(sprs[i].access & SPR_OS_WRITE))
+   if (!spr_write_perms(i))
continue;
+
if (sprs[i].type & SPR_HARNESS)
continue;
__mtspr(i, val);
@@ -561,7 +578,7 @@ int main(int argc, char **argv)
for (i = 0; i < 1024; i++) {
bool pass = true;
 
-   if (!(sprs[i].access & SPR_OS_READ))
+   if (!spr_read_perms(i))
continue;
 
if (sprs[i].width == 32) {
-- 
2.43.0



[kvm-unit-tests PATCH v8 17/35] powerpc: Fix emulator illegal instruction test for powernv

2024-04-05 Thread Nicholas Piggin
Illegal instructions cause 0xe40 (HEAI) interrupts rather
than program interrupts.
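
As a rough sketch of the detection logic in the diff below: the PVR
version field selects whether a 0xe40 handler is registered in addition
to the 0x700 one. The PVR constants here are assumptions based on the
usual IBM POWER numbering, and pvr_has_heai() and illegal_insn_vectors()
are hypothetical helper names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PVR layout: version in the upper 16 bits, revision in the lower. */
#define PVR_VERSION_MASK   0xffff0000u
#define PVR_VER_POWER8E    0x004b0000u
#define PVR_VER_POWER8NVL  0x004c0000u
#define PVR_VER_POWER8     0x004d0000u
#define PVR_VER_POWER9     0x004e0000u
#define PVR_VER_POWER10    0x00800000u

/* True for the processor versions the patch treats as raising HEAI. */
static bool pvr_has_heai(uint32_t pvr)
{
	switch (pvr & PVR_VERSION_MASK) {
	case PVR_VER_POWER10:
	case PVR_VER_POWER9:
	case PVR_VER_POWER8E:
	case PVR_VER_POWER8NVL:
	case PVR_VER_POWER8:
		return true;
	default:
		return false;
	}
}

/* Fill in the vectors that need an illegal-instruction handler. */
static size_t illegal_insn_vectors(uint32_t pvr, unsigned int vectors[2])
{
	size_t n = 0;

	vectors[n++] = 0x700;		/* program interrupt, always handled */
	if (pvr_has_heai(pvr))
		vectors[n++] = 0xe40;	/* HEAI, only where the CPU has it */
	return n;
}
```

Registering both vectors keeps the test working under pseries, where the
guest still sees a program interrupt.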

Acked-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h |  1 +
 lib/powerpc/setup.c | 13 +
 powerpc/emulator.c  | 16 
 3 files changed, 30 insertions(+)

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index 9d8061962..cf1b9d8ff 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -11,6 +11,7 @@ void do_handle_exception(struct pt_regs *regs);
 #endif /* __ASSEMBLY__ */
 
 extern bool cpu_has_hv;
+extern bool cpu_has_heai;
 
 static inline uint64_t mfspr(int nr)
 {
diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index 89e5157f2..3c81aee9e 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -87,6 +87,7 @@ static void cpu_set(int fdtnode, u64 regval, void *info)
 }
 
 bool cpu_has_hv;
+bool cpu_has_heai;
 
 static void cpu_init(void)
 {
@@ -108,6 +109,18 @@ static void cpu_init(void)
hcall(H_SET_MODE, 0, 4, 0, 0);
 #endif
}
+
+   switch (mfspr(SPR_PVR) & PVR_VERSION_MASK) {
+   case PVR_VER_POWER10:
+   case PVR_VER_POWER9:
+   case PVR_VER_POWER8E:
+   case PVR_VER_POWER8NVL:
+   case PVR_VER_POWER8:
+   cpu_has_heai = true;
+   break;
+   default:
+   break;
+   }
 }
 
 static void mem_init(phys_addr_t freemem_start)
diff --git a/powerpc/emulator.c b/powerpc/emulator.c
index 39dd59645..af5174944 100644
--- a/powerpc/emulator.c
+++ b/powerpc/emulator.c
@@ -31,6 +31,20 @@ static void program_check_handler(struct pt_regs *regs, void *opaque)
regs->nip += 4;
 }
 
+static void heai_handler(struct pt_regs *regs, void *opaque)
+{
+   int *data = opaque;
+
+   if (verbose) {
+   printf("Detected invalid instruction %#018lx: %08x\n",
+  regs->nip, *(uint32_t*)regs->nip);
+   }
+
+   *data = 8; /* Illegal instruction */
+
+   regs->nip += 4;
+}
+
 static void alignment_handler(struct pt_regs *regs, void *opaque)
 {
int *data = opaque;
@@ -363,6 +377,8 @@ int main(int argc, char **argv)
int i;
 
handle_exception(0x700, program_check_handler, (void *)_invalid);
+   if (cpu_has_heai)
+   handle_exception(0xe40, heai_handler, (void *)_invalid);
handle_exception(0x600, alignment_handler, (void *));
 
for (i = 1; i < argc; i++) {
-- 
2.43.0



[kvm-unit-tests PATCH v8 16/35] powerpc: Support powernv machine with QEMU TCG

2024-04-05 Thread Nicholas Piggin
Add support for QEMU's powernv machine. This uses standard firmware
(skiboot) rather than a minimal firmware shim.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/processor.h | 23 +++
 lib/powerpc/asm/reg.h   |  4 ++
 lib/powerpc/hcall.c |  4 +-
 lib/powerpc/io.c| 27 -
 lib/powerpc/io.h|  6 +++
 lib/powerpc/processor.c | 37 ++
 lib/powerpc/setup.c | 14 +--
 lib/ppc64/asm/opal.h| 15 
 lib/ppc64/opal-calls.S  | 50 
 lib/ppc64/opal.c| 76 +
 powerpc/Makefile.ppc64  |  2 +
 powerpc/cstart64.S  |  7 
 powerpc/run | 42 
 powerpc/unittests.cfg   | 10 -
 14 files changed, 301 insertions(+), 16 deletions(-)
 create mode 100644 lib/ppc64/asm/opal.h
 create mode 100644 lib/ppc64/opal-calls.S
 create mode 100644 lib/ppc64/opal.c

diff --git a/lib/powerpc/asm/processor.h b/lib/powerpc/asm/processor.h
index e415f9235..9d8061962 100644
--- a/lib/powerpc/asm/processor.h
+++ b/lib/powerpc/asm/processor.h
@@ -10,6 +10,8 @@ void handle_exception(int trap, void (*func)(struct pt_regs *, void *), void *);
 void do_handle_exception(struct pt_regs *regs);
 #endif /* __ASSEMBLY__ */
 
+extern bool cpu_has_hv;
+
 static inline uint64_t mfspr(int nr)
 {
uint64_t ret;
@@ -38,4 +40,25 @@ static inline void mtmsr(uint64_t msr)
asm volatile ("mtmsrd %[msr]" :: [msr] "r" (msr) : "memory");
 }
 
+/*
+ * This returns true on PowerNV / OPAL machines which run in hypervisor
+ * mode. False on pseries / PAPR machines that run in guest mode.
+ */
+static inline bool machine_is_powernv(void)
+{
+   return cpu_has_hv;
+}
+
+/*
+ * This returns true on pseries / PAPR / KVM machines which run under a
+ * hypervisor or QEMU pseries machine. False for PowerNV / OPAL.
+ */
+static inline bool machine_is_pseries(void)
+{
+   return !machine_is_powernv();
+}
+
+void enable_mcheck(void);
+void disable_mcheck(void);
+
 #endif /* _ASMPOWERPC_PROCESSOR_H_ */
diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index c80b32059..782e75527 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -30,7 +30,11 @@
 #define   MMCR0_PMAO   UL(0x00000080)
 
 /* Machine State Register definitions: */
+#define MSR_LE_BIT 0
 #define MSR_EE_BIT 15  /* External Interrupts Enable */
+#define MSR_HV_BIT 60  /* Hypervisor mode */
 #define MSR_SF_BIT 63  /* 64-bit mode */
 
+#define MSR_ME UL(0x1000)
+
 #endif
diff --git a/lib/powerpc/hcall.c b/lib/powerpc/hcall.c
index b4d39ac65..45f201315 100644
--- a/lib/powerpc/hcall.c
+++ b/lib/powerpc/hcall.c
@@ -25,7 +25,7 @@ int hcall_have_broken_sc1(void)
return r3 == (unsigned long)H_PRIVILEGE;
 }
 
-void putchar(int c)
+void papr_putchar(int c)
 {
unsigned long vty = 0;  /* 0 == default */
unsigned long nr_chars = 1;
@@ -34,7 +34,7 @@ void putchar(int c)
hcall(H_PUT_TERM_CHAR, vty, nr_chars, chars);
 }
 
-int __getchar(void)
+int __papr_getchar(void)
 {
register unsigned long r3 asm("r3") = H_GET_TERM_CHAR;
register unsigned long r4 asm("r4") = 0; /* 0 == default vty */
diff --git a/lib/powerpc/io.c b/lib/powerpc/io.c
index a381688bc..ab7bb843c 100644
--- a/lib/powerpc/io.c
+++ b/lib/powerpc/io.c
@@ -9,13 +9,33 @@
 #include 
 #include 
 #include 
+#include 
 #include "io.h"
 
 static struct spinlock print_lock;
 
+void putchar(int c)
+{
+   if (machine_is_powernv())
+   opal_putchar(c);
+   else
+   papr_putchar(c);
+}
+
+int __getchar(void)
+{
+   if (machine_is_powernv())
+   return __opal_getchar();
+   else
+   return __papr_getchar();
+}
+
 void io_init(void)
 {
-   rtas_init();
+   if (machine_is_powernv())
+   assert(!opal_init());
+   else
+   rtas_init();
 }
 
 void puts(const char *s)
@@ -38,7 +58,10 @@ void exit(int code)
 // FIXME: change this print-exit/rtas-poweroff to chr_testdev_exit(),
 //maybe by plugging chr-testdev into a spapr-vty.
printf("\nEXIT: STATUS=%d\n", ((code) << 1) | 1);
-   rtas_power_off();
+   if (machine_is_powernv())
+   opal_power_off();
+   else
+   rtas_power_off();
halt(code);
__builtin_unreachable();
 }
diff --git a/lib/powerpc/io.h b/lib/powerpc/io.h
index d4f21ba15..943bf142b 100644
--- a/lib/powerpc/io.h
+++ b/lib/powerpc/io.h
@@ -8,6 +8,12 @@
 #define _POWERPC_IO_H_
 
 extern void io_init(void);
+extern int opal_init(void);
+extern void opal_power_off(void);
 extern void putchar(int c);
+extern void opal_putchar(int c);
+extern void papr_putchar(int c);
+extern int __opal_getchar(void);
+extern int __papr_getchar(void);
 
 #endif
diff --git 

[kvm-unit-tests PATCH v8 15/35] scripts: Accommodate powerpc powernv machine differences

2024-04-05 Thread Nicholas Piggin
The QEMU powerpc powernv machine has minor differences that must be
accommodated in output parsing:

- Summary parsing must search more lines of output for the summary
  line, to accommodate the OPAL message on shutdown.
- Premature failure testing must tolerate case differences in the
  kernel load error message.

Acked-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 powerpc/unittests.cfg | 1 +
 scripts/runtime.bash  | 6 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index 432c81d58..4929e71a1 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -4,6 +4,7 @@
 # powerpc specifics:
 #
 # file = .elf # powerpc uses .elf files
+# machine = pseries|powernv
 ##
 
 #
diff --git a/scripts/runtime.bash b/scripts/runtime.bash
index a66940ead..e4ad1962f 100644
--- a/scripts/runtime.bash
+++ b/scripts/runtime.bash
@@ -9,7 +9,7 @@ FAIL() { echo -ne "\e[31mFAIL\e[0m"; }
 extract_summary()
 {
 local cr=$'\r'
-tail -3 | grep '^SUMMARY: ' | sed 's/^SUMMARY: /(/;s/'"$cr"'\{0,1\}$/)/'
+tail -5 | grep '^SUMMARY: ' | sed 's/^SUMMARY: /(/;s/'"$cr"'\{0,1\}$/)/'
 }
 
 # We assume that QEMU is going to work if it tried to load the kernel
@@ -18,7 +18,9 @@ premature_failure()
 local log="$(eval "$(get_cmdline _NO_FILE_4Uhere_)" 2>&1)"
 
 echo "$log" | grep "_NO_FILE_4Uhere_" |
-grep -q -e "could not \(load\|open\) kernel" -e "error loading" -e "failed to load" &&
+grep -q -e "[Cc]ould not \(load\|open\) kernel" \
+-e "error loading" \
+-e "failed to load" &&
 return 1
 
 RUNTIME_log_stderr <<< "$log"
-- 
2.43.0



[kvm-unit-tests PATCH v8 14/35] scripts: allow machine option to be specified in unittests.cfg

2024-04-05 Thread Nicholas Piggin
This allows different machines with different requirements to be
supported by run_tests.sh, similarly to how different accelerators
are handled.
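
The skip-or-override decision that run() gains in scripts/runtime.bash
can be summarised as below. It is expressed in C only to match the rest
of the series; select_machine() is a hypothetical name, and empty strings
stand in for unset shell variables.

```c
#include <stddef.h>
#include <string.h>

/*
 * cfg_machine: the "machine =" value from unittests.cfg ("" if absent).
 * env_machine: the MACHINE environment setting ("" if unset).
 * Returns NULL when the test should be skipped, otherwise the machine
 * name to pass on (possibly "").
 */
static const char *select_machine(const char *cfg_machine,
				  const char *env_machine)
{
	if (cfg_machine[0] && env_machine[0] &&
	    strcmp(cfg_machine, env_machine) != 0)
		return NULL;		/* test is for another machine */
	if (env_machine[0])
		return env_machine;	/* environment fills in the machine */
	return cfg_machine;
}
```

A test tagged "machine = powernv" is therefore skipped when MACHINE is
pseries, and an untagged test inherits whatever MACHINE specifies.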

Acked-by: Thomas Huth 
Acked-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
 docs/unittests.txt   |  6 ++
 scripts/common.bash  |  8 ++--
 scripts/runtime.bash | 16 
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/docs/unittests.txt b/docs/unittests.txt
index 53e02077c..5b184723c 100644
--- a/docs/unittests.txt
+++ b/docs/unittests.txt
@@ -42,6 +42,12 @@ For / directories that support multiple architectures, this restricts
 the test to the specified arch. By default, the test will run on any
 architecture.
 
+machine
+-------
+For those architectures that support multiple machine types, this allows
+machine-specific tests to be created. By default, the test will run on
+any machine type.
+
 smp
 ---
 smp = 
diff --git a/scripts/common.bash b/scripts/common.bash
index b9413d683..ee1dd8659 100644
--- a/scripts/common.bash
+++ b/scripts/common.bash
@@ -10,6 +10,7 @@ function for_each_unittest()
local opts
local groups
local arch
+   local machine
local check
local accel
local timeout
@@ -21,7 +22,7 @@ function for_each_unittest()
if [[ "$line" =~ ^\[(.*)\]$ ]]; then
rematch=${BASH_REMATCH[1]}
if [ -n "${testname}" ]; then
-   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$machine" "$check" "$accel" "$timeout"
fi
testname=$rematch
smp=1
@@ -29,6 +30,7 @@ function for_each_unittest()
opts=""
groups=""
arch=""
+   machine=""
check=""
accel=""
timeout=""
@@ -58,6 +60,8 @@ function for_each_unittest()
groups=${BASH_REMATCH[1]}
elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
arch=${BASH_REMATCH[1]}
+   elif [[ $line =~ ^machine\ *=\ *(.*)$ ]]; then
+   machine=${BASH_REMATCH[1]}
elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then
check=${BASH_REMATCH[1]}
elif [[ $line =~ ^accel\ *=\ *(.*)$ ]]; then
@@ -67,7 +71,7 @@ function for_each_unittest()
fi
done
if [ -n "${testname}" ]; then
-   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+   $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$machine" "$check" "$accel" "$timeout"
fi
exec {fd}<&-
 }
diff --git a/scripts/runtime.bash b/scripts/runtime.bash
index 255e756f2..a66940ead 100644
--- a/scripts/runtime.bash
+++ b/scripts/runtime.bash
@@ -30,7 +30,7 @@ premature_failure()
 get_cmdline()
 {
 local kernel=$1
-echo "TESTNAME=$testname TIMEOUT=$timeout ACCEL=$accel $RUNTIME_arch_run $kernel -smp $smp $opts"
+echo "TESTNAME=$testname TIMEOUT=$timeout MACHINE=$machine ACCEL=$accel $RUNTIME_arch_run $kernel -smp $smp $opts"
 }
 
 skip_nodefault()
@@ -78,9 +78,10 @@ function run()
 local kernel="$4"
 local opts="$5"
 local arch="$6"
-local check="${CHECK:-$7}"
-local accel="$8"
-local timeout="${9:-$TIMEOUT}" # unittests.cfg overrides the default
+local machine="$7"
+local check="${CHECK:-$8}"
+local accel="$9"
+local timeout="${10:-$TIMEOUT}" # unittests.cfg overrides the default
 
 if [ "${CONFIG_EFI}" == "y" ]; then
 kernel=${kernel/%.flat/.efi}
@@ -114,6 +115,13 @@ function run()
 return 2
 fi
 
+if [ -n "$machine" ] && [ -n "$MACHINE" ] && [ "$machine" != "$MACHINE" ]; then
+print_result "SKIP" $testname "" "$machine only"
+return 2
+elif [ -n "$MACHINE" ]; then
+machine="$MACHINE"
+fi
+
 if [ -n "$accel" ] && [ -n "$ACCEL" ] && [ "$accel" != "$ACCEL" ]; then
 print_result "SKIP" $testname "" "$accel only, but ACCEL=$ACCEL"
 return 2
-- 
2.43.0



[kvm-unit-tests PATCH v8 13/35] doc: start documentation directory with unittests.cfg doc

2024-04-05 Thread Nicholas Piggin
Consolidate unittests.cfg documentation in one place.

Suggested-by: Andrew Jones 
Signed-off-by: Nicholas Piggin 
---
 arm/unittests.cfg | 26 ++---
 docs/unittests.txt| 89 +++
 powerpc/unittests.cfg | 25 ++--
 riscv/unittests.cfg   | 26 ++---
 s390x/unittests.cfg   | 18 ++---
 x86/unittests.cfg | 26 ++---
 6 files changed, 107 insertions(+), 103 deletions(-)
 create mode 100644 docs/unittests.txt

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index fe601cbb1..54cedea28 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -1,28 +1,10 @@
 ##
 # unittest configuration
 #
-# [unittest_name]
-# file = .flat   # Name of the flat file to be used.
-# smp  =  # Number of processors the VM will use
-#  # during this test. Use $MAX_SMP to use
-#  # the maximum the host supports. Defaults
-#  # to one.
-# extra_params = -append# Additional parameters used.
-# arch = arm|arm64 # Select one if the test case is
-#  # specific to only one.
-# groups =   ... # Used to identify test cases
-#  # with run_tests -g ...
-#  # Specify group_name=nodefault
-#  # to have test not run by
-#  # default
-# accel = kvm|tcg  # Optionally specify if test must run with
-#  # kvm or tcg. If not specified, then kvm will
-#  # be used when available.
-# timeout =  # Optionally specify a timeout.
-# check = = # check a file for a particular value before running
-## a test. The check line can contain multiple files
-## to check separated by a space but each check
-## parameter needs to be of the form =
+# arm specifics:
+#
+# file = .flat# arm uses .flat files
+# arch = arm|arm64
 ##
 
 #
diff --git a/docs/unittests.txt b/docs/unittests.txt
new file mode 100644
index 000000000..53e02077c
--- /dev/null
+++ b/docs/unittests.txt
@@ -0,0 +1,89 @@
+unittests
+*********
+
+run_tests.sh is driven by the /unittests.cfg file. That file defines
+test cases by specifying an executable (target image) under the /
+directory, and how to run it. This way, for example, a single file can
+provide multiple test cases by being run with different host configurations
+and/or different parameters passed to it.
+
+Detailed output from run_tests.sh unit tests is stored in files under
+the logs/ directory.
+
+unittests.cfg format
+====================
+
+# is the comment symbol; all following contents of the line are ignored.
+
+Each unit test is defined with a [unit-test-name] line, followed by
+a set of parameters that control how the test case is run. The name is
+arbitrary and appears in the status reporting output.
+
+Parameters appear on their own lines under the test name, and have a
+param = value format.
+
+Available parameters
+====================
+Note! Some parameters like smp and extra_params modify how a test is run,
+while others like arch and accel restrict the configurations in which the
+test is run.
+
+file
+----
+file = 
+
+This parameter is mandatory and specifies which binary under the /
+directory to run. Typically this is .flat or .elf, depending
+on the arch. The directory name is not included, only the file name.
+
+arch
+----
+For / directories that support multiple architectures, this restricts
+the test to the specified arch. By default, the test will run on any
+architecture.
+
+smp
+---
+smp = 
+
+Optional, the number of processors created in the machine to run the test.
+Defaults to 1. $MAX_SMP can be used to specify the maximum supported.
+
+extra_params
+------------
+These are extra parameters supplied to the QEMU process. -append '...' can
+be used to pass arguments into the test case argv. Multiple parameters can
+be added, for example:
+
+extra_params = -m 256 -append 'smp=2'
+
+groups
+------
+groups =   ...
+
+Used to group the test cases for the `run_tests.sh -g ...` run group
+option. Adding a test to the nodefault group will cause it to not be
+run by default.
+
+accel
+-----
+accel = kvm|tcg
+
+This restricts the test to the specified accelerator. By default, the
+test will run on either accelerator. (Note, the accelerator can be
+specified with ACCEL= environment variable, and defaults to KVM if
+available).
+
+timeout
+-------
+timeout = 
+
+Optional timeout in seconds, after which the test will be killed and fail.
+
+check
+-----
+check = =<
+
+Check a file for a 

[kvm-unit-tests PATCH v8 12/35] powerpc/sprs: Avoid taking PMU interrupts caused by register fuzzing

2024-04-05 Thread Nicholas Piggin
Storing certain values in MMCR0 can cause PMU interrupts when msleep
enables MSR[EE], and this crashes the test. Freeze the PMU counters
and clear any PMU exception before calling msleep.
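
The MMCR0 update in the diff below amounts to one mask operation, shown
here as a standalone sketch. The bit values are assumptions taken from
the usual Power ISA MMCR0 layout (FC = freeze counters, PMAE = alert
enable, PMAO = alert occurred), and mmcr0_quiesce() is a hypothetical
helper name.

```c
#include <stdint.h>

/* Assumed MMCR0 bit values; verify against asm/reg.h before reuse. */
#define MMCR0_FC    0x80000000ull  /* freeze counters */
#define MMCR0_PMAE  0x04000000ull  /* performance monitor alert enable */
#define MMCR0_PMAO  0x00000080ull  /* performance monitor alert occurred */

/*
 * Freeze the counters and drop any pending or enabled alert, so that
 * enabling MSR[EE] cannot deliver a PMU interrupt.
 */
static uint64_t mmcr0_quiesce(uint64_t mmcr0)
{
	return (mmcr0 | MMCR0_FC) & ~(MMCR0_PMAO | MMCR0_PMAE);
}
```

In the test this value is written back with mtspr() and then re-read
into before[], so the later comparison sees the adjusted MMCR0 rather
than the fuzzed one.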

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/reg.h |  4 
 powerpc/sprs.c| 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index 1f991288e..c80b32059 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -24,6 +24,10 @@
 #define   PVR_VER_POWER10  UL(0x00800000)
 #define SPR_HSRR0  0x13a
 #define SPR_HSRR1  0x13b
+#define SPR_MMCR0  0x31b
+#define   MMCR0_FC UL(0x80000000)
+#define   MMCR0_PMAE   UL(0x04000000)
+#define   MMCR0_PMAO   UL(0x00000080)
 
 /* Machine State Register definitions: */
 #define MSR_EE_BIT 15  /* External Interrupts Enable */
diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index 44edd0d7b..cb1d6c980 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -476,12 +476,7 @@ static void set_sprs(uint64_t val)
continue;
if (sprs[i].type & SPR_HARNESS)
continue;
-   if (!strcmp(sprs[i].name, "MMCR0")) {
-   /* XXX: could use a comment or better abstraction! */
-   __mtspr(i, (val & 0xfbab3fffULL) | 0xfa0b2070);
-   } else {
-   __mtspr(i, val);
-   }
+   __mtspr(i, val);
}
 }
 
@@ -538,6 +533,16 @@ int main(int argc, char **argv)
if (sprs[895].name)
before[895] = mfspr(895);
} else {
+   /*
+* msleep will enable MSR[EE] and take a decrementer
+* interrupt. Must account for changed registers and
+* prevent taking unhandled interrupts.
+*/
+   /* Prevent PMU interrupt */
+   mtspr(SPR_MMCR0, (mfspr(SPR_MMCR0) | MMCR0_FC) &
+   ~(MMCR0_PMAO | MMCR0_PMAE));
+   before[SPR_MMCR0] = mfspr(SPR_MMCR0);
+   before[779] = mfspr(SPR_MMCR0);
msleep(2000);
 
/* Reload regs changed by dec interrupt */
-- 
2.43.0



[kvm-unit-tests PATCH v8 11/35] powerpc/sprs: Specify SPRs with data rather than code

2024-04-05 Thread Nicholas Piggin
A significant rework that builds an array of 'struct spr', where each
element describes an SPR. This makes various metadata about the SPR
like name and access type easier to carry and use.

Hypervisor-privileged registers are described for completeness even
though they are not used at the moment, and because the code might one
day be reused for a hypervisor-privileged test.
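
A cut-down illustration of the data-driven layout, assuming only the
struct spr fields and flag values that appear in this patch. The three
table entries are copied from the series; demo_sprs and count_testable()
are hypothetical names added for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define RW     0x333   /* readable and writable at every privilege level */
#define OS_RW  0x330   /* supervisor and hypervisor only */
#define HV_RW  0x300   /* hypervisor only */

#define SPR_INT      0x2000  /* may be updated by a synchronous interrupt */
#define SPR_HARNESS  0x4000  /* used by the test harness itself */

struct spr {
	const char *name;
	uint8_t width;
	uint16_t access;
	uint16_t type;
};

/* Sparse table indexed by SPR number; unlisted entries stay zeroed. */
static const struct spr demo_sprs[1024] = {
	[9]   = { "CTR",   64, RW,    SPR_HARNESS },
	[18]  = { "DSISR", 32, OS_RW, SPR_INT },
	[190] = { "HFSCR", 64, HV_RW, 0 },
};

/* Count registers a generic loop would touch: named, not harness-owned. */
static size_t count_testable(const struct spr *tbl, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!tbl[i].name)
			continue;
		if (tbl[i].type & SPR_HARNESS)
			continue;
		count++;
	}
	return count;
}
```

Because every loop consults the same table, adding a register or marking
it SPR_HARNESS is a one-line data change rather than a new code path.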

Acked-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/asm/reg.h |   2 +
 powerpc/sprs.c| 647 +-
 2 files changed, 457 insertions(+), 192 deletions(-)

diff --git a/lib/powerpc/asm/reg.h b/lib/powerpc/asm/reg.h
index 6810c1d82..1f991288e 100644
--- a/lib/powerpc/asm/reg.h
+++ b/lib/powerpc/asm/reg.h
@@ -5,6 +5,8 @@
 
 #define UL(x) _AC(x, UL)
 
+#define SPR_SRR0   0x01a
+#define SPR_SRR1   0x01b
 #define SPR_TB 0x10c
 #define SPR_SPRG0  0x110
 #define SPR_SPRG1  0x111
diff --git a/powerpc/sprs.c b/powerpc/sprs.c
index a19d80a1a..44edd0d7b 100644
--- a/powerpc/sprs.c
+++ b/powerpc/sprs.c
@@ -30,229 +30,458 @@
 #include 
 #include 
 
-uint64_t before[1024], after[1024];
-
-/* Common SPRs for all PowerPC CPUs */
-static void set_sprs_common(uint64_t val)
+/* "Indirect" mfspr/mtspr which accept a non-constant spr number */
+static uint64_t __mfspr(unsigned spr)
 {
-   mtspr(9, val);  /* CTR */
-   // mtspr(273, val); /* SPRG1 */  /* Used by our exception handler */
-   mtspr(274, val);/* SPRG2 */
-   mtspr(275, val);/* SPRG3 */
+   uint64_t tmp;
+   uint64_t ret;
+
+   asm volatile(
+"  bcl     20, 31, 1f      \n"
+"1: mflr   %0              \n"
+"  addi    %0, %0, (2f-1b) \n"
+"  add     %0, %0, %2      \n"
+"  mtctr   %0              \n"
+"  bctr                    \n"
+"2:                        \n"
+".LSPR=0                   \n"
+".rept 1024                \n"
+"  mfspr   %1, .LSPR       \n"
+"  b       3f              \n"
+"  .LSPR=.LSPR+1           \n"
+".endr                     \n"
+"3:                        \n"
+   : "=&r"(tmp),
+ "=r"(ret)
+   : "r"(spr*8) /* 8 bytes per 'mfspr ; b' block */
+   : "lr", "ctr");
+
+   return ret;
 }
 
-/* SPRs from PowerPC Operating Environment Architecture, Book III, Vers. 2.01 */
-static void set_sprs_book3s_201(uint64_t val)
+static void __mtspr(unsigned spr, uint64_t val)
 {
-   mtspr(18, val); /* DSISR */
-   mtspr(19, val); /* DAR */
-   mtspr(152, val);/* CTRL */
-   mtspr(256, val);/* VRSAVE */
-   mtspr(786, val);/* MMCRA */
-   mtspr(795, val);/* MMCR0 */
-   mtspr(798, val);/* MMCR1 */
+   uint64_t tmp;
+
+   asm volatile(
+"  bcl     20, 31, 1f      \n"
+"1: mflr   %0              \n"
+"  addi    %0, %0, (2f-1b) \n"
+"  add     %0, %0, %2      \n"
+"  mtctr   %0              \n"
+"  bctr                    \n"
+"2:                        \n"
+".LSPR=0                   \n"
+".rept 1024                \n"
+"  mtspr   .LSPR, %1       \n"
+"  b       3f              \n"
+"  .LSPR=.LSPR+1           \n"
+".endr                     \n"
+"3:                        \n"
+   : "=&r"(tmp)
+   : "r"(val),
+ "r"(spr*8) /* 8 bytes per 'mfspr ; b' block */
+   : "lr", "ctr", "xer");
 }
 
+static uint64_t before[1024], after[1024];
+
+#define SPR_PR_READ    0x0001
+#define SPR_PR_WRITE   0x0002
+#define SPR_OS_READ    0x0010
+#define SPR_OS_WRITE   0x0020
+#define SPR_HV_READ    0x0100
+#define SPR_HV_WRITE   0x0200
+
+#define RW 0x333
+#define RO 0x111
+#define WO 0x222
+#define OS_RW  0x330
+#define OS_RO  0x110
+#define OS_WO  0x220
+#define HV_RW  0x300
+#define HV_RO  0x100
+#define HV_WO  0x200
+
+#define SPR_ASYNC      0x1000  /* May be updated asynchronously */
+#define SPR_INT        0x2000  /* May be updated by synchronous interrupt */
+#define SPR_HARNESS    0x4000  /* Test harness uses the register */
+
+struct spr {
+   const char  *name;
+   uint8_t width;
+   uint16_t    access;
+   uint16_t    type;
+};
+
+/* SPRs common denominator back to PowerPC Operating Environment Architecture */
+static const struct spr sprs_common[1024] = {
+  [1] = { "XER",   64, RW, SPR_HARNESS, }, /* Used by compiler */
+  [8] = { "LR",64, RW, SPR_HARNESS, }, /* Compiler, mfspr/mtspr */
+  [9] = { "CTR",   64, RW, SPR_HARNESS, }, /* Compiler, mfspr/mtspr */
+ [18] = { "DSISR", 32, OS_RW,  SPR_INT, },
+ 

[kvm-unit-tests PATCH v8 10/35] powerpc: interrupt stack backtracing

2024-04-05 Thread Nicholas Piggin
Add support for backtracing across interrupt stacks, and add
interrupt frame backtrace for unhandled interrupts.

This requires a back-chain created from the initial interrupt stack
frame to the r1 value of the interrupted context. A label is
added at the return location of the exception handler call, so
the unwinder can recognize the initial interrupt frame.

The additional cstart entry-frame is no longer required because
the unwinder now looks for frame == 0 as well as address == 0.
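
For readers unfamiliar with the back-chain walk, here is a simplified
model of the termination rule only. It assumes the ppc64 ELF frame header
layout (back chain in the first doubleword, LR save in the third) and
leaves out the interrupt-frame handling that the real
arch_backtrace_frame() adds; struct frame and walk_frames() are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simulated frame header: slot 0 holds the back chain (caller's frame),
 * slot 2 holds the saved LR. Slot 1 (CR save) is unused here.
 */
struct frame {
	struct frame *back_chain;
	uintptr_t cr_save;
	uintptr_t lr_save;
};

/* Walk the chain, stopping on a NULL frame or a zero return address. */
static int walk_frames(const struct frame *fp, uintptr_t *addrs,
		       int max_depth)
{
	int depth = 0;

	while (fp && depth < max_depth) {
		if (fp->lr_save == 0)
			break;
		addrs[depth++] = fp->lr_save;
		fp = fp->back_chain;
	}
	return depth;
}
```

Stopping on a NULL frame pointer as well as a NULL return address is what
lets the extra entry frame in cstart64.S be dropped.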

Signed-off-by: Nicholas Piggin 
---
 lib/powerpc/processor.c |  4 +++-
 lib/ppc64/asm/stack.h   |  3 +++
 lib/ppc64/stack.c   | 53 +
 powerpc/Makefile.ppc64  |  1 +
 powerpc/cstart64.S  | 15 +++-
 5 files changed, 63 insertions(+), 13 deletions(-)
 create mode 100644 lib/ppc64/stack.c

diff --git a/lib/powerpc/processor.c b/lib/powerpc/processor.c
index ad0d95666..114584024 100644
--- a/lib/powerpc/processor.c
+++ b/lib/powerpc/processor.c
@@ -51,7 +51,9 @@ void do_handle_exception(struct pt_regs *regs)
return;
}
 
-   printf("unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n", regs->trap, regs->nip, regs->msr);
+   printf("Unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n",
+   regs->trap, regs->nip, regs->msr);
+   dump_frame_stack((void *)regs->nip, (void *)regs->gpr[1]);
abort();
 }
 
diff --git a/lib/ppc64/asm/stack.h b/lib/ppc64/asm/stack.h
index 9734bbb8f..94fd1021c 100644
--- a/lib/ppc64/asm/stack.h
+++ b/lib/ppc64/asm/stack.h
@@ -5,4 +5,7 @@
 #error Do not directly include . Just use .
 #endif
 
+#define HAVE_ARCH_BACKTRACE
+#define HAVE_ARCH_BACKTRACE_FRAME
+
 #endif
diff --git a/lib/ppc64/stack.c b/lib/ppc64/stack.c
new file mode 100644
index 000000000..e6f259de7
--- /dev/null
+++ b/lib/ppc64/stack.c
@@ -0,0 +1,53 @@
+#include 
+#include 
+#include 
+
+extern char do_handle_exception_return[];
+
+int arch_backtrace_frame(const void *frame, const void **return_addrs,
+int max_depth, bool current_frame)
+{
+   static int walking;
+   int depth = 0;
+   const unsigned long *bp = (unsigned long *)frame;
+   void *return_addr;
+
+   asm volatile("" ::: "lr"); /* Force it to save LR */
+
+   if (walking) {
+   printf("RECURSIVE STACK WALK!!!\n");
+   return 0;
+   }
+   walking = 1;
+
+   if (current_frame)
+   bp = __builtin_frame_address(0);
+
+   bp = (unsigned long *)bp[0];
+   return_addr = (void *)bp[2];
+
+   for (depth = 0; bp && depth < max_depth; depth++) {
+   return_addrs[depth] = return_addr;
+   if (return_addrs[depth] == 0)
+   break;
+   if (return_addrs[depth] == do_handle_exception_return) {
+   struct pt_regs *regs;
+
+   regs = (void *)bp + STACK_FRAME_OVERHEAD;
+   bp = (unsigned long *)bp[0];
+   /* Represent interrupt frame with vector number */
+   return_addr = (void *)regs->trap;
+   if (depth + 1 < max_depth) {
+   depth++;
+   return_addrs[depth] = return_addr;
+   return_addr = (void *)regs->nip;
+   }
+   } else {
+   bp = (unsigned long *)bp[0];
+   return_addr = (void *)bp[2];
+   }
+   }
+
+   walking = 0;
+   return depth;
+}
diff --git a/powerpc/Makefile.ppc64 b/powerpc/Makefile.ppc64
index b0ed2b104..eb682c226 100644
--- a/powerpc/Makefile.ppc64
+++ b/powerpc/Makefile.ppc64
@@ -17,6 +17,7 @@ cstart.o = $(TEST_DIR)/cstart64.o
 reloc.o  = $(TEST_DIR)/reloc64.o
 
 OBJDIRS += lib/ppc64
+cflatobjs += lib/ppc64/stack.o
 
 # ppc64 specific tests
 tests = $(TEST_DIR)/spapr_vpa.elf
diff --git a/powerpc/cstart64.S b/powerpc/cstart64.S
index 80baabe8f..07d297f61 100644
--- a/powerpc/cstart64.S
+++ b/powerpc/cstart64.S
@@ -51,16 +51,6 @@ start:
std r0,0(r1)
std r0,16(r1)
 
-   /*
-* Create entry frame of 64-bytes, same as the initial frame. A callee
-* may use the caller frame to store LR, and backtrace() termination
-* looks for return address == NULL, so the initial stack frame can't
-* be used to call C or else it could overwrite the zeroed LR save slot
-* and break backtrace termination.  This frame would be unnecessary if
-* backtrace looked for a zeroed frame address.
-*/
-   stdu    r1,-64(r1)
-
/* save DTB pointer */
std r3, 56(r1)
 
@@ -195,6 +185,7 @@ call_handler:
.endr
mfsprg1 r0
std r0,GPR1(r1)
+   std r0,0(r1) /* Backchain from interrupt stack to regular stack */
 
/* lr, xer, ccr */
 
@@ -213,12 +204,12 @@ call_handler:
subir31, 

[kvm-unit-tests PATCH v8 09/35] powerpc: Fix stack backtrace termination

2024-04-05 Thread Nicholas Piggin
The backtrace handler terminates when it sees a NULL caller address,
but the powerpc stack setup does not keep such a NULL caller frame
at the start of the stack.

This happens to work on pseries because the memory at 0 is mapped and
it contains 0 at the location of the return address pointer if it
were a stack frame. But this is fragile, and does not work with powernv
where address 0 contains firmware instructions.

Use the existing dummy frame on stack as the NULL caller, and create a
new frame on stack for the entry code.

Signed-off-by: Nicholas Piggin 
---
 powerpc/cstart64.S | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/powerpc/cstart64.S b/powerpc/cstart64.S
index e18ae9a22..80baabe8f 100644
--- a/powerpc/cstart64.S
+++ b/powerpc/cstart64.S
@@ -46,6 +46,21 @@ start:
add r1, r1, r31
add r2, r2, r31
 
+   /* Zero backpointers in initial stack frame so backtrace() stops */
+   li  r0,0
+   std r0,0(r1)
+   std r0,16(r1)
+
+   /*
+* Create entry frame of 64-bytes, same as the initial frame. A callee
+* may use the caller frame to store LR, and backtrace() termination
+* looks for return address == NULL, so the initial stack frame can't
+* be used to call C or else it could overwrite the zeroed LR save slot
+* and break backtrace termination.  This frame would be unnecessary if
+* backtrace looked for a zeroed frame address.
+*/
+   stdu    r1,-64(r1)
+
/* save DTB pointer */
std r3, 56(r1)
 
-- 
2.43.0



[kvm-unit-tests PATCH v8 08/35] powerpc: Fix KVM caps on POWER9 hosts

2024-04-05 Thread Nicholas Piggin
KVM does not like to run on POWER9 hosts without cap-ccf-assist=off.

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 powerpc/run | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/powerpc/run b/powerpc/run
index e469f1eb3..5cdb94194 100755
--- a/powerpc/run
+++ b/powerpc/run
@@ -24,6 +24,8 @@ M+=",accel=$ACCEL$ACCEL_PROPS"
 
 if [[ "$ACCEL" == "tcg" ]] ; then
M+=",cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off"
+elif [[ "$ACCEL" == "kvm" ]] ; then
+   M+=",cap-ccf-assist=off"
 fi
 
 command="$qemu -nodefaults $M -bios $FIRMWARE"
-- 
2.43.0



[kvm-unit-tests PATCH v8 07/35] common: add memory dirtying vs migration test

2024-04-05 Thread Nicholas Piggin
This test stores to a bunch of pages and verifies previous stores,
while being continually migrated. Default runtime is 5 seconds.
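
The store-and-verify pattern can be tried on a host with the sketch
below. It replaces the clock-based loop and the migration calls with a
fixed pass count, and DEMO_PAGE_SIZE, DEMO_NR_PAGES and verify_passes()
are names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_NR_PAGES  32

/*
 * Each pass checks that the first word of every page still holds the
 * value written by the previous pass, then stores the next value. A
 * page that lost its latest store would fail the comparison.
 */
static bool verify_passes(unsigned char *mem, size_t nr_pages, long passes)
{
	long i;
	size_t j;

	memset(mem, 0, nr_pages * DEMO_PAGE_SIZE);
	for (i = 0; i < passes; i++) {
		for (j = 0; j < nr_pages; j++) {
			volatile long *p =
				(volatile long *)(mem + j * DEMO_PAGE_SIZE);

			if (*p != i)
				return false;
			*p = i + 1;
		}
	}
	return true;
}
```

Touching one word per page dirties every page on every pass, which is
the access pattern that stresses dirty-page tracking during migration.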

Add this test to ppc64 and s390x builds. This can fail due to a QEMU
TCG physical memory dirty bitmap bug, so it is not enabled in unittests
for TCG yet.

The selftest-migration test time is reduced significantly because
this test

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 common/memory-verify.c  | 68 +
 common/selftest-migration.c |  8 ++---
 powerpc/Makefile.common |  1 +
 powerpc/memory-verify.c |  1 +
 powerpc/unittests.cfg   |  7 
 s390x/Makefile  |  1 +
 s390x/memory-verify.c   |  1 +
 s390x/unittests.cfg |  6 
 8 files changed, 89 insertions(+), 4 deletions(-)
 create mode 100644 common/memory-verify.c
 create mode 120000 powerpc/memory-verify.c
 create mode 120000 s390x/memory-verify.c

diff --git a/common/memory-verify.c b/common/memory-verify.c
new file mode 100644
index 000000000..1cefe95dc
--- /dev/null
+++ b/common/memory-verify.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Simple memory verification test, used to exercise dirty memory migration.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define NR_PAGES 32
+#define SIZE (NR_PAGES * PAGE_SIZE)
+
+static unsigned time_sec = 5;
+
+static void do_getopts(int argc, char **argv)
+{
+   int i;
+
+   for (i = 0; i < argc; ++i) {
+   if (strcmp(argv[i], "-t") == 0) {
+   i++;
+   if (i == argc)
+   break;
+   time_sec = atol(argv[i]);
+   }
+   }
+
+   printf("running for %d secs\n", time_sec);
+}
+
+int main(int argc, char **argv)
+{
+   void *mem = memalign(PAGE_SIZE, SIZE);
+   bool success = true;
+   uint64_t ms;
+   long i;
+
+   do_getopts(argc, argv);
+
+   report_prefix_push("memory");
+
+   memset(mem, 0, SIZE);
+
+   migrate_begin_continuous();
+   ms = get_clock_ms();
+   i = 0;
+   do {
+   int j;
+
+   for (j = 0; j < SIZE; j += PAGE_SIZE) {
+   if (*(volatile long *)(mem + j) != i) {
+   success = false;
+   goto out;
+   }
+   *(volatile long *)(mem + j) = i + 1;
+   }
+   i++;
+   } while (get_clock_ms() - ms < time_sec * 1000);
+out:
+   migrate_end_continuous();
+
+   report(success, "memory verification stress test");
+
+   report_prefix_pop();
+
+   return report_summary();
+}
diff --git a/common/selftest-migration.c b/common/selftest-migration.c
index 9a9b61835..3693148aa 100644
--- a/common/selftest-migration.c
+++ b/common/selftest-migration.c
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define NR_MIGRATIONS 15
+#define NR_MIGRATIONS 5
 
 int main(int argc, char **argv)
 {
@@ -28,11 +28,11 @@ int main(int argc, char **argv)
report(true, "cooperative migration");
 
migrate_begin_continuous();
-   mdelay(2000);
-   migrate_end_continuous();
mdelay(1000);
+   migrate_end_continuous();
+   mdelay(500);
migrate_begin_continuous();
-   mdelay(2000);
+   mdelay(1000);
migrate_end_continuous();
report(true, "continuous migration");
}
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index da4a7bbb8..1e181da69 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -7,6 +7,7 @@
 tests-common = \
$(TEST_DIR)/selftest.elf \
$(TEST_DIR)/selftest-migration.elf \
+   $(TEST_DIR)/memory-verify.elf \
$(TEST_DIR)/spapr_hcall.elf \
$(TEST_DIR)/rtas.elf \
$(TEST_DIR)/emulator.elf \
diff --git a/powerpc/memory-verify.c b/powerpc/memory-verify.c
new file mode 120000
index 0..5985c730f
--- /dev/null
+++ b/powerpc/memory-verify.c
@@ -0,0 +1 @@
+../common/memory-verify.c
\ No newline at end of file
diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index cae4949e8..e65217c18 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -49,6 +49,13 @@ machine = pseries
 groups = selftest migration
 extra_params = -append "skip"
 
+# This fails due to a QEMU TCG bug so KVM-only until QEMU is fixed upstream
+[migration-memory]
+file = memory-verify.elf
+accel = kvm
+machine = pseries
+groups = migration
+
 [spapr_hcall]
 file = spapr_hcall.elf
 
diff --git a/s390x/Makefile b/s390x/Makefile
index 344d46d68..ddc0969f3 100644
--- a/s390x/Makefile
+++ b/s390x/Makefile
@@ -1,5 +1,6 @@
 tests = $(TEST_DIR)/selftest.elf
 tests += $(TEST_DIR)/selftest-migration.elf
+tests += $(TEST_DIR)/memory-verify.elf
 tests += $(TEST_DIR)/intercept.elf
 tests += $(TEST_DIR)/emulator.elf
 

[kvm-unit-tests PATCH v8 06/35] gitlab-ci: Run migration selftest on s390x and powerpc

2024-04-05 Thread Nicholas Piggin
The migration harness is complicated and easy to break so CI will
be helpful.

Signed-off-by: Nicholas Piggin 
---
 .gitlab-ci.yml  | 32 +++-
 s390x/unittests.cfg |  8 
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index ff34b1f50..60b3cdfd2 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -92,27 +92,39 @@ build-arm:
 build-ppc64be:
  extends: .outoftree_template
  script:
- - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu
+ - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu nmap-ncat
  - mkdir build
  - cd build
  - ../configure --arch=ppc64 --endian=big --cross-prefix=powerpc64-linux-gnu-
  - make -j2
  - ACCEL=tcg ./run_tests.sh
- selftest-setup spapr_hcall rtas-get-time-of-day rtas-get-time-of-day-base
- rtas-set-time-of-day emulator
- | tee results.txt
+  selftest-setup
+  selftest-migration
+  selftest-migration-skip
+  spapr_hcall
+  rtas-get-time-of-day
+  rtas-get-time-of-day-base
+  rtas-set-time-of-day
+  emulator
+  | tee results.txt
  - if grep -q FAIL results.txt ; then exit 1 ; fi
 
 build-ppc64le:
  extends: .intree_template
  script:
- - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu
+ - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu nmap-ncat
  - ./configure --arch=ppc64 --endian=little --cross-prefix=powerpc64-linux-gnu-
  - make -j2
  - ACCEL=tcg ./run_tests.sh
- selftest-setup spapr_hcall rtas-get-time-of-day rtas-get-time-of-day-base
- rtas-set-time-of-day emulator
- | tee results.txt
+  selftest-setup
+  selftest-migration
+  selftest-migration-skip
+  spapr_hcall
+  rtas-get-time-of-day
+  rtas-get-time-of-day-base
+  rtas-set-time-of-day
+  emulator
+  | tee results.txt
  - if grep -q FAIL results.txt ; then exit 1 ; fi
 
 # build-riscv32:
@@ -135,7 +147,7 @@ build-riscv64:
 build-s390x:
  extends: .outoftree_template
  script:
- - dnf install -y qemu-system-s390x gcc-s390x-linux-gnu
+ - dnf install -y qemu-system-s390x gcc-s390x-linux-gnu nmap-ncat
  - mkdir build
  - cd build
  - ../configure --arch=s390x --cross-prefix=s390x-linux-gnu-
@@ -161,6 +173,8 @@ build-s390x:
   sclp-1g
   sclp-3g
   selftest-setup
+  selftest-migration-kvm
+  selftest-migration-skip
   sieve
   smp
   stsi
diff --git a/s390x/unittests.cfg b/s390x/unittests.cfg
index 49e3e4608..faa0ce0eb 100644
--- a/s390x/unittests.cfg
+++ b/s390x/unittests.cfg
@@ -31,6 +31,14 @@ groups = selftest migration
 # https://lore.kernel.org/qemu-devel/20240219061731.232570-1-npig...@gmail.com/
 accel = kvm
 
+[selftest-migration-kvm]
+file = selftest-migration.elf
+groups = nodefault
+accel = kvm
+# This is a special test for gitlab-ci that must not use TCG until the
+# TCG migration fix has made its way into CI environment's QEMU.
+# https://lore.kernel.org/qemu-devel/20240219061731.232570-1-npig...@gmail.com/
+
 [selftest-migration-skip]
 file = selftest-migration.elf
 groups = selftest migration
-- 
2.43.0



[kvm-unit-tests PATCH v8 05/35] arch-run: Add a "continuous" migration option for tests

2024-04-05 Thread Nicholas Piggin
The cooperative migration protocol is very good for controlling precise
pre and post conditions for a migration event. However, in some cases
its intrusiveness to the test program can mask problems and make
analysis more difficult.

For example, when stress testing migration against concurrent
complicated memory access, including TLB refill, RAM dirtying, etc.,
the tight spin at getchar() and resumption of the workload after
migration are unhelpful.

This adds a continuous migration mode that directs the harness to
perform migrations continually. This is added to the migration
selftests, which also see cooperative migration iterations reduced
to avoid increasing test time too much.
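
As a rough illustration of the harness side (a sketch under stated
assumptions, not the arch-run.bash code): once the begin message is
seen, migrations are repeated without waiting for further prompts until
the end message shows up. Here do_migration is a stub that only counts
calls, and demo_log stands in for the captured test output; both are
made up for the example.

```shell
# Sketch only: do_migration is a counting stub and the "test" is simulated
# by appending messages to a log file. Neither is the real harness code.
demo_log=$(mktemp)
migrations=0
continuous=0

do_migration ()
{
	migrations=$((migrations + 1))
	# Pretend the test asks to stop after the third migration.
	if [ "${migrations}" -eq 3 ] ; then
		echo "End continuous migration" >> "${demo_log}"
	fi
}

# The simulated test requests continuous migration.
echo "Begin continuous migration" >> "${demo_log}"

while : ; do
	if grep -q "End continuous migration" < "${demo_log}" ; then
		break
	elif [ "${continuous}" -eq 1 ] ; then
		# Already in continuous mode: migrate again without a prompt.
		do_migration
	elif grep -q "Begin continuous migration" < "${demo_log}" ; then
		continuous=1
		do_migration
	else
		sleep 0.1
	fi
done

echo "migrations=${migrations}"
rm -f "${demo_log}"
```

With the simulated test asking to stop after the third migration, the
loop performs three migrations and then exits.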

Signed-off-by: Nicholas Piggin 
---
 common/selftest-migration.c | 16 +--
 lib/migrate.c   | 18 
 lib/migrate.h   |  3 ++
 scripts/arch-run.bash   | 55 -
 4 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/common/selftest-migration.c b/common/selftest-migration.c
index 0afd8581c..9a9b61835 100644
--- a/common/selftest-migration.c
+++ b/common/selftest-migration.c
@@ -9,12 +9,13 @@
  */
 #include 
 #include 
+#include 
 
-#define NR_MIGRATIONS 30
+#define NR_MIGRATIONS 15
 
 int main(int argc, char **argv)
 {
-   report_prefix_push("migration");
+   report_prefix_push("migration harness");
 
if (argc > 1 && !strcmp(argv[1], "skip")) {
migrate_skip();
@@ -24,7 +25,16 @@ int main(int argc, char **argv)
 
for (i = 0; i < NR_MIGRATIONS; i++)
migrate_quiet();
-   report(true, "simple harness stress");
+   report(true, "cooperative migration");
+
+   migrate_begin_continuous();
+   mdelay(2000);
+   migrate_end_continuous();
+   mdelay(1000);
+   migrate_begin_continuous();
+   mdelay(2000);
+   migrate_end_continuous();
+   report(true, "continuous migration");
}
 
report_prefix_pop();
diff --git a/lib/migrate.c b/lib/migrate.c
index 1d22196b7..770f76d5c 100644
--- a/lib/migrate.c
+++ b/lib/migrate.c
@@ -60,3 +60,21 @@ void migrate_skip(void)
puts("Skipped VM migration (quiet)\n");
(void)getchar();
 }
+
+void migrate_begin_continuous(void)
+{
+   puts("Begin continuous migration\n");
+   (void)getchar();
+}
+
+void migrate_end_continuous(void)
+{
+   /*
+* Migration can split this output between source and dest QEMU
+* output files, print twice and match once to always cope with
+* a split.
+*/
+   puts("End continuous migration\n");
+   puts("End continuous migration (quiet)\n");
+   (void)getchar();
+}
diff --git a/lib/migrate.h b/lib/migrate.h
index db6e0c501..35b6703a2 100644
--- a/lib/migrate.h
+++ b/lib/migrate.h
@@ -11,3 +11,6 @@ void migrate_quiet(void);
 void migrate_once(void);
 
 void migrate_skip(void);
+
+void migrate_begin_continuous(void);
+void migrate_end_continuous(void);
diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
index 4a1aab48d..1901a929f 100644
--- a/scripts/arch-run.bash
+++ b/scripts/arch-run.bash
@@ -125,15 +125,17 @@ qmp_events ()
 filter_quiet_msgs ()
 {
grep -v "Now migrate the VM (quiet)" |
+   grep -v "Begin continuous migration (quiet)" |
+   grep -v "End continuous migration (quiet)" |
grep -v "Skipped VM migration (quiet)"
 }
 
 seen_migrate_msg ()
 {
if [ $skip_migration -eq 1 ]; then
-   grep -q -e "Now migrate the VM" < $1
+   grep -q -e "Now migrate the VM" -e "Begin continuous migration" < $1
else
-   grep -q -e "Now migrate the VM" -e "Skipped VM migration" < $1
+   grep -q -e "Now migrate the VM" -e "Begin continuous migration" -e "Skipped VM migration" < $1
fi
 }
 
@@ -161,6 +163,7 @@ run_migration ()
src_qmpout=/dev/null
dst_qmpout=/dev/null
skip_migration=0
+   continuous_migration=0
 
mkfifo ${src_outfifo}
mkfifo ${dst_outfifo}
@@ -186,9 +189,12 @@ run_migration ()
do_migration || return $?
 
while ps -p ${live_pid} > /dev/null ; do
-   # Wait for test exit or further migration messages.
-   if ! seen_migrate_msg ${src_out} ;  then
+   if [ ${continuous_migration} -eq 1 ] ; then
+   do_migration || return $?
+   elif ! seen_migrate_msg ${src_out} ;  then
sleep 0.1
+   elif grep -q "Begin continuous migration" < ${src_out} ; then
+   do_migration || return $?
elif grep -q "Now migrate the VM" < ${src_out} ; then
do_migration || return $?
elif [ $skip_migration -eq 0 ] && grep -q "Skipped VM migration" < ${src_out} ; then
@@ -218,7 +224,7 @@ do_migration ()
 
# The test must 

[kvm-unit-tests PATCH v8 04/35] (arm|s390): Use migrate_skip in test cases

2024-04-05 Thread Nicholas Piggin
Have tests use the new migrate_skip command in skip paths, rather than
calling migrate_once, to prevent the harness reporting an error.

s390x/migration.c adds a new command that looks like it was missing
previously.

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 arm/gic.c  | 21 -
 s390x/migration-cmm.c  |  8 
 s390x/migration-skey.c |  4 +++-
 s390x/migration.c  |  1 +
 4 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index c950b0d15..bbf828f17 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -782,13 +782,15 @@ static void test_its_migration(void)
struct its_device *dev2, *dev7;
cpumask_t mask;
 
-   if (its_setup1())
+   if (its_setup1()) {
+   migrate_skip();
return;
+   }
 
dev2 = its_get_device(2);
dev7 = its_get_device(7);
 
-   migrate_once();
+   migrate();
 
stats_reset();
cpumask_clear();
@@ -819,8 +821,10 @@ static void test_migrate_unmapped_collection(void)
int pe0 = 0;
u8 config;
 
-   if (its_setup1())
+   if (its_setup1()) {
+   migrate_skip();
return;
+   }
 
if (!errata(ERRATA_UNMAPPED_COLLECTIONS)) {
report_skip("Skipping test, as this test hangs without the fix. "
@@ -836,7 +840,7 @@ static void test_migrate_unmapped_collection(void)
its_send_mapti(dev2, 8192, 0, col);
gicv3_lpi_set_config(8192, LPI_PROP_DEFAULT);
 
-   migrate_once();
+   migrate();
 
/* on the destination, map the collection */
its_send_mapc(col, true);
@@ -875,8 +879,10 @@ static void test_its_pending_migration(void)
void *ptr;
int i;
 
-   if (its_prerequisites(4))
+   if (its_prerequisites(4)) {
+   migrate_skip();
return;
+   }
 
dev = its_create_device(2 /* dev id */, 8 /* nb_ites */);
its_send_mapd(dev, true);
@@ -923,7 +929,7 @@ static void test_its_pending_migration(void)
gicv3_lpi_rdist_enable(pe0);
gicv3_lpi_rdist_enable(pe1);
 
-   migrate_once();
+   migrate();
 
/* let's wait for the 256 LPIs to be handled */
mdelay(1000);
@@ -970,17 +976,14 @@ int main(int argc, char **argv)
} else if (!strcmp(argv[1], "its-migration")) {
report_prefix_push(argv[1]);
test_its_migration();
-   migrate_once();
report_prefix_pop();
} else if (!strcmp(argv[1], "its-pending-migration")) {
report_prefix_push(argv[1]);
test_its_pending_migration();
-   migrate_once();
report_prefix_pop();
} else if (!strcmp(argv[1], "its-migrate-unmapped-collection")) {
report_prefix_push(argv[1]);
test_migrate_unmapped_collection();
-   migrate_once();
report_prefix_pop();
} else if (strcmp(argv[1], "its-introspection") == 0) {
report_prefix_push(argv[1]);
diff --git a/s390x/migration-cmm.c b/s390x/migration-cmm.c
index 43673f18e..b4043a80e 100644
--- a/s390x/migration-cmm.c
+++ b/s390x/migration-cmm.c
@@ -55,12 +55,12 @@ int main(void)
 {
report_prefix_push("migration-cmm");
 
-   if (!check_essa_available())
+   if (!check_essa_available()) {
report_skip("ESSA is not available");
-   else
+   migrate_skip();
+   } else {
test_migration();
-
-   migrate_once();
+   }
 
report_prefix_pop();
return report_summary();
diff --git a/s390x/migration-skey.c b/s390x/migration-skey.c
index 8d6d8ecfe..1a196ae1e 100644
--- a/s390x/migration-skey.c
+++ b/s390x/migration-skey.c
@@ -169,6 +169,7 @@ static void test_skey_migration_parallel(void)
 
if (smp_query_num_cpus() == 1) {
report_skip("need at least 2 cpus for this test");
+   migrate_skip();
goto error;
}
 
@@ -233,6 +234,7 @@ int main(int argc, char **argv)
 
if (test_facility(169)) {
report_skip("storage key removal facility is active");
+   migrate_skip();
goto error;
}
 
@@ -247,11 +249,11 @@ int main(int argc, char **argv)
break;
default:
print_usage();
+   migrate_skip();
break;
}
 
 error:
-   migrate_once();
report_prefix_pop();
return report_summary();
 }
diff --git a/s390x/migration.c b/s390x/migration.c
index 269e272de..115afb731 100644
--- a/s390x/migration.c
+++ b/s390x/migration.c
@@ -164,6 +164,7 @@ int main(void)
 
if (smp_query_num_cpus() == 1) {
report_skip("need at least 2 cpus for this test");
+   migrate_skip();
goto done;
}
 
-- 
2.43.0



[kvm-unit-tests PATCH v8 03/35] migration: Add a migrate_skip command

2024-04-05 Thread Nicholas Piggin
Tests that are run with MIGRATION=yes but skip due to some requirement
not being met will show as a failure due to the harness requirement to
see one successful migration. The workaround for this is to migrate in
the test's skip path. Add a new command that just tells the harness not
to expect a migration.
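
A minimal sketch of the idea, assuming the harness decides what to do
by scanning the captured test output for a directive. The helper name
classify_output and the sample log contents are hypothetical; only the
two message strings come from the patch.

```shell
# Hypothetical helper, not from the patch: report which directive, if
# any, appears in the captured test output.
classify_output ()
{
	if grep -q "Now migrate the VM" < "$1" ; then
		echo migrate
	elif grep -q "Skipped VM migration" < "$1" ; then
		echo skip
	else
		echo none
	fi
}

demo_out=$(mktemp)

# A test that hit its skip path prints only the skip directive.
echo "Skipped VM migration (quiet)" > "${demo_out}"
skip_result=$(classify_output "${demo_out}")

# A test that reached a migration point prints the migrate directive.
echo "Now migrate the VM (quiet)" > "${demo_out}"
migrate_result=$(classify_output "${demo_out}")

# A test that printed neither has given the harness nothing to act on.
: > "${demo_out}"
none_result=$(classify_output "${demo_out}")

echo "${skip_result} ${migrate_result} ${none_result}"
rm -f "${demo_out}"
```

The point is that "skip" becomes a distinct outcome, so a skipped test
no longer has to perform a migration just to satisfy the harness.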

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 common/selftest-migration.c | 14 -
 lib/migrate.c   | 19 -
 lib/migrate.h   |  2 ++
 powerpc/unittests.cfg   |  6 ++
 s390x/unittests.cfg |  5 +
 scripts/arch-run.bash   | 41 +
 6 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/common/selftest-migration.c b/common/selftest-migration.c
index 54b5d6b2d..0afd8581c 100644
--- a/common/selftest-migration.c
+++ b/common/selftest-migration.c
@@ -14,14 +14,18 @@
 
 int main(int argc, char **argv)
 {
-   int i = 0;
-
report_prefix_push("migration");
 
-   for (i = 0; i < NR_MIGRATIONS; i++)
-   migrate_quiet();
+   if (argc > 1 && !strcmp(argv[1], "skip")) {
+   migrate_skip();
+   report(true, "migration skipping");
+   } else {
+   int i;
 
-   report(true, "simple harness stress test");
+   for (i = 0; i < NR_MIGRATIONS; i++)
+   migrate_quiet();
+   report(true, "simple harness stress");
+   }
 
report_prefix_pop();
 
diff --git a/lib/migrate.c b/lib/migrate.c
index 92d1d957d..1d22196b7 100644
--- a/lib/migrate.c
+++ b/lib/migrate.c
@@ -39,7 +39,24 @@ void migrate_once(void)
 
if (migrated)
return;
-
migrated = true;
+
migrate();
 }
+
+/*
+ * When the test has been started in migration mode, but the test case is
+ * skipped and no migration point is reached, this can be used to tell the
+ * harness not to mark it as a failure to migrate.
+ */
+void migrate_skip(void)
+{
+   static bool did_migrate_skip;
+
+   if (did_migrate_skip)
+   return;
+   did_migrate_skip = true;
+
+   puts("Skipped VM migration (quiet)\n");
+   (void)getchar();
+}
diff --git a/lib/migrate.h b/lib/migrate.h
index 95b9102b0..db6e0c501 100644
--- a/lib/migrate.h
+++ b/lib/migrate.h
@@ -9,3 +9,5 @@
 void migrate(void);
 void migrate_quiet(void);
 void migrate_once(void);
+
+void migrate_skip(void);
diff --git a/powerpc/unittests.cfg b/powerpc/unittests.cfg
index 1559bee98..cae4949e8 100644
--- a/powerpc/unittests.cfg
+++ b/powerpc/unittests.cfg
@@ -43,6 +43,12 @@ groups = selftest migration
 # https://lore.kernel.org/qemu-devel/20240219061731.232570-1-npig...@gmail.com/
 accel = kvm
 
+[selftest-migration-skip]
+file = selftest-migration.elf
+machine = pseries
+groups = selftest migration
+extra_params = -append "skip"
+
 [spapr_hcall]
 file = spapr_hcall.elf
 
diff --git a/s390x/unittests.cfg b/s390x/unittests.cfg
index dac9e4db1..49e3e4608 100644
--- a/s390x/unittests.cfg
+++ b/s390x/unittests.cfg
@@ -31,6 +31,11 @@ groups = selftest migration
 # https://lore.kernel.org/qemu-devel/20240219061731.232570-1-npig...@gmail.com/
 accel = kvm
 
+[selftest-migration-skip]
+file = selftest-migration.elf
+groups = selftest migration
+extra_params = -append "skip"
+
 [intercept]
 file = intercept.elf
 
diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
index 39419d4e2..4a1aab48d 100644
--- a/scripts/arch-run.bash
+++ b/scripts/arch-run.bash
@@ -124,12 +124,17 @@ qmp_events ()
 
 filter_quiet_msgs ()
 {
-   grep -v "Now migrate the VM (quiet)"
+   grep -v "Now migrate the VM (quiet)" |
+   grep -v "Skipped VM migration (quiet)"
 }
 
 seen_migrate_msg ()
 {
-   grep -q -e "Now migrate the VM" < $1
+   if [ $skip_migration -eq 1 ]; then
+   grep -q -e "Now migrate the VM" < $1
+   else
+   grep -q -e "Now migrate the VM" -e "Skipped VM migration" < $1
+   fi
 }
 
 run_migration ()
@@ -142,7 +147,7 @@ run_migration ()
migcmdline=$@
 
trap 'trap - TERM ; kill 0 ; exit 2' INT TERM
-   trap 'rm -f ${src_out} ${dst_out} ${src_outfifo} ${dst_outfifo} ${dst_incoming} ${src_qmp} ${dst_qmp} ${dst_infifo}' RETURN EXIT
+   trap 'rm -f ${src_out} ${dst_out} ${src_outfifo} ${dst_outfifo} ${dst_incoming} ${src_qmp} ${dst_qmp} ${src_infifo} ${dst_infifo}' RETURN EXIT
 
dst_incoming=$(mktemp -u -t mig-helper-socket-incoming.XX)
src_out=$(mktemp -t mig-helper-stdout1.XX)
@@ -151,21 +156,26 @@ run_migration ()
dst_outfifo=$(mktemp -u -t mig-helper-fifo-stdout2.XX)
src_qmp=$(mktemp -u -t mig-helper-qmp1.XX)
dst_qmp=$(mktemp -u -t mig-helper-qmp2.XX)
-   dst_infifo=$(mktemp -u -t mig-helper-fifo-stdin.XX)
+   src_infifo=$(mktemp -u -t mig-helper-fifo-stdin1.XX)
+   dst_infifo=$(mktemp -u -t mig-helper-fifo-stdin2.XX)

[kvm-unit-tests PATCH v8 02/35] arch-run: Keep infifo open

2024-04-05 Thread Nicholas Piggin
The infifo that is used to send characters to the QEMU console can
only receive one character before the cat process exits.
Supporting interactions between test and harness involving multiple
characters requires the fifo to remain open.

The infifo is removed by the exit handler like other files and fifos
so it does not have to be removed explicitly.

With this we can let the cat out of the subshell, simplifying the
input pipeline.
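
For illustration only, a small sketch of why holding the fifo open
read-write helps. It relies on Linux behaviour (opening a fifo
read-write does not block there), and uses a fixed descriptor 3 and the
made-up names demo_dir and demo_fifo rather than the descriptor the
patch allocates.

```shell
# Sketch only: demo_dir, demo_fifo and fd 3 are illustrative choices.
demo_dir=$(mktemp -d)
demo_fifo=${demo_dir}/infifo
mkfifo "${demo_fifo}"

# Open the fifo read-write in this shell. Because this shell now holds
# both ends, later opens by writers do not block, and a reader does not
# see EOF when one of those writers closes its end.
exec 3<>"${demo_fifo}"

# Two separate writers open, write, and close the fifo in turn.
echo first > "${demo_fifo}"
echo second > "${demo_fifo}"

# Both lines are still readable through the held descriptor.
read -r line1 <&3
read -r line2 <&3
echo "${line1} ${line2}"

# Close the descriptor and clean up.
exec 3>&-
rm -rf "${demo_dir}"
```

Without the held descriptor, the first echo would block until a reader
opened the fifo, and that reader would get EOF as soon as the writer
closed it.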

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 scripts/arch-run.bash | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
index e34d784c0..39419d4e2 100644
--- a/scripts/arch-run.bash
+++ b/scripts/arch-run.bash
@@ -158,6 +158,11 @@ run_migration ()
mkfifo ${src_outfifo}
mkfifo ${dst_outfifo}
 
+   # Holding both ends of the input fifo open prevents opens from
+   # blocking and readers getting EOF when a writer closes it.
+   mkfifo ${dst_infifo}
+   exec {dst_infifo_fd}<>${dst_infifo}
+
eval "$migcmdline" \
-chardev socket,id=mon,path=${src_qmp},server=on,wait=off \
-mon chardev=mon,mode=control > ${src_outfifo} &
@@ -191,14 +196,10 @@ run_migration ()
 
 do_migration ()
 {
-   # We have to use cat to open the named FIFO, because named FIFO's,
-   # unlike pipes, will block on open() until the other end is also
-   # opened, and that totally breaks QEMU...
-   mkfifo ${dst_infifo}
eval "$migcmdline" \
-chardev socket,id=mon,path=${dst_qmp},server=on,wait=off \
-mon chardev=mon,mode=control -incoming unix:${dst_incoming} \
-   < <(cat ${dst_infifo}) > ${dst_outfifo} &
+   < ${dst_infifo} > ${dst_outfifo} &
incoming_pid=$!
cat ${dst_outfifo} | tee ${dst_out} | filter_quiet_msgs &
 
@@ -245,7 +246,6 @@ do_migration ()
 
# keypress to dst so getchar completes and test continues
echo > ${dst_infifo}
-   rm ${dst_infifo}
 
# Wait for the incoming socket being removed, ready for next destination
while [ -S ${dst_incoming} ] ; do sleep 0.1 ; done
-- 
2.43.0



[kvm-unit-tests PATCH v8 01/35] arch-run: Add functions to help handle migration directives from test

2024-04-05 Thread Nicholas Piggin
The migration harness will be expanded to deal with more commands
from the test. Moving these checks into functions helps keep things
manageable.
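
A short sketch of how the two helpers divide the work, using the same
function bodies as the diff below but run against a made-up sample log
(demo_out and its contents are assumptions for the example):

```shell
# Same shape as the helpers added by the patch.
filter_quiet_msgs ()
{
	grep -v "Now migrate the VM (quiet)"
}

seen_migrate_msg ()
{
	grep -q -e "Now migrate the VM" < "$1"
}

# Made-up captured output: one normal line and one quiet directive.
demo_out=$(mktemp)
printf '%s\n' "PASS: setup" "Now migrate the VM (quiet)" > "${demo_out}"

# The quiet directive is hidden from the user-visible output...
visible=$(filter_quiet_msgs < "${demo_out}")

# ...but the harness still detects it in the captured file.
if seen_migrate_msg "${demo_out}" ; then
	seen=yes
else
	seen=no
fi

echo "visible='${visible}' seen=${seen}"
rm -f "${demo_out}"
```

Keeping the message strings in one place means later directives only
need to be added to these two functions rather than to every grep call.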

Reviewed-by: Thomas Huth 
Signed-off-by: Nicholas Piggin 
---
 scripts/arch-run.bash | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
index 413f3eda8..e34d784c0 100644
--- a/scripts/arch-run.bash
+++ b/scripts/arch-run.bash
@@ -122,6 +122,16 @@ qmp_events ()
jq -c 'select(has("event"))'
 }
 
+filter_quiet_msgs ()
+{
+   grep -v "Now migrate the VM (quiet)"
+}
+
+seen_migrate_msg ()
+{
+   grep -q -e "Now migrate the VM" < $1
+}
+
 run_migration ()
 {
if ! command -v ncat >/dev/null 2>&1; then
@@ -152,7 +162,7 @@ run_migration ()
-chardev socket,id=mon,path=${src_qmp},server=on,wait=off \
-mon chardev=mon,mode=control > ${src_outfifo} &
live_pid=$!
-   cat ${src_outfifo} | tee ${src_out} | grep -v "Now migrate the VM (quiet)" &
+   cat ${src_outfifo} | tee ${src_out} | filter_quiet_msgs &
 
# Start the first destination QEMU machine in advance of the test
# reaching the migration point, since we expect at least one migration.
@@ -162,7 +172,7 @@ run_migration ()
 
while ps -p ${live_pid} > /dev/null ; do
# Wait for test exit or further migration messages.
-   if ! grep -q -i "Now migrate the VM" < ${src_out} ; then
+   if ! seen_migrate_msg ${src_out} ;  then
sleep 0.1
else
do_migration || return $?
@@ -190,11 +200,11 @@ do_migration ()
-mon chardev=mon,mode=control -incoming unix:${dst_incoming} \
< <(cat ${dst_infifo}) > ${dst_outfifo} &
incoming_pid=$!
-   cat ${dst_outfifo} | tee ${dst_out} | grep -v "Now migrate the VM (quiet)" &
+   cat ${dst_outfifo} | tee ${dst_out} | filter_quiet_msgs &
 
# The test must prompt the user to migrate, so wait for the
-   # "Now migrate VM" console message.
-   while ! grep -q -i "Now migrate the VM" < ${src_out} ; do
+   # "Now migrate VM" or similar console message.
+   while ! seen_migrate_msg ${src_out} ; do
if ! ps -p ${live_pid} > /dev/null ; then
echo "ERROR: Test exit before migration point." >&2
echo > ${dst_infifo}
-- 
2.43.0



[kvm-unit-tests PATCH v8 00/35] migration, powerpc improvements

2024-04-05 Thread Nicholas Piggin
Tree here
https://gitlab.com/npiggin/kvm-unit-tests/-/tree/powerpc?ref_type=heads

(That tree has some shellcheck patches at the end, not in this series)

Since v7, fixed a couple of Thomas' review comments. Also added
a test for PMC5 counting vs interrupts, which is broken on upstream
TCG, and a small fix for SMP+MMU (secondary stacks were being allocated
in discontiguous virtual memory if the secondaries were started when the
MMU is enabled on the primary), discovered while I was making a test case
for TCG TLB races (not yet included in the series).
(https://lists.gnu.org/archive/html/qemu-ppc/2024-03/msg00567.html)

Thanks,
Nick

Nicholas Piggin (35):
  arch-run: Add functions to help handle migration directives from test
  arch-run: Keep infifo open
  migration: Add a migrate_skip command
  (arm|s390): Use migrate_skip in test cases
  arch-run: Add a "continuous" migration option for tests
  gitlab-ci: Run migration selftest on s390x and powerpc
  common: add memory dirtying vs migration test
  powerpc: Fix KVM caps on POWER9 hosts
  powerpc: Fix stack backtrace termination
  powerpc: interrupt stack backtracing
  powerpc/sprs: Specify SPRs with data rather than code
  powerpc/sprs: Avoid taking PMU interrupts caused by register fuzzing
  doc: start documentation directory with unittests.cfg doc
  scripts: allow machine option to be specified in unittests.cfg
  scripts: Accommodate powerpc powernv machine differences
  powerpc: Support powernv machine with QEMU TCG
  powerpc: Fix emulator illegal instruction test for powernv
  powerpc/sprs: Test hypervisor registers on powernv machine
  powerpc: general interrupt tests
  powerpc: Add rtas stop-self support
  powerpc: Remove broken SMP exception stack setup
  powerpc: add SMP and IPI support
  powerpc: Permit ACCEL=tcg,thread=single
  powerpc: Avoid using larx/stcx. in spinlocks when only one CPU is
running
  powerpc: Add atomics tests
  powerpc: Add timebase tests
  powerpc: Add MMU support
  common/sieve: Use vmalloc.h for setup_mmu definition
  common/sieve: Support machines without MMU
  powerpc: Add sieve.c common test
  powerpc: add usermode support
  powerpc: add pmu tests
  configure: Make arch_libdir a first-class entity
  powerpc: Remove remnants of ppc64 directory and build structure
  powerpc: gitlab CI update

 .gitlab-ci.yml   |  26 +-
 MAINTAINERS  |   1 -
 Makefile |   2 +-
 arm/gic.c|  21 +-
 arm/unittests.cfg|  26 +-
 common/memory-verify.c   |  68 +++
 common/selftest-migration.c  |  26 +-
 common/sieve.c   |  15 +-
 configure|  58 +-
 docs/unittests.txt   |  95 
 lib/libcflat.h   |   2 -
 lib/migrate.c|  37 +-
 lib/migrate.h|   5 +
 lib/{ppc64 => powerpc}/asm-offsets.c |   7 +
 lib/{ppc64 => powerpc}/asm/asm-offsets.h |   0
 lib/powerpc/asm/atomic.h |   6 +
 lib/powerpc/asm/barrier.h|  12 +
 lib/{ppc64 => powerpc}/asm/bitops.h  |   4 +-
 lib/powerpc/asm/hcall.h  |   6 +
 lib/{ppc64 => powerpc}/asm/io.h  |   4 +-
 lib/powerpc/asm/mmu.h|  10 +
 lib/powerpc/asm/opal.h   |  22 +
 lib/powerpc/asm/page.h   |  65 +++
 lib/powerpc/asm/pgtable-hwdef.h  |  66 +++
 lib/powerpc/asm/pgtable.h| 125 +
 lib/powerpc/asm/processor.h  |  63 +++
 lib/{ppc64 => powerpc}/asm/ptrace.h  |  22 +-
 lib/powerpc/asm/reg.h|  42 ++
 lib/powerpc/asm/rtas.h   |   2 +
 lib/powerpc/asm/setup.h  |   3 +-
 lib/powerpc/asm/smp.h|  50 +-
 lib/powerpc/asm/spinlock.h   |  11 +
 lib/powerpc/asm/stack.h  |   3 +
 lib/{ppc64 => powerpc}/asm/vpa.h |   0
 lib/powerpc/hcall.c  |   4 +-
 lib/powerpc/io.c |  41 +-
 lib/powerpc/io.h |   6 +
 lib/powerpc/mmu.c| 283 ++
 lib/powerpc/opal-calls.S |  50 ++
 lib/powerpc/opal.c   |  76 +++
 lib/powerpc/processor.c  |  91 +++-
 lib/powerpc/rtas.c   |  81 ++-
 lib/powerpc/setup.c  | 160 +-
 lib/powerpc/smp.c| 287 --
 lib/powerpc/spinlock.c   |  33 ++
 lib/powerpc/stack.c  |  53 ++
 lib/ppc64/.gitignore |   1 -
 lib/ppc64/asm/barrier.h  |   9 -
 lib/ppc64/asm/handlers.h |   1 -
 lib/ppc64/asm/hcall.h|   1 -
 lib/ppc64/asm/memory_areas.h |   6 -
 lib/ppc64/asm/page.h |   1 -
 lib/ppc64/asm/ppc_asm.h 

[PATCH] MAINTAINERS: Drop Li Yang as their email address stopped working

2024-04-05 Thread Uwe Kleine-König
When sending a patch to (among others) Li Yang, the NXP MTA replied that
the address doesn't exist and so the mail couldn't be delivered. The
error code was 550, so at least technically that's not a temporary issue.

Signed-off-by: Uwe Kleine-König 
---
Hello,

I added the affected maintainers and lists to Cc:, maybe someone there
knows if this issue is only temporary?

@Greg: Given that I noticed the non-existing address when sending a USB
patch, I suggest you take care of applying this patch (iff it should
be applied now). If Li Yang has indeed disappeared, I'd prefer to drop the
contact from MAINTAINERS early to not give wrong expectations to
contributors.

Best regards
Uwe

 MAINTAINERS | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c121493f43d..be19aad15045 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2191,7 +2191,6 @@ N:mxs
 
 ARM/FREESCALE LAYERSCAPE ARM ARCHITECTURE
 M: Shawn Guo 
-M: Li Yang 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
@@ -8523,7 +8522,6 @@ S:Maintained
 F: drivers/video/fbdev/fsl-diu-fb.*
 
 FREESCALE DMA DRIVER
-M: Li Yang 
 M: Zhang Wei 
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
@@ -8688,10 +8686,9 @@ F:   drivers/soc/fsl/qe/tsa.h
 F: include/dt-bindings/soc/cpm1-fsl,tsa.h
 
 FREESCALE QUICC ENGINE UCC ETHERNET DRIVER
-M: Li Yang 
 L: net...@vger.kernel.org
 L: linuxppc-dev@lists.ozlabs.org
-S: Maintained
+S: Orphan
 F: drivers/net/ethernet/freescale/ucc_geth*
 
 FREESCALE QUICC ENGINE UCC HDLC DRIVER
@@ -8708,10 +8705,9 @@ S:   Maintained
 F: drivers/tty/serial/ucc_uart.c
 
 FREESCALE SOC DRIVERS
-M: Li Yang 
 L: linuxppc-dev@lists.ozlabs.org
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: Maintained
+S: Orphan
 F: Documentation/devicetree/bindings/misc/fsl,dpaa2-console.yaml
 F: Documentation/devicetree/bindings/soc/fsl/
 F: drivers/soc/fsl/
@@ -8745,10 +8741,9 @@ F:   Documentation/devicetree/bindings/sound/fsl,qmc-audio.yaml
 F: sound/soc/fsl/fsl_qmc_audio.c
 
 FREESCALE USB PERIPHERAL DRIVERS
-M: Li Yang 
 L: linux-...@vger.kernel.org
 L: linuxppc-dev@lists.ozlabs.org
-S: Maintained
+S: Orphan
 F: drivers/usb/gadget/udc/fsl*
 
 FREESCALE USB PHY DRIVER

base-commit: c85af715cac0a951eea97393378e84bb49384734
-- 
2.43.0