Re: [RFC PATCH v2] powerpc/64s: Move idle code to powernv C code

2018-07-25 Thread Nicholas Piggin
Hi Gautham,

Thanks for the review, I also missed one or two things from you last
one, but I haven't forgotten them.

 On Wed, 25 Jul 2018 16:56:45 +0530
Gautham R Shenoy  wrote:

> Hello Nicholas,
> 
> On Sat, Jul 21, 2018 at 02:29:24PM +1000, Nicholas Piggin wrote:
> > Reimplement Book3S idle code to C, in the powernv platform code.
> > Assembly stubs are used to save and restore the stack frame and
> > non-volatile GPRs before going to idle, but these are small and
> > mostly agnostic to microarchitecture implementation details.
> > 
> > The optimisation where EC=ESL=0 idle modes did not have to save
> > GPRs or mtmsrd L=0 is restored, because it's simple to do.
> > 
> > Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
> > but saves and restores them all explicitly. This can easily be
> > extended to tracking the set of system-wide SPRs that do not have
> > to be saved each time.
> > 
> > Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> > way this stuff will cope with non-trivial new CPU implementation
> > details, firmware changes, etc., without becoming unmaintainable.
> > 
> > Since RFC v1:
> > - Now tested and working with POWER9 hash and radix.
> > - KVM support added. This took a bit of work to untangle and might
> >   still have some issues, but POWER9 seems to work including hash on
> >   radix with dependent threads mode.
> > - This snowballed a bit because of KVM and other details making it
> >   not feasible to leave POWER7/8 code alone. That's only half done
> >   at the moment.
> > - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
> >   support done it might be another hundred or so lines of C.
> > 
> > Would appreciate any feedback on the approach in particular the
> > significantly different (and hopefully cleaner) KVM approach.  
> 
> 
> I have reviewed the C-code movement from idle_book3s.S to
> powernv/idle.c. Some comments on that part of the code inline.

Thanks. Yes the POWER7/8 code is a bit incomplete as you noticed, but
you had some good suggestions there too.

> I haven't yet gone through the book3s_hv_rmhandlers.S code changes.

Yeah that's very tricky code and I don't know if I got it right yet.
Will have to run it past the KVM people too.

> 
> 
> > 
> > Thanks,
> > Nick
> >   
> 
> [..snip..]
> 
> > 
> >  #ifdef CONFIG_PPC_POWERNV
> > -   /* Per-core mask tracking idle threads and a lock bit-[L][] */
> > -   u32 *core_idle_state_ptr;
> > -   u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> > -   /* Mask to indicate thread id in core */
> > -   u8 thread_mask;
> > -   /* Mask to denote subcore sibling threads */
> > -   u8 subcore_sibling_mask;
> > -   /* Flag to request this thread not to stop */
> > -   atomic_t dont_stop;
> > -   /* The PSSCR value that the kernel requested before going to stop */
> > -   u64 requested_psscr;
> > +   union {
> > +   /* P7/P8 specific fields */
> > +   struct {
> > +   /* Per-core mask tracking idle threads and a lock 
> > bit-[L][] */
> > +   unsigned long *core_idle_state_ptr;  
> 
> Do we still need *core_idle_state_ptr ? Since this code is accessed
> through C where we have access to paca_ptrs global variable we can
> reference the "idle_state" (defined for P9 below). A lot of
> locking/unlocking code in powernv/idle.c further down the patch
> already does that, right?

Yeah you're probably right I think.

> > +
> > +   if (type == PNV_THREAD_WINKLE) {
> > +   sprs.rpr = mfspr(SPRN_RPR);
> > +   sprs.tscr = mfspr(SPRN_TSCR);  
> 
> Per-subcore SPRs aren't being saved here ?

Right, it's a bit incomplete. I'll post out a new version with P8
support soon.


> > +
> > +   if (power7_fastsleep_workaround_exit) {
> > +   rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
> > +   OPAL_CONFIG_IDLE_UNDO);
> > +   WARN_ON(rc);
> > +   }
> > +  
> 
> 
> > +   /* TB */
> > +   if (opal_resync_timebase() != OPAL_SUCCESS)
> > +   BUG();  
> 
> If we are waking up only from fastsleep, we won't lose any SPRs, but
> only the timebase. So we can return at this point.

Okay, got it.

> > +static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> > +{
> > +   int cpu = raw_smp_processor_id();
> > +   int first = cpu_first_thread_sibling(cpu);
> > +   unsigned long *state = _ptrs[first]->idle_state;
> > +   unsigned long srr1;
> > +   unsigned long mmcr0 = 0;
> > +   struct p9_sprs sprs;
> > +   bool sprs_saved = false;
> > +
> > +   if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
> > +   WARN_ON(!mmu_on);  
> 
> So this is a warning that we are trying to perform a ESL = 0 stop from a
> kvm offline case. Hence we cannot return IR|DR disabled.

Yes.

> > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > +   if (cpu_has_feature(CPU_FTR_P9_TM_XER_SO_BUG)) {
> > +   local_paca->requested_psscr = psscr;
> 

Re: [RFC PATCH v2] powerpc/64s: Move idle code to powernv C code

2018-07-25 Thread Gautham R Shenoy
Hello Nicholas,

On Sat, Jul 21, 2018 at 02:29:24PM +1000, Nicholas Piggin wrote:
> Reimplement Book3S idle code to C, in the powernv platform code.
> Assembly stubs are used to save and restore the stack frame and
> non-volatile GPRs before going to idle, but these are small and
> mostly agnostic to microarchitecture implementation details.
> 
> The optimisation where EC=ESL=0 idle modes did not have to save
> GPRs or mtmsrd L=0 is restored, because it's simple to do.
> 
> Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
> but saves and restores them all explicitly. This can easily be
> extended to tracking the set of system-wide SPRs that do not have
> to be saved each time.
> 
> Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
> way this stuff will cope with non-trivial new CPU implementation
> details, firmware changes, etc., without becoming unmaintainable.
> 
> Since RFC v1:
> - Now tested and working with POWER9 hash and radix.
> - KVM support added. This took a bit of work to untangle and might
>   still have some issues, but POWER9 seems to work including hash on
>   radix with dependent threads mode.
> - This snowballed a bit because of KVM and other details making it
>   not feasible to leave POWER7/8 code alone. That's only half done
>   at the moment.
> - So far this trades about 800 lines of asm for 500 of C. With POWER7/8
>   support done it might be another hundred or so lines of C.
> 
> Would appreciate any feedback on the approach in particular the
> significantly different (and hopefully cleaner) KVM approach.


I have reviewed the C-code movement from idle_book3s.S to
powernv/idle.c. Some comments on that part of the code inline.

I haven't yet gone through the book3s_hv_rmhandlers.S code changes.


> 
> Thanks,
> Nick
> 

[..snip..]

> 
>  #ifdef CONFIG_PPC_POWERNV
> - /* Per-core mask tracking idle threads and a lock bit-[L][] */
> - u32 *core_idle_state_ptr;
> - u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
> - /* Mask to indicate thread id in core */
> - u8 thread_mask;
> - /* Mask to denote subcore sibling threads */
> - u8 subcore_sibling_mask;
> - /* Flag to request this thread not to stop */
> - atomic_t dont_stop;
> - /* The PSSCR value that the kernel requested before going to stop */
> - u64 requested_psscr;
> + union {
> + /* P7/P8 specific fields */
> + struct {
> + /* Per-core mask tracking idle threads and a lock 
> bit-[L][] */
> + unsigned long *core_idle_state_ptr;

Do we still need *core_idle_state_ptr ? Since this code is accessed
through C where we have access to paca_ptrs global variable we can
reference the "idle_state" (defined for P9 below). A lot of
locking/unlocking code in powernv/idle.c further down the patch
already does that, right?



> + /* PNV_THREAD_RUNNING/NAP/SLEEP */
> + u8 thread_idle_state;
> + /* Mask to indicate thread id in core */
> + u8 thread_mask;
> + /* Mask to denote subcore sibling threads */
> + u8 subcore_sibling_mask;
> + };
> 
> - /*
> -  * Save area for additional SPRs that need to be
> -  * saved/restored during cpuidle stop.
> -  */
> - struct stop_sprs stop_sprs;
> + /* P9 specific fields */
> + struct {
> + /* The PSSCR value that the kernel requested before 
> going to stop */
> + u64 requested_psscr;
> + unsigned long idle_state;
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* Flag to request this thread not to stop */
> + atomic_t dont_stop;
> +#endif
> + };
> + };
>  #endif
> 

[..snip..]

>   /*
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 12f13acee1f6..5b6af21ea9aa 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -16,6 +16,7 @@
>  #include 

[..snip..]

> +static inline void atomic_start_thread_idle(void)
> +{
> + int cpu = raw_smp_processor_id();
> + int first = cpu_first_thread_sibling(cpu);
> + int thread_nr = cpu_thread_in_core(cpu);
> + unsigned long *state = _ptrs[first]->idle_state;

We are using this in both POWER8 and POWER9 code and are (rightly IMO)
referencing the idle_state variable.

> +
> + clear_bit(thread_nr, state);
> +}
> +
> +static inline void atomic_stop_thread_idle(void)
> +{
> + int cpu = raw_smp_processor_id();
> + int first = cpu_first_thread_sibling(cpu);
> + int thread_nr = cpu_thread_in_core(cpu);
> + unsigned long *state = _ptrs[first]->idle_state;
> +
> + set_bit(thread_nr, state);
> +}
> +
> +static inline void atomic_lock_thread_idle(void)
> +{
> + int cpu =