On Wed, Sep 18, 2019 at 03:14:15PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> writes:
> > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>
> >
> > Currently on Pseries Linux Guests, the offlined CPU can be put to one
> > of the following two states:
> >    - Long term processor cede (also called extended cede)
> >    - Returned to the Hypervisor via RTAS "stop-self" call.
> >
> > This is controlled by the kernel boot parameter "cede_offline=on/off".
> >
> > By default the offlined CPUs enter extended cede.
> 
> Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an 
> appropriate offline state") (Nov 2009)
> 
> Which you wrote :)

Mea Culpa! I forgot to include the "Fixes commit 3aa565f53c39" into
Patch 1 of the series.

> 
> Why was that wrong?

It was wrong from the definition of what PHYP considers as
"not-active" CPU. From the point of view of that hypervisor, a CPU is
not-active iff it is in RTAS "stop-self". Thus if a CPU is offline via
extended cede, and not using any cycles, it is still considered to be
active, by PHYP. This causes PURR accounting is broken. 

> 
> > The PHYP hypervisor considers CPUs in extended cede to be "active"
> > since the CPUs are still under the control fo the Linux Guests. Hence, when 
> > we change the
> > SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
> > will continue to count the values for offlined CPUs in extended cede
> > as if they are online.
> >
> > One of the expectations with PURR is that the for an interval of time,
> > the sum of the PURR increments across the online CPUs of a core should
> > equal the number of timebase ticks for that interval.
> >
> > This is currently not the case.
> 
> But why does that matter? It's just some accounting stuff, does it
> actually break something meaningful?

As Naveen mentioned, it breaks lparstat which the customers are using
for capacity planning. Unfortunately we discovered this 10 years after
the feature was written.

> 
> Also what does this do to the latency of CPU online/offline.

It will have a slightly higher latency compared to extended cede,
since it involves an additional rtas-call for both the start and
stopping of CPU. Will measure the exact difference and post it in the
next version.

> And what does this do on KVM?

KVM doesn't seem to depend on the state of the offline VCPU as it has
an explicit way of signalling whether a CPU is online or not, via
KVM_REG_PPC_ONLINE. In commit 7aa15842c15f ("KVM: PPC: Book3S HV: Set
RWMR on POWER8 so PURR/SPURR count correctly") we use this KVM reg to
update the count of online vCPUs in a core, and use this count to set
the RWMR correctly before dispatching the core.

So, this patchset doesn't affect KVM.

> 
> 
> > In the following data (Generated using
> > https://github.com/gautshen/misc/blob/master/purr_tb.py):
> >
> >
> > delta tb = tb ticks elapsed in 1 second.
> > delta purr = sum of PURR increments on online CPUs of that core in 1
> >                  second
> >       
> > SMT=off
> > ===========================================
> > Core                delta tb(apprx)  delta purr     
> > ===========================================
> > core00 [  0]        512000000       69883784        
> > core01 [  8]        512000000       88782536        
> > core02 [ 16]        512000000       94296824        
> > core03 [ 24]        512000000       80951968        
> 
> Showing the expected value in another column would make this much
> clearer.

Thanks. Will update the testcase to call out the expected value.
> 
> cheers
> 


--
Thanks and Regards
gautham.

Reply via email to