On Wed, Sep 18, 2019 at 03:14:15PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> writes:
> > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>
> >
> > Currently on Pseries Linux Guests, the offlined CPU can be put to one
> > of the following two states:
> >    - Long term processor cede (also called extended cede)
> >    - Returned to the Hypervisor via RTAS "stop-self" call.
> >
> > This is controlled by the kernel boot parameter "cede_offline=on/off".
> >
> > By default the offlined CPUs enter extended cede.
>
> Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an
> appropriate offline state") (Nov 2009)
>
> Which you wrote :)

Mea Culpa! I forgot to include the "Fixes commit 3aa565f53c39" tag in
Patch 1 of the series.

>
> Why was that wrong?

It was wrong going by the definition of what PHYP considers a
"not-active" CPU. From the point of view of that hypervisor, a CPU is
not-active iff it is in RTAS "stop-self". Thus if a CPU is offline via
extended cede, and not using any cycles, it is still considered to be
active by PHYP. This breaks PURR accounting.

>
> > The PHYP hypervisor considers CPUs in extended cede to be "active"
> > since the CPUs are still under the control of the Linux Guests. Hence,
> > when we change the SMT modes by offlining the secondary CPUs, the PURR
> > and the RWMR SPRs will continue to count the values for offlined CPUs
> > in extended cede as if they are online.
> >
> > One of the expectations with PURR is that for an interval of time,
> > the sum of the PURR increments across the online CPUs of a core should
> > equal the number of timebase ticks for that interval.
> >
> > This is currently not the case.
>
> But why does that matter? It's just some accounting stuff, does it
> actually break something meaningful?

As Naveen mentioned, it breaks lparstat, which customers are using
for capacity planning. Unfortunately we discovered this 10 years after
the feature was written.

>
> Also what does this do to the latency of CPU online/offline.

It will have a slightly higher latency compared to extended cede,
since it involves an additional rtas-call for both starting and
stopping a CPU. Will measure the exact difference and post it in the
next version.

> And what does this do on KVM?

KVM doesn't seem to depend on the state of the offline VCPU as it has
an explicit way of signalling whether a CPU is online or not, via
KVM_REG_PPC_ONLINE.
In commit 7aa15842c15f ("KVM: PPC: Book3S HV: Set RWMR on POWER8 so
PURR/SPURR count correctly") we use this KVM reg to update the count
of online vCPUs in a core, and use this count to set the RWMR
correctly before dispatching the core.

So, this patchset doesn't affect KVM.

> >
> > In the following data (Generated using
> > https://github.com/gautshen/misc/blob/master/purr_tb.py):
> >
> >
> > delta tb = tb ticks elapsed in 1 second.
> > delta purr = sum of PURR increments on online CPUs of that core in 1
> >              second
> >
> > SMT=off
> > ===========================================
> > Core          delta tb(apprx)  delta purr
> > ===========================================
> > core00 [  0]  512000000        69883784
> > core01 [  8]  512000000        88782536
> > core02 [ 16]  512000000        94296824
> > core03 [ 24]  512000000        80951968
>
> Showing the expected value in another column would make this much
> clearer.

Thanks. Will update the testcase to call out the expected value.

>
> cheers
>

--
Thanks and Regards
gautham.
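
P.S. A rough sketch of what the "expected" column could look like for the
SMT=off data quoted above. It assumes only the rule stated in the patch
description: the sum of PURR increments across the online CPUs of a core
should equal the timebase ticks for the interval. The helper name
purr_shortfall() is made up for illustration and is not taken from
purr_tb.py.

```python
# Values copied from the SMT=off table quoted in this mail.
DELTA_TB = 512_000_000  # approximate timebase ticks elapsed in 1 second

DELTA_PURR = {
    "core00": 69_883_784,
    "core01": 88_782_536,
    "core02": 94_296_824,
    "core03": 80_951_968,
}


def purr_shortfall(delta_purr_sum, delta_tb):
    """Hypothetical helper: compare a core's summed PURR delta against
    the expected value, which is the timebase delta for the interval.

    Returns (fraction of expected ticks accounted for, ticks missing).
    """
    return delta_purr_sum / delta_tb, delta_tb - delta_purr_sum


if __name__ == "__main__":
    print(f"{'Core':<8}{'expected':>12}{'delta purr':>12}{'accounted':>11}")
    for core, purr in DELTA_PURR.items():
        fraction, _missing = purr_shortfall(purr, DELTA_TB)
        print(f"{core:<8}{DELTA_TB:>12}{purr:>12}{fraction:>11.1%}")
```

With the expected value printed next to the measured one, the gap between
the two columns is the share of PURR ticks that went to the offlined
threads sitting in extended cede.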