Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline

Naveen N. Rao Wed, 18 Sep 2019 10:32:17 -0700

Michael Ellerman wrote:

"Naveen N. Rao" <[email protected]> writes:

Michael Ellerman wrote:

"Gautham R. Shenoy" <[email protected]> writes:

From: "Gautham R. Shenoy" <[email protected]>


Currently on Pseries Linux Guests, the offlined CPU can be put to one
of the following two states:
   - Long term processor cede (also called extended cede)
   - Returned to the Hypervisor via RTAS "stop-self" call.

This is controlled by the kernel boot parameter "cede_offline=on/off".

By default the offlined CPUs enter extended cede.


Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") (Nov 2009)

Which you wrote :)

Why was that wrong?

The PHYP hypervisor considers CPUs in extended cede to be "active"
since the CPUs are still under the control of the Linux Guests. Hence,
when we change the SMT modes by offlining the secondary CPUs, the PURR
and the RWMR SPRs will continue to count the values for offlined CPUs
in extended cede as if they are online.

One of the expectations with PURR is that, for an interval of time,
the sum of the PURR increments across the online CPUs of a core should
equal the number of timebase ticks for that interval.

This is currently not the case.
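
The expectation described above can be written down as a small check. The following is a minimal sketch in Python, not code from the patch or from any existing tool: the function name, the tolerance, and all of the numbers are hypothetical, chosen only to show how the online thread's PURR can fall short of the timebase delta when ceded threads are still counted as active.

```python
def purr_matches_timebase(online_purr_deltas, timebase_delta, tolerance=0.01):
    """Return True if the PURR increments of a core's online CPUs sum to
    (approximately) the timebase ticks elapsed over the same interval."""
    total = sum(online_purr_deltas)
    return abs(total - timebase_delta) <= tolerance * timebase_delta


# Hypothetical interval on a single core; the tick count is an assumption.
TB_DELTA = 512_000_000

# Secondary threads fully returned to the hypervisor: the one online
# thread accounts for the whole interval.
stop_self = [512_000_000]

# Secondary threads in extended cede keep accumulating PURR, so the
# online thread sees only part of it (the 55% share is invented).
extended_cede = [281_600_000]

print(purr_matches_timebase(stop_self, TB_DELTA))      # True
print(purr_matches_timebase(extended_cede, TB_DELTA))  # False
```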


But why does that matter? It's just some accounting stuff, does it
actually break something meaningful?

Yes, this broke lparstat at the very least (though it's quite unfortunate
we took so long to notice).


By "so long" you mean 10 years?

Also I've never heard of lparstat, but I assume it's one of these tools
that's meant to behave like the AIX equivalent?


Yes, and yes. lparstat is part of powerpc-utils.


If it's been "broken" for 10 years and no one noticed, I'd argue the
current behaviour is now "correct" and fixing it would actually be a
breakage :)


:)
More on this below...

With SMT disabled, and under load:
  $ sudo lparstat 1 10

  System Configuration

  type=Shared mode=Uncapped smt=Off lcpu=2 mem=7759616 kB cpus=6 ent=1.00

  %user  %sys %wait    %idle    physc %entc lbusy  vcsw phint
  ----- ----- -----    -----    ----- ----- ----- ----- -----
  100.00  0.00  0.00     0.00     1.10 110.00 100.00 128784460     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128784860     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128785260     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128785662     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786062     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786462     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786862     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128787262     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128787664     0
  100.00  0.00  0.00     0.00     1.07 107.00 100.00 128788064     0


What about that is wrong?

The 'physc' column represents cpu usage in units of physical cores.
With 2 virtual cores ('lcpu=2') in uncapped, shared processor mode, we
expect this to be closer to 2 when fully loaded (and spare capacity
elsewhere in the system).
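
As a rough illustration of why under-counted PURR shows up in 'physc', here is a sketch in Python. It assumes physc is approximately the PURR accumulated by the online CPUs divided by the timebase ticks elapsed in the interval. That is a simplification for illustration only, not lparstat's actual code, and the function name and all figures are made up.

```python
def physc(online_purr_deltas, timebase_delta):
    """Approximate physical cores consumed: the PURR accumulated by the
    online CPUs divided by the timebase ticks elapsed (an assumption)."""
    return sum(online_purr_deltas) / timebase_delta


# Hypothetical interval; the tick count is an assumption.
TB_DELTA = 512_000_000

# Two fully loaded cores, SMT off, each online thread credited with the
# whole interval.
full = physc([TB_DELTA, TB_DELTA], TB_DELTA)

# Same load, but each online thread is credited with only 60% of the
# interval because ceded siblings are still counted (invented share).
short = physc([TB_DELTA * 6 // 10, TB_DELTA * 6 // 10], TB_DELTA)

print(f"{full:.2f}")   # 2.00
print(f"{short:.2f}")  # 1.20
```

In this toy model the reported figure drops below the number of virtual cores even though both are fully busy, which is the shape of the discrepancy between the two lparstat runs.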

With cede_offline=off:
  $ sudo lparstat 1 10

  System Configuration

  type=Shared mode=Uncapped smt=Off lcpu=2 mem=7759616 kB cpus=6 ent=1.00

  %user  %sys %wait    %idle    physc %entc lbusy  vcsw phint
  ----- ----- -----    -----    ----- ----- ----- ----- -----
  100.00  0.00  0.00     0.00     1.94 194.00 100.00 128961588     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128961988     0
  100.00  0.00  0.00     0.00      inf   inf 100.00 128962392     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128962792     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963192     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963592     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963992     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128964392     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128964792     0
  100.00  0.00  0.00     0.00     1.91 191.00 100.00 128965194     0

[The 'inf' values there show a different bug]

Also, since we expose [S]PURR through sysfs, any tools that make use of
that directly are also affected due to this.


But again if we've had the current behaviour for 10 years then arguably
that's now the correct behaviour.

That's a fair point, and probably again points to this area getting less
tested. One of the main reasons for this being caught now, though, is
that there are workloads being tested under lower SMT levels now. So, I
suspect no one has been relying on this behavior and we can consider
this to be a bug.



Thanks,
Naveen

Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline
