Re: [PATCH 0/2] measure latency of cpu hotplug path

Peter Zijlstra Mon, 28 Sep 2020 00:41:07 -0700

On Sun, Sep 27, 2020 at 07:41:45PM -0700, psoda...@codeaurora.org wrote:
> On 2020-09-24 07:58, Steven Rostedt wrote:
> > On Thu, 24 Sep 2020 10:34:14 +0200
> > pet...@infradead.org wrote:
> > 
> > > On Wed, Sep 23, 2020 at 04:37:44PM -0700, Prasad Sodagudi wrote:
> > > > These changes are related to the cpu hotplug path and we would like to
> > > > seek upstream review. The patches have been in the Qualcomm downstream
> > > > kernel for quite a long time. The first patch sets RT priority for the
> > > > hotplug task and the second patch adds cpuhp trace events.
> > > >
> > > > 1) cpu-hotplug: Always use real time scheduling when hotplugging a CPU
> > > > 2) cpu/hotplug: Add cpuhp_latency trace event
> > > 
> > > Why? Hotplug is a known super slow path. If you care about hotplug
> > > latency you're doing it wrong.
> Hi Peter,
> 
> [PATCH 1/2] cpu/hotplug: Add cpuhp_latency trace event -
> 1)    Tracing the cpuhp operation is important to find out whether upstream
> changes or out-of-tree modules (or firmware changes) caused a latency
> regression.


This is a contradiction in terms, it is impossible to have a latency
regression if you don't care about the latency in this super slow path
to begin with.
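For reference, the end-to-end cost of an offline/online cycle can be observed from userspace without any new trace event. The following is a minimal Python sketch, assuming root access, a CPU whose /sys/devices/system/cpu/cpuN/online file exists (on some systems cpu0 has none), and that wall-clock time around the sysfs write is an acceptable proxy. It is not the proposed cpuhp_latency event, and all function names are hypothetical.

```python
import statistics
import sys
import time


def cpu_online_path(cpu: int) -> str:
    # Standard sysfs hotplug control file for one CPU.
    return f"/sys/devices/system/cpu/cpu{cpu}/online"


def timed_write(path: str, value: str) -> int:
    """Hypothetical helper: write value to path, return elapsed ns.

    The timer stops after the file is closed, so the measured interval
    covers the kernel-side handling of the write.
    """
    start = time.perf_counter_ns()
    with open(path, "w") as f:
        f.write(value)
    return time.perf_counter_ns() - start


def summarize_ns(samples: list[int]) -> dict:
    """Return min/median/max of nanosecond samples, in microseconds."""
    if not samples:
        raise ValueError("no samples")
    us = [s / 1000 for s in samples]
    return {"min_us": min(us), "median_us": statistics.median(us), "max_us": max(us)}


def measure_hotplug(cpu: int, rounds: int = 10) -> tuple[dict, dict]:
    """Offline and re-online one CPU `rounds` times; summarize both directions."""
    path = cpu_online_path(cpu)
    offline, online = [], []
    for _ in range(rounds):
        offline.append(timed_write(path, "0"))
        online.append(timed_write(path, "1"))
    return summarize_ns(offline), summarize_ns(online)


# Only touch sysfs when a CPU number is passed explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    cpu = int(sys.argv[1])
    off, on = measure_hotplug(cpu)
    print(f"cpu{cpu} offline: {off}")
    print(f"cpu{cpu} online:  {on}")
```

Run as root with a CPU number, e.g. `python3 hotplug_latency.py 1` (script name arbitrary); without an argument it does nothing.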

> 2)    Secondary cpus are hotplugged out during device suspend and hotplugged
> in during resume.

Indeed they are.

> 3)    The impact of firmware changes (PSCI call handling in firmware) needs
> to be tested, right?

Firmware is firmware, it's broken by design and we can't fix it if it's
broken. The only sane solution is not having firmware :-)

> 4)    Dynamic callbacks registered with the cpu hotplug framework
> (CPUHP_AP_ONLINE_DYN) may impact the hotplug latency.

Again, nobody cares.

> [PATCH 2/2] cpu-hotplug: Always use real time scheduling when hotplugging a
> CPU:
> 
> While stress testing CPU hotplug operations with full load on the system,
> the following problem was observed.
> CPU hotplug operations take place in preemptible context. This leaves the
> hotplugging thread at the mercy of overall system load and CPU
> availability. If the hotplugging thread does not get an opportunity to
> execute after it has already begun a hotplug operation, CPUs can
> end up being stuck in a quasi-online state. In the worst case a CPU can be
> stuck in a state where the migration thread is parked while
> another task is executing and changing affinity in a loop. This combination
> can result in unbounded execution time for the running
> task until the hotplugging thread gets the chance to run and complete the
> hotplug operation.

How is that not an administration problem?

Also, you shouldn't be able to change your affinity _to_ a CPU that's
going down. One of the very first steps in hotplug ensures that.
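One reading of the "administration problem" remark is that the hotplugging task's scheduling policy can be chosen by whoever drives the hotplug. A minimal Python sketch of that, assuming the hotplug is driven by a process writing to sysfs that has CAP_SYS_NICE and root: it switches itself to SCHED_FIFO before the write and restores its previous policy afterwards. This is a userspace alternative, not what the posted patch does in-kernel; the function names and the default priority of 50 are arbitrary assumptions. Only the writing task's policy changes; work the kernel performs in other threads is unaffected.

```python
import os
import sys


def clamp_fifo_priority(prio: int) -> int:
    """Clamp a requested priority into the range the kernel accepts for SCHED_FIFO."""
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    return max(lo, min(hi, prio))


def hotplug_as_fifo(cpu: int, online: bool, prio: int = 50) -> None:
    """Hypothetical helper: run one sysfs hotplug write under SCHED_FIFO.

    The default priority of 50 is an arbitrary assumption. Requires
    CAP_SYS_NICE for the policy change and root for the sysfs write.
    """
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    # If this raises (e.g. PermissionError), the policy was never changed.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(clamp_fifo_priority(prio)))
    try:
        with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
            f.write("1" if online else "0")
    finally:
        # Always drop back to the original policy, even if the write failed.
        os.sched_setscheduler(0, old_policy, old_param)


# Usage: hotplug_fifo.py CPU 0|1  (does nothing unless both arguments are given)
if __name__ == "__main__" and len(sys.argv) == 3:
    hotplug_as_fifo(int(sys.argv[1]), sys.argv[2] == "1")
```

Restoring the old policy in a `finally` block keeps the tool from leaving itself running at RT priority if the sysfs write fails.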
