On Sun, Sep 27, 2020 at 07:41:45PM -0700, psoda...@codeaurora.org wrote:
> On 2020-09-24 07:58, Steven Rostedt wrote:
> > On Thu, 24 Sep 2020 10:34:14 +0200
> > pet...@infradead.org wrote:
> >
> > > On Wed, Sep 23, 2020 at 04:37:44PM -0700, Prasad Sodagudi wrote:
> > > > There are all changes related to cpu hotplug path and would like to seek
> > > > upstream review. These are all patches in Qualcomm downstream kernel
> > > > for a quite long time. First patch sets the rt prioity to hotplug
> > > > task and second patch adds cpuhp trace events.
> > > >
> > > > 1) cpu-hotplug: Always use real time scheduling when hotplugging a CPU
> > > > 2) cpu/hotplug: Add cpuhp_latency trace event
> > >
> > > Why? Hotplug is a known super slow path. If you care about hotplug
> > > latency you're doing it wrong.
> Hi Peter,
>
> [PATCH 1/2] cpu/hotplug: Add cpuhp_latency trace event -
> 1) Tracing of the cpuhp operation is important to find whether upstream
> changes or out of tree modules(or firmware changes) caused latency
> regression or not.

This is a contradiction in terms: it is impossible to have a latency
regression if you don't care about the latency in this super slow path
to begin with.

> 2) Secondary cpus are hotplug out during the device suspend and hotplug in
> during the resume.

Indeed they are.

> 3) firmware(psci calls handling from firmware) changes impact need to be
> tested right?

Firmware is firmware, it's broken by design and we can't fix it if it's
broken. The only sane solution is not having firmware :-)

> 4) cpu hotplug framework(CPUHP_AP_ONLINE_DYN) dynamic callbacks may impact
> the hotplug latency.

Again, nobody cares.

> [PATCH 2/2] cpu-hotplug: Always use real time scheduling when hotplugging a
> CPU –
>
> CPU hotplug operation is stressed and while stress testing with full load on
> the system following problem is observed.
> CPU hotplug operations take place in preemptible context. This leaves the
> hotplugging thread at the mercy of overall system load and CPU
> availability. If the hotplugging thread does not get an opportunity to
> execute after it has already begun a hotplug operation, CPUs can
> end up being stuck in a quasi online state. In the worst case a CPU can be
> stuck in a state where the migration thread is parked while
> another task is executing and changing affinity in a loop. This combination
> can result in unbounded execution time for the running
> task until the hot plugging thread gets the chance to run to complete the
> hotplug operation.

How is that not an administration problem?

Also, you shouldn't be able to change your affinity _to_ a CPU that's
going down. One of the very first steps in hotplug ensures that.