On Fri, May 31, 2019 at 1:57 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Friday, May 31, 2019 10:47:21 AM CEST Jiri Kosina wrote:
> > On Fri, 31 May 2019, Josh Poimboeuf wrote:
> >
> > > > I disagree with that from the backwards compatibility point of view.
> > > >
> > > > I personally am quite frequently using differnet combinations of
> > > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess
> > > > I am not the only one.
> > > > Fixmap sort of breaks that invariant.
> > >
> > > Right now there is no backwards compatibility because nosmt resume is
> > > already broken.
> >
> > Yeah, well, but that's "only" for nosmt kernels at least.
> >
> > > For "future" backwards compatibility we could just define a hard-coded
> > > reserved fixmap page address, adjacent to the vsyscall reserved address.
> > >
> > > Something like this (not yet tested)? Maybe we could also remove the
> > > resume_play_dead() hack?
> >
> > Does it also solve cpuidle case? I have no overview what all the cpuidle
> > drivers might be potentially doing in their ->enter_dead() callbacks.
> > Rafael?
>
> There are just two of them, ACPI cpuidle and intel_idle, and they both should
> be covered.
>
> In any case, I think that this is the way to go here even though it may be somewhat
> problematic to start with.
>

Given that there seems to be a genuine compatibility issue right now,
can we design an actual sane way to hand off control of all CPUs
rather than adding duct tape to an extremely fragile mechanism? I can
think of at least two sensible solutions:

1. Have a self-contained "play dead for kexec/resume" function that
touches only a few well-defined physical pages: a set of page tables
and a page of code. Load CR3 to point to those page tables, fill in
the code with some form of infinite loop, and run it. Or just turn
off paging entirely and run the infinite loop. Have the kernel doing
the resuming inform the kernel being resumed of which pages these are,
and have the kernel being resumed take over all CPUs before reusing
the pages.

2. Put the CPU all the way to sleep by sending it an INIT IPI.

Version 2 seems very simple and robust. Is there a reason we can't do
it? We obviously don't want to do it for normal offline because it
might be a high-power state, but a CPU in the wait-for-SIPI state is
not going to exit that state all by itself.

The patch to implement #2 should be short and sweet as long as we are
careful to only put genuine APs to sleep like this. The only downside
I can see is that a new kernel resuming an old kernel that was booted
with nosmt is going to waste power, but I don't think that's a
showstopper.
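
For concreteness, here is a rough sketch of the register values that
option 2 would involve in xAPIC mode: the INIT assert and the de-assert
that conventionally follows it. The struct and helper names are made up
for illustration and are not existing kernel interfaces; a real patch
would go through the kernel's existing APIC helpers, and x2APIC uses a
single 64-bit MSR write instead of the two 32-bit registers modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC Interrupt Command Register (ICR) low-word fields. */
#define ICR_DM_INIT      0x00000500u  /* delivery mode 101b = INIT */
#define ICR_LEVEL_ASSERT 0x00004000u  /* level: assert */
#define ICR_TRIG_LEVEL   0x00008000u  /* trigger mode: level */

/* Hypothetical helper type: the two values for one ICR write. */
struct icr_write {
    uint32_t high;  /* destination APIC ID in bits 24-31 */
    uint32_t low;   /* delivery mode and flags */
};

/* Values that assert INIT on one CPU, physical destination mode. */
static struct icr_write init_assert(uint32_t apic_id)
{
    assert(apic_id <= 0xFFu);  /* xAPIC destination IDs are 8 bits wide */
    struct icr_write w = {
        .high = apic_id << 24,
        .low  = ICR_DM_INIT | ICR_LEVEL_ASSERT | ICR_TRIG_LEVEL,
    };
    return w;
}

/* Values for the follow-up INIT level de-assert. */
static struct icr_write init_deassert(uint32_t apic_id)
{
    assert(apic_id <= 0xFFu);
    struct icr_write w = {
        .high = apic_id << 24,
        .low  = ICR_DM_INIT | ICR_TRIG_LEVEL,
    };
    return w;
}
```

The caller would write the high word first and then the low word, since
the low-word write is what actually sends the IPI. Restricting this to
genuine APs (never the boot CPU) would be up to the caller and is not
modeled here.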

