On 02/05/25 10:57, Dave Hansen wrote:
> gah, the cc list here is rotund...
>
> On 5/2/25 09:38, Valentin Schneider wrote:
> ...
>>> All of the paths to enter the kernel from userspace have some
>>> SWITCH_TO_KERNEL_CR3 variant. If they didn't, the userspace that they
>>> entered from could have attacked the kernel with Meltdown.
>>>
>>> I'm theorizing that if this is _just_ about avoiding TLB flush IPIs that
>>> you can get away with a single mechanism.
>>
>> So right now there would indeed be the TLB flush IPIs, but also the
>> text_poke() ones (sync_core() after patching text).
>>
>> These are the two NOHZ-breaking IPIs that show up on my HP box, and that I
>> also got reports for from folks using NOHZ_FULL + CPU isolation in
>> production, mostly on SPR "edge enhanced" type of systems.
> ...
>> While I don't expect the list to grow much, it's unfortunately not just the
>> TLB flush IPIs.
>
> Isn't text patching way easier than TLB flushes? You just need *some*
> serialization. Heck, since TLB flushes are architecturally serializing,
> you could probably even reuse the exact same mechanism: implement
> deferred text patch serialization operations as a deferred TLB flush.
>
> The hardest part is figuring out which CPUs are in the state where they
> can be deferred or not. But you have to solve that in any case, and you
> already have an algorithm to do it.

Alright, off to mess around with SWITCH_TO_KERNEL_CR3 to see how shoving
deferred operations in there would look, then.
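
For reference, a rough userspace model of the idea discussed above, not kernel
code. Every name in it (cpu_state, try_defer_work(), kernel_entry(), the WORK_*
bits) is hypothetical, and the "flush" is only a counter. It assumes a single
per-CPU word holding both the in-kernel flag and the pending work bits, so the
sender can decide "defer or IPI" with one cmpxchg, and it folds the deferred
sync_core() into the deferred TLB flush as suggested in the quoted text.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical layout: bit 0 = CPU is in the kernel, other bits = deferred work. */
#define CPU_IN_KERNEL   (1u << 0)
#define WORK_TLB_FLUSH  (1u << 1)
#define WORK_SYNC_CORE  (1u << 2)
#define WORK_MASK       (WORK_TLB_FLUSH | WORK_SYNC_CORE)

/* Zero means "in userspace, nothing pending". */
static _Atomic unsigned int cpu_state[NR_CPUS];

/* Stand-in for the real flush: just counts how many were issued per CPU. */
static unsigned int tlb_flushes[NR_CPUS];

/*
 * Sender side. Returns true if the work was queued for the target's next
 * kernel entry, false if the target is already in the kernel and the
 * caller has to fall back to an IPI.
 */
static bool try_defer_work(int cpu, unsigned int work)
{
	unsigned int old = atomic_load(&cpu_state[cpu]);

	do {
		if (old & CPU_IN_KERNEL)
			return false;
		/* On failure, 'old' is refreshed and the flag is rechecked. */
	} while (!atomic_compare_exchange_weak(&cpu_state[cpu], &old, old | work));

	return true;
}

/*
 * Target side, modelling the point where SWITCH_TO_KERNEL_CR3 would run.
 * Marks the CPU as in-kernel and collects the pending bits in one step.
 */
static unsigned int kernel_entry(int cpu)
{
	unsigned int old = atomic_exchange(&cpu_state[cpu], CPU_IN_KERNEL);
	unsigned int pending = old & WORK_MASK;

	/*
	 * Assumption taken from the quoted text: a TLB flush is serializing,
	 * so one flush also satisfies a deferred sync_core().
	 */
	if (pending)
		tlb_flushes[cpu]++;

	return pending;
}

/* Target side, on return to userspace: deferral becomes possible again. */
static void kernel_exit(int cpu)
{
	atomic_store(&cpu_state[cpu], 0);
}
```

The point of packing the flag and the work bits into one word is that a sender
can never queue work on a CPU that has already entered the kernel: either the
cmpxchg lands before the entry-side exchange (and the entry path sees the
bits), or it observes CPU_IN_KERNEL and sends the IPI.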