On Fri, Dec 2, 2016 at 9:32 AM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>
>> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
>> ~110ns. But Xen PV will trap CPUID if possible, so IRET-to-self
>> should end up being a nice speedup.
>
> So if we care deeply about the performance of this, we should really
> ask ourselves how much we need this...
>
> There are *very* few places where we really need to do a full
> serializing instruction, and I'd worry that we really don't need it in
> many of the places we do this.
>
> The only real case I'm aware of is modifying code that is modified
> through a different linear address than it's executed.

TBH, I didn't start down this path for performance. I did it because
I wanted to kill off a CPUID that was breaking on old CPUs that don't
have CPUID. So I propose MOV-to-CR2 followed by an unconditional
jump. My goal here is to make the #*!& thing work reliably and not be
ludicrously slow.

Borislav and I mulled over using an alternative to use CPUID if and
only if we have CPUID, but that doesn't work because we call
sync_core() before we're done applying alternatives.

>
> Is there anything else where we _really_ need this sync-core thing?
> Sure, the microcode loader looks fine, but that doesn't look
> particularly performance-critical either.
>
> So I'd like to know which sync_core is actually so
> performance-critical that we care about it, and then I'd like to
> understand why it's needed at all, because I suspect a number of them
> has been added with the model of "sprinkle random things around and
> hope".

apply_alternatives, unfortunately. It's performance-critical because
it's intensely stupid and does sync_core() for every single patch.
Fixing that would be nice, too.

> Adding Peter Anvin to the participants list, because iirc he was the
> one who really talked to hardware engineers about the synchronization
> issues with serializing kernel code.
>
> Linus
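
To make the "sync_core() for every single patch" cost concrete, here is a user-space toy model, not the kernel's apply_alternatives(). Everything in it is hypothetical: struct patch_site, fake_sync_core() and both apply_* helpers are made up for illustration, and the "serialization" is just a counter. It also assumes that none of the patched bytes can be executed before the final sync, which the real code would have to guarantee separately.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one alternatives entry. */
struct patch_site {
	size_t offset;              /* offset into the text buffer */
	const unsigned char *repl;  /* replacement bytes */
	size_t len;                 /* number of bytes to replace */
};

/* Counts simulated serializations. */
static unsigned long sync_calls;

/* Stand-in for sync_core(); the real one runs a serializing instruction. */
static void fake_sync_core(void)
{
	sync_calls++;
}

/* Models the behaviour complained about above: sync after every patch. */
static void apply_sync_each(unsigned char *text,
			    const struct patch_site *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		memcpy(text + p[i].offset, p[i].repl, p[i].len);
		fake_sync_core();
	}
}

/* Batched variant: write all replacements, then sync once. */
static void apply_sync_once(unsigned char *text,
			    const struct patch_site *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		memcpy(text + p[i].offset, p[i].repl, p[i].len);
	if (n > 0)
		fake_sync_core();
}
```

With n patch sites, apply_sync_each() bumps sync_calls n times and apply_sync_once() bumps it once, while both leave the same bytes in the buffer. At roughly 100ns per serializing operation (the ballpark quoted earlier in the thread), that difference scales linearly with the number of patch sites.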