On Mon, Mar 15, 2021 at 7:10 PM Peter Zijlstra <pet...@infradead.org> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
>
> > make V=1 -j4 LLVM=1 LLVM_IAS=1
>
> So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
> when building with -march=sandybridge, they always use NOPL.
>
> Furthermore, the kernel explicitly sets: -falign-jumps=1
> -falign-loops=1, which, when not specified, default to 16 or so.
>
> This means that your userspace is *littered* with NOPL, even when you
> build your entire distro from source with -march=sandybridge.
> (arch/gentoo FTW I suppose).
>

That reminds me of the Git repo of the WireGuard maintainer [1]:

"x86: enable additional cpu optimizations for gcc v9.1+"

You mean something like that ^^?

- Sedat -

[1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

> (The only good news is that recent LLVM has a pass to use alternative
> instruction encoding in order to grow a basic block in size in order to
> minimize the amount of NOP it needs to emit at the end in order to
> satisfy the jump/loop alignment.)
>
> So if you *really* deeply care about NOP performance on your SNB, start
> by teaching LLVM about prefix NOPs and rebuild your complete userspace.
> At that point, you can do some trivial patches to the kernel to make it
> use -march=sandybridge and prefix NOPs too.
>
> Until that time, the vast majority of NOPs your CPU will execute will be
> NOPL.
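
To put a rough number on the alignment point quoted above, here is a minimal Python sketch. It is not from the thread: the function name `layout_padding` and the block sizes are made up for illustration, and it assumes every block start is a jump target that gets aligned, ignoring real toolchain details such as max-skip limits. It only shows why 16-byte alignment produces padding (which the toolchain fills with NOPs) while an alignment of 1, as with -falign-jumps=1 -falign-loops=1, produces none.

```python
def layout_padding(block_sizes, alignment):
    """Lay blocks out back to back, aligning each block start.

    Returns the total number of padding bytes inserted. In a real
    binary those bytes would be filled with NOP instructions.
    Assumption: every block start is aligned, which is a simplification.
    """
    if alignment < 1:
        raise ValueError("alignment must be >= 1")
    offset = 0
    padding = 0
    for size in block_sizes:
        # Bytes needed to round `offset` up to a multiple of `alignment`.
        pad = -offset % alignment
        padding += pad
        offset += pad + size
    return padding


# Hypothetical basic-block sizes in bytes, chosen only for illustration.
blocks = [5, 23, 9, 14]

print("align 16:", layout_padding(blocks, 16), "padding bytes")
print("align  1:", layout_padding(blocks, 1), "padding bytes")
```

With these made-up sizes the 16-byte case needs padding before three of the four blocks, whereas the alignment-1 case never pads, which matches the observation that the kernel's flags avoid most alignment NOPs while default-built userspace does not.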