On Sat, Jan 10, 2015 at 1:09 PM, Denys Vlasenko <vda.li...@googlemail.com> wrote:
>
> I think using push/pop is okay. In the very hottest code paths
> you may want to prefer mov's.

For kernel entrypoints in particular, the code sequence is quite
possibly constrained by the decoder and instruction fetch rather than
the execution engine. Even if the entrypoint were to be in the L1 I$
(which is not generally the case except in microbenchmarks), I am
pretty sure that even Intel doesn't actually speculatively decode
across system call boundaries, so unlike normal nice code, you don't
have the front end running ahead of the execution engine.

Looking at the system call hotpath, for example, it looks like we
save/restore 8 registers. So 16 instructions or about 80 bytes of
code. I could easily imagine us avoiding one cacheline access by using
shorter 1- and 2-byte push/pop instructions (depending a bit on how
the cacheline alignment works out, of course). Depending on how well
it prefetches from L2 and/or exact decoder details, that kind of issue
*can* overshadow the actual execution costs.

Of course, on microbenchmarks (e.g. some system call benchmark that
does "getppid()" in a loop), even the kernel side stays in the L1, so
those might show possible execution issues more. And macrobenchmarks
probably won't show a cycle or two in the system call or fault path
anyway.

              Linus
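
For what it's worth, the "16 instructions or about 80 bytes" estimate and the cacheline argument can be checked with a back-of-the-envelope sketch like the one below. Everything in it is an assumption for illustration and not taken from the kernel source: a 64-byte cacheline, a 5-byte rsp-relative mov with an 8-bit displacement, 1-byte push/pop for rax..rdi and 2-byte push/pop for r8..r15, and the save and restore sequences treated as one contiguous run of code. The names sequence_bytes and lines_touched are made up for this sketch.

```python
CACHELINE = 64  # bytes; assumed I-cache line size

# Assumed x86-64 encoding lengths in bytes (not stated in the mail):
MOV_RSP_DISP8 = 5    # e.g. mov %rdi,0x8(%rsp): REX.W + opcode + ModRM + SIB + disp8
PUSH_POP_LEGACY = 1  # push/pop of rax..rdi
PUSH_POP_REX = 2     # push/pop of r8..r15 needs a REX prefix


def sequence_bytes(nregs, insn_len):
    """Bytes of code needed to save and then restore nregs registers."""
    return 2 * nregs * insn_len


def lines_touched(start, length, line=CACHELINE):
    """Cachelines spanned by `length` bytes of code starting at address `start`."""
    if length == 0:
        return 0
    first = start // line
    last = (start + length - 1) // line
    return last - first + 1


if __name__ == "__main__":
    nregs = 8  # registers saved/restored, as in the mail
    mov_bytes = sequence_bytes(nregs, MOV_RSP_DISP8)
    push_lo = sequence_bytes(nregs, PUSH_POP_LEGACY)  # all 1-byte encodings
    push_hi = sequence_bytes(nregs, PUSH_POP_REX)     # all 2-byte encodings
    print("mov: %d bytes, push/pop: %d..%d bytes" % (mov_bytes, push_lo, push_hi))
    # How the count depends on where the sequence starts within a line.
    for offset in (0, 16, 32, 48):
        print("offset %2d: mov=%d lines, push/pop=%d..%d lines" % (
            offset,
            lines_touched(offset, mov_bytes),
            lines_touched(offset, push_lo),
            lines_touched(offset, push_hi)))
```

Under these assumptions the mov variant always spans at least two cachelines, while the push/pop variant fits in one unless the alignment is unlucky, which is the "depending a bit on how the cacheline alignment works out" caveat. It says nothing about decoder or prefetch behaviour, which would need measuring.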