On 02/12/2018 02:36 PM, David Laight wrote:
From: Denys Vlasenko
Sent: 12 February 2018 13:29
...

x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS.
This macro uses PUSH instead of MOV and should therefore be faster, at
least on newer CPUs.
...
Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
   arch/x86/entry/calling.h  | 36 ++++++++++++++++++++++++++++++++++++
   arch/x86/entry/entry_64.S |  6 ++----
   2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a05cbb8..57b1b87 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
        UNWIND_HINT_REGS offset=\offset
        .endm

+       .macro PUSH_AND_CLEAR_REGS
+       /*
+        * Push registers and sanitize registers of values that a
+        * speculation attack might otherwise want to exploit. The
+        * lower registers are likely clobbered well before they
+        * could be put to use in a speculative execution gadget.
+        * Interleave XOR with PUSH for better uop scheduling:
+        */
+       pushq   %rdi            /* pt_regs->di */
+       pushq   %rsi            /* pt_regs->si */
+       pushq   %rdx            /* pt_regs->dx */
+       pushq   %rcx            /* pt_regs->cx */
+       pushq   %rax            /* pt_regs->ax */
+       pushq   %r8             /* pt_regs->r8 */
+       xorq    %r8, %r8        /* nospec   r8 */

xorq is slower than xorl on Silvermont/Knights Landing.
I propose using xorl (e.g. xorl %r8d, %r8d) instead.
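The following is a minimal C sketch of why the 32-bit form is a safe drop-in for the macro. It uses GCC/Clang extended inline assembly, which is an assumption made here for a runnable demo (the kernel macro itself is plain GAS), and `clear_with_xorl` is a hypothetical helper name. On x86-64, writing a 32-bit register zero-extends the result into bits 63:32, so `xorl %r8d, %r8d` leaves all of %r8 zero, exactly like `xorq %r8, %r8`. It demonstrates semantics only, not the Silvermont/Knights Landing timing difference.

```c
#include <stdint.h>

/* Hypothetical helper: zero a 64-bit value using only the 32-bit XOR form.
 * Because 32-bit register writes zero-extend on x86-64, bits 63:32 are
 * cleared as well, so the whole value ends up 0. */
static inline uint64_t clear_with_xorl(uint64_t v)
{
#if defined(__x86_64__)
	/* %k0 prints the 32-bit name of whichever register the compiler
	 * picked for v (e.g. %r8d if it chose %r8). XOR clobbers flags. */
	__asm__("xorl %k0, %k0" : "+r"(v) : : "cc");
#else
	/* Portable model for non-x86-64 hosts: a 32-bit XOR whose result
	 * is zero-extended into 64 bits. */
	v = (uint64_t)((uint32_t)v ^ (uint32_t)v);
#endif
	return v;
}
```

Passing a value whose upper 32 bits are set (such as `0xdeadbeef00000000`) is the interesting case: it would stay nonzero if the 32-bit write did not clear the upper half.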

Does using movq to copy the first zero to the other registers make
the code any faster?

I seem to recall that reg-reg mov is often implemented as a register
rename rather than an ALU operation.

xorl is also handled at register rename. For some reason, though,
xorq did not get the same treatment on those CPUs.
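The variant asked about above (zero one register with XOR, then copy that zero into the others) can be sketched the same way. This is a hypothetical helper, `clear_by_copy`, again using GCC/Clang inline assembly as an assumption; it uses 32-bit `movl`, which zero-extends just like `xorl`. It only checks that the sequence leaves all three registers zero. Whether it is any faster depends on whether the CPU eliminates the moves at rename, which this sketch does not measure.

```c
#include <stdint.h>

/* Holds the three values produced by the asm sequence below. */
struct three {
	uint64_t a, b, c;
};

/* Hypothetical helper: one xorl, then two register-to-register movl
 * copies of the resulting zero. */
static inline struct three clear_by_copy(void)
{
	uint64_t a, b, c;
#if defined(__x86_64__)
	__asm__("xorl %k0, %k0\n\t"  /* a = 0, upper 32 bits cleared too */
	        "movl %k0, %k1\n\t"  /* b = a, 32-bit mov zero-extends */
	        "movl %k0, %k2"      /* c = a */
	        : "=r"(a), "=r"(b), "=r"(c)
	        :
	        : "cc");             /* xorl modifies flags */
#else
	/* Portable model for non-x86-64 hosts. */
	a = 0;
	b = a;
	c = a;
#endif
	return (struct three){ a, b, c };
}
```

One design consideration: each `movl` reads the register zeroed by the `xorl`, so the copies depend on it, whereas separate zeroing-idiom `xorl` instructions are independent of one another.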
