From: tip-b...@linutronix.de
> Sent: 03 April 2021 12:11
...
> Notice that since the longest alternative sequence is now:
> 
>    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
>    5:   f3 90                   pause
>    7:   0f ae e8                lfence
>    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
>    c:   48 89 04 24             mov    %rax,(%rsp)
>   10:   c3                      retq
> 
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
> we can shrink the retpoline by 1 byte we can pack it more densely).

Every time I see this I can't help feeling that doing something
(aka anything) to get the 'mov' and 'retq' into the same 16 byte
code fetch/decode block but be advantageous.

Even something like:
        call    1f
        pause
        jmp     2f
1:      mov     %rax,(%rsp)
        retq
2:      pause
        lfence
        jmp     2b
Might meet all the requirements for the retpoline while
allowing the 'mov' and 'retq' be decoded in the same clock.

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Reply via email to