https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Andrew Cooper from comment #22)
> One curious thing I have discovered.  While auditing the -mharden-sls=all
> code generation in Xen, I found examples where I got "ret int3 ret int3"
> with no intervening instructions.
> 
> It turns out this is not a regression in this change.  It is a pre-existing
> missing optimisation, which is made more obvious when every ret is extended
> with an int3.
> 
> It occurs for functions with either no stack frame at all, or functions
> which have an early exit before setting up the stack frame.  Some examples
> which occur at -O1 do not occur at -O2.
> 
> One curious example which does still repro at -O2 is this.  We have a hash
> lookup function:
> 
> struct context *sidtab_search(struct sidtab *s, u32 sid)
> {
>     int hvalue;
>     struct sidtab_node *cur;
> 
>     if ( !s )
>         return NULL;
> 
>     hvalue = SIDTAB_HASH(sid);
>     cur = s->htable[hvalue];
>     while ( cur != NULL && sid > cur->sid )
>         cur = cur->next;
> 
>     if ( cur == NULL || sid != cur->sid )
>     {
>         /* Remap invalid SIDs to the unlabeled SID. */
>         sid = SECINITSID_UNLABELED;
>         hvalue = SIDTAB_HASH(sid);
>         cur = s->htable[hvalue];
>         while ( cur != NULL && sid > cur->sid )
>             cur = cur->next;
>         if ( !cur || sid != cur->sid )
>             return NULL;
>     }
> 
>     return &cur->context;
> }
> 
> which compiles (reformatted a little for width - unmodified:
> https://paste.debian.net/hidden/7bf675d6/) to:
> 
> <sidtab_search>:
>          48 85 ff             test   %rdi,%rdi
> /------- 74 63                je     <sidtab_search+0x68>
> |        48 8b 17             mov    (%rdi),%rdx
> |        89 f0                mov    %esi,%eax
> |        83 e0 7f             and    $0x7f,%eax
> |        48 8b 04 c2          mov    (%rdx,%rax,8),%rax
> |        48 85 c0             test   %rax,%rax
> |   /--- 75 13                jne    <sidtab_search+0x29>
> |  /|--- eb 17                jmp    <sidtab_search+0x2f>
> |  ||    0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
> |  ||    00 
> |  ||/-> 48 8b 40 48          mov    0x48(%rax),%rax
> |  |||   48 85 c0             test   %rax,%rax
> |  +||-- 74 06                je     <sidtab_search+0x2f>
> |  |\|-> 39 30                cmp    %esi,(%rax)
> |  | \-- 72 f3                jb     <sidtab_search+0x20>
> | /|---- 74 24                je     <sidtab_search+0x53>
> | |\---> 48 8b 42 28          mov    0x28(%rdx),%rax
> | |      48 85 c0             test   %rax,%rax
> | | /--- 75 11                jne    <sidtab_search+0x49>
> |/|-|--- eb 32                jmp    <sidtab_search+0x6c> // (1)
> ||| |    66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
> ||| |/-> 48 8b 40 48          mov    0x48(%rax),%rax
> ||| ||   48 85 c0             test   %rax,%rax
> |||/||-- 74 17                je     <sidtab_search+0x60> // (2)
> ||||\|-> 83 38 04             cmpl   $0x4,(%rax)
> |||| \-- 76 f2                jbe    <sidtab_search+0x40>
> ||||     83 38 05             cmpl   $0x5,(%rax)
> +|||---- 75 15                jne    <sidtab_search+0x68>
> ||\|---> 48 83 c0 08          add    $0x8,%rax
> || |     c3                   retq   
> || |     cc                   int3   
> || |     0f 1f 80 00 00 00 00 nopl   0x0(%rax)
> || \---> c3                   retq                        // Target of (2)
> ||       cc                   int3   
> ||       66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
> \|-----> 31 c0                xor    %eax,%eax
>  |       c3                   retq   
>  |       cc                   int3   
>  \-----> c3                   retq                        // Target of (1)
>          cc                   int3   
>          66 90                xchg   %ax,%ax
> 
> There are 4 exits in total.  Two have to set up %eax, so they can't usefully
> be merged.
> 
> However, the unconditional jmp at (1) is 2 bytes, and could fully contain
> its target ret;int3 without even impacting the surrounding padding.  Whether
> it inlines or merges, this drops 4 bytes.
> 
> The conditional jump at (2) could be folded in to any of the other exit
> paths, dropping 16 bytes from the total size size.
> 
> I have no idea how easy/hard this may be to track down, or whether it is
> worth pursuing urgently, but it probably does want looking at, seeing as SLS
> hardening doubles the hit.

Please open a separate bug to track it.  Should shrink-wrap handle it?

Reply via email to