[Bug target/82227] ARM thumb inefficient tailcall return sequence (multiple pops)

ramana at gcc dot gnu.org Wed, 10 Oct 2018 03:21:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82227


Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-10-10
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Confirmed.

(In reply to Peter Cordes from comment #0)
> int ext();
> int tailcall_external() { return ext(); }
>  // https://godbolt.org/g/W43fxw
> 
> gcc6.3 -Os -mthumb
> 
>         push    {r4, lr}
>         bl      ext
>         pop     {r4}
>         pop     {r1}        # two separate pop instructions isn't optimal
>         bx      r1
> 
> gcc6.3 -Os -mthumb -mno-thumb-interwork
> 
>         push    {r4, lr}
>         bl      ext
>         pop     {r4, pc}
> 
> A 16-bit thumb pop instruction can only pop "lo" registers and PC, not back
> into LR.  That's why it can't  pop {r4, lr}  / bx lr  like it does in -marm
> mode.
> 
> But there is a more efficient way:
> 
>         pop     {r1, r2}
>         bx      r2

Yep. 


> 
> We never needed a call-preserved register; r4 was pushed only to keep the
> stack aligned.  So as long as we have 2 call-clobbered regs available, we
> can pop the padding that came from r4, and pop the saved lr, both into
> call-clobbered regs.
> 
> If we did need a call-preserved register for anything, two separate pop
> instructions are presumably better than any combination of pop-multiple and
> reg-reg moves.
> 
> ----
> 
> This also happens with two identical functions with different names, with
> -Os.  One compiles into a call to the other, done exactly the same way as to
> an external function.  (See the godbolt link above).
> 
> In that case, I don't understand why we can't just tail-call with a `b`
> instruction (like we get with -marm).  Both functions are compiled to Thumb2
> code, so we can jump to the other and let it do an interworking return,
> right?  Especially with -mno-thumb-interwork, I don't understand why
> tail-calls aren't optimized to a jump.

You need to read up on the various levels of the architecture and the
corresponding command line options. Thumb2 isn't available at the default
architecture level and needs at least -mthumb -march=armv6t2. Try reading this
for a beginner's guide to the architecture.

https://community.arm.com/tools/b/blog/posts/arm-cortex-a-processors-and-gcc-command-lines?CommentSortBy=CreatedDate&CommentSortOrder=Descending

We don't tail call in general for Thumb1, which is what your options imply,
because the branches are just too short (encoded in 16 bits), IIRC.
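
To put a rough number on "too short": the 16-bit Thumb unconditional B
encoding carries an 11-bit signed halfword offset, which works out to a byte
range of -2048 to +2046 relative to PC (the address of the branch plus 4).
Below is a minimal C sketch of that range check. The helper names are
hypothetical and nothing here is taken from the GCC sources; it only
illustrates the encoding limit, not how the backend actually decides.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit Thumb unconditional B: imm11 is a signed halfword offset, so the
   byte offset (relative to PC = branch address + 4) must be even and lie
   in [-2048, +2046].  Hypothetical helper, for illustration only. */
static bool thumb16_b_in_range(int32_t byte_offset)
{
    return (byte_offset % 2 == 0)
        && byte_offset >= -2048
        && byte_offset <= 2046;
}

/* Hypothetical helper: could a 16-bit B placed at address `from` reach
   the target address `to`? */
static bool thumb16_b_reaches(uint32_t from, uint32_t to)
{
    /* In Thumb state PC reads as the branch address plus 4. */
    int64_t off = (int64_t)to - ((int64_t)from + 4);
    if (off < INT32_MIN || off > INT32_MAX)
        return false;
    return thumb16_b_in_range((int32_t)off);
}
```

A callee more than about 2 KB away, which is the common case for an external
function such as ext(), fails this check, so a plain 16-bit `b` cannot be
relied on to reach it.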


> 
> (I'm not an expert on ARM / Thumb stuff, so there might be a reason I'm
> missing.)
