https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

--- Comment #17 from Jim Wilson <wilson at gcc dot gnu.org> ---
I still haven't been able to reproduce this, but I do see a problem.

In the original bug report, the only difference is that the code uses x4 in the
first part of the diff, and x24 in the second part of the diff, which seems
unimportant.  However, this value lives across a call to memcpy.  x24 is a safe
register here because it is callee-saved.  x4 is not safe, though, as it is an
argument-passing/return-value register, which may be clobbered by a call.
Whether it gets clobbered depends on the memcpy implementation that is linked
with.  If people are linking with different memcpy implementations, that might
affect whether the bug is reproducible.
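
To make the register argument concrete, here is a minimal sketch in C of how
the AArch64 procedure call standard (AAPCS64) divides up the general-purpose
registers.  The names reg_class, classify_xreg and survives_call are
hypothetical helpers written for illustration; they are not part of GCC.  The
sketch is deliberately conservative and treats x8, x16-x18, x29 and x30 as
"special" rather than as safe places to keep a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Roles of x0..x30 under AAPCS64 (illustrative names, not GCC's). */
enum reg_class { ARG_RESULT, CALLER_SAVED, CALLEE_SAVED, SPECIAL };

static enum reg_class classify_xreg(int n)
{
    if (n >= 0 && n <= 7)   return ARG_RESULT;    /* x0-x7: arguments/results */
    if (n >= 9 && n <= 15)  return CALLER_SAVED;  /* x9-x15: temporaries */
    if (n >= 19 && n <= 28) return CALLEE_SAVED;  /* x19-x28: preserved by callee */
    return SPECIAL;  /* x8, x16-x18, x29 (FP), x30 (LR), or out of range */
}

/* True only if a value kept in xN is guaranteed to survive a call. */
static bool survives_call(int n)
{
    return classify_xreg(n) == CALLEE_SAVED;
}

/* Quick self-check using the two registers discussed in this comment. */
static void self_check(void)
{
    assert(survives_call(24));   /* x24 is callee-saved */
    assert(!survives_call(4));   /* x4 may be clobbered by the callee */
}
```

With this classification, a value held in x4 across the memcpy call is only
preserved if that particular memcpy happens not to touch x4, which is
consistent with the bug showing up for some people and not others.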

Disassembling my testcase, I don't see the same code sequence, though.  I see:
  401530:       d2800802        mov     x2, #0x40                       // #64
  401534:       52800b01        mov     w1, #0x58                       // #88
  401538:       aa1303e0        mov     x0, x19
  40153c:       940000d1        bl      401880 <memset>
  401540:       9121c324        add     x4, x25, #0x870
  401544:       91001663        add     x3, x19, #0x5
which is OK, because the "add x3, x19, #0x5" instruction comes after the memset
call.

Maybe there is something subtly different about how I'm configuring or building
the toolchain that results in the different LTO optimized code.
