https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91019

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-06-27
                 CC|                            |law at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  If you adjust the testcase to store only 7 bytes in encode_v2 I
would have expected DSE to trim the memcpy upon seeing the second store.
With 8 (and also 2) the memcpy calls are folded to plain assignments early,
so we have

encode_v1 (unsigned char * buf, long unsigned int a1, short unsigned int a2)
{
  <bb 2> [local count: 1073741824]:
  __builtin_memcpy (buf_2(D), &a1, 6);
  MEM[(char * {ref-all})buf_2(D) + 6B] = a2_5(D);
  return;

}

encode_v2 (unsigned char * buf, long unsigned int a1, short unsigned int a2)
{
  <bb 2> [local count: 1073741824]:
  MEM[(char * {ref-all})buf_2(D)] = a1_5(D);
  MEM[(char * {ref-all})buf_2(D) + 6B] = a2_6(D);
  return;

}

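For reference, a minimal C sketch of what the testcase presumably looks like,
reconstructed from the GIMPLE above (the actual testcase in the report may
differ in types or sizes):

#include <string.h>

/* Six bytes of a1, then two bytes of a2: the 6-byte memcpy is not
   folded to a plain assignment, so it survives into GIMPLE.  */
void encode_v1 (unsigned char *buf, unsigned long a1, unsigned short a2)
{
  memcpy (buf, &a1, 6);
  memcpy (buf + 6, &a2, 2);
}

/* All eight bytes of a1, then the last two overwritten by a2: both
   memcpys fold to plain assignments, leaving two overlapping stores.  */
void encode_v2 (unsigned char *buf, unsigned long a1, unsigned short a2)
{
  memcpy (buf, &a1, 8);
  memcpy (buf + 6, &a2, 2);
}
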
I'm not sure where it would best fit to _extend_ the earlier memcpy.  We have
to be careful not to introduce store data races.  Eventually store-merging
can come to the rescue...
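
Purely for illustration (not what GCC emits today; encode_v2_merged is a
hypothetical name), a sketch of what merging the two overlapping stores of
encode_v2 could look like on a little-endian LP64 target:

#include <stdint.h>
#include <string.h>

/* One 8-byte store whose low six bytes come from a1 and whose high two
   bytes come from a2.  It writes exactly the bytes the original function
   writes, so merging this way introduces no store data race.  */
void encode_v2_merged (unsigned char *buf, unsigned long a1, unsigned short a2)
{
  uint64_t merged = ((uint64_t) a1 & 0x0000ffffffffffffULL)
                    | ((uint64_t) a2 << 48);
  memcpy (buf, &merged, 8);
}

_Extending_ the earlier memcpy is the riskier direction: a widened copy must
not touch bytes the abstract machine never writes, since a concurrent thread
could legitimately own them.  In encode_v1 the second store overwrites bytes
6-7 anyway, so an extension there would have to prove exactly that.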

OTOH how realistic is this testcase?
