On 12/28/23 07:59, Roger Sayle wrote:

This patch fixes PR rtl-optmization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.
Well, I think "fixes" is a bit of a stretch. We're avoiding the issue by changing the early RTL generation, but if I understand what's going on in the RTL optimizers and MIPS backend correctly, the core bug still remains. Admittedly I haven't put it under a debugger, but that MIPS definition of NOOP_TRUNCATION just seems badly wrong and is just waiting to pop it's ugly head up again.




The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf) {
   int val;
   ((unsigned char*)&val)[0] = *buf++;
   ((unsigned char*)&val)[1] = *buf++;
   ((unsigned char*)&val)[2] = *buf++;
   ((unsigned char*)&val)[3] = *buf++;
   if(val > 0)
     ext(1);
   else
     ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
   int val;
   unsigned char _1;
   unsigned char _2;
   unsigned char _3;
   unsigned char _4;
   int val.5_5;

   <bb 2> [local count: 1073741824]:
   _1 = *buf_7(D);
   MEM[(unsigned char *)&val] = _1;
   _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
   MEM[(unsigned char *)&val + 1B] = _2;
   _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
   MEM[(unsigned char *)&val + 2B] = _3;
   _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
   MEM[(unsigned char *)&val + 3B] = _4;
   val.5_5 = val;
   if (val.5_5 > 0)
     goto <bb 3>; [59.00%]
   else
     goto <bb 4>; [41.00%]

   <bb 3> [local count: 633507681]:
   ext (1);
   goto <bb 5>; [100.00%]

   <bb 4> [local count: 440234144]:
   ext (0);

   <bb 5> [local count: 1073741824]:
   val ={v} {CLOBBER(eol)};
   return;

}

Here four bytes are being sequentially written into the SImode value
val.  On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended.  The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).
True.



The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold.  At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem.



Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or should
imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
patch that I'll post in a few minutes.


This middle-end patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for
the above test case.  Ok for mainline?


2023-12-28  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         PR rtl-optimization/104914
         * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
         a sign or zero extension is only required if the modified field
         overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
         targets, don't refer to the temporarily incorrectly extended value
         using a SUBREG, but instead generate an explicit TRUNCATE rtx.
[ ... ]


+             /* Check if the field overlaps the MSB, requiring extension.  */
+             else if (known_eq (bitpos + bitsize,
+                                GET_MODE_BITSIZE (GET_MODE (to_rtx))))
Do you need to look at the size of the field as well? ie, the starting position might be before the sign bit, but the width of the field might cover the mode's sign bit?

I'm not real good in the RTL expansion code, so if I'm offbase on this, just let me know.

jeff

Reply via email to