PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-19 Thread Xi Ruoyao via Gcc-patches
Ping^5.

> > > On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus 
> > > wrote:
> > > > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> > > > completely missed the fact that the normal form of a generated constant 
> > > > for a
> > > > mode with fewer bits than in HOST_WIDE_INT is a sign extended version 
> > > > of the
> > > > actual constant.  This even holds true for unsigned constants.
> > > > 
> > > > Fixed by masking out the upper bits for the incoming constant and sign
> > > > extending the resulting unsigned constant.
> > > > 
> > > > Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> > > > 
> > > > While reading existing optimizations in combine I stumbled across two
> > > > optimizations where either my intuition about the representation of
> > > > unsigned integers via a const_int rtx is wrong, which then in turn would
> > > > probably also mean that this patch is wrong, or that the optimizations
> > > > are missed sometimes.  In other words in the following I would assume
> > > > that the upper bits are masked out:
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index 468b7fde911..80c4ff0fbaf 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) < 0x8000 is equivalent to >= 0.  */
> > > >    else if (is_a  (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    /* (unsigned) >= 0x8000 is equivalent to < 0.  */
> > > >    else if (is_a  (mode, &int_mode)
> > > >    && GET_MODE_PRECISION (int_mode) - 1 < 
> > > > HOST_BITS_PER_WIDE_INT
> > > > -  && ((unsigned HOST_WIDE_INT) const_op
> > > > +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > >    == HOST_WIDE_INT_1U << (GET_MODE_PRECISION 
> > > > (int_mode) - 1)))
> > > >     {
> > > >   const_op = 0;
> > > > 
> > > > For example, while bootstrapping on x64 the optimization is missed since
> > > > a LTU comparison in QImode is done and the constant equals
> > > > 0xff80.
> > > > 
> > > > Sorry for inlining another patch, but I would really like to make sure
> > > > that my understanding is correct, now, before I come up with another
> > > > patch.  Thus it would be great if someone could shed some light on this.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * combine.cc (simplify_compare_const): Properly handle unsigned
> > > > constants while narrowing comparison of memory and constants.
> > > > ---
> > > >  gcc/combine.cc | 19 ++-
> > > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > > > index e46d202d0a7..468b7fde911 100644
> > > > --- a/gcc/combine.cc
> > > > +++ b/gcc/combine.cc
> > > > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > >    && !MEM_VOLATILE_P (op0)
> > > >    /* The optimization makes only sense for constants which are big 
> > > > enough
> > > >  so that we have a chance to chop off something at all.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> > > > -  /* Bail out, if the constant does not fit into INT_MODE.  */
> > > > -  && (unsigned HOST_WIDE_INT) const_op
> > > > -    < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 
> > > > 1) - 1)
> > > > +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode)) > 0xff
> > > >    /* Ensure that we do not overflow during normalization.  */
> > > > -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> > > > HOST_WIDE_INT_M1U))
> > > > +  && (code != GTU
> > > > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> > > > (int_mode))
> > > > +    < HOST_WIDE_INT_M1U)
> > > > +  && trunc_int_for_mode (const_op, int_mode) == const_op)
> > > >  {
> > > > -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> > > > +  unsigned HOST_WIDE_INT n
> > > > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> > > >    enum rtx_code adjusted_code;
> > > >  
> > > >    /* Normalize code to either LEU or GEU.  */
> > > > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> > > > machine_mode mode,
> > > > HOST_WI

Re: PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-19 Thread Stefan Schulze Frielinghaus
Since this patch is sitting in the queue for quite some time and (more
importantly?) solves a bootstrap problem let me reiterate:

While writing the initial commit 7cdd0860949c6c3232e6cff1d7ca37bb5234074c
and the subsequent (potential) fix 41ef5a34161356817807be3a2e51fbdbe575ae85
I was not aware of the fact that the normal form of a CONST_INT,
representing an unsigned integer with fewer bits than in HOST_WIDE_INT,
is a sign-extended version of the unsigned integer.  This invariant is
checked in rtl.h where we have at line 2297:

case CONST_INT:
  if (precision < HOST_BITS_PER_WIDE_INT)
/* Nonzero BImodes are stored as STORE_FLAG_VALUE, which on many
   targets is 1 rather than -1.  */
gcc_checking_assert (INTVAL (x.first)
 == sext_hwi (INTVAL (x.first), precision)
 || (x.second == BImode && INTVAL (x.first) == 1));

This was pretty surprising and frankly speaking unintuitive to me which
is why I was skimming further over existing code where I have found this
in combine.cc:

  /* (unsigned) < 0x8000 is equivalent to >= 0.  */
  else if (is_a  (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && ((unsigned HOST_WIDE_INT) const_op
   == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1)))
{

The expression of the if-statement is a contradiction rendering the then-part
unreachable unless you mask out the upper bits of the sign-extended
unsigned integer const_op (as proposed in the inlined patch):

  ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))

This is why I got a bit uncertain and hoped to get some feedback whether
my intuition is correct or not.  Meanwhile I also found a comment in
the internals book at "14.7 Constant Expression Types" where we have:

   "Constants generated for modes with fewer bits than in HOST_WIDE_INT
must be sign extended to full width (e.g., with gen_int_mode).
[...]
Note however that values are neither inherently signed nor
inherently unsigned; where necessary, signedness is determined by
the rtl operation instead."

At least this and the assert statement document that the normal form of
a CONST_INT is kind of special w.r.t. unsigned integers.  Is there
anyone who can shed some light on _why_ such a normal form was chosen?

Independent of why such a normal form was chosen, this patch restores
the normal form and solves the bootstrap problem for Loongarch.

Cheers,
Stefan


Re: PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-25 Thread Eric Botcazou
> This is why I got a bit uncertain and hoped to get some feedback whether
> my intuition is correct or not.  Meanwhile I also found a comment in
> the internals book at "14.7 Constant Expression Types" where we have:
> 
>"Constants generated for modes with fewer bits than in HOST_WIDE_INT
> must be sign extended to full width (e.g., with gen_int_mode).
> [...]
> Note however that values are neither inherently signed nor
> inherently unsigned; where necessary, signedness is determined by
> the rtl operation instead."
> 
> At least this and the assert statement document that the normal form of
> a CONST_INT is kind of special w.r.t. unsigned integers.  Is there
> anyone who can shed some light on _why_ such a normal form was chosen?

In RTL integral values have no sign, they just represent a given pattern of 
bits so, in order to have a 1-to-1 mapping, you need to choose a canonical 
form.  The signed form is probably more natural and, since CONST_INTs have no 
mode, the same objects are used for e.g. QImode and HImode, which means that 
you need to sign-extend the bit pattern.

-- 
Eric Botcazou




Re: PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-29 Thread Jeff Law




On 9/19/23 10:06, Stefan Schulze Frielinghaus wrote:

Since this patch is sitting in the queue for quite some time and (more
importantly?) solves a bootstrap problem let me reiterate:

While writing the initial commit 7cdd0860949c6c3232e6cff1d7ca37bb5234074c
and the subsequent (potential) fix 41ef5a34161356817807be3a2e51fbdbe575ae85
I was not aware of the fact that the normal form of a CONST_INT,
representing an unsigned integer with fewer bits than in HOST_WIDE_INT,
is a sign-extended version of the unsigned integer.  This invariant is
checked in rtl.h where we have at line 2297:

 case CONST_INT:
   if (precision < HOST_BITS_PER_WIDE_INT)
 /* Nonzero BImodes are stored as STORE_FLAG_VALUE, which on many
targets is 1 rather than -1.  */
 gcc_checking_assert (INTVAL (x.first)
  == sext_hwi (INTVAL (x.first), precision)
  || (x.second == BImode && INTVAL (x.first) == 1));

This was pretty surprising and frankly speaking unintuitive to me which


At the heart of the matter (as I think Eric recently mentioned) is that 
CONST_INT objects do not have a mode.  So consider something like 
(const_int 32768) on a 32bit host.


In modes wider than HImode is does exactly what you'd expect.  But if 
you tried to use it in a HImode context, then it's actually invalid as 
it needs to be sign extended resulting in (const_int -32768).  That's 
what gen_int_mode handles for you -- you provide a mode for the context 
in which the constant will be used and you get back suitable RTL.


This actually mimicks what some targets do in hardware for 32 bit 
objects being held in 64bit registers.


Anyway, that's the background.  It is highly likely there are bugs where 
we fail to use gen_int_mode when we should or in some tests for when an 
optimization may or may not be applicable.  So I wouldn't be surprised 
at all if you were to grub around and find cases that aren't properly 
handled -- particularly of the missed-optimization variety.






Independent of why such a normal form was chosen, this patch restores
the normal form and solves the bootstrap problem for Loongarch.
Digging it out and looking at it momentarily.  It just keep falling off 
the end of the TODO list day-to-day.  Sorry for that, particularly since 
you're probably getting bugged by folks looking to restore builds/test 
results.


jeff