Re: types in GIMPLE IR

2023-06-29 Thread Andrew Pinski via Gcc
On Thu, Jun 29, 2023 at 12:10 PM Krister Walfridsson via Gcc
 wrote:
>
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
> > IIRC we have some simplification rules that turn bit operations into
> > arithmetics.  Arithmetic is allowed if it keeps the values inside
> > [-1,0] for signed bools or [0, 1] for unsigned bools.
> >
> >> I have now verified that all cases seems to be just one operation of this
> >> form (where _127 has the value 0 or 1), so it cannot construct values
> >> such as 42. But the wide signed Boolean can have the three different
> >> values 1, 0, and -1, which I still think is at least one too many. :)
> >
> > Yeah, I'd be interested in a testcase that shows this behavior.
>
> I created PR 110487 with one example.
>
>
> >> I'll update my tool to complain if the value is outside the range [-1, 1].
>
> It is likely that all issues I have seen so far are due to PR 110487, so
> I'll keep the current behavior that complains if the value is outside the
> range [-1, 0].

Yes there are many similar to this all over GCC's folding.
In this case checking TYPE_PRECISION as described in the match.pd is
not even the right approach.
The whole TYPE_PRECISION on boolean types is definitely a big can of worms.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102622 is related but
that was signed boolean:1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106053 is another related case.

For this weekend, I am going to audit some of the match patterns for
these issues.


>
> /Krister


Re: types in GIMPLE IR

2023-06-29 Thread Krister Walfridsson via Gcc

On Thu, 29 Jun 2023, Richard Biener wrote:


IIRC we have some simplification rules that turn bit operations into
arithmetics.  Arithmetic is allowed if it keeps the values inside
[-1,0] for signed bools or [0, 1] for unsigned bools.


I have now verified that all cases seems to be just one operation of this
form (where _127 has the value 0 or 1), so it cannot construct values
such as 42. But the wide signed Boolean can have the three different
values 1, 0, and -1, which I still think is at least one too many. :)


Yeah, I'd be interested in a testcase that shows this behavior.


I created PR 110487 with one example.



I'll update my tool to complain if the value is outside the range [-1, 1].


It is likely that all issues I have seen so far are due to PR 110487, so 
I'll keep the current behavior that complains if the value is outside the 
range [-1, 0].


   /Krister


Re: types in GIMPLE IR

2023-06-29 Thread Richard Biener via Gcc
On Thu, Jun 29, 2023 at 2:06 PM Krister Walfridsson
 wrote:
>
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
>
> My main confusion comes from seeing IR doing arithmetic such as
>
> _127;
> _169;
>...
>_169 = _127 + -1;
>
> or
>
> _127;
> _169;
>...
>_169 = -_127;
>
> and it was unclear to me what kind of arithmetic is allowed.

IIRC we have some simplification rules that turn bit operations into
arithmetics.  Arithmetic is allowed if it keeps the values inside
[-1,0] for signed bools or [0, 1] for unsigned bools.

> I have now verified that all cases seems to be just one operation of this
> form (where _127 has the value 0 or 1), so it cannot construct values
> such as 42. But the wide signed Boolean can have the three different
> values 1, 0, and -1, which I still think is at least one too many. :)

Yeah, I'd be interested in a testcase that shows this behavior.

Richard.


> I'll update my tool to complain if the value is outside the range [-1, 1].
>
>/Krister


Re: types in GIMPLE IR

2023-06-29 Thread Michael Matz via Gcc
Hello,

On Thu, 29 Jun 2023, Krister Walfridsson wrote:

> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
> 
> My main confusion comes from seeing IR doing arithmetic such as
> 
>_127;
>_169;
>   ...
>   _169 = _127 + -1;
> 
> or
> 
>_127;
>_169;
>   ...
>   _169 = -_127;
> 
> and it was unclear to me what kind of arithmetic is allowed.
> 
> I have now verified that all cases seems to be just one operation of this form
> (where _127 has the value 0 or 1), so it cannot construct values such as 42.
> But the wide signed Boolean can have the three different values 1, 0, and -1,
> which I still think is at least one too many. :)

It definitely is.  For signed bool it should be -1 and 0, for unsigned 
bool 1 and 0.  And of course, arithmetic on bools is always dubious, that  
should all be logical operations.  Modulo-arithmetic (mod 2) could be 
made to work, but then we would have to give up the idea of signed bools 
and always use conversions to signed int to get a bitmaks of all-ones.  
And as mod-2-arithmetic is equivalent to logical ops it seems a bit futile 
to go that way.

Of course, enforcing this all might lead to a surprising heap of errors, 
but one has to start somewhere, so ...

> I'll update my tool to complain if the value is outside the range [-1, 
> 1].

... maybe not do that, at least optionally, that maybe somewhen someone 
can look into fixing that all up? :-)  -fdubious-bools?


Ciao,
Michael.


Re: types in GIMPLE IR

2023-06-29 Thread Krister Walfridsson via Gcc

On Thu, 29 Jun 2023, Richard Biener wrote:


The thing with signed bools is that the two relevant values are -1 (true)
and 0 (false), those are used for vector bool components where we also
need them to be of wider type (32bits in this case).


My main confusion comes from seeing IR doing arithmetic such as

   _127;
   _169;
  ...
  _169 = _127 + -1;

or

   _127;
   _169;
  ...
  _169 = -_127;

and it was unclear to me what kind of arithmetic is allowed.

I have now verified that all cases seems to be just one operation of this 
form (where _127 has the value 0 or 1), so it cannot construct values 
such as 42. But the wide signed Boolean can have the three different 
values 1, 0, and -1, which I still think is at least one too many. :)


I'll update my tool to complain if the value is outside the range [-1, 1].

  /Krister


Re: types in GIMPLE IR

2023-06-29 Thread Richard Biener via Gcc
On Wed, Jun 28, 2023 at 5:47 PM Michael Matz via Gcc  wrote:
>
> Hello,
>
> On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:
>
> > Type safety
> > ---
> > Some transformations treat 1-bit types as a synonym of _Bool and mix the 
> > types
> > in expressions, such as:
> >
> >_2;
> >   _Bool _3;
> >   _Bool _4;
> >   ...
> >   _4 = _2 ^ _3;
> >
> > and similarly mixing _Bool and enum
> >
> >   enum E:bool { E0, E1 };
> >
> > in one operation.
> >
> > I had expected this to be invalid... What are the type safety rules in the
> > GIMPLE IR?
>
> Type safety in gimple is defined in terms of type compatiblity, with
> _many_ exceptions for specific types of statements.  Generally stuff is
> verified in verify_gimple_seq., in this case of a binary assign statement
> in verify_gimple_assign_binary.  As you can see there the normal rules for
> bread-and-butter binary assigns is simply that RHS, LHS1 and LHS2 must
> all be type-compatible.
>
> T1 and T2 are compatible if conversions from T1 to T2 are useless and
> conversion from T2 to T1 are also useless.  (types_compatible_p)  The meat
> for that is all in gimple-expr.cc:useless_type_conversion_p.  For this
> specific case again we have:
>
>   /* Preserve conversions to/from BOOLEAN_TYPE if types are not
>  of precision one.  */
>   if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
>!= (TREE_CODE (outer_type) == BOOLEAN_TYPE))
>   && TYPE_PRECISION (outer_type) != 1)
> return false;
>
> So, yes, booleans and 1-bit types can be compatible (under certain other
> conditions, see the function).
>
> > Somewhat related, gcc.c-torture/compile/pr96796.c performs a 
> > VIEW_CONVERT_EXPR
> > from
> >
> >   struct S1 {
> > long f3;
> > char f4;
> >   } g_3_4;
> >
> > to an int
> >
> >   p_51_9 = VIEW_CONVERT_EXPR(g_3_4);
> >
> > That must be wrong?
>
> VIEW_CONVERT_EXPR is _very_ generous.  See
> verify_types_in_gimple_reference:

Yep.  In general these cases should rather use a BIT_FIELD_REF to select
a same sized subpart and only then do the rvalue conversion.  But as Micha says
below making the IL stricter isn't an easy task.

>   if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
> {
>   /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
>  that their operand is not a register an invariant when
>  requiring an lvalue (this usually means there is a SRA or IPA-SRA
>  bug).  Otherwise there is nothing to verify, gross mismatches at
>  most invoke undefined behavior.  */
>   if (require_lvalue
>   && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
> {
>   error ("conversion of %qs on the left hand side of %qs",
>  get_tree_code_name (TREE_CODE (op)), code_name);
>   debug_generic_stmt (expr);
>   return true;
> }
>   else if (is_gimple_reg (op)
>&& TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE 
> (op)))
> {
>   error ("conversion of register to a different size in %qs",
>  code_name);
>   debug_generic_stmt (expr);
>   return true;
> }
> }
>
> Here the operand is not a register (but a global memory object), so
> everything goes.
>
> It should be said that over the years gimples type system became stricter
> and stricter, but it started as mostly everything-goes, so making it
> stricter is a bumpy road that isn't fully travelled yet, because checking
> types often results in testcase regressions :-)
>
> > Semantics of 
> > 
> > "Wide" Booleans, such as , seems to allow more values 
> > than
> > 0 and 1. For example, I've seen some IR operations like:
> >
> >   _66 = _16 ? _Literal () -1 : 0;
> >
> > But I guess there must be some semantic difference between
> >  and a 32-bit int, otherwise the wide Boolean type
> > would not be needed... So what are the difference?
>
> See above, normally conversions to booleans that are wider than 1 bit are
> _not_ useless (because they require booleanization to true/false).  In the
> above case the not-useless cast is within a COND_EXPR, so it's quite
> possible that the gimplifier didn't look hard enough to split this out
> into a proper conversion statement.  (The verifier doesn't look inside
> the expressions of the COND_EXPR, so also doesn't catch this one)

Note the above is GIMPLE FE syntax for a constant of a specific type,
a regular dump would just show _16 ? -1 : 0;

The thing with signed bools is that the two relevant values are -1 (true)
and 0 (false), those are used for vector bool components where we also
need them to be of wider type (32bits in this case).

Richard.

> If that turns out to be true and the above still happens when -1 is
> replaced by (say) 42, then it might be possible to construct a
> wrong-code testcase based on 

Re: types in GIMPLE IR

2023-06-28 Thread Michael Matz via Gcc
Hello,

On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:

> Type safety
> ---
> Some transformations treat 1-bit types as a synonym of _Bool and mix the types
> in expressions, such as:
> 
>_2;
>   _Bool _3;
>   _Bool _4;
>   ...
>   _4 = _2 ^ _3;
> 
> and similarly mixing _Bool and enum
> 
>   enum E:bool { E0, E1 };
> 
> in one operation.
> 
> I had expected this to be invalid... What are the type safety rules in the
> GIMPLE IR?

Type safety in gimple is defined in terms of type compatiblity, with 
_many_ exceptions for specific types of statements.  Generally stuff is 
verified in verify_gimple_seq., in this case of a binary assign statement 
in verify_gimple_assign_binary.  As you can see there the normal rules for 
bread-and-butter binary assigns is simply that RHS, LHS1 and LHS2 must 
all be type-compatible.

T1 and T2 are compatible if conversions from T1 to T2 are useless and 
conversion from T2 to T1 are also useless.  (types_compatible_p)  The meat 
for that is all in gimple-expr.cc:useless_type_conversion_p.  For this 
specific case again we have:

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

So, yes, booleans and 1-bit types can be compatible (under certain other 
conditions, see the function).

> Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR
> from
> 
>   struct S1 {
> long f3;
> char f4;
>   } g_3_4;
> 
> to an int
> 
>   p_51_9 = VIEW_CONVERT_EXPR(g_3_4);
> 
> That must be wrong?

VIEW_CONVERT_EXPR is _very_ generous.  See 
verify_types_in_gimple_reference: 

  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
{
  /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
 that their operand is not a register an invariant when
 requiring an lvalue (this usually means there is a SRA or IPA-SRA
 bug).  Otherwise there is nothing to verify, gross mismatches at
 most invoke undefined behavior.  */
  if (require_lvalue
  && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
{
  error ("conversion of %qs on the left hand side of %qs",
 get_tree_code_name (TREE_CODE (op)), code_name);
  debug_generic_stmt (expr);
  return true;
}
  else if (is_gimple_reg (op)
   && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE 
(op)))
{
  error ("conversion of register to a different size in %qs",
 code_name);
  debug_generic_stmt (expr);
  return true;
}
}

Here the operand is not a register (but a global memory object), so 
everything goes.

It should be said that over the years gimples type system became stricter 
and stricter, but it started as mostly everything-goes, so making it 
stricter is a bumpy road that isn't fully travelled yet, because checking 
types often results in testcase regressions :-)

> Semantics of 
> 
> "Wide" Booleans, such as , seems to allow more values than
> 0 and 1. For example, I've seen some IR operations like:
> 
>   _66 = _16 ? _Literal () -1 : 0;
> 
> But I guess there must be some semantic difference between 
>  and a 32-bit int, otherwise the wide Boolean type 
> would not be needed... So what are the difference?

See above, normally conversions to booleans that are wider than 1 bit are 
_not_ useless (because they require booleanization to true/false).  In the 
above case the not-useless cast is within a COND_EXPR, so it's quite 
possible that the gimplifier didn't look hard enough to split this out 
into a proper conversion statement.  (The verifier doesn't look inside 
the expressions of the COND_EXPR, so also doesn't catch this one)

If that turns out to be true and the above still happens when -1 is 
replaced by (say) 42, then it might be possible to construct a 
wrong-code testcase based on the fact that _66 as boolean should contain 
only two observable values (true/false), but could then contain 42.  OTOH, 
it might also not be possible to create such testcase, namely when GCC is 
internally too conservative in handling wide bools :-)  In that case we 
probably have a missed optimization somewhere, which when implemented 
would enable construction of such wrong-code testcase ;)

(I'm saying that -1 should be replaced by something else for a wrong-code 
testcase, because -1 is special and could justifieably be special-cased in 
GCC: -1 is the proper arithmetic value for a signed boolean that is 
"true").


Ciao,
Michael.