Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-10-28 Thread Richard Biener
On October 28, 2017 2:53:56 PM GMT+02:00, Marc Glisse  
wrote:
>
>I am sending the new version of the patch in a separate email, to make
>it 
>more visible, and only replying to a few points here.
>
>On Thu, 19 Oct 2017, Richard Biener wrote:
>
>> On Mon, Oct 9, 2017 at 12:55 PM, Marc Glisse 
>wrote:
>>> On Mon, 3 Jul 2017, Richard Biener wrote:
>>>
 On Sat, 1 Jul 2017, Marc Glisse wrote:

> I wrote a quick prototype to see what the fallout would look like.
> Surprisingly, it actually passes bootstrap+testsuite on ppc64el
>with all
> languages with no regression. Sure, it is probably not a complete
> migration, there are likely a few places still converting to
>ptrdiff_t
> to perform a regular subtraction, but this seems to indicate that
>the
> work isn't as bad as using a signed type in pointer_plus_expr for
> instance.


 The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
 from POINTER_MINUS_EXPR in some cases I guess).

 The tree code needs documenting in tree.def and generic.texi.

 Otherwise ok(*).

 Thanks,
 Richard.

 (*) ok, just kidding -- or maybe not
>>>
>>>
>>> I updated the prototype a bit. Some things I noticed:
>>>
>>> - the C front-end has support for address spaces that seems to imply
>that
>>> pointer subtraction (and division by the size) may be done in a type
>larger
>>> than ptrdiff_t. Currently, it generates
>>> (ptrdiff_t)(((inttype)q-(inttype)p)/size) for q-p where inttype is
>some type
>>> potentially larger than ptrdiff_t.
>>
>> It uses a ptrdiff_t corresponding to the respective address space,
>yes.
>> That we use sizetype elsewhere unconditionally is a bug :/
>>
>>> I am thus generating a POINTER_DIFF_EXPR
>>> with that type, while I was originally hoping its type would always
>be
>>> ptrdiff_t. It complicates code and I am not sure I didn't break
>address
>>> spaces anyway... (expansion likely needs to do the inttype dance)
>>
>> I think that's fine.  Ideally targets would provide a type to use for
>each
>> respective address space given we have targets that have sizetype
>smaller
>> than ptr_mode.
>>
>>> Are ptrdiff_t (what POINTER_DIFF_EXPR should return) and size_t
>(what
>>> POINTER_PLUS_EXPR takes as second argument) always the same type
>>> signed/unsigned?
>>
>> POINTER_PLUS_EXPR takes 'sizetype', not size_t.
>
>Ah, yes, that's the one I meant...
>
>> So the answer is clearly no.  And yes, that's ugly and broken and
>should be fixed.
>>
>>> Counter-examples: m32c (when !TARGET_A16), visium, darwin,
>>> powerpcspe, s390, vms... and it isn't even always the same that is
>bigger
>>> than the other. That's quite messy.
>>
>> Eh.  Yeah, targets are free to choose 'sizetype' and they do so for
>> efficiency.  IMHO we should get rid of this "feature".
>>
>>> - I had to use @@ in match.pd, not for constants, but because in
>GENERIC we
>>> sometimes ended up with pointers where operand_equal_p said yes but
>>> types_match disagreed.
>>>
>>> - It currently regresses 2 go tests: net/http runtime/debug
>>
>> Those are flaky for me and fail sometimes and sometimes not.
>
>Yes, I noticed that (there are 1 or 2 others in the go testsuite).
>
>> +@item POINTER_DIFF_EXPR
>> +This node represents pointer subtraction.  The two operands always
>> +have pointer/reference type.  The second operand is always an
>unsigned
>> +integer type compatible with sizetype.  It returns a signed integer.
>>
>> the 2nd sentence looks bogusly copied.
>
>Oups, removed.
>
>>
>> +  /* FIXME.  */
>> +  if (code == POINTER_DIFF_EXPR)
>> +   return int_const_binop (MINUS_EXPR,
>> +   fold_convert (ptrdiff_type_node,
>arg1),
>> +   fold_convert (ptrdiff_type_node,
>arg2));
>>
>>  wide_int_to_tree (type, wi::to_widest (arg1) - wi::to_widest
>(arg2));
>
>Before your reply, I wrote something similar using offset_int instead
>of 
>widest_int (and handling overflow, mostly for documentation purposes).
>I 
>wasn't sure which one to pick, I can change to widest_int if you think
>it 
>is better...

Offset_int is better. 

>> +case POINTER_DIFF_EXPR:
>> +  {
>> +   if (!POINTER_TYPE_P (rhs1_type)
>> +   || !POINTER_TYPE_P (rhs2_type)
>> +   // || !useless_type_conversion_p (rhs2_type, rhs1_type)
>>
>> types_compatible_p (rhs1_type, rhs2_type)?
>
>Makes sense.
>
>> +   // || !useless_type_conversion_p (ptrdiff_type_node,
>lhs_type))
>> +   || TREE_CODE (lhs_type) != INTEGER_TYPE
>> +   || TYPE_UNSIGNED (lhs_type))
>> + {
>> +   error ("type mismatch in pointer diff expression");
>> +   debug_generic_stmt (lhs_type);
>> +   debug_generic_stmt (rhs1_type);
>> +   debug_generic_stmt (rhs2_type);
>> +   return true;
>>
>> there's also verify_expr which could want adjustment for newly
>created
>> tree kinds.
>
>ok.
>

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-10-28 Thread Marc Glisse


I am sending the new version of the patch in a separate email, to make it 
more visible, and only replying to a few points here.


On Thu, 19 Oct 2017, Richard Biener wrote:


On Mon, Oct 9, 2017 at 12:55 PM, Marc Glisse  wrote:

On Mon, 3 Jul 2017, Richard Biener wrote:


On Sat, 1 Jul 2017, Marc Glisse wrote:


I wrote a quick prototype to see what the fallout would look like.
Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
languages with no regression. Sure, it is probably not a complete
migration, there are likely a few places still converting to ptrdiff_t
to perform a regular subtraction, but this seems to indicate that the
work isn't as bad as using a signed type in pointer_plus_expr for
instance.



The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
from POINTER_MINUS_EXPR in some cases I guess).

The tree code needs documenting in tree.def and generic.texi.

Otherwise ok(*).

Thanks,
Richard.

(*) ok, just kidding -- or maybe not



I updated the prototype a bit. Some things I noticed:

- the C front-end has support for address spaces that seems to imply that
pointer subtraction (and division by the size) may be done in a type larger
than ptrdiff_t. Currently, it generates
(ptrdiff_t)(((inttype)q-(inttype)p)/size) for q-p where inttype is some type
potentially larger than ptrdiff_t.


It uses a ptrdiff_t corresponding to the respective address space, yes.
That we use sizetype elsewhere unconditionally is a bug :/


I am thus generating a POINTER_DIFF_EXPR
with that type, while I was originally hoping its type would always be
ptrdiff_t. It complicates code and I am not sure I didn't break address
spaces anyway... (expansion likely needs to do the inttype dance)


I think that's fine.  Ideally targets would provide a type to use for each
respective address space given we have targets that have sizetype smaller
than ptr_mode.


Are ptrdiff_t (what POINTER_DIFF_EXPR should return) and size_t (what
POINTER_PLUS_EXPR takes as second argument) always the same type
signed/unsigned?


POINTER_PLUS_EXPR takes 'sizetype', not size_t.


Ah, yes, that's the one I meant...


So the answer is clearly no.  And yes, that's ugly and broken and should be 
fixed.


Counter-examples: m32c (when !TARGET_A16), visium, darwin,
powerpcspe, s390, vms... and it isn't even always the same that is bigger
than the other. That's quite messy.


Eh.  Yeah, targets are free to choose 'sizetype' and they do so for
efficiency.  IMHO we should get rid of this "feature".


- I had to use @@ in match.pd, not for constants, but because in GENERIC we
sometimes ended up with pointers where operand_equal_p said yes but
types_match disagreed.

- It currently regresses 2 go tests: net/http runtime/debug


Those are flaky for me and fail sometimes and sometimes not.


Yes, I noticed that (there are 1 or 2 others in the go testsuite).


+@item POINTER_DIFF_EXPR
+This node represents pointer subtraction.  The two operands always
+have pointer/reference type.  The second operand is always an unsigned
+integer type compatible with sizetype.  It returns a signed integer.

the 2nd sentence looks bogusly copied.


Oups, removed.



+  /* FIXME.  */
+  if (code == POINTER_DIFF_EXPR)
+   return int_const_binop (MINUS_EXPR,
+   fold_convert (ptrdiff_type_node, arg1),
+   fold_convert (ptrdiff_type_node, arg2));

 wide_int_to_tree (type, wi::to_widest (arg1) - wi::to_widest (arg2));


Before your reply, I wrote something similar using offset_int instead of 
widest_int (and handling overflow, mostly for documentation purposes). I 
wasn't sure which one to pick, I can change to widest_int if you think it 
is better...



+case POINTER_DIFF_EXPR:
+  {
+   if (!POINTER_TYPE_P (rhs1_type)
+   || !POINTER_TYPE_P (rhs2_type)
+   // || !useless_type_conversion_p (rhs2_type, rhs1_type)

types_compatible_p (rhs1_type, rhs2_type)?


Makes sense.


+   // || !useless_type_conversion_p (ptrdiff_type_node, lhs_type))
+   || TREE_CODE (lhs_type) != INTEGER_TYPE
+   || TYPE_UNSIGNED (lhs_type))
+ {
+   error ("type mismatch in pointer diff expression");
+   debug_generic_stmt (lhs_type);
+   debug_generic_stmt (rhs1_type);
+   debug_generic_stmt (rhs2_type);
+   return true;

there's also verify_expr which could want adjustment for newly created
tree kinds.


ok.

So if the precision of the result is larger than that of the pointer 
operands we sign-extend the result, right?  So the subtraction is 
performed in precision of the pointer operands and then 
sign-extended/truncated to the result type? Which means it is _not_ a 
widening subtraction to get around the half-address-space issue.  The 
tree.def documentation should reflect this semantic detail.  Not sure if 
the RTL expansion follows it.


I actually have no idea what expansion 

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-10-19 Thread Richard Biener
On Mon, Oct 9, 2017 at 12:55 PM, Marc Glisse  wrote:
> On Mon, 3 Jul 2017, Richard Biener wrote:
>
>> On Sat, 1 Jul 2017, Marc Glisse wrote:
>>
>>> I wrote a quick prototype to see what the fallout would look like.
>>> Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
>>> languages with no regression. Sure, it is probably not a complete
>>> migration, there are likely a few places still converting to ptrdiff_t
>>> to perform a regular subtraction, but this seems to indicate that the
>>> work isn't as bad as using a signed type in pointer_plus_expr for
>>> instance.
>>
>>
>> The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
>> from POINTER_MINUS_EXPR in some cases I guess).
>>
>> The tree code needs documenting in tree.def and generic.texi.
>>
>> Otherwise ok(*).
>>
>> Thanks,
>> Richard.
>>
>> (*) ok, just kidding -- or maybe not
>
>
> I updated the prototype a bit. Some things I noticed:
>
> - the C front-end has support for address spaces that seems to imply that
> pointer subtraction (and division by the size) may be done in a type larger
> than ptrdiff_t. Currently, it generates
> (ptrdiff_t)(((inttype)q-(inttype)p)/size) for q-p where inttype is some type
> potentially larger than ptrdiff_t.

It uses a ptrdiff_t corresponding to the respective address space, yes.
That we use sizetype elsewhere unconditionally is a bug :/

 I am thus generating a POINTER_DIFF_EXPR
> with that type, while I was originally hoping its type would always be
> ptrdiff_t. It complicates code and I am not sure I didn't break address
> spaces anyway... (expansion likely needs to do the inttype dance)

I think that's fine.  Ideally targets would provide a type to use for each
respective address space given we have targets that have sizetype smaller
than ptr_mode.

> Are ptrdiff_t (what POINTER_DIFF_EXPR should return) and size_t (what
> POINTER_PLUS_EXPR takes as second argument) always the same type
> signed/unsigned?

POINTER_PLUS_EXPR takes 'sizetype', not size_t.  So the answer is clearly
no.  And yes, that's ugly and broken and should be fixed.

> Counter-examples: m32c (when !TARGET_A16), visium, darwin,
> powerpcspe, s390, vms... and it isn't even always the same that is bigger
> than the other. That's quite messy.

Eh.  Yeah, targets are free to choose 'sizetype' and they do so for
efficiency.  IMHO we should get rid of this "feature".

> - I had to use @@ in match.pd, not for constants, but because in GENERIC we
> sometimes ended up with pointers where operand_equal_p said yes but
> types_match disagreed.
>
> - It currently regresses 2 go tests: net/http runtime/debug

Those are flaky for me and fail sometimes and sometimes not.

+@item POINTER_DIFF_EXPR
+This node represents pointer subtraction.  The two operands always
+have pointer/reference type.  The second operand is always an unsigned
+integer type compatible with sizetype.  It returns a signed integer.

the 2nd sentence looks bogusly copied.

+  /* FIXME.  */
+  if (code == POINTER_DIFF_EXPR)
+   return int_const_binop (MINUS_EXPR,
+   fold_convert (ptrdiff_type_node, arg1),
+   fold_convert (ptrdiff_type_node, arg2));

  wide_int_to_tree (type, wi::to_widest (arg1) - wi::to_widest (arg2));

?

+case POINTER_DIFF_EXPR:
+  {
+   if (!POINTER_TYPE_P (rhs1_type)
+   || !POINTER_TYPE_P (rhs2_type)
+   // || !useless_type_conversion_p (rhs2_type, rhs1_type)

types_compatible_p (rhs1_type, rhs2_type)?

+   // || !useless_type_conversion_p (ptrdiff_type_node, lhs_type))
+   || TREE_CODE (lhs_type) != INTEGER_TYPE
+   || TYPE_UNSIGNED (lhs_type))
+ {
+   error ("type mismatch in pointer diff expression");
+   debug_generic_stmt (lhs_type);
+   debug_generic_stmt (rhs1_type);
+   debug_generic_stmt (rhs2_type);
+   return true;

there's also verify_expr which could want adjustment for newly created
tree kinds.

So if the precision of the result is larger than that of the pointer operands
we sign-extend the result, right?  So the subtraction is performed in precision
of the pointer operands and then sign-extended/truncated to the result type?
Which means it is _not_ a widening subtraction to get around the
half-address-space
issue.  The tree.def documentation should reflect this semantic
detail.  Not sure
if the RTL expansion follows it.

I think that we'd ideally have a single legal INTEGER_TYPE precision
per pointer type precision and that those precisions should match...
we don't have to follow the mistakes of POINTER_PLUS_EXPR.

So ... above verify TYPE_PRECISION (rhs1_type) == TYPE_PRECISION (lhs_type)?
Some targets have 24bit ptr_mode but no 24bit integer type which means the
FE likely chooses 32bit int for the difference computation.  But the middle-end
should be free to create a 24bit precision type with SImode.

Otherwise as said 

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-10-09 Thread Marc Glisse

On Mon, 3 Jul 2017, Richard Biener wrote:


On Sat, 1 Jul 2017, Marc Glisse wrote:


I wrote a quick prototype to see what the fallout would look like.
Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
languages with no regression. Sure, it is probably not a complete
migration, there are likely a few places still converting to ptrdiff_t
to perform a regular subtraction, but this seems to indicate that the
work isn't as bad as using a signed type in pointer_plus_expr for
instance.


The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
from POINTER_MINUS_EXPR in some cases I guess).

The tree code needs documenting in tree.def and generic.texi.

Otherwise ok(*).

Thanks,
Richard.

(*) ok, just kidding -- or maybe not


I updated the prototype a bit. Some things I noticed:

- the C front-end has support for address spaces that seems to imply that 
pointer subtraction (and division by the size) may be done in a type 
larger than ptrdiff_t. Currently, it generates 
(ptrdiff_t)(((inttype)q-(inttype)p)/size) for q-p where inttype is some 
type potentially larger than ptrdiff_t. I am thus generating a 
POINTER_DIFF_EXPR with that type, while I was originally hoping its type 
would always be ptrdiff_t. It complicates code and I am not sure I didn't 
break address spaces anyway... (expansion likely needs to do the inttype 
dance)


Are ptrdiff_t (what POINTER_DIFF_EXPR should return) and size_t (what 
POINTER_PLUS_EXPR takes as second argument) always the same type 
signed/unsigned? Counter-examples: m32c (when !TARGET_A16), visium, 
darwin, powerpcspe, s390, vms... and it isn't even always the same that is 
bigger than the other. That's quite messy.


- I had to use @@ in match.pd, not for constants, but because in GENERIC 
we sometimes ended up with pointers where operand_equal_p said yes but 
types_match disagreed.


- It currently regresses 2 go tests: net/http runtime/debug

--
Marc GlisseIndex: gcc/c/c-fold.c
===
--- gcc/c/c-fold.c	(revision 253536)
+++ gcc/c/c-fold.c	(working copy)
@@ -238,20 +238,21 @@ c_fully_fold_internal (tree expr, bool i
 case COMPOUND_EXPR:
 case MODIFY_EXPR:
 case PREDECREMENT_EXPR:
 case PREINCREMENT_EXPR:
 case POSTDECREMENT_EXPR:
 case POSTINCREMENT_EXPR:
 case PLUS_EXPR:
 case MINUS_EXPR:
 case MULT_EXPR:
 case POINTER_PLUS_EXPR:
+case POINTER_DIFF_EXPR:
 case TRUNC_DIV_EXPR:
 case CEIL_DIV_EXPR:
 case FLOOR_DIV_EXPR:
 case TRUNC_MOD_EXPR:
 case RDIV_EXPR:
 case EXACT_DIV_EXPR:
 case LSHIFT_EXPR:
 case RSHIFT_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 253536)
+++ gcc/c/c-typeck.c	(working copy)
@@ -3772,21 +3772,21 @@ parser_build_binary_op (location_t locat
   && TREE_CODE (type2) == ENUMERAL_TYPE
   && TYPE_MAIN_VARIANT (type1) != TYPE_MAIN_VARIANT (type2))
 warning_at (location, OPT_Wenum_compare,
 		"comparison between %qT and %qT",
 		type1, type2);
 
   return result;
 }
 
 /* Return a tree for the difference of pointers OP0 and OP1.
-   The resulting tree has type int.  */
+   The resulting tree has type ptrdiff_t.  */
 
 static tree
 pointer_diff (location_t loc, tree op0, tree op1)
 {
   tree restype = ptrdiff_type_node;
   tree result, inttype;
 
   addr_space_t as0 = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op0)));
   addr_space_t as1 = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (op1)));
   tree target_type = TREE_TYPE (TREE_TYPE (op0));
@@ -3824,23 +3824,21 @@ pointer_diff (location_t loc, tree op0,
 	 "pointer of type % used in subtraction");
   if (TREE_CODE (target_type) == FUNCTION_TYPE)
 pedwarn (loc, OPT_Wpointer_arith,
 	 "pointer to a function used in subtraction");
 
   /* First do the subtraction as integers;
  then drop through to build the divide operator.
  Do not do default conversions on the minus operator
  in case restype is a short type.  */
 
-  op0 = build_binary_op (loc,
-			 MINUS_EXPR, convert (inttype, op0),
-			 convert (inttype, op1), false);
+  op0 = build2_loc (loc, POINTER_DIFF_EXPR, inttype, op0, op1);
   /* This generates an error if op1 is pointer to incomplete type.  */
   if (!COMPLETE_OR_VOID_TYPE_P (TREE_TYPE (TREE_TYPE (orig_op1
 error_at (loc, "arithmetic on pointer to an incomplete type");
 
   op1 = c_size_in_bytes (target_type);
 
   if (pointer_to_zero_sized_aggr_p (TREE_TYPE (orig_op1)))
 error_at (loc, "arithmetic on pointer to an empty aggregate");
 
   /* Divide by the size, in easiest possible way.  */
Index: gcc/c-family/c-pretty-print.c
===
--- gcc/c-family/c-pretty-print.c	(revision 253536)
+++ gcc/c-family/c-pretty-print.c	(working copy)
@@ -1869,20 +1869,21 @@ c_pretty_printer::multiplicative_express
  

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-07-03 Thread Richard Biener
On Sat, 1 Jul 2017, Marc Glisse wrote:

> On Thu, 22 Jun 2017, Richard Biener wrote:
> 
> > On Thu, 22 Jun 2017, Marc Glisse wrote:
> > 
> > > On Thu, 22 Jun 2017, Richard Biener wrote:
> > > 
> > > > > If we consider pointers as unsigned, with a subtraction that has a
> > > > > signed
> > > > > result with the constraint that overflow is undefined, we cannot model
> > > > > that
> > > > > optimally with just the usual signed/unsigned operations, so I am in
> > > > > favor
> > > > > of
> > > > > POINTER_DIFF, at least in the long run (together with having a signed
> > > > > second
> > > > > argument for POINTER_PLUS, etc). For 64-bit platforms it might have
> > > > > been
> > > > > easier to declare that the upper half (3/4 ?) of the address space
> > > > > doesn't
> > > > > exist...
> > > > 
> > > > I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
> > > > code is quite a big job.
> > > 
> > > Yes :-(
> > > It is probably not realistic to introduce it just to avoid a couple
> > > regressions while fixing a bug.
> > > 
> > > > So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
> > > > ptrdiff_t.  What's the advantage of having this?
> > > 
> > > It represents q-p with one statement instead of 3 (long)q-(long)p or 4
> > > (long)((ulong)q-(ulong)p). It allows us to stay in the pointer world, so
> > > (q-p)>0 is equivalent to p > > what (undefined) overflow means for pointers.
> > > 
> > > Of course it is hard to know in advance if that's significant or
> > > negligible, maybe size_t finds its way in too many places anyway.
> > 
> > As with all those experiments ...
> > 
> > Well, if I would sell this as a consultant to somebody I'd estimate
> > 3 man months for this work which realistically means you have to
> > start now otherwise you won't make it this stage 1.
> 
> I wrote a quick prototype to see what the fallout would look like.
> Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
> languages with no regression. Sure, it is probably not a complete
> migration, there are likely a few places still converting to ptrdiff_t
> to perform a regular subtraction, but this seems to indicate that the
> work isn't as bad as using a signed type in pointer_plus_expr for
> instance.

The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
from POINTER_MINUS_EXPR in some cases I guess).

The tree code needs documenting in tree.def and generic.texi.

Otherwise ok(*).

Thanks,
Richard.

(*) ok, just kidding -- or maybe not


Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-07-01 Thread Marc Glisse

On Thu, 22 Jun 2017, Richard Biener wrote:


On Thu, 22 Jun 2017, Marc Glisse wrote:


On Thu, 22 Jun 2017, Richard Biener wrote:


If we consider pointers as unsigned, with a subtraction that has a signed
result with the constraint that overflow is undefined, we cannot model
that
optimally with just the usual signed/unsigned operations, so I am in favor
of
POINTER_DIFF, at least in the long run (together with having a signed
second
argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
easier to declare that the upper half (3/4 ?) of the address space doesn't
exist...


I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
code is quite a big job.


Yes :-(
It is probably not realistic to introduce it just to avoid a couple
regressions while fixing a bug.


So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
ptrdiff_t.  What's the advantage of having this?


It represents q-p with one statement instead of 3 (long)q-(long)p or 4
(long)((ulong)q-(ulong)p). It allows us to stay in the pointer world, so
(q-p)>0 is equivalent to p

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-22 Thread Richard Biener
On Thu, 22 Jun 2017, Marc Glisse wrote:

> On Thu, 22 Jun 2017, Richard Biener wrote:
> 
> > > If we consider pointers as unsigned, with a subtraction that has a signed
> > > result with the constraint that overflow is undefined, we cannot model
> > > that
> > > optimally with just the usual signed/unsigned operations, so I am in favor
> > > of
> > > POINTER_DIFF, at least in the long run (together with having a signed
> > > second
> > > argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
> > > easier to declare that the upper half (3/4 ?) of the address space doesn't
> > > exist...
> > 
> > I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
> > code is quite a big job.
> 
> Yes :-(
> It is probably not realistic to introduce it just to avoid a couple
> regressions while fixing a bug.
> 
> > So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
> > ptrdiff_t.  What's the advantage of having this?
> 
> It represents q-p with one statement instead of 3 (long)q-(long)p or 4
> (long)((ulong)q-(ulong)p). It allows us to stay in the pointer world, so
> (q-p)>0 is equivalent to p what (undefined) overflow means for pointers.
> 
> Of course it is hard to know in advance if that's significant or
> negligible, maybe size_t finds its way in too many places anyway.

As with all those experiments ...

Well, if I would sell this as a consultant to somebody I'd estimate
3 man months for this work which realistically means you have to
start now otherwise you won't make it this stage 1.

A smaller job would be to make POINTER_PLUS_EXPR take ptrdiff_t
as offset operand.  But the fallout is likely similar.  A lame(?)
half-way transition would allow for both unsigned and signed
ptrdiff_t (note sizetype -> [u]ptrdiff_t is already a transition
for some embedded archs eventually).  Maybe allowing both signed
and unsigned offsets is desirable (you of course get interesting
effects when combining those in GIMPLE where signedness matters).

Richard.

> > And yes, I agree that POINTER_PLUS_EXPR should take
> > ptrdiff_t rather than sizetype offset -- changing one without the
> > other will lead to awkwardness in required patterns involving
> > both like (p - q) + q.
> > 
> > As said, it's a big job with likely all sorts of (testsuite) fallout.
> > 
> > > > The third one is
> > > >if ([b] - [c] != b - c)
> > > >link_error();
> > > > where fold already during generic folding used to be able to cope with
> > > > it,
> > > > but now we have:
> > > > (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4
> > > > !=
> > > > b - c
> > > > which we don't fold.
> > > 
> > > Once we have this last expression, we have lost, we need to know that the
> > > multiplication cannot overflow for this. When the size multiplications are
> > > done in a signed type in the future (?), it might help.
> > 
> > Not sure where the unsigned multiply comes from -- I guess we fold
> > it inside the cast ...
> 
> We usually do those multiplications in an unsigned type. I experimented
> with changing one such place in
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01641.html , there is
> probably at least another one in the middle-end.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-22 Thread Marc Glisse

On Thu, 22 Jun 2017, Richard Biener wrote:


If we consider pointers as unsigned, with a subtraction that has a signed
result with the constraint that overflow is undefined, we cannot model that
optimally with just the usual signed/unsigned operations, so I am in favor of
POINTER_DIFF, at least in the long run (together with having a signed second
argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
easier to declare that the upper half (3/4 ?) of the address space doesn't
exist...


I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
code is quite a big job.


Yes :-(
It is probably not realistic to introduce it just to avoid a couple 
regressions while fixing a bug.



So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
ptrdiff_t.  What's the advantage of having this?


It represents q-p with one statement instead of 3 (long)q-(long)p or 4 
(long)((ulong)q-(ulong)p). It allows us to stay in the pointer world, so 
(q-p)>0 is equivalent to p

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-22 Thread Richard Biener
On Wed, 21 Jun 2017, Marc Glisse wrote:

> On Wed, 21 Jun 2017, Jakub Jelinek wrote:
> 
> > So, I wrote following patch to do the subtraction in unsigned
> > type.  It passes bootstrap, but on both x86_64-linux and i686-linux
> > regresses:
> > +FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
> > +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized
> > "minus_expr"
> > +FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)
> > 
> > E.g. in the first testcase we have in the test:
> > static uintptr_t a =  ((char *)&(char *)&)+((char *)&(char
> > *)&);
> > Without the patch, we ended up with:
> > static uintptr_t a = (uintptr_t) (((long int)  - (long int) ) + ((long
> > int)  - (long int) ));
> > but with the patch with (the negation in signed type sounds like a folding
> > bug), which is too difficult for the initializer_constant_valid_p* handling:
> > (uintptr_t) (((long unsigned int) -(long int)  - (long unsigned int) )
> > + ((long unsigned int)  + (long unsigned int) ));
> > Shall we just xfail that test, or make sure we don't reassociate such
> > subtractions, something different?
> 
> Adding to match.pd a few more simple reassoc transformations (like
> (c-b)+(b-a) for instance) that work for both signed and unsigned is on
> my TODO-list, though that may not be enough. Maybe together with fixing
> whatever produced the negation would suffice?

I guess try to fix the negation and see if that magically fixes things...

> > The second failure is on:
> > int f (long *a, long *b, long *c) {
> >__PTRDIFF_TYPE__ l1 = b - a;
> >__PTRDIFF_TYPE__ l2 = c - a;
> >return l1 < l2;
> > }
> > where without the patch during forwprop2 we optimize it
> > using match.pd:
> > /* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
> > because we had:
> >  b.0_1 = (long int) b_8(D);
> >  a.1_2 = (long int) a_9(D);
> >  _3 = b.0_1 - a.1_2;
> >  c.2_4 = (long int) c_11(D);
> >  a.3_5 = (long int) a_9(D);
> >  _6 = c.2_4 - a.3_5;
> >  _7 = _3 < _6;
> > But with the patch we have:
> >  b.0_1 = (long unsigned int) b_9(D);
> >  a.1_2 = (long unsigned int) a_10(D);
> >  _3 = b.0_1 - a.1_2;
> >  _4 = (long int) _3;
> >  c.2_5 = (long unsigned int) c_11(D);
> >  _6 = c.2_5 - a.1_2;
> >  _7 = (long int) _6;
> >  _8 = _4 < _7;
> > instead.  But that is something we can't generally optimize.
> > So do we need to introduce POINTER_DIFF (where we could still
> > optimize this) or remove the test?  If we rely on largest possible
> > array to be half of the VA size - 1 (i.e. where for x > y both being
> > pointers into the same array x - y > 0), then it is a valid optimization
> > of the 2 pointer subtractions, but it is not a valid optimization on
> > comparison of unsigned subtractions cast to signed type.
> 
> (this testcase was meant as a simpler version of
> vector.size() < vector.capacity() )
> 
> It does indeed seem impossible to do this optimization with the unsigned
> pointer subtraction.

Yep :/  This is the issue with all the non-pointer integer folding
fixes as well -- as soon as we introduce unsigned ops for the fear
of introducing undefined overflow we lose on followup optimization
opportunities.

> If we consider pointers as unsigned, with a subtraction that has a signed
> result with the constraint that overflow is undefined, we cannot model that
> optimally with just the usual signed/unsigned operations, so I am in favor of
> POINTER_DIFF, at least in the long run (together with having a signed second
> argument for POINTER_PLUS, etc). For 64-bit platforms it might have been
> easier to declare that the upper half (3/4 ?) of the address space doesn't
> exist...

I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
code is quite a big job.  So we'd have POINTER_DIFF_EXPR take two
pointer typed args and produce ptrdiff_t.  What's the advantage of
having this?  And yes, I agree that POINTER_PLUS_EXPR should take
ptrdiff_t rather than sizetype offset -- changing one without the
other will lead to awkwardness in required patterns involving
both like (p - q) + q.

As said, it's a big job with likely all sorts of (testsuite) fallout.

> > The third one is
> >if ([b] - [c] != b - c)
> >link_error();
> > where fold already during generic folding used to be able to cope with it,
> > but now we have:
> > (long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 !=
> > b - c
> > which we don't fold.
> 
> Once we have this last expression, we have lost, we need to know that the
> multiplication cannot overflow for this. When the size multiplications are
> done in a signed type in the future (?), it might help.

Not sure where the unsigned multiply comes from -- I guess we fold
it inside the cast ...

> On the other hand, is this an important optimization? I am surprised we are
> only doing this transformation in generic (so some hack in the front-end could
> still work), it shouldn't be hard to implement some 

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Marc Glisse

On Wed, 21 Jun 2017, Jakub Jelinek wrote:


So, I wrote following patch to do the subtraction in unsigned
type.  It passes bootstrap, but on both x86_64-linux and i686-linux
regresses:
+FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
+FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
+FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

E.g. in the first testcase we have in the test:
static uintptr_t a =  ((char *)&(char *)&)+((char *)&(char *)&);
Without the patch, we ended up with:
static uintptr_t a = (uintptr_t) (((long int)  - (long int) ) + ((long int) 
 - (long int) ));
but with the patch with (the negation in signed type sounds like a folding
bug), which is too difficult for the initializer_constant_valid_p* handling:
(uintptr_t) (((long unsigned int) -(long int)  - (long unsigned int) ) + ((long 
unsigned int)  + (long unsigned int) ));
Shall we just xfail that test, or make sure we don't reassociate such
subtractions, something different?


Adding to match.pd a few more simple reassoc transformations (like
(c-b)+(b-a) for instance) that work for both signed and unsigned is on
my TODO-list, though that may not be enough. Maybe together with fixing
whatever produced the negation would suffice?


The second failure is on:
int f (long *a, long *b, long *c) {
   __PTRDIFF_TYPE__ l1 = b - a;
   __PTRDIFF_TYPE__ l2 = c - a;
   return l1 < l2;
}
where without the patch during forwprop2 we optimize it
using match.pd:
/* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
because we had:
 b.0_1 = (long int) b_8(D);
 a.1_2 = (long int) a_9(D);
 _3 = b.0_1 - a.1_2;
 c.2_4 = (long int) c_11(D);
 a.3_5 = (long int) a_9(D);
 _6 = c.2_4 - a.3_5;
 _7 = _3 < _6;
But with the patch we have:
 b.0_1 = (long unsigned int) b_9(D);
 a.1_2 = (long unsigned int) a_10(D);
 _3 = b.0_1 - a.1_2;
 _4 = (long int) _3;
 c.2_5 = (long unsigned int) c_11(D);
 _6 = c.2_5 - a.1_2;
 _7 = (long int) _6;
 _8 = _4 < _7;
instead.  But that is something we can't generally optimize.
So do we need to introduce POINTER_DIFF (where we could still
optimize this) or remove the test?  If we rely on largest possible
array to be half of the VA size - 1 (i.e. where for x > y both being
pointers into the same array x - y > 0), then it is a valid optimization
of the 2 pointer subtractions, but it is not a valid optimization on
comparison of unsigned subtractions cast to signed type.


(this testcase was meant as a simpler version of
vector.size() < vector.capacity() )

It does indeed seem impossible to do this optimization with the unsigned
pointer subtraction.

If we consider pointers as unsigned, with a subtraction that has a signed 
result with the constraint that overflow is undefined, we cannot model 
that optimally with just the usual signed/unsigned operations, so I am in 
favor of POINTER_DIFF, at least in the long run (together with having a 
signed second argument for POINTER_PLUS, etc). For 64-bit platforms it 
might have been easier to declare that the upper half (3/4 ?) of the 
address space doesn't exist...



The third one is
   if ([b] - [c] != b - c)
   link_error();
where fold already during generic folding used to be able to cope with it,
but now we have:
(long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - 
c
which we don't fold.


Once we have this last expression, we have lost, we need to know that the 
multiplication cannot overflow for this. When the size multiplications are 
done in a signed type in the future (?), it might help.


On the other hand, is this an important optimization? I am surprised we 
are only doing this transformation in generic (so some hack in the 
front-end could still work), it shouldn't be hard to implement some subset 
of fold_addr_of_array_ref_difference in match.pd (it is recursive so a 
complete move may be harder). But that would make your patch break even 
more stuff :-(


--
Marc Glisse


Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Jakub Jelinek
On Wed, Jun 21, 2017 at 04:40:01PM +0200, Jakub Jelinek wrote:
> So, I wrote following patch to do the subtraction in unsigned
> type.  It passes bootstrap, but on both x86_64-linux and i686-linux
> regresses:
> +FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
> +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized 
> "minus_expr"
> +FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

Another option is to do what the patch does only when sanitizing and accept
in that case less efficient code and rejection of weird corner case
testcases like the first one.  We risk miscompilation of the pointer
differences, but I haven't managed to come up with a testcase where it would
show (I guess more likely is when we propagate constants into the pointers).

Jakub


[RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 10:18:20AM +0200, Richard Biener wrote:
> > > > 3) not really related to this patch, but something I also saw during the
> > > > bootstrap-ubsan on i686-linux:
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147426384 - 2147475412 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147426384 - 2147478324 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147450216 - 2147451580 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147450216 - 2147465664 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147469348 - 2147451544 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147482364 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147483624 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: 
> > > > -2147483628 - 2147451544 cannot be represented in type 'int'
> > > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: 
> > > > -2147426384 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: 
> > > > -2147450216 - 2147451544 cannot be represented in type 'int'
> > > > The problem here is that we lower pointer subtraction, e.g.
> > > > long foo (char *p, char *q) { return q - p; }
> > > > as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
> > > > and even for a valid testcase where we have an array across
> > > > the middle of the virtual address space, say the first one above
> > > > is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
> > > > there is 128KB array starting at 0x7fffd000, it will yield
> > > > UB (not in the source, but in whatever the compiler lowered it into).
> > > > So, shall we instead do the subtraction in sizetype and only then
> > > > cast?  For sizeof (*ptr) > 1 I think we have some outstanding PR,
> > > > and it is more difficult to find out in what types to compute it.
> > > > Or do we want to introduce POINTER_DIFF_EXPR?
> > > 
> > > Just use uintptr_t for the difference computation (well, an unsigned
> > > integer type of desired precision -- mind address-spaces), then cast
> > > the result to signed.
> > 
> > Ok (of course, will handle this separately from the rest).
> 
> Yes.  Note I didn't look at the actual patch (yet).

So, I wrote following patch to do the subtraction in unsigned
type.  It passes bootstrap, but on both x86_64-linux and i686-linux
regresses:
+FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
+FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
+FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

E.g. in the first testcase we have in the test:
static uintptr_t a =  ((char *)&(char *)&)+((char *)&(char *)&);
Without the patch, we ended up with:
static uintptr_t a = (uintptr_t) (((long int)  - (long int) ) + ((long 
int)  - (long int) ));
but with the patch with (the negation in signed type sounds like a folding
bug), which is too difficult for the initializer_constant_valid_p* handling:
(uintptr_t) (((long unsigned int) -(long int)  - (long unsigned int) ) + 
((long unsigned int)  + (long unsigned int) ));
Shall we just xfail that test, or make sure we don't reassociate such
subtractions, something different?

The second failure is on:
int f (long *a, long *b, long *c) {
__PTRDIFF_TYPE__ l1 = b - a;
__PTRDIFF_TYPE__ l2 = c - a;
return l1 < l2;
}
where without the patch during forwprop2 we optimize it
using match.pd:
/* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
because we had:
  b.0_1 = (long int) b_8(D);
  a.1_2 = (long int) a_9(D);
  _3 = b.0_1 - a.1_2;
  c.2_4 = (long int) c_11(D);
  a.3_5 = (long int) a_9(D);
  _6 = c.2_4 - a.3_5;
  _7 = _3 < _6;
But with the patch we have:
  b.0_1 = (long unsigned int) b_9(D);
  a.1_2 = (long unsigned int) a_10(D);
  _3 = b.0_1 - a.1_2;
  _4 = (long int) _3;
  c.2_5 = (long unsigned int) c_11(D);
  _6 = c.2_5 - a.1_2;
  _7 = (long int) _6;
  _8 = _4 < _7;
instead.  But that is something we can't generally optimize.
So do we need to introduce POINTER_DIFF (where we could still
optimize this) or remove the test?  If we rely on largest possible
array to be half of the VA size - 1 (i.e. where for x > y both being
pointers into the same array x - y > 0), then it is a valid optimization
of the 2 pointer subtractions, but it is not a valid optimization on
comparison of unsigned subtractions cast to signed type.

The third one is
if ([b] - [c] != b - c)