Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-25 Thread Jakub Jelinek
On Wed, Feb 25, 2015 at 10:01:07PM -0800, Martin Uecker wrote:
> this patch removes a bogus check for flexible array members
> which prevents array references to be instrumented in some
> interesting cases. Arrays accessed through pointers are now
> instrumented correctly.
> 
> The check was unnecessary because flexible arrays are not 
> instrumented anyway because of
> TYPE_MAX_VALUE (domain) == NULL_TREE. 

No, it is not bogus nor unnecessary.
This isn't about just real flexible arrays, but similar constructs,
C++ doesn't have flexible array members, nor C89, so people use the
GNU extension of struct S { ... ; char a[0]; } instead, or
use char a[1]; as the last member and still expect to be able to access
s->a[i] for i > 0 say on heap allocations etc.

I think we were talking at some point about having a strict-bounds or
something similar that would accept only real flexible array members and
nothing else and be more pedantic, at the disadvantage of diagnosing tons
of real-world code that is supported by GCC.

Perhaps if the reference is just an array, not a struct containing
flexible-array-member-like array, we could consider it strict always,
but that is certainly not what your patch does.

> * c-family/c-ubsan.c (ubsan_instrument_bounds): Remove

c-family/ shouldn't be in the ChangeLog entry, instead that part should go
into c-family/ChangeLog.
> --- a/gcc/c-family/c-ubsan.c
> +++ b/gcc/c-family/c-ubsan.c
> @@ -293,6 +293,7 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
> *index,
>tree type = TREE_TYPE (array);
>tree domain = TYPE_DOMAIN (type);
>  
> +  /* This also takes care of flexilbe array members */

Typo.

> +#ifdef __cplusplus
> +extern "C" void* malloc(unsigned long x);
> +#else
> +extern void* malloc(unsigned long x);
> +#endif

Why are you declaring malloc (wrongly), when it isn't used?
Malloc argument is size_t aka __SIZE_TYPE__, not unsigned long.

Jakub


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-25 Thread Marek Polacek
On Thu, Feb 26, 2015 at 07:36:54AM +0100, Jakub Jelinek wrote:
> On Wed, Feb 25, 2015 at 10:01:07PM -0800, Martin Uecker wrote:
> > this patch removes a bogus check for flexible array members
> > which prevents array references to be instrumented in some
> > interesting cases. Arrays accessed through pointers are now
> > instrumented correctly.
> > 
> > The check was unnecessary because flexible arrays are not 
> > instrumented anyway because of
> > TYPE_MAX_VALUE (domain) == NULL_TREE. 
> 
> No, it is not bogus nor unnecessary.
> This isn't about just real flexible arrays, but similar constructs,
> C++ doesn't have flexible array members, nor C89, so people use the
> GNU extension of struct S { ... ; char a[0]; } instead, or
> use char a[1]; as the last member and still expect to be able to access
> s->a[i] for i > 0 say on heap allocations etc.
 
Yes, in other words, your patch would make a difference here:

int
main (void)
{
  struct S { int i; char a[1]; };
  struct S *t = (struct S *) __builtin_malloc (sizeof (struct S) + 10);
  t->a[2] = 1;
}

(bounds-1.c test case should test this, too.)

> I think we were talking at some point about having a strict-bounds or
> something similar that would accept only real flexible array members and
> nothing else and be more pedantic, at the disadvantage of diagnosing tons
> of real-world code that is supported by GCC.
> 
> Perhaps if the reference is just an array, not a struct containing
> flexible-array-member-like array, we could consider it strict always,
> but that is certainly not what your patch does.

Martin, can you try whether this (untested) does what you're after?

--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -303,8 +303,9 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
 
   /* Detect flexible array members and suchlike.  */
   tree base = get_base_address (array);
-  if (base && (TREE_CODE (base) == INDIRECT_REF
-  || TREE_CODE (base) == MEM_REF))
+  if (TREE_CODE (array) == COMPONENT_REF
+  && base && (TREE_CODE (base) == INDIRECT_REF
+ || TREE_CODE (base) == MEM_REF))
 {
   tree next = NULL_TREE;
   tree cref = array;

I think it is a bug that we're playing games on something that is not
an element of a structure.

Marek


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Martin Uecker

Thu, 26 Feb 2015 07:36:54 +0100
Jakub Jelinek :
> On Wed, Feb 25, 2015 at 10:01:07PM -0800, Martin Uecker wrote:
> > this patch removes a bogus check for flexible array members
> > which prevents array references to be instrumented in some
> > interesting cases. Arrays accessed through pointers are now
> > instrumented correctly.
> > 
> > The check was unnecessary because flexible arrays are not 
> > instrumented anyway because of
> > TYPE_MAX_VALUE (domain) == NULL_TREE. 
> 
> No, it is not bogus nor unnecessary.
> This isn't about just real flexible arrays, but similar constructs,
> C++ doesn't have flexible array members, nor C89, so people use the
> GNU extension of struct S { ... ; char a[0]; } instead, or

The GNU extension is still allowed, i.e. not instrumented with
the patch.

> use char a[1]; as the last member and still expect to be able to access
> s->a[i] for i > 0 say on heap allocations etc.

And this is broken code. I would argue that a user who uses the
ubsan *expects* this to be diagnosed. Atleast I was surprised
that it didn't catch more out-of-bounds accesses.

If this is indeed intentional, we should:

- document these exceptions
- remove the misleading comment
- have test cases also for sizes > 0
- fix it not to prevent instrumentation of all array accesses
  through pointers (i.e. when the array is not a member of a struct)
- have a configuration option (e.g. -fsanitize=strict-bounds)


> I think we were talking at some point about having a strict-bounds or
> something similar that would accept only real flexible array members and
> nothing else and be more pedantic, at the disadvantage of diagnosing tons
> of real-world code that is supported by GCC.
>
> Perhaps if the reference is just an array, not a struct containing
> flexible-array-member-like array, we could consider it strict always,
> but that is certainly not what your patch does.

Indeed, currently the patch tries to make -fsanitize=bounds detect what
the  standard considers undefined behaviour. At the moment, I do not
quite see why ubsan should be so permissive as you say.

But we can also add a new option -fsanitize=strict-bounds. Then I could
use this for my (real-world) projects.



> > * c-family/c-ubsan.c (ubsan_instrument_bounds): Remove
> 
> c-family/ shouldn't be in the ChangeLog entry, instead that part should go
> into c-family/ChangeLog.

> > --- a/gcc/c-family/c-ubsan.c
> > +++ b/gcc/c-family/c-ubsan.c
> > @@ -293,6 +293,7 @@ ubsan_instrument_bounds (location_t loc, tree array, 
> > tree *index,
> >tree type = TREE_TYPE (array);
> >tree domain = TYPE_DOMAIN (type);
> >  
> > +  /* This also takes care of flexilbe array members */
> 
> Typo.
> 
> > +#ifdef __cplusplus
> > +extern "C" void* malloc(unsigned long x);
> > +#else
> > +extern void* malloc(unsigned long x);
> > +#endif
> 
> Why are you declaring malloc (wrongly), when it isn't used?
> Malloc argument is size_t aka __SIZE_TYPE__, not unsigned long.

Left over from removing tests already done elsewhere.

Thank you for your comments,
Martin




Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Jakub Jelinek
On Thu, Feb 26, 2015 at 12:59:08AM -0800, Martin Uecker wrote:
> > No, it is not bogus nor unnecessary.
> > This isn't about just real flexible arrays, but similar constructs,
> > C++ doesn't have flexible array members, nor C89, so people use the
> > GNU extension of struct S { ... ; char a[0]; } instead, or
> 
> The GNU extension is still allowed, i.e. not instrumented with
> the patch.
> 
> > use char a[1]; as the last member and still expect to be able to access
> > s->a[i] for i > 0 say on heap allocations etc.
> 
> And this is broken code. I would argue that a user who uses the
> ubsan *expects* this to be diagnosed. Atleast I was surprised
> that it didn't catch more out-of-bounds accesses.

So can you explain what a C++ programmer can do portably?  It has neither
flexible array members, nor without GNU extensions zero sized arrays.
If the array size is constant, perhaps turn the struct into a template,
but if it is variable?  Ditto for C89 code.
The amount of code that uses this idiom in the wild is huge.

Jakub


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Marek Polacek
On Thu, Feb 26, 2015 at 12:59:08AM -0800, Martin Uecker wrote:
> 
> Thu, 26 Feb 2015 07:36:54 +0100
> Jakub Jelinek :
> > On Wed, Feb 25, 2015 at 10:01:07PM -0800, Martin Uecker wrote:
> > > this patch removes a bogus check for flexible array members
> > > which prevents array references to be instrumented in some
> > > interesting cases. Arrays accessed through pointers are now
> > > instrumented correctly.
> > > 
> > > The check was unnecessary because flexible arrays are not 
> > > instrumented anyway because of
> > > TYPE_MAX_VALUE (domain) == NULL_TREE. 
> > 
> > No, it is not bogus nor unnecessary.
> > This isn't about just real flexible arrays, but similar constructs,
> > C++ doesn't have flexible array members, nor C89, so people use the
> > GNU extension of struct S { ... ; char a[0]; } instead, or
> 
> The GNU extension is still allowed, i.e. not instrumented with
> the patch.
> 
> > use char a[1]; as the last member and still expect to be able to access
> > s->a[i] for i > 0 say on heap allocations etc.
> 
> And this is broken code. I would argue that a user who uses the
> ubsan *expects* this to be diagnosed. Atleast I was surprised
> that it didn't catch more out-of-bounds accesses.
 
I wouldn't say broken, it's being used pretty often.  E.g.
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
"In ISO C90, you would have to give contents a length of 1"

> If this is indeed intentional, we should:
> 
> - document these exceptions

See the link above + docs say for -fsanitize=bounds that "Flexible array
members are not instrumented".

> - remove the misleading comment

Which one?
The /* Detect flexible array members and suchlike.  */ IMHO says
exactly what's meant to be done.

> - have test cases also for sizes > 0

Yes, we should have that.

> - fix it not to prevent instrumentation of all array accesses
>   through pointers (i.e. when the array is not a member of a struct)

Yes, see the patch in my previous mail.

> - have a configuration option (e.g. -fsanitize=strict-bounds)

Yeah, something like that would be useful to have.  As Jakub
mentioned, we discussed this in the past.  Probably GCC 6 stuff
though.

Marek


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Richard Biener
On Thu, Feb 26, 2015 at 8:32 AM, Marek Polacek  wrote:
> On Thu, Feb 26, 2015 at 07:36:54AM +0100, Jakub Jelinek wrote:
>> On Wed, Feb 25, 2015 at 10:01:07PM -0800, Martin Uecker wrote:
>> > this patch removes a bogus check for flexible array members
>> > which prevents array references to be instrumented in some
>> > interesting cases. Arrays accessed through pointers are now
>> > instrumented correctly.
>> >
>> > The check was unnecessary because flexible arrays are not
>> > instrumented anyway because of
>> > TYPE_MAX_VALUE (domain) == NULL_TREE.
>>
>> No, it is not bogus nor unnecessary.
>> This isn't about just real flexible arrays, but similar constructs,
>> C++ doesn't have flexible array members, nor C89, so people use the
>> GNU extension of struct S { ... ; char a[0]; } instead, or
>> use char a[1]; as the last member and still expect to be able to access
>> s->a[i] for i > 0 say on heap allocations etc.
>
> Yes, in other words, your patch would make a difference here:
>
> int
> main (void)
> {
>   struct S { int i; char a[1]; };
>   struct S *t = (struct S *) __builtin_malloc (sizeof (struct S) + 10);
>   t->a[2] = 1;
> }
>
> (bounds-1.c test case should test this, too.)

Btw, I've seen struct S { int i; char a[4]; }; as well.

>> I think we were talking at some point about having a strict-bounds or
>> something similar that would accept only real flexible array members and
>> nothing else and be more pedantic, at the disadvantage of diagnosing tons
>> of real-world code that is supported by GCC.
>>
>> Perhaps if the reference is just an array, not a struct containing
>> flexible-array-member-like array, we could consider it strict always,
>> but that is certainly not what your patch does.
>
> Martin, can you try whether this (untested) does what you're after?
>
> --- gcc/c-family/c-ubsan.c
> +++ gcc/c-family/c-ubsan.c
> @@ -303,8 +303,9 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
> *index,
>
>/* Detect flexible array members and suchlike.  */
>tree base = get_base_address (array);
> -  if (base && (TREE_CODE (base) == INDIRECT_REF
> -  || TREE_CODE (base) == MEM_REF))
> +  if (TREE_CODE (array) == COMPONENT_REF

Err - this doesn't detect

int
main (void)
{
  int *t = (int *) __builtin_malloc (sizeof (int) * 10);
  int (*a)[1] = (int (*)[1])t;
  (*a)[2] = 1;
}

that is a trailing array VLA.

What I've definitely seen is

int
main (void)
{
  int *t = (int *) __builtin_malloc (sizeof (int) * 9);
  int (*a)[3][3] = (int (*)[3][3])t;
  (*a)[0][9] = 1;
}

that is, assume that the array dimension with the fast running
index "wraps" over into the next (hello SPEC CPU 2006!).

> +  && base && (TREE_CODE (base) == INDIRECT_REF
> + || TREE_CODE (base) == MEM_REF))
>  {
>tree next = NULL_TREE;
>tree cref = array;
>
> I think it is a bug that we're playing games on something that is not
> an element of a structure.

Not sure about this.

Richard.

> Marek


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Marek Polacek
On Thu, Feb 26, 2015 at 11:08:04AM +0100, Richard Biener wrote:
> > --- gcc/c-family/c-ubsan.c
> > +++ gcc/c-family/c-ubsan.c
> > @@ -303,8 +303,9 @@ ubsan_instrument_bounds (location_t loc, tree array, 
> > tree *index,
> >
> >/* Detect flexible array members and suchlike.  */
> >tree base = get_base_address (array);
> > -  if (base && (TREE_CODE (base) == INDIRECT_REF
> > -  || TREE_CODE (base) == MEM_REF))
> > +  if (TREE_CODE (array) == COMPONENT_REF
> 
> Err - this doesn't detect
> 
> int
> main (void)
> {
>   int *t = (int *) __builtin_malloc (sizeof (int) * 10);
>   int (*a)[1] = (int (*)[1])t;
>   (*a)[2] = 1;
> }
> 
> that is a trailing array VLA.
> 
> What I've definitely seen is
> 
> int
> main (void)
> {
>   int *t = (int *) __builtin_malloc (sizeof (int) * 9);
>   int (*a)[3][3] = (int (*)[3][3])t;
>   (*a)[0][9] = 1;
> }
 
I think we should error on those.
With my patch we'd emit the same -fsanitize=bounds runtime errors as clang
does.

> that is, assume that the array dimension with the fast running
> index "wraps" over into the next (hello SPEC CPU 2006!).
 
I think they're invoking UB then.

> > +  && base && (TREE_CODE (base) == INDIRECT_REF
> > + || TREE_CODE (base) == MEM_REF))
> >  {
> >tree next = NULL_TREE;
> >tree cref = array;
> >
> > I think it is a bug that we're playing games on something that is not
> > an element of a structure.
> 
> Not sure about this.

The comment says that we're trying to detect a flexible array member
there - and those can't be outside struct.  I certainly hadn't anything
else in my mind when I was writing that ;).

Marek


Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Martin Uecker
Marek Polacek :

(Hi Marek and Jakub, I really appreciate your work on GCC and 
ubsan. I just want to add the minor enhancements necessary 
to make it really useful for me).

> > And this is broken code. I would argue that a user who uses the
> > ubsan *expects* this to be diagnosed. Atleast I was surprised
> > that it didn't catch more out-of-bounds accesses.
>  
> I wouldn't say broken, it's being used pretty often.  E.g.
> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> "In ISO C90, you would have to give contents a length of 1"

Ok, let's add an option then. But that it is common doesn't
mean it is not broken (especially since there is a legal
way to that with GCC and C99) and I think there should be a 
way to diagnose it with ubsan.

> > If this is indeed intentional, we should:
> > 
> > - document these exceptions
> 
> See the link above + docs say for -fsanitize=bounds that "Flexible array
> members are not instrumented".

Flexible array members should not be intrumented. But
what is suprising to the user and should be documented
is that other arrays (e.g. which hjust happen to be the
last in a struct) are not instrumented.

> > - remove the misleading comment
> 
> Which one?
> The /* Detect flexible array members and suchlike.  */ IMHO says
> exactly what's meant to be done.

The reason it confused me was because flexible array members never
see this code because the are catched by the 'if' condition
directly before. 

> > - have test cases also for sizes > 0
> 
> Yes, we should have that.
> 
> > - fix it not to prevent instrumentation of all array accesses
> >   through pointers (i.e. when the array is not a member of a struct)
> 
> Yes, see the patch in my previous mail.

Thank you! I will test this and report back

> > - have a configuration option (e.g. -fsanitize=strict-bounds)
> 
> Yeah, something like that would be useful to have.  As Jakub
> mentioned, we discussed this in the past.  Probably GCC 6 stuff
> though.

I will look into this.

Martin



Re: [PATCH] ubsan: remove bogus check for flexible array members

2015-02-26 Thread Martin Uecker
Am Thu, 26 Feb 2015 10:05:14 +0100
Jakub Jelinek :

> On Thu, Feb 26, 2015 at 12:59:08AM -0800, Martin Uecker wrote:
> > > No, it is not bogus nor unnecessary.
> > > This isn't about just real flexible arrays, but similar constructs,
> > > C++ doesn't have flexible array members, nor C89, so people use the
> > > GNU extension of struct S { ... ; char a[0]; } instead, or
> > 
> > The GNU extension is still allowed, i.e. not instrumented with
> > the patch.
> > 
> > > use char a[1]; as the last member and still expect to be able to access
> > > s->a[i] for i > 0 say on heap allocations etc.
> > 
> > And this is broken code. I would argue that a user who uses the
> > ubsan *expects* this to be diagnosed. Atleast I was surprised
> > that it didn't catch more out-of-bounds accesses.
> 
> So can you explain what a C++ programmer can do portably? 

Using broken code is not really portable either, because other
compilers diagnose this. Also, we are not talking about breaking
that code - they can simply continue to use that code. They could
just not expect ubsan to not diagnose it. This would make
them aware of the fact that their code is problematic - which is
exactly what I would expect from ubsan.

> It has neither
> flexible array members, nor without GNU extensions zero sized arrays.
> If the array size is constant, perhaps turn the struct into a template,
> but if it is variable?  Ditto for C89 code.
> The amount of code that uses this idiom in the wild is huge.

Ok, I will add an option then. Or should this be language dependent?

Maritn