Re: [PATCH v2] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 9:44 PM Martin Sebor  wrote:
>
> On 10/12/21 12:52 AM, Richard Biener wrote:
> > On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor  wrote:
> >>
> >> The attached change extends GCC's warnings for out-of-bounds
> >> stores to cover atomic (and __sync) built-ins.
> >>
> >> Rather than hardcoding the properties of these built-ins just
> >> for the sake of the out-of-bounds detection, on the assumption
> >> that it might be useful for future optimizations as well, I took
> >> the approach of extending class attr_fnspec to express their
> >> special property that they encode the size of the access in their
> >> name.
> >>
> >> I also took the liberty of making attr_fnspec assignable (something
> >> the rest of my patch relies on), and updating some comments for
> >> the characters the class uses to encode function properties, based
> >> on my understanding of their purpose.
> >>
> >> Tested on x86_64-linux.
> >
> > Hmm, so you place 'A' at an odd place (where the return value is specified),
> > but you do not actually specify the behavior on the return value.  Shoudln't
> >
> > + 'A'specifies that the function atomically accesses a constant
> > +   1 << N bytes where N is indicated by character 3+2i
> >
> > maybe read
> >
> >  'A' specifies that the function returns the memory pointed to
> > by argument
> >   one of size 1 << N bytes where N is indicated by
> > character 3 +2i accessed atomically
> >
> > ?
>
> I didn't think the return value would be interesting because in
> general (parallel accesses) it's not related (in an observable
> way) to the value of the dereferenced operand.  Not all
> the built-ins also return a value (e.g., atomic_store), and
> whether or not one does return the argument would need to be
> encoded somehow because it cannot be determined from the return
> type (__atomic_compare_exchange and __atomic_test_and_set return
> bool that's not necessarily the value of the operand).  Also,
> since the functions return the operand value either before or
> after the update, we'd need another letter to describe that.
> (This alone could be dealt with simply by using 'A' and 'a',
> but that's not enough for the other cases.)
>
> So with all these possibilities I don't think encoding
> the return value at this point is worthwhile.  If/when this
> enhancement turns out to be used for optimization and we think
> encoding the return value would be helpful, I'd say let's
> revisit it then.  The accessor APIs should make it a fairly
> straightforward exercise.

I though it would be useful for points-to analysis since knowing how
the return value is composed improves the points-to result for it.

Note that IPA mod-ref now synthesizes fn-spec and might make use
of 'A' if it were not narrowly defined.  Sure it's probably difficult to
fully specify the RMW cycle that's eventually done but since we
have a way to specify a non-constant size of accesses as passed
by a parameter it would be nice to allow specifying a constant size
anyhow.  It just occured to me we could use "fake" parameters to
encode those, so for

void foo (int *);

use like ". R2c4" saying that parameter 1 is read with the size
specified by (non-existing) parameter 2 which is specified as
'c'onstant 1 << 4.

Alternatively a constant size specification could use alternate
encoding 'a' to 'f'.  That said, if 'A' is not suppose to specify
the return value it shouldn't be in the return value specification...

> > I also wonder if it's necessary to constrain this to 'atomic' accesses
> > for the purpose of the patch and whether that detail could be omitted to
> > eventually make more use of it?
>
> I pondered the same question but I couldn't think of any other
> built-ins with similar semantics (read-write-modify, return
> a result either pre- or post-modification), so I opted for
> simplicity.  I am open to generalizing it if/when there is
> a function I could test it with, although I'm not sure
> the current encoding scheme has enough letters and letter
> positions to describe the effects in their full generality.
>
> >
> > Likewise
> >
> > + '0'...'9'  specifies the size of value written/read is given either
> > +   by the specified argument, or for atomic functions, by
> > +   2 ^ N where N is the constant value denoted by the character
> >
> > should mention (excluding '0') for the argument position.
>
> Sure, I'll update the comment if you think this change is worth
> pursuing.
>
> >
> > /* length of the fn spec string.  */
> > -  const unsigned len;
> > +  unsigned len;
> >
> > why that?
>
> The const member is what prevents the struct from being assignable,
> which is what the rest of the patch depends on.
>
> >
> > +  /* Return true of the function is an __atomic or __sync built-in.  */
> >
> > you didn't specify that for 'A' ...
> >
> > +  bool
> > +  atomic_p () const
> > +  {
> > +return str[0] == 'A';
> > +  }
> >
> > +attr_fnspec
> 

[PATCH v2] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/12/21 12:52 AM, Richard Biener wrote:

On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor  wrote:


The attached change extends GCC's warnings for out-of-bounds
stores to cover atomic (and __sync) built-ins.

Rather than hardcoding the properties of these built-ins just
for the sake of the out-of-bounds detection, on the assumption
that it might be useful for future optimizations as well, I took
the approach of extending class attr_fnspec to express their
special property that they encode the size of the access in their
name.

I also took the liberty of making attr_fnspec assignable (something
the rest of my patch relies on), and updating some comments for
the characters the class uses to encode function properties, based
on my understanding of their purpose.

Tested on x86_64-linux.


Hmm, so you place 'A' at an odd place (where the return value is specified),
but you do not actually specify the behavior on the return value.  Shoudln't

+ 'A'specifies that the function atomically accesses a constant
+   1 << N bytes where N is indicated by character 3+2i

maybe read

 'A' specifies that the function returns the memory pointed to
by argument
  one of size 1 << N bytes where N is indicated by
character 3 +2i accessed atomically

?


I didn't think the return value would be interesting because in
general (parallel accesses) it's not related (in an observable
way) to the value of the dereferenced operand.  Not all
the built-ins also return a value (e.g., atomic_store), and
whether or not one does return the argument would need to be
encoded somehow because it cannot be determined from the return
type (__atomic_compare_exchange and __atomic_test_and_set return
bool that's not necessarily the value of the operand).  Also,
since the functions return the operand value either before or
after the update, we'd need another letter to describe that.
(This alone could be dealt with simply by using 'A' and 'a',
but that's not enough for the other cases.)

So with all these possibilities I don't think encoding
the return value at this point is worthwhile.  If/when this
enhancement turns out to be used for optimization and we think
encoding the return value would be helpful, I'd say let's
revisit it then.  The accessor APIs should make it a fairly
straightforward exercise.


I also wonder if it's necessary to constrain this to 'atomic' accesses
for the purpose of the patch and whether that detail could be omitted to
eventually make more use of it?


I pondered the same question but I couldn't think of any other
built-ins with similar semantics (read-write-modify, return
a result either pre- or post-modification), so I opted for
simplicity.  I am open to generalizing it if/when there is
a function I could test it with, although I'm not sure
the current encoding scheme has enough letters and letter
positions to describe the effects in their full generality.



Likewise

+ '0'...'9'  specifies the size of value written/read is given either
+   by the specified argument, or for atomic functions, by
+   2 ^ N where N is the constant value denoted by the character

should mention (excluding '0') for the argument position.


Sure, I'll update the comment if you think this change is worth
pursuing.



/* length of the fn spec string.  */
-  const unsigned len;
+  unsigned len;

why that?


The const member is what prevents the struct from being assignable,
which is what the rest of the patch depends on.



+  /* Return true of the function is an __atomic or __sync built-in.  */

you didn't specify that for 'A' ...

+  bool
+  atomic_p () const
+  {
+return str[0] == 'A';
+  }

+attr_fnspec
+atomic_builtin_fnspec (tree callee)
+{
+  switch (DECL_FUNCTION_CODE (callee))
+{
+#define BUILTIN_ACCESS_SIZE_FNSPEC(N, lgsz)\
+  BUILT_IN_ATOMIC_LOAD_ ## N:  \
+   return "Ap" "R" lgsz;

note that doing this for atomics makes those no longer a compiler barrier
for (aliased) loads and stores which means they are no longer a reliable
way to implement locks.  That's a reason why I never pushed a
PTA/alias patch I have to open-code this.

Thus, do we really want to do this?


That's my question to you :) If you don't think this attr_fnspec
extension would be useful for optimization I'll drop this part
of the patch and open-code it separately only for the out-of-bounds
diagnostics.  I'd started out that way but the fnspec class made
the code cleaner and if it could be used elsewhere so much
the better.  Please let me know.

Martin