Re: Definition of EAF_NOESCAPE and fnspec strings

2020-11-09 Thread Richard Biener
On Mon, 9 Nov 2020, Jan Hubicka wrote:

> > > But it also means that some of our FNSPECs are wrong now.  I wonder if we 
> > > can
> > > split this porperty to two different flags like EAF_NOEASCAPE and
> > > EAF_INDIRECT_NOESCAPE?
> > 
> > Note that EAF_NOESCAPE allows "escaping" to the return value (see
> > handle_rhs_call).  I guess for simplicity we could allow escaping
> 
> I see, missed that! 
> 
> > of not escaped but USED params to other written to params.  I think
> > we also don't handle "escaping" of a parameter indirectly to itself, thus
> > 
> > int i;
> > int *p = 
> > int **q = 
> > foo ();
> > if (*q != i)
> >   abort ();
> > 
> > and
> > 
> > foo (int ***p)
> > {
> >   *p = **p;
> > } 
> > 
> > or so with foos param EAF_NOESCAPE (but not EAF_DIRECT).
> > 
> > Splitting up EAF_NOESCAPE makes it quite difficult to understand.
> > Arguably explicit handling of memcpy and friends _does_ pay off
> > for points-to analysis since I'd say modelling all possible and useful
> > things in fnspec would make it a monster ... you'd basically want
> > to have a way to specify additional constraints in the fnspec itself,
> > like *1 = *2, but then also distinguish points-to effects from
> > may-alias ones.
> 
> Yep, i am not arguing for eliminating special case of memcpy (because we
> have the additional info that it only copies pointers from *src to
> *dest).
> 
> However I find current definition of EAF_NOESCAPE bit hard to handle in
> modref, since naturally it is quite reliable to track all uses of ssa
> name that correspond to parameters, but it is harder to track where
> values read from pointed-to memory can eventually go.

Yeah - also the fnspec of memcpy _is_ wrong at the moment ...

> What I do is to walk all uses of SSA_NAME that correspond to parameter
> and try to unerstand it.  If it is one of
>  1) memory load via derefence of name
>  2) memory store where name is base of LHS
>  3) memory store where name is rhs
>  4) name is passed as a value to function 
>  5) dereferenced value from name is passed to funtion
>  6) used as value normal gimple expression
>  7) used in return (as RHS or base of memory dereference address)
>  8) it is used only in reference but not as base (as array index or so)
> 
> Then I merge in (and) flags I determine as follow:
> 
> For 1) clear EAF_USED and recurse to LHS name. Based on its flags I
> decide on:
>   - EAF_DIRECT (if LHS has EAF_UNUSED),
>   - EAF_NOCLOBBER (if LHS has EAF_NOCLOBBER)
>   - EAF_NOESCAPE (if LHS has EAF_NOESCAPE). 
> 
> For 2) I clear EAF_NOCLOBBER and EAF_UNUSED flag
> 
> For 3) I give up (clear all flags) since I would need to track where the
> memory is going.
> 
> For 4) I determine flag of function parameter
> 
> For 5) I need to do same handling as 1) where flag of "loaded value" is
> flag of the function
> 
> For 6) I determine flags of LHS and merge them in

guess you could track a SSA lattice "based on parameter N" which
would need to be a mask (thus only track up to 32 parameters?)

> For 7) I clear NOESCAPE if rhs is name itself
> and UNUSED + NOESCAPE if rhs is derefernece from name.
> 
> For 8) I do nothing.  Here the names are non-pointers that I track
> because of earlier dereference.
> 
> 
> 
> So I think 7) can be relaxed.  Main problem is hoever that we often see 1)
> and then 3) or 7) on LHS that makes us punt very often.
> 
> The fact that pointer directly does not escape but pointed to memory can
> seems still very useful since one does not need to add *ptr to points-to
> sets. But I will try relaxing 7).
> 
> If we allow values escaping to other parameters and itself, I think I
> can relax 3) if base of the store is default def of PARM_DECL.

I think the important part is to get things correct.  Maybe it's worth
to add write/read flags where the argument _does_ escape in case the
function itself is otherwise pure/const.  For PTA that doesn't make
a difference (and fnspec was all about PTA ...) but for alias-analysis
it does.

> > 
> > I wonder if we should teach the GIMPLE FE to parse 'fn spec'
> > so we can write unit tests for the attribute ... or maybe simply
> > add this to the __GIMPLE spec string.
> 
> May be nice and also describe carefully that NOESCAPE and NOCLOBBER also
> reffers to indirect references.  Current description
> "Nonzero if the argument does not escape."
> reads to me that it is about ptr itself, not about *ptr and also it does
> not speak of the escaping to return value etc.

Well, if 'ptr' escapes then obvoiously all memory reachable from 'ptr'
escapes - escaping is always transitive.

And as escaping is in the context of the caller sth escaping to the
caller itself (via return) can hardly be considered escaping (again
this was designed for PTA ...).

I guess it makes sense to be able to separate escaping from the rest.

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Definition of EAF_NOESCAPE and fnspec strings

2020-11-09 Thread Jan Hubicka
> > But it also means that some of our FNSPECs are wrong now.  I wonder if we 
> > can
> > split this porperty to two different flags like EAF_NOEASCAPE and
> > EAF_INDIRECT_NOESCAPE?
> 
> Note that EAF_NOESCAPE allows "escaping" to the return value (see
> handle_rhs_call).  I guess for simplicity we could allow escaping

I see, missed that! 

> of not escaped but USED params to other written to params.  I think
> we also don't handle "escaping" of a parameter indirectly to itself, thus
> 
> int i;
> int *p = 
> int **q = 
> foo ();
> if (*q != i)
>   abort ();
> 
> and
> 
> foo (int ***p)
> {
>   *p = **p;
> } 
> 
> or so with foos param EAF_NOESCAPE (but not EAF_DIRECT).
> 
> Splitting up EAF_NOESCAPE makes it quite difficult to understand.
> Arguably explicit handling of memcpy and friends _does_ pay off
> for points-to analysis since I'd say modelling all possible and useful
> things in fnspec would make it a monster ... you'd basically want
> to have a way to specify additional constraints in the fnspec itself,
> like *1 = *2, but then also distinguish points-to effects from
> may-alias ones.

Yep, i am not arguing for eliminating special case of memcpy (because we
have the additional info that it only copies pointers from *src to
*dest).

However I find current definition of EAF_NOESCAPE bit hard to handle in
modref, since naturally it is quite reliable to track all uses of ssa
name that correspond to parameters, but it is harder to track where
values read from pointed-to memory can eventually go.

What I do is to walk all uses of SSA_NAME that correspond to parameter
and try to unerstand it.  If it is one of
 1) memory load via derefence of name
 2) memory store where name is base of LHS
 3) memory store where name is rhs
 4) name is passed as a value to function 
 5) dereferenced value from name is passed to funtion
 6) used as value normal gimple expression
 7) used in return (as RHS or base of memory dereference address)
 8) it is used only in reference but not as base (as array index or so)

Then I merge in (and) flags I determine as follow:

For 1) clear EAF_USED and recurse to LHS name. Based on its flags I
decide on:
  - EAF_DIRECT (if LHS has EAF_UNUSED),
  - EAF_NOCLOBBER (if LHS has EAF_NOCLOBBER)
  - EAF_NOESCAPE (if LHS has EAF_NOESCAPE). 

For 2) I clear EAF_NOCLOBBER and EAF_UNUSED flag

For 3) I give up (clear all flags) since I would need to track where the
memory is going.

For 4) I determine flag of function parameter

For 5) I need to do same handling as 1) where flag of "loaded value" is
flag of the function

For 6) I determine flags of LHS and merge them in

For 7) I clear NOESCAPE if rhs is name itself
and UNUSED + NOESCAPE if rhs is derefernece from name.

For 8) I do nothing.  Here the names are non-pointers that I track
because of earlier dereference.



So I think 7) can be relaxed.  Main problem is hoever that we often see 1)
and then 3) or 7) on LHS that makes us punt very often.

The fact that pointer directly does not escape but pointed to memory can
seems still very useful since one does not need to add *ptr to points-to
sets. But I will try relaxing 7).

If we allow values escaping to other parameters and itself, I think I
can relax 3) if base of the store is default def of PARM_DECL.
> 
> I wonder if we should teach the GIMPLE FE to parse 'fn spec'
> so we can write unit tests for the attribute ... or maybe simply
> add this to the __GIMPLE spec string.

May be nice and also describe carefully that NOESCAPE and NOCLOBBER also
reffers to indirect references.  Current description
"Nonzero if the argument does not escape."
reads to me that it is about ptr itself, not about *ptr and also it does
not speak of the escaping to return value etc.

Honza


Re: Definition of EAF_NOESCAPE and fnspec strings

2020-11-08 Thread Richard Biener
On Sun, 8 Nov 2020, Jan Hubicka wrote:

> Hi,
> I implemented simple propagation of EAF arguments for ipa-modref (that is not
> hard to do).  My main aim was to detect cases where THIS parameter does not
> escape but is used to read/write pointed to memory.  This is meant to
> avoid poor code produced when we i.e. offline destructors on cold path.
> 
> Unfortunately detecting EAF_NOESCAPE for such parameters breaks.  This is 
> already
> visible on memcpy if we let it to be be handled via its fnspec
> attribute. If I disable special handling in structalias:
>
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index a4832b75436..2b614913b57 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -4389,7 +4389,6 @@ find_func_aliases_for_builtin_call (struct function 
> *fn, gcall *t)
>case BUILT_IN_STRCPY:
>case BUILT_IN_STRNCPY:
>case BUILT_IN_BCOPY:
> -  case BUILT_IN_MEMCPY:
>case BUILT_IN_MEMMOVE:
>case BUILT_IN_MEMPCPY:
>case BUILT_IN_STPCPY:
> 
> In the following testcase we miss the fact that memcpy may merge i4p PTA
> set to i3p.
> 
> extern void exit (int);
> extern void abort (void);
> int size = 8;
> 
> int main (void)
> {
>   int i3 = -1, i4 = 55;
>   int *i3p = 
>   int *i4p = 
> 
>   memcpy(, , size);
>   if (i3p != i4p)
> abort ();
>   exit (0);
> }
> 
> This seems to be documented for some degree:
> 
> /* Add *tem = nonlocal, do not add *tem = callused as
>EAF_NOESCAPE parameters do not escape to other parameters
>and all other uses appear in NONLOCAL as well.  */
> 
> But it also means that some of our FNSPECs are wrong now.  I wonder if we can
> split this porperty to two different flags like EAF_NOEASCAPE and
> EAF_INDIRECT_NOESCAPE?

Note that EAF_NOESCAPE allows "escaping" to the return value (see
handle_rhs_call).  I guess for simplicity we could allow escaping
of not escaped but USED params to other written to params.  I think
we also don't handle "escaping" of a parameter indirectly to itself, thus

int i;
int *p = 
int **q = 
foo ();
if (*q != i)
  abort ();

and

foo (int ***p)
{
  *p = **p;
} 

or so with foos param EAF_NOESCAPE (but not EAF_DIRECT).

Splitting up EAF_NOESCAPE makes it quite difficult to understand.
Arguably explicit handling of memcpy and friends _does_ pay off
for points-to analysis since I'd say modelling all possible and useful
things in fnspec would make it a monster ... you'd basically want
to have a way to specify additional constraints in the fnspec itself,
like *1 = *2, but then also distinguish points-to effects from
may-alias ones.

I wonder if we should teach the GIMPLE FE to parse 'fn spec'
so we can write unit tests for the attribute ... or maybe simply
add this to the __GIMPLE spec string.

> Auto-detecting EAF_INDIRECT_NOESCAPE seems bit harder at least assuming
> that any read can actually load few bits of pointer possibly written
> there beause if simple member functions accesses values pointer by THIS
> and return them we lose a track if the returned value is an escape point
> I would say.
> 
> The difference in constraints are:
> 
> --- good/a.c.035t.ealias  2020-11-08 13:34:04.397499515 +0100
> +++ a.c.035t.ealias   2020-11-08 13:39:15.483438469 +0100
> @@ -106,7 +106,15 @@
>  size = NONLOCAL
>  size.0_1 = size
>  _2 = size.0_1
> -i3p = i4p
> +callarg(18) = 
> +callarg(18) = callarg(18) + UNKNOWN
> +CALLUSED(16) = callarg(18)
> +CALLCLOBBERED(17) = callarg(18)
> +*callarg(18) = NONLOCAL
> +callarg(19) = 
> +callarg(19) = callarg(19) + UNKNOWN
> +CALLUSED(16) = callarg(19)
> +ESCAPED = _2
>  i3p.1_3 = i3p
>  i4p.2_4 = i4p
>  ESCAPED = 
> 
> ...
> 
>  Call clobber information
>  
> -ESCAPED, points-to NULL, points-to vars: { }
> +ESCAPED, points-to non-local, points-to NULL, points-to vars: { }
>  
>  Flow-insensitive points-to information
>  
> -i3p.1_3, points-to NULL, points-to vars: { D.1947 D.1948 }
> +i3p.1_3, points-to non-local, points-to escaped, points-to NULL, points-to 
> vars: { D.1947 }
>  i4p.2_4, points-to NULL, points-to vars: { D.1948 }
>  
>  main ()
> 
> Honza
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Definition of EAF_NOESCAPE and fnspec strings

2020-11-08 Thread Jan Hubicka
Hi,
I implemented simple propagation of EAF arguments for ipa-modref (that is not
hard to do).  My main aim was to detect cases where THIS parameter does not
escape but is used to read/write pointed to memory.  This is meant to
avoid poor code produced when we i.e. offline destructors on cold path.

Unfortunately detecting EAF_NOESCAPE for such parameters breaks.  This is 
already
visible on memcpy if we let it to be be handled via its fnspec
attribute. If I disable special handling in structalias:

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index a4832b75436..2b614913b57 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4389,7 +4389,6 @@ find_func_aliases_for_builtin_call (struct function *fn, 
gcall *t)
   case BUILT_IN_STRCPY:
   case BUILT_IN_STRNCPY:
   case BUILT_IN_BCOPY:
-  case BUILT_IN_MEMCPY:
   case BUILT_IN_MEMMOVE:
   case BUILT_IN_MEMPCPY:
   case BUILT_IN_STPCPY:

In the following testcase we miss the fact that memcpy may merge i4p PTA
set to i3p.

extern void exit (int);
extern void abort (void);
int size = 8;

int main (void)
{
  int i3 = -1, i4 = 55;
  int *i3p = 
  int *i4p = 

  memcpy(, , size);
  if (i3p != i4p)
abort ();
  exit (0);
}

This seems to be documented for some degree:

  /* Add *tem = nonlocal, do not add *tem = callused as
 EAF_NOESCAPE parameters do not escape to other parameters
 and all other uses appear in NONLOCAL as well.  */

But it also means that some of our FNSPECs are wrong now.  I wonder if we can
split this porperty to two different flags like EAF_NOEASCAPE and
EAF_INDIRECT_NOESCAPE?

Auto-detecting EAF_INDIRECT_NOESCAPE seems bit harder at least assuming
that any read can actually load few bits of pointer possibly written
there beause if simple member functions accesses values pointer by THIS
and return them we lose a track if the returned value is an escape point
I would say.

The difference in constraints are:

--- good/a.c.035t.ealias2020-11-08 13:34:04.397499515 +0100
+++ a.c.035t.ealias 2020-11-08 13:39:15.483438469 +0100
@@ -106,7 +106,15 @@
 size = NONLOCAL
 size.0_1 = size
 _2 = size.0_1
-i3p = i4p
+callarg(18) = 
+callarg(18) = callarg(18) + UNKNOWN
+CALLUSED(16) = callarg(18)
+CALLCLOBBERED(17) = callarg(18)
+*callarg(18) = NONLOCAL
+callarg(19) = 
+callarg(19) = callarg(19) + UNKNOWN
+CALLUSED(16) = callarg(19)
+ESCAPED = _2
 i3p.1_3 = i3p
 i4p.2_4 = i4p
 ESCAPED = 

...

 Call clobber information
 
-ESCAPED, points-to NULL, points-to vars: { }
+ESCAPED, points-to non-local, points-to NULL, points-to vars: { }
 
 Flow-insensitive points-to information
 
-i3p.1_3, points-to NULL, points-to vars: { D.1947 D.1948 }
+i3p.1_3, points-to non-local, points-to escaped, points-to NULL, points-to 
vars: { D.1947 }
 i4p.2_4, points-to NULL, points-to vars: { D.1948 }
 
 main ()

Honza