Re: GSoC Project Ideas

2019-04-17 Thread Martin Sebor

On 4/17/19 8:57 AM, Joseph Myers wrote:

On Mon, 1 Apr 2019, Patrick Palka wrote:


A possibly related project is to "defer" output of diagnostics
until we know the stmt/expression we emit it for survived dead
code elimination.  Here there's the question what to key the
diagnostic off and how to move it (that is, detect if the code
causing it really fully went dead).


Interesting.  Which diagnostics would you have in mind to defer in this way?


For example, where the C front end does folding early for diagnostics such
as -Wsign-compare, and then makes its own limited attempt to see if e.g.
an expression of signed type must have nonnegative value.  It would be
appropriate for the diagnostic to be done later (so more powerful
optimizations can be used to tell if the value in fact is in a range
meaning the diagnostic is not needed, or if it's in dead code), and that
would reduce the number of places the C front end needs to call
c_fully_fold (with a following call to c_wrap_maybe_const to avoid
repeated recursive folding of the same trees).


Besides the front-ends, diagnostics currently issued during folding
(-Wrestrict, -Wstringop-overflow, and -Wstringop-truncation) would
also benefit from the same approach.  For instance, the warning below
could be avoided.  (The warning has to be issued from the folder in
order to detect the problem before the strncpy call is turned into
memcpy or MEM_REF.)

Martin

$ cat a.c && gcc -O -S -Wall -Wextra -fdump-tree-gimple=/dev/stdout 
-o/dev/stdout a.c

void f (char *d, unsigned i)
{
  if (i & (1LU << 34))
   __builtin_strncpy (d, "123", 3);
}
.file   "a.c"
f (char * d, unsigned int i)
{
  _1 = (long unsigned int) i;
  _2 = _1 & 17179869184;
  if (_2 != 0) goto ; else goto ;
  :
  __builtin_strncpy (d, "123", 3);
  :
}


a.c: In function ‘f’:
a.c:4:4: warning: ‘__builtin_strncpy’ output truncated before 
terminating nul copying 3 bytes from a string of the same length 
[-Wstringop-truncation]

4 |__builtin_strncpy (d, "123", 3);
  |^~~
.text
.globl  f
.type   f, @function
f:
.LFB0:
.cfi_startproc
ret
    .cfi_endproc
.LFE0:
.size   f, .-f
.ident  "GCC: (GNU) 9.0.1 20190417 (experimental)"
.section.note.GNU-stack,"",@progbits


Re: C provenance semantics proposal

2019-04-17 Thread Peter Sewell
On Wed, 17 Apr 2019 at 15:12, Uecker, Martin
 wrote:
>
> Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> >  wrote:
> > >
> > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > >  wrote:
> > > > >
> > > > > >  Since
> > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > >
> > > > > This is correct. What the proposal does though is put a limit
> > > > > on where pointers obtained from integers are allowed to point
> > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > "anything" provenances also cannot point to all possible
> > > > > objects.
> > > >
> > > > Yes.  We exclude objects that do not have their address taken
> > > > though (so somewhat similar to your "exposed").
> > >
> > > Also if the address never escapes?
> >
> > Yes.

Just for reference, here's Richard's above example in Cerberus,
changed slightly so the Cerberus default allocator puts b after a,
One can step through to see that this execution is allowed.

https://cerberus.cl.cam.ac.uk/?short/9b83be



> Then with respect to "expose" it seems GCC implements
> a superset which means it allows some behavior which
> is undefined according to the proposal. So all seems
> well with respect to this part.

y

For the float thing (which is truly horrible :-), that reinterpretation
of a pointer value as two floats would also (in our terms) expose
the pointer, so the conversion the other way would just work.

> With respect to tracking provenance through integers
> some changes might be required.
>
> Let's consider this example:
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>int* p = (int*)pj; // can be one-after pointer of 'x'
>p[-1] = 1; // well defined?
> }
>
> If I understand correctly, a pointer obtained from
> pi + 4 would have a "anything" provenance (which is
> fine). But the pointer obtained from 'pj' would have the
> provenance of 'y' so the access to 'x' would not
> be allowed. But according to the preferred version of
> our proposal, the pointer could also be used to
> access 'x' because it is also exposed.

That's an interesting example.  Here's a live version
in Cerberus, which shows (at step 19) the ambiguous
provenance of the result of the (int*) cast, resolved
in the next steps by the pointer subtraction.

https://cerberus.cl.cam.ac.uk/?short/a23efd

So it's allowed in PNVI-ae-udi.  For interest,
PNVI-plain makes this UB (as one can see by changing
the selected option under the Model dropdown).

> GCC could make pj have a "anything" provenance
> even though it is not modified. (This would break
> some optimization such as the one for Matlab.)
>
> Maybe one could also refine this optimization to check
> for additional conditions which rule out the case
> that there is another object the pointer could point
> to.
>
> Best,
> Martin


Re: Putting an all-zero variable into BSS

2019-04-17 Thread Joseph Myers
On Mon, 8 Apr 2019, Richard Biener wrote:

> That is, the C testcase
> 
> const char x[1024*1024] = {};
> 
> reproduces the "issue".  The comment in bss_initializer_p though
> explicitely says
> 
>   /* Do not put non-common constants into the .bss section, they belong in
>  a readonly section, except when NAMED is true.  */
>   return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named)
> 
> (where named refers to explicit .bss section marked decls).  Note
> the docs for -fzero-initialized-in-bss doesn't mention that this doesn't
> apply for readonly variables.

I'd say it's a security risk (breaks expected hardening properties) for a 
const variable with static storage duration not to end up in read-only 
memory, regardless of its size or contents, so such variables should not 
be put in BSS by default.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: vector alignment

2019-04-17 Thread Joseph Myers
On Wed, 3 Apr 2019, Richard Biener wrote:

> > 1) Is there some reason to align vectors on the same boundary
> > as their size no matter how big it is?  I can't find such
> > a requirement in the ABIs I looked at.  Or would it be more
> > appropriate to align the big ones on the preferred boundary
> > for the target?  For instance, does it make more sense to
> > align a 64KB vector on a 64KB boundary than on, say,
> > a 64-byte boundary (or some other boundary less than 64K?)
> 
> I don't think there's a good reason.  Instead I think that
> BIGGEST_ALIGNMENT is what we should go for as upper limit,
> anything bigger doesn't make sense (unless the user explicitely
> requests it).

It's plausible that you'd want the ABI for a given vector type not to 
depend on which architecture set extensions are enabled, but those 
extensions might affect BIGGEST_ALIGNMENT (or the largest alignment 
meaningful for any instruction, which is not necessarily the same as 
BIGGEST_ALIGNMENT; Arm has some vector instructions that can use 
alignments larger than BIGGEST_ALIGNMENT, for example).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GSoC Project Ideas

2019-04-17 Thread Joseph Myers
On Mon, 1 Apr 2019, Patrick Palka wrote:

> > > > A possibly related project is to "defer" output of diagnostics 
> > > > until we know the stmt/expression we emit it for survived dead 
> > > > code elimination.  Here there's the question what to key the 
> > > > diagnostic off and how to move it (that is, detect if the code 
> > > > causing it really fully went dead).
> 
> Interesting.  Which diagnostics would you have in mind to defer in this way?

For example, where the C front end does folding early for diagnostics such 
as -Wsign-compare, and then makes its own limited attempt to see if e.g. 
an expression of signed type must have nonnegative value.  It would be 
appropriate for the diagnostic to be done later (so more powerful 
optimizations can be used to tell if the value in fact is in a range 
meaning the diagnostic is not needed, or if it's in dead code), and that 
would reduce the number of places the C front end needs to call 
c_fully_fold (with a following call to c_wrap_maybe_const to avoid 
repeated recursive folding of the same trees).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C provenance semantics proposal

2019-04-17 Thread Uecker, Martin
Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener:
> On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
>  wrote:
> > 
> > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > >  wrote:
> > > > 
> > > > >  Since
> > > > > your proposal is based on an abstract machine there isn't anything
> > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > pointers with no provenance (pointing outside of any object), right?
> > > > 
> > > > This is correct. What the proposal does though is put a limit
> > > > on where pointers obtained from integers are allowed to point
> > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > "anything" provenances also cannot point to all possible
> > > > objects.
> > > 
> > > Yes.  We exclude objects that do not have their address taken
> > > though (so somewhat similar to your "exposed").
> > 
> > Also if the address never escapes?
> 
> Yes.

Then with respect to "expose" it seems GCC implements
a superset which means it allows some behavior which
is undefined according to the proposal. So all seems
well with respect to this part.


With respect to tracking provenance through integers
some changes might be required.

Let's consider this example:
   
int x;
int y;
uintptr_t pi = (uintptr_t)&x;
uintptr_t pj = (uintptr_t)&y;
 
if (pi + 4 == pj) {

   int* p = (int*)pj; // can be one-after pointer of 'x'
   p[-1] = 1; // well defined?
}

If I understand correctly, a pointer obtained from
pi + 4 would have a "anything" provenance (which is
fine). But the pointer obtained from 'pj' would have the
provenance of 'y' so the access to 'x' would not
be allowed. But according to the preferred version of
our proposal, the pointer could also be used to
access 'x' because it is also exposed.

GCC could make pj have a "anything" provenance
even though it is not modified. (This would break 
some optimization such as the one for Matlab.)

Maybe one could also refine this optimization to check
for additional conditions which rule out the case
that there is another object the pointer could point
to.

Best,
Martin

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> >  wrote:
>
> > >
> > > >  Since
> > > > your proposal is based on an abstract machine there isn't anything
> > > > like a pointer with multiple provenances (which "anything" is), just
> > > > pointers with no provenance (pointing outside of any object), right?
> > >
> > > This is correct. What the proposal does though is put a limit
> > > on where pointers obtained from integers are allowed to point
> > > to: They cannot point to non-exposed objects. I assume GCC
> > > "anything" provenances also cannot point to all possible
> > > objects.
> >
> > Yes.  We exclude objects that do not have their address taken
> > though (so somewhat similar to your "exposed").
>
> Also if the address never escapes?

Yes.

> Using address-taken as the criterion is one option we considered,
> but we felt this exposes too many objects, like automatic
> arrays or locally used malloced/alloced data etc.
>
> Using integer-casts as criterion means that all
> objects whose address is taken but where (a) it is not
> seen that the pointer is cast to an integer and
> where (b) the pointer never escapes can be assumed safe.

Yeah, since the abstract machine sees everything using whatever
seems fit is possible.

Richard.

> Best,
> Martin


Re: C provenance semantics proposal

2019-04-17 Thread Uecker, Martin
Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
>  wrote:

> > 
> > >  Since
> > > your proposal is based on an abstract machine there isn't anything
> > > like a pointer with multiple provenances (which "anything" is), just
> > > pointers with no provenance (pointing outside of any object), right?
> > 
> > This is correct. What the proposal does though is put a limit
> > on where pointers obtained from integers are allowed to point
> > to: They cannot point to non-exposed objects. I assume GCC
> > "anything" provenances also cannot point to all possible
> > objects.
> 
> Yes.  We exclude objects that do not have their address taken
> though (so somewhat similar to your "exposed").

Also if the address never escapes?

Using address-taken as the criterion is one option we considered,
but we felt this exposes too many objects, like automatic
arrays or locally used malloced/alloced data etc.

Using integer-casts as criterion means that all
objects whose address is taken but where (a) it is not
seen that the pointer is cast to an integer and
where (b) the pointer never escapes can be assumed safe.

Best,
Martin

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
 wrote:
>
>
> Hi Richard,
>
> Am Mittwoch, den 17.04.2019, 11:41 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell  
> > wrote:
> > >
> > > On 17/04/2019, Richard Biener  wrote:
> > > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> > > > wrote:
>
> ...
> > > > So this is not what GCC implements which tracks provenance through
> > > > non-pointer types to a limited extent when only copying is taking place.
> > > >
> > > > Your proposal makes
> > > >
> > > >  int a, b;
> > > >  int *p = &a;
> > > >  int *q = &b;
> > > >  uintptr_t pi = (uintptr_t)p; //expose
> > > >  uintptr_t qi = (uintptr_t)q; //expose
> > > >  pi += 4;
> > > >  if (pi == qi)
> > > >*(int *)pi = 1;
> > > >
> > > > well-defined since (int *)pi now has the provenance of &b.
> > >
> > > Yes.  (Just to be clear: it's not that we think the above example is
> > > desirable in itself, but it's well-defined as a consequence of what
> > > we do to make other common idioms, eg pointer bit manipulation,
> > > well-defined.)
> > >
> > > > Note GCC, when tracking provenance of non-pointer type
> > > > adds like in
> > > >
> > > >   int *p = &a;
> > > >   uintptr_t pi = (uintptr_t)p;
> > > >   pi += 4;
> > > >
> > > > considers pi to have provenance "anything" (not sure if you
> > > > have something like that) since we add 4 which has provenance
> > > > "anything" to pi which has provenance &a.
> > >
> > > We don't at present have a provenance "anything", but if the gcc
> > > "anything" means that it's assumed that it might alias with anything,
> > > then it looks like gcc's implementing a sound approximation to
> > > the proposal here?
> >
> > GCC makes the code well-defined whereas the proposal would make
> > dereferencing a pointer based on pi invoke undefined behavior?
>
> No, if there is an exposed object where pi points to, it is
> defined behaviour.
>
> >  Since
> > your proposal is based on an abstract machine there isn't anything
> > like a pointer with multiple provenances (which "anything" is), just
> > pointers with no provenance (pointing outside of any object), right?
>
> This is correct. What the proposal does though is put a limit
> on where pointers obtained from integers are allowed to point
> to: They cannot point to non-exposed objects. I assume GCC
> "anything" provenances also cannot point to all possible
> objects.

Yes.  We exclude objects that do not have their address taken
though (so somewhat similar to your "exposed").

> > For points-to analysis we of course have to track all possible
> > provenances of a pointer (and if we know it doesn't point inside
> > any object we make it point to nothing).
>
> Yes, a compiler should track what it knows (it could also track
> if it knows that some pointers point to the same object, etc.)
> while the abstract machine knows everything there is to know.
>
> > Btw, GCC changed its behavior here to support optimizing matlab
> > generated C code which passes pointers to arrays across functions
> > by marshalling them in two float typed halves (yikes!).  GCC is able
> > to properly track provenance across the decomposition / recomposition
> > when doing points-to analysis ;)
>
> Impressive ;-)  I would have thought that such encoding
> happens at ABI boundaries, where you cannot track anyway.
> But this seems to occur inside compiled code?

It occurs when matlab generates C code for an expression.  They
seem to use floating-point for everything at their self-invented ABI
boundary so "obviously" that includes pointers.

> While we do not attach a provenance to integers
> in our proposal, it does not necessarily imply that a compiler
> is not allowed to track such information. It then depends on
> how it uses it.
>
> For example,
>
> int z;
> int x;
> uintptr_t pi = (uintptr_t)&x;
>
> // encode in two floats ;-)
>
> // pass floats around
>
> // decode
>
> int* p = (int*)pi;
>
> If the compiler can prove that the address is still
> the same, it can also reattach the original provenance
> under some conditions.
>
> But there is a caveat: It can only do this is it cannot
> also be  a one-after pointer for z (or some other object).
> If the address of 'z' is not exposed, it may be able to
> assume this.
>
> > Btw, one thing GCC struggles is when it applies rules that clearly
> > apply to pointer dereferences to pointer equality compares where
> > the standard has that special casing of comparing two pointers
> > where one points one after an object requiring the comparison
> > to evaluate to true when the objects are adjacent.  GCC
> > currently statically optimizes if (&x + 1 == &y) to false for
> > this reason (but not the corresponding integer comparison).
>
> Yes, according to the current rules (and this doesn't change in
> the proposal) two points comparing equal does not imply that
> they are interchangable. Making the comparison unspecified
> (as C++) would not help. We could make it undefined, which
> would m

Re: C provenance semantics proposal

2019-04-17 Thread Uecker, Martin

Hi Richard,

Am Mittwoch, den 17.04.2019, 11:41 +0200 schrieb Richard Biener:
> On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell  
> wrote:
> > 
> > On 17/04/2019, Richard Biener  wrote:
> > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> > > wrote:

...
> > > So this is not what GCC implements which tracks provenance through
> > > non-pointer types to a limited extent when only copying is taking place.
> > > 
> > > Your proposal makes
> > > 
> > >  int a, b;
> > >  int *p = &a;
> > >  int *q = &b;
> > >  uintptr_t pi = (uintptr_t)p; //expose
> > >  uintptr_t qi = (uintptr_t)q; //expose
> > >  pi += 4;
> > >  if (pi == qi)
> > >    *(int *)pi = 1;
> > > 
> > > well-defined since (int *)pi now has the provenance of &b.
> > 
> > Yes.  (Just to be clear: it's not that we think the above example is
> > desirable in itself, but it's well-defined as a consequence of what
> > we do to make other common idioms, eg pointer bit manipulation,
> > well-defined.)
> > 
> > > Note GCC, when tracking provenance of non-pointer type
> > > adds like in
> > > 
> > >   int *p = &a;
> > >   uintptr_t pi = (uintptr_t)p;
> > >   pi += 4;
> > > 
> > > considers pi to have provenance "anything" (not sure if you
> > > have something like that) since we add 4 which has provenance
> > > "anything" to pi which has provenance &a.
> > 
> > We don't at present have a provenance "anything", but if the gcc
> > "anything" means that it's assumed that it might alias with anything,
> > then it looks like gcc's implementing a sound approximation to
> > the proposal here?
> 
> GCC makes the code well-defined whereas the proposal would make
> dereferencing a pointer based on pi invoke undefined behavior?

No, if there is an exposed object where pi points to, it is
defined behaviour. 

>  Since
> your proposal is based on an abstract machine there isn't anything
> like a pointer with multiple provenances (which "anything" is), just
> pointers with no provenance (pointing outside of any object), right?

This is correct. What the proposal does though is put a limit
on where pointers obtained from integers are allowed to point
to: They cannot point to non-exposed objects. I assume GCC
"anything" provenances also cannot point to all possible
objects.

> For points-to analysis we of course have to track all possible
> provenances of a pointer (and if we know it doesn't point inside
> any object we make it point to nothing).

Yes, a compiler should track what it knows (it could also track
if it knows that some pointers point to the same object, etc.)
while the abstract machine knows everything there is to know.

> Btw, GCC changed its behavior here to support optimizing matlab
> generated C code which passes pointers to arrays across functions
> by marshalling them in two float typed halves (yikes!).  GCC is able
> to properly track provenance across the decomposition / recomposition
> when doing points-to analysis ;)

Impressive ;-)  I would have thought that such encoding
happens at ABI boundaries, where you cannot track anyway.
But this seems to occur inside compiled code?

While we do not attach a provenance to integers
in our proposal, it does not necessarily imply that a compiler
is not allowed to track such information. It then depends on
how it uses it.

For example,

int z;
int x;
uintptr_t pi = (uintptr_t)&x;

// encode in two floats ;-)

// pass floats around

// decode

int* p = (int*)pi;

If the compiler can prove that the address is still
the same, it can also reattach the original provenance
under some conditions.

But there is a caveat: It can only do this is it cannot
also be  a one-after pointer for z (or some other object).
If the address of 'z' is not exposed, it may be able to
assume this.

> Btw, one thing GCC struggles is when it applies rules that clearly
> apply to pointer dereferences to pointer equality compares where
> the standard has that special casing of comparing two pointers
> where one points one after an object requiring the comparison
> to evaluate to true when the objects are adjacent.  GCC
> currently statically optimizes if (&x + 1 == &y) to false for
> this reason (but not the corresponding integer comparison).

Yes, according to the current rules (and this doesn't change in
the proposal) two points comparing equal does not imply that
they are interchangable. Making the comparison unspecified 
(as C++) would not help. We could make it undefined, which
would make all optimizations based on the assumption that
the pointer are interchangable valid. But I fear that this
would introduce a corner case that could lead to subtle
and hard-to-detect bugs.

Martin

> Richard.
> 
> > 
> > best,
> > Peter
> > 
> > 
> > > > The user-disambiguation refinement adds some complexity but supports
> > > > roundtrip casts, from pointer to integer and back, of pointers that
> > > > are one-past a storage instance.
> 
> 

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell  wrote:
>
> On 17/04/2019, Richard Biener  wrote:
> > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> > wrote:
> >>
> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law  wrote:
> >> >
> >> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> >> > > Dear all,
> >> > >
> >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> >> > > (the WG14 C memory object model study group) now
> >> > > have a detailed proposal for pointer provenance semantics, refining
> >> > > the "provenance not via integers (PNVI)" model presented there.
> >> > > This will be discussed at the ISO WG14 C standards committee at the
> >> > > end of April, and comments from the GCC community before then would
> >> > > be very welcome.   The proposal reconciles the needs of existing code
> >> > > and the behaviour of existing compilers as well as we can, but it
> >> > > doesn't
> >> > > exactly match any of the latter, so we'd especially like to know
> >> > > whether
> >> > > it would be feasible to implement - our hope is that it would only
> >> > > require
> >> > > minor changes.  It's presented in three documents:
> >> > >
> >> > > N2362  Moving to a provenance-aware memory model for C: proposal for
> >> > > C2x
> >> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> >> > > This introduces the proposal and gives the proposed change to the
> >> > > standard
> >> > > text, presented as change-highlighted pages of the standard
> >> > > (though one might want to read the N2363 examples before going into
> >> > > that).
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> >> > >
> >> > > N2363  C provenance semantics: examples.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt,
> >> > > Martin Uecker.
> >> > > This explains the proposal and its design choices with discussion of
> >> > > a
> >> > > series of examples.
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> >> > >
> >> > > N2364  C provenance semantics: detailed semantics.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> >> > > This gives a detailed mathematical semantics for the proposal
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> >> > >
> >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> >> > > executable version of the semantics, with a web interface that
> >> > > allows one to explore and visualise the behaviour of small test
> >> > > programs, stepping through and seeing the abstract-machine
> >> > > memory state including provenance information.   N2363 compares
> >> > > the results of this for the example programs with gcc, clang, and icc
> >> > > results, though the tests are really intended as tests of the
> >> > > semantics
> >> > > rather than compiler tests, so one has to interpret this with care.
> >> > THanks.  I just noticed this came up in EuroLLVM as well.Getting
> >> > some standards clarity in this space would be good.
> >> >
> >> > Richi is in the best position to cover for GCC, but I suspect he's
> >> > buried with gcc-9 issues as we approach the upcoming release.
> >> > Hopefully
> >> > he'll have time to review this once crunch time has past.  I think more
> >> > than anything sanity checking the proposal's requirements vs what can
> >> > be
> >> > reasonably implmemented is most important at this stage.
> >>
> >> Indeed.  We talked with him at the GNU cauldron, without uncovering
> >> any serious problems, but more detailed review from an implementability
> >> point of view would be great.   For the UB mailing list we just made
> >> a brief plain-text summary of the proposal (leaving out all the examples
> >> and standards diff, and glossing over some details).  I'll paste that
> >> in below in case it's helpful.  The next WG14 meeting is the week of
> >> April 29; comments before then would be particularly useful if that's
> >> possible.
> >>
> >> best,
> >> Peter
> >>
> >> C pointer values are typically represented at runtime as simple
> >> concrete numeric values, but mainstream compilers routinely exploit
> >> information about the "provenance" of pointers to reason that they
> >> cannot alias, and hence to justify optimisations.  This is
> >> long-standing practice, but exactly what it means (what programmers
> >> can rely on, and what provenance-based alias analysis is allowed to
> >> do), has never been nailed down.   That's what the proposal does.
> >>
> >>
> >> The basic idea is to associate a *provenance* with every pointer
> >> value, identifying the original storage instance (or allocation, in
> >> other words) that the pointer is derived from.  In more detail:
> >>
> >> - We take abstract-machine pointer values to be pairs (pi,a), adding a
> >>   provenance pi, either @i where i is a storage instance ID, or the
> >>   *empty* provenance, to their concrete address a.
> >>
> >> - On every storage i

Re: C provenance semantics proposal

2019-04-17 Thread Peter Sewell
On 17/04/2019, Richard Biener  wrote:
> On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> wrote:
>>
>> On Fri, 12 Apr 2019 at 15:51, Jeff Law  wrote:
>> >
>> > On 4/2/19 2:11 AM, Peter Sewell wrote:
>> > > Dear all,
>> > >
>> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
>> > > (the WG14 C memory object model study group) now
>> > > have a detailed proposal for pointer provenance semantics, refining
>> > > the "provenance not via integers (PNVI)" model presented there.
>> > > This will be discussed at the ISO WG14 C standards committee at the
>> > > end of April, and comments from the GCC community before then would
>> > > be very welcome.   The proposal reconciles the needs of existing code
>> > > and the behaviour of existing compilers as well as we can, but it
>> > > doesn't
>> > > exactly match any of the latter, so we'd especially like to know
>> > > whether
>> > > it would be feasible to implement - our hope is that it would only
>> > > require
>> > > minor changes.  It's presented in three documents:
>> > >
>> > > N2362  Moving to a provenance-aware memory model for C: proposal for
>> > > C2x
>> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
>> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
>> > > This introduces the proposal and gives the proposed change to the
>> > > standard
>> > > text, presented as change-highlighted pages of the standard
>> > > (though one might want to read the N2363 examples before going into
>> > > that).
>> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
>> > >
>> > > N2363  C provenance semantics: examples.
>> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt,
>> > > Martin Uecker.
>> > > This explains the proposal and its design choices with discussion of
>> > > a
>> > > series of examples.
>> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
>> > >
>> > > N2364  C provenance semantics: detailed semantics.
>> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
>> > > This gives a detailed mathematical semantics for the proposal
>> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
>> > >
>> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
>> > > executable version of the semantics, with a web interface that
>> > > allows one to explore and visualise the behaviour of small test
>> > > programs, stepping through and seeing the abstract-machine
>> > > memory state including provenance information.   N2363 compares
>> > > the results of this for the example programs with gcc, clang, and icc
>> > > results, though the tests are really intended as tests of the
>> > > semantics
>> > > rather than compiler tests, so one has to interpret this with care.
>> > THanks.  I just noticed this came up in EuroLLVM as well.Getting
>> > some standards clarity in this space would be good.
>> >
>> > Richi is in the best position to cover for GCC, but I suspect he's
>> > buried with gcc-9 issues as we approach the upcoming release.
>> > Hopefully
>> > he'll have time to review this once crunch time has past.  I think more
>> > than anything sanity checking the proposal's requirements vs what can
>> > be
>> > reasonably implmemented is most important at this stage.
>>
>> Indeed.  We talked with him at the GNU cauldron, without uncovering
>> any serious problems, but more detailed review from an implementability
>> point of view would be great.   For the UB mailing list we just made
>> a brief plain-text summary of the proposal (leaving out all the examples
>> and standards diff, and glossing over some details).  I'll paste that
>> in below in case it's helpful.  The next WG14 meeting is the week of
>> April 29; comments before then would be particularly useful if that's
>> possible.
>>
>> best,
>> Peter
>>
>> C pointer values are typically represented at runtime as simple
>> concrete numeric values, but mainstream compilers routinely exploit
>> information about the "provenance" of pointers to reason that they
>> cannot alias, and hence to justify optimisations.  This is
>> long-standing practice, but exactly what it means (what programmers
>> can rely on, and what provenance-based alias analysis is allowed to
>> do), has never been nailed down.   That's what the proposal does.
>>
>>
>> The basic idea is to associate a *provenance* with every pointer
>> value, identifying the original storage instance (or allocation, in
>> other words) that the pointer is derived from.  In more detail:
>>
>> - We take abstract-machine pointer values to be pairs (pi,a), adding a
>>   provenance pi, either @i where i is a storage instance ID, or the
>>   *empty* provenance, to their concrete address a.
>>
>> - On every storage instance creation (of objects with static, thread,
>>   automatic, and allocated storage duration), the abstract machine
>>   nondeterministically chooses a fresh storage instance ID i (unique
>>   across the entire execution), and the resulting pointer

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell  wrote:
>
> On Fri, 12 Apr 2019 at 15:51, Jeff Law  wrote:
> >
> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> > > Dear all,
> > >
> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> > > (the WG14 C memory object model study group) now
> > > have a detailed proposal for pointer provenance semantics, refining
> > > the "provenance not via integers (PNVI)" model presented there.
> > > This will be discussed at the ISO WG14 C standards committee at the
> > > end of April, and comments from the GCC community before then would
> > > be very welcome.   The proposal reconciles the needs of existing code
> > > and the behaviour of existing compilers as well as we can, but it doesn't
> > > exactly match any of the latter, so we'd especially like to know whether
> > > it would be feasible to implement - our hope is that it would only require
> > > minor changes.  It's presented in three documents:
> > >
> > > N2362  Moving to a provenance-aware memory model for C: proposal for C2x
> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> > > This introduces the proposal and gives the proposed change to the standard
> > > text, presented as change-highlighted pages of the standard
> > > (though one might want to read the N2363 examples before going into that).
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> > >
> > > N2363  C provenance semantics: examples.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin 
> > > Uecker.
> > > This explains the proposal and its design choices with discussion of a
> > > series of examples.
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> > >
> > > N2364  C provenance semantics: detailed semantics.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> > > This gives a detailed mathematical semantics for the proposal
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> > >
> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> > > executable version of the semantics, with a web interface that
> > > allows one to explore and visualise the behaviour of small test
> > > programs, stepping through and seeing the abstract-machine
> > > memory state including provenance information.   N2363 compares
> > > the results of this for the example programs with gcc, clang, and icc
> > > results, though the tests are really intended as tests of the semantics
> > > rather than compiler tests, so one has to interpret this with care.
> > THanks.  I just noticed this came up in EuroLLVM as well.Getting
> > some standards clarity in this space would be good.
> >
> > Richi is in the best position to cover for GCC, but I suspect he's
> > buried with gcc-9 issues as we approach the upcoming release.  Hopefully
> > he'll have time to review this once crunch time has past.  I think more
> > than anything sanity checking the proposal's requirements vs what can be
> > reasonably implmemented is most important at this stage.
>
> Indeed.  We talked with him at the GNU cauldron, without uncovering
> any serious problems, but more detailed review from an implementability
> point of view would be great.   For the UB mailing list we just made
> a brief plain-text summary of the proposal (leaving out all the examples
> and standards diff, and glossing over some details).  I'll paste that
> in below in case it's helpful.  The next WG14 meeting is the week of
> April 29; comments before then would be particularly useful if that's 
> possible.
>
> best,
> Peter
>
> C pointer values are typically represented at runtime as simple
> concrete numeric values, but mainstream compilers routinely exploit
> information about the "provenance" of pointers to reason that they
> cannot alias, and hence to justify optimisations.  This is
> long-standing practice, but exactly what it means (what programmers
> can rely on, and what provenance-based alias analysis is allowed to
> do), has never been nailed down.   That's what the proposal does.
>
>
> The basic idea is to associate a *provenance* with every pointer
> value, identifying the original storage instance (or allocation, in
> other words) that the pointer is derived from.  In more detail:
>
> - We take abstract-machine pointer values to be pairs (pi,a), adding a
>   provenance pi, either @i where i is a storage instance ID, or the
>   *empty* provenance, to their concrete address a.
>
> - On every storage instance creation (of objects with static, thread,
>   automatic, and allocated storage duration), the abstract machine
>   nondeterministically chooses a fresh storage instance ID i (unique
>   across the entire execution), and the resulting pointer value
>   carries that single storage instance ID as its provenance @i.
>
> - Provenance is preserved by pointer arithmetic that adds or subtracts
>   an integer to a pointer.
>
> - At any acc