Re: GSoC Project Ideas
On 4/17/19 8:57 AM, Joseph Myers wrote: On Mon, 1 Apr 2019, Patrick Palka wrote: A possibly related project is to "defer" output of diagnostics until we know the stmt/expression we emit it for survived dead code elimination. Here there's the question what to key the diagnostic off and how to move it (that is, detect if the code causing it really fully went dead). Interesting. Which diagnostics would you have in mind to defer in this way? For example, where the C front end does folding early for diagnostics such as -Wsign-compare, and then makes its own limited attempt to see if e.g. an expression of signed type must have nonnegative value. It would be appropriate for the diagnostic to be done later (so more powerful optimizations can be used to tell if the value in fact is in a range meaning the diagnostic is not needed, or if it's in dead code), and that would reduce the number of places the C front end needs to call c_fully_fold (with a following call to c_wrap_maybe_const to avoid repeated recursive folding of the same trees). Besides the front-ends, diagnostics currently issued during folding (-Wrestrict, -Wstringop-overflow, and -Wstringop-truncation) would also benefit from the same approach. For instance, the warning below could be avoided. (The warning has to be issued from the folder in order to detect the problem before the strncpy call is turned into memcpy or MEM_REF.) Martin $ cat a.c && gcc -O -S -Wall -Wextra -fdump-tree-gimple=/dev/stdout -o/dev/stdout a.c void f (char *d, unsigned i) { if (i & (1LU << 34)) __builtin_strncpy (d, "123", 3); } .file "a.c" f (char * d, unsigned int i) { _1 = (long unsigned int) i; _2 = _1 & 17179869184; if (_2 != 0) goto ; else goto ; : __builtin_strncpy (d, "123", 3); : } a.c: In function ‘f’: a.c:4:4: warning: ‘__builtin_strncpy’ output truncated before terminating nul copying 3 bytes from a string of the same length [-Wstringop-truncation] 4 |__builtin_strncpy (d, "123", 3); |^~~ .text .globl f .type f, @function f: .LFB0: .cfi_startproc ret .cfi_endproc .LFE0: .size f, .-f .ident "GCC: (GNU) 9.0.1 20190417 (experimental)" .section.note.GNU-stack,"",@progbits
Re: C provenance semantics proposal
On Wed, 17 Apr 2019 at 15:12, Uecker, Martin wrote: > > Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener: > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin > > wrote: > > > > > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener: > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin > > > > wrote: > > > > > > > > > > > Since > > > > > > your proposal is based on an abstract machine there isn't anything > > > > > > like a pointer with multiple provenances (which "anything" is), just > > > > > > pointers with no provenance (pointing outside of any object), right? > > > > > > > > > > This is correct. What the proposal does though is put a limit > > > > > on where pointers obtained from integers are allowed to point > > > > > to: They cannot point to non-exposed objects. I assume GCC > > > > > "anything" provenances also cannot point to all possible > > > > > objects. > > > > > > > > Yes. We exclude objects that do not have their address taken > > > > though (so somewhat similar to your "exposed"). > > > > > > Also if the address never escapes? > > > > Yes. Just for reference, here's Richard's above example in Cerberus, changed slightly so the Cerberus default allocator puts b after a, One can step through to see that this execution is allowed. https://cerberus.cl.cam.ac.uk/?short/9b83be > Then with respect to "expose" it seems GCC implements > a superset which means it allows some behavior which > is undefined according to the proposal. So all seems > well with respect to this part. y For the float thing (which is truly horrible :-), that reinterpretation of a pointer value as two floats would also (in our terms) expose the pointer, so the conversion the other way would just work. > With respect to tracking provenance through integers > some changes might be required. > > Let's consider this example: > > int x; > int y; > uintptr_t pi = (uintptr_t)&x; > uintptr_t pj = (uintptr_t)&y; > > if (pi + 4 == pj) { > >int* p = (int*)pj; // can be one-after pointer of 'x' >p[-1] = 1; // well defined? > } > > If I understand correctly, a pointer obtained from > pi + 4 would have a "anything" provenance (which is > fine). But the pointer obtained from 'pj' would have the > provenance of 'y' so the access to 'x' would not > be allowed. But according to the preferred version of > our proposal, the pointer could also be used to > access 'x' because it is also exposed. That's an interesting example. Here's a live version in Cerberus, which shows (at step 19) the ambiguous provenance of the result of the (int*) cast, resolved in the next steps by the pointer subtraction. https://cerberus.cl.cam.ac.uk/?short/a23efd So it's allowed in PNVI-ae-udi. For interest, PNVI-plain makes this UB (as one can see by changing the selected option under the Model dropdown). > GCC could make pj have a "anything" provenance > even though it is not modified. (This would break > some optimization such as the one for Matlab.) > > Maybe one could also refine this optimization to check > for additional conditions which rule out the case > that there is another object the pointer could point > to. > > Best, > Martin
Re: Putting an all-zero variable into BSS
On Mon, 8 Apr 2019, Richard Biener wrote: > That is, the C testcase > > const char x[1024*1024] = {}; > > reproduces the "issue". The comment in bss_initializer_p though > explicitely says > > /* Do not put non-common constants into the .bss section, they belong in > a readonly section, except when NAMED is true. */ > return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named) > > (where named refers to explicit .bss section marked decls). Note > the docs for -fzero-initialized-in-bss doesn't mention that this doesn't > apply for readonly variables. I'd say it's a security risk (breaks expected hardening properties) for a const variable with static storage duration not to end up in read-only memory, regardless of its size or contents, so such variables should not be put in BSS by default. -- Joseph S. Myers jos...@codesourcery.com
Re: vector alignment
On Wed, 3 Apr 2019, Richard Biener wrote: > > 1) Is there some reason to align vectors on the same boundary > > as their size no matter how big it is? I can't find such > > a requirement in the ABIs I looked at. Or would it be more > > appropriate to align the big ones on the preferred boundary > > for the target? For instance, does it make more sense to > > align a 64KB vector on a 64KB boundary than on, say, > > a 64-byte boundary (or some other boundary less than 64K?) > > I don't think there's a good reason. Instead I think that > BIGGEST_ALIGNMENT is what we should go for as upper limit, > anything bigger doesn't make sense (unless the user explicitely > requests it). It's plausible that you'd want the ABI for a given vector type not to depend on which architecture set extensions are enabled, but those extensions might affect BIGGEST_ALIGNMENT (or the largest alignment meaningful for any instruction, which is not necessarily the same as BIGGEST_ALIGNMENT; Arm has some vector instructions that can use alignments larger than BIGGEST_ALIGNMENT, for example). -- Joseph S. Myers jos...@codesourcery.com
Re: GSoC Project Ideas
On Mon, 1 Apr 2019, Patrick Palka wrote: > > > > A possibly related project is to "defer" output of diagnostics > > > > until we know the stmt/expression we emit it for survived dead > > > > code elimination. Here there's the question what to key the > > > > diagnostic off and how to move it (that is, detect if the code > > > > causing it really fully went dead). > > Interesting. Which diagnostics would you have in mind to defer in this way? For example, where the C front end does folding early for diagnostics such as -Wsign-compare, and then makes its own limited attempt to see if e.g. an expression of signed type must have nonnegative value. It would be appropriate for the diagnostic to be done later (so more powerful optimizations can be used to tell if the value in fact is in a range meaning the diagnostic is not needed, or if it's in dead code), and that would reduce the number of places the C front end needs to call c_fully_fold (with a following call to c_wrap_maybe_const to avoid repeated recursive folding of the same trees). -- Joseph S. Myers jos...@codesourcery.com
Re: C provenance semantics proposal
Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener: > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin > wrote: > > > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener: > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin > > > wrote: > > > > > > > > > Since > > > > > your proposal is based on an abstract machine there isn't anything > > > > > like a pointer with multiple provenances (which "anything" is), just > > > > > pointers with no provenance (pointing outside of any object), right? > > > > > > > > This is correct. What the proposal does though is put a limit > > > > on where pointers obtained from integers are allowed to point > > > > to: They cannot point to non-exposed objects. I assume GCC > > > > "anything" provenances also cannot point to all possible > > > > objects. > > > > > > Yes. We exclude objects that do not have their address taken > > > though (so somewhat similar to your "exposed"). > > > > Also if the address never escapes? > > Yes. Then with respect to "expose" it seems GCC implements a superset which means it allows some behavior which is undefined according to the proposal. So all seems well with respect to this part. With respect to tracking provenance through integers some changes might be required. Let's consider this example: int x; int y; uintptr_t pi = (uintptr_t)&x; uintptr_t pj = (uintptr_t)&y; if (pi + 4 == pj) { int* p = (int*)pj; // can be one-after pointer of 'x' p[-1] = 1; // well defined? } If I understand correctly, a pointer obtained from pi + 4 would have a "anything" provenance (which is fine). But the pointer obtained from 'pj' would have the provenance of 'y' so the access to 'x' would not be allowed. But according to the preferred version of our proposal, the pointer could also be used to access 'x' because it is also exposed. GCC could make pj have a "anything" provenance even though it is not modified. (This would break some optimization such as the one for Matlab.) Maybe one could also refine this optimization to check for additional conditions which rule out the case that there is another object the pointer could point to. Best, Martin
Re: C provenance semantics proposal
On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin wrote: > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener: > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin > > wrote: > > > > > > > > Since > > > > your proposal is based on an abstract machine there isn't anything > > > > like a pointer with multiple provenances (which "anything" is), just > > > > pointers with no provenance (pointing outside of any object), right? > > > > > > This is correct. What the proposal does though is put a limit > > > on where pointers obtained from integers are allowed to point > > > to: They cannot point to non-exposed objects. I assume GCC > > > "anything" provenances also cannot point to all possible > > > objects. > > > > Yes. We exclude objects that do not have their address taken > > though (so somewhat similar to your "exposed"). > > Also if the address never escapes? Yes. > Using address-taken as the criterion is one option we considered, > but we felt this exposes too many objects, like automatic > arrays or locally used malloced/alloced data etc. > > Using integer-casts as criterion means that all > objects whose address is taken but where (a) it is not > seen that the pointer is cast to an integer and > where (b) the pointer never escapes can be assumed safe. Yeah, since the abstract machine sees everything using whatever seems fit is possible. Richard. > Best, > Martin
Re: C provenance semantics proposal
Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener: > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin > wrote: > > > > > Since > > > your proposal is based on an abstract machine there isn't anything > > > like a pointer with multiple provenances (which "anything" is), just > > > pointers with no provenance (pointing outside of any object), right? > > > > This is correct. What the proposal does though is put a limit > > on where pointers obtained from integers are allowed to point > > to: They cannot point to non-exposed objects. I assume GCC > > "anything" provenances also cannot point to all possible > > objects. > > Yes. We exclude objects that do not have their address taken > though (so somewhat similar to your "exposed"). Also if the address never escapes? Using address-taken as the criterion is one option we considered, but we felt this exposes too many objects, like automatic arrays or locally used malloced/alloced data etc. Using integer-casts as criterion means that all objects whose address is taken but where (a) it is not seen that the pointer is cast to an integer and where (b) the pointer never escapes can be assumed safe. Best, Martin
Re: C provenance semantics proposal
On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin wrote: > > > Hi Richard, > > Am Mittwoch, den 17.04.2019, 11:41 +0200 schrieb Richard Biener: > > On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell > > wrote: > > > > > > On 17/04/2019, Richard Biener wrote: > > > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell > > > > wrote: > > ... > > > > So this is not what GCC implements which tracks provenance through > > > > non-pointer types to a limited extent when only copying is taking place. > > > > > > > > Your proposal makes > > > > > > > > int a, b; > > > > int *p = &a; > > > > int *q = &b; > > > > uintptr_t pi = (uintptr_t)p; //expose > > > > uintptr_t qi = (uintptr_t)q; //expose > > > > pi += 4; > > > > if (pi == qi) > > > >*(int *)pi = 1; > > > > > > > > well-defined since (int *)pi now has the provenance of &b. > > > > > > Yes. (Just to be clear: it's not that we think the above example is > > > desirable in itself, but it's well-defined as a consequence of what > > > we do to make other common idioms, eg pointer bit manipulation, > > > well-defined.) > > > > > > > Note GCC, when tracking provenance of non-pointer type > > > > adds like in > > > > > > > > int *p = &a; > > > > uintptr_t pi = (uintptr_t)p; > > > > pi += 4; > > > > > > > > considers pi to have provenance "anything" (not sure if you > > > > have something like that) since we add 4 which has provenance > > > > "anything" to pi which has provenance &a. > > > > > > We don't at present have a provenance "anything", but if the gcc > > > "anything" means that it's assumed that it might alias with anything, > > > then it looks like gcc's implementing a sound approximation to > > > the proposal here? > > > > GCC makes the code well-defined whereas the proposal would make > > dereferencing a pointer based on pi invoke undefined behavior? > > No, if there is an exposed object where pi points to, it is > defined behaviour. > > > Since > > your proposal is based on an abstract machine there isn't anything > > like a pointer with multiple provenances (which "anything" is), just > > pointers with no provenance (pointing outside of any object), right? > > This is correct. What the proposal does though is put a limit > on where pointers obtained from integers are allowed to point > to: They cannot point to non-exposed objects. I assume GCC > "anything" provenances also cannot point to all possible > objects. Yes. We exclude objects that do not have their address taken though (so somewhat similar to your "exposed"). > > For points-to analysis we of course have to track all possible > > provenances of a pointer (and if we know it doesn't point inside > > any object we make it point to nothing). > > Yes, a compiler should track what it knows (it could also track > if it knows that some pointers point to the same object, etc.) > while the abstract machine knows everything there is to know. > > > Btw, GCC changed its behavior here to support optimizing matlab > > generated C code which passes pointers to arrays across functions > > by marshalling them in two float typed halves (yikes!). GCC is able > > to properly track provenance across the decomposition / recomposition > > when doing points-to analysis ;) > > Impressive ;-) I would have thought that such encoding > happens at ABI boundaries, where you cannot track anyway. > But this seems to occur inside compiled code? It occurs when matlab generates C code for an expression. They seem to use floating-point for everything at their self-invented ABI boundary so "obviously" that includes pointers. > While we do not attach a provenance to integers > in our proposal, it does not necessarily imply that a compiler > is not allowed to track such information. It then depends on > how it uses it. > > For example, > > int z; > int x; > uintptr_t pi = (uintptr_t)&x; > > // encode in two floats ;-) > > // pass floats around > > // decode > > int* p = (int*)pi; > > If the compiler can prove that the address is still > the same, it can also reattach the original provenance > under some conditions. > > But there is a caveat: It can only do this is it cannot > also be a one-after pointer for z (or some other object). > If the address of 'z' is not exposed, it may be able to > assume this. > > > Btw, one thing GCC struggles is when it applies rules that clearly > > apply to pointer dereferences to pointer equality compares where > > the standard has that special casing of comparing two pointers > > where one points one after an object requiring the comparison > > to evaluate to true when the objects are adjacent. GCC > > currently statically optimizes if (&x + 1 == &y) to false for > > this reason (but not the corresponding integer comparison). > > Yes, according to the current rules (and this doesn't change in > the proposal) two points comparing equal does not imply that > they are interchangable. Making the comparison unspecified > (as C++) would not help. We could make it undefined, which > would m
Re: C provenance semantics proposal
Hi Richard, Am Mittwoch, den 17.04.2019, 11:41 +0200 schrieb Richard Biener: > On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell > wrote: > > > > On 17/04/2019, Richard Biener wrote: > > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell > > > wrote: ... > > > So this is not what GCC implements which tracks provenance through > > > non-pointer types to a limited extent when only copying is taking place. > > > > > > Your proposal makes > > > > > > int a, b; > > > int *p = &a; > > > int *q = &b; > > > uintptr_t pi = (uintptr_t)p; //expose > > > uintptr_t qi = (uintptr_t)q; //expose > > > pi += 4; > > > if (pi == qi) > > > *(int *)pi = 1; > > > > > > well-defined since (int *)pi now has the provenance of &b. > > > > Yes. (Just to be clear: it's not that we think the above example is > > desirable in itself, but it's well-defined as a consequence of what > > we do to make other common idioms, eg pointer bit manipulation, > > well-defined.) > > > > > Note GCC, when tracking provenance of non-pointer type > > > adds like in > > > > > > int *p = &a; > > > uintptr_t pi = (uintptr_t)p; > > > pi += 4; > > > > > > considers pi to have provenance "anything" (not sure if you > > > have something like that) since we add 4 which has provenance > > > "anything" to pi which has provenance &a. > > > > We don't at present have a provenance "anything", but if the gcc > > "anything" means that it's assumed that it might alias with anything, > > then it looks like gcc's implementing a sound approximation to > > the proposal here? > > GCC makes the code well-defined whereas the proposal would make > dereferencing a pointer based on pi invoke undefined behavior? No, if there is an exposed object where pi points to, it is defined behaviour. > Since > your proposal is based on an abstract machine there isn't anything > like a pointer with multiple provenances (which "anything" is), just > pointers with no provenance (pointing outside of any object), right? This is correct. What the proposal does though is put a limit on where pointers obtained from integers are allowed to point to: They cannot point to non-exposed objects. I assume GCC "anything" provenances also cannot point to all possible objects. > For points-to analysis we of course have to track all possible > provenances of a pointer (and if we know it doesn't point inside > any object we make it point to nothing). Yes, a compiler should track what it knows (it could also track if it knows that some pointers point to the same object, etc.) while the abstract machine knows everything there is to know. > Btw, GCC changed its behavior here to support optimizing matlab > generated C code which passes pointers to arrays across functions > by marshalling them in two float typed halves (yikes!). GCC is able > to properly track provenance across the decomposition / recomposition > when doing points-to analysis ;) Impressive ;-) I would have thought that such encoding happens at ABI boundaries, where you cannot track anyway. But this seems to occur inside compiled code? While we do not attach a provenance to integers in our proposal, it does not necessarily imply that a compiler is not allowed to track such information. It then depends on how it uses it. For example, int z; int x; uintptr_t pi = (uintptr_t)&x; // encode in two floats ;-) // pass floats around // decode int* p = (int*)pi; If the compiler can prove that the address is still the same, it can also reattach the original provenance under some conditions. But there is a caveat: It can only do this is it cannot also be a one-after pointer for z (or some other object). If the address of 'z' is not exposed, it may be able to assume this. > Btw, one thing GCC struggles is when it applies rules that clearly > apply to pointer dereferences to pointer equality compares where > the standard has that special casing of comparing two pointers > where one points one after an object requiring the comparison > to evaluate to true when the objects are adjacent. GCC > currently statically optimizes if (&x + 1 == &y) to false for > this reason (but not the corresponding integer comparison). Yes, according to the current rules (and this doesn't change in the proposal) two points comparing equal does not imply that they are interchangable. Making the comparison unspecified (as C++) would not help. We could make it undefined, which would make all optimizations based on the assumption that the pointer are interchangable valid. But I fear that this would introduce a corner case that could lead to subtle and hard-to-detect bugs. Martin > Richard. > > > > > best, > > Peter > > > > > > > > The user-disambiguation refinement adds some complexity but supports > > > > roundtrip casts, from pointer to integer and back, of pointers that > > > > are one-past a storage instance. > >
Re: C provenance semantics proposal
On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell wrote: > > On 17/04/2019, Richard Biener wrote: > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell > > wrote: > >> > >> On Fri, 12 Apr 2019 at 15:51, Jeff Law wrote: > >> > > >> > On 4/2/19 2:11 AM, Peter Sewell wrote: > >> > > Dear all, > >> > > > >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we > >> > > (the WG14 C memory object model study group) now > >> > > have a detailed proposal for pointer provenance semantics, refining > >> > > the "provenance not via integers (PNVI)" model presented there. > >> > > This will be discussed at the ISO WG14 C standards committee at the > >> > > end of April, and comments from the GCC community before then would > >> > > be very welcome. The proposal reconciles the needs of existing code > >> > > and the behaviour of existing compilers as well as we can, but it > >> > > doesn't > >> > > exactly match any of the latter, so we'd especially like to know > >> > > whether > >> > > it would be feasible to implement - our hope is that it would only > >> > > require > >> > > minor changes. It's presented in three documents: > >> > > > >> > > N2362 Moving to a provenance-aware memory model for C: proposal for > >> > > C2x > >> > > by the memory object model study group. Jens Gustedt, Peter Sewell, > >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. > >> > > This introduces the proposal and gives the proposed change to the > >> > > standard > >> > > text, presented as change-highlighted pages of the standard > >> > > (though one might want to read the N2363 examples before going into > >> > > that). > >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf > >> > > > >> > > N2363 C provenance semantics: examples. > >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, > >> > > Martin Uecker. > >> > > This explains the proposal and its design choices with discussion of > >> > > a > >> > > series of examples. > >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf > >> > > > >> > > N2364 C provenance semantics: detailed semantics. > >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes. > >> > > This gives a detailed mathematical semantics for the proposal > >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf > >> > > > >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an > >> > > executable version of the semantics, with a web interface that > >> > > allows one to explore and visualise the behaviour of small test > >> > > programs, stepping through and seeing the abstract-machine > >> > > memory state including provenance information. N2363 compares > >> > > the results of this for the example programs with gcc, clang, and icc > >> > > results, though the tests are really intended as tests of the > >> > > semantics > >> > > rather than compiler tests, so one has to interpret this with care. > >> > THanks. I just noticed this came up in EuroLLVM as well.Getting > >> > some standards clarity in this space would be good. > >> > > >> > Richi is in the best position to cover for GCC, but I suspect he's > >> > buried with gcc-9 issues as we approach the upcoming release. > >> > Hopefully > >> > he'll have time to review this once crunch time has past. I think more > >> > than anything sanity checking the proposal's requirements vs what can > >> > be > >> > reasonably implmemented is most important at this stage. > >> > >> Indeed. We talked with him at the GNU cauldron, without uncovering > >> any serious problems, but more detailed review from an implementability > >> point of view would be great. For the UB mailing list we just made > >> a brief plain-text summary of the proposal (leaving out all the examples > >> and standards diff, and glossing over some details). I'll paste that > >> in below in case it's helpful. The next WG14 meeting is the week of > >> April 29; comments before then would be particularly useful if that's > >> possible. > >> > >> best, > >> Peter > >> > >> C pointer values are typically represented at runtime as simple > >> concrete numeric values, but mainstream compilers routinely exploit > >> information about the "provenance" of pointers to reason that they > >> cannot alias, and hence to justify optimisations. This is > >> long-standing practice, but exactly what it means (what programmers > >> can rely on, and what provenance-based alias analysis is allowed to > >> do), has never been nailed down. That's what the proposal does. > >> > >> > >> The basic idea is to associate a *provenance* with every pointer > >> value, identifying the original storage instance (or allocation, in > >> other words) that the pointer is derived from. In more detail: > >> > >> - We take abstract-machine pointer values to be pairs (pi,a), adding a > >> provenance pi, either @i where i is a storage instance ID, or the > >> *empty* provenance, to their concrete address a. > >> > >> - On every storage i
Re: C provenance semantics proposal
On 17/04/2019, Richard Biener wrote: > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell > wrote: >> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law wrote: >> > >> > On 4/2/19 2:11 AM, Peter Sewell wrote: >> > > Dear all, >> > > >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we >> > > (the WG14 C memory object model study group) now >> > > have a detailed proposal for pointer provenance semantics, refining >> > > the "provenance not via integers (PNVI)" model presented there. >> > > This will be discussed at the ISO WG14 C standards committee at the >> > > end of April, and comments from the GCC community before then would >> > > be very welcome. The proposal reconciles the needs of existing code >> > > and the behaviour of existing compilers as well as we can, but it >> > > doesn't >> > > exactly match any of the latter, so we'd especially like to know >> > > whether >> > > it would be feasible to implement - our hope is that it would only >> > > require >> > > minor changes. It's presented in three documents: >> > > >> > > N2362 Moving to a provenance-aware memory model for C: proposal for >> > > C2x >> > > by the memory object model study group. Jens Gustedt, Peter Sewell, >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. >> > > This introduces the proposal and gives the proposed change to the >> > > standard >> > > text, presented as change-highlighted pages of the standard >> > > (though one might want to read the N2363 examples before going into >> > > that). >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf >> > > >> > > N2363 C provenance semantics: examples. >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, >> > > Martin Uecker. >> > > This explains the proposal and its design choices with discussion of >> > > a >> > > series of examples. >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf >> > > >> > > N2364 C provenance semantics: detailed semantics. >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes. >> > > This gives a detailed mathematical semantics for the proposal >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf >> > > >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an >> > > executable version of the semantics, with a web interface that >> > > allows one to explore and visualise the behaviour of small test >> > > programs, stepping through and seeing the abstract-machine >> > > memory state including provenance information. N2363 compares >> > > the results of this for the example programs with gcc, clang, and icc >> > > results, though the tests are really intended as tests of the >> > > semantics >> > > rather than compiler tests, so one has to interpret this with care. >> > THanks. I just noticed this came up in EuroLLVM as well.Getting >> > some standards clarity in this space would be good. >> > >> > Richi is in the best position to cover for GCC, but I suspect he's >> > buried with gcc-9 issues as we approach the upcoming release. >> > Hopefully >> > he'll have time to review this once crunch time has past. I think more >> > than anything sanity checking the proposal's requirements vs what can >> > be >> > reasonably implmemented is most important at this stage. >> >> Indeed. We talked with him at the GNU cauldron, without uncovering >> any serious problems, but more detailed review from an implementability >> point of view would be great. For the UB mailing list we just made >> a brief plain-text summary of the proposal (leaving out all the examples >> and standards diff, and glossing over some details). I'll paste that >> in below in case it's helpful. The next WG14 meeting is the week of >> April 29; comments before then would be particularly useful if that's >> possible. >> >> best, >> Peter >> >> C pointer values are typically represented at runtime as simple >> concrete numeric values, but mainstream compilers routinely exploit >> information about the "provenance" of pointers to reason that they >> cannot alias, and hence to justify optimisations. This is >> long-standing practice, but exactly what it means (what programmers >> can rely on, and what provenance-based alias analysis is allowed to >> do), has never been nailed down. That's what the proposal does. >> >> >> The basic idea is to associate a *provenance* with every pointer >> value, identifying the original storage instance (or allocation, in >> other words) that the pointer is derived from. In more detail: >> >> - We take abstract-machine pointer values to be pairs (pi,a), adding a >> provenance pi, either @i where i is a storage instance ID, or the >> *empty* provenance, to their concrete address a. >> >> - On every storage instance creation (of objects with static, thread, >> automatic, and allocated storage duration), the abstract machine >> nondeterministically chooses a fresh storage instance ID i (unique >> across the entire execution), and the resulting pointer
Re: C provenance semantics proposal
On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell wrote: > > On Fri, 12 Apr 2019 at 15:51, Jeff Law wrote: > > > > On 4/2/19 2:11 AM, Peter Sewell wrote: > > > Dear all, > > > > > > continuing the discussion from the 2018 GNU Tools Cauldron, we > > > (the WG14 C memory object model study group) now > > > have a detailed proposal for pointer provenance semantics, refining > > > the "provenance not via integers (PNVI)" model presented there. > > > This will be discussed at the ISO WG14 C standards committee at the > > > end of April, and comments from the GCC community before then would > > > be very welcome. The proposal reconciles the needs of existing code > > > and the behaviour of existing compilers as well as we can, but it doesn't > > > exactly match any of the latter, so we'd especially like to know whether > > > it would be feasible to implement - our hope is that it would only require > > > minor changes. It's presented in three documents: > > > > > > N2362 Moving to a provenance-aware memory model for C: proposal for C2x > > > by the memory object model study group. Jens Gustedt, Peter Sewell, > > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. > > > This introduces the proposal and gives the proposed change to the standard > > > text, presented as change-highlighted pages of the standard > > > (though one might want to read the N2363 examples before going into that). > > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf > > > > > > N2363 C provenance semantics: examples. > > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin > > > Uecker. > > > This explains the proposal and its design choices with discussion of a > > > series of examples. > > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf > > > > > > N2364 C provenance semantics: detailed semantics. > > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes. > > > This gives a detailed mathematical semantics for the proposal > > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf > > > > > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an > > > executable version of the semantics, with a web interface that > > > allows one to explore and visualise the behaviour of small test > > > programs, stepping through and seeing the abstract-machine > > > memory state including provenance information. N2363 compares > > > the results of this for the example programs with gcc, clang, and icc > > > results, though the tests are really intended as tests of the semantics > > > rather than compiler tests, so one has to interpret this with care. > > THanks. I just noticed this came up in EuroLLVM as well.Getting > > some standards clarity in this space would be good. > > > > Richi is in the best position to cover for GCC, but I suspect he's > > buried with gcc-9 issues as we approach the upcoming release. Hopefully > > he'll have time to review this once crunch time has past. I think more > > than anything sanity checking the proposal's requirements vs what can be > > reasonably implmemented is most important at this stage. > > Indeed. We talked with him at the GNU cauldron, without uncovering > any serious problems, but more detailed review from an implementability > point of view would be great. For the UB mailing list we just made > a brief plain-text summary of the proposal (leaving out all the examples > and standards diff, and glossing over some details). I'll paste that > in below in case it's helpful. The next WG14 meeting is the week of > April 29; comments before then would be particularly useful if that's > possible. > > best, > Peter > > C pointer values are typically represented at runtime as simple > concrete numeric values, but mainstream compilers routinely exploit > information about the "provenance" of pointers to reason that they > cannot alias, and hence to justify optimisations. This is > long-standing practice, but exactly what it means (what programmers > can rely on, and what provenance-based alias analysis is allowed to > do), has never been nailed down. That's what the proposal does. > > > The basic idea is to associate a *provenance* with every pointer > value, identifying the original storage instance (or allocation, in > other words) that the pointer is derived from. In more detail: > > - We take abstract-machine pointer values to be pairs (pi,a), adding a > provenance pi, either @i where i is a storage instance ID, or the > *empty* provenance, to their concrete address a. > > - On every storage instance creation (of objects with static, thread, > automatic, and allocated storage duration), the abstract machine > nondeterministically chooses a fresh storage instance ID i (unique > across the entire execution), and the resulting pointer value > carries that single storage instance ID as its provenance @i. > > - Provenance is preserved by pointer arithmetic that adds or subtracts > an integer to a pointer. > > - At any acc