Re: Questions about C as used/implemented in practice
Many thanks for these responses. We'll want to discuss some of them further, but, before we do, survey responses from any other GCC developers would be very welcome, especially from those who know the analysis and optimisation code. (So far GCC is relatively under-represented in our data; we have more responses from Clang and OS kernel developers).

The survey is here: http://goo.gl/iFhYIr
It consists of 15 short questions about the sequential behaviour of C memory and pointers.

thanks,
Peter

On 25 April 2015 at 22:42, Joseph Myers jos...@codesourcery.com wrote:

On Fri, 17 Apr 2015, Peter Sewell wrote:

[1/15] How predictable are reads from padding bytes?

If you zero all bytes of a struct and then write some of its members, do reads of the padding return zero? (e.g. for a bytewise CAS or hash of the struct, or to know that no security-relevant data has leaked into them.)

The padding may not be zero (both in practice, and as specified by C11 6.2.6.1#6). A plausible sequence of optimizations is to apply SRA, replacing the memset with a sequence of member assignments (discarding assignments to padding) in order to do so. To avoid leaks, allow hashing etc., padding should be explicitly named.

[2/15] Uninitialised values

Is reading an uninitialised variable or struct member (with a current mainstream compiler): (This might either be due to a bug or be intentional, e.g. when copying a partially initialised struct, or to output, hash, or set some bits of a value that may have been partially initialised.)

Going to give arbitrary, unstable values (that is, the variable assigned from the uninitialised variable itself acts as uninitialised and having no consistent value). (Quite possibly subsequent transformations will have the effect of undefined behavior.)
Inconsistency of observed values is an inevitable consequence of transformations PHI (undefined, X) -> X (useful in practice for programs that don't actually use uninitialised variables, but where the compiler can't see that).

[3/15] Can one use pointer arithmetic between separately allocated C objects?

If you calculate an offset between two separately allocated C memory objects (e.g. malloc'd regions or global or local variables) by pointer subtraction, can you make a usable pointer to the second by adding the offset to the address of the first?

This is not safe in practice even if the alignment is sufficient (and if the alignment of the type is less than its size, obviously such a subtraction can't possibly work even with a naive compiler).

[4/15] Is pointer equality sensitive to their original allocation sites?

For two pointers derived from the addresses of two separate allocations, will equality testing (with ==) of them just compare their runtime values, or might it take their original allocations into account and assume that they do not alias, even if they happen to have the same runtime value? (for current mainstream compilers)

It is not safe to assume that equality has a stable result in such cases (either in practice, or in my view of the standard as discussed in bug 61502).

[5/15] Can pointer values be copied indirectly?

Can you make a usable copy of a pointer by copying its representation bytes with code that indirectly computes the identity function on them, e.g. writing the pointer value to a file and then reading it back, and using compression or encryption on the way?

Yes, it is valid to copy any object that way (of course, the original pointer must still be valid at the time it is read back in). It is not, however, valid or safe to manufacture a pointer value out of thin air by, for example, generating random bytes and seeing if the representation happens to compare equal to that of a pointer. See DR#260.
Practical safety may depend on whether the compiler can see through how the pointer representation was generated.

[6/15] Pointer comparison at different types

Can one do == comparison between pointers to objects of different types (e.g. pointers to int, float, and different struct types)?

Such a comparison violates the constraints on equality operators (C11 6.5.9#2). If you use conversions to compatible types or pointers to void, it can only be expected to be safe if you restrict yourself to cases where 6.3.2.3 defines the value resulting from the conversion (aliasing rules are based on the limitations on when pointer conversions are defined, not just on 6.5#7, and comparisons can get optimised in practice based on those rules).

[7/15] Pointer comparison across different allocations

Can one do comparison between pointers to separately allocated objects?

This is likely to work in practice (for e.g. implementing functions like memmove) although not permitted by ISO C.

[8/15] Pointer values after lifetime end

Can you inspect (e.g. by comparing with ==) the value of a pointer to an object after the object itself has been free'd or its scope has
Re: Questions about C as used/implemented in practice
On Fri, 17 Apr 2015, Peter Sewell wrote:

[1/15] How predictable are reads from padding bytes?

If you zero all bytes of a struct and then write some of its members, do reads of the padding return zero? (e.g. for a bytewise CAS or hash of the struct, or to know that no security-relevant data has leaked into them.)

The padding may not be zero (both in practice, and as specified by C11 6.2.6.1#6). A plausible sequence of optimizations is to apply SRA, replacing the memset with a sequence of member assignments (discarding assignments to padding) in order to do so. To avoid leaks, allow hashing etc., padding should be explicitly named.

[2/15] Uninitialised values

Is reading an uninitialised variable or struct member (with a current mainstream compiler): (This might either be due to a bug or be intentional, e.g. when copying a partially initialised struct, or to output, hash, or set some bits of a value that may have been partially initialised.)

Going to give arbitrary, unstable values (that is, the variable assigned from the uninitialised variable itself acts as uninitialised and having no consistent value). (Quite possibly subsequent transformations will have the effect of undefined behavior.) Inconsistency of observed values is an inevitable consequence of transformations PHI (undefined, X) -> X (useful in practice for programs that don't actually use uninitialised variables, but where the compiler can't see that).

[3/15] Can one use pointer arithmetic between separately allocated C objects?

If you calculate an offset between two separately allocated C memory objects (e.g. malloc'd regions or global or local variables) by pointer subtraction, can you make a usable pointer to the second by adding the offset to the address of the first?

This is not safe in practice even if the alignment is sufficient (and if the alignment of the type is less than its size, obviously such a subtraction can't possibly work even with a naive compiler).
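For question [1/15], the advice that padding should be explicitly named can be sketched roughly as follows. This is only an illustration, not code from the thread: `struct record`, `hash_bytes`, and `record_hash` are hypothetical names, FNV-1a stands in for whatever bytewise hash is actually wanted, and the layout assumes `uint32_t` needs at most 4-byte alignment (the static assertion checks that no implicit padding is left).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type. The pad member names the bytes that would
   otherwise be implicit padding between tag and value, so they are
   ordinary member bytes rather than padding. */
struct record {
    uint8_t  tag;
    uint8_t  pad[3];   /* explicitly named padding */
    uint32_t value;
};

/* Fails to compile if the implementation adds implicit padding anyway. */
_Static_assert(sizeof(struct record) == 8, "unexpected implicit padding");

/* FNV-1a over an object representation (illustrative hash only). */
uint32_t hash_bytes(const void *p, size_t n)
{
    const unsigned char *b = p;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* Zero the whole object, set the meaningful members, hash every byte.
   Because pad is a named member, storing to tag or value is not expected
   to disturb the zeros written to it by memset. */
uint32_t record_hash(uint8_t tag, uint32_t value)
{
    struct record r;
    memset(&r, 0, sizeof r);
    r.tag = tag;
    r.value = value;
    return hash_bytes(&r, sizeof r);
}
```

With this layout, two records holding the same `tag` and `value` should hash identically; with implicit padding that would not be something to rely on, per the answer above.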
[4/15] Is pointer equality sensitive to their original allocation sites?

For two pointers derived from the addresses of two separate allocations, will equality testing (with ==) of them just compare their runtime values, or might it take their original allocations into account and assume that they do not alias, even if they happen to have the same runtime value? (for current mainstream compilers)

It is not safe to assume that equality has a stable result in such cases (either in practice, or in my view of the standard as discussed in bug 61502).

[5/15] Can pointer values be copied indirectly?

Can you make a usable copy of a pointer by copying its representation bytes with code that indirectly computes the identity function on them, e.g. writing the pointer value to a file and then reading it back, and using compression or encryption on the way?

Yes, it is valid to copy any object that way (of course, the original pointer must still be valid at the time it is read back in). It is not, however, valid or safe to manufacture a pointer value out of thin air by, for example, generating random bytes and seeing if the representation happens to compare equal to that of a pointer. See DR#260. Practical safety may depend on whether the compiler can see through how the pointer representation was generated.

[6/15] Pointer comparison at different types

Can one do == comparison between pointers to objects of different types (e.g. pointers to int, float, and different struct types)?

Such a comparison violates the constraints on equality operators (C11 6.5.9#2). If you use conversions to compatible types or pointers to void, it can only be expected to be safe if you restrict yourself to cases where 6.3.2.3 defines the value resulting from the conversion (aliasing rules are based on the limitations on when pointer conversions are defined, not just on 6.5#7, and comparisons can get optimised in practice based on those rules).
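The kind of indirect copy asked about in [5/15] might look like the sketch below. The function name `copy_pointer_indirectly` and the XOR "cipher" are assumptions made purely for illustration; the point is only that the bytes go through a transformation and its inverse, so the net effect on the representation is the identity. As the answer notes, the original pointer must still be valid when the copy is used, and how well this holds up in practice may depend on what the compiler can see.

```c
#include <stddef.h>
#include <string.h>

/* Copies a pointer by passing its representation bytes through an XOR
   mask and back again (a stand-in for encryption or compression). */
int *copy_pointer_indirectly(int *p, unsigned char key)
{
    unsigned char buf[sizeof p];
    int *q;

    memcpy(buf, &p, sizeof p);            /* pointer -> bytes */
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] ^= key;                    /* "encrypt" */
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] ^= key;                    /* "decrypt" */
    memcpy(&q, buf, sizeof q);            /* bytes -> pointer */
    return q;
}
```

This is deliberately different from generating arbitrary bytes and hoping they match a live pointer, which the answer above says is neither valid nor safe.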
[7/15] Pointer comparison across different allocations

Can one do comparison between pointers to separately allocated objects?

This is likely to work in practice (for e.g. implementing functions like memmove) although not permitted by ISO C.

[8/15] Pointer values after lifetime end

Can you inspect (e.g. by comparing with ==) the value of a pointer to an object after the object itself has been free'd or its scope has ended?

Such a comparison may not give meaningful or consistent results (although the consequences are likely to be bounded in practice).

[9/15] Pointer arithmetic

Can you (transiently) construct an out-of-bounds pointer value (e.g. before the beginning of an array, or more than one-past its end) by pointer arithmetic, so long as later arithmetic makes it in-bounds before it is used to access memory?

This is not safe; compilers may optimise based on pointers being within bounds. In some cases, it's possible such code might not even link, depending
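The memmove-style use of pointer comparison mentioned under [7/15] could be sketched as below. `move_bytes` is a hypothetical name, and this is not how any particular C library implements memmove; it only shows where the relational comparison enters. ISO C defines `<` and `>` on pointers only when both point into the same object (or one past its end), so calling this with two separate allocations relies on the implementation behaving as the answer says it is likely to.

```c
#include <stddef.h>

/* Overlap-safe byte copy that chooses its direction by comparing the
   two pointers. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Defined by ISO C only if d and s point into the same object;
       for separately allocated objects this is the "likely to work in
       practice" case discussed above. */
    if (d < s) {
        for (size_t i = 0; i < n; i++)    /* copy forwards */
            d[i] = s[i];
    } else if (d > s) {
        for (size_t i = n; i > 0; i--)    /* copy backwards */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

When both pointers refer to the same array, as in an overlapping shift within one buffer, the comparison is well defined and the direction choice keeps the copy from overwriting bytes it has yet to read.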
Questions about C as used/implemented in practice
Dear gcc list,

we are trying to clarify what behaviour of C implementations is actually relied upon in modern practice, and what behaviour is guaranteed by current mainstream implementations (these are quite different from the ISO standards, and may differ in different contexts).

Focussing on the sequential behaviour of memory operations, we've collected a short survey of 15 questions about C:

http://goo.gl/iFhYIr

Your answers to these would be very helpful, especially if you can speak authoritatively about what gcc does (it's difficult for us to directly investigate the emergent properties of the combination of optimisations in a production compiler).

This continues a research project at the University of Cambridge; in earlier work (with Batty, Owens, and Sarkar) we addressed the C/C++11 concurrency model, which resulted in fixes to the ISO standards and supports work on compiler testing (by Zappa Nardelli, Morisset, and Pawan).

many thanks,
Kayvan Memarian and Peter Sewell
Re: Questions about C as used/implemented in practice
On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

Dear gcc list, we are trying to clarify what behaviour of C implementations is actually relied upon in modern practice, and what behaviour is guaranteed by current mainstream implementations (these are quite different from the ISO standards, and may differ in different contexts).

I’m not sure what you mean by “guaranteed”. I suspect what the GCC team will say is guaranteed is “what the standard says”.

If by “guaranteed” you mean the behavior that happens to be implemented in a particular version of the compiler, that may well be different, as you said. But it’s also not particularly meaningful, because it is subject to change at any time subject to the constraints of the standard, and is likely to be different among different versions, and for that matter among different target architectures and of course optimization settings.

paul
Re: Questions about C as used/implemented in practice
On 17 April 2015 at 15:19, paul_kon...@dell.com wrote:

On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

Dear gcc list, we are trying to clarify what behaviour of C implementations is actually relied upon in modern practice, and what behaviour is guaranteed by current mainstream implementations (these are quite different from the ISO standards, and may differ in different contexts).

I’m not sure what you mean by “guaranteed”. I suspect what the GCC team will say is guaranteed is “what the standard says”.

If that's really true, that will be interesting, but there may be areas where (a) current implementation behaviour is stronger than what the ISO standards require, and (b) important code relies on that behaviour to such an extent that it becomes pragmatically infeasible to change it. Such cases are part of what we're trying to discover here. There are also cases where the ISO standards are unclear or internally inconsistent.

If by “guaranteed” you mean the behavior that happens to be implemented in a particular version of the compiler, that may well be different, as you said. But it’s also not particularly meaningful, because it is subject to change at any time subject to the constraints of the standard, and is likely to be different among different versions, and for that matter among different target architectures and of course optimization settings.

Some amount of variation has to be allowed, of course - in fact, what we'd like to clarify is really the envelope of allowable variation, and that will have to be parametric on at least some optimisation settings.

paul
Re: Questions about C as used/implemented in practice
On 17 April 2015 at 17:03, mse...@redhat.com wrote:

On 04/17/2015 09:01 AM, Peter Sewell wrote:

On 17 April 2015 at 15:19, paul_kon...@dell.com wrote:

On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

Dear gcc list, we are trying to clarify what behaviour of C implementations is actually relied upon in modern practice, and what behaviour is guaranteed by current mainstream implementations (these are quite different from the ISO standards, and may differ in different contexts).

I’m not sure what you mean by “guaranteed”. I suspect what the GCC team will say is guaranteed is “what the standard says”.

If that's really true, that will be interesting, but there may be areas where (a) current implementation behaviour is stronger than what the ISO standards require, and (b) important code relies on that behaviour to such an extent that it becomes pragmatically infeasible to change it. Such cases are part of what we're trying to discover here. There are also cases where the ISO standards are unclear or internally inconsistent.

Implementations can and often do provide stronger guarantees than the standards require. When they do, they must be documented in order to be safely relied on. This is termed implementation-defined behavior in standards.

The cases where the ISO standard explicitly identifies implementation-defined behaviour are generally unproblematic. The cases we're asking about, on the other hand, are typically cases which ISO declares to be undefined behaviour (sometimes for historical reasons relating to now-obsolete implementations) but where some code does depend on particular implementation behaviour. We are trying to identify and bound those cases.

Standards may be unclear to casual readers but they must be consistent and unambiguous. When they're not it's a defect that should be raised against them.
Yes, that's true - and we have in the past worked with the C++ and C standards committees, to fix inconsistencies in the concurrency model. But more than that, standards (including any implementation-specific documentation) and common practice have to be sufficiently in sync that the two work together: the former should give strong enough guarantees to support normal usage, and implementations should be sound with respect to them. For some aspects of C, we are currently quite some way from that.

If by “guaranteed” you mean the behavior that happens to be implemented in a particular version of the compiler, that may well be different, as you said. But it’s also not particularly meaningful, because it is subject to change at any time subject to the constraints of the standard, and is likely to be different among different versions, and for that matter among different target architectures and of course optimization settings.

Some amount of variation has to be allowed, of course - in fact, what we'd like to clarify is really the envelope of allowable variation, and that will have to be parametric on at least some optimisation settings.

All the questions in the survey that can be answered are answered without ambiguity in the C standard (either as well-defined behavior - 4, 5, 11, 12, 15, unspecified - 1, 13, or undefined - 2, 3, 7, 8, 9, 10, 14).

We are really not asking about what the ISO standard says, but rather about what can be and what is relied upon in practice. (That said, our reading of the standard differs on several of those points.)

Peter

There are no optimization options that affect the answers.

Martin

paul
Re: Questions about C as used/implemented in practice
On 04/17/2015 09:01 AM, Peter Sewell wrote:

On 17 April 2015 at 15:19, paul_kon...@dell.com wrote:

On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

Dear gcc list, we are trying to clarify what behaviour of C implementations is actually relied upon in modern practice, and what behaviour is guaranteed by current mainstream implementations (these are quite different from the ISO standards, and may differ in different contexts).

I’m not sure what you mean by “guaranteed”. I suspect what the GCC team will say is guaranteed is “what the standard says”.

If that's really true, that will be interesting, but there may be areas where (a) current implementation behaviour is stronger than what the ISO standards require, and (b) important code relies on that behaviour to such an extent that it becomes pragmatically infeasible to change it. Such cases are part of what we're trying to discover here. There are also cases where the ISO standards are unclear or internally inconsistent.

Implementations can and often do provide stronger guarantees than the standards require. When they do, they must be documented in order to be safely relied on. This is termed implementation-defined behavior in standards.

Standards may be unclear to casual readers but they must be consistent and unambiguous. When they're not it's a defect that should be raised against them.

If by “guaranteed” you mean the behavior that happens to be implemented in a particular version of the compiler, that may well be different, as you said. But it’s also not particularly meaningful, because it is subject to change at any time subject to the constraints of the standard, and is likely to be different among different versions, and for that matter among different target architectures and of course optimization settings.
Some amount of variation has to be allowed, of course - in fact, what we'd like to clarify is really the envelope of allowable variation, and that will have to be parametric on at least some optimisation settings.

All the questions in the survey that can be answered are answered without ambiguity in the C standard (either as well-defined behavior - 4, 5, 11, 12, 15, unspecified - 1, 13, or undefined - 2, 3, 7, 8, 9, 10, 14).

There are no optimization options that affect the answers.

Martin

paul