Re: Questions about C as used/implemented in practice

2015-05-25 Thread Peter Sewell
Many thanks for these responses.  We'll want to discuss some of them
further, but, before we do, survey responses from any other GCC
developers would be very welcome, especially from those who know the
analysis and optimisation code.   (So far GCC is relatively
under-represented in our data; we have more responses from Clang and
OS kernel developers). The survey is here:

  http://goo.gl/iFhYIr

It consists of 15 short questions about the sequential behaviour of C
memory and pointers.

thanks,
Peter



On 25 April 2015 at 22:42, Joseph Myers jos...@codesourcery.com wrote:
 On Fri, 17 Apr 2015, Peter Sewell wrote:

 [1/15] How predictable are reads from padding bytes?
 If you zero all bytes of a struct and then write some of its members, do
 reads of the padding return zero? (e.g. for a bytewise CAS or hash of
 the struct, or to know that no security-relevant data has leaked into
 them.)

 The padding may not be zero (both in practice, and as specified by C11
 6.2.6.1#6).  A plausible sequence of optimizations is to apply SRA,
 replacing the memset with a sequence of member assignments (discarding
 assignments to padding) in order to do so.  To avoid leaks, allow hashing
 etc., padding should be explicitly named.

 [2/15] Uninitialised values
 Is reading an uninitialised variable or struct member (with a current
 mainstream compiler):
 (This might either be due to a bug or be intentional, e.g. when copying
 a partially initialised struct, or to output, hash, or set some bits of
 a value that may have been partially initialised.)

 Going to give arbitrary, unstable values (that is, the variable assigned
 from the uninitialised variable itself acts as uninitialised and having no
 consistent value).  (Quite possibly subsequent transformations will have
 the effect of undefined behavior.)

 Inconsistency of observed values is an inevitable consequence of
 transformations PHI (undefined, X) -> X (useful in practice for programs
 that don't actually use uninitialised variables, but where the compiler
 can't see that).

 [3/15] Can one use pointer arithmetic between separately allocated C
 objects?
 If you calculate an offset between two separately allocated C memory
 objects (e.g. malloc'd regions or global or local variables) by pointer
 subtraction, can you make a usable pointer to the second by adding the
 offset to the address of the first?

 This is not safe in practice even if the alignment is sufficient (and if
 the alignment of the type is less than its size, obviously such a
 subtraction can't possibly work even with a naive compiler).

 [4/15] Is pointer equality sensitive to their original allocation sites?
 For two pointers derived from the addresses of two separate allocations,
 will equality testing (with ==) of them just compare their runtime
 values, or might it take their original allocations into account and
 assume that they do not alias, even if they happen to have the same
 runtime value? (for current mainstream compilers)

 It is not safe to assume that equality has a stable result in such cases
 (either in practice, or in my view of the standard as discussed in bug
 61502).

 [5/15] Can pointer values be copied indirectly?
 Can you make a usable copy of a pointer by copying its representation
 bytes with code that indirectly computes the identity function on them,
 e.g. writing the pointer value to a file and then reading it back, and
 using compression or encryption on the way?

 Yes, it is valid to copy any object that way (of course, the original
 pointer must still be valid at the time it is read back in).

 It is not, however, valid or safe to manufacture a pointer value out of
 thin air by, for example, generating random bytes and seeing if the
 representation happens to compare equal to that of a pointer.  See DR#260.
 Practical safety may depend on whether the compiler can see through how
 the pointer representation was generated.

 [6/15] Pointer comparison at different types
 Can one do == comparison between pointers to objects of different types
 (e.g. pointers to int, float, and different struct types)?

 Such a comparison violates the constraints on equality operators (C11
 6.5.9#2).  If you use conversions to compatible types or pointers to void,
 it can only be expected to be safe if you restrict yourself to cases where
 6.3.2.3 defines the value resulting from the conversion (aliasing rules
 are based on the limitations on when pointer conversions are defined, not
 just on 6.5#7, and comparisons can get optimised in practice based on
 those rules).

 [7/15] Pointer comparison across different allocations
 Can one do < comparison between pointers to separately allocated
 objects?

 This is likely to work in practice (for e.g. implementing functions like
 memmove) although not permitted by ISO C.

 [8/15] Pointer values after lifetime end
 Can you inspect (e.g. by comparing with ==) the value of a pointer to an
 object after the object itself has been free'd or its scope has 

Re: Questions about C as used/implemented in practice

2015-04-25 Thread Joseph Myers
On Fri, 17 Apr 2015, Peter Sewell wrote:

 [1/15] How predictable are reads from padding bytes?
 If you zero all bytes of a struct and then write some of its members, do 
 reads of the padding return zero? (e.g. for a bytewise CAS or hash of 
 the struct, or to know that no security-relevant data has leaked into 
 them.)

The padding may not be zero (both in practice, and as specified by C11 
6.2.6.1#6).  A plausible sequence of optimizations is to apply SRA, 
replacing the memset with a sequence of member assignments (discarding 
assignments to padding) in order to do so.  To avoid leaks, allow hashing 
etc., padding should be explicitly named.
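
As a minimal sketch of what "explicitly named" padding could look like (the struct, its field names and the FNV-1a hash are illustrative assumptions, not anything from the survey or from GCC): the bytes that would otherwise be padding are declared as a member, so zeroing and member-wise copies cover them like any other field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the three bytes that would otherwise be unnamed
   padding after `tag` are declared as a real member. */
struct record {
    uint8_t  tag;
    uint8_t  pad[3];   /* explicit padding, always kept at zero */
    uint32_t value;
};

/* A bytewise hash is only meaningful if no unnamed padding remains. */
static_assert(sizeof(struct record) == 8, "struct record has unnamed padding");

/* FNV-1a over the object representation (assumed hash, for illustration). */
uint32_t record_hash(const struct record *r)
{
    const unsigned char *p = (const unsigned char *)r;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof *r; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Zero every byte, then set the members that carry data. */
struct record record_make(uint8_t tag, uint32_t value)
{
    struct record r;
    memset(&r, 0, sizeof r);
    r.tag = tag;
    r.value = value;
    return r;
}
```

With this layout, two records built from the same inputs should have identical representations, so memcmp and the hash agree.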

 [2/15] Uninitialised values
 Is reading an uninitialised variable or struct member (with a current 
 mainstream compiler):
 (This might either be due to a bug or be intentional, e.g. when copying 
 a partially initialised struct, or to output, hash, or set some bits of 
 a value that may have been partially initialised.)

Going to give arbitrary, unstable values (that is, the variable assigned 
from the uninitialised variable itself acts as uninitialised and having no 
consistent value).  (Quite possibly subsequent transformations will have 
the effect of undefined behavior.)

Inconsistency of observed values is an inevitable consequence of 
transformations PHI (undefined, X) -> X (useful in practice for programs 
that don't actually use uninitialised variables, but where the compiler 
can't see that).
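
To make the PHI case concrete, here is a small sketch (the function name is hypothetical). The risky shape is shown only in a comment; the compiled function assigns the variable on every path, so there is no undefined operand for the compiler to fold away.

```c
#include <stdbool.h>

/* Risky shape, shown only as a comment:
 *
 *     int x;
 *     if (flag)
 *         x = 1;
 *     return x;      // join point is PHI(undefined, 1)
 *
 * A compiler may simplify that PHI to 1, so a caller passing
 * flag == false can observe 1, or some other value, and not
 * necessarily the same value twice.
 */

/* Defined on every path: the join point is PHI(0, 1). */
int pick(bool flag)
{
    int x = 0;
    if (flag)
        x = 1;
    return x;
}
```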

 [3/15] Can one use pointer arithmetic between separately allocated C 
 objects?
 If you calculate an offset between two separately allocated C memory 
 objects (e.g. malloc'd regions or global or local variables) by pointer 
 subtraction, can you make a usable pointer to the second by adding the 
 offset to the address of the first?

This is not safe in practice even if the alignment is sufficient (and if 
the alignment of the type is less than its size, obviously such a 
subtraction can't possibly work even with a naive compiler).
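
One way to stay within what is safe, sketched under the assumption that the two regions can be placed in a single object (all names below are hypothetical): the offset is then computed and re-applied inside one array, which is ordinary in-bounds arithmetic.

```c
#include <stddef.h>

enum { HALF_LEN = 4 };

/* One object holding both regions. With two separate objects a and b,
   the expression (b - a) would already be undefined. */
static int storage[2 * HALF_LEN];

int *first_half(void)  { return storage; }
int *second_half(void) { return storage + HALF_LEN; }

/* Reach the second region from the first through a computed offset. */
int *second_via_offset(void)
{
    ptrdiff_t off = second_half() - first_half();  /* same array: defined */
    return first_half() + off;
}
```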

 [4/15] Is pointer equality sensitive to their original allocation sites?
 For two pointers derived from the addresses of two separate allocations, 
 will equality testing (with ==) of them just compare their runtime 
 values, or might it take their original allocations into account and 
 assume that they do not alias, even if they happen to have the same 
 runtime value? (for current mainstream compilers) 

It is not safe to assume that equality has a stable result in such cases 
(either in practice, or in my view of the standard as discussed in bug 
61502).
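
A sketch of the distinction, with hypothetical names: comparing a one-past-the-end pointer of one object with the start of another is the unstable case, while identifying an element by (object, index) only ever compares pointers to the starts of live objects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Unstable when p is one past the end of one object and q is the start
   of another: the addresses may coincide, yet a compiler may still
   decide the result from where each pointer came from. */
bool raw_equal(const int *p, const int *q)
{
    return p == q;
}

/* Assumed alternative: name an element by its array and index. */
struct elem_ref {
    const int *base;  /* start of the array the element belongs to */
    size_t     idx;   /* index within that array */
};

bool same_element(struct elem_ref x, struct elem_ref y)
{
    return x.base == y.base && x.idx == y.idx;
}
```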

 [5/15] Can pointer values be copied indirectly?
 Can you make a usable copy of a pointer by copying its representation 
 bytes with code that indirectly computes the identity function on them, 
 e.g. writing the pointer value to a file and then reading it back, and 
 using compression or encryption on the way?

Yes, it is valid to copy any object that way (of course, the original 
pointer must still be valid at the time it is read back in).

It is not, however, valid or safe to manufacture a pointer value out of 
thin air by, for example, generating random bytes and seeing if the 
representation happens to compare equal to that of a pointer.  See DR#260.  
Practical safety may depend on whether the compiler can see through how 
the pointer representation was generated.
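
The valid case can be sketched as follows. The XOR pass stands in for "compression or encryption on the way" and, like the function name, is an assumption made for illustration: the bytes are transformed and then restored, so the net effect is the identity function on the representation.

```c
#include <stddef.h>
#include <string.h>

/* Copy a pointer through its representation bytes. */
int *roundtrip(int *p)
{
    unsigned char bytes[sizeof p];
    int *q;

    memcpy(bytes, &p, sizeof p);

    for (size_t i = 0; i < sizeof bytes; i++)
        bytes[i] ^= 0x5a;            /* stand-in for "encrypt" */
    for (size_t i = 0; i < sizeof bytes; i++)
        bytes[i] ^= 0x5a;            /* stand-in for "decrypt" */

    memcpy(&q, bytes, sizeof q);
    return q;                        /* usable while *p is still live */
}
```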

 [6/15] Pointer comparison at different types
 Can one do == comparison between pointers to objects of different types 
 (e.g. pointers to int, float, and different struct types)?

Such a comparison violates the constraints on equality operators (C11 
6.5.9#2).  If you use conversions to compatible types or pointers to void, 
it can only be expected to be safe if you restrict yourself to cases where 
6.3.2.3 defines the value resulting from the conversion (aliasing rules 
are based on the limitations on when pointer conversions are defined, not 
just on 6.5#7, and comparisons can get optimised in practice based on 
those rules).
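
A small sketch of a comparison that stays inside the defined conversions (struct and function names are hypothetical): a pointer to a struct and a pointer to its first member are both converted to pointer to void before ==, rather than being compared at their own, incompatible types.

```c
#include <stdbool.h>

struct header {
    int   kind;
    float weight;
};

/* `h == m` would violate the constraint in 6.5.9#2 because the pointed-to
   types are incompatible. Converting both operands to pointer to void is
   permitted, and a pointer to a struct, suitably converted, points to its
   initial member. */
bool is_first_member(const struct header *h, const int *m)
{
    return (const void *)h == (const void *)m;
}
```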

 [7/15] Pointer comparison across different allocations
 Can one do < comparison between pointers to separately allocated 
 objects?

This is likely to work in practice (for e.g. implementing functions like 
memmove) although not permitted by ISO C.
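
For illustration, a memmove-like routine might pick its copy direction as below. This is a sketch, not how any particular C library does it; it assumes a flat address space in which converting pointers to uintptr_t preserves their order, and it compares the converted integers so that the ordering is implementation-defined rather than an ISO C violation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-style copy that tolerates overlapping regions. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* `d < s` on pointers into unrelated objects is not permitted by
       ISO C; the integer comparison relies on the implementation's
       pointer-to-integer mapping instead. */
    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)      /* copy forwards */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)      /* copy backwards */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```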

 [8/15] Pointer values after lifetime end
 Can you inspect (e.g. by comparing with ==) the value of a pointer to an 
 object after the object itself has been free'd or its scope has ended?

Such a comparison may not give meaningful or consistent results (although 
the consequences are likely to be bounded in practice).
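
One common way to avoid inspecting a stale pointer value at all, sketched with a hypothetical helper: overwrite the pointer at the moment the object is freed, so later checks compare against NULL rather than against a value whose object no longer exists.

```c
#include <stdlib.h>

/* Free the object and overwrite the caller's pointer. Storing a new
   value into the pointer is fine; reading the old value after free()
   is what the answer above warns about. */
void release(int **pp)
{
    free(*pp);
    *pp = NULL;
}
```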

 [9/15] Pointer arithmetic
 Can you (transiently) construct an out-of-bounds pointer value (e.g. 
 before the beginning of an array, or more than one-past its end) by 
 pointer arithmetic, so long as later arithmetic makes it in-bounds 
 before it is used to access memory?

This is not safe; compilers may optimise based on pointers being within 
bounds.  In some cases, it's possible such code might not even link, 
depending 

Questions about C as used/implemented in practice

2015-04-17 Thread Peter Sewell
Dear gcc list,

we are trying to clarify what behaviour of C implementations is
actually relied upon in modern practice, and what behaviour is
guaranteed by current mainstream implementations (these are quite
different from the ISO standards, and may differ in different
contexts).

Focussing on the sequential behaviour of memory operations, we've
collected a short survey of 15 questions about C:

  http://goo.gl/iFhYIr

Your answers to these would be very helpful, especially if you can
speak authoritatively about what gcc does (it's difficult for us to
directly investigate the emergent properties of the combination of
optimisations in a production compiler).

This continues a research project at the University of Cambridge; in
earlier work (with Batty, Owens, and Sarkar) we addressed the C/C++11
concurrency model, which resulted in fixes to the ISO standards and
supports work on compiler testing (by Zappa Nardelli, Morisset, and
Pawan).

many thanks,
Kayvan Memarian and Peter Sewell


Re: Questions about C as used/implemented in practice

2015-04-17 Thread Paul_Koning

 On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:
 
 Dear gcc list,
 
 we are trying to clarify what behaviour of C implementations is
 actually relied upon in modern practice, and what behaviour is
 guaranteed by current mainstream implementations (these are quite
 different from the ISO standards, and may differ in different
 contexts).

I’m not sure what you mean by “guaranteed”.

I suspect what the GCC team will say is guaranteed is “what the standard says”. 
 If by “guaranteed” you mean the behavior that happens to be implemented in a 
particular version of the compiler, that may well be different, as you said.  
But it’s also not particularly meaningful, because it is subject to change at 
any time subject to the constraints of the standard, and is likely to be 
different among different versions, and for that matter among different target 
architectures and of course optimization settings.

paul



Re: Questions about C as used/implemented in practice

2015-04-17 Thread Peter Sewell
On 17 April 2015 at 15:19,  paul_kon...@dell.com wrote:

 On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

 Dear gcc list,

 we are trying to clarify what behaviour of C implementations is
 actually relied upon in modern practice, and what behaviour is
 guaranteed by current mainstream implementations (these are quite
 different from the ISO standards, and may differ in different
 contexts).

 I’m not sure what you mean by “guaranteed”.

 I suspect what the GCC team will say is guaranteed is “what the standard 
 says”.

If that's really true, that will be interesting, but there may be
areas where (a) current implementation behaviour is stronger than what
the ISO standards require, and (b) important code relies on that
behaviour to such an extent that it becomes pragmatically infeasible
to change it.  Such cases are part of what we're trying to discover
here.  There are also cases where the ISO standards are unclear or
internally inconsistent.

  If by “guaranteed” you mean the behavior that happens to be implemented in a 
 particular version of the compiler, that may well be different, as you said.  
 But it’s also not particularly meaningful, because it is subject to change at 
 any time subject to the constraints of the standard, and is likely to be 
 different among different versions, and for that matter among different 
 target architectures and of course optimization settings.

Some amount of variation has to be allowed, of course - in fact, what
we'd like to clarify is really the envelope of allowable variation,
and that will have to be parametric on at least some optimisation
settings.

 paul



Re: Questions about C as used/implemented in practice

2015-04-17 Thread Peter Sewell
On 17 April 2015 at 17:03,  mse...@redhat.com wrote:
 On 04/17/2015 09:01 AM, Peter Sewell wrote:

 On 17 April 2015 at 15:19,  paul_kon...@dell.com wrote:


 On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk
 wrote:

 Dear gcc list,

 we are trying to clarify what behaviour of C implementations is
 actually relied upon in modern practice, and what behaviour is
 guaranteed by current mainstream implementations (these are quite
 different from the ISO standards, and may differ in different
 contexts).


 I’m not sure what you mean by “guaranteed”.

 I suspect what the GCC team will say is guaranteed is “what the standard
 says”.


 If that's really true, that will be interesting, but there may be
 areas where (a) current implementation behaviour is stronger than what
 the ISO standards require, and (b) important code relies on that
 behaviour to such an extent that it becomes pragmatically infeasible
 to change it.  Such cases are part of what we're trying to discover
 here.  There are also cases where the ISO standards are unclear or
 internally inconsistent.


 Implementations can and often do provide stronger guarantees than
 the standards require. When they do, they must be documented in order
 to be safely relied on.  This is termed implementation-defined
 behavior in standards.

The cases where the ISO standard explicitly identifies
implementation-defined behaviour are generally unproblematic.

The cases we're asking about, on the other hand, are typically cases
which ISO declares to be undefined behaviour (sometimes for historical
reasons relating to now-obsolete implementations) but where some code
does depend on particular implementation behaviour.  We are trying to
identify and bound those cases.

 Standards may be unclear to casual readers but they must be consistent
 and unambiguous.
 When they're not it's a defect that should be raised
 against them.

Yes, that's true - and we have in the past worked with the C++ and C
standards committees, to fix inconsistencies in the concurrency model.

But more than that, standards (including any implementation-specific
documentation) and common practice have to be sufficiently in sync
that the two work together:   the former should give strong enough
guarantees to support normal usage, and implementations should be
sound with respect to them.   For some aspects of C, we are currently
quite some way from that.


   If by “guaranteed” you mean the behavior that happens to be implemented
 in a particular version of the compiler, that may well be different, as you
 said.  But it’s also not particularly meaningful, because it is subject to
 change at any time subject to the constraints of the standard, and is likely
 to be different among different versions, and for that matter among
 different target architectures and of course optimization settings.


 Some amount of variation has to be allowed, of course - in fact, what
 we'd like to clarify is really the envelope of allowable variation,
 and that will have to be parametric on at least some optimisation
 settings.


 All the questions in the survey that can be answered are
 answered unambiguously in the C standard (either as
 well-defined behavior - 4, 5, 11, 12, 15, unspecified - 1, 13, or
 undefined - 2, 3, 7, 8, 9, 10, 14).

We are really not asking about what the ISO standard says, but rather
about what can be and what is relied upon in practice.   (That said,
our reading of the standard differs on several of those points.)

Peter


 There are no optimization
 options that affect the answers.
 Martin


  paul




Re: Questions about C as used/implemented in practice

2015-04-17 Thread msebor

On 04/17/2015 09:01 AM, Peter Sewell wrote:

On 17 April 2015 at 15:19,  paul_kon...@dell.com wrote:



On Apr 17, 2015, at 9:14 AM, Peter Sewell peter.sew...@cl.cam.ac.uk wrote:

Dear gcc list,

we are trying to clarify what behaviour of C implementations is
actually relied upon in modern practice, and what behaviour is
guaranteed by current mainstream implementations (these are quite
different from the ISO standards, and may differ in different
contexts).


I’m not sure what you mean by “guaranteed”.

I suspect what the GCC team will say is guaranteed is “what the standard says”.


If that's really true, that will be interesting, but there may be
areas where (a) current implementation behaviour is stronger than what
the ISO standards require, and (b) important code relies on that
behaviour to such an extent that it becomes pragmatically infeasible
to change it.  Such cases are part of what we're trying to discover
here.  There are also cases where the ISO standards are unclear or
internally inconsistent.


Implementations can and often do provide stronger guarantees than
the standards require. When they do, they must be documented in order
to be safely relied on. This is termed implementation-defined
behavior in standards.

Standards may be unclear to casual readers but they must be consistent
and unambiguous. When they're not it's a defect that should be raised
against them.




  If by “guaranteed” you mean the behavior that happens to be implemented in a 
particular version of the compiler, that may well be different, as you said.  
But it’s also not particularly meaningful, because it is subject to change at 
any time subject to the constraints of the standard, and is likely to be 
different among different versions, and for that matter among different target 
architectures and of course optimization settings.


Some amount of variation has to be allowed, of course - in fact, what
we'd like to clarify is really the envelope of allowable variation,
and that will have to be parametric on at least some optimisation
settings.


All the questions in the survey that can be answered are
answered unambiguously in the C standard (either as
well-defined behavior - 4, 5, 11, 12, 15, unspecified - 1, 13, or
undefined - 2, 3, 7, 8, 9, 10, 14). There are no optimization
options that affect the answers.

Martin




 paul