On 2016-06-08 17:37, Martin Sebor wrote:
On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:
Hi!

If a variable of type _Bool contains something other than 0 or 1,
its use amounts to UB in gcc and clang. There are a couple of examples in
[1] ([2] is also interesting).

[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

----------------------------------------------------------------------
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
----------------------------------------------------------------------

Results:

----------------------------------------------------------------------
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
----------------------------------------------------------------------

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)

Similar example with long double:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522

It seems that padding in _Bool is treated as permanently unspecified. Is
this behavior intentional? What's the theory behind it?

One possible explanation is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat of a stretch to
conclude from it that the values of padding bits cannot be specified
even with explicit assignment.

Another possible approach is to refer to Committee Response for Question
1 in DR 260 which reads: "Values may have any bit-pattern that validly
represents them and the implementation is free to move between alternate
representations (for example, it may normalize pointers, floating-point
representations etc.). [...] the actual bit-pattern may change without
direct action of the program."

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).

Are there notes from these discussions or something?

I would hesitate to call it
consensus but I think it would be fair to say that the opinion
of the vocal majority is that implementations aren't intended
to spontaneously change valid (i.e., determinate) representations
of objects in the absence of an access to the value of the object.

Thanks for the info. IMHO this part of DR 260 has even more serious consequences than the part about pointer provenance. It effectively prohibits manual byte-by-byte (or any non-atomic) copying of objects for types like long double. If an implementation decides to normalize the value in a variable partway through such a copy, it will see an inconsistent representation, e.g. a trap representation. It's a sure way to get total garbage. I don't know if allowing implementations to normalize values is useful, but the current language in DR 260 allows too much.
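
To make the byte-by-byte copying concrete, here is a minimal sketch in C. The helper names copy_bytes and copy_preserves_value are made up for illustration and do not come from any of the reports linked above. Between iterations of the loop the destination holds a mix of old and new bytes, which is exactly the state in which a normalizing implementation would see an inconsistent representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes one at a time through unsigned char,
   the way a hand-written memcpy would. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];   /* after this store, *dst is only partially updated */
}

/* Hypothetical helper: copy a long double bytewise and report whether
   the copy compares equal to the original (returns 1 if it does). */
int copy_preserves_value(long double x)
{
    long double y;
    copy_bytes(&y, &x, sizeof y);
    return y == x;
}
```

The sketch compares values rather than whole representations, so it says nothing about what happens to padding bytes. It also assumes x is not a NaN, since a NaN never compares equal to itself.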

As for valid/determinate representation, this is another place where the distinction between a value and a representation is worth stressing. Uninitialized variables are a clear case -- both the value and the representation are indeterminate. But what if we set some part of the representation of a variable? It doesn't yet have a determinate value, but we want the part that we have set to be preserved. Another interesting example is a pointer after free() -- its representation is more or less determinate but its value is indeterminate.
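
A small sketch of the value side of this distinction, using _Bool from the example at the top of the thread. The helper name store_as_value is hypothetical. A store through the type itself goes via the value and is converted to 0 or 1 (C11, 6.3.1.2), so nothing of the bit pattern of 123 can survive it. Only a store through a character type, as in the test program above, sets the representation directly.

```c
#include <assert.h>

/* Hypothetical helper: store v through the _Bool type and read it back. */
int store_as_value(int v)
{
    _Bool b = v;               /* 0 stays 0, any other value becomes 1 */
    assert(b == 0 || b == 1);  /* holds for every int argument */
    return b;
}
```

With this helper, store_as_value(123) returns 1, in contrast to the 123 that the unoptimized build printed after the write through (char *).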

--
Alexander Cherepanov
