Re: _Bool and trap representations
On 06/17/2016 02:19 PM, Alexander Cherepanov wrote:
> On 2016-06-15 17:15, Martin Sebor wrote:
>>>> There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values).
>>>
>>> Are there notes from these discussions or something?
>>
>> Notes from discussions during committee meetings are in the meeting minutes that are posted along with other committee documents on the public site. Those that relate to aspects of defect reports are usually captured in the Committee Discussion and Committee Response to the DR. Other than that, committee discussions that take place on the committee mailing list (such as the recent ones on this topic) are archived for reference of committee members (unlike C++, the C archives are not intended to be open to the public).
>
> So it seems the discussion you referred to is not public, that's unfortunate. And to clarify what you wrote about stability of valid representations, is padding expected to be stable when it's not specifically set? I.e. is the following optimization supposed to be conforming or not?

The standard says that padding bytes take on unspecified values after an assignment to a structure, so the program isn't strictly conforming because its output depends on such a value. At the same time, unspecified values are, in general, expected to be stable. But I think in this case it's only because of the standard's limited vocabulary. The distinction between an unspecified and undefined value was meant to allow for the latter to be a trap representation. But lately an undefined value has also come to mean potentially unstable (some people call such values "wobbly"). If the standard adopted a term for unspecified values that need not be stable (say wobbly) I would expect the committee to be comfortable applying it to padding bits and allowing the code in the example to produce two different lines.

But this is one of the topics still under active discussion.

Martin

> Source code:
>
> --
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>   (void)argv;
>
>   struct { char c; int i; } s = {0, 0};
>
>   printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
>   printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
> }
> --
>
> Results:
>
> --
> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
> 5
> 7
> --
>
> gcc version: gcc (GCC) 7.0.0 20160616 (experimental)
>
> Of course, clang does essentially the same but the testcase is a bit more involved (I can post it if somebody is interested). OTOH clang is more predictable in this area because rules for dealing with undefined values in llvm are more-or-less documented -- http://llvm.org/docs/LangRef.html#undefined-values .
>
> I don't see gcc treating padding in long double as indeterminate in the same way as in structs but clang seems to treat them equally.
Re: _Bool and trap representations
On 2016-06-15 17:15, Martin Sebor wrote:
>>> There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values).
>>
>> Are there notes from these discussions or something?
>
> Notes from discussions during committee meetings are in the meeting minutes that are posted along with other committee documents on the public site. Those that relate to aspects of defect reports are usually captured in the Committee Discussion and Committee Response to the DR. Other than that, committee discussions that take place on the committee mailing list (such as the recent ones on this topic) are archived for reference of committee members (unlike C++, the C archives are not intended to be open to the public).

So it seems the discussion you referred to is not public, that's unfortunate. And to clarify what you wrote about stability of valid representations, is padding expected to be stable when it's not specifically set? I.e. is the following optimization supposed to be conforming or not?

Source code:

--
#include <stdio.h>

int main(int argc, char **argv)
{
  (void)argv;

  struct { char c; int i; } s = {0, 0};

  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
5
7
--

gcc version: gcc (GCC) 7.0.0 20160616 (experimental)

Of course, clang does essentially the same but the testcase is a bit more involved (I can post it if somebody is interested). OTOH clang is more predictable in this area because rules for dealing with undefined values in llvm are more-or-less documented -- http://llvm.org/docs/LangRef.html#undefined-values .

I don't see gcc treating padding in long double as indeterminate in the same way as in structs but clang seems to treat them equally.

--
Alexander Cherepanov
Re: _Bool and trap representations
>> There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values).
>
> Are there notes from these discussions or something?

Notes from discussions during committee meetings are in the meeting minutes that are posted along with other committee documents on the public site. Those that relate to aspects of defect reports are usually captured in the Committee Discussion and Committee Response to the DR. Other than that, committee discussions that take place on the committee mailing list (such as the recent ones on this topic) are archived for reference of committee members (unlike C++, the C archives are not intended to be open to the public).

Martin
Re: _Bool and trap representations
On Wed, 15 Jun 2016, Bernd Edlinger wrote:

> Hi,
>
> I modified Alexander's test case a bit, and found something
> unexpected, which looks like a GCC-BUG to me:
>
> cat test.c
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
>
> int main()
> {
>   long double d0, d;
>
>   memcpy(&d0, "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0);
>
>   // d = d0;
>   memcpy(&d, &d0, sizeof d0);
>   // if (memcmp(&d, &d0, sizeof d))
>   //   abort();
>
>   printf("d = %Lf\n", d);
>
>   for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
>     printf("%02x ", *--p);
>   printf("\n");
> }
> // EOF
>
> gcc -O3 test.c && ./a.out
> d = 0.00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> but, when I un-comment the memcmp the test case shows the expected
> result:
> d = 0.00
> 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00
>
> gcc-Version 7.0.0 20160613 (experimental) (GCC)
>
>
> Wouldn't you agree, that if we use memcpy it should be possible
> to preserve denormalized or otherwise invalid bit patterns?
> And why should the memcmp have an influence on the memcpy?

I think this is PR71522 which I already fixed on trunk. memcmp takes the address of the var and by doing that inhibits the broken transform.

Richard.
Re: _Bool and trap representations
Hi,

I modified Alexander's test case a bit, and found something unexpected, which looks like a GCC-BUG to me:

cat test.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
  long double d0, d;

  memcpy(&d0, "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0);

  // d = d0;
  memcpy(&d, &d0, sizeof d0);
  // if (memcmp(&d, &d0, sizeof d))
  //   abort();

  printf("d = %Lf\n", d);

  for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
    printf("%02x ", *--p);
  printf("\n");
}
// EOF

gcc -O3 test.c && ./a.out
d = 0.00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

but, when I un-comment the memcmp the test case shows the expected result:
d = 0.00
00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00

gcc-Version 7.0.0 20160613 (experimental) (GCC)

Wouldn't you agree that if we use memcpy it should be possible to preserve denormalized or otherwise invalid bit patterns? And why should the memcmp have an influence on the memcpy?

Bernd.
Re: _Bool and trap representations
On 2016-06-14 00:13, Joseph Myers wrote:
> On Tue, 14 Jun 2016, Alexander Cherepanov wrote:
>> The problem is that parts of representations of two different ordinary values can form a trap representation.
>
> Oh, you're talking about normalizing the destination rather than the source of the copy?

Yes. I don't see this problem with a current gcc so the problem is hypothetical AFAICT.

--
Alexander Cherepanov
Re: _Bool and trap representations
On Tue, 14 Jun 2016, Alexander Cherepanov wrote:

> The problem is that parts of representations of two different ordinary values
> can form a trap representation.

Oh, you're talking about normalizing the destination rather than the source of the copy?

--
Joseph S. Myers
jos...@codesourcery.com
Re: _Bool and trap representations
On 2016-06-13 22:51, Joseph Myers wrote:
> On Mon, 13 Jun 2016, Alexander Cherepanov wrote:
>> Thanks for the info. IMHO this part of DR 260 has even more serious consequences than the part about pointer provenance. It effectively prohibits manual byte-by-byte (or any non-atomic) copying of objects for types like long double. If an implementation decides to normalize a value in a variable during copying it will see an inconsistent representation, e.g. a trap representation. It's a sure way to get total garbage. I don't know if allowing
>
> No, that's not the case; even if representations can change during byte-by-byte copying, such copying of long double values is *still* safe. All long double values for x86 long double have exactly one valid representation in the value bits, and if the padding bits change during copying it doesn't matter; it's only representations that are already trap representations (unnormals, pseudo-* etc.) that might be interpreted inconsistently.

The problem is that parts of representations of two different ordinary values can form a trap representation.

Suppose x = 1.0 and y = 0.0, i.e. they have the following representations (from high bytes to low bytes):

        padding      sign          int & frac
                     & exp
   |---------------| |---| |---------------------|
x: 00 00 00 00 00 00 3f ff 80 00 00 00 00 00 00 00
y: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Suppose that we copy from x to y byte-by-byte starting from high bytes. And suppose the normalization kicks in after copying 8 bytes. We have already copied the sign and the exponent but haven't yet overwritten the 'Integer' bit of Significand so we have the following representation:

z: 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00

This is an unnormal and current gcc normalization converts it into 0.0 throwing the exponent away. Copying the remaining 8 bytes leads to a pseudo-denormal:

w: 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00

But this is already a minor detail.

The code to see how gcc normalizes 'z':

--
#include <stdio.h>
#include <string.h>

int main()
{
  long double d0, d;

  memcpy(&d0, "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0);
  d = d0;

  printf("d = %Lf\n", d);

  for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
    printf("%02x ", *--p);
  printf("\n");
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
d = 0.00
00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00
--

gcc version: gcc (GCC) 7.0.0 20160613 (experimental)

--
Alexander Cherepanov
Re: _Bool and trap representations
On Mon, 13 Jun 2016, Alexander Cherepanov wrote:

> Thanks for the info. IMHO this part of DR 260 has even more serious
> consequences than the part about pointer provenance. It effectively prohibits
> manual byte-by-byte (or any non-atomic) copying of objects for types like long
> double. If an implementation decides to normalize a value in a variable during
> copying it will see an inconsistent representation, e.g. a trap
> representation. It's a sure way to get total garbage. I don't know if allowing

No, that's not the case; even if representations can change during byte-by-byte copying, such copying of long double values is *still* safe. All long double values for x86 long double have exactly one valid representation in the value bits, and if the padding bits change during copying it doesn't matter; it's only representations that are already trap representations (unnormals, pseudo-* etc.) that might be interpreted inconsistently.

Likewise for IBM long double; the only cases of more than one representation for a value are (a) a zero low part might have either sign (in which case an arbitrary choice of bytes from the two representations still gives a valid representation of the same value) and (b) the low part of a NaN is of no significance.

--
Joseph S. Myers
jos...@codesourcery.com
Re: _Bool and trap representations
On 2016-06-08 17:37, Martin Sebor wrote:
> On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:
>> Hi!
>>
>> If a variable of type _Bool contains something different from 0 and 1 its use amounts to UB in gcc and clang. There are a couple of examples in [1] ([2] is also interesting).
>>
>> [1] https://github.com/TrustInSoft/tis-interpreter/issues/39
>> [2] https://github.com/TrustInSoft/tis-interpreter/issues/100
>>
>> But my question is about the following example:
>>
>> --
>> #include <stdio.h>
>>
>> int main()
>> {
>>   _Bool b;
>>   *(char *)&b = 123;
>>   printf("%d\n", *(char *)&b);
>> }
>> --
>>
>> Results:
>>
>> --
>> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
>> 123
>>
>> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
>> 1
>> --
>>
>> gcc version: gcc (GCC) 7.0.0 20160604 (experimental)

Similar example with long double: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522

>> It seems that padding in _Bool is treated as permanently unspecified. Is this behavior intentional? What's the theory behind it?
>>
>> One possible explanation is C11, 6.2.6.2p1, which reads: "The values of any padding bits are unspecified." But it's somewhat a stretch to conclude from it that the values of padding bits cannot be specified even with explicit assignment.
>>
>> Another possible approach is to refer to Committee Response for Question 1 in DR 260 which reads: "Values may have any bit-pattern that validly represents them and the implementation is free to move between alternate representations (for example, it may normalize pointers, floating-point representations etc.). [...] the actual bit-pattern may change without direct action of the program."
>
> There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values).

Are there notes from these discussions or something?

> I would hesitate to call it consensus but I think it would be fair to say that the opinion of the vocal majority is that implementations aren't intended to spontaneously change valid (i.e., determinate) representations of objects in the absence of an access to the value of the object.

Thanks for the info. IMHO this part of DR 260 has even more serious consequences than the part about pointer provenance. It effectively prohibits manual byte-by-byte (or any non-atomic) copying of objects for types like long double. If an implementation decides to normalize a value in a variable during copying it will see an inconsistent representation, e.g. a trap representation. It's a sure way to get total garbage. I don't know if allowing implementations to normalize values is useful but the current language in DR 260 allows too much.

As for valid/determinate representation, this is another place where the distinction between a value and a representation is worth stressing. An uninitialized variable is a clear case -- both its value and representation are indeterminate. But what if we set some part of the representation of a variable -- it doesn't yet have a determinate value but we want the part that we have set to be preserved. Another interesting example is a pointer after free() -- its representation is kinda determinate but its value is indeterminate.

--
Alexander Cherepanov
Re: _Bool and trap representations
On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:
> Hi!
>
> If a variable of type _Bool contains something different from 0 and 1 its use amounts to UB in gcc and clang. There are a couple of examples in [1] ([2] is also interesting).
>
> [1] https://github.com/TrustInSoft/tis-interpreter/issues/39
> [2] https://github.com/TrustInSoft/tis-interpreter/issues/100
>
> But my question is about the following example:
>
> --
> #include <stdio.h>
>
> int main()
> {
>   _Bool b;
>   *(char *)&b = 123;
>   printf("%d\n", *(char *)&b);
> }
> --
>
> Results:
>
> --
> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
> 123
>
> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
> 1
> --
>
> gcc version: gcc (GCC) 7.0.0 20160604 (experimental)
>
> It seems that padding in _Bool is treated as permanently unspecified. Is this behavior intentional? What's the theory behind it?
>
> One possible explanation is C11, 6.2.6.2p1, which reads: "The values of any padding bits are unspecified." But it's somewhat a stretch to conclude from it that the values of padding bits cannot be specified even with explicit assignment.
>
> Another possible approach is to refer to Committee Response for Question 1 in DR 260 which reads: "Values may have any bit-pattern that validly represents them and the implementation is free to move between alternate representations (for example, it may normalize pointers, floating-point representations etc.). [...] the actual bit-pattern may change without direct action of the program."

There has been quite a bit of discussion among the committee on this subject lately (the last part is the subject of DR #451, though it's discussed in the context of uninitialized objects with indeterminate values). I would hesitate to call it consensus but I think it would be fair to say that the opinion of the vocal majority is that implementations aren't intended to spontaneously change valid (i.e., determinate) representations of objects in the absence of an access to the value of the object.

There are also two special cases that apply to the code example above: accesses via an lvalue of a character type (which has no padding bits and so no trap representation), and objects that could not have been declared to have register storage because their address is taken (DR #338). Those should be expected to have a stable representation/bit pattern from one read to the next.

Martin
Re: _Bool and trap representations
On Wed, Jun 8, 2016 at 10:04 AM, Alexander Cherepanov wrote:
> On 2016-06-08 10:29, Richard Biener wrote:
>>
>> On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov wrote:
>
> [skip]
>>>
>>> But my question is about the following example:
>>>
>>> --
>>> #include <stdio.h>
>>>
>>> int main()
>>> {
>>>   _Bool b;
>>>   *(char *)&b = 123;
>>>   printf("%d\n", *(char *)&b);
>>> }
>>> --
>>>
>>> Results:
>>>
>>> --
>>> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
>>> 123
>>>
>>> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
>>> 1
>>> --
>
> [skip]
>>
>> Another explanation is that this is a bug. It manifests itself at the time
>> we re-write 'b' into SSA form, disregarding the fact that we access it
>> via a type that while matching in size does not match in precision.
>
> Oh, that's much more boring outcome:-)

;-)

>> Can you open a bugreport?
>
> Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

Patch posted / in testing.

Richard.

> --
> Alexander Cherepanov
Re: _Bool and trap representations
On 2016-06-08 10:29, Richard Biener wrote:
> On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov wrote:

[skip]

>> But my question is about the following example:
>>
>> --
>> #include <stdio.h>
>>
>> int main()
>> {
>>   _Bool b;
>>   *(char *)&b = 123;
>>   printf("%d\n", *(char *)&b);
>> }
>> --
>>
>> Results:
>>
>> --
>> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
>> 123
>>
>> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
>> 1
>> --

[skip]

> Another explanation is that this is a bug. It manifests itself at the time we re-write 'b' into SSA form, disregarding the fact that we access it via a type that while matching in size does not match in precision.

Oh, that's much more boring outcome:-)

> Can you open a bugreport?

Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

--
Alexander Cherepanov
Re: _Bool and trap representations
On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov wrote:
> Hi!
>
> If a variable of type _Bool contains something different from 0 and 1 its
> use amounts to UB in gcc and clang. There are a couple of examples in [1]
> ([2] is also interesting).
>
> [1] https://github.com/TrustInSoft/tis-interpreter/issues/39
> [2] https://github.com/TrustInSoft/tis-interpreter/issues/100
>
> But my question is about the following example:
>
> --
> #include <stdio.h>
>
> int main()
> {
>   _Bool b;
>   *(char *)&b = 123;
>   printf("%d\n", *(char *)&b);
> }
> --
>
> Results:
>
> --
> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
> 123
>
> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
> 1
> --
>
> gcc version: gcc (GCC) 7.0.0 20160604 (experimental)
>
> It seems that padding in _Bool is treated as permanently unspecified. Is
> this behavior intentional? What's the theory behind it?
>
> One possible explanation is C11, 6.2.6.2p1, which reads: "The values of any
> padding bits are unspecified." But it's somewhat a stretch to conclude from
> it that the values of padding bits cannot be specified even with explicit
> assignment.
>
> Another possible approach is to refer to Committee Response for Question 1
> in DR 260 which reads: "Values may have any bit-pattern that validly
> represents them and the implementation is free to move between alternate
> representations (for example, it may normalize pointers, floating-point
> representations etc.). [...] the actual bit-pattern may change without
> direct action of the program."
>
> Is similar behavior expected from other types of padding (padding in long
> double, padding bytes/bits in structs/unions) in the future? Maybe even
> normalization of pointers (randomly aligning misaligned pointers)?

Another explanation is that this is a bug. It manifests itself at the time we re-write 'b' into SSA form, disregarding the fact that we access it via a type that while matching in size does not match in precision.

Can you open a bugreport?

Thanks,
Richard.

> --
> Alexander Cherepanov