Re: _Bool and trap representations

2016-06-20 Thread Martin Sebor

On 06/17/2016 02:19 PM, Alexander Cherepanov wrote:

On 2016-06-15 17:15, Martin Sebor wrote:

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


Notes from discussions during committee meetings are in the meeting
minutes that are posted along with other committee documents on the
public site.   Those that relate to aspects of defect reports are
usually captured in the Committee Discussion and Committee Response
to the DR.  Other than that, committee discussions that take place
on the committee mailing list (such as the recent ones on this topic)
are archived for reference of committee members (unlike C++, the C
archives are not intended to be open to the public).


So it seems the discussion you referred to is not public; that's
unfortunate.

And to clarify what you wrote about stability of valid representations,
is padding expected to be stable when it's not specifically set? I.e. is
the following optimization supposed to be conforming or not?


The standard says that padding bytes take on unspecified values
after an assignment to a structure, so the program isn't strictly
conforming because its output depends on such a value.  At the
same time, unspecified values are, in general, expected to be
stable.  But I think in this case it's only because of the
standard's limited vocabulary.  The distinction between
an unspecified and undefined value was meant to allow for the
latter to be a trap representation.  But lately an undefined
value has also come to mean potentially unstable (some people
call such values "wobbly").  If the standard adopted a term
for unspecified values that need not be stable (say
wobbly) I would expect the committee to be comfortable applying
it to padding bits and allowing the code in the example to produce
two different lines.  But this is one of the topics still under
active discussion.

Martin



Source code:

--
#include <stdio.h>

int main(int argc, char **argv)
{
   (void)argv;

   struct { char c; int i; } s = {0, 0};

   printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
   printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
5
7
--

gcc version: gcc (GCC) 7.0.0 20160616 (experimental)

Of course, clang does essentially the same but the testcase is a bit
more involved (I can post it if somebody is interested). OTOH clang is
more predictable in this area because rules for dealing with undefined
values in llvm are more-or-less documented --
http://llvm.org/docs/LangRef.html#undefined-values .

I don't see gcc treating padding in long double as indeterminate in the
same way as in structs but clang seems to treat them equally.





Re: _Bool and trap representations

2016-06-17 Thread Alexander Cherepanov

On 2016-06-15 17:15, Martin Sebor wrote:

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


Notes from discussions during committee meetings are in the meeting
minutes that are posted along with other committee documents on the
public site.   Those that relate to aspects of defect reports are
usually captured in the Committee Discussion and Committee Response
to the DR.  Other than that, committee discussions that take place
on the committee mailing list (such as the recent ones on this topic)
are archived for reference of committee members (unlike C++, the C
archives are not intended to be open to the public).


So it seems the discussion you referred to is not public; that's
unfortunate.


And to clarify what you wrote about stability of valid representations, 
is padding expected to be stable when it's not specifically set? I.e. is 
the following optimization supposed to be conforming or not?


Source code:

--
#include <stdio.h>

int main(int argc, char **argv)
{
  (void)argv;

  struct { char c; int i; } s = {0, 0};

  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
5
7
--

gcc version: gcc (GCC) 7.0.0 20160616 (experimental)

Of course, clang does essentially the same but the testcase is a bit 
more involved (I can post it if somebody is interested). OTOH clang is 
more predictable in this area because rules for dealing with undefined 
values in llvm are more-or-less documented -- 
http://llvm.org/docs/LangRef.html#undefined-values .


I don't see gcc treating padding in long double as indeterminate in the 
same way as in structs but clang seems to treat them equally.


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-15 Thread Martin Sebor

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


Notes from discussions during committee meetings are in the meeting
minutes that are posted along with other committee documents on the
public site.  Those that relate to aspects of defect reports are
usually captured in the Committee Discussion and Committee Response
to the DR.  Other than that, committee discussions that take place
on the committee mailing list (such as the recent ones on this topic)
are archived for reference of committee members (unlike C++, the C
archives are not intended to be open to the public).

Martin


Re: _Bool and trap representations

2016-06-15 Thread Richard Biener
On Wed, 15 Jun 2016, Bernd Edlinger wrote:

> Hi,
> 
> I modified Alexander's test case a bit, and found something
> unexpected, which looks like a GCC-BUG to me:
> 
> cat test.c
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> 
> int main()
> {
>   long double d0, d;
> 
>   memcpy(&d0,
> "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0);
> 
> //  d = d0;
>   memcpy(&d, &d0, sizeof d0);
> // if (memcmp(&d, &d0, sizeof d))
> //   abort();
> 
>   printf("d = %Lf\n", d);
> 
>   for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
>     printf("%02x ", *--p);
>   printf("\n");
> }
> // EOF
> 
> gcc -O3 test.c && ./a.out 
> d = 0.00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 
> but, when I un-comment the memcmp the test case shows the expected
> result:
> d = 0.00
> 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00 
> 
> gcc-Version 7.0.0 20160613 (experimental) (GCC) 
> 
> 
> Wouldn't you agree that, if we use memcpy, it should be possible
> to preserve denormalized or otherwise invalid bit patterns?
> And why should the memcmp have an influence on the memcpy?

I think this is PR71522 which I already fixed on trunk.  memcmp
takes the address of the var and by doing that inhibits the broken
transform.

Richard.


Re: _Bool and trap representations

2016-06-15 Thread Bernd Edlinger
Hi,

I modified Alexander's test case a bit, and found something
unexpected, which looks like a GCC-BUG to me:

cat test.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
  long double d0, d;

  memcpy(&d0,
"\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00", sizeof d0);

//  d = d0;
  memcpy(&d, &d0, sizeof d0);
// if (memcmp(&d, &d0, sizeof d))
//   abort();

  printf("d = %Lf\n", d);

  for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
    printf("%02x ", *--p);
  printf("\n");
}
// EOF

gcc -O3 test.c && ./a.out 
d = 0.00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

but, when I un-comment the memcmp the test case shows the expected
result:
d = 0.00
00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00 

gcc-Version 7.0.0 20160613 (experimental) (GCC) 


Wouldn't you agree that, if we use memcpy, it should be possible
to preserve denormalized or otherwise invalid bit patterns?
And why should the memcmp have an influence on the memcpy?

Bernd.

Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-14 00:13, Joseph Myers wrote:

On Tue, 14 Jun 2016, Alexander Cherepanov wrote:


The problem is that parts of representations of two different ordinary values
can form a trap representation.


Oh, you're talking about normalizing the destination rather than the
source of the copy?


Yes.

I don't see this problem with a current gcc so the problem is 
hypothetical AFAICT.


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-13 Thread Joseph Myers
On Tue, 14 Jun 2016, Alexander Cherepanov wrote:

> The problem is that parts of representations of two different ordinary values
> can form a trap representation.

Oh, you're talking about normalizing the destination rather than the 
source of the copy?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-13 22:51, Joseph Myers wrote:

On Mon, 13 Jun 2016, Alexander Cherepanov wrote:


Thanks for the info. IMHO this part of DR 260 has even more serious
consequences than the part about pointer provenance. It effectively prohibits
manual byte-by-byte (or any non-atomic) copying of objects for types like long
double. If an implementation decides to normalize a value in a variable during
copying it will see an inconsistent representation, e.g. a trap
representation. It's a sure way to get total garbage. I don't know if allowing


No, that's not the case; even if representations can change during
byte-by-byte copying, such copying of long double values is *still* safe.
All long double values for x86 long double have exactly one valid
representation in the value bits, and if the padding bits change during
copying it doesn't matter; it's only representations that are already trap
representations (unnormals, pseudo-* etc.) that might be interpreted
inconsistently.


The problem is that parts of representations of two different ordinary 
values can form a trap representation.


Suppose x = 1.0 and y = 0.0, i.e. they have the following 
representations (from high bytes to low bytes):


        padding      sign        int & frac
                     & exp
   |---------------| |---| |---------------------|
x: 00 00 00 00 00 00 3f ff 80 00 00 00 00 00 00 00
y: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Suppose that we copy from x to y byte-by-byte starting from high bytes. 
And suppose the normalization kicks in after copying 8 bytes. We have 
already copied the sign and the exponent but haven't yet overwritten the 
'Integer' bit of Significand so we have the following representation:


z: 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00

This is an unnormal and current gcc normalization converts it into 0.0 
throwing the exponent away. Copying the remaining 8 bytes leads to a 
pseudo-denormal:


w: 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00

But this is already a minor detail.

The code to see how gcc normalizes 'z':

--
#include <stdio.h>
#include <string.h>

int main()
{
  long double d0, d;

  memcpy(&d0,
"\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00",
sizeof d0);

  d = d0;

  printf("d = %Lf\n", d);
  for (unsigned char *p = (unsigned char *)&d + sizeof d; p > (unsigned char *)&d;)
    printf("%02x ", *--p);
  printf("\n");
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
d = 0.00
00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00
--

gcc version: gcc (GCC) 7.0.0 20160613 (experimental)

--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-13 Thread Joseph Myers
On Mon, 13 Jun 2016, Alexander Cherepanov wrote:

> Thanks for the info. IMHO this part of DR 260 has even more serious
> consequences than the part about pointer provenance. It effectively prohibits
> manual byte-by-byte (or any non-atomic) copying of objects for types like long
> double. If an implementation decides to normalize a value in a variable during
> copying it will see an inconsistent representation, e.g. a trap
> representation. It's a sure way to get total garbage. I don't know if allowing

No, that's not the case; even if representations can change during 
byte-by-byte copying, such copying of long double values is *still* safe.  
All long double values for x86 long double have exactly one valid 
representation in the value bits, and if the padding bits change during 
copying it doesn't matter; it's only representations that are already trap 
representations (unnormals, pseudo-* etc.) that might be interpreted 
inconsistently.

Likewise for IBM long double; the only cases of more than one 
representation for a value are (a) a zero low part might have either sign 
(in which case an arbitrary choice of bytes from the two representations 
still gives a valid representation of the same value) and (b) the low part 
of a NaN is of no significance.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-08 17:37, Martin Sebor wrote:

On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:

Hi!

If a variable of type _Bool contains something different from 0 and 1
its use amounts to UB in gcc and clang. There are a couple of examples in
[1] ([2] is also interesting).

[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)


Similar example with long double:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522


It seems that padding in _Bool is treated as permanently unspecified. Is
this behavior intentional? What's the theory behind it?

One possible explanation is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat a stretch to
conclude from it that the values of padding bits cannot be specified
even with explicit assignment.

Another possible approach is to refer to Committee Response for Question
1 in DR 260 which reads: "Values may have any bit-pattern that validly
represents them and the implementation is free to move between alternate
representations (for example, it may normalize pointers, floating-point
representations etc.). [...] the actual bit-pattern may change without
direct action of the program."


There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


I would hesitate to call it
consensus but I think it would be fair to say that the opinion
of the vocal majority is that implementations aren't intended
to spontaneously change valid (i.e., determinate) representations
of objects in the absence of an access to the value of the object.


Thanks for the info. IMHO this part of DR 260 has even more serious 
consequences than the part about pointer provenance. It effectively 
prohibits manual byte-by-byte (or any non-atomic) copying of objects for 
types like long double. If an implementation decides to normalize a 
value in a variable during copying it will see an inconsistent 
representation, e.g. a trap representation. It's a sure way to get total 
garbage. I don't know if allowing implementations to normalize values is 
useful but the current language in DR 260 allows too much.


As for valid/determinate representations, this is another place where 
the distinction between a value and a representation is worth stressing. 
Uninitialized variables are a clear case -- both their values and 
representations are indeterminate. But what if we set some part of the 
representation of a variable? It doesn't yet have a determinate value 
but we want the part that we have set to be preserved. Another 
interesting example is a pointer after free() -- its representation is 
kinda determinate but its value is indeterminate.


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-08 Thread Martin Sebor

On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:

Hi!

If a variable of type _Bool contains something different from 0 and 1
its use amounts to UB in gcc and clang. There are a couple of examples in
[1] ([2] is also interesting).

[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)

It seems that padding in _Bool is treated as permanently unspecified. Is
this behavior intentional? What's the theory behind it?

One possible explanation is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat a stretch to
conclude from it that the values of padding bits cannot be specified
even with explicit assignment.

Another possible approach is to refer to Committee Response for Question
1 in DR 260 which reads: "Values may have any bit-pattern that validly
represents them and the implementation is free to move between alternate
representations (for example, it may normalize pointers, floating-point
representations etc.). [...] the actual bit-pattern may change without
direct action of the program."


There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).  I would hesitate to call it
consensus but I think it would be fair to say that the opinion
of the vocal majority is that implementations aren't intended
to spontaneously change valid (i.e., determinate) representations
of objects in the absence of an access to the value of the object.
There are also two special cases that apply to the code example
above: accesses via an lvalue of a character type (which has no
padding bits and so no trap representation), and objects that
could not have been declared to have register storage because
their address is taken (DR #338).  Those should be expected
to have a stable representation/bit pattern from one read
to the next.

Martin



Re: _Bool and trap representations

2016-06-08 Thread Richard Biener
On Wed, Jun 8, 2016 at 10:04 AM, Alexander Cherepanov
 wrote:
> On 2016-06-08 10:29, Richard Biener wrote:
>>
>> On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov
>>  wrote:
>
> [skip]
>>>
>>> But my question is about the following example:
>>>
>>> --
>>> #include <stdio.h>
>>>
>>> int main()
>>> {
>>>    _Bool b;
>>>    *(char *)&b = 123;
>>>    printf("%d\n", *(char *)&b);
>>> }
>>> --
>>>
>>> Results:
>>>
>>> --
>>> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
>>> 123
>>>
>>> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
>>> 1
>>> --
>
> [skip]
>>
>> Another explanation is that this is a bug.  It manifests itself at the
>> time
>> we re-write 'b' into SSA form, disregarding the fact that we access it
>> via a type that while matching in size does not match in precision.
>
>
> Oh, that's a much more boring outcome :-)

;-)

>> Can you open a bugreport?
>
>
> Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

Patch posted / in testing.

Richard.

> --
> Alexander Cherepanov


Re: _Bool and trap representations

2016-06-08 Thread Alexander Cherepanov

On 2016-06-08 10:29, Richard Biener wrote:

On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov
 wrote:

[skip]

But my question is about the following example:

--
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

[skip]

Another explanation is that this is a bug.  It manifests itself at the time
we re-write 'b' into SSA form, disregarding the fact that we access it
via a type that while matching in size does not match in precision.


Oh, that's a much more boring outcome :-)


Can you open a bugreport?


Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-08 Thread Richard Biener
On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov
 wrote:
> Hi!
>
> If a variable of type _Bool contains something different from 0 and 1 its
> use amounts to UB in gcc and clang. There are a couple of examples in [1]
> ([2] is also interesting).
>
> [1] https://github.com/TrustInSoft/tis-interpreter/issues/39
> [2] https://github.com/TrustInSoft/tis-interpreter/issues/100
>
> But my question is about the following example:
>
> --
> #include <stdio.h>
>
> int main()
> {
>   _Bool b;
>   *(char *)&b = 123;
>   printf("%d\n", *(char *)&b);
> }
> --
>
> Results:
>
> --
> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
> 123
>
> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
> 1
> --
>
> gcc version: gcc (GCC) 7.0.0 20160604 (experimental)
>
> It seems that padding in _Bool is treated as permanently unspecified. Is
> this behavior intentional? What's the theory behind it?
>
> One possible explanation is C11, 6.2.6.2p1, which reads: "The values of any
> padding bits are unspecified." But it's somewhat a stretch to conclude from
> it that the values of padding bits cannot be specified even with explicit
> assignment.
>
> Another possible approach is to refer to Committee Response for Question 1
> in DR 260 which reads: "Values may have any bit-pattern that validly
> represents them and the implementation is free to move between alternate
> representations (for example, it may normalize pointers, floating-point
> representations etc.). [...] the actual bit-pattern may change without
> direct action of the program."
>
> Is similar behavior expected from other types of padding (padding in long
> double, padding bytes/bits in structs/unions) in the future? Maybe even
> normalization of pointers (randomly aligning misaligned pointers)?

Another explanation is that this is a bug.  It manifests itself at the time
we re-write 'b' into SSA form, disregarding the fact that we access it
via a type that while matching in size does not match in precision.

Can you open a bugreport?

Thanks,
Richard.

> --
> Alexander Cherepanov