_Bool and trap representations

2016-06-07 Thread Alexander Cherepanov

Hi!

If a variable of type _Bool contains something different from 0 and 1
its use amounts to UB in gcc and clang. There are a couple of examples in
[1] ([2] is also interesting).


[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include <stdio.h>

int main()
{
  _Bool b;
  *(char *)&b = 123;
  printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)

It seems that padding in _Bool is treated as permanently unspecified. Is 
this behavior intentional? What's the theory behind it?


One possible explanation is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat a stretch to 
conclude from it that the values of padding bits cannot be specified 
even with explicit assignment.


Another possible approach is to refer to Committee Response for Question 
1 in DR 260 which reads: "Values may have any bit-pattern that validly 
represents them and the implementation is free to move between alternate 
representations (for example, it may normalize pointers, floating-point 
representations etc.). [...] the actual bit-pattern may change without 
direct action of the program."


Is similar behavior expected from other types of padding (padding in 
long double, padding bytes/bits in structs/unions) in the future? Maybe 
even normalization of pointers (randomly aligning misaligned pointers)?


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-08 Thread Alexander Cherepanov

On 2016-06-08 10:29, Richard Biener wrote:

On Wed, Jun 8, 2016 at 8:36 AM, Alexander Cherepanov wrote:

[skip]

But my question is about the following example:

--
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

[skip]

Another explanation is that this is a bug.  It manifests itself at the time
we re-write 'b' into SSA form, disregarding the fact that we access it
via a type that while matching in size does not match in precision.


Oh, that's a much more boring outcome :-)


Can you open a bugreport?


Sure, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71452 .

--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-08 17:37, Martin Sebor wrote:

On 06/08/2016 12:36 AM, Alexander Cherepanov wrote:

Hi!

If a variable of type _Bool contains something different from 0 and 1
its use amounts to UB in gcc and clang. There are a couple of examples in
[1] ([2] is also interesting).

[1] https://github.com/TrustInSoft/tis-interpreter/issues/39
[2] https://github.com/TrustInSoft/tis-interpreter/issues/100

But my question is about the following example:

--
#include <stdio.h>

int main()
{
   _Bool b;
   *(char *)&b = 123;
   printf("%d\n", *(char *)&b);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
123

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
1
--

gcc version: gcc (GCC) 7.0.0 20160604 (experimental)


Similar example with long double:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71522


It seems that padding in _Bool is treated as permanently unspecified. Is
this behavior intentional? What's the theory behind it?

One possible explanation is C11, 6.2.6.2p1, which reads: "The values of
any padding bits are unspecified." But it's somewhat a stretch to
conclude from it that the values of padding bits cannot be specified
even with explicit assignment.

Another possible approach is to refer to Committee Response for Question
1 in DR 260 which reads: "Values may have any bit-pattern that validly
represents them and the implementation is free to move between alternate
representations (for example, it may normalize pointers, floating-point
representations etc.). [...] the actual bit-pattern may change without
direct action of the program."


There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


I would hesitate to call it
consensus but I think it would be fair to say that the opinion
of the vocal majority is that implementations aren't intended
to spontaneously change valid (i.e., determinate) representations
of objects in the absence of an access to the value of the object.


Thanks for the info. IMHO this part of DR 260 has even more serious 
consequences than the part about pointer provenance. It effectively 
prohibits manual byte-by-byte (or any non-atomic) copying of objects for 
types like long double. If an implementation decides to normalize a 
value in a variable during copying it will see an inconsistent 
representation, e.g. a trap representation. It's a sure way to get total 
garbage. I don't know if allowing implementations to normalize values is 
useful but the current language in DR 260 allows too much.


As for valid/determinate representations, this is another place where
the distinction between a value and a representation is worth stressing.
Uninitialized variables are a clear case -- both their values and their
representations are indeterminate. But what if we set some part of the
representation of a variable? It doesn't yet have a determinate value
but we want the part that we have set to be preserved. Another
interesting example is a pointer after free() -- its representation is
kinda determinate but its value is indeterminate.


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-13 22:51, Joseph Myers wrote:

On Mon, 13 Jun 2016, Alexander Cherepanov wrote:


Thanks for the info. IMHO this part of DR 260 has even more serious
consequences than the part about pointer provenance. It effectively prohibits
manual byte-by-byte (or any non-atomic) copying of objects for types like long
double. If an implementation decides to normalize a value in a variable during
copying it will see an inconsistent representation, e.g. a trap
representation. It's a sure way to get total garbage. I don't know if allowing


No, that's not the case; even if representations can change during
byte-by-byte copying, such copying of long double values is *still* safe.
All long double values for x86 long double have exactly one valid
representation in the value bits, and if the padding bits change during
copying it doesn't matter; it's only representations that are already trap
representations (unnormals, pseudo-* etc.) that might be interpreted
inconsistently.


The problem is that parts of representations of two different ordinary 
values can form a trap representation.


Suppose x = 1.0 and y = 0.0, i.e. they have the following 
representations (from high bytes to low bytes):


   padding           sign  int & frac
                     & exp
   |---------------| |---| |---------------------|
x: 00 00 00 00 00 00 3f ff 80 00 00 00 00 00 00 00
y: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Suppose that we copy from x to y byte-by-byte starting from high bytes. 
And suppose the normalization kicks in after copying 8 bytes. We have 
already copied the sign and the exponent but haven't yet overwritten the 
'Integer' bit of Significand so we have the following representation:


z: 00 00 00 00 00 00 3f ff 00 00 00 00 00 00 00 00

This is an unnormal and current gcc normalization converts it into 0.0 
throwing the exponent away. Copying the remaining 8 bytes leads to a 
pseudo-denormal:


w: 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00

But this is already a minor detail.

The code to see how gcc normalizes 'z':

--
#include <stdio.h>
#include <string.h>

int main()
{
  long double d0, d;

  memcpy(&d0,
         "\x00\x00\x00\x00\x00\x00\x00\x00\xff\x3f\x00\x00\x00\x00\x00\x00",
         sizeof d0);

  d = d0;

  printf("d = %Lf\n", d);
  for (unsigned char *p = (unsigned char *)&d + sizeof d;
       p > (unsigned char *)&d;)
    printf("%02x ", *--p);
  printf("\n");
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
d = 0.00
00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00
--

gcc version: gcc (GCC) 7.0.0 20160613 (experimental)

--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-13 Thread Alexander Cherepanov

On 2016-06-14 00:13, Joseph Myers wrote:

On Tue, 14 Jun 2016, Alexander Cherepanov wrote:


The problem is that parts of representations of two different ordinary values
can form a trap representation.


Oh, you're talking about normalizing the destination rather than the
source of the copy?


Yes.

I don't see this problem with a current gcc so the problem is 
hypothetical AFAICT.


--
Alexander Cherepanov


Re: _Bool and trap representations

2016-06-17 Thread Alexander Cherepanov

On 2016-06-15 17:15, Martin Sebor wrote:

There has been quite a bit of discussion among the committee on
this subject lately (the last part is the subject of DR #451,
though it's discussed in the context of uninitialized objects
with indeterminate values).


Are there notes from these discussions or something?


Notes from discussions during committee meetings are in the meeting
minutes that are posted along with other committee documents on the
public site.   Those that relate to aspects of defect reports are
usually captured in the Committee Discussion and Committee Response
to the DR.  Other than that, committee discussions that take place
on the committee mailing list (such as the recent ones on this topic)
are archived for reference of committee members (unlike C++, the C
archives are not intended to be open to the public).


So it seems the discussion you referred to is not public; that's
unfortunate.


And to clarify what you wrote about stability of valid representations, 
is padding expected to be stable when it's not specifically set? I.e. is 
the following optimization supposed to be conforming or not?


Source code:

--
#include <stdio.h>

int main(int argc, char **argv)
{
  (void)argv;

  struct { char c; int i; } s = {0, 0};

  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 5);
  printf("%d\n", argc ? ((unsigned char *)&s)[1] : 7);
}
--

Results:

--
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
5
7
--

gcc version: gcc (GCC) 7.0.0 20160616 (experimental)

Of course, clang does essentially the same but the testcase is a bit 
more involved (I can post it if somebody is interested). OTOH clang is 
more predictable in this area because rules for dealing with undefined 
values in llvm are more-or-less documented -- 
http://llvm.org/docs/LangRef.html#undefined-values .


I don't see gcc treating padding in long double as indeterminate in the 
same way as in structs but clang seems to treat them equally.


--
Alexander Cherepanov


Aliasing of arrays

2016-11-30 Thread Alexander Cherepanov

Hi!

Pascal Cuoq communicated to me the following example:

int ar1(int (*p)[3], int (*q)[3])
{
  (*p)[0] = 1;
  (*q)[1] = 2;
  return (*p)[0];
}

gcc versions 4.9.2 and 7.0.0 20161129 optimize it with -O2 on the
premise that elements with different indices don't alias:


0000000000000000 <ar1>:
   0:   c7 47 0c 01 00 00 00    movl   $0x1,0xc(%rdi)
   7:   b8 01 00 00 00          mov    $0x1,%eax
   c:   c7 46 10 02 00 00 00    movl   $0x2,0x10(%rsi)
  13:   c3                      retq

That's fine. But then I expect that gcc will also assume that arrays of 
different known lengths don't alias, i.e. that gcc will optimize this 
example:


int ar2(int (*p)[8], int (*q)[7]) {
  (*p)[3] = 1;
  (*q)[3] = 2;
  return (*p)[3];
}

But this is not optimized:

0000000000000020 <ar2>:
  20:   c7 47 0c 01 00 00 00    movl   $0x1,0xc(%rdi)
  27:   c7 46 0c 02 00 00 00    movl   $0x2,0xc(%rsi)
  2e:   8b 47 0c                mov    0xc(%rdi),%eax

Is this behavior fully intentional, is the first example optimized too 
aggressively, is an optimization missed in the second example, or is the 
situation more complex?


--
Alexander Cherepanov