Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Hi,

On Sun, 10 Jan 2010, Dave Korn wrote:

> Ok. So if I had four ints, and I wanted to cast the pointers to char and
> compare them as 16 chars, that would be OK, because the chars would alias
> the ints; but in this case, where they started as chars and I cast them to
> ints, those ints don't alias against the original chars. Is that an
> accurate precis?

Yes, this is correct. Many people are surprised by that, as they often learned that 'char' is the catch-all escape from aliasing problems. This is only true in one direction (accessing anything as chars), but not in the other (accessing chars as something else).

Ciao,
Michael.
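To illustrate the permitted direction (accessing anything as chars), here is a minimal sketch that compares two arrays of four ints one byte at a time. The function name ints_equal_bytewise is made up for this example and does not come from the thread; the sketch only assumes that reading any object through an unsigned char lvalue is allowed, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: compare two arrays of four
   ints byte by byte. The objects are ints, but they are only ever read
   through unsigned char lvalues, which is the permitted direction. */
static int ints_equal_bytewise(const int *x, const int *y)
{
    const unsigned char *p = (const unsigned char *)x;
    const unsigned char *q = (const unsigned char *)y;
    size_t i;

    assert(x != NULL && y != NULL);
    for (i = 0; i < 4 * sizeof(int); i++) {
        if (p[i] != q[i])
            return 0;   /* first differing byte: not equal */
    }
    return 1;           /* all bytes matched */
}
```

The reverse (an unsigned char array read through an int or uint32_t lvalue) is the case the warning in this thread is about.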
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On Sun, 2010-01-10 at 15:46 +0100, Eric Botcazou wrote:

> > The aliasing rules treat char specially because char is a bit like a
> > poor man's void.
>
> Not symmetrically though, only for the type of the lvalue expression used
> to access the object (C99 6.5p7).

BTW in Ada, if one uses an address clause to overlay a 16-character string and an array of four 4-byte integers (both aliased), which is then accessed, what can we expect GCC-wise? Are we safe from aliasing-related optimizations?

FWIW the program below seems to work as expected.

Laurent

   procedure P is
      subtype String16 is String (1 .. 16);
      S16 : aliased String16;
      for S16'Alignment use Integer'Alignment;

      type Int4 is array (1 .. 4) of Integer;
      I4 : aliased Int4;
      for I4'Address use S16'Address;

      X : constant := 1 + 256 + 256*256 + 256*256*256;
   begin
      S16 := (others => Character'Val (1));
      if I4 /= (X, X, X, X) then
         raise Program_Error;
      end if;
   end P;
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
> BTW in Ada, if one uses an address clause to overlay a 16-character string
> and an array of four 4-byte integers (both aliased), which is then
> accessed, what can we expect GCC-wise? Are we safe from aliasing-related
> optimizations?

Yes, we use a big hammer to make sure 'Address is immune to these issues.

-- 
Eric Botcazou
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Dave Korn dave.korn.cyg...@googlemail.com writes:

> Is that really right? The type of the pointer (in6->__s6_addr) that we're
> casting is unsigned char *, so shouldn't it already alias everything
> anyway and dereferencing it be allowed, like it is for the casted (a)?
> I'll file a PR if so. (I can't pretend I find the language in the spec
> easy to follow.)

IIUC both accesses are actually wrong, but in the case of (a) there is no information about the actual target type, so the compiler cannot optimize. In both cases an object is accessed as type uint32_t, but the effective type is different.

Aliasing is not symmetric: the aliasing exception only applies to the case of accessing through a character type; the effective type of the object does not matter.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On 01/10/2010 10:30 AM, Andreas Schwab wrote:

> Dave Korn dave.korn.cyg...@googlemail.com writes:
>
> > Is that really right? The type of the pointer (in6->__s6_addr) that
> > we're casting is unsigned char *, so shouldn't it already alias
> > everything anyway and dereferencing it be allowed, like it is for the
> > casted (a)? I'll file a PR if so. (I can't pretend I find the language
> > in the spec easy to follow.)
>
> IIUC both accesses are actually wrong, but in the case of (a) there is no
> information about the actual target type, so the compiler cannot
> optimize. In both cases an object is accessed as type uint32_t, but the
> effective type is different.

typedef unsigned char uint8_t;
typedef unsigned int uint32_t;

struct in6_addr
{
  uint8_t __s6_addr[16];
};

static inline int
address_in_use (unsigned char *a, struct in6_addr *in6)
{
  if ((((const uint32_t *)(a))[0] == ((const uint32_t *)(in6->__s6_addr))[0]
       && ((const uint32_t *)(a))[1] == ((const uint32_t *)(in6->__s6_addr))[1]
       && ((const uint32_t *)(a))[2] == ((const uint32_t *)(in6->__s6_addr))[2]
       && ((const uint32_t *)(a))[3] == ((const uint32_t *)(in6->__s6_addr))[3]))
    return 1;
  return 0;
}

Why do you say the effective type is different? As long as uint8_t is a character type and both *a and *in6->__s6_addr have been written as uint32_t, this looks legal.

Andrew.
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Andrew Haley a...@redhat.com writes:

> Why do you say the effective type is different?

The object type is uint8_t, but accessed as uint32_t. That is undefined.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On 01/10/2010 12:39 PM, Andreas Schwab wrote:

> Andrew Haley a...@redhat.com writes:
>
> > Why do you say the effective type is different?
>
> The object type is uint8_t, but accessed as uint32_t. That is undefined.

Unless uint8_t is a character type, as I understand it. That is clearly the assumption on which the code relies.

Andrew.
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Andrew Haley a...@redhat.com writes:

> On 01/10/2010 12:39 PM, Andreas Schwab wrote:
>
> > Andrew Haley a...@redhat.com writes:
> >
> > > Why do you say the effective type is different?
> >
> > The object type is uint8_t, but accessed as uint32_t. That is undefined.
>
> Unless uint8_t is a character type, as I understand it.

In which way does it make a difference? For aliasing considerations, only the type of the access matters.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
RE: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
...

> > > The object type is uint8_t, but accessed as uint32_t. That is
> > > undefined.
> >
> > Unless uint8_t is a character type, as I understand it.
>
> In which way does it make a difference? For aliasing considerations, only
> the type of the access matters.

The aliasing rules treat char specially because char is a bit like a poor man's void.

paul
RE: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
...

> typedef unsigned char uint8_t;
> typedef unsigned int uint32_t;
>
> struct in6_addr
> {
>   uint8_t __s6_addr[16];
> };
>
> static inline int
> address_in_use (unsigned char *a, struct in6_addr *in6)
> {
>   if ((((const uint32_t *)(a))[0] == ((const uint32_t *)(in6->__s6_addr))[0]
>        && ((const uint32_t *)(a))[1] == ((const uint32_t *)(in6->__s6_addr))[1]
>        && ((const uint32_t *)(a))[2] == ((const uint32_t *)(in6->__s6_addr))[2]
>        && ((const uint32_t *)(a))[3] == ((const uint32_t *)(in6->__s6_addr))[3]))
>     return 1;
>   return 0;
> }

That code seems to be broken for reasons other than aliasing: it can easily give alignment errors on platforms that require natural alignment (because an in6_addr object, which contains only a uint8_t array, might be allocated at an odd address).

paul
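As a rough illustration of the alignment concern, the sketch below tests whether a pointer is suitably aligned for a uint32_t access. The helper name is_aligned_for_u32 is invented here; the sketch assumes a C11 compiler (for _Alignof) and an implementation where converting a pointer to uintptr_t yields the numeric address, which is common but not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the thread: returns 1 if p is aligned
   for a uint32_t access, 0 otherwise. Assumes the pointer-to-uintptr_t
   conversion gives the address as a plain number. */
static int is_aligned_for_u32(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

A check like this says nothing about aliasing; it only addresses the separate question of whether a word-sized load at that address can trap.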
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
> The aliasing rules treat char specially because char is a bit like a poor
> man's void.

Not symmetrically though, only for the type of the lvalue expression used to access the object (C99 6.5p7).

-- 
Eric Botcazou
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On 01/10/2010 02:39 PM, Paul Koning wrote:

> ...
>
> > typedef unsigned char uint8_t;
> > typedef unsigned int uint32_t;
> >
> > struct in6_addr
> > {
> >   uint8_t __s6_addr[16];
> > };
> >
> > static inline int
> > address_in_use (unsigned char *a, struct in6_addr *in6)
> > {
> >   if ((((const uint32_t *)(a))[0] == ((const uint32_t *)(in6->__s6_addr))[0]
> >        && ((const uint32_t *)(a))[1] == ((const uint32_t *)(in6->__s6_addr))[1]
> >        && ((const uint32_t *)(a))[2] == ((const uint32_t *)(in6->__s6_addr))[2]
> >        && ((const uint32_t *)(a))[3] == ((const uint32_t *)(in6->__s6_addr))[3]))
> >     return 1;
> >   return 0;
> > }
>
> That code seems to be broken for reasons other than aliasing -- it can
> easily give alignment errors on platforms that require natural alignment
> (because an in6_addr object might be allocated at an odd address).

In that case, the allocator would be broken. But the breakage isn't here, it's in the allocator. I don't think there's anything wrong with this code.

Andrew.
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Andrew Haley wrote:

> On 01/10/2010 12:39 PM, Andreas Schwab wrote:
>
> > Andrew Haley a...@redhat.com writes:
> >
> > > Why do you say the effective type is different?
> >
> > The object type is uint8_t, but accessed as uint32_t. That is undefined.
>
> Unless uint8_t is a character type, as I understand it. That is clearly
> the assumption on which the code relies.

But in the new compilers it's an integer type, not a character type--from the spec:

   7.18.1.1 Exact-width integer types

   1 The typedef name intN_t designates a signed integer type with width N,
     no padding bits, and a two’s complement representation. Thus, int8_t
     denotes a signed integer type with a width of exactly 8 bits.

   2 The typedef name uintN_t designates an unsigned integer type with
     width N. Thus, uint24_t denotes an unsigned integer type with a width
     of exactly 24 bits.

   3 These types are optional. However, if an implementation provides
     integer types with widths of 8, 16, 32, or 64 bits, no padding bits,
     and (for the signed types) that have a two’s complement
     representation, it shall define the corresponding typedef names.

Patrick
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On 01/10/2010 04:58 PM, Patrick Horgan wrote:

> Andrew Haley wrote:
>
> > On 01/10/2010 12:39 PM, Andreas Schwab wrote:
> >
> > > Andrew Haley a...@redhat.com writes:
> > >
> > > > Why do you say the effective type is different?
> > >
> > > The object type is uint8_t, but accessed as uint32_t. That is
> > > undefined.
> >
> > Unless uint8_t is a character type, as I understand it. That is clearly
> > the assumption on which the code relies.
>
> But in the new compilers it's an integer type, not a character type--from
> the spec:

In which case this code is undefined, I agree.

IMO it would be unwise for gcc not to treat uint8_t as a member of Alias Set 0, since doing so would break existing code (of which this is an excellent example) for no useful purpose.

Andrew.
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On Sun, 10 Jan 2010, Patrick Horgan wrote:

> But in the new compilers it's an integer type, not a character type--from
> the spec:

signed char is a signed integer type, and unsigned char is an unsigned integer type. (char is neither, although it behaves the same as one of signed char and unsigned char, so char is not a valid choice for any of the intN_t and uintN_t typedefs.) These types are also character types.

Though it might be useful to have extended integer types for int8_t and uint8_t that don't have the special aliasing properties of character types, as suggested in http://gcc.gnu.org/ml/gcc/2000-05/msg01106.html, this has not been implemented.

-- 
Joseph S. Myers
jos...@codesourcery.com
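Whether a given implementation's <stdint.h> actually defines uint8_t as unsigned char can be probed at compile time. The following is only a sketch under assumptions: it needs a C11 compiler for _Generic, and the function name uint8_is_unsigned_char is invented for this example. Which answer it gives is a property of the implementation being tested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical probe, not from the thread: returns 1 if this
   implementation's uint8_t is the same type as unsigned char (and so
   is a character type), 0 if it is some other 8-bit integer type. */
static int uint8_is_unsigned_char(void)
{
    uint8_t probe = 0;

    /* uint8_t has exactly 8 bits and no padding, so it occupies one byte. */
    assert(sizeof(uint8_t) == 1);
    return _Generic(probe, unsigned char: 1, default: 0);
}
```

A result of 0 would indicate an extended integer type of the kind mentioned above.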
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Patrick Horgan wrote:

> Andrew Haley wrote:
>
> > On 01/10/2010 12:39 PM, Andreas Schwab wrote:
> >
> > > Andrew Haley a...@redhat.com writes:
> > >
> > > > Why do you say the effective type is different?
> > >
> > > The object type is uint8_t, but accessed as uint32_t. That is
> > > undefined.
> >
> > Unless uint8_t is a character type, as I understand it. That is clearly
> > the assumption on which the code relies.
>
> But in the new compilers it's an integer type, not a character type--from
> the spec:

It's a typedef at the top of the sample code:

typedef unsigned char uint8_t;

The example doesn't rely on any headers.

cheers,
DaveK
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Andrew Haley wrote:

> On 01/10/2010 02:39 PM, Paul Koning wrote:
>
> > ...
> >
> > > typedef unsigned char uint8_t;
> > > typedef unsigned int uint32_t;
> > >
> > > struct in6_addr
> > > {
> > >   uint8_t __s6_addr[16];
> > > };
> > >
> > > static inline int
> > > address_in_use (unsigned char *a, struct in6_addr *in6)
> > > {
> > >   if ((((const uint32_t *)(a))[0] == ((const uint32_t *)(in6->__s6_addr))[0]
> > >        && ((const uint32_t *)(a))[1] == ((const uint32_t *)(in6->__s6_addr))[1]
> > >        && ((const uint32_t *)(a))[2] == ((const uint32_t *)(in6->__s6_addr))[2]
> > >        && ((const uint32_t *)(a))[3] == ((const uint32_t *)(in6->__s6_addr))[3]))
> > >     return 1;
> > >   return 0;
> > > }
> >
> > That code seems to be broken for reasons other than aliasing -- it can
> > easily give alignment errors on platforms that require natural
> > alignment (because an in6_addr object might be allocated at an odd
> > address).
>
> In that case, the allocator would be broken. But the breakage isn't here,
> it's in the allocator. I don't think there's anything wrong with this
> code.

Yes, and regardless, the compiler doesn't need to worry about that; it just needs to make the assumption that it /is/ aligned (since it is allowed to assume I will not invoke undefined behaviour).

cheers,
DaveK
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On Sun, Jan 10, 2010 at 08:58:47AM -0800, Patrick Horgan wrote:

> Andrew Haley wrote:
>
> > On 01/10/2010 12:39 PM, Andreas Schwab wrote:
> >
> > > Andrew Haley a...@redhat.com writes:
> > >
> > > > Why do you say the effective type is different?
> > >
> > > The object type is uint8_t, but accessed as uint32_t. That is
> > > undefined.
> >
> > Unless uint8_t is a character type, as I understand it. That is clearly
> > the assumption on which the code relies.
>
> But in the new compilers it's an integer type, not a character type--from
> the spec:

A character type is an integer type, so it is quite possible for uint8_t to qualify both as a character type and as an integer type.

-- 
Insert your favourite quote here.
Erik Trulsson
ertr1...@student.uu.se
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Eric Botcazou wrote:

> > The aliasing rules treat char specially because char is a bit like a
> > poor man's void.
>
> Not symmetrically though, only for the type of the lvalue expression used
> to access the object (C99 6.5p7).

Ok. So if I had four ints, and I wanted to cast the pointers to char and compare them as 16 chars, that would be OK, because the chars would alias the ints; but in this case, where they started as chars and I cast them to ints, those ints don't alias against the original chars. Is that an accurate precis?

Andreas, you wrote:

> Aliasing is not symmetric.

To be precise, we're saying it's not commutative here; that (A aliases B) does not imply (B aliases A)? I don't think I've ever heard it expressed that explicitly before.

So, the only way to do this trick nowadays is to use a union, right?

union u
{
  uint8_t aschars[16];
  uint32_t aslongs[4];
};

static inline int
address_in_use2 (unsigned char *a, struct in6_addr *in6)
{
  union u *u1 = (union u *)in6->__s6_addr;
  union u *u2 = (union u *)a;

  if ((u1->aslongs[0] == u2->aslongs[0])
      && (u1->aslongs[1] == u2->aslongs[1])
      && (u1->aslongs[2] == u2->aslongs[2])
      && (u1->aslongs[3] == u2->aslongs[3]))
    return 1;
  return 0;
}

... and this is allowed because I have cast the char pointers into pointers to an aggregate or union type that includes one of the aforementioned types, and the object in question currently has no effective type because it was stored into by chars (when it was a char[] array in struct in6_addr), so the type of the lvalue when I dereference it is just the type of the lvalue used for the access?

Also, that's getting kinda tricky to write in a macro, isn't it? I mean without having to use something highly gcc-specific like statement expressions?

cheers,
DaveK
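For contrast with casting an existing char array to a union pointer, here is a sketch of the less contentious arrangement, where the object is declared with the union type from the start and is only ever accessed through it. The names union addr16 and addr16_equal are invented for this example. It relies on reading a union member other than the one last written, which GCC documents as supported when the access goes through the union type; other compilers may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, not from the thread: 16 address bytes that can
   also be viewed as four 32-bit words. */
union addr16 {
    uint8_t  bytes[16];
    uint32_t words[4];
};

/* Compare two addresses a word at a time. Both objects really are
   unions, and every access goes through the union type. */
static int addr16_equal(const union addr16 *x, const union addr16 *y)
{
    assert(x != NULL && y != NULL);
    return x->words[0] == y->words[0]
        && x->words[1] == y->words[1]
        && x->words[2] == y->words[2]
        && x->words[3] == y->words[3];
}
```

This does not settle whether the pointer-cast version in the message above is valid; it only sidesteps that question by never creating a plain char array in the first place.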
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Dave Korn dave.korn.cyg...@googlemail.com writes:

> Andreas, you wrote:
>
> > Aliasing is not symmetric.
>
> To be precise, we're saying it's not commutative here; that (A aliases B)
> does not imply (B aliases A)? I don't think I've ever heard it expressed
> that explicitly before.

Aliasing is not an operator, it's a property of an lvalue expression.

static inline int
address_in_use2 (unsigned char *a, struct in6_addr *in6)
{
  return memcmp (a, in6->__s6_addr, sizeof (in6->__s6_addr)) == 0;
}

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
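A self-contained variant of the memcmp approach might look like the sketch below. The struct and function names (struct addr_sketch, addr_equal_memcmp) are placeholders chosen to avoid clashing with the real struct in6_addr from system headers; they are not from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for struct in6_addr: 16 raw address bytes. */
struct addr_sketch {
    uint8_t addr_bytes[16];
};

/* Compare a raw 16-byte buffer against an address structure. memcmp
   inspects the bytes as unsigned char, so no strict-aliasing question
   arises, and it places no alignment requirement on either pointer. */
static int addr_equal_memcmp(const unsigned char *a,
                             const struct addr_sketch *in6)
{
    assert(a != NULL && in6 != NULL);
    return memcmp(a, in6->addr_bytes, sizeof in6->addr_bytes) == 0;
}
```

Since the size is a compile-time constant, an optimizing compiler is free to expand the call inline, though whether it does so depends on the compiler and flags.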
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Andreas Schwab wrote:

> Dave Korn dave.korn.cyg...@googlemail writes:
>
> > Andreas, you wrote:
> >
> > > Aliasing is not symmetric.
> >
> > To be precise, we're saying it's not commutative here; that (A aliases
> > B) does not imply (B aliases A)? I don't think I've ever heard it
> > expressed that explicitly before.
>
> Aliasing is not an operator, it's a property of an lvalue expression.

Yes, fair enough; but properties can commute just as much as operators can (although it's perhaps less intuitively surprising when they don't). Anyway I just mentioned it because I think it would be a useful term to bandy about a bit more often in aliasing discussions!

cheers,
DaveK
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
> Yes, fair enough; but properties can commute just as much as operators
> can (although it's perhaps less intuitively surprising when they don't).

To be picky, binary operators are commutative (or not), binary relations are symmetric (or not).

-- 
Eric Botcazou
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Eric Botcazou wrote:

> > Yes, fair enough; but properties can commute just as much as operators
> > can (although it's perhaps less intuitively surprising when they
> > don't).
>
> To be picky, binary operators are commutative (or not), binary relations
> are symmetric (or not).

Ah, I wasn't aware of the subtleties of the terminology, thanks for pointing it out.

cheers,
DaveK
Re: Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
On Sun, Jan 10, 2010 at 06:24:14PM +, Dave Korn wrote:

> Eric Botcazou wrote:
>
> > > The aliasing rules treat char specially because char is a bit like a
> > > poor man's void.
> >
> > Not symmetrically though, only for the type of the lvalue expression
> > used to access the object (C99 6.5p7).
>
> Ok. So if I had four ints, and I wanted to cast the pointers to char and
> compare them as 16 chars, that would be OK, because the chars would alias
> the ints; but in this case, where they started as chars and I cast them to
> ints, those ints don't alias against the original chars. Is that an
> accurate precis?
>
> Andreas, you wrote:
>
> > Aliasing is not symmetric.
>
> To be precise, we're saying it's not commutative here; that (A aliases B)
> does not imply (B aliases A)? I don't think I've ever heard it expressed
> that explicitly before.

I think you (and several other people) are actually conflating two separate concepts here.

One is aliasing, which can be described as the question of when two different lvalues can actually refer to the same object. The other is whether it is actually allowed to access a particular object through a given lvalue. These two concepts partially overlap, but they are not the same.

I would say that aliasing actually is symmetric, such that if (A aliases B) then (B aliases A), but that does not necessarily mean that it is safe to access the underlying object through both A and B. (This just depends on how one defines the term aliasing. Note that aliasing is only defined indirectly in the C standard, and the term itself only occurs in non-normative text.)

Take for example the following situation:

float f=0;

int main(void)
{
  char *cp = (char *)&f;
  float *fp = &f;

  *cp = 42;
  *fp;
  return 0;
}

Here the lvalues '*cp' and '*fp' clearly alias each other, in that they both refer to the object 'f'. Section 6.5p7 does not disallow access to 'f' either through '*fp' (which has the same type as the effective type of the object, i.e. 'float') or through '*cp' (whose type is a character type). (Also note that 6.5p7 is irrelevant when storing a value into '*cp', since that section only talks about accessing the stored value of an object, i.e. reading from it.)

However, the access to 'f' through the lvalue '*fp' still invokes undefined behaviour, since 'f' may at that point contain a trap representation. (If the order of the expressions '*cp = 42' and '*fp' had been switched around, then no undefined behaviour would have occurred.)

The standard has special language at several places that essentially says that you can treat any object as an array of unsigned chars, both when reading from and when writing to the object. (You can mostly treat any object as an array of char, or array of signed char, as well, but since a signed char can have a trap representation it is not necessarily safe to read from such an lvalue in all situations. Unsigned char is the only type guaranteed not to have any trap representation.)

There is no such special language for any other types. If one ignores the special case of character types, the rules basically boil down to:

If an object has been declared with some specific type (this includes all objects with automatic or static storage duration), then you can only read from it using an lvalue whose type is compatible with the type of the object. (Possibly with differences in signed/unsigned specification and in qualifiers, as described in 6.5p7.)

If an object does not have a declared type (such as memory allocated through malloc), then if you have stored a value there using an lvalue of some specific type, you can only read the value using an lvalue of compatible type. (Again possibly with differences in qualifiers and signed/unsigned specifiers.)

(Assuming the object is suitably aligned and large enough, there is not really any prohibition against storing a value into an object using an lvalue of a completely different type, but since you cannot read that value back there is not much point in doing so.)

Note that most sorts of type punning (i.e. treating an object as if it were of a different type than it actually is) are disallowed by the C standard. In particular, accessing an array of chars as if it were an array of int is not allowed, any more than accessing an array of int as if it were an array of float.

-- 
Insert your favourite quote here.
Erik Trulsson
ertr1...@student.uu.se
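One way to get word-sized comparisons without reading a char array through a uint32_t lvalue is to copy the bytes into a real uint32_t object first. The sketch below does that with memcpy; the names load_u32 and addr_equal_words are invented for this example and are not from the thread. It leans on the point above that any object may be treated as an array of unsigned chars for both reading and writing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the thread: copy four bytes into a
   genuine uint32_t object, then read that object as uint32_t. The
   source buffer is only ever accessed as bytes. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Compare two 16-byte buffers as four 32-bit words each. */
static int addr_equal_words(const unsigned char *a, const unsigned char *b)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < 4; i++) {
        if (load_u32(a + 4 * i) != load_u32(b + 4 * i))
            return 0;
    }
    return 1;
}
```

The copy also removes any alignment requirement on the source buffers, at the cost of depending on the optimizer to turn the small memcpy into a plain load.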
Sorry to mention aliasing again, but is the standard IN6_ARE_ADDR_EQUAL really wrong?
Hello,

I don't want to reopen the long-rumbling discussion about what gcc ought to /want/ to do; I'd just like to know if warning in this case is indeed what it wants to do.

The standard definition of IN6_ARE_ADDR_EQUAL looks a bit like this:

#define IN6_ARE_ADDR_EQUAL(a, b) \
  (((const uint32_t *)(a))[0] == ((const uint32_t *)(b))[0] \
   && ((const uint32_t *)(a))[1] == ((const uint32_t *)(b))[1] \
   && ((const uint32_t *)(a))[2] == ((const uint32_t *)(b))[2] \
   && ((const uint32_t *)(a))[3] == ((const uint32_t *)(b))[3])

That's cygwin's, but glibc's is roughly the same (modulo s/const/__const/g). Anyhow, it now gives a strict-aliasing warning that it didn't give in 4.3.4; a reduced testcase is:

----------------------------------------------------------------------
$ cat walias1.c

typedef unsigned char uint8_t;
typedef unsigned int uint32_t;

struct in6_addr
{
  uint8_t __s6_addr[16];
};

static inline int
address_in_use (unsigned char *a, struct in6_addr *in6)
{
  if ((((const uint32_t *)(a))[0] == ((const uint32_t *)(in6->__s6_addr))[0]
       && ((const uint32_t *)(a))[1] == ((const uint32_t *)(in6->__s6_addr))[1]
       && ((const uint32_t *)(a))[2] == ((const uint32_t *)(in6->__s6_addr))[2]
       && ((const uint32_t *)(a))[3] == ((const uint32_t *)(in6->__s6_addr))[3]))
    return 1;
  return 0;
}

ad...@ubik /tmp/warning
$ /usr/bin/gcc-4 -c walias1.c -Wstrict-aliasing -O2

ad...@ubik /tmp/warning
$ gcc-4 -c walias1.c -Wstrict-aliasing -O2
walias1.c: In function 'address_in_use':
walias1.c:14:3: warning: dereferencing type-punned pointer will break strict-aliasing rules
----------------------------------------------------------------------

Is that really right? The type of the pointer (in6->__s6_addr) that we're casting is unsigned char *, so shouldn't it already alias everything anyway and dereferencing it be allowed, like it is for the casted (a)? I'll file a PR if so. (I can't pretend I find the language in the spec easy to follow.)

cheers,
DaveK