Re: aliasing

Martin Uecker via Gcc Mon, 18 Mar 2024 08:01:23 -0700

On Monday, 18.03.2024 at 14:29 +0100, David Brown wrote:
> 
> On 18/03/2024 12:41, Martin Uecker wrote:
> > 
> > 
> > Hi David,
> > 
> > On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
> > > Hi,
> > > 
> > > I would be very glad to see this change in the standards.
> > > 
> > > 
> > > Should "byte type" include all character types (signed, unsigned and
> > > plain), or should it be restricted to "unsigned char" since that is the
> > > "byte" type ?  (I think allowing all character types makes sense, but
> > > only unsigned char is guaranteed to be suitable for general object
> > > backing store.)
> > 
> > At the moment, the special types that can access all others are
> > all non-atomic character types.  So for symmetry reasons, it
> > seems that this is also what we want for backing store.
> > 
> > I am not sure what you mean by "only unsigned char". Are you talking
> > about C++?  "unsigned char" has no special role in C.
> > 
> 
> "unsigned char" does have a special role in C - in 6.2.6.1p4 it 
> describes any object as being able to be copied to an array of unsigned 
> char to get the "object representation".  The same is not true for an 
> array of "signed char".  I think it would be possible to have an 
> implementation where "signed char" was 8-bit two's complement except 
> that 0x80 would be a trap representation rather than -128.  I am not 
> sure of the consequences of such an implementation (assuming I am even 
> correct in it being allowed).


Yes, but with C23 this is not possible anymore. I think signed
char or char should work equally well now. 
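
As an illustration of the 6.2.6.1p4 point above, here is a minimal sketch of copying an object into an array of unsigned char and rebuilding it from those bytes. The function name roundtrip_float is hypothetical and only serves the example.

```c
#include <assert.h>
#include <string.h>

/* Copy the object representation of a float into a caller-supplied
   array of unsigned char, then rebuild a float from those bytes. */
static float roundtrip_float(float in, unsigned char bytes[sizeof(float)])
{
    float out;

    memcpy(bytes, &in, sizeof in);   /* object -> array of unsigned char */
    memcpy(&out, bytes, sizeof out); /* array of unsigned char -> object */

    /* The round trip must reproduce the representation exactly. */
    assert(memcmp(&in, &out, sizeof in) == 0);
    return out;
}
```

A caller would pass a buffer such as "unsigned char b[sizeof(float)];" and can then inspect b byte by byte.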

> 
> > > 
> > > Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an
> > > alias for "unsigned char", but it could be something different, like an
> > > alias for __UINT8_TYPE__, or "unsigned int
> > > __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> > 
> > I think this might be a reason to not include it, as it could
> > affect aliasing analysis. At least, this would be a different
> > independent change to consider.
> > 
> 
> I think it is important that there is a guarantee here, because people 
> do use uint8_t as a generic "raw memory" type.  Embedded standards like 
> MISRA strongly discourage the use of "unsized" types such as "unsigned 
> char", and it is generally assumed that "uint8_t" has the aliasing 
> superpowers of a character type.  But it is possible that a change 
> would be better put in the library section on <stdint.h> rather than 
> this section.
> 
> > > 
> > > In my line of work - small-systems embedded development - it is common
> > > to have "home-made" or specialised memory allocation systems rather than
> > > relying on a generic heap.  This is, I think, some of the "existing
> > > practice" that you are considering here - there is a "backing store" of
> > > some sort that can be allocated and used as objects of a type other than
> > > the declared type of the backing store.  While a simple unsigned char
> > > array is a very common kind of backing store, there are others that are
> > > used, and it would be good to be sure of the correctness guarantees for
> > > these.  Possibilities that I have seen include:
> > > 
> > > unsigned char heap1[N];
> > > 
> > > uint8_t heap2[N];
> > > 
> > > union {
> > >   double dummy_for_alignment;
> > >   char heap[N];
> > > } heap3;
> > > 
> > > struct {
> > >   uint32_t capacity;
> > >   uint8_t * p_next_free;
> > >   uint8_t heap[N];
> > > } heap4;
> > > 
> > > uint32_t heap5[N];
> > > 
> > > Apart from this last one, if "uint8_t" is guaranteed to be a "byte
> > > type", then I believe your wording means that these unions and structs
> > > would also work as "byte arrays".  But it might be useful to add a
> > > footnote clarifying that.
> > > 
> > 
> > I need to think about this.
> > 
> 
> Thank you.
> 
> I see people making a lot of assumptions in their embedded programming 
> that are not fully justified in the C standards.  Sometimes the 
> assumptions are just bad, or it would be easy to write code without the 
> assumptions.  But at other times it would be very awkward or inefficient 
> to write code that is completely "safe" (in terms of having fully 
> defined behaviour from the C standards or from implementation-dependent 
> behaviour).  Making your own dynamic memory allocation functions is one 
> such case.  So I have a tendency to jump on any suggestion of changes to 
> the C (or C++) standards that could let people write such essential code 
> in a safer or more efficient manner.

That something is undefined does not automatically mean it is 
forbidden or unsafe.  It simply means it is not portable.  I think
in the embedded space it will be difficult to make everything well
defined.  But I fully agree that widely used techniques should
ideally be based on defined behavior and we should change the
standard accordingly.
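
For readers who have not seen the technique, here is a minimal sketch of the kind of home-made allocator discussed above: a bump allocator on top of a declared unsigned char array. The names POOL_SIZE, pool, pool_used and pool_alloc are hypothetical. Under the current wording the array keeps its declared type as effective type, so storing other types into it is exactly the widely used but not strictly defined practice that the proposal is meant to cover.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 256u /* hypothetical capacity in bytes */

/* Backing store: a byte array aligned for any fundamental type. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Return `size` bytes aligned to `align`, or NULL when the pool is
   exhausted. `align` must be a power of two no larger than
   alignof(max_align_t). Blocks are never freed individually. */
static void *pool_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(max_align_t));

    /* Round the current offset up to the requested alignment. */
    size_t start = (pool_used + align - 1) & ~(align - 1);

    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;

    pool_used = start + size;
    return &pool[start];
}
```

Typical use would be "uint32_t *p = pool_alloc(sizeof *p, alignof(uint32_t));" followed by a check for NULL before storing through p.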

> 
> > > (It is also not uncommon to have the backing space allocated by the
> > > linker, but then it falls under the existing "no declared type" case.)
> > 
> > Yes, although with the change we would make the "no declared type" also
> > be byte arrays, so there is then simply no difference anymore.
> > 
> 
> Fair enough.  (Linker-defined storage does not just have no declared 
> type, it has no directly declared size or other properties either.  The 
> start and the stop of the storage area is typically declared as "extern 
> uint8_t __space_start[], __space_stop[];", or perhaps as single 
> characters or uint32_t types.  The space in between is just calculated 
> as the difference between pointers to these.)
> 
> > > 
> > > 
> > > I would not want uint32_t to be considered an "alias anything" type, but
> > > I have occasionally seen such types used for memory store backings.  It
> > > is perhaps worth considering defining "byte type" as "non-atomic
> > > character type, [u]int8_t (if they exist), or other
> > > implementation-defined types".
> > 
> > This could make sense, the question is whether we want to encourage
> > the use of other types for this use case, as this would then not
> > be portable.
> 
> I think uint8_t should be highly portable, except to targets where it 
> does not exist (and in this day and age, that basically means some DSP 
> devices that have 16-bit, 24-bit or 32-bit char).
> 
> There is precedent for this wording, however, in 6.7.2.1p5 for 
> bit-fields - "A bit-field shall have a type that is a qualified or 
> unqualified version of _Bool, signed int, unsigned int, or some other 
> implementation-defined type".
> 
> I think it should be clear enough that using an implementation-defined 
> type rather than a character type would potentially limit portability. 
> For the kinds of systems I am thinking of, extreme portability is 
> normally not of prime concern - efficiency on a particular target with a 
> particular compiler is often more important.

Thanks, I will bring back this information to WG14.
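
Whether uint8_t is the same type as unsigned char on a given implementation can be checked at compile time. The following is a small sketch using C11 _Generic; the macro UINT8_IS_UNSIGNED_CHAR and the function uint8_is_unsigned_char are hypothetical names, and the result depends on the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Where uint8_t exists it always occupies exactly one byte, but that
   alone does not make it a character type. */
static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");

/* 1 if uint8_t is the same type as unsigned char on this
   implementation, 0 if it is some other 8-bit type. */
#define UINT8_IS_UNSIGNED_CHAR \
    _Generic((uint8_t)0, unsigned char: 1, default: 0)

/* Function wrapper so the result can also be queried at run time. */
static int uint8_is_unsigned_char(void)
{
    return UINT8_IS_UNSIGNED_CHAR;
}
```

Code that relies on uint8_t having the aliasing properties of a character type could turn this into a hard error with static_assert(UINT8_IS_UNSIGNED_CHAR, "...").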
> 
> > 
> > Are there important reasons for not using "unsigned char" ?
> > 
> 
> What is "important" is often a subjective matter.  One reason many 
> people use "uint8_t" is that they prefer to be explicit about sizes, and 
> would rather have a hard error if the code is used on a target that 
> doesn't support the size.  Some coding standards, such as the very 
> common (though IMHO somewhat flawed) MISRA standard, strongly encourage 
> size-specific types and consider the use of "int" or "unsigned char" as 
> a violation of their rules and directives.  Many libraries and code 
> bases with a history older than C99 have their own typedef names for 
> size-specific types or low-level storage types, such as "sys_uint8", 
> "BYTE", "u8", and so on, and users may prefer these for consistency. 
> And for people with a background in hardware or assembly (not uncommon 
> for small systems embedded programming), or other languages such as 
> Rust, "unsigned char" sounds vague, poorly defined, and somewhat 
> meaningless as a type name for a raw byte of memory or a minimal sized 
> unsigned integer.
> 
> Of course most alternative names for bytes would be typedefs of 
> "unsigned char" and therefore work just the same way.  But as noted 
> before, uint8_t could be defined in another manner on some systems (and 
> on GCC for the AVR, it /is/ defined in a different way - though I have 
> no idea why).
> 
> And bigger types, such as uint32_t, have been used to force alignment 
> for backing store (either because the compiler did not support _Alignas, 
> or the programmer did not know about it).  (But I am not suggesting that 
> plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
> 
> > > 
> > > Some other compilers might guarantee not to do type-based alias analysis
> > > and thus view all types as "byte types" in this way.  For gcc, there
> > > could be a kind of reverse "may_alias" type attribute to create such 
> > > types.
> > > 
> > > 
> > > 
> > > There are a number of other features that could make allocation
> > > functions more efficient and safer in use, and which could ideally be
> > > standardised in the C standards or at least added as gcc extensions, but
> > > I think that's more than you are looking for here!
> > 
> > It is possible to submit a proposal to WG14.
> > 
> 
> Yes, I know.  But giving you some feedback here is a step in that 
> direction - even if it turns out that it doesn't affect your wording in 
> the end.

Any kind of feedback is very welcome. Thank you!

Martin

> > > On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > can you please take a quick look at this? This is intended to align
> > > > the C standard with existing practice with respect to aliasing by
> > > > removing the special rules for "objects with no declared type" and
> > > > making it fully symmetric and only based on types with non-atomic
> > > > character types being able to alias everything.
> > > > 
> > > > 
> > > > Unrelated to this change, I have another question:  I wonder if GCC
> > > > (or any other compiler) actually exploits the "or is copied as an
> > > > array of byte type," rule to make assumptions about the effective
> > > > types of the target array? I know compilers do this for memcpy...
> > > > Maybe also if a loop is transformed to memcpy?
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > Add the following definition after 3.5, paragraph 2:
> > > > 
> > > > byte array
> > > > object having either no declared type or an array of objects declared 
> > > > with a byte type
> > > > 
> > > > byte type
> > > > non-atomic character type
> > > > 
> > > > Modify 6.5, paragraph 6:
> > > > The effective type of an object that is not a byte array, for an access
> > > > to its stored value, is the declared type of the object.97) If a value is
> > > > stored into a byte array through an lvalue having a byte type, then
> > > > the type of the lvalue becomes the effective type of the object for that
> > > > access and for subsequent accesses that do not modify the stored value.
> > > > If a value is copied into a byte array using memcpy or memmove, or is
> > > > copied as an array of byte type, then the effective type of the
> > > > modified object for that access and for subsequent accesses that do not
> > > > modify the value is the effective type of the object from which the
> > > > value is copied, if it has one. For all other accesses to a byte array,
> > > > the effective type of the object is simply the type of the lvalue used
> > > > for the access.
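
A minimal sketch of the memcpy case in the wording above, as I read it: a value is copied into a declared byte array and then read through an lvalue of the source type. The names store and copy_then_read are hypothetical, and the comments describe the intended reading of the proposed text, not the current standard.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* A byte array in the sense of the proposed definition, aligned so
   that it can hold a uint32_t. */
static alignas(uint32_t) unsigned char store[sizeof(uint32_t)];

static uint32_t copy_then_read(uint32_t value)
{
    /* Copy into the byte array with memcpy: under the proposed wording
       the modified object takes the effective type of the source
       object, which is uint32_t here. */
    memcpy(store, &value, sizeof value);

    /* Read access through an lvalue of that same type. */
    return *(uint32_t *)store;
}
```

Without the alignas specifier the cast could produce a misaligned pointer, which is a separate problem from effective types.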
> > > > 
> > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
> > > > 
> > > > 
> > > > 
> > > 
> > 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging
