On Monday, 18.03.2024 at 14:29 +0100, David Brown wrote:
>
> On 18/03/2024 12:41, Martin Uecker wrote:
> >
> > Hi David,
> >
> > On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
> > > Hi,
> > >
> > > I would be very glad to see this change in the standards.
> > >
> > > Should "byte type" include all character types (signed, unsigned and
> > > plain), or should it be restricted to "unsigned char" since that is the
> > > "byte" type?  (I think allowing all character types makes sense, but
> > > only unsigned char is guaranteed to be suitable for general object
> > > backing store.)
> >
> > At the moment, the special types that can access all others are
> > all non-atomic character types. So for symmetry reasons, it
> > seems that this is also what we want for backing store.
> >
> > I am not sure what you mean by "only unsigned char". Are you talking
> > about C++? "unsigned char" has no special role in C.
> >
>
> "unsigned char" does have a special role in C - in 6.2.6.1p4 it
> describes any object as being able to be copied to an array of unsigned
> char to get the "object representation".  The same is not true for an
> array of "signed char".  I think it would be possible to have an
> implementation where "signed char" was 8-bit two's complement except
> that 0x80 would be a trap representation rather than -128.  I am not
> sure of the consequences of such an implementation (assuming I am even
> correct in it being allowed).

Yes, but with C23 this is not possible anymore. I think signed char
or char should work equally well now.

> > > Should it also include "uint8_t" (if it exists)?  "uint8_t" is often an
> > > alias for "unsigned char", but it could be something different, like an
> > > alias for __UINT8_TYPE__, or "unsigned int
> > > __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> >
> > I think this might be a reason to not include it, as it could
> > affect aliasing analysis. At least, this would be a different
> > independent change to consider.
> >
>
> I think it is important that there is a guarantee here, because people
> do use uint8_t as a generic "raw memory" type.  Embedded standards like
> MISRA strongly discourage the use of "unsized" types such as "unsigned
> char", and it is generally assumed that "uint8_t" has the aliasing
> superpowers of a character type.  But it is possible that a change
> would be better put in the library section on <stdint.h> rather than
> this section.
>
> > >
> > > In my line of work - small-systems embedded development - it is common
> > > to have "home-made" or specialised memory allocation systems rather than
> > > relying on a generic heap.  This is, I think, some of the "existing
> > > practice" that you are considering here - there is a "backing store" of
> > > some sort that can be allocated and used as objects of a type other than
> > > the declared type of the backing store.  While a simple unsigned char
> > > array is a very common kind of backing store, there are others that are
> > > used, and it would be good to be sure of the correctness guarantees for
> > > these.
> > > Possibilities that I have seen include:
> > >
> > > unsigned char heap1[N];
> > >
> > > uint8_t heap2[N];
> > >
> > > union {
> > >     double dummy_for_alignment;
> > >     char heap[N];
> > > } heap3;
> > >
> > > struct {
> > >     uint32_t capacity;
> > >     uint8_t * p_next_free;
> > >     uint8_t heap[N];
> > > } heap4;
> > >
> > > uint32_t heap5[N];
> > >
> > > Apart from this last one, if "uint8_t" is guaranteed to be a "byte
> > > type", then I believe your wording means that these unions and structs
> > > would also work as "byte arrays".  But it might be useful to add a
> > > footnote clarifying that.
> > >
> >
> > I need to think about this.
> >
>
> Thank you.
>
> I see people making a lot of assumptions in their embedded programming
> that are not fully justified in the C standards.  Sometimes the
> assumptions are just bad, or it would be easy to write code without the
> assumptions.  But at other times it would be very awkward or inefficient
> to write code that is completely "safe" (in terms of having fully
> defined behaviour from the C standards or from implementation-dependent
> behaviour).  Making your own dynamic memory allocation functions is one
> such case.  So I have a tendency to jump on any suggestion of changes to
> the C (or C++) standards that could let people write such essential code
> in a safer or more efficient manner.

That something is undefined does not automatically mean
it is forbidden or unsafe. It simply means it is not portable.
I think in the embedded space it will be difficult to make
everything well defined. But I fully agree that widely used
techniques should ideally be based on defined behavior and
we should change the standard accordingly.

> > > (It is also not uncommon to have the backing space allocated by the
> > > linker, but then it falls under the existing "no declared type" case.)
> >
> > Yes, although with the change we would make the "no declared type" also
> > be byte arrays, so there is then simply no difference anymore.
> >
>
> Fair enough.  (Linker-defined storage does not just have no declared
> type, it has no directly declared size or other properties either.  The
> start and the stop of the storage area are typically declared as "extern
> uint8_t __space_start[], __space_stop[];", or perhaps as single
> characters or uint32_t types.  The space in between is just calculated
> as the difference between pointers to these.)
>
> > >
> > > I would not want uint32_t to be considered an "alias anything" type, but
> > > I have occasionally seen such types used for memory store backings.  It
> > > is perhaps worth considering defining "byte type" as "non-atomic
> > > character type, [u]int8_t (if they exist), or other
> > > implementation-defined types".
> >
> > This could make sense, the question is whether we want to encourage
> > the use of other types for this use case, as this would then not
> > be portable.
>
> I think uint8_t should be highly portable, except to targets where it
> does not exist (and in this day and age, that basically means some DSP
> devices that have 16-bit, 24-bit or 32-bit char).
>
> There is precedent for this wording, however, in 6.7.2.1p5 for
> bit-fields - "A bit-field shall have a type that is a qualified or
> unqualified version of _Bool, signed int, unsigned int, or some other
> implementation-defined type".
>
> I think it should be clear enough that using an implementation-defined
> type rather than a character type would potentially limit portability.
> For the kinds of systems I am thinking of, extreme portability is
> normally not of prime concern - efficiency on a particular target with a
> particular compiler is often more important.

Thanks, I will bring back this information to WG14.

> > Are there important reasons for not using "unsigned char"?
>
> What is "important" is often a subjective matter.
> One reason many people use "uint8_t" is that they prefer to be explicit
> about sizes, and would rather have a hard error if the code is used on
> a target that doesn't support the size.  Some coding standards, such as
> the very common (though IMHO somewhat flawed) MISRA standard, strongly
> encourage size-specific types and consider the use of "int" or
> "unsigned char" as a violation of their rules and directives.  Many
> libraries and code bases with a history older than C99 have their own
> typedef names for size-specific types or low-level storage types, such
> as "sys_uint8", "BYTE", "u8", and so on, and users may prefer these for
> consistency.  And for people with a background in hardware or assembly
> (not uncommon for small systems embedded programming), or other
> languages such as Rust, "unsigned char" sounds vague, poorly defined,
> and somewhat meaningless as a type name for a raw byte of memory or a
> minimal sized unsigned integer.
>
> Of course most alternative names for bytes would be typedefs of
> "unsigned char" and therefore work just the same way.  But as noted
> before, uint8_t could be defined in another manner on some systems (and
> on GCC for the AVR, it /is/ defined in a different way - though I have
> no idea why).
>
> And bigger types, such as uint32_t, have been used to force alignment
> for backing store (either because the compiler did not support _Alignas,
> or the programmer did not know about it).  (But I am not suggesting that
> plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
>
> > >
> > > Some other compilers might guarantee not to do type-based alias analysis
> > > and thus view all types as "byte types" in this way.  For gcc, there
> > > could be a kind of reverse "may_alias" type attribute to create such
> > > types.
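
On the alignment point above, a minimal sketch of a byte-typed backing
store that gets its alignment from _Alignas instead of from a uint32_t
array or a union with a double. It assumes a C11 compiler. The names
POOL_SIZE, pool, pool_used and pool_alloc are hypothetical and chosen
only for illustration; they are not from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 256u /* hypothetical capacity, chosen for illustration */

/* Backing store declared with a byte type.  _Alignas(max_align_t)
   gives it the strictest fundamental alignment, so no wider element
   type is needed just to force alignment. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Bump allocator over the pool.  Returns NULL when the request does
   not fit.  align must be a power of two; the returned address is
   only guaranteed to be aligned for align <= _Alignof(max_align_t),
   because the rounding is done on the offset into the pool. */
static void *pool_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the current offset up to the next multiple of align. */
    size_t start = (pool_used + align - 1) & ~(align - 1);

    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;

    pool_used = start + size;
    return &pool[start];
}
```

A caller would write something like
"uint32_t *p = pool_alloc(sizeof *p, _Alignof(uint32_t));". Whether
then storing to and reading from *p is covered by the effective type
rules is exactly the question the proposed wording is meant to settle.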
> > >
> > > There are a number of other features that could make allocation
> > > functions more efficient and safer in use, and which could ideally be
> > > standardised in the C standards or at least added as gcc extensions, but
> > > I think that's more than you are looking for here!
> >
> > It is possible to submit a proposal to WG14.
> >
>
> Yes, I know.  But giving you some feedback here is a step in that
> direction - even if it turns out that it doesn't affect your wording in
> the end.

Any kind of feedback is very welcome. Thank you!

Martin

> > > On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
> > > >
> > > > Hi,
> > > >
> > > > can you please take a quick look at this?  This is intended to align
> > > > the C standard with existing practice with respect to aliasing by
> > > > removing the special rules for "objects with no declared type" and
> > > > making it fully symmetric and only based on types with non-atomic
> > > > character types being able to alias everything.
> > > >
> > > > Unrelated to this change, I have another question:  I wonder if GCC
> > > > (or any other compiler) actually exploits the " or is copied as an
> > > > array of byte type, " rule to make assumptions about the effective
> > > > types of the target array?  I know compilers do this for memcpy...
> > > > Maybe also if a loop is transformed to memcpy?
> > > >
> > > > Martin
> > > >
> > > >
> > > > Add the following definition after 3.5, paragraph 2:
> > > >
> > > > byte array
> > > > object having either no declared type or an array of objects declared
> > > > with a byte type
> > > >
> > > > byte type
> > > > non-atomic character type
> > > >
> > > > Modify 6.5, paragraph 6:
> > > > The effective type of an object that is not a byte array, for an access
> > > > to its stored value, is the declared type of the object.97) If a value is
> > > > stored into a byte array through an lvalue having a byte type, then
> > > > the type of the lvalue becomes the effective type of the object for that
> > > > access and for subsequent accesses that do not modify the stored value.
> > > > If a value is copied into a byte array using memcpy or memmove, or is
> > > > copied as an array of byte type, then the effective type of the
> > > > modified object for that access and for subsequent accesses that do not
> > > > modify the value is the effective type of the object from which the
> > > > value is copied, if it has one. For all other accesses to a byte array,
> > > > the effective type of the object is simply the type of the lvalue used
> > > > for the access.
> > > >
> > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
> > > >

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging
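
For illustration, a minimal sketch of the memcpy case in the quoted
6.5 paragraph 6 wording. The names store, store_put, store_get and
store_get_portable are hypothetical. Note the assumption: under C17
the array below has a declared type, so its effective type is that
declared type and the uint32_t read in store_get is formally not
covered by 6.5p7. As I read the proposed wording, the array would be a
"byte array" and the read would match the effective type set by the
copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A "byte array" in the sense of the proposed definition: an array of
   objects declared with a byte type (a non-atomic character type).
   _Alignas makes the later uint32_t access correctly aligned. */
static _Alignas(uint32_t) unsigned char store[sizeof(uint32_t)];

/* Copy a value into the byte array with memcpy.  Under the proposed
   wording, the modified bytes take the effective type of the source
   object (uint32_t) for subsequent accesses that do not modify them. */
static void store_put(const uint32_t *src)
{
    memcpy(store, src, sizeof *src);
}

/* Read back through a uint32_t lvalue.  ASSUMPTION: this relies on the
   proposed wording (or on the compiler's existing practice); C17 does
   not define it for an array with a declared character type. */
static uint32_t store_get(void)
{
    return *(const uint32_t *)(const void *)store;
}

/* Read back by copying out again.  This variant is defined under the
   current standard as well, since it only copies bytes. */
static uint32_t store_get_portable(void)
{
    uint32_t v;
    memcpy(&v, store, sizeof v);
    return v;
}
```

The two readers should agree on any implementation that follows the
existing practice the proposal describes; only store_get_portable is
defined independently of the change.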