Hello gcc team,

I have sent the following proposals to the committee, but it requires them to be implemented in at least two major compilers, so I am proposing them for implementation in gcc. This is going to be a rather lengthy e-mail, so TL;DR:

Proposal 1: New pointer-proof keyword _Lengthof to determine array length
Motivation: solve silent bugs when a pointer is accidentally used instead of an array

Proposal 2: Allow compound literals of static lifetime inside function bodies
Motivation: self-explanatory

Proposal 1: New pointer-proof keyword to determine array length
-----------------------------------------------------------------------------------

Extracting the number of elements in an array is a typical operation in C programs, as relying on magic constants or macros is very error-prone and can cause undefined behavior if the size of the array is later modified. Typically, most developers use the following operation to determine the number of elements inside an array, usually via a user-defined macro like

    #define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))

However, there is a well-known issue that occurs when a pointer is (most times accidentally) used instead of an array. Despite the similarities between pointers and arrays, applying this construct to a pointer will most likely yield an unexpected value (the size of the pointer divided by the size of the pointed-to type), creating a silent error that can only be detected at run-time.

By using C11 features and the GNU extension typeof, the ARRAY_SIZE macro can be improved so that it produces a compile-time error when a pointer is given instead of an array:

    #define ARRAY_SIZE(a) \
        _Generic(&(a), \
            typeof (*a)**: (void)0, \
            typeof (*a)*const *: (void)0, \
            default: sizeof (a) / sizeof ((a)[0]))

If this macro were applied to a pointer, a compile-time error such as the following would trigger:

    $ gcc lengthof.c -std=gnu11
    lengthof.c: In function ‘foo’:
    lengthof.c:7:30: error: void value not ignored as it ought to be
             typeof (*a)*const *: (void)0, \
                                  ^~~~~~~
    lengthof.c:11:29: note: in expansion of macro ‘ARRAY_SIZE’
         for (size_t i = 0 ; i < ARRAY_SIZE(arr); i++) {
                                 ^~~~~~~~~~

However, the error description is vague and confusing to the developer, and the GNU extension typeof is not part of the standard, as of C11.
Since there is no portable, pointer-proof way to determine the number of elements of an array, this document suggests adding a new keyword to the C standard in a way that does not introduce any breaking changes to existing code. According to the standard, identifiers that begin with an underscore followed by an uppercase letter or another underscore are reserved for the implementation. Therefore, the _Lengthof operator is suggested, aiming to provide the same functionality as the ARRAY_SIZE macro above, while providing better diagnostic messages when a pointer is accidentally given.

The following example uses the new _Lengthof operator:

    #include <stddef.h>
    #include <stdio.h>

    int main(const int argc, const char *argv[])
    {
        int arr[] = {1, 2, 3, 4, 5, 6};

        for (size_t i = 0; i < _Lengthof arr; i++) {
            printf("arr[%zu] = %d\n", i, arr[i]);
        }

        return 0;
    }

Similarly to the sizeof operator, the _Lengthof operator returns a value of type size_t that is resolved at compile-time, unless it is applied to a variable-length array, in which case it is evaluated at run-time in O(1), as with sizeof. Except in this latter case, _Lengthof returns an integer constant that can be used with other constructs such as _Static_assert.
For example:

    int main(const int argc, const char *argv[])
    {
        int arr[] = {1, 2, 3, 4, 5, 6};

        /* C11 requires a message string in _Static_assert. */
        _Static_assert (_Lengthof arr == 6, "arr must have 6 elements");

        return 0;
    }

-------------------------------------------------------------------------------------------

Proposal 2: Allow compound literals of static lifetime inside function bodies
-----------------------------------------------------------------------------------

The current standard ISO/IEC 9899 6.5.2.5 (5) states:

“If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block.”

The following C11-compliant example makes use of various C99 features (compound literals, designated initializers and variadic macros) and allows defining, at compile-time, an array of instances that each contain an array of arbitrary size:

    #include <stddef.h>
    #include <stdio.h>

    typedef const struct
    {
        size_t len;
        const char *buf;
    } transfer;

    #define TRANSFER(...) \
    { \
        .len = sizeof (const char[]){__VA_ARGS__} \
               / sizeof *(const char[]){__VA_ARGS__}, \
        .buf = (const char[]){__VA_ARGS__} \
    }

    static transfer tr[] =
    {
        /* Using the optional convenience macro. */
        TRANSFER(1, 2, 3, 4, 5, 6, 7, 8),
        /* Or without using the optional convenience macro. */
        {.len = 5, .buf = (const char[]){1, 2, 3, 4, 5}},
        {.len = 3, .buf = (const char[]){1, 2, 3}}
    };

    int main(const int argc, const char *argv[])
    {
        for (size_t i = 0; i < sizeof tr / sizeof *tr; ++i) {
            transfer *const t = &tr[i];

            for (size_t j = 0; j < t->len; ++j) {
                printf("operating on tr[%zu].buf[%zu] (0x%02X)\n",
                       i, j, tr[i].buf[j]);
            }
        }

        return 0;
    }

As shown, this code snippet operates on each element of every buffer, using the length stored for each instance. Although it works as expected on any C11-compliant implementation, transfer and tr are defined at file scope, but are only used by main.
However, any attempt to move transfer and tr inside the body of main will trigger a compile-time error, since compound literals inside a function body have automatic storage duration and therefore cannot be used to initialize an object with static storage duration.

In this proposal, it is suggested to allow developers to declare compound literals as static, so that compound literals with static lifetime can still be used inside the body of a function. In the previous example, the `static` storage-class specifier would be placed inside each compound literal, so that tr could be moved into the function body:

    static transfer tr[] =
    {
        /* Using the optional convenience macro. */
        TRANSFER(1, 2, 3, 4, 5, 6, 7, 8),
        /* Or without using the optional convenience macro. */
        {.len = 5, .buf = (static const char[]){1, 2, 3, 4, 5}},
        {.len = 3, .buf = (static const char[]){1, 2, 3}}
    };

Motivation behind this proposal:

Encouraging developers to reduce the scope of any symbol to its absolute minimum reduces the risk of accessing and/or modifying data accidentally, improves encapsulation and enhances readability. Allowing static compound literals to be defined inside the body of a function provides all these benefits without introducing any breaking changes to existing code, and should not be difficult to support in currently available implementations.

-------------------------------------------------------------------------------

If you made it here, thank you very much for reading. Any feedback is welcome.

--
Xavier Del Campo Romero