Hello gcc team,

I have sent the following proposals to the committee, but the committee 
requires proposals to be implemented in at least two major compilers, so I am 
proposing that they be implemented in gcc. This is going to be a rather lengthy 
e-mail, so TL;DR:

Proposal 1: New pointer-proof keyword _Lengthof to determine array length
Motivation: solve silent bugs when a pointer is accidentally used instead of an 
array

Proposal 2: Allow compound literals of static lifetime inside function bodies
Motivation: self-explanatory

Proposal 1: New pointer-proof keyword to determine array length
-----------------------------------------------------------------------------------
Extracting the number of elements in an array is a typical operation in C 
programs, as relying on magic constants or macros is very error-prone and can 
cause undefined behavior if the size of the array later changes.

Most developers determine the number of elements in an array by dividing the 
size of the array by the size of one element, usually via a user-defined macro 
like:

#define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))

However, there is a well-known issue that occurs when a pointer is (usually 
accidentally) used instead of an array. Despite the similarities between 
pointers and arrays, a pointer will most likely yield an unexpected value when 
this construct is used, creating a silent error that can only be detected at 
run time.
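
To make the pitfall concrete, here is a minimal sketch (the function names are 
hypothetical and this is not a listing from the proposal itself). The same 
macro is applied to a true array and to a pointer parameter. Note that recent 
gcc releases may warn about the second case when -Wall is enabled, but the code 
still compiles:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))

/* arr is a real array here, so the macro yields its element count. */
size_t count_in_caller(void)
{
    int arr[] = {1, 2, 3, 4, 5, 6};
    return ARRAY_SIZE(arr);
}

/* arr is a pointer here: the macro silently computes
 * sizeof (int *) / sizeof (int), which is unrelated to the number of
 * elements the caller actually passed. */
size_t count_in_callee(const int *arr)
{
    return ARRAY_SIZE(arr);
}
```

count_in_caller() returns 6, whereas count_in_callee() returns the same value 
for every argument (8 / 4 = 2 on a target with 8-byte pointers and 4-byte int).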

By using C11 features and GNU extension typeof, the ARRAY_SIZE macro can be 
improved so it returns a compile-time error when a pointer is given instead of 
an array:

#define ARRAY_SIZE(a)  \
     _Generic(&(a), \
         typeof (*a)**: (void)0, \
         typeof (*a)*const *: (void)0, \
         default: sizeof (a) / sizeof ((a)[0]))

If this macro were used in the example above, the following compile-time error 
would be triggered:

$ gcc lengthof.c -std=gnu11
lengthof.c: In function ‘foo’:
lengthof.c:7:30: error: void value not ignored as it ought to be
          typeof (*a)*const *: (void)0,               \
                               ^~~~~~~
lengthof.c:11:29: note: in expansion of macro ‘ARRAY_SIZE’
      for (size_t i = 0 ; i < ARRAY_SIZE(arr); i++) {
                              ^~~~~~~~~~

However, the error description is vague and confusing to the developer, and the 
GNU extension typeof is not part of the standard, as of C11.
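
For comparison, a clearer message can be obtained today, but only by leaning 
even more on GNU extensions. The following is a sketch under that assumption, 
using gcc's __builtin_types_compatible_p and __typeof__ together with C11 
_Static_assert. The helper name IS_POINTER and the message text are 
hypothetical:

```c
#include <stddef.h>

/* 1 when a has pointer type, 0 when it is a true array (GNU extension). */
#define IS_POINTER(a) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0]))

/* Element count of a. The struct is never instantiated: it only hosts the
 * static assertion, and its size is multiplied by zero. */
#define ARRAY_SIZE(a) \
    (sizeof (a) / sizeof ((a)[0]) + \
     0 * sizeof (struct { \
         _Static_assert(!IS_POINTER(a), \
                        "ARRAY_SIZE: argument is a pointer, not an array"); \
         char unused; \
     }))

size_t count_six(void)
{
    int arr[] = {1, 2, 3, 4, 5, 6};
    return ARRAY_SIZE(arr);
}
```

Passing a pointer to this macro fails to compile with the message given in the 
assertion, but the construct is neither portable nor readable, which is the gap 
a dedicated operator would close.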

Since there is no portable, pointer-proof way to determine the number of 
elements of an array, this document suggests adding a new keyword to the C 
standard that aims not to introduce any breaking changes to existing code.

According to the standard, identifiers that begin with an underscore followed 
by an uppercase letter or another underscore are reserved for the 
implementation. Therefore, the _Lengthof operator is suggested, aiming to 
provide the same functionality as the ARRAY_SIZE macro above, while providing 
better diagnostic messages when a pointer is accidentally given.

The example above has been modified by introducing this new _Lengthof operator:

#include <stddef.h>
#include <stdio.h>

int main(const int argc, const char *argv[]) {
     int arr[] = {1, 2, 3, 4, 5, 6};

     for (size_t i = 0 ; i < _Lengthof arr; i++) {
         printf("arr[%zu] = %d\n", i, arr[i]);
     }

     return 0;
}

Like the sizeof operator, the _Lengthof operator yields a value of type size_t 
that is resolved at compile time, unless it is applied to a variable-length 
array, in which case the value is computed at run time in O(1), as with sizeof. 
Except in this latter case, _Lengthof yields an integer constant expression 
that can be used with other constructs such as _Static_assert. For example:

int main(const int argc, const char *argv[]) {
     int arr[] = {1, 2, 3, 4, 5, 6};
     _Static_assert (_Lengthof arr == 6, "arr must have 6 elements");
     return 0;
}
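
Since _Lengthof is not implemented anywhere yet, the intended split between 
compile-time and run-time evaluation can only be illustrated with the existing 
sizeof idiom, which behaves the same way. A minimal sketch (the names fixed and 
vla_count are hypothetical):

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))

/* Fixed-size array: the count is an integer constant expression, so it
 * can be checked at compile time. */
static const int fixed[] = {1, 2, 3, 4, 5, 6};
_Static_assert(ARRAY_SIZE(fixed) == 6, "fixed must have 6 elements");

/* Variable-length array: sizeof is evaluated at run time, so the count is
 * only known when the function executes. n must be greater than zero. */
size_t vla_count(size_t n)
{
    int vla[n];
    return ARRAY_SIZE(vla);
}
```

Under the proposal, _Lengthof fixed and _Lengthof vla would presumably replace 
the two macro uses, with the added guarantee that a pointer operand is 
rejected.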
-------------------------------------------------------------------------------------------

Proposal 2: Allow compound literals of static lifetime inside function bodies
-----------------------------------------------------------------------------------
The current standard ISO/IEC 9899 6.5.2.5 (5) states:
“If the compound literal occurs outside the body of a function, the object has 
static storage duration; otherwise, it has automatic storage duration 
associated with the enclosing block.”

The following C11-compliant example makes use of various C99 features (compound 
literals, designated initializers and variadic macros) to define, at compile 
time, an array of instances that each contain an array of arbitrary size:

#include <stddef.h>
#include <stdio.h>

typedef const struct {
    size_t len;
    const char *buf;
} transfer;

#define TRANSFER(...) \
    { \
        .len = sizeof (const char[]){__VA_ARGS__} \
             / sizeof *(const char[]){__VA_ARGS__}, \
        .buf = (const char[]){__VA_ARGS__} \
    }

static transfer tr[] = {
    /* Using the optional convenience macro. */
    TRANSFER(1, 2, 3, 4, 5, 6, 7, 8),

    /* Or without using the optional convenience macro. */
    {.len = 5, .buf = (const char[]){1, 2, 3, 4, 5}},
    {.len = 3, .buf = (const char[]){1, 2, 3}}
};

int main(const int argc, const char *argv[]) {
    for (size_t i = 0; i < sizeof tr / sizeof *tr; ++i) {
        transfer *const t = &tr[i];

        for (size_t j = 0; j < t->len; ++j) {
            printf("operating on tr[%zu].buf[%zu] (0x%02X)\n",
            i, j, tr[i].buf[j]);
        }
    }

    return 0;
}

As shown, this code snippet executes specific actions for each element in the 
arrays, while taking into account how many elements have been allocated for 
each instance. Although it works as expected on any C11-compliant 
implementation, transfer and tr are defined at file scope even though they are 
only used by main. However, any attempt to move transfer and tr inside the body 
of main triggers a compile-time error: tr has static storage duration, so its 
initializers must be constant expressions, and the address of a compound 
literal with automatic storage duration is not one.

In this proposal, it is suggested to allow developers to mark compound literals 
as static, so that compound literals with static lifetime can also be used 
inside the body of a function. In the previous example, the `static` keyword 
would be placed inside each compound literal, and the array could then be moved 
into the function body:

    static transfer tr[] = {
        /* Using the optional convenience macro. */
        TRANSFER(1, 2, 3, 4, 5, 6, 7, 8),

        /* Or without using the optional convenience macro. */
        {.len = 5, .buf = (static const char[]){1, 2, 3, 4, 5}},
        {.len = 3, .buf = (static const char[]){1, 2, 3}}
    };
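
For comparison, the closest equivalent that compiles today inside a function 
body requires naming every buffer separately. This is a sketch of that 
workaround, not part of the proposal; buf0, buf1, buf2 and sum_transfers are 
hypothetical names:

```c
#include <stddef.h>

typedef const struct {
    size_t len;
    const char *buf;
} transfer;

/* Adds up every byte reachable through a function-local static table. */
int sum_transfers(void)
{
    /* Each buffer needs its own named static object, because only the
     * address of an object with static storage duration is a constant
     * expression. */
    static const char buf0[] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const char buf1[] = {1, 2, 3, 4, 5};
    static const char buf2[] = {1, 2, 3};

    static transfer tr[] = {
        {.len = sizeof buf0, .buf = buf0},
        {.len = sizeof buf1, .buf = buf1},
        {.len = sizeof buf2, .buf = buf2}
    };

    int sum = 0;

    for (size_t i = 0; i < sizeof tr / sizeof *tr; ++i) {
        for (size_t j = 0; j < tr[i].len; ++j) {
            sum += tr[i].buf[j];
        }
    }

    return sum;
}
```

The data is now scoped to the function, but every buffer needs an extra 
identifier and the table can no longer be written as a single initializer, 
which is what static compound literals would avoid.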

Motivation behind this proposal:

Encouraging developers to reduce the scope of any symbol to its absolute 
minimum reduces the risk of accessing and/or modifying data accidentally, 
improves encapsulation and enhances readability. Allowing static compound 
literals to be defined inside the body of a function provides all these 
benefits without introducing any breaking changes to existing code, and should 
not be difficult to support in currently available implementations.
-------------------------------------------------------------------------------

If you made it here, thank you very much for reading. Any feedback is welcome.

-- 
Xavier Del Campo Romero
