On Wed, Jul 23, 2025 at 11:53:40AM +0000, Aaron Ballman wrote:
> That said, John McCall pointed out some usage patterns Apple has with
> their existing feature:
> 
> * 655 simple references to variables or struct members: __counted_by(len)
> * 73 dereferences of variables or struct members: __counted_by(*lenp)
> * 80 integer literals: __counted_by(8)
> * 60 macro references: __counted_by(NUM_EIGHT) [1]
> * 9 simple sizeof expressions: __counted_by(sizeof(eight_bytes_t))
> * 28 others my script couldn’t categorize:
>   * 7 more complicated integer constant expressions:
> __counted_by(num_bytes_for_bits(NUM_FIFTY_SEVEN)) [2]
>   * 16 arithmetically-adjusted references to a single variable or
> struct member: __counted_by(2 * len + 8)
>   * 1 nested struct member: __counted_by(header.len)
>   * 4 combinations of struct members: __counted_by(len + cnt) [3]
> 
> Do the Linux kernel folks think this looks somewhat like what their
> usage patterns will be as well?

Yes, this matches my expectations for its usage, though there is one
case I don't see explicitly mentioned above, which is referencing a
global variable (but if a function can be used, then an accessor can be
created for returning the global).
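
As a rough sketch of that accessor idea: the names below (table_entries,
table_count, COUNTED_BY) are made up for illustration, and COUNTED_BY is a
placeholder macro that expands to nothing, since whether a function call
would be accepted in the attribute is exactly what is being discussed here.

```c
#include <stddef.h>

/* Placeholder only: expands to nothing, so the compiler checks nothing. */
#define COUNTED_BY(expr)

/* Hypothetical global that holds the element count. */
static size_t table_entries = 16;

/* Accessor wrapping the global, so the annotation only needs a call. */
static size_t table_count(void)
{
  return table_entries;
}

struct table {
  int *slots COUNTED_BY(table_count());
};
```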

> If so, I'd like to argue for my
> personal stake in the ground: we don't need any new language features
> to solve this problem, we can use the existing facilities to do so and
> downscope the initial feature set until a better solution comes along
> for forward references. Use two attributes: counted_by (whose argument
> specifies an already in-scope identifier of what holds the count) and
> counts (whose argument specifies an already in-scope identifier of
> what it counts). e.g.,
> ```
> struct S {
>   char *start_buffer;
>   int start_len __counts(start_buffer);
>   int end_len;
>   char *end_buffer __counted_by(end_len);
> };
> 
> void func(char *buffer, int N __counts(buffer), int M, char *buffer
> __counted_by(M));
> ```
> It's kind of gross to need two attributes to do the same notional
> thing, but it does solve the vast majority of the usages seen in the
> wild if you're willing to accept some awkwardness around things like:
> ```
> struct S {
>   char *buffer;
>   int *len __counts(buffer); // Note that len is a pointer
> };
> ```
> because we'd need the semantics of `counts` to include dereferencing
> to the `int` in order to be a valid count. We'd be sacrificing the

Delayed parsing of a lone struct member is already implemented in Qing's
series, so that isn't an issue; i.e., this parses fine:

struct S {
  char *start_buffer __counted_by(start_len);
  int start_len;
  int end_len;
  char *end_buffer __counted_by(end_len);
};

Doing this for an _expression_ is, as I understand it, the sticking point.

> ability to handle the "others my script couldn't categorize", but
> that's 28 out of the 905 total cases and maybe that's acceptable?

Three of those patterns are pretty important in Linux, though:
- nested struct members
- arithmetic adjustments (e.g. the count of an array includes the
  rest of the struct size or is a byte count instead of element count)
- making calls to helper functions
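
To make the arithmetic-adjustment case concrete, here is a minimal sketch
of a length field that stores a byte count covering the whole message,
header included. The struct layout, msg_item_count(), and the COUNTED_BY
placeholder macro (which expands to nothing) are assumptions for
illustration, not existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder only: expands to nothing, so the compiler checks nothing. */
#define COUNTED_BY(expr)

struct elem {
  uint16_t id;
  uint16_t flags;
};

struct msg {
  uint32_t total_len; /* bytes in the whole message, header included */
  struct elem items[]
    COUNTED_BY((total_len - sizeof(struct msg)) / sizeof(struct elem));
};

/* The element count that the annotation's expression describes. */
static size_t msg_item_count(const struct msg *m)
{
  if (m->total_len < sizeof(struct msg))
    return 0; /* malformed length: treat the array as empty */
  return (m->total_len - sizeof(struct msg)) / sizeof(struct elem);
}
```

A plain __counted_by(total_len) would be wrong here twice over: the field
counts bytes rather than elements, and it includes the header.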

For helper functions, the most common need is doing endian conversions
(e.g. for protocol (de)serializing, where a length is stored in a
different byte order than the native CPU byte order):

struct S {
  struct header hdr;
  __be32 bytes;
  struct info array[] __counted_by(be32_to_cpu(bytes) / sizeof(struct info));
};
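
For anyone reading along outside the kernel tree, a self-contained sketch
of what that expression computes follows. be32_t and be32_decode() are
hypothetical stand-ins for the kernel's __be32 and be32_to_cpu(), struct
info is a placeholder payload, the header member is left out for brevity,
and COUNTED_BY is a placeholder macro that expands to nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder only: expands to nothing, so the compiler checks nothing. */
#define COUNTED_BY(expr)

/* Stand-in for __be32: four bytes in big-endian (wire) order. */
typedef struct {
  uint8_t b[4];
} be32_t;

/* Stand-in for be32_to_cpu(): decodes regardless of host byte order. */
static uint32_t be32_decode(be32_t v)
{
  return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
         ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}

struct info {
  uint32_t a;
  uint32_t b;
};

struct S {
  be32_t bytes; /* payload length in bytes, big-endian */
  struct info array[] COUNTED_BY(be32_decode(bytes) / sizeof(struct info));
};

/* The element count that the annotation's expression describes. */
static size_t S_count(const struct S *s)
{
  return be32_decode(s->bytes) / sizeof(struct info);
}
```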

-- 
Kees Cook
