On Wed, Jul 23, 2025 at 11:53:40AM +0000, Aaron Ballman wrote:
> That said, John McCall pointed out some usage patterns Apple has with
> their existing feature:
>
> * 655 simple references to variables or struct members: __counted_by(len)
> * 73 dereferences of variables or struct members: __counted_by(*lenp)
> * 80 integer literals: __counted_by(8)
> * 60 macro references: __counted_by(NUM_EIGHT) [1]
> * 9 simple sizeof expressions: __counted_by(sizeof(eight_bytes_t))
> * 28 others my script couldn’t categorize:
>   * 7 more complicated integer constant expressions:
>     __counted_by(num_bytes_for_bits(NUM_FIFTY_SEVEN)) [2]
>   * 16 arithmetically-adjusted references to a single variable or
>     struct member: __counted_by(2 * len + 8)
>   * 1 nested struct member: __counted_by(header.len)
>   * 4 combinations of struct members: __counted_by(len + cnt) [3]
>
> Do the Linux kernel folks think this looks somewhat like what their
> usage patterns will be as well?

Yes, this matches my expectations for its usage, though there is one
case I don't see explicitly mentioned above, which is referencing a
global variable (but if a function can be used, then an accessor can be
created for returning the global).

> If so, I'd like to argue for my
> personal stake in the ground: we don't need any new language features
> to solve this problem, we can use the existing facilities to do so and
> downscope the initial feature set until a better solution comes along
> for forward references. Use two attributes: counted_by (whose argument
> specifies an already in-scope identifier of what holds the count) and
> counts (whose argument specifies an already in-scope identifier of
> what it counts). e.g.,
> ```
> struct S {
>   char *start_buffer;
>   int start_len __counts(start_buffer);
>   int end_len;
>   char *end_buffer __counted_by(end_len);
> };
>
> void func(char *buffer, int N __counts(buffer), int M, char *buffer
> __counted_by(M));
> ```
> It's kind of gross to need two attributes to do the same notional
> thing, but it does solve the vast majority of the usages seen in the
> wild if you're willing to accept some awkwardness around things like:
> ```
> struct S {
>   char *buffer;
>   int *len __counts(buffer); // Note that len is a pointer
> };
> ```
> because we'd need the semantics of `counts` to include dereferencing
> to the `int` in order to be a valid count. We'd be sacrificing the

The lone struct member delayed parsing is already implemented in Qing's
series, so that isn't an issue. i.e. this is parsed fine:

struct S {
	char *start_buffer __counted_by(start_len);
	int start_len;
	int end_len;
	char *end_buffer __counted_by(end_len);
};

Doing this for an _expression_ is, as I understand it, the sticking
point.

> ability to handle the "others my script couldn't categorize", but
> that's 28 out of the 905 total cases and maybe that's acceptable?

Three of those patterns are pretty important in Linux, though:

- nested struct members
- arithmetic adjustments (e.g. the count of an array includes the rest
  of the struct size or is a byte count instead of element count)
- making calls to helper functions

For helper functions, the most common need is doing endian conversions
(e.g. for protocol (de)serializing, where a length is stored in a
different byte order than the native CPU byte order):

struct S {
	struct header hdr;
	__be32 bytes;
	struct info array[] __counted_by(be32_to_cpu(bytes) / sizeof(struct info));
};

-- 
Kees Cook