On Thursday, 24.07.2025 at 19:19 +0000, Aaron Ballman wrote:
> On Thu, Jul 24, 2025 at 6:11 PM Martin Uecker <ma.uec...@gmail.com> wrote:
> >
> > On Thursday, 24.07.2025 at 16:26 +0000, Aaron Ballman wrote:
> > > On Thu, Jul 24, 2025 at 3:03 PM Martin Uecker <ma.uec...@gmail.com> wrote:
> > > >
> > > > On Thursday, 24.07.2025 at 14:08 +0000, Aaron Ballman wrote:
> > > > > On Wed, Jul 23, 2025 at 8:38 PM Martin Uecker <ma.uec...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > ...
> > > > >
> > > > > Personally, I'm not excited by it because one of the big sticking
> > > > > points in the Clang community is shared header files with C++. Because
> > > > > these attributes are used on structures and functions, the two most
> > > > > common things you'll find in a shared header file, we *really* want
> > > > > the feature to be workable in both languages to the greatest extent
> > > > > possible. And once we care about C++, things get so much harder due to
> > > > > the extra complexity it brings. So, for example, we'd have to figure
> > > > > out how to handle things like:
> > > > > ```
> > > > > template <typename Ty>
> > > > > struct S {
> > > > >   char *buffer __counted_by_expr(Ty len; len + 7);
> > > > >   int len; // Oooooops
> > > > > };
> > > > >
> > > > > template <typename Ty, typename Uy>
> > > > > struct T {
> > > > >   char *buffer __counted_by_expr(Ty len; len + 7);
> > > > >   Uy len; // Grrrrr
> > > > > };
> > > > > ```
> > > >
> > > > For my understanding: What is the problem here? It would be an
> > > > error if the declared type of len is inconsistent between the
> > > > attribute and the type that comes later in the member. I guess
> > > > a compiler could also warn already when it sees a template
> > > > like this where it refers to different template arguments.
> > > >
> > > > But then, templates also certainly do not appear in shared headers,
> > > > so I am not sure why Clang could not simply offer both,
> > > > a late-parsing version and also a C-compatible __counted_by_expr.
> > > >
> > > > I can understand if this is not your first choice, but it seems
> > > > to be a reasonable compromise to me.
> > >
> > > Ah, apologies, I wasn't clear. My thinking is: we're (Clang folks)
> > > going to want it to work in C++ mode because of shared headers. If it
> > > works in C++ mode, then we have to figure out what it means with all
> > > the various C++ features that are possible, not just the use cases
> > > that appear within a shared header. That's how I got onto templates as
> > > just one example of where we'd have to figure out what the behavior
> > > should be. Other questions come up outside of templates as well (use
> > > in default arguments, lambda captures, use of types with conversion
> > > operators, etc).
> >
> > Ok, but this assumes that clang does *not* offer two versions, but
> > only one that then needs to work perfectly in all possible contexts.
> > This certainly makes it much more challenging to find a solution.
>
> Oh, sorry, I think I missed that part of the design discussion!
>
> > If clang would support __counted_by with all the semantics you like
> > for C++ and also __counted_by_expr for GCC / C compatibility, then
> > you could simply forbid __counted_by_expr in templates or other
> > C++ scenarios - if it happens to cause problems.
>
> I'll have to think about that more, but my initial reaction is: that's
> making our implementation/design problems be the user's problem. Maybe
> that's fine? But it would be kind of frustrating as a user to have
> code using `__counted_by(foo)` that I want to modify to say
> `__counted_by(foo * CHAR_BIT)` but then find out it needs to be
> `__counted_by_expr(foo * CHAR_BIT)` and now there are contexts where
> that doesn't work because of potential C++ shenanigans.

Well, I think the idea was that Clang may have a version where this
works, if you think this is what your users want. But if you need
shared headers with C / GCC as the kernel does, you can use
__counted_by_expr.

(As a clang user myself, I want forward parameter declaration, and
support for [*], [n] in shared C/C++ headers, rather than the most
succinct syntax for counted_by that completely breaks my assumptions
about how C works.)

>
> > But then, I also do not really understand why you think
> > __counted_by_expr has more complicated interactions with C++ than
> > delayed-parsing counted_by (see below).
> >
> >
> > > I think some of these questions may even apply in C.
> > > Like... is this valid?
> > >
> > > void func(char *buffer [[counted_by(int N[12]; N[10])]], int N[8]);
> > >
> > > Or this?
> > >
> > > void func(char *buffer [[counted_by(int N[*]; N[10])]], int N[*]); /*
> > > Can't usually spell int N[*] outside of a parameter list, but this
> > > is a parameter list of sorts, so maybe it's fine? */
> >
> > Any language feature would need a specification. In the discussion
> > with Bill it became clear that we would need to add
> > constraints to a delayed-parsing version, which somebody would
> > need to spell out and implement. So I do not see how this is an
> > argument for or against a specific solution.
>
> It's pointing out the kinds of design concerns I see with the
> approach. How the design concerns are resolved factors into whether
> I'm for or against.

So what version would you prefer? That it works like a parameter
that adjusts int N[10] to int * or the version where it does not?

>
> > > I'm not saying it's not a reasonable compromise, FWIW. More just... I
> > > think allowing for two declarations increases the complexity of the
> > > feature in ways that aren't well understood yet.
> >
> > This is one thing where we are not on the same page.
> > For me, delayed-parsing is the feature that is far more complex also
> > in terms of interaction with the rest of the language.
> >
> > Forward declarations may look more complex, but it unambiguously
> > identifies the object and constrains the type, which makes things
> > simpler IMHO.
>
> Yeah, I think we're not on the same page here as well. If used in the
> common case, I can see how it unambiguously identifies the object and
> constrains the type. It's the uncommon cases that I am worried about
> (because if it's syntactically possible to spell, my experience is
> that users will do it and file bug reports about it "not working").
> What I mean by common case are straightforward uses like: void
> func(int n; char *buffer __counted_by(n), int n); with no name hiding
> via other declarations, which are the majority of uses. I think any
> choice of forward declarations, .N, late parsing, etc work for the
> simple cases.

Except if you want consistency with C size expressions where late
parsing would not work because it would break existing code.

>
> For the uncommon cases, I think late parsing ends up with less
> complexity for the user to deal with; most of the problems boil down
> to ambiguous names (or am I wrong there?). Both C and C++ users are
> already used to dealing with naming conflicts (in different ways) and
> so I think it's less burden on them to say "and here's one more place
> you can have name hiding" and handle ambiguities with diagnostics. And

For C, I am sure I would rather have consistency, which this feature
would break. For C++, where the part in the standard that describes
name lookup extends over many pages with many complicated rules, it
might not matter.

> the added benefit is that for the straightforward case, when there's
> no name hiding (which Apple's usage experience suggests is a rarity in
> practice), late parsing requires the user to write less code, which I
> think is valuable:

I wonder what Apple's usage experience is then, because I often
have a "len" etc. at different levels. If you only care about
headers, you would not see this.

>
> void func(int n; char *buffer __counted_by(n), int n);
> vs
> void func(char *buffer __counted_by(n), int n);.

We can write it like this in GNU C already, which I do prefer
to both:

void func(int n; char buffer[n], int n);

And it works: https://godbolt.org/z/67zfbEv1P

I wouldn't mind having syntactic sugar for simple cases though:

void func(char buffer[.n], int n);

>
> (FWIW, I consider .N to be late parsing as well because the only way I
> think it can work in practice is if the type information is
> available.)
>
> > > > > I think it's possible to handle these situations, but we'd have to sit
> > > > > down and think through all the edge cases and whether we can handle
> > > > > them all with some reasonable QoI. I think we'd ultimately run into
> > > > > the same concerns here as we ran into with forward declared
> > > > > parameters. I think the reason folks in Clang are more comfortable
> > > > > with late parsing is because it means the user doesn't have to repeat
> > > > > the type and the name which makes for less chances for the user to
> > > > > screw things up and get into these weird situations. There can be
> > > > > other weird situations with late parsing too, of course, but I think
> > > > > the scope of those edge cases is a bit narrower.
> > > >
> > > > TBH, I am not terribly convinced about this argument.
> > > >
> > > > If I understood it correctly, the late parsing design seems to make
> > > > no distinctions between which identifier is used, the local or
> > > > the global one and just prefers the local one if it exists, possibly
> > > > giving a warning if there is also a global one.
> > >
> > > I think I'd describe it as following typical lexical scoping behavior
> > > -- the closest declaration of the identifier is what's found by the
> > > lookup.
> >
> > I am not sure what you mean by "typical lexical scoping behavior".
> >
> > The scoping behavior that C - so far - consistently uses everywhere
> > works not like this (except for labels, which have no type).
>
> int n; // #4
> void func(int n /* #3 */, int array[n /* Matches with #3 if it's
> declared, #4 otherwise */]) {
>   int n; // #2
>   {
>     int n; // #1
>     n = 12; // Matches with #1
>   }
>   n = 100; // Matches with #2 if it's declared, #3 otherwise (this would collide with the parameter)
> }

Yes, in all these cases it matches the *prior* definition which
is visible, but would not match a later one:

int n; // #4
void func(int array[n /* Matches with #4 */], int n /* #3 */)
{
  {
    n = 12; // Matches with #3
    int n; // not used
  }
  n = 100; // Matches with #3
}

So *not* what is proposed with late parsing.

>
> struct S {
>   int n; // #5
>   int array[n]; // Matches #5 in C++, #4 in C, this is the interesting case.
> };

Yes, in C the member exists in a different name space, so it
is not even considered, but even in C++, it would *not* pick
a later member.

>
> (#2 and #3 create a redefinition, so assume those are mutually
> exclusive declarations) In all cases except the interesting one, the
> use of the identifier matches with the declaration from the nearest
> scope.
>

Only a *prior* declaration. In neither C nor C++ would it pick
a later declaration in the same scope.
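
The "prior declaration only" rule can be checked in standard C today. What follows is a minimal sketch, not code from the thread: the function names row_bytes and outer_then_inner are made up for illustration, and an enum constant stands in for the file-scope `int n` (#4) so that the array bound is a constant expression and sizeof can report which `n` was picked.

```c
#include <stddef.h>

enum { n = 4 };   /* file-scope n, playing the role of #4 (assumption: a
                     constant instead of an int, so the size is observable) */

/* The bound `n` is resolved while the first parameter is being parsed, so
   it names the file-scope constant, not the parameter declared after it. */
size_t row_bytes(int (*row)[n], int n)
{
    (void)n;              /* the parameter n is only visible from its declaration on */
    return sizeof *row;   /* sizeof(int[4]); the operand is not evaluated */
}

/* Block scope follows the same rule: a use binds to a declaration that is
   already visible, never to a later one in the same block. */
int outer_then_inner(void)
{
    int seen_before = n;  /* still the file-scope n */
    int n = 12;           /* hides the file-scope n from this point on */
    return seen_before * 100 + n;
}
```

If late parsing were applied to ordinary array bounds, the `n` inside `int (*row)[n]` would have to refer to the later parameter instead, which is the kind of silent change to existing code that the reply above is concerned about.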

> I would argue that making the interesting case behave the same
> in C as it does in C++ would make C more consistent than it is today
> because it would pick the closest declaration, same as what happens
> within a function body or compound scope.
>
> > It also does not seem typical for C++, e.g. the
> > following gives an error:
> >
> > struct foo {
> >   char buf[E];
> >   static const int E = 1;
> > };
> >
> > https://godbolt.org/z/WTvcf7hza
>
> C++ has this notion of "complete-class context"
> (https://eel.is/c++draft/class.mem.general#10) which identifies where
> late parsing is used. Within an array extent, it's not a complete
> class context. Confusingly, within a default initializer, it *is*:
> https://godbolt.org/z/s71aW1z33
>
> (I think if we went down the route of late parsing for bounds safety
> attributes, WG21 should make any use within a declaration of a data
> member be within a complete class context. The current situation
> seems... not ideal IMO.)

Indeed, C++ made some parts inconsistent with most of the rest
of the language. C has not done this yet.

>
> > > But in the event that causes a different lookup result from
> > > what the current standard behavior would give you, it should be
> > > diagnosed. Personally, I'd feel most comfortable if that diagnostic
> > > was a warning which defaults to an error; basically, make the user
> > > decide how to handle it on a case by case basis but "standard behavior
> > > wins" if you disable the diagnostic.
> >
> > The reason I am not a fan of introducing a potential ambiguity and then
> > having warnings/errors to avoid problems is that this makes the code fragile.
> > Introducing a name in some enclosing context (e.g. by including a header)
> > will break the local code that uses the same name, even though it is
> > unrelated. Scoping is meant to prevent exactly this!
>
> Yeah, that is a problem. But it's not a new problem, right? The same
> fragility happens with macros.
> Though, not making that fragility worse
> would be a good thing, it's not like users appreciate unintentional
> conflicts with macro names.

Indeed. It is the same problem as with macros, this is why we
invent funny names in macros or even __COUNTER__ hacks to avoid
collisions. This is not good, so why introduce a new language
feature with the same problem as macros?

And even worse: With macros this is an old problem, now I want to
retrofit __counted_by to structures where I cannot change the names!

>
> > I would also like to point out that clang is not currently following this
> > philosophy. In fact, it is that GCC rejects the following code in C++,
> > while clang compiles it - but also not using the scoping rules
> > proposed here, but the same one C uses!
>
> It's not that we're not following the philosophy, it's that bounds
> safety work is still in-progress and we're waiting for a resolution to
> this problem before moving forward with changes upstream. If late
> parsing ends up being the approach that everyone can live with, I'd
> expect this behavior to be updated in Clang accordingly.

Thanks for the explanation.

Martin

>
> ~Aaron
>
> >
> > constexpr int E = 1;
> >
> > struct foo {
> >   char buf[E];
> >   static const int E = 3;
> >   static_assert(1 == sizeof(buf));
> > };
> >
> > https://godbolt.org/z/zqWafsGh5.
> >
> > If you remove the file-scope E, both compilers reject this code.
> > Neither clang nor GCC refer to the later identifier.
> >
> > >
> > > > My C++ examples show that you can easily run into UB here in C++,
> > > > especially since subtly different rules apply in different but very
> > > > similar scenarios. How can this not be error prone?
> > > >
> > > > The forward declaration, the [.N] syntax, and also __self__ etc.
> > > > would all make it explicit which identifier is meant.
> > >
> > > I think they come with tradeoffs but so far, everything seems to be
> > > error prone in different ways.
> > > :-(
> > >
> > > > > The other downside is that we have more attributes that need to
> > > > > support something similar, like the thread safety attributes (which I
> > > > > believe is also an important use case for the Linux kernel folks?). We
> > > > > could do this dance on a per-attribute basis, but if the solution
> > > > > worked for all attributes *and* array extents at the same time, that
> > > > > would be nice. Not certain it's a requirement though.
> > > >
> > > > True. But if it is to work properly for arrays in C too, then the
> > > > C constraints are also important IMHO, not just the C++ rules.
> > >
> > > Agreed! I don't think "C++ rules should win, that's the end of the
> > > discussion" is tenable; I think it's more that we need to handle all
> > > the oddities of both languages otherwise we're going to end up with
> > > something like C99's array parameter features that didn't get adopted
> > > into C++ (e.g., [static 12] or [*], etc). I think that hampered their
> > > adoption in the wild at least in part because of the cross-language
> > > issues.
> >
> > In fact, one of the best things Clang / GCC could do for language
> > interoperability would be to support [*] and [n] in C++ in
> > function declarations, even if it is just a no op.
> >
> > >
> > > > > > The thing is that WG14 had (weak) consensus for parameter
> > > > > > forward declarations and I think more consensus for [.N]
> > > > > > syntax in structures already. So I had hoped that we will be
> > > > > > able to make progress on this.
> > > > >
> > > > > Question on the .N syntax: I thought I heard that this was something
> > > > > GCC could handle, but that it still requires late parsing to ensure
> > > > > type information for N is available and that was a problem. e.g.,
> > > > >
> > > > > void func(char *buffer __counted_by(.N * sizeof(.N)), int N);
> > > > >
> > > > > where we'd need to know both the name and the type.
> > > > > Am I wrong about
> > > > > that being a challenge for GCC to support?
> > > >
> > > > I think it is generally a challenge to support.
> > >
> > > Thanks for the confirmation!
> > >
> > > > One could certainly
> > > > store away the tokens and parse them later (this is certainly doable),
> > > > but it adds a lot of issues because you need to add a lot of constraints
> > > > for things which should then not be allowed. And it is still not an
> > > > acceptable solution for size arguments in C.
> > >
> > > Yeah, that's basically the same gripe I have about having forward
> > > declarations; we have to figure out all the weird edge cases and what
> > > constraints are necessary to have decent QoI.
> > >
> > > > .N would work here if you combine with a rule such as ".N" is always
> > > > converted to "size_t". Or you require an explicit cast if it is different
> > > > from "size_t".
> > >
> > > I think converting .N to size_t could work if the feature was limited
> > > to just bounds safety, but it would be unusable for things like thread
> > > safety attributes where the argument needs to be some kind of
> > > mutex-like object. But even for bounds safety, I think we end up with
> > > really weird behavior: void func(char *buffer __counted_by(sizeof(.N)
> > > /* sizeof(size_t) */), typeof(.N) x /* size_t x */, int N /* wait,
> > > what? */);
> >
> > I agree, this is why I would favour GCC's forward declaration for
> > parameters. In structures I think .N might be ok, because you want
> > to restrict the expressions anyhow, as they are evaluated implicitly
> > on each member access.
> >
> > >
> > > If we were to require explicit casts... I think we're back to the same
> > > thing as having forward declarations in terms of concerns because
> > > that's what opens up the possibility of type mismatches.
> >
> > I am not sure I understand the concern. You can have type errors
> > in all expressions.
> >
> > The key insight is that the cast would hide the original type, so
> > you do not need to know it when parsing the expression
> >
> > struct foo {
> >   int *buf __counted_by((size_t)N);
> >   int N;
> > };
> >
> > You only need to later check that the type you actually have can be
> > converted to the type used in the cast, but you need to support
> > this anyhow!
> >
> > >
> > > I guess the way I see it is:
> > >
> > > If there's only one declaration involved (late parsing approach), then
> > > there's potential name lookup confusion which I think is worse in C
> > > than it is in C++.
> >
> > I would value consistency in C more, but I agree that for the case of
> > one identifier late parsing is not causing confusion or implementation
> > issues. But only if you look at this by itself, once you mix it
> > with other language features that is already not clear anymore:
> >
> > constexpr size_t size = 4;
> > struct foo {
> >   char (*buf)[size] __counted_by(size); // two different "size"!
> >   int size;
> > };
> >
> > > If there's more than one declaration involved (any kind of forward
> > > declaration syntax, use of .N with explicit casts, etc), then there's
> > > potential type confusion which I think is worse in C++ than it is in
> > > C.
> > >
> > > Either one is going to need constraints to help with the confusion. I
> > > don't know which one ends up with less constraints or more readable
> > > code.
> >
> > Martin