[clang] [clang][sema] Add nonnull attribute to builtin format functions (PR #160988)

Aaron Ballman via cfe-commits Fri, 07 Nov 2025 06:25:23 -0800

Radovan =?utf-8?q?Božić?= <[email protected]>,
Radovan =?utf-8?q?Božić?= <[email protected]>,
Radovan =?utf-8?q?Božić?= <[email protected]>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/[email protected]>

AaronBallman wrote:

> > I consider whether a pointer is allowed to be null or not to be a 
> > fundamental part of the contract of the function; you don't?
> 
> Not necessarily. The standard doesn't make any guarantees, but libraries are 
> allowed to. That's the part the below example doesn't match. 
> `__builtin_unreachable()` isn't part of any standard, 

It's the implementation for `std::unreachable` so it does follow the standard. 
I used it as an example because the point to library builtins is to recognize 
calls to standard library functions and translate those into 
`__builtin_whatever` calls so the rest of the toolchain can reason about the 
call more easily. So for the compiler, there is potentially no difference 
between calling `std::unreachable` and calling `__builtin_unreachable` because 
they may the same thing. (We don't have library builtins for most of the C++ 
standard library because most of the C++ standard library does not lend itself 
to translation to builtins, but it does happen.)

> If Clang would hijack std::unreachable and made it unconditionally equivalent 
> to __builtin_unreachable it would override what the library wanted to happen.

Yes, but this is true of all semantic information for all of our builtins. 
That's kind of *the point* to the way our builtins work. We hijack things we 
think we can do more efficiently than the library can. Any difference between 
the library semantics and what we predetermine is a problem.

...but more below.

> ... IMO `nonnull` is an attribute that should always require good evidence 
> that there is no library defining the behaviour of a nullptr.

I understand that's your assertion but what I don't understand is the 
justification for it. To me, this is the opposite of how things usually work: 
if the standard defines it as UB, we optimize based on that unless there's a 
good reason *not* to. Your argument is that we need to provide a good reason 
*to* optimize based on UB otherwise we shouldn't be doing so. That's a valid 
stance to take, but isn't how we've approached things in the past and I think 
requires wider community buy-in.

> > That said, I don't think we have a good plan for hardened libraries. We 
> > have no notion of conditionally applying these markings in tablegen, and 
> > perhaps we need to think of something for that. e.g., if the user passes 
> > `-fhardened` to the compiler (or whatever we end up exposing), I can 
> > imagine we'd want to switch from `__attribute__((nonnull))` to `_Nonnull` 
> > semantics in that case. But I'm also not certain we need such a mechanism 
> > today for this PR, either.
> > CC @rjmccall @rnk @efriedma-quic @jyknight for some additional opinions, in 
> > case my stance is off-base.
> 
> I agree.

To me, I think this is the crux of the problem. There's a tension between the 
notion of a builtin statically knowing the semantics of the API and the notion 
of a specific library implementation wanting to have different (maybe stronger) 
semantics. We don't have a way for a library to signal "we're doing something 
beyond the standard semantics", so we have no way from the compiler to know 
which markings are reasonable. For example, maybe we want to encode whether an 
API sets `errno` internally or not (think: using the `pure` marking), but some 
libraries do set `errno` while others do not. But we can't necessarily rely on 
the library to mark things for us (or opt out of our inferences) because some 
popular libraries provide no markings (for example, musl or MSVC CRT). I'm not 
certain what a good solution for this is, so I think we're left doing what we 
think is best on a case-by-case basis with the static information. I think what 
I'm hearing on this PR is that we're all comfortable optimizing on it for the 
formatted IO functions (or am I reading the room wrong)?

Specific to this PR, are we happy with the design of `Nonnull` in tablegen 
meaning `__attribute__((nonnull))` or do we want something more explicit so we 
can distinguish between `_Nonnull` and `__attribute__((nonnull))` on a 
case-by-case basis (in the future)?

https://github.com/llvm/llvm-project/pull/160988
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][sema] Add nonnull attribute to builtin format functions (PR #160988)

Reply via email to