As always, Willy, this is excellent feedback. Thanks for your time.

First of all, I totally acknowledge that this idiom is very
widespread, and therefore unlikely to be the kind of UB that a
compiler would (ab)use during optimization. That's why this patch is a
CLEANUP and not a BUG.

> I'm having a strong doubt about the second part of that sentence, do
you have a link ? For me this is not a dereference, there's no memory
access nor no following of a pointer.

Sure thing. This is a tricky point that the C standard could
definitely make clearer, but I'm pretty sure that my (and LLVM's)
interpretation of the standard is reasonable.

Consider the following little program:
```
#include <stdint.h>
struct s {
    int i;
    char c;
};
int main(void) {
    return (uintptr_t)&(((struct s *)0)->c);
}
```
This program triggers UBSan on clang-19, which is evidence that the
problem is indeed NULL dereference, and not just void * arithmetic.

I'll now provide evidence from the standard that this program is
indeed executing UB.

>From N3220, 6.5.3.4:
> A postfix expression followed by the -> operator and an identifier designates 
> a member of a structure
or union object. The value is that of the named member of the object
to which the first expression
points, and is an lvalue.

>From N3220, 6.3.2.1:
> An lvalue is an expression (with an object type other than void) that 
> potentially designates an
object; if an lvalue does not designate an object when it is
evaluated, the behavior is undefined.

Clearly, ((struct s *)0)->c does not designate an object. Therefore,
if & evaluates its argument, then the expression in the little program
is UB.

Now, all that's left in order to demonstrate UB is to show that the
operand to the & is evaluated.

>From N3220, 6.5.4.4 p3:
> The unary & operator yields the address of its operand. If the operand has 
> type "type", the result has
type "pointer to type". If the operand is the result of a unary *
operator, neither that operator nor
the & operator is evaluated and the result is as if both were omitted,
except that the constraints on
the operators still apply and the result is not an lvalue. Similarly,
if the operand is the result of a []
operator, neither the & operator nor the unary * that is implied by
the [] is evaluated and the result
is as if the & operator were removed and the [] operator were changed
to a + operator.

In short, this is saying that C guarantees these identities:
    1. &(*p) is equivalent to p
    2. &(p[n]) is equivalent to p + n

As a consequence, &(*p) doesn't result in the evaluation of *p, only
the evaluation of p (and similar for []). There is no corresponding
special carve-out for ->. Maybe there should be; if you're interested,
I think this would be a reasonable addition to the standard.

Hopefully this clarifies things. I wish there were a single line in
the standard that stated simply that this was UB, but the unfortunate
reality is that this is a consequence of a few different, interacting
portions of the document.

> One strange thing here is that your patch makes use of:
>     typeof(*(t)NULL)
> which according to your own definition above should be seen as a NULL
deref by the sanitizer.

This works because typeof doesn't evaluate its argument. From N3220, 6.7.3.6 p4:
> If the type of the operand is a variably modified type, the operand is 
> evaluated; otherwise, the operand is not
evaluated.

Thus, it's okay to use invalid expressions inside of typeof. This is
analogous to the reason why the common pattern `p =
malloc(sizeof(*p))` is not UB.

If you're curious, here's an article on the PVS-Studio blog about this
exact topic, which was (reportedly) prepared in part by some of the
MSVC dev team: https://pvs-studio.com/en/blog/posts/cpp/0306/

Thanks!
Ben


Reply via email to