Hi Ben,

On Sun, Jul 20, 2025 at 10:02:22PM +0000, Ben Kallus wrote:
(...)
> I'll now provide evidence from the standard that this program is
> indeed executing UB.
>
> From N3220, 6.5.3.4:
> > A postfix expression followed by the -> operator and an identifier
> > designates a member of a structure or union object. The value is
> > that of the named member of the object to which the first
> > expression points, and is an lvalue.
>
> From N3220, 6.3.2.1:
> > An lvalue is an expression (with an object type other than void)
> > that potentially designates an object; if an lvalue does not
> > designate an object when it is evaluated, the behavior is undefined.
>
> Clearly, ((struct s *)0)->c does not designate an object. Therefore,
> if & evaluates its argument, then the expression in the little program
> is UB.
>
> Now, all that's left in order to demonstrate UB is to show that the
> operand to the & is evaluated.
>
> From N3220, 6.5.4.4 p3:
> > The unary & operator yields the address of its operand. If the
> > operand has type "type", the result has type "pointer to type". If
> > the operand is the result of a unary * operator, neither that
> > operator nor the & operator is evaluated and the result is as if
> > both were omitted, except that the constraints on the operators
> > still apply and the result is not an lvalue. Similarly, if the
> > operand is the result of a [] operator, neither the & operator nor
> > the unary * that is implied by the [] is evaluated and the result
> > is as if the & operator were removed and the [] operator were
> > changed to a + operator.
>
> In short, this is saying that C guarantees these identities:
> 1. &(*p) is equivalent to p
> 2. &(p[n]) is equivalent to p + n
>
> As a consequence, &(*p) doesn't result in the evaluation of *p, only
> the evaluation of p (and similar for []). There is no corresponding
> special carve-out for ->. Maybe there should be; if you're interested,
> I think this would be a reasonable addition to the standard.

I think so as well. IMHO it results from an initial omission that turned
into yet another stupid rule, because we all know that in practice it's
not evaluated, the proof being that so many programs use that idiom
without failing since the lvalue is not evaluated. And going further,
since at least the end of the 70s, x86 processors have had the "LEA"
instruction in addition to the "MOV" instruction to do exactly this:
return the effective address of an expression instead of evaluating it,
which is used everywhere there's a "&p->q", and by extension everywhere
people want to perform two additions at once.

> Hopefully this clarifies things. I wish there were a single line in
> the standard that stated simply that this was UB, but the unfortunate
> reality is that this is a consequence of a few different, interacting
> portions of the document.

And the fact that this line is missing also indicates to me that it was
simply overlooked, similar to the fact that it's not listed in the long
list of known UBs in C23. But anyway there's a trend these days in
compilers to try to find UBs and exploit them to their maximum in order
to bring optimizations that make no sense at all and only result in
breaking long-working programs :-/

(...)
> If you're curious, here's an article on the PVS-Studio blog about this
> exact topic, which was (reportedly) prepared in part by some of the
> MSVC dev team: https://pvs-studio.com/en/blog/posts/cpp/0306/

Ah, nice article, thanks for the link! While digging I've also spotted
this interesting point from Andrew Pinski:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30368#c3

Basically he suggests just using a non-zero value as the base in the
pointer calculation to work around the undefined behavior in the
offsetof() macro. I think I'll change the offsetof() definition to rely
on that one when __builtin_offsetof() is not available, at least so that
the idiom is not recopied everywhere.
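To make the idea concrete, here is a minimal sketch of what such a
fallback could look like. It is not the actual patch: the names
fallback_offsetof, FALLBACK_OFFSETOF_BASE and check_fallback_offsetof
are made up for illustration, and the base value 4096 is an assumption
(picked only so that the fake pointer stays aligned for ordinary struct
types), not necessarily what the bug comment proposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical non-zero base (assumption): 4096 keeps the fake pointer
 * suitably aligned for ordinary struct types. */
#define FALLBACK_OFFSETOF_BASE ((uintptr_t)4096)

/* Hypothetical fallback macro: take the member's address relative to a
 * non-null base, then subtract the base again to obtain the offset. */
#define fallback_offsetof(type, member) \
    ((size_t)((uintptr_t)&((type *)FALLBACK_OFFSETOF_BASE)->member - \
              FALLBACK_OFFSETOF_BASE))

struct sample {
    char a;
    int  b;
    char c;
};

#if defined(__GNUC__)
/* Same layout without padding, to check the packed case. */
struct sample_packed {
    char a;
    int  b;
    char c;
} __attribute__((packed));
#endif

/* Compare the fallback against the standard offsetof() from <stddef.h>. */
void check_fallback_offsetof(void)
{
    assert(fallback_offsetof(struct sample, a) == offsetof(struct sample, a));
    assert(fallback_offsetof(struct sample, b) == offsetof(struct sample, b));
    assert(fallback_offsetof(struct sample, c) == offsetof(struct sample, c));
#if defined(__GNUC__)
    assert(fallback_offsetof(struct sample_packed, b) ==
           offsetof(struct sample_packed, b));
    assert(fallback_offsetof(struct sample_packed, c) ==
           offsetof(struct sample_packed, c));
#endif
}
```

Note that the member access through a pointer that does not point to an
object is still there, so this only avoids the null-pointer case. Where
available, __builtin_offsetof() or the standard offsetof() remains
preferable.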
I'll just double-check that it's not affected by __attribute__((packed)).

Do you mind if I split your patch into multiple ones? It addresses at
least two distinct classes, one being the void* and the other being this
issue. In this case I'd edit your commit message to add the references
above.

Thanks!
Willy

