bwendling wrote:

> > Perhaps we need clarification on what GCC means by "may point to multiple 
> > objects" in this instance. To me that means either "get me the size of the 
> > largest of these multiple objects" or "size of the smallest." In my eyes, 
> > that means pointing to a union field.
> 
> Per @nikic's example, it seems reasonably clear to me that GCC's intended 
> semantics are to get either the best upper bound or the best lower bound that 
> the compiler is able to compute. I mean sure, we can ask, and it does no harm 
> to do so, but it is not reasonable to expect that a quantity like this coming 
> from the vagaries of GCC's implementation details will be exactly the same 
> for all examples when computed by a different implementation. (It's not even 
> the same across optimization levels in GCC.)

I completely agree that GCC's implementation and documentation are lacking as 
a specification. That's the pitfall of relying upon features added not by the 
standards committee, but by compiler developers. They're likely to be buggy as 
hell in unforeseen corner cases (despite some of those developers being on the 
standards committee).

My answer to the question "what are the semantics of GCC's builtin X?" has 
always been "whatever GCC does." It's the best we can rely upon. But then we 
get into situations like this, where you and @nikic have one interpretation of 
their documentation and I have another. I can point to their behavior to back 
up my claim, but in the end it's probably not exactly clear even to GCC.

My concern is that we want to use this for code hardening. Without precise 
object sizes, we're hampered in our goal. The unfortunate reality is that we 
can only get that size via these `__builtin_[dynamic_]object_size` functions. 
Thankfully, Apple has a way to help alleviate some of these issues, which 
they're pushing out soon-ish.
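
To make the hardening use case concrete, here is a minimal sketch of a 
fortify-style bounds check built on `__builtin_object_size`. The names 
`checked_copy`, `checked_copy_impl`, and `OBJECT_SIZE_UNKNOWN` are made up for 
illustration and are not taken from any real library. The sketch assumes a GCC 
or Clang compiler and relies on the documented behavior that modes 0 and 1 
yield `(size_t)-1` when the size cannot be determined.

```c
#include <stddef.h>
#include <string.h>

/* Value the builtin yields in modes 0 and 1 when the size is unknown. */
#define OBJECT_SIZE_UNKNOWN ((size_t)-1)

/* Hypothetical helper: refuse the copy when the destination size is known
 * and too small; fall back to a plain memcpy when the size is unknown. */
static inline int checked_copy_impl(void *dst, size_t dst_size,
                                    const void *src, size_t n) {
  if (dst_size != OBJECT_SIZE_UNKNOWN && n > dst_size)
    return -1; /* would overflow the destination */
  memcpy(dst, src, n);
  return 0;
}

/* The builtin has to see the original pointer expression, so the size is
 * captured in a macro at the call site rather than inside the helper. */
#define checked_copy(dst, src, n) \
  checked_copy_impl((dst), __builtin_object_size((dst), 0), (src), (n))
```

The check is only as good as the size the compiler reports: an imprecise 
upper bound silently weakens it, and an unknown size disables it entirely.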

> > I know that we lose precise struct information going to LLVM IR. If that's 
> > what's needed here, there are ways to pass this information along. We 
> > retain this information via DWARF. We could use similar metadata for this 
> > instance. Would that be acceptable?
> 
> In theory, yes, we could preserve enough information in metadata to compute 
> this in the middle-end. But we would need to emit a _lot_ of metadata, just 
> on the off-chance that after optimization we happen to have a 
> builtin_object_size query that points to each object that we emit, so in 
> practice I don't think there's any chance we can do this. We can't use DWARF, 
> because it won't necessarily be available (and typically won't be available 
> in the interesting case where we find the object size only after 
> optimization), and in any case, the presence or absence of DWARF isn't 
> supposed to affect the executable code. Also, we'd need to annotate things 
> that simply don't exist at all in the LLVM IR:
> 
> ```
> #include <stdlib.h>
> 
> typedef struct X { int a, b; } X;
> int f(void) {
>   X *p = malloc(sizeof(X));
>   int *q = &p->a;
>   return __builtin_object_size(q, 1);
> }
> ```
> 
> Here, `f` ideally would return 4, but at the LLVM IR level, `p` and `q` are 
> identical values and the `&p->a` operation is a no-op. In cases like this, 
> the best we can realistically do is to return 8.

The sub-object for `&p->a` and even `&p->b` is `struct X`, not the integers 
themselves. If you want that, you'll have to use casts: `&((char *)p->b)[2];`. 
(I had to take care to get that correct.) So `f` should return `8` (note it's 
likely to get `8` from the `alloc_size` attribute on `malloc` in your example).
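
Since the interpretation is contested, one way to see what a given compiler 
actually reports is to probe both modes on a member pointer. This is only a 
sketch: the function names are hypothetical, and it uses a static object 
instead of `malloc` so that the result does not depend on how `alloc_size` is 
handled. It deliberately does not claim which value is "right"; the only thing 
it relies on is that modes 0 and 1 are upper bounds (or `(size_t)-1` when 
unknown).

```c
#include <stddef.h>

struct X { int a, b; };

static struct X probe_object;

/* Mode 0: upper bound on the bytes from the pointer to the end of the
 * whole object. */
size_t whole_object_bound(void) {
  return __builtin_object_size(&probe_object.a, 0);
}

/* Mode 1: upper bound restricted to whatever the compiler treats as the
 * closest enclosing sub-object of the pointer. */
size_t subobject_bound(void) {
  return __builtin_object_size(&probe_object.a, 1);
}
```

Printing both results at different optimization levels, and with GCC and 
Clang side by side, shows directly where the implementations diverge.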

For the record, I didn't mean that we *should* use DWARF. I'm not that much of 
a masochist. :-)

https://github.com/llvm/llvm-project/pull/78526