rapidsna wrote:
> > For __counted_by to work on a union, one of the following must be true:
> > Homogeneous Sizes: All members of the union must have the exact same
> > sizeof(). Byte-Size Semantics: The attribute would need to be __sized_by.
> > Plus, I'm assuming if you have a union field that has `counted_by` or
> > `sized_by`, all of the fields need the same annotation following the above
> > rule.
> > Requesting changes to add these restrictions.
>
> This makes sense to me for FAMs, but not for pointer members. e.g.:
>
> ```c
> struct foo {
> int count;
> union {
> u16 a[] __counted_by(count);
> s16 b[] __counted_by(count):
> };
> } *p;
> ```
>
> With this, `__builtin_dynamic_object_size(p, 1)` will be unambiguous. If `b`
> were an `int`, the compiler cannot know which member (`a` or `b`) to use to
> figure out the object size.
>
> For pointer members, though, it shouldn't matter. They can be anything:
>
> ```c
> struct foo {
> int count1, count2;
> union {
> int z;
> char *buf __counted_by(count1);
> short *weird __counted_by(count2);
> };
> } *p;
> ```
>
> Both `p->buf` and `p->weird` have unambiguous sizes. Is the concern about
> aliasing due to `buf` and `weird` having the same memory location?
Yes, the issue is that they share the same memory location, and -fbounds-safety
requires that `__counted_by` annotations are *always* correct, so bounds
checks can safely rely on it.
Consider this example:
```C
struct buffer {
int count;
union {
char *bytes __counted_by(count);
int *ints __counted_by(count);
};
};
```
There is no way to make both annotations correct simultaneously with a single
count value, because they point to the same memory address but interpret the
count differently:
```C
struct buffer s;
s.count = 10;
s.bytes = (char*)malloc(10);
// The size of `ints` is calculated as `10 * sizeof(int)` = 40 bytes,
// so this access is allowed by bounds checking, but it's actually OOB.
s.ints[9] = 0xdeadbeef;
```
The only "safe" count would be 0, or a value so conservative it's useless.
In your example, the problem is that `int z` aliases with the pointers,
allowing the count to become desynchronized:
```C
struct foo {
int count1, count2;
union {
int z;
char *buf __counted_by(count1);
short *weird __counted_by(count2);
};
} *p;
```
Nothing prevents this:
```C
struct foo s;
s.count1 = 100;
s.buf = (char*)malloc(100);
// Writing to z corrupts the pointer, but count1 still says 100 bytes
s.z = 0xdeadbeef;
// Bounds check uses count1, allowing access to invalid memory
s.buf[99] = 0xff; // Accesses memory at 0xdeadbe?? + 99
```
You could argue this is a type safety issue, but unions make type confusion
trivial, which makes it so easy to make `__counted_by` incorrect. We don't want
to allow constructs that are so easily misused.
Even for the "separate counts" case with only pointers:
```C
struct foo2 {
int count1, count2;
union {
char *buf1 __counted_by(count1);
short *buf2 __counted_by(count2);
};
} *p;
```
While you *could* theoretically keep everything in sync:
```C
struct foo2 s;
s.count1 = 10;
s.buf1 = (char*)malloc(10);
s.count2 = 10 / sizeof(short); // Must manually sync count2
```
This model is too complex for both users and analysis; Any assignment to the
union requires updating all counts.
https://github.com/llvm/llvm-project/pull/171996
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits