[clang] [Clang][Sema] Support 'counted_by' attribute on FAM/pointers in anonymous unions (PR #171996)

Yeoul Na via cfe-commits Wed, 14 Jan 2026 17:53:14 -0800

rapidsna wrote:

> > For __counted_by to work on a union, one of the following must be true:
> > Homogeneous Sizes: All members of the union must have the exact same 
> > sizeof(). Byte-Size Semantics: The attribute would need to be __sized_by.
> > Plus, I'm assuming if you have a union field that has `counted_by` or 
> > `sized_by`, all of the fields need the same annotation following the above 
> > rule.
> > Requesting changes to add these restrictions.
> 
> This makes sense to me for FAMs, but not for pointer members. e.g.:
> 
> ```c
> struct foo {
>     int count;
>     union {
>         u16 a[] __counted_by(count);
>         s16 b[] __counted_by(count):
>     };
> } *p;
> ```
> 
> With this, `__builtin_dynamic_object_size(p, 1)` will be unambiguous. If `b` 
> were an `int`, the compiler cannot know which member (`a` or `b`) to use to 
> figure out the object size.
> 
> For pointer members, though, it shouldn't matter. They can be anything:
> 
> ```c
> struct foo {
>     int count1, count2;
>     union {
>         int z;
>         char *buf __counted_by(count1);
>         short *weird __counted_by(count2);
>     };
> } *p;
> ```
> 
> Both `p->buf` and `p->weird` have unambiguous sizes. Is the concern about 
> aliasing due to `buf` and `weird` having the same memory location?


Yes, the issue is that they share the same memory location, and -fbounds-safety 
 requires that `__counted_by` annotations are *always* correct, so bounds 
checks can safely rely on it.

Consider this example:

```C
struct buffer {
    int count;
    union {
        char *bytes __counted_by(count);
        int *ints __counted_by(count);
    };
};
```

There is no way to make both annotations correct simultaneously with a single 
count value, because they point to the same memory address but interpret the 
count differently:

```C
struct buffer s;
s.count = 10;
s.bytes = (char*)malloc(10);

// The size of `ints` is calculated as `10 * sizeof(int)` = 40 bytes,
// so this access is allowed by bounds checking, but it's actually OOB.
s.ints[9] = 0xdeadbeef;
```

The only "safe" count would be 0, or a value so conservative it's useless.

In your example, the problem is that `int z` aliases with the pointers, 
allowing the count to become desynchronized:

```C
struct foo {
    int count1, count2;
    union {
        int z;
        char *buf __counted_by(count1);
        short *weird __counted_by(count2);
    };
} *p;
```

Nothing prevents this:

```C
struct foo s;
s.count1 = 100;
s.buf = (char*)malloc(100);

// Writing to z corrupts the pointer, but count1 still says 100 bytes
s.z = 0xdeadbeef;

// Bounds check uses count1, allowing access to invalid memory
s.buf[99] = 0xff;  // Accesses memory at 0xdeadbe?? + 99
```

You could argue this is a type safety issue, but unions make type confusion 
trivial, which makes it so easy to make `__counted_by` incorrect. We don't want 
to allow constructs that are so easily misused.

Even for the "separate counts" case with only pointers:

```C
struct foo2 {
    int count1, count2;
    union {
        char *buf1 __counted_by(count1);
        short *buf2 __counted_by(count2);
    };
} *p;
```

While you *could* theoretically keep everything in sync:

```C
struct foo2 s;
s.count1 = 10;
s.buf1 = (char*)malloc(10);
s.count2 = 10 / sizeof(short);  // Must manually sync count2
```

This model is too complex for both users and analysis; Any assignment to the 
union requires updating all counts.

https://github.com/llvm/llvm-project/pull/171996
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][Sema] Support 'counted_by' attribute on FAM/pointers in anonymous unions (PR #171996)

Reply via email to