Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

Qing Zhao via Gcc-patches Tue, 18 Jul 2023 09:25:43 -0700


> On Jul 18, 2023, at 12:03 PM, Martin Uecker <uec...@tugraz.at> wrote:
> 
> Am Dienstag, dem 18.07.2023 um 15:37 +0000 schrieb Qing Zhao:
>> 
>> 
>>> On Jul 17, 2023, at 7:40 PM, Kees Cook <keesc...@chromium.org>
>>> wrote:
>>> 
>>> On Mon, Jul 17, 2023 at 09:17:48PM +0000, Qing Zhao wrote:
>>>> 
>>>>> On Jul 13, 2023, at 4:31 PM, Kees Cook <keesc...@chromium.org>
>>>>> wrote:
>>>>> 
>>>>> In the bug, the problem is that "p" isn't known to be
>>>>> allocated, if I'm
>>>>> reading that correctly?
>>>> 
>>>> I think that the major point in PR109557
>>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
>>>> 
>>>> for the following pointer p.3_1, 
>>>> 
>>>> p.3_1 = p;
>>>> _2 = __builtin_object_size (p.3_1, 0);
>>>> 
>>>> Question: why the size of p.3_1 cannot use the TYPE_SIZE of the
>>>> pointee of p when the TYPE_SIZE can be determined at compile
>>>> time?
>>>> 
>>>> Answer:  From just knowing the type of the pointee of p, the
>>>> compiler cannot determine the size of the object.  
>>> 
>>> Why is that? "p" points to "struct P", which has a fixed size.
>>> There
>>> must be an assumption somewhere that a pointer is allocated,
>>> otherwise
>>> __bos would almost never work?
>> 
>> My understanding from the comments in PR109557 was: 
>> 
>> In general the pointer could point to the first object of an array
>> that has more elements, or to an object of a different type. 
>> Without seeing the real allocation to the pointer, the compiler
>> cannot assume that the pointer point to an object that has
>> the exact same type as its declaration. 
>> 
>> Sid and Martin, is the above understand correctly?
> 
> Yes. 
> 
> In the example, it *could* work because the compiler
> could inline 'store' or otherwise use its knowledge about
> the function definition to conclude that 'p' points to
> an object of size 'sizeof (struct P)'. But this is fragile
> because it relies on optimization and will not work across
> TUs.
> 
>> Honestly, I am still not 100% clear on this yet.
> 
>> Jakub, Sid and Martin, could  you please explain a little bit more
>> here, especially for the following small example?
>> 
>> [opc@qinzhao-ol8u3-x86 109557]$ cat t.c
>> #include <stdlib.h>
>> #include <assert.h>
>> struct P {
>>   int k;
>>   int x[10]; 
>> } *p;
>> 
>> void store(int a, int b) 
>> {
>>   p = (struct P *)malloc (sizeof (struct P));
>>   p->k = a;
>>   p->x[b] = 0;
>>   assert (__builtin_dynamic_object_size (p->x, 1) == sizeof
>> (int[10]));
>>   assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>>   return;
>> }
>> 
>> int main()
>> {
>>   store(7, 7);
>>   assert (__builtin_dynamic_object_size (p->x, 1) == sizeof
>> (int[10]));
>>   assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>>   free (p);
>> }
>> [opc@qinzhao-ol8u3-x86 109557]$ sh t
>> /home/opc/Install/latest/bin/gcc -O -fdump-tree-all t.c
>> a.out: t.c:21: main: Assertion `__builtin_dynamic_object_size (p->x,
>> 1) == sizeof (int[10])' failed.
>> t: line 19: 859944 Aborted                 (core dumped) ./a.out
>> 
>> 
>> In the above, among the 4 assertions, only the last one failed.
> 
> I don't know why this is the case. 
> 
>> 
>> Why GCC can use the TYPE_SIZE of the pointee of the pointer “p->x” as
>> the size of the object, 
> 
> I do not think it can do this in general. Is this how it 
> is implemented?


No. -:)

 I guess that the implementation of this should base on your following case,  
“observed accesses”:
Although I am not 100% sure on the definition of “observed accesses”.

p->x  is an access to the field of the object, so compiler can assume that the 
object exist and have
the type associate with this access?

On the other hand, “p” is just a plain pointer, no observed access.



> Thus would then seem incorrect to me.  
> 
>> but cannot use the TYPE_SIZE of the pointee of the pointer “p” as the
>> size of the object? 
> 
> In general, the type of a pointer does not say anything about the
> object it points to, because you could cast the pointer to a different
> type, pass it around, and then cast it back before use. 

Okay, I see.
> 
> Only observed allocations or observed accesses provide information
> about the type / existence of an object at the corresponding address.

What will be included in “observed accesses”?

> 
> The only other way in C which ensures that a pointer actually points
> to an object of the correct type is 'static':
> 
> void foo(struct P *p[static 1]);

Thanks for the info.

Qing
> 
> 
> 
> Martin
> 
> 
>> 
>>> 
>>>> Therefore the bug has been closed. 
>>>> 
>>>> In your following testing 5:
>>>> 
>>>>> I'm not sure this is a reasonable behavior, but
>>>>> let me get into the specific test, which looks like this:
>>>>> 
>>>>> TEST(counted_by_seen_by_bdos)
>>>>> {
>>>>>       struct annotated *p;
>>>>>       int index = MAX_INDEX + unconst;
>>>>> 
>>>>>       p = alloc_annotated(index);
>>>>> 
>>>>>       REPORT_SIZE(p->array);
>>>>> /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
>>>>>       /* Check array size alone. */
>>>>> /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1),
>>>>> SIZE_MAX);
>>>>> /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1),
>>>>> p->foo * sizeof(*p->array));
>>>>>       /* Check check entire object size. */
>>>>> /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
>>>>> /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1),
>>>>> sizeof(*p) + p->foo * sizeof(*p->array));
>>>>> }
>>>>> 
>>>>> Test 5 should pass as well, since, again, p can be examined.
>>>>> Passing p
>>>>> to __bdos implies it is allocated and the __counted_by
>>>>> annotation can be
>>>>> examined.
>>>> 
>>>> Since the call to the routine “alloc_annotated" cannot be
>>>> inlined, GCC does not see any allocation calls for the pointer p.
>>> 
>>> Correct.
>>> 
>>>> At the same time, due to the same reason as PR109986, GCC cannot
>>>> determine the size of the object by just knowing the TYPE_SIZE
>>>> of the pointee of p.  
>>> 
>>> So the difference between test 3 and test 5 is that "p" is
>>> explicitly
>>> dereferenced to find "array", and therefore an assumption is
>>> established
>>> that "p" is allocated?
>> 
>> Yes, this might be the assumption that GCC made  -:)
>>> 
>>>> So, this is exactly the same issue as PR109557.  It’s an existing
>>>> behavior per the current __buitlin_object_size algorithm.
>>>> I am still not very sure whether the situation in PR109557 can be
>>>> improved or not, but either way, it’s a separate issue.
>>> 
>>> Okay, so the issue is one of object allocation visibility (or
>>> assumptions there in)?
>> 
>> Might be, let’s see what Sid or Martin will say on this.
>>> 
>>>> Please see the new testing case I added for PR109557, you will
>>>> see that the above case 5 is a similar case as the new testing
>>>> case in PR109557.
>>> 
>>> I will ponder this a bit more to see if I can come up with a useful
>>> test case to replace the part from "test 5" above.
>>> 
>>>>> 
>>>>> If "return p->array[index];" would be caught by the sanitizer,
>>>>> then
>>>>> it follows that __builtin_dynamic_object_size(p, 1) must also
>>>>> know the
>>>>> size. i.e. both must examine "p" and its trailing flexible
>>>>> array and
>>>>> __counted_by annotation.
>>>>> 
>>>>>> 
>>>>>> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
>>>>>> 
>>>>>> for the following annotated structure: 
>>>>>> 
>>>>>> ====
>>>>>> struct annotated {
>>>>>>       unsigned long flags;
>>>>>>       size_t foo;
>>>>>>       int array[] __attribute__((counted_by (foo)));
>>>>>> };
>>>>>> 
>>>>>> 
>>>>>> struct annotated *p;
>>>>>> int index = 16;
>>>>>> 
>>>>>> p = malloc(sizeof(*p) + index * sizeof(*p->array));  //
>>>>>> allocated real size 
>>>>>> 
>>>>>> p->foo = index + 2;  // p->foo was set by a different value
>>>>>> than the real size of p->array as in test 9 and 10
>>>>> 
>>>>> Right, tests 9 and 10 are checking that the _smallest_ possible
>>>>> value of
>>>>> the array is used. (There are two sources of information: the
>>>>> allocation
>>>>> size and the size calculated by counted_by. The smaller of the
>>>>> two
>>>>> should be used when both are available.)
>>>> 
>>>> The counted_by attribute is used to annotate a Flexible array
>>>> member on how many elements it will have.
>>>> However, if this information can not accurately reflect the real
>>>> number of elements for the array allocated, 
>>>> What’s the purpose of such information? 
>>> 
>>> For example, imagine code that allocates space for 100 elements
>>> since
>>> the common case is that the number of elements will grow over time.
>>> Elements are added as it goes. For example:
>>> 
>>> struct grows {
>>>         int alloc_count;
>>>         int valid_count;
>>>         struct element item[] __counted_by(valid_count);
>>> } *p;
>>> 
>>> void something(void)
>>> {
>>>         p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
>>>         p->alloc_count = 100;
>>>         p->valid_count = 0;
>>> 
>>>         /* this loop doesn't check that we don't go over 100. */
>>>         while (items_to_copy) {
>>>                 struct element *item_ptr = get_next_item();
>>>                 /* __counted_by stays in sync: */
>>>                 p->valid_count++;
>>>                 p->item[p->valid_count - 1] = *item_ptr;
>>>         }
>>> }
>>> 
>>> We would want to catch cases there p->item[] is accessed with an
>>> index
>>> that is >= p->valid_count, even though the allocation is
>>> (currently)
>>> larger.
>>> 
>>> However, if we ever reached valid_count >= alloc_count, we need to
>>> trap
>>> too, since we can still "see" the true allocation size.
>>> 
>>> Now, the __alloc_size hint is visible in very few places, so if
>>> there is
>>> a strong reason to do so, I can live with saying that __counted_by
>>> takes
>>> full precedence over __alloc_size. It seems it should be possible
>>> to
>>> compare when both are present, but I can live with __counted_by
>>> being
>>> the universal truth. :)
>>> 
>> 
>> Thanks for the example and the explanation. Understood now.
>> 
>> LLVM’s RFC requires the following relationship:
>> (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbound
>> s-safety/70854#maintaining-correctness-of-bounds-annotations-18)
>> 
>> Buffer’s real allocated size >= counted_by value
>> 
>> Should be true all the time. 
>> 
>> I think that this is a reasonable requirement. 
>> 
>> (Otherwise, if counted_by > buffer’s real allocated size, overflow
>> might happen)
>> 
>> Is the above enough to cover your use cases?
>> 
>> If so, I will study a little bit more to see how to implement this.
>>>> 
>>>>>> or
>>>>>> p->foo was not set to any value as in test 3 and 4
>>>>> 
>>>>> For tests 3 and 4, yes, this was my mistake. I have fixed up
>>>>> the tests
>>>>> now. Bill noticed the same issue. Sorry for the think-o!
>>>>> 
>>>>>> ====
>>>>>> 
>>>>>> i.e, the value of p->foo is NOT synced with the number of
>>>>>> elements allocated for the array p->array.  
>>>>>> 
>>>>>> I think that this should be considered as an user error, and
>>>>>> the documentation of the attribute should include
>>>>>> this requirement.  (In the LLVM’s RFC, such requirement was
>>>>>> included in the programing model: 
>>>>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18
>>>>>> )
>>>>>> 
>>>>>> We can add a new warning option -Wcounted-by to report such
>>>>>> user error if needed.
>>>>>> 
>>>>>> What’s your opinion on this?
>>>>> 
>>>>> I think it is correct to allow mismatch between allocation and
>>>>> counted_by as long as only the least of the two is used.
>>>> 
>>>> What’s your mean by “only the least of the two is used”?
>>> 
>>> I was just restating the above: if size information is available
>>> via
>>> both __alloc_size and __counted_by, the smaller of the two should
>>> be
>>> used.
>> 
>> If we are consistent with LLVM’s RFC, __alloc_size >= __counted_by
>> all the time.
>> Then we can always use the counted_by info for the array size. 
>>> 
>>>> 
>>>>> This may be
>>>>> desirable in a few situations. One example would be a large
>>>>> allocation
>>>>> that is slowly filled up by the program.
>>>> 
>>>> So, for such situation, whenever the allocation is filled up, the
>>>> field that hold the “counted_by” attribute should be increased at
>>>> the same time,
>>>> Then, the “counted_by” value always sync with the real
>>>> allocation. 
>>>>> I.e. the counted_by member is
>>>>> slowly increased during runtime (but not beyond the true
>>>>> allocation size).
>>>> 
>>>> Then there should be source code to increase the “counted_by”
>>>> field whenever the allocated space increased too. 
>>>>> 
>>>>> Of course allocation size is only available in limited
>>>>> situations, so
>>>>> the loss of that info is fine: we have counted_by for
>>>>> everything else.
>>>> 
>>>> The point is: allocation size should synced with the value of
>>>> “counted_by”. LLVM’s RFC also have the similar requirement:
>>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18
>>> 
>>> Right, I'm saying it would be nice if __alloc_size was checked as
>>> well,
>>> in the sense that if it is available, it knows without question
>>> what the
>>> size of the allocation is. If __alloc_size and __counted_by
>>> conflict,
>>> the smaller of the two should be the truth.
>> 
>>> 
>>> But, as I said, if there is some need to explicitly ignore
>>> __alloc_size
>>> when __counted_by is present, I can live with it; we just need to
>>> document it.
>>> 
>>> If the RFC and you agree that the __counted_by variable can only
>>> ever be
>>> (re)assigned after the flex array has been (re)allocated, then I
>>> guess
>>> we'll see how it goes. :) I think most places in the kernel using
>>> __counted_by will be fine, but I suspect we may have cases where we
>>> need
>>> to update it like in the loop I described above. If that's true, we
>>> can
>>> revisit the requirement then. :)
>> 
>> I guess if we can always keep  the relationship:   __alloc_size >=
>> __counted_by all the time. Should be fine.
>> 
>> Please check whether this is enough for kernel, I will check whether
>> this is doable For GCC.
>> 
>> thanks.
>> 
>> 
>> Qing
>>> 
>>> -Kees
>>> 
>>> -- 
>>> Kees Cook
>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging

Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

Reply via email to