Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

Richard Biener Tue, 07 Nov 2017 02:25:06 -0800

On Tue, Nov 7, 2017 at 4:23 AM, Martin Sebor <mse...@gmail.com> wrote:
> On 11/06/2017 11:41 AM, Jeff Law wrote:
>>
>> On 10/29/2017 10:15 AM, Martin Sebor wrote:
>>>
>>> Ping -- please see my reply below.
>>>
>>> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>>>
>>>>> variable
>>>>>>>>
>>>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>>>> want to pass 'arg'?)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>>>> an array at the end of a struct.  For those the function returns
>>>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>>>> my tests.)
>>>>>>
>>>>>>
>>>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>>>
>>>>> it will
>>>>>>
>>>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>>>
>>>>> NULL.
>>>>>>
>>>>>> You could use get_ref_base_and_extent which will return the offset
>>>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>>>> is it returns offset in bits thus if the offset is too large for a
>>>>>
>>>>> HWI
>>>>>>
>>>>>> you'll instead get offset == 0 and max_size == -1.
>>>>>>
>>>>>> Thus I'm saying this is dead code for variable array accesses
>>>>>> (even for the array you are warning about).  Yes, for constant index
>>>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>>>> of variable indexes.
>>>>>>
>>>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>>>> for some extra precision (and possible lossage for very very large
>>>>>> structures).
>>>>>
>>>>>
>>>>> Computing bit offsets defeats the out-of-bounds flexible array
>>>>> index detection because the computation overflows (the function
>>>>> sets the offset to zero).
>>>>
>>>>
>>>> It shouldn't if you pass arg rather than ref.
>>>
>>>
>>> I suspect you missed my reply in IRC where I confirmed that this
>>> approach doesn't work for the reason you yourself mentioned above:
>>>
>>>>>> The disadvantage of get_ref_base_and_extent
>>>>>> is it returns offset in bits thus if the offset is too large
>>>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>>>
>>>
>>> This means that using the function you suggest would either prevent
>>> detecting the out-of-bounds indices that overflow due to the bit
>>> offset, thus largely defeating the purpose of the patch (to detect
>>> excessively large indices), or not printing the value of the out-of
>>> bounds index in the warning in this case, which would in turn mean
>>> further changes to the rest of the logic.
>>
>> I think Richi's point is that it's more important (in his opinion) to
>> handle the varying objects within an access like a.b.c[i] than to handle
>> cases that would overflow when converted to bits.  Thus he'd prefer to
>> use get_ref_base_and_extent over get_addr_base_and_unit_offset.
>>
>> I wonder if a hybrid approach here would work.
>>
>> ie, use get_ref_base_and_extent per Richi's request.  When that returns
>> an max_size of -1, then call get_addr_base_and_unit_offset and if it
>> returns that the offset is huge, then warn without any deeper analysis.
>>
>> Yea, this could trip a false positive if someone makes an array that is
>> half the address space (give or take), but that's probably not at all
>> common.  Whereas the additional cases handled by get_ref_base_and_extent
>> are perhaps more useful.
>>
>> Thoughts?
>
>
> I'm always open to suggestions for improvements.  I just haven't
> been able to construct a test case to demonstrate the improvement
> gained by using get_ref_base_and_extent.  The only test cases I've
> come up with show false positives.  It could very well be because
> of my lack of experience with the APIs so if there is a trade-off
> here that could be compensated for by the approach you suggest I'm
> happy to incorporate it into the patch.  I just need a test case
> to verify that the solution works as intended.


The difficulty with a testcase like

struct { struct A { int b[1]; } a[5]; } x;

 x.a[i].b[j]

is that b is not considered an array at struct end since one of my
recent changes to array_at_struct_end (basically it disallows having
a flex array as the last member of an array).

It would still stand for non-array components with variable offset
but you can't create C testcases for that.

So yes, for the specific case within the array_at_struct_end_p condition
get_addr_base_and_unit_offset is enough.  IIRC the conditon was
a bit more than just get_addr_base_and_unit_offset.  up_bound !=
INTEGER_CST for example.  So make the above

void foo (int n, int i)
{
 struct { struct A { int b[n]; } a[5]; } x;
 return x.a[i].b[PTRDIFF_MAX/2];
}

with appropriately adjusted constant.  Does that give you the testcase
you want?

As of "it works, catches corner-cases, ..." - yes, it does, but it
adds code that needs to be maintained, may contain bugs, is
executed even for valid code.

Richard.


> Martin

Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

Reply via email to