Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

Bernd Edlinger Sat, 25 Aug 2018 11:37:09 -0700

On 08/25/18 19:32, Jeff Law wrote:
> On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
>> On 08/25/18 01:54, Jeff Law wrote:
>>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>>> On 08/24/18 18:51, Jeff Law wrote:
>>>>>> Well, this is broken for wide character strings.
>>>>>> but I hope we can get rid of STRING_CST which are
>>>>>> not explicitly null terminated.
>>>>
>>>> I am afraid that is not going to happen.
>>>> Maybe we can get STRING_CST that are never longer
>>>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>>>> need to take care that the string is zero-terminated.
>>>>
>>>> string_constant, should not promise the string is zero terminated.
>>>> But instead it can promise that:
>>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>>> 2) mem_size is >= TREE_STRING_LENGTH
>>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>>
>>>> It will not guarantee anything about zero termination any more.
>>> Interesting because those conditions would be sufficient to deal with a
>>> regression I stumbled over after fixing Martin's patch to not assume
>>> that all STRING_CSTs are NUL terminated.
>>>
>>> But I need to think about this a bit more.  Essentially the question
>>> we'd need to ask is whether or not these are sufficient in general or
>>> just in specific cases.
>>>
>>> I tend to think they're not sufficient in general. If a string returned
>>> by string_constant that didn't have a terminating NUL, but which did
>>> pass the tests above were ultimately passed to the runtime's str*
>>> routines, then the call may run off the end of the string.  We'd like to
>>> be able to warn for that.
>>>
>>> So ISTM those rules are only valid in contexts where we know the result
>>> isn't going to be passed to str* and friends within the C library.
>>>
>>> I do think they're sufficient to avoid problems with the
>>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>>> is to have that routine indicate it's willing to accept unterminated
>>> strings, then check the conditions above before optimizing the code.
>>>
>>
>> There are not too many callers of string_constant.
>> Not all need zero termination.
> Right.  And in retrospect we probably should have avoided default
> parameter overloads and just fixed the callers.  But that can be a
> follow-up.
> 
>>
>> But I think if the are interested in zero-termination
>> they should simply call c_strlen or c_getstr.
> Perhaps.
> 
> 
>>
>>
>>>>
>>>> In the end, the best approach might be to either merge my patch
>>>> with Martins, or step-wise, first fixing wrong code, and then
>>>> implementing warnings without fixing wrong code.
>>> Unsure at this time.  I've been working with both.  I suspect that if we
>>> went with yours that we'd then turn around and layer Martin's on top of
>>> it because of the desire to signal to callers that we have an
>>> unterminated string and have the callers take appropriate action.  Which
>>> begs the question of whether or not we just go with Martin's -- ie, is
>>> there really any value in using both.  I haven't seen indications there
>>> is value in that approach, but I'm still poking at things.
>>>
>>
>> Well, ya call it "layer one patch over the other"
>> I call it "incremental improvements".
> It is (of course) a case by case basis.  The way I try to look at these
> things is to ask whether or not the first patch under consideration
> would have any value/purpose after the second patch was installed.  If
> so, then it may make sense to include both.  If not, then we really just
> want one patch.
>

Agreed.  I think the question is which of the possible STRING_CST
semantics we want to have in the end (the middle-end).
Everything builds on top of the semantic properties of STRING_CSTs.

My first attempt of fix the STRING_CST semantic was trying to make
string_constant happy.

My second attempt is trying to make Richard happy.  And when I look
at both patches, I think the second one is better, and more simple.

BTW I need to correct on statement in my last e-mail:

On 08/24/18 23:55, Bernd Edlinger wrote:>
> here I quote form martins 1/6 patch:
>> +  /* Compute the lower bound number of elements (not bytes) in the array
>> +     that the string is used to initialize.  The actual size of the array
>> +     will be may be greater if the string is shorter, but the important
>> +     data point is whether the literal, including the terminating nul,
>> +     fits in the array. */
>> +  unsigned HOST_WIDE_INT array_elts
>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>> +
>> +  /* Compute the string length in (wide) characters.  */
> 
> So prior to my STRING_CST patch this will ICE on a flexible array member,
> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
> 
> I used:
> 
>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>                     TREE_STRING_LENGTH (init))
> 
> and this will not ICE with NULL, but consider it like infinity,
> and return 1.
> 
> So my version did not ICE in that case.
> 

Oooops,

actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
I actually tried to test that case, but have done something wrong.

I hope we can get rid if the incomplete types in the middle-end.

So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
here?   Or maybe just defer until we have clarity about the semantics.

Bernd.

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

Reply via email to