On 08/18/2018 11:38 AM, Martin Sebor wrote:
> On 08/17/2018 09:32 PM, Jeff Law wrote:
>> On 08/17/2018 02:17 PM, Martin Sebor wrote:
>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>
>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>
>>>>>>>>> restores previous behavior.  The sprintf bits want to count
>>>>>>>>> element-sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>
>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier == FMT_LEN_l ? 4 : 1;
>>>>>>>>
>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to
>>>>>>>> examine wchar_type_node to determine its actual size for this target.
>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it
>>>>>>> was worth handling the 2-byte wchar case.
>>>>>
>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>> was useful to pass the size of the character type to the function.
>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>
>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>> element size given that the patch causes c_strlen() to fail when
>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>> get back the number of bytes.  (I could see it fail if the caller
>>>>> asked for the number of words in a char array whose size was
>>>>> not evenly divisible by wordsize.)
>>>>>
>>>>
>>>> I think in this case c_strlen should use the type which the %S format
>>>> uses at runtime, otherwise it will not have anything to do with
>>>> reality.
>>>
>>> %S is not what I'm talking about.
>>>
>>> Failing in the case I described (a caller asking for the size
>>> in bytes of a constant object whose element type is wider than
>>> a byte) prevents callers that don't care about the type from
>>> detecting the actual size of a constant.
>>>
>>> Specifically for sprintf, it means that the buffer overflow
>>> below is no longer diagnosed:
>>>
>>>   struct S { char a[2]; void (*pf)(void); };
>>>
>>>   void f (struct S *p)
>>>   {
>>>     const char *q = (char*)L"\x41424344\x45464748";
>>>
>>>     sprintf (p->a, "%s", q);
>>>   }
>> I don't think this is in the testsuite, is it?  I verified that there
>> were no regressions when I installed Bernd's patch and when I installed
>> yours.
> 
> No, there are very few tests that exercise these kinds of mixed
> argument types.  Code like that is most likely the result of
> a mistake, but it's not the kind of bug I had even thought
> about until some of the codegen issues with mixed argument types
> were brought up (PR 86711/86714).
Phew.  I was worried I'd somehow missed the failure or tested the wrong
patch or who knows what.

Can you add that test, xfailed for now, to the testsuite?
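Something along these lines might do, properly xfailed.  The dg- directives
and the warning regex below are just my guess at what the final diagnostic
will look like, so take it as a sketch rather than the exact test:

  /* { dg-do compile } */
  /* { dg-options "-O2 -Wall" } */

  extern int sprintf (char *, const char *, ...);

  struct S { char a[2]; void (*pf)(void); };

  void f (struct S *p)
  {
    const char *q = (char*)L"\x41424344\x45464748";

    /* Overflows the 2-byte p->a; xfailed until the mixed wide/narrow
       case is diagnosed again.  */
    sprintf (p->a, "%s", q);  /* { dg-warning "writing" "" { xfail *-*-* } } */
  }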

>
> I would just like the ability to get the length/size somehow
> so that the questionable code that started us down this path
> can be detected.  This is not just the sprintf(d, "%s", L"1")
> kind of mismatch but also the missing nul detection in cases
> like:
[ ... ]
I know.  And ideally we'll be able to handle everything you want to.
But we also have to realize that sometimes we may have to punt.

Also remember that incremental progress is (usually) good.


> 
>   const wchar_t a[1] = L"\xffffffff";
>   strcpy(d, (char*)a);
This touches on both the representational issue (excess elements in the
initializer) and how to handle a non-terminated string.  Both are issues
we're debating right now.
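Just to make the non-terminated part concrete, here's roughly what the byte
view looks like in that case (assuming a 4-byte wchar_t; the layout shown
is an assumption on my part, not something either patch guarantees):

  const __WCHAR_TYPE__ a[1] = L"\xffffffff";  /* no room left for the
                                                 terminating nul */
  /* Object representation: ff ff ff ff.  There is no nul byte anywhere
     in the object, so a byte-wise strlen/strcpy through (char*)a keeps
     reading past the end of it.  */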

> 
> I don't think this is necessarily an important use case to
> optimize for, but it is one that GCC already optimizes nonetheless,
> and always has.  For example, this has always been folded to 4
> by GCC and still is even with the patch:
> 
>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
> 
>   int f (void)
>   {
>     const char *p = (char*)wc;
>     return strlen (p);            // folded to 4
>   }
> 
> It's optimized because fold_const_call() relies on c_getstr(),
> which returns the underlying byte representation of the wide
> string, despite c_strlen() now trying to prevent it.
And I think you're hitting on some of the issues already raised in the thread.

In this specific case though ISTM 4 is the right answer.  We cast to a
char *, so that's what we should be counting.  Or am I missing
something?  Also note that's the value we'd get from the strlen C
library call IIUC.
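For the record, here's the arithmetic behind that 4, assuming a 4-byte
little-endian wchar_t (the byte order shown is an assumption, not something
either patch relies on):

  const __WCHAR_TYPE__ wc[] = L"\x12345678";  /* elements: 0x12345678, L'\0' */
  /* Object representation: 78 56 34 12 00 00 00 00.  A byte-wise strlen
     over (char*)wc stops at the first nul byte, i.e. it returns 4 --
     the same value the C library strlen would return.  */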



> 
> The missing nul variant of the same test case isn't folded (it
> ends up calling the library strlen) but the bug cannot be
> detected using the modified c_strlen():
> 
>   const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
> 
>   int f (void)
>   {
>     const char *p = (char*)wc;
>     return strlen (p);            // not folded
>   }
> 
> To detect it, I'd have to use some other function, like
> c_getstr().  I could certainly do that but I don't think
> I should need to.
And that's on the list of things we're going to try and address next.  I
don't think it needs to be conflated with the change which indicates to
c_strlen how to count.

I do hope you're getting all these testcases recorded.  I'd even support
installing them into the suite now, properly xfailed.  That makes it
easier to see how any patch under consideration affects the wider set
of tests.
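For the strlen missing-nul case above, something like this might work as
the xfailed test (again, the dg- directives and the warning regex are only
my guess at the eventual diagnostic):

  /* { dg-do compile } */
  /* { dg-options "-O2 -Wall" } */

  extern __SIZE_TYPE__ strlen (const char *);

  const __WCHAR_TYPE__ wc[1] = L"\x12345678";   /* no terminating nul */

  int f (void)
  {
    const char *p = (char*)wc;
    /* Reads past the end of wc; xfailed until the missing-nul case
       is detected.  */
    return strlen (p);   /* { dg-warning "unterminated|nul" "" { xfail *-*-* } } */
  }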

Jeff
