On 10/24/2017 05:33 PM, Martin Sebor wrote:
>>> My concern with this pragma/attribute and inlining has to do with
>>> strings in one exec charset being propagated into functions that
>>> operate on strings in another charset.  E.g., like in the test
>>> case below that's "miscompiled" with your patch -- the first test
>>> for n == 7 is eliminated and the buffer overflow is not detected.
>>> If this cannot be made to work then I think some effort should be
>>> made to detect this mixing and matching and existing optimizations
>>> that assume the same charset (like the sprintf one does) disabled.
>>>
>>> static inline int
>>> f (char *d, const char *fmt)
>>> {
>>> #pragma GCC exec_charset ("utf8")
>>>    int n = __builtin_sprintf (d, fmt, 12345);
>>> #pragma GCC exec_charset (pop)
>>>
>>>    if (n == 7)   // incorrectly optimized away
>>>      __builtin_abort ();
>>
>> To my understanding the problem in your example isn't triggered through 
>> inlining.
>> The check gets optimized away because sprintf is called with "i=%i" being 
>> converted to EBCDIC first.
> 
> It's optimized away because the sprintf pass interprets the constant
> contents of the format string, which it only does when the function
> is inlined into its caller.  Otherwise, without inlining, it treats
> the format string as an unknown and so it cannot do the buffer
> overflow analysis or the optimization, and everything works fine
> at runtime.

I assumed that GCC is supposed to emulate the format string parsing of sprintf 
as if it were
compiled with the default execcharset. But instead it should emulate sprintf as 
if it was compiled
with the same exec-charset as the currently compiled code - right?!

So for an example like this:

  char d[3];
#pragma GCC exec_charset ("EBCDIC-US")
  char *s = "i=%i";
#pragma GCC exec_charset (pop)

  __builtin_sprintf (d, s, 1);

It currently warns:

t.c:6:13: warning: ‘~l’ directive writing 4 bytes into a region of size 3 
[-Wformat-overflow=]
   char *s = "i=%i";
             ^~~~~~
t.c:9:3: note: ‘__builtin_sprintf’ output 5 bytes into a destination of size 3
   __builtin_sprintf (d, s, 1);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~

That's wrong and happens because the GCC sprintf format string parsing uses the 
default charset
instead of the charset which has been used when originally parsing the string 
literal.  Hence it
does not recognize the %i format specifier and the output string remains 4 
bytes in length.

As Richard suggested we probably have to record the exec_charset in each string 
literal and use this
when trying to interpret it.

-Andreas-

Reply via email to