On 10/23/2017 06:14 PM, Martin Sebor wrote:
...
> It seems to me that before exposing a new mechanism to control
> the exec charset it would be prudent to a) plug at least the
> biggest holes to make the feature more reliable (in my mind,
> that's at least -Wformat), and b) make sure the pragma interacts
> correctly with existing features that work correctly with the
> -fexec-charset option.  Where it doesn't and where it cannot
> be made to work correctly (i.e., is undefined),  I would expect
> an effort to be made to detect and diagnose those undefined
> interactions if possible, or if that's too difficult, at
> a minimum document them.

Agreed. Getting a better understanding of the deficiencies with that mechanism 
and at least document
them is an important step.

...
> My concern with this pragma/attribute and inlining has to do with
> strings in one exec charset being propagated into functions that
> operate on strings in another charset.  E.g., like in the test
> case below that's "miscompiled" with your patch -- the first test
> for n == 7 is eliminated and the buffer overflow is not detected.
> If this cannot be made to work then I think some effort should be
> made to detect this mixing and matching and existing optimizations
> that assume the same charset (like the sprintf one does) disabled.
> 
> static inline int
> f (char *d, const char *fmt)
> {
> #pragma GCC exec_charset ("utf8")
>    int n = __builtin_sprintf (d, fmt, 12345);
> #pragma GCC exec_charset (pop)
> 
>    if (n == 7)   // incorrectly optimized away
>      __builtin_abort ();

To my understanding the problem in your example isn't triggered through 
inlining.
The check gets optimized away because sprintf is called with "i=%i" being 
converted to EBCDIC first.
 The sprintf implementation does not recognize the format letter and hence only 
prints the converted
format string without inserting the integer. That's kinda expected I would say. 
 You call a function
which is expected to deal with a UTF-8 string with an EBCDIC string. GCC is 
reasoning about what the
sprintf invocation would return and removes the condition since it would never 
be triggered.
Compiling with -O0 makes the program behave the same way even without the 
inlining and the check not
being removed.

It would also happen without using the pragma but compiling with 
-fexec-charset=EBCDIC-US instead.
E.g. compiling this program with -fexec-charset=EBCDIC-US does abort while it 
does not abort without
using -fexec-charset:

int main (void)
{
   char d[5];

   int n = __builtin_sprintf (d, "i=%i", 12345);

   if (n != 7)
     __builtin_abort ();
}

I'm not sure what kind of warning to expect for such code? Would it be helpful 
if the compiler tells
that the exec_charset setting does not affect "fmt" for this code?!

#pragma GCC exec_charset ("EBCDIC-US")
   int n = __builtin_sprintf (d, fmt, 12345);
#pragma GCC exec_charset (pop)

On the other hand, diagnostics like this might create false positives. One 
reason of course is that
this might be intentional. But it might also be because in C it is difficult to 
detect whether a
char* variable will eventually be interpreted as string or not.

While I agree with you that the feature is not that easy to use and more 
diagnostics would be good
I'm not really sure what to do about it.

Bye,

-Andreas-

Reply via email to