On 10/23/2017 06:14 PM, Martin Sebor wrote: ... > It seems to me that before exposing a new mechanism to control > the exec charset it would be prudent to a) plug at least the > biggest holes to make the feature more reliable (in my mind, > that's at least -Wformat), and b) make sure the pragma interacts > correctly with existing features that work correctly with the > -fexec-charset option. Where it doesn't and where it cannot > be made to work correctly (i.e., is undefined), I would expect > an effort to be made to detect and diagnose those undefined > interactions if possible, or if that's too difficult, at > a minimum document them.
Agreed. Getting a better understanding of the deficiencies with that mechanism and at least document them is an important step. ... > My concern with this pragma/attribute and inlining has to do with > strings in one exec charset being propagated into functions that > operate on strings in another charset. E.g., like in the test > case below that's "miscompiled" with your patch -- the first test > for n == 7 is eliminated and the buffer overflow is not detected. > If this cannot be made to work then I think some effort should be > made to detect this mixing and matching and existing optimizations > that assume the same charset (like the sprintf one does) disabled. > > static inline int > f (char *d, const char *fmt) > { > #pragma GCC exec_charset ("utf8") > int n = __builtin_sprintf (d, fmt, 12345); > #pragma GCC exec_charset (pop) > > if (n == 7) // incorrectly optimized away > __builtin_abort (); To my understanding the problem in your example isn't triggered through inlining. The check gets optimized away because sprintf is called with "i=%i" being converted to EBCDIC first. The sprintf implementation does not recognize the format letter and hence only prints the converted format string without inserting the integer. That's kinda expected I would say. You call a function which is expected to deal with a UTF-8 string with an EBCDIC string. GCC is reasoning about what the sprintf invocation would return and removes the condition since it would never be triggered. Compiling with -O0 makes the program behave the same way even without the inlining and the check not being removed. It would also happen without using the pragma but compiling with -fexec-charset=EBCDIC-US instead. E.g. compiling this program with -fexec-charset=EBCDIC-US does abort while it does not abort without using -fexec-charset: int main (void) { char d[5]; int n = __builtin_sprintf (d, "i=%i", 12345); if (n != 7) __builtin_abort (); } I'm not sure what kind of warning to expect for such code? Would it be helpful if the compiler tells that the exec_charset setting does not affect "fmt" for this code?! #pragma GCC exec_charset ("EBCDIC-US") int n = __builtin_sprintf (d, fmt, 12345); #pragma GCC exec_charset (pop) On the other hand, diagnostics like this might create false positives. One reason of course is that this might be intentional. But it might also be because in C it is difficult to detect whether a char* variable will eventually be interpreted as string or not. While I agree with you that the feature is not that easy to use and more diagnostics would be good I'm not really sure what to do about it. Bye, -Andreas-