[Bug c/65158] printf attribute error reporting assumes single-byte characters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

--- Comment #5 from Manuel López-Ibáñez ---
(In reply to Martin Sebor from comment #4)
> What follows the percent sign must be one of the C or POSIX conversion
> specifiers (after any optional flags etc.) and those are all single byte
> characters in most (all?) charsets. In "%\x1B$B" (in the test case from
> PR33748) the \x1B character is the beginning of an ISO-2022 escape sequence
> and not a valid conversion specifier so I think it's fine, even correct, to
> only consider it as the (invalid) conversion specifier, print it in the
> warning, and restart parsing with the byte after it. Treating what follows
> % as a sequence of multibyte characters could otherwise throw off the
> remaining parsing.

To be honest, support for UTF-8 and multi-byte encodings is so poor right now that I don't think this is the most pressing issue, but note that the warning message is exactly the same for "%ñ" and "%í". The poor location info would not help to find where the problem is:

: In function 'void foo(int)':
:3:23: warning: unknown conversion type character '\xd1' in format [-Wformat=]
 3 | __builtin_printf("%i%і%𝚒%ℹ", i);
   | ^~~~
:3:23: warning: unknown conversion type character '\xf0' in format [-Wformat=]
:3:23: warning: unknown conversion type character '\xe2' in format [-Wformat=]
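
The point about "%ñ" and "%í" producing the same message can be checked with a small sketch. It assumes the format string is UTF-8 encoded (the bytes are spelled out so the source encoding does not matter), and the helper names `escape_first_byte` and `same_first_byte_escape` are hypothetical, not GCC functions. They only mimic the escape form seen in the warnings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: render only the first byte of S
   as an escape such as \xc3, the way the warnings above show it.  */
static void escape_first_byte(const char *s, char *out, size_t outsz)
{
    assert(s != NULL && out != NULL && outsz >= 5);
    snprintf(out, outsz, "\\x%02x", (unsigned char)s[0]);
}

/* Nonzero if both strings yield the same escaped first byte, i.e. a
   diagnostic that prints one byte could not tell them apart.  */
static int same_first_byte_escape(const char *a, const char *b)
{
    char ea[8], eb[8];
    escape_first_byte(a, ea, sizeof ea);
    escape_first_byte(b, eb, sizeof eb);
    return strcmp(ea, eb) == 0;
}
```

In UTF-8, ñ is the byte pair C3 B1 and í is C3 AD, so `same_first_byte_escape("\xc3\xb1", "\xc3\xad")` is nonzero: both render as \xc3.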
[Bug c/65158] printf attribute error reporting assumes single-byte characters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

--- Comment #4 from Martin Sebor ---
(In reply to Manuel López-Ibáñez from comment #3)
> void foo(void) {
>   __builtin_printf("%ñ%中");
>   __builtin_printf ("%\x1B$B"); /* Taken from PR33748 */
> }
>
> : In function 'void foo()':
> :2:22: warning: unknown conversion type character '\xc3' in format
> [-Wformat=]
>  2 | __builtin_printf("%ñ%中");
>    | ^
> :2:22: warning: unknown conversion type character '\xe4' in format
> [-Wformat=]
> :3:23: warning: unknown conversion type character '\x1b' in format
> [-Wformat=]
>  3 | __builtin_printf ("%\x1B$B");
>    | ^
>
> Note that ñ and 中 are multi-byte but the message only shows one byte.

What follows the percent sign must be one of the C or POSIX conversion specifiers (after any optional flags etc.) and those are all single byte characters in most (all?) charsets. In "%\x1B$B" (in the test case from PR33748) the \x1B character is the beginning of an ISO-2022 escape sequence and not a valid conversion specifier, so I think it's fine, even correct, to only consider it as the (invalid) conversion specifier, print it in the warning, and restart parsing with the byte after it. Treating what follows % as a sequence of multibyte characters could otherwise throw off the remaining parsing.
[Bug c/65158] printf attribute error reporting assumes single-byte characters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Manuel López-Ibáñez changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org,
                   |                            |manu at gcc dot gnu.org

--- Comment #3 from Manuel López-Ibáñez ---
void foo(void) {
  __builtin_printf("%ñ%中");
  __builtin_printf ("%\x1B$B"); /* Taken from PR33748 */
}

: In function 'void foo()':
:2:22: warning: unknown conversion type character '\xc3' in format [-Wformat=]
 2 | __builtin_printf("%ñ%中");
   | ^
:2:22: warning: unknown conversion type character '\xe4' in format [-Wformat=]
:3:23: warning: unknown conversion type character '\x1b' in format [-Wformat=]
 3 | __builtin_printf ("%\x1B$B");
   | ^

Note that ñ and 中 are multi-byte but the message only shows one byte.

Another bug is that the location info fails to point within the format string (ccing David):

void foo(int i) {
  __builtin_printf("%i%i%i%i%í%i%i",i,i,i,i,i,i);
}

:2:23: warning: unknown conversion type character '\xc3' in format [-Wformat=]
 2 | __builtin_printf("%i%i%i%i%í%i%i",i,i,i,i,i,i);
   | ^
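
One way a diagnostic could show the whole character rather than its first byte is to work out how many bytes the sequence spans. The following is only a sketch under the assumption that the format string is UTF-8, which the thread notes is not guaranteed (the PR33748 string is ISO-2022). The name `utf8_seq_len` is hypothetical and this is not how GCC implements -Wformat. It falls back to a length of 1 for anything that does not look like a complete UTF-8 sequence, which matches the single-byte behavior shown above, and it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: length in bytes of the UTF-8
   sequence starting at S, with N bytes available.  Returns 0 if N is 0
   and 1 if S does not start a complete, well-formed-looking sequence,
   so a caller can fall back to treating it as a single byte.  */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;

    assert(s != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                     /* ASCII, including ESC (0x1b) */
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;                      /* lead byte 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;                      /* lead byte 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;                      /* lead byte 11110xxx */
    else
        return 1;                     /* stray continuation or invalid lead */

    if (len > n)
        return 1;                     /* truncated sequence */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 1;                 /* not a continuation byte 10xxxxxx */
    return len;
}
```

With this, the bytes after % in "%ñ" (C3 B1) give a length of 2, those in "%中" (E4 B8 AD) give 3, and the \x1B in "%\x1B$B" gives 1, so the escape-sequence case would still be handled one byte at a time.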
[Bug c/65158] printf attribute error reporting assumes single-byte characters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Tom Tromey changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |REOPENED

--- Comment #2 from Tom Tromey ---
(In reply to Martin Sebor from comment #1)
> There is no format specifier in C or POSIX that involves a multibyte
> character. They're all single byte characters in the 7-bit ASCII range that
> should convert to single byte characters in most (all?) encodings. It would
> take an unusual character set to map a 7-bit character to a multibyte
> sequence. Is it worth worrying about this corner case?

I think this is just a bug I noticed by inspection. The issue is that if the user makes a typo in the source somehow, gcc will print something invalid. So, yes, minor; but nevertheless a bug. I'm reopening on that basis.
[Bug c/65158] printf attribute error reporting assumes single-byte characters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Martin Sebor changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-09-14
                 CC|                            |msebor at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Sebor ---
There is no format specifier in C or POSIX that involves a multibyte character. They're all single byte characters in the 7-bit ASCII range that should convert to single byte characters in most (all?) encodings. It would take an unusual character set to map a 7-bit character to a multibyte sequence. Is it worth worrying about this corner case? (-Wformat doesn't currently handle the -fexec-charset= option so that should presumably be a higher priority problem to fix.)
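
The claim that every valid conversion specifier is a single 7-bit character can be illustrated with a small membership test. This is a sketch only: `is_printf_conversion_char` is a hypothetical name, the list covers the C99 printf conversion specifiers plus the POSIX additions C and S, and it assumes an ASCII-compatible execution character set. It is not GCC's checker.

```c
#include <string.h>

/* Hypothetical helper, not GCC's implementation: nonzero if C is one of
   the printf conversion specifier characters from C99 (including %)
   or the POSIX additions C and S.  Any byte with the high bit set,
   such as the lead byte of a UTF-8 sequence, is rejected outright.  */
static int is_printf_conversion_char(unsigned char c)
{
    static const char specs[] = "diouxXfFeEgGaAcspn%CS";

    /* strchr would match the terminating NUL, so exclude it first.  */
    return c != '\0' && c < 0x80 && strchr(specs, c) != NULL;
}
```

Under these assumptions the bytes reported in the warnings in this thread (0xc3, 0xe4, 0x1b) all fail the test after looking at a single byte, which is the behavior comment #1 describes.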