[Bug c/65158] printf attribute error reporting assumes single-byte characters

2018-09-15 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

--- Comment #5 from Manuel López-Ibáñez  ---
(In reply to Martin Sebor from comment #4)
> What follows the percent sign must be one of the C or POSIX conversion
> specifiers (after any optional flags etc.) and those are all single byte
> characters in most (all?) charsets.  In "%\x1B$B" (in the test case from
> PR33748) the \x1B character is the beginning of an ISO-2022 escape sequence
> and not a valid conversion specifier so I think it's fine, even correct, to
> only consider it as the (invalid) conversion specifier, print it in the
> warning, and restart parsing with the byte after it.  Treating what follows
> % as a sequence of multibyte characters could otherwise throw off the
> remaining parsing.

To be honest, the support for UTF-8 and multi-byte encodings is so poor right
now that I don't think this is the most pressing issue, but note that the
warning message is exactly the same for "%ñ" and "%í" (in UTF-8 both start
with the byte 0xC3). The poor location info would not help to find where the
problem is.

: In function 'void foo(int)':
:3:23: warning: unknown conversion type character '\xd1' in format [-Wformat=]
3 |  __builtin_printf("%i%і%𝚒%ℹ", i);
  |   ^~~~
:3:23: warning: unknown conversion type character '\xf0' in format [-Wformat=]
:3:23: warning: unknown conversion type character '\xe2' in format [-Wformat=]
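
To make the ambiguity concrete, here is a minimal sketch in C. It is not GCC
code: format_spec_byte and same_reported_byte are hypothetical helpers, and the
string literals assume the source was UTF-8 encoded. It renders only the first
byte after '%' in the same '\xNN' style as the warnings above, which is enough
to show that two different characters can produce an identical message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: render the single byte that follows
   '%' in the '\xNN' style used by the warnings quoted above. */
static void format_spec_byte(const char *after_percent, char *out, size_t outsz)
{
    /* "'\xNN'" needs 6 characters plus the terminating NUL. */
    assert(after_percent != NULL && out != NULL && outsz >= 7);
    snprintf(out, outsz, "'\\x%02x'",
             (unsigned)(unsigned char)after_percent[0]);
}

/* Returns 1 when two specifiers would be reported with the same quoted
   byte, 0 otherwise. */
static int same_reported_byte(const char *a, const char *b)
{
    char ra[8], rb[8];
    format_spec_byte(a, ra, sizeof ra);
    format_spec_byte(b, rb, sizeof rb);
    return strcmp(ra, rb) == 0;
}
```

With UTF-8 input, same_reported_byte("\xc3\xb1", "\xc3\xad") (that is, ñ and í)
returns 1, because both sequences begin with 0xC3.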

[Bug c/65158] printf attribute error reporting assumes single-byte characters

2018-09-15 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

--- Comment #4 from Martin Sebor  ---
(In reply to Manuel López-Ibáñez from comment #3)
> void foo(void) {
> __builtin_printf("%ñ%中");
> __builtin_printf ("%\x1B$B"); /* Taken from PR33748 */
>  }
> 
> : In function 'void foo()':
> :2:22: warning: unknown conversion type character '\xc3' in format [-Wformat=]
> 2 | __builtin_printf("%ñ%中");
>   |  ^
> :2:22: warning: unknown conversion type character '\xe4' in format [-Wformat=]
> :3:23: warning: unknown conversion type character '\x1b' in format [-Wformat=]
> 3 | __builtin_printf ("%\x1B$B");
>   |   ^
> 
> Note that ñ and 中 are multi-byte but the message only shows one byte.

What follows the percent sign must be one of the C or POSIX conversion specifiers
(after any optional flags etc.) and those are all single byte characters in
most (all?) charsets.  In "%\x1B$B" (in the test case from PR33748) the \x1B
character is the beginning of an ISO-2022 escape sequence and not a valid
conversion specifier so I think it's fine, even correct, to only consider it as
the (invalid) conversion specifier, print it in the warning, and restart
parsing with the byte after it.  Treating what follows % as a sequence of
multibyte characters could otherwise throw off the remaining parsing.
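
The recovery strategy described in this comment can be sketched as a toy
scanner. This is only an illustration under stated assumptions, not GCC's
format checker: scan_format and its short list of accepted conversion
characters are hypothetical, and flags, widths, and length modifiers are
ignored. The byte after '%' is treated as the whole (invalid) specifier, and
scanning resumes with the byte that follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy scanner, NOT GCC's implementation. Records up to max_bad invalid
   specifier bytes in bad[] and returns how many were found in total. */
static size_t scan_format(const char *fmt, unsigned char *bad, size_t max_bad)
{
    size_t n = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;              /* ordinary byte, nothing to check */
        ++p;                       /* step to the byte after '%' */
        if (*p == '\0')
            break;                 /* lone trailing '%': stop scanning */
        if (strchr("diouxXcsfegp%", *p) == NULL) {
            /* Exactly one byte is reported as the invalid specifier. */
            if (n < max_bad)
                bad[n] = (unsigned char)*p;
            ++n;
        }
        /* The loop increment resumes parsing at the next byte, so any
           trailing bytes of a multibyte character are skipped as
           ordinary text. */
    }
    return n;
}
```

For the UTF-8 bytes of "%ñ%中" this records 0xc3 and 0xe4, and for "%\x1B$B" it
records 0x1b, matching the bytes shown in the quoted warnings.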

[Bug c/65158] printf attribute error reporting assumes single-byte characters

2018-09-15 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Manuel López-Ibáñez  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org,
                   |                            |manu at gcc dot gnu.org

--- Comment #3 from Manuel López-Ibáñez  ---
void foo(void) {
__builtin_printf("%ñ%中");
__builtin_printf ("%\x1B$B"); /* Taken from PR33748 */
 }

: In function 'void foo()':
:2:22: warning: unknown conversion type character '\xc3' in format [-Wformat=]
2 | __builtin_printf("%ñ%中");
  |  ^
:2:22: warning: unknown conversion type character '\xe4' in format [-Wformat=]
:3:23: warning: unknown conversion type character '\x1b' in format [-Wformat=]
3 | __builtin_printf ("%\x1B$B");
  |   ^

Note that ñ and 中 are multi-byte but the message only shows one byte.
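
For illustration, the number of bytes a diagnostic would need to print to show
the whole character can be derived from the lead byte. This is a sketch that
assumes the source is UTF-8; utf8_seq_len is a hypothetical helper, not
something GCC provides, and it does not validate the continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
   `lead`, or 0 if `lead` cannot start a sequence (a continuation byte or an
   invalid value). Continuation bytes are not checked here. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;                  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;                  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;                  /* 11110xxx */
    return 0;                      /* 10xxxxxx or 11111xxx */
}
```

Applied to the bytes in the warnings above, 0xc3 (ñ) introduces a 2-byte
sequence and 0xe4 (中) a 3-byte one, while 0x1b is a single byte.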

Another bug is that the location info fails to point within the format string
(ccing David):

void foo(int i) {
 __builtin_printf("%i%i%i%i%í%i%i",i,i,i,i,i,i);
}

:2:23: warning: unknown conversion type character '\xc3' in format [-Wformat=]
2 |  __builtin_printf("%i%i%i%i%í%i%i",i,i,i,i,i,i);
  |   ^

[Bug c/65158] printf attribute error reporting assumes single-byte characters

2018-09-14 Thread tromey at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Tom Tromey  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |REOPENED

--- Comment #2 from Tom Tromey  ---
(In reply to Martin Sebor from comment #1)
> There is no format specifier in C or POSIX that involves a multibyte
> character.  They're all single byte characters in the 7-bit ASCII range that
> should convert to single byte characters in most (all?) encodings.  It would
> take an unusual character set to map a 7-bit character to a multibyte
> sequence.  Is it worth worrying about this corner case?

I think this is just a bug I noticed by inspection.

The issue is that if the user makes a typo in the source somehow, gcc will
print something invalid.  So, yes, minor; but nevertheless a bug.  I'm
reopening on that basis.

[Bug c/65158] printf attribute error reporting assumes single-byte characters

2018-09-14 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65158

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-09-14
                 CC|                            |msebor at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Sebor  ---
There is no format specifier in C or POSIX that involves a multibyte character.
They're all single byte characters in the 7-bit ASCII range that should
convert to single byte characters in most (all?) encodings.  It would take an
unusual character set to map a 7-bit character to a multibyte sequence.  Is it
worth worrying about this corner case?

(-Wformat doesn't currently handle the -fexec-charset= option so that should
presumably be a higher priority problem to fix.)