Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On 05/05/2017 11:49 AM, Joseph Myers wrote: On Thu, 4 May 2017, Martin Sebor wrote: I like the flags2 idea. I split up the initialization array to also detect quoted %K, and unquoted %R and %r. With that I ran into test failures that took me a bit to debug. It turns out that there's code (a nasty hack, really) that makes assumptions about some of the conversion specifiers. I dealt with the failures by simplifying the initialization code and removing the hack. This patch is OK. Better testing exposed a couple of issues with the patch: 1) The init_dynamic_diag_info function depends on the order in which the identifiers it looks (e.g. tree_node) for are declared in the translation unit being compiled and in some instances wound up initializing the local_tree_type_node with void_type_node even though tree would subsequently be found to be declared. 2) The warning on %R/%r when used outside a quoted sequence was overfly restrictive. There is code in cp/error.c that relies on that when printing the inlining context of a function. I fixed both of these problems in the attached revision. I've also been running a script to build the targets in config-list.mk. So far it's built just over 40 of the 199 targets and found one quoting problem in a build of sparcv9-solaris2.11. The build fails due to an unrelated error (I opened bug 80673 for it), so I can't say I've found anything that should block the patch from checking in. I'll keep running the script but unless you or someone else would prefer I hold off until they all finish I'll go ahead and commit the attached patch, without waiting for the build of the remaining targets to complete, probably sometime in the second half of the week. Martin PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c gcc/c-family/ChangeLog: PR translation/80280 * c-format.h (struct format_flag_spec): Add new member. (T89_T): New macro. * c-format.c (local_tree_type_node): New global. (printf_flag_specs, asm_fprintf_flag_spec): Initialize new data. (gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs): Ditto. (strfmon_flag_specs): Likewise. (gcc_diag_char_table, gcc_cdiag_char_table): Split up specifiers with distinct quoting properties. (gcc_tdiag_char_table, gcc_cxxdiag_char_table): Same. (flag_chars_t::validate): Add argument and handle bad quoting. (check_format_info_main): Handle quoting problems. (init_dynamic_diag_info): Simplify. gcc/testsuite/ChangeLog: PR translation/80280 * gcc.dg/format/gcc_diag-10.c: New test. diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index 400eb66..2dba062 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -53,6 +53,9 @@ struct function_format_info unsigned HOST_WIDE_INT first_arg_num; /* number of first arg (zero for varargs) */ }; +/* Initialized in init_dynamic_diag_info. */ +static tree local_tree_type_node; + static bool decode_format_attr (tree, function_format_info *, int); static int decode_format_type (const char *); @@ -492,17 +495,17 @@ static const format_length_info gcc_gfc_length_specs[] = static const format_flag_spec printf_flag_specs[] = { - { ' ', 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, - { '+', 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, - { '#', 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, - { '0', 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, - { '-', 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, - { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, - { 'I', 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, - { 'w', 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, - { 'p', 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, - { 'L', 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 }, - { 0, 0, 0, NULL, NULL, STD_C89 } + { ' ', 0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, + { '+', 0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, + { '#', 0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, + { '0', 0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, + { '-', 0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, + { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, + { 'I', 0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, + { 'w', 0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, + { 'p', 0, 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 },
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On Thu, 4 May 2017, Martin Sebor wrote: > I like the flags2 idea. I split up the initialization array to also > detect quoted %K, and unquoted %R and %r. With that I ran into test > failures that took me a bit to debug. It turns out that there's code > (a nasty hack, really) that makes assumptions about some of > the conversion specifiers. I dealt with the failures by simplifying > the initialization code and removing the hack. This patch is OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On 05/03/2017 03:27 PM, Joseph Myers wrote: On Wed, 3 May 2017, Martin Sebor wrote: Clarifying the comment is helpful, but a data structure involving putting the same character in both still doesn't make sense to me. It would seem a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and "EK" cases, where "EK" uses NULL there just like "s", "d" etc. do. Then the begin/end strings for the "DFTV" entry will be the empty string (to indicate that they are expected to be quoted), as in the attached incremental diff. Let me know if I misunderstood and you had something else in mind. Yes, that's what I'd expect (incrementally). FWIW, I don't mind doing this way if you prefer, but I'm hard pressed to see the improvement. All it did is grow the size of the tables. The code stayed the same. Really I think it might be better not to have pointers / strings there at all - rather, have a four-state enum value that says directly whether those format specifiers are quote-neutral, should-be-quoted, left-quote or right-quote. Or that information could go in the existing flags2 field, '"' to mean should-be-quoted, '<' to mean left-quote and '>' to mean right-quote, for example. I like the flags2 idea. I split up the initialization array to also detect quoted %K, and unquoted %R and %r. With that I ran into test failures that took me a bit to debug. It turns out that there's code (a nasty hack, really) that makes assumptions about some of the conversion specifiers. I dealt with the failures by simplifying the initialization code and removing the hack. Martin PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c gcc/c-family/ChangeLog: PR translation/80280 * c-format.h (struct format_flag_spec): Add new member. (T89_T): New macro. * c-format.c (local_tree_type_node): New global. (printf_flag_specs, asm_fprintf_flag_spec): Initialize new data. (gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs): Ditto. (strfmon_flag_specs): Likewise. (gcc_diag_char_table, gcc_cdiag_char_table): Split up specifiers with distinct quoting properties. (gcc_tdiag_char_table, gcc_cxxdiag_char_table): Same. (flag_chars_t::validate): Add argument and handle bad quoting. (check_format_info_main): Handle quoting problems. (init_dynamic_diag_info): Simplify. gcc/testsuite/ChangeLog: PR translation/80280 * gcc.dg/format/gcc_diag-10.c: New test. diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index 400eb66..98fff4c 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -53,6 +53,9 @@ struct function_format_info unsigned HOST_WIDE_INT first_arg_num; /* number of first arg (zero for varargs) */ }; +/* Initialized in init_dynamic_diag_info. */ +static tree local_tree_type_node; + static bool decode_format_attr (tree, function_format_info *, int); static int decode_format_type (const char *); @@ -492,17 +495,17 @@ static const format_length_info gcc_gfc_length_specs[] = static const format_flag_spec printf_flag_specs[] = { - { ' ', 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, - { '+', 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, - { '#', 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, - { '0', 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, - { '-', 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, - { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, - { 'I', 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, - { 'w', 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, - { 'p', 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, - { 'L', 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 }, - { 0, 0, 0, NULL, NULL, STD_C89 } + { ' ', 0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, + { '+', 0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, + { '#', 0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, + { '0', 0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, + { '-', 0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, + { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, + { 'I', 0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, + { 'w', 0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, + { 'p', 0, 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, + { 'L', 0, 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 }, + { 0, 0, 0, 0, NULL, NULL, STD_C89 } }; @@
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On Wed, 3 May 2017, Martin Sebor wrote: > > Clarifying the comment is helpful, but a data structure involving putting > > the same character in both still doesn't make sense to me. It would seem > > a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and > > "EK" cases, where "EK" uses NULL there just like "s", "d" etc. do. > > Then the begin/end strings for the "DFTV" entry will be the empty > string (to indicate that they are expected to be quoted), as in > the attached incremental diff. Let me know if I misunderstood > and you had something else in mind. Yes, that's what I'd expect (incrementally). > FWIW, I don't mind doing this way if you prefer, but I'm hard > pressed to see the improvement. All it did is grow the size of > the tables. The code stayed the same. Really I think it might be better not to have pointers / strings there at all - rather, have a four-state enum value that says directly whether those format specifiers are quote-neutral, should-be-quoted, left-quote or right-quote. Or that information could go in the existing flags2 field, '"' to mean should-be-quoted, '<' to mean left-quote and '>' to mean right-quote, for example. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On 05/03/2017 01:59 PM, Joseph Myers wrote: On Wed, 3 May 2017, Martin Sebor wrote: Use contrib/config-list.mk, with a native compiler with this patch in the PATH, to test building compilers for many configurations. (No doubt you'll also find existing build issues, which may or may not be filed in Bugzilla.) I can do some of it but not all of it. The work doesn't involve just building the compiler but also running the tests and fixing up regressions in those that are written to expect the unquoted diagnostics. I don't have the ability to run the test suite on all the targets, or the bandwidth to run it on a subset of them beyond powerpc64 and x86_64. So I'm hoping to find a way for the core of the patch to be checked in and for the cleanup to proceed subsequently, as target maintainers find time. When it's architecture-specific tests, I think it's up to people testing on those architectures to fix them. diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h index 13ca8ea..8932861 100644 --- a/gcc/c-family/c-format.h +++ b/gcc/c-family/c-format.h @@ -132,6 +132,11 @@ struct format_type_detail struct format_char_info { const char *format_chars; + /* Strings of FORMAT_CHARS characters that begin and end quoting, + respectively, and pairs of which should be matched in the same + format string. NULL when no such pairs exist. */ + const char *quote_begin_chars; + const char *quote_end_chars; I don't think this comment is sufficiently detailed to make it obvious what should appear in each field for each of the four kinds of format specifiers I enumerated. The best I can reverse-engineer from the code is: NULL for specifiers that may be used either quoted or unquoted *or* listing those specifiers in both quote_begin_chars and quote_end_chars (but I think the option of listing them in both should be removed as conceptually wrong); if the character is an opening or closing quote, list it in the appropriate one; if it must be used quoted, but isn't a quote itself, both strings must be non-NULL but it must not be listed in either. Whether that's right or wrong, the comment needs to make the rules clear. I've clarified the comment. Clarifying the comment is helpful, but a data structure involving putting the same character in both still doesn't make sense to me. It would seem a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and "EK" cases, where "EK" uses NULL there just like "s", "d" etc. do. Then the begin/end strings for the "DFTV" entry will be the empty string (to indicate that they are expected to be quoted), as in the attached incremental diff. Let me know if I misunderstood and you had something else in mind. FWIW, I don't mind doing this way if you prefer, but I'm hard pressed to see the improvement. All it did is grow the size of the tables. The code stayed the same. Martin diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index f64b6e0..ad56835 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -688,7 +688,8 @@ static const format_char_info gcc_diag_char_table[] = { "K", NULL, NULL, 0, STD_C89, { T89_V, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q","", NULL }, { "r", NULL, NULL, 1, STD_C89, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "","cR", NULL }, - { "<>'R","<'R", ">'R", 0, STD_C89, NOARGUMENTS, "", "", NULL }, + { "<>","<", ">", 0, STD_C89, NOARGUMENTS, "", "", NULL }, + { "'R", NULL, NULL, 0, STD_C89, NOARGUMENTS, "", "", NULL }, { "m", NULL, NULL, 0, STD_C89, NOARGUMENTS, "q", "", NULL }, { NULL, NULL, NULL, 0, STD_C89, NOLENGTHS, NULL, NULL, NULL } }; @@ -706,12 +707,13 @@ static const format_char_info gcc_tdiag_char_table[] = /* Custom conversion specifiers. */ /* These will require a "tree" at runtime. */ - { "DFKTEV", "EK", "EK", 0, STD_C89, { T89_V, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q+", "", NULL }, - + { "DFTV", "", "", 0, STD_C89, { T89_V, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q+", "", NULL }, + { "EK", NULL, NULL, 0, STD_C89, { T89_V, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q+", "", NULL }, { "v", NULL, NULL, 0, STD_C89, { T89_I, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "q#", "", NULL }, { "r", NULL, NULL, 1, STD_C89, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "","cR", NULL }, - { "<>'R", "<'R", ">'R", 0, STD_C89, NOARGUMENTS, "", "", NULL }, + { "<>", "<", ">",0, STD_C89, NOARGUMENTS, "", "", NULL }, + { "'R", NULL, NULL, 0, STD_C89, NOARGUMENTS, "", "", NULL }, { "m", NULL,
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On Wed, 3 May 2017, Martin Sebor wrote: > > Use contrib/config-list.mk, with a native compiler with this patch in the > > PATH, to test building compilers for many configurations. (No doubt > > you'll also find existing build issues, which may or may not be filed in > > Bugzilla.) > > I can do some of it but not all of it. The work doesn't involve > just building the compiler but also running the tests and fixing > up regressions in those that are written to expect the unquoted > diagnostics. I don't have the ability to run the test suite on > all the targets, or the bandwidth to run it on a subset of them > beyond powerpc64 and x86_64. So I'm hoping to find a way for > the core of the patch to be checked in and for the cleanup to > proceed subsequently, as target maintainers find time. When it's architecture-specific tests, I think it's up to people testing on those architectures to fix them. > > > diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h > > > index 13ca8ea..8932861 100644 > > > --- a/gcc/c-family/c-format.h > > > +++ b/gcc/c-family/c-format.h > > > @@ -132,6 +132,11 @@ struct format_type_detail > > > struct format_char_info > > > { > > >const char *format_chars; > > > + /* Strings of FORMAT_CHARS characters that begin and end quoting, > > > + respectively, and pairs of which should be matched in the same > > > + format string. NULL when no such pairs exist. */ > > > + const char *quote_begin_chars; > > > + const char *quote_end_chars; > > > > I don't think this comment is sufficiently detailed to make it obvious > > what should appear in each field for each of the four kinds of format > > specifiers I enumerated. The best I can reverse-engineer from the code > > is: NULL for specifiers that may be used either quoted or unquoted *or* > > listing those specifiers in both quote_begin_chars and quote_end_chars > > (but I think the option of listing them in both should be removed as > > conceptually wrong); if the character is an opening or closing quote, list > > it in the appropriate one; if it must be used quoted, but isn't a quote > > itself, both strings must be non-NULL but it must not be listed in either. > > > > Whether that's right or wrong, the comment needs to make the rules clear. > > I've clarified the comment. Clarifying the comment is helpful, but a data structure involving putting the same character in both still doesn't make sense to me. It would seem a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and "EK" cases, where "EK" uses NULL there just like "s", "d" etc. do. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On 05/03/2017 12:47 PM, Martin Sebor wrote: I can do some of it but not all of it. The work doesn't involve just building the compiler but also running the tests and fixing up regressions in those that are written to expect the unquoted diagnostics. I don't have the ability to run the test suite on all the targets, or the bandwidth to run it on a subset of them beyond powerpc64 and x86_64. So I'm hoping to find a way for the core of the patch to be checked in and for the cleanup to proceed subsequently, as target maintainers find time. Just a note. It's not terrible to run non-execute tests with cross tools. In fact, it's fairly easy. I'm happy to walk you through the process. Long term I think we want to move away from config-list.mk and instead use a buildbot. I've been experimenting with jenkins.I've got a series of jobs that will build the *-elf, *-rtems, *-gnu and *-linux targets. binutils, then gcc, then newlib/glibc. The newlib (*-elf and *-rtems) pipeline is pretty solid at this point and covers ~70 targets from config-list.mk building through newlib. The glibc pipeline is more problematic, but improving. It's about ready to leave my little cluster of slaves and relocate to the servers in Toronto. It should be fairly straightforward to take that jenkins work and tie in testsuite runs. Even if we just start with compile.exp and expand one day (via the simulators) to the execution tests. Jeff
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On 05/03/2017 09:53 AM, Joseph Myers wrote: On Tue, 2 May 2017, Martin Sebor wrote: In bug 80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c, a translator points out one of a number of kinds of cosmetic problems that tend to come up late in development, during translation of GCC messages. Other, arguably more minor, kinds of issues are caused by forgetting to use proper quoting when referencing tree nodes, such as %D or %T. To help avoid these kinds of bugs, the attached patch enhances the -Wformat checker to detect these common quoting issues and report them as warnings. As I see it, there are four kinds of format specifiers in this context: * Specifiers that have no special interaction with quoting and may be used either quoted or unquoted. For example, %d, %s and %E. (%E is only of that kind when used for expressions - when hopefully it should only be used for simple expressions such as constants. Uses for identifiers should be quoted; cases where there can be complex expressions should be replaced by use of location ranges. Ideally identifiers and expressions should have different formats, so these cases can be distinguished and so identifiers can end up with a static type other than "tree".) * Specifiers that should be quoted. For example, %D. * Specifiers that open quoting (%<). * Specifiers that close quoting (%>). (In addition, the q flag serves to quote the specifier it's applied to.) Thanks for comments. That above matches my view. The remaining six patches in the series correct the problems highlighted by the warning and get GCC to bootstrap and pass regression tests on x86_64. I suspect additional fixes will be needed to get GCC to bootstrap on other targets. I'll do powerpc64le next but besides a general review I'm looking for suggestions how to do the same cleanup on all the other targets with the least disruption. (E.g., if there's a way to roll out a warning one target at a time I could introduce it under a suboption of -Wformat and enable the subobption with -Werror only on the targets that have already been verified.) Use contrib/config-list.mk, with a native compiler with this patch in the PATH, to test building compilers for many configurations. (No doubt you'll also find existing build issues, which may or may not be filed in Bugzilla.) I can do some of it but not all of it. The work doesn't involve just building the compiler but also running the tests and fixing up regressions in those that are written to expect the unquoted diagnostics. I don't have the ability to run the test suite on all the targets, or the bandwidth to run it on a subset of them beyond powerpc64 and x86_64. So I'm hoping to find a way for the core of the patch to be checked in and for the cleanup to proceed subsequently, as target maintainers find time. + /* Diagnose nested or unmatched quoting directives such as GCC's +"%<...%<" and "%>...%>". As a special case, a directive can +be considered to both begin and end quoting (e.g., GCC's %E). +Such a directive is recognized but not diagosed. */ I don't think it makes conceptual sense to consider %E to both begin and end quoting. I'd expect %E to have exactly the same begin_quote and end_quote (or whatever) data as %d and %s, because it has the same properties (may be used quoted or unquoted). Yes, that's exactly how %E is treated by the code. The special case the comment above refers to is meant to explain how the begin_quote and end_quote local variables get set, not describe a concept in the design. I renamed the variables so as not to suggest otherwise. diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h index 13ca8ea..8932861 100644 --- a/gcc/c-family/c-format.h +++ b/gcc/c-family/c-format.h @@ -132,6 +132,11 @@ struct format_type_detail struct format_char_info { const char *format_chars; + /* Strings of FORMAT_CHARS characters that begin and end quoting, + respectively, and pairs of which should be matched in the same + format string. NULL when no such pairs exist. */ + const char *quote_begin_chars; + const char *quote_end_chars; I don't think this comment is sufficiently detailed to make it obvious what should appear in each field for each of the four kinds of format specifiers I enumerated. The best I can reverse-engineer from the code is: NULL for specifiers that may be used either quoted or unquoted *or* listing those specifiers in both quote_begin_chars and quote_end_chars (but I think the option of listing them in both should be removed as conceptually wrong); if the character is an opening or closing quote, list it in the appropriate one; if it must be used quoted, but isn't a quote itself, both strings must be non-NULL but it must not be listed in either. Whether that's right or wrong, the comment needs to make the rules clear. I've clarified the comment. Thanks Martin PR translation/80280 - Missing closing quote (%>)
Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
On Tue, 2 May 2017, Martin Sebor wrote: > In bug 80280 - Missing closing quote (%>) c/semantics.c and > c/c-typeck.c, a translator points out one of a number of kinds > of cosmetic problems that tend to come up late in development, > during translation of GCC messages. Other, arguably more minor, > kinds of issues are caused by forgetting to use proper quoting > when referencing tree nodes, such as %D or %T. > > To help avoid these kinds of bugs, the attached patch enhances > the -Wformat checker to detect these common quoting issues and > report them as warnings. As I see it, there are four kinds of format specifiers in this context: * Specifiers that have no special interaction with quoting and may be used either quoted or unquoted. For example, %d, %s and %E. (%E is only of that kind when used for expressions - when hopefully it should only be used for simple expressions such as constants. Uses for identifiers should be quoted; cases where there can be complex expressions should be replaced by use of location ranges. Ideally identifiers and expressions should have different formats, so these cases can be distinguished and so identifiers can end up with a static type other than "tree".) * Specifiers that should be quoted. For example, %D. * Specifiers that open quoting (%<). * Specifiers that close quoting (%>). (In addition, the q flag serves to quote the specifier it's applied to.) > The remaining six patches in the series correct the problems > highlighted by the warning and get GCC to bootstrap and pass > regression tests on x86_64. I suspect additional fixes will > be needed to get GCC to bootstrap on other targets. I'll do > powerpc64le next but besides a general review I'm looking for > suggestions how to do the same cleanup on all the other targets > with the least disruption. (E.g., if there's a way to roll out > a warning one target at a time I could introduce it under > a suboption of -Wformat and enable the subobption with -Werror > only on the targets that have already been verified.) Use contrib/config-list.mk, with a native compiler with this patch in the PATH, to test building compilers for many configurations. (No doubt you'll also find existing build issues, which may or may not be filed in Bugzilla.) > + /* Diagnose nested or unmatched quoting directives such as GCC's > + "%<...%<" and "%>...%>". As a special case, a directive can > + be considered to both begin and end quoting (e.g., GCC's %E). > + Such a directive is recognized but not diagosed. */ I don't think it makes conceptual sense to consider %E to both begin and end quoting. I'd expect %E to have exactly the same begin_quote and end_quote (or whatever) data as %d and %s, because it has the same properties (may be used quoted or unquoted). > diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h > index 13ca8ea..8932861 100644 > --- a/gcc/c-family/c-format.h > +++ b/gcc/c-family/c-format.h > @@ -132,6 +132,11 @@ struct format_type_detail > struct format_char_info > { >const char *format_chars; > + /* Strings of FORMAT_CHARS characters that begin and end quoting, > + respectively, and pairs of which should be matched in the same > + format string. NULL when no such pairs exist. */ > + const char *quote_begin_chars; > + const char *quote_end_chars; I don't think this comment is sufficiently detailed to make it obvious what should appear in each field for each of the four kinds of format specifiers I enumerated. The best I can reverse-engineer from the code is: NULL for specifiers that may be used either quoted or unquoted *or* listing those specifiers in both quote_begin_chars and quote_end_chars (but I think the option of listing them in both should be removed as conceptually wrong); if the character is an opening or closing quote, list it in the appropriate one; if it must be used quoted, but isn't a quote itself, both strings must be non-NULL but it must not be listed in either. Whether that's right or wrong, the comment needs to make the rules clear. -- Joseph S. Myers jos...@codesourcery.com
[PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)
In bug 80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c, a translator points out one of a number of kinds of cosmetic problems that tend to come up late in development, during translation of GCC messages. Other, arguably more minor, kinds of issues are caused by forgetting to use proper quoting when referencing tree nodes, such as %D or %T. To help avoid these kinds of bugs, the attached patch enhances the -Wformat checker to detect these common quoting issues and report them as warnings. The remaining six patches in the series correct the problems highlighted by the warning and get GCC to bootstrap and pass regression tests on x86_64. I suspect additional fixes will be needed to get GCC to bootstrap on other targets. I'll do powerpc64le next but besides a general review I'm looking for suggestions how to do the same cleanup on all the other targets with the least disruption. (E.g., if there's a way to roll out a warning one target at a time I could introduce it under a suboption of -Wformat and enable the subobption with -Werror only on the targets that have already been verified.) Thanks Martin PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c gcc/c-family/ChangeLog: PR translation/80280 * c-format.h (struct format_char_info): Add new members. (struct format_flag_spec): Likweise. * c-format.c (printf_flag_specs, asm_fprintf_flag_spec): Initialize them. (gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs, strfmon_flag_specs, print_char_table, asm_fprintf_char_table): Likewise. (gcc_diag_char_table, gcc_cdiag_char_table, gcc_tdiag_char_table): Likewise. (gcc_cxxdiag_char_table, gcc_gfc_char_table, scan_char_table): Ditto. (time_char_table, monetary_char_table): Same here. (flag_chars_t::validate): Add argument and handle bad quoting. (check_format_info_main): Handle quoting problems. gcc/testsuite/ChangeLog: PR translation/80280 * gcc.dg/format/ext-9.c: New test. diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c index 400eb66..a7a8d4e 100644 --- a/gcc/c-family/c-format.c +++ b/gcc/c-family/c-format.c @@ -492,17 +492,17 @@ static const format_length_info gcc_gfc_length_specs[] = static const format_flag_spec printf_flag_specs[] = { - { ' ', 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, - { '+', 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, - { '#', 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, - { '0', 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, - { '-', 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, - { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, - { 'I', 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, - { 'w', 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, - { 'p', 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, - { 'L', 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 }, - { 0, 0, 0, NULL, NULL, STD_C89 } + { ' ', 0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, + { '+', 0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, + { '#', 0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, + { '0', 0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, + { '-', 0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, + { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"), STD_EXT }, + { 'I', 0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"), STD_EXT }, + { 'w', 0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, + { 'p', 0, 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, + { 'L', 0, 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 }, + { 0, 0, 0, 0, NULL, NULL, STD_C89 } }; @@ -516,15 +516,15 @@ static const format_flag_pair printf_flag_pairs[] = static const format_flag_spec asm_fprintf_flag_specs[] = { - { ' ', 0, 0, N_("' ' flag"),N_("the ' ' printf flag"), STD_C89 }, - { '+', 0, 0, N_("'+' flag"),N_("the '+' printf flag"), STD_C89 }, - { '#', 0, 0, N_("'#' flag"),N_("the '#' printf flag"), STD_C89 }, - { '0', 0, 0, N_("'0' flag"),N_("the '0' printf flag"), STD_C89 }, - { '-', 0, 0, N_("'-' flag"),N_("the '-' printf flag"), STD_C89 }, - { 'w', 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 }, - { 'p', 0, 0, N_("precision"), N_("precision in printf format"), STD_C89 }, - { 'L', 0, 0,