Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-08 Thread Martin Sebor

On 05/05/2017 11:49 AM, Joseph Myers wrote:

On Thu, 4 May 2017, Martin Sebor wrote:


I like the flags2 idea.  I split up the initialization array to also
detect quoted %K, and unquoted %R and %r.  With that I ran into test
failures that took me a bit to debug.  It turns out that there's code
(a nasty hack, really) that makes assumptions about some of
the conversion specifiers.  I dealt with the failures by simplifying
the initialization code and removing the hack.


This patch is OK.


Better testing exposed a couple of issues with the patch:

1) The init_dynamic_diag_info function depends on the order in which
   the identifiers it looks (e.g. tree_node) for are declared in
   the translation unit being compiled and in some instances wound
   up initializing the local_tree_type_node with void_type_node even
   though tree would subsequently be found to be declared.

2) The warning on %R/%r when used outside a quoted sequence was
   overfly restrictive.  There is code in cp/error.c that relies
   on that when printing the inlining context of a function.

I fixed both of these problems in the attached revision.  I've
also been running a script to build the targets in config-list.mk.
So far it's built just over 40 of the 199 targets and found one
quoting problem in a build of sparcv9-solaris2.11.  The build
fails due to an unrelated error (I opened bug 80673 for it), so
I can't say I've found anything that should block the patch from
checking in.

I'll keep running the script but unless you or someone else would
prefer I hold off until they all finish I'll go ahead and commit
the attached patch, without waiting for the build of the remaining
targets to complete, probably sometime in the second half of
the week.

Martin
PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c

gcc/c-family/ChangeLog:

	PR translation/80280
	* c-format.h (struct format_flag_spec): Add new member.
	(T89_T): New macro.
	* c-format.c (local_tree_type_node): New global.
	(printf_flag_specs, asm_fprintf_flag_spec): Initialize new data.
	(gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs): Ditto.
	(strfmon_flag_specs): Likewise.
	(gcc_diag_char_table, gcc_cdiag_char_table): Split up specifiers
	with distinct quoting properties.
	(gcc_tdiag_char_table, gcc_cxxdiag_char_table): Same.
	(flag_chars_t::validate): Add argument and handle bad quoting.
	(check_format_info_main): Handle quoting problems.
	(init_dynamic_diag_info): Simplify.

gcc/testsuite/ChangeLog:

	PR translation/80280
	* gcc.dg/format/gcc_diag-10.c: New test.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 400eb66..2dba062 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -53,6 +53,9 @@ struct function_format_info
   unsigned HOST_WIDE_INT first_arg_num;	/* number of first arg (zero for varargs) */
 };
 
+/* Initialized in init_dynamic_diag_info.  */
+static tree local_tree_type_node;
+
 static bool decode_format_attr (tree, function_format_info *, int);
 static int decode_format_type (const char *);
 
@@ -492,17 +495,17 @@ static const format_length_info gcc_gfc_length_specs[] =
 
 static const format_flag_spec printf_flag_specs[] =
 {
-  { ' ',  0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
-  { '+',  0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
-  { '#',  0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
-  { '0',  0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
-  { '-',  0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
-  { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
-  { 'I',  0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
-  { 'w',  0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
-  { 'p',  0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
-  { 'L',  0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
-  { 0, 0, 0, NULL, NULL, STD_C89 }
+  { ' ',  0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
+  { '+',  0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
+  { '#',  0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
+  { '0',  0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
+  { '-',  0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
+  { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
+  { 'I',  0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
+  { 'w',  0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
+  { 'p',  0, 0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },

Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-05 Thread Joseph Myers
On Thu, 4 May 2017, Martin Sebor wrote:

> I like the flags2 idea.  I split up the initialization array to also
> detect quoted %K, and unquoted %R and %r.  With that I ran into test
> failures that took me a bit to debug.  It turns out that there's code
> (a nasty hack, really) that makes assumptions about some of
> the conversion specifiers.  I dealt with the failures by simplifying
> the initialization code and removing the hack.

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-04 Thread Martin Sebor

On 05/03/2017 03:27 PM, Joseph Myers wrote:

On Wed, 3 May 2017, Martin Sebor wrote:


Clarifying the comment is helpful, but a data structure involving putting
the same character in both still doesn't make sense to me.  It would seem
a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and
"EK" cases, where "EK" uses NULL there just like "s", "d" etc. do.


Then the begin/end strings for the "DFTV" entry will be the empty
string (to indicate that they are expected to be quoted), as in
the attached incremental diff.  Let me know if I misunderstood
and you had something else in mind.


Yes, that's what I'd expect (incrementally).


FWIW, I don't mind doing this way if you prefer, but I'm hard
pressed to see the improvement.  All it did is grow the size of
the tables.  The code stayed the same.


Really I think it might be better not to have pointers / strings there at
all - rather, have a four-state enum value that says directly whether
those format specifiers are quote-neutral, should-be-quoted, left-quote or
right-quote.  Or that information could go in the existing flags2 field,
'"' to mean should-be-quoted, '<' to mean left-quote and '>' to mean
right-quote, for example.


I like the flags2 idea.  I split up the initialization array to also
detect quoted %K, and unquoted %R and %r.  With that I ran into test
failures that took me a bit to debug.  It turns out that there's code
(a nasty hack, really) that makes assumptions about some of
the conversion specifiers.  I dealt with the failures by simplifying
the initialization code and removing the hack.

Martin

PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c

gcc/c-family/ChangeLog:

	PR translation/80280
	* c-format.h (struct format_flag_spec): Add new member.
	(T89_T): New macro.
	* c-format.c (local_tree_type_node): New global.
	(printf_flag_specs, asm_fprintf_flag_spec): Initialize new data.
	(gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs): Ditto.
	(strfmon_flag_specs): Likewise.
	(gcc_diag_char_table, gcc_cdiag_char_table): Split up specifiers
	with distinct quoting properties.
	(gcc_tdiag_char_table, gcc_cxxdiag_char_table): Same.
	(flag_chars_t::validate): Add argument and handle bad quoting.
	(check_format_info_main): Handle quoting problems.
	(init_dynamic_diag_info): Simplify.

gcc/testsuite/ChangeLog:

	PR translation/80280
	* gcc.dg/format/gcc_diag-10.c: New test.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 400eb66..98fff4c 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -53,6 +53,9 @@ struct function_format_info
   unsigned HOST_WIDE_INT first_arg_num;	/* number of first arg (zero for varargs) */
 };
 
+/* Initialized in init_dynamic_diag_info.  */
+static tree local_tree_type_node;
+
 static bool decode_format_attr (tree, function_format_info *, int);
 static int decode_format_type (const char *);
 
@@ -492,17 +495,17 @@ static const format_length_info gcc_gfc_length_specs[] =
 
 static const format_flag_spec printf_flag_specs[] =
 {
-  { ' ',  0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
-  { '+',  0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
-  { '#',  0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
-  { '0',  0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
-  { '-',  0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
-  { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
-  { 'I',  0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
-  { 'w',  0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
-  { 'p',  0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
-  { 'L',  0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
-  { 0, 0, 0, NULL, NULL, STD_C89 }
+  { ' ',  0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
+  { '+',  0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
+  { '#',  0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
+  { '0',  0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
+  { '-',  0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
+  { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
+  { 'I',  0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
+  { 'w',  0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
+  { 'p',  0, 0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
+  { 'L',  0, 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
+  { 0, 0, 0, 0, NULL, NULL, STD_C89 }
 };
 
 
@@ 

Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Joseph Myers
On Wed, 3 May 2017, Martin Sebor wrote:

> > Clarifying the comment is helpful, but a data structure involving putting
> > the same character in both still doesn't make sense to me.  It would seem
> > a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and
> > "EK" cases, where "EK" uses NULL there just like "s", "d" etc. do.
> 
> Then the begin/end strings for the "DFTV" entry will be the empty
> string (to indicate that they are expected to be quoted), as in
> the attached incremental diff.  Let me know if I misunderstood
> and you had something else in mind.

Yes, that's what I'd expect (incrementally).

> FWIW, I don't mind doing this way if you prefer, but I'm hard
> pressed to see the improvement.  All it did is grow the size of
> the tables.  The code stayed the same.

Really I think it might be better not to have pointers / strings there at 
all - rather, have a four-state enum value that says directly whether 
those format specifiers are quote-neutral, should-be-quoted, left-quote or 
right-quote.  Or that information could go in the existing flags2 field, 
'"' to mean should-be-quoted, '<' to mean left-quote and '>' to mean 
right-quote, for example.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Martin Sebor

On 05/03/2017 01:59 PM, Joseph Myers wrote:

On Wed, 3 May 2017, Martin Sebor wrote:


Use contrib/config-list.mk, with a native compiler with this patch in the
PATH, to test building compilers for many configurations.  (No doubt
you'll also find existing build issues, which may or may not be filed in
Bugzilla.)


I can do some of it but not all of it.  The work doesn't involve
just building the compiler but also running the tests and fixing
up regressions in those that are written to expect the unquoted
diagnostics.  I don't have the ability to run the test suite on
all the targets, or the bandwidth to run it on a subset of them
beyond powerpc64 and x86_64.  So I'm hoping to find a way for
the core of the patch to be checked in and for the cleanup to
proceed subsequently, as target maintainers find time.


When it's architecture-specific tests, I think it's up to people testing
on those architectures to fix them.


diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 13ca8ea..8932861 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -132,6 +132,11 @@ struct format_type_detail
 struct format_char_info
 {
   const char *format_chars;
+  /* Strings of FORMAT_CHARS characters that begin and end quoting,
+ respectively, and pairs of which should be matched in the same
+ format string.  NULL when no such pairs exist.  */
+  const char *quote_begin_chars;
+  const char *quote_end_chars;


I don't think this comment is sufficiently detailed to make it obvious
what should appear in each field for each of the four kinds of format
specifiers I enumerated.  The best I can reverse-engineer from the code
is: NULL for specifiers that may be used either quoted or unquoted *or*
listing those specifiers in both quote_begin_chars and quote_end_chars
(but I think the option of listing them in both should be removed as
conceptually wrong); if the character is an opening or closing quote, list
it in the appropriate one; if it must be used quoted, but isn't a quote
itself, both strings must be non-NULL but it must not be listed in either.

Whether that's right or wrong, the comment needs to make the rules clear.


I've clarified the comment.


Clarifying the comment is helpful, but a data structure involving putting
the same character in both still doesn't make sense to me.  It would seem
a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and
"EK" cases, where "EK" uses NULL there just like "s", "d" etc. do.


Then the begin/end strings for the "DFTV" entry will be the empty
string (to indicate that they are expected to be quoted), as in
the attached incremental diff.  Let me know if I misunderstood
and you had something else in mind.

FWIW, I don't mind doing this way if you prefer, but I'm hard
pressed to see the improvement.  All it did is grow the size of
the tables.  The code stayed the same.

Martin

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index f64b6e0..ad56835 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -688,7 +688,8 @@ static const format_char_info gcc_diag_char_table[] =
   { "K", NULL, NULL,   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q","",   NULL },
 
   { "r", NULL, NULL,   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","cR",   NULL },
-  { "<>'R","<'R", ">'R", 0, STD_C89, NOARGUMENTS, "",  "",   NULL },
+  { "<>","<", ">", 0, STD_C89, NOARGUMENTS, "",  "",   NULL },
+  { "'R", NULL, NULL,  0, STD_C89, NOARGUMENTS, "",  "",   NULL },
   { "m", NULL, NULL,   0, STD_C89, NOARGUMENTS, "q", "",   NULL },
   { NULL, NULL, NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
 };
@@ -706,12 +707,13 @@ static const format_char_info gcc_tdiag_char_table[] =
   /* Custom conversion specifiers.  */
 
   /* These will require a "tree" at runtime.  */
-  { "DFKTEV", "EK", "EK", 0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
-
+  { "DFTV", "", "", 0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
+  { "EK", NULL, NULL, 0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
   { "v", NULL, NULL,   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q#",  "",   NULL },
 
   { "r", NULL, NULL,   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","cR",   NULL },
-  { "<>'R", "<'R", ">'R", 0, STD_C89, NOARGUMENTS, "",  "",   NULL },
+  { "<>", "<", ">",0, STD_C89, NOARGUMENTS, "",  "",   NULL },
+  { "'R", NULL, NULL,  0, STD_C89, NOARGUMENTS, "",  "",   NULL },
   { "m", NULL, 

Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Joseph Myers
On Wed, 3 May 2017, Martin Sebor wrote:

> > Use contrib/config-list.mk, with a native compiler with this patch in the
> > PATH, to test building compilers for many configurations.  (No doubt
> > you'll also find existing build issues, which may or may not be filed in
> > Bugzilla.)
> 
> I can do some of it but not all of it.  The work doesn't involve
> just building the compiler but also running the tests and fixing
> up regressions in those that are written to expect the unquoted
> diagnostics.  I don't have the ability to run the test suite on
> all the targets, or the bandwidth to run it on a subset of them
> beyond powerpc64 and x86_64.  So I'm hoping to find a way for
> the core of the patch to be checked in and for the cleanup to
> proceed subsequently, as target maintainers find time.

When it's architecture-specific tests, I think it's up to people testing 
on those architectures to fix them.

> > > diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
> > > index 13ca8ea..8932861 100644
> > > --- a/gcc/c-family/c-format.h
> > > +++ b/gcc/c-family/c-format.h
> > > @@ -132,6 +132,11 @@ struct format_type_detail
> > >  struct format_char_info
> > >  {
> > >const char *format_chars;
> > > +  /* Strings of FORMAT_CHARS characters that begin and end quoting,
> > > + respectively, and pairs of which should be matched in the same
> > > + format string.  NULL when no such pairs exist.  */
> > > +  const char *quote_begin_chars;
> > > +  const char *quote_end_chars;
> > 
> > I don't think this comment is sufficiently detailed to make it obvious
> > what should appear in each field for each of the four kinds of format
> > specifiers I enumerated.  The best I can reverse-engineer from the code
> > is: NULL for specifiers that may be used either quoted or unquoted *or*
> > listing those specifiers in both quote_begin_chars and quote_end_chars
> > (but I think the option of listing them in both should be removed as
> > conceptually wrong); if the character is an opening or closing quote, list
> > it in the appropriate one; if it must be used quoted, but isn't a quote
> > itself, both strings must be non-NULL but it must not be listed in either.
> > 
> > Whether that's right or wrong, the comment needs to make the rules clear.
> 
> I've clarified the comment.

Clarifying the comment is helpful, but a data structure involving putting 
the same character in both still doesn't make sense to me.  It would seem 
a lot clearer to (for example) split "DFKTEV" into separate "DFTV" and 
"EK" cases, where "EK" uses NULL there just like "s", "d" etc. do.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Jeff Law

On 05/03/2017 12:47 PM, Martin Sebor wrote:



I can do some of it but not all of it.  The work doesn't involve
just building the compiler but also running the tests and fixing
up regressions in those that are written to expect the unquoted
diagnostics.  I don't have the ability to run the test suite on
all the targets, or the bandwidth to run it on a subset of them
beyond powerpc64 and x86_64.  So I'm hoping to find a way for
the core of the patch to be checked in and for the cleanup to
proceed subsequently, as target maintainers find time.
Just a note.  It's not terrible to run non-execute tests with cross 
tools.   In fact, it's fairly easy.  I'm happy to walk you through the 
process.


Long term I think we want to move away from config-list.mk and instead 
use a buildbot.  I've been experimenting with jenkins.I've got a 
series of jobs that will build the *-elf, *-rtems, *-gnu and *-linux 
targets.  binutils, then gcc, then newlib/glibc.


The newlib (*-elf and *-rtems) pipeline is pretty solid at this point 
and covers ~70 targets from config-list.mk building through newlib.


The glibc pipeline is more problematic, but improving.

It's about ready to leave my little cluster of slaves and relocate to 
the servers in Toronto.


It should be fairly straightforward to take that jenkins work and tie in 
testsuite runs.  Even if we just start with compile.exp and expand one 
day (via the simulators) to the execution tests.


Jeff


Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Martin Sebor

On 05/03/2017 09:53 AM, Joseph Myers wrote:

On Tue, 2 May 2017, Martin Sebor wrote:


In bug 80280 - Missing closing quote (%>) c/semantics.c and
c/c-typeck.c, a translator points out one of a number of kinds
of cosmetic problems that tend to come up late in development,
during translation of GCC messages.  Other, arguably more minor,
kinds of issues are caused by forgetting to use proper quoting
when referencing tree nodes, such as %D or %T.

To help avoid these kinds of bugs, the attached patch enhances
the -Wformat checker to detect these common quoting issues and
report them as warnings.


As I see it, there are four kinds of format specifiers in this context:

* Specifiers that have no special interaction with quoting and may be used
either quoted or unquoted.  For example, %d, %s and %E.  (%E is only of
that kind when used for expressions - when hopefully it should only be
used for simple expressions such as constants.  Uses for identifiers
should be quoted; cases where there can be complex expressions should be
replaced by use of location ranges.  Ideally identifiers and expressions
should have different formats, so these cases can be distinguished and so
identifiers can end up with a static type other than "tree".)

* Specifiers that should be quoted.  For example, %D.

* Specifiers that open quoting (%<).

* Specifiers that close quoting (%>).

(In addition, the q flag serves to quote the specifier it's applied to.)


Thanks for comments.  That above matches my view.




The remaining six patches in the series correct the problems
highlighted by the warning and get GCC to bootstrap and pass
regression tests on x86_64.  I suspect additional fixes will
be needed to get GCC to bootstrap on other targets.  I'll do
powerpc64le next but besides a general review I'm looking for
suggestions how to do the same cleanup on all the other targets
with the least disruption. (E.g., if there's a way to roll out
a warning one target at a time I could introduce it under
a suboption of -Wformat and enable the subobption with -Werror
only on the targets that have already been verified.)


Use contrib/config-list.mk, with a native compiler with this patch in the
PATH, to test building compilers for many configurations.  (No doubt
you'll also find existing build issues, which may or may not be filed in
Bugzilla.)


I can do some of it but not all of it.  The work doesn't involve
just building the compiler but also running the tests and fixing
up regressions in those that are written to expect the unquoted
diagnostics.  I don't have the ability to run the test suite on
all the targets, or the bandwidth to run it on a subset of them
beyond powerpc64 and x86_64.  So I'm hoping to find a way for
the core of the patch to be checked in and for the cleanup to
proceed subsequently, as target maintainers find time.




+  /* Diagnose nested or unmatched quoting directives such as GCC's
+"%<...%<" and "%>...%>".  As a special case, a directive can
+be considered to both begin and end quoting (e.g., GCC's %E).
+Such a directive is recognized but not diagosed.  */


I don't think it makes conceptual sense to consider %E to both begin and
end quoting.  I'd expect %E to have exactly the same begin_quote and
end_quote (or whatever) data as %d and %s, because it has the same
properties (may be used quoted or unquoted).


Yes, that's exactly how %E is treated by the code.  The special
case the comment above refers to is meant to explain how the
begin_quote and end_quote local variables get set, not describe
a concept in the design.  I renamed the variables so as not to
suggest otherwise.




diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 13ca8ea..8932861 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -132,6 +132,11 @@ struct format_type_detail
 struct format_char_info
 {
   const char *format_chars;
+  /* Strings of FORMAT_CHARS characters that begin and end quoting,
+ respectively, and pairs of which should be matched in the same
+ format string.  NULL when no such pairs exist.  */
+  const char *quote_begin_chars;
+  const char *quote_end_chars;


I don't think this comment is sufficiently detailed to make it obvious
what should appear in each field for each of the four kinds of format
specifiers I enumerated.  The best I can reverse-engineer from the code
is: NULL for specifiers that may be used either quoted or unquoted *or*
listing those specifiers in both quote_begin_chars and quote_end_chars
(but I think the option of listing them in both should be removed as
conceptually wrong); if the character is an opening or closing quote, list
it in the appropriate one; if it must be used quoted, but isn't a quote
itself, both strings must be non-NULL but it must not be listed in either.

Whether that's right or wrong, the comment needs to make the rules clear.


I've clarified the comment.

Thanks
Martin
PR translation/80280 - Missing closing quote (%>) 

Re: [PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-03 Thread Joseph Myers
On Tue, 2 May 2017, Martin Sebor wrote:

> In bug 80280 - Missing closing quote (%>) c/semantics.c and
> c/c-typeck.c, a translator points out one of a number of kinds
> of cosmetic problems that tend to come up late in development,
> during translation of GCC messages.  Other, arguably more minor,
> kinds of issues are caused by forgetting to use proper quoting
> when referencing tree nodes, such as %D or %T.
> 
> To help avoid these kinds of bugs, the attached patch enhances
> the -Wformat checker to detect these common quoting issues and
> report them as warnings.

As I see it, there are four kinds of format specifiers in this context:

* Specifiers that have no special interaction with quoting and may be used 
either quoted or unquoted.  For example, %d, %s and %E.  (%E is only of 
that kind when used for expressions - when hopefully it should only be 
used for simple expressions such as constants.  Uses for identifiers 
should be quoted; cases where there can be complex expressions should be 
replaced by use of location ranges.  Ideally identifiers and expressions 
should have different formats, so these cases can be distinguished and so 
identifiers can end up with a static type other than "tree".)

* Specifiers that should be quoted.  For example, %D.

* Specifiers that open quoting (%<).

* Specifiers that close quoting (%>).

(In addition, the q flag serves to quote the specifier it's applied to.)

> The remaining six patches in the series correct the problems
> highlighted by the warning and get GCC to bootstrap and pass
> regression tests on x86_64.  I suspect additional fixes will
> be needed to get GCC to bootstrap on other targets.  I'll do
> powerpc64le next but besides a general review I'm looking for
> suggestions how to do the same cleanup on all the other targets
> with the least disruption. (E.g., if there's a way to roll out
> a warning one target at a time I could introduce it under
> a suboption of -Wformat and enable the subobption with -Werror
> only on the targets that have already been verified.)

Use contrib/config-list.mk, with a native compiler with this patch in the 
PATH, to test building compilers for many configurations.  (No doubt 
you'll also find existing build issues, which may or may not be filed in 
Bugzilla.)

> +  /* Diagnose nested or unmatched quoting directives such as GCC's
> +  "%<...%<" and "%>...%>".  As a special case, a directive can
> +  be considered to both begin and end quoting (e.g., GCC's %E).
> +  Such a directive is recognized but not diagosed.  */

I don't think it makes conceptual sense to consider %E to both begin and 
end quoting.  I'd expect %E to have exactly the same begin_quote and 
end_quote (or whatever) data as %d and %s, because it has the same 
properties (may be used quoted or unquoted).

> diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
> index 13ca8ea..8932861 100644
> --- a/gcc/c-family/c-format.h
> +++ b/gcc/c-family/c-format.h
> @@ -132,6 +132,11 @@ struct format_type_detail
>  struct format_char_info
>  {
>const char *format_chars;
> +  /* Strings of FORMAT_CHARS characters that begin and end quoting,
> + respectively, and pairs of which should be matched in the same
> + format string.  NULL when no such pairs exist.  */
> +  const char *quote_begin_chars;
> +  const char *quote_end_chars;

I don't think this comment is sufficiently detailed to make it obvious 
what should appear in each field for each of the four kinds of format 
specifiers I enumerated.  The best I can reverse-engineer from the code 
is: NULL for specifiers that may be used either quoted or unquoted *or* 
listing those specifiers in both quote_begin_chars and quote_end_chars 
(but I think the option of listing them in both should be removed as 
conceptually wrong); if the character is an opening or closing quote, list 
it in the appropriate one; if it must be used quoted, but isn't a quote 
itself, both strings must be non-NULL but it must not be listed in either.

Whether that's right or wrong, the comment needs to make the rules clear.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 1/7] enhance -Wformat to detect quoting problems (PR 80280 et al.)

2017-05-02 Thread Martin Sebor

In bug 80280 - Missing closing quote (%>) c/semantics.c and
c/c-typeck.c, a translator points out one of a number of kinds
of cosmetic problems that tend to come up late in development,
during translation of GCC messages.  Other, arguably more minor,
kinds of issues are caused by forgetting to use proper quoting
when referencing tree nodes, such as %D or %T.

To help avoid these kinds of bugs, the attached patch enhances
the -Wformat checker to detect these common quoting issues and
report them as warnings.

The remaining six patches in the series correct the problems
highlighted by the warning and get GCC to bootstrap and pass
regression tests on x86_64.  I suspect additional fixes will
be needed to get GCC to bootstrap on other targets.  I'll do
powerpc64le next but besides a general review I'm looking for
suggestions how to do the same cleanup on all the other targets
with the least disruption. (E.g., if there's a way to roll out
a warning one target at a time I could introduce it under
a suboption of -Wformat and enable the subobption with -Werror
only on the targets that have already been verified.)

Thanks
Martin

PR translation/80280 - Missing closing quote (%>) c/semantics.c and c/c-typeck.c

gcc/c-family/ChangeLog:

	PR translation/80280
	* c-format.h (struct format_char_info): Add new members.
	(struct format_flag_spec): Likweise.
	* c-format.c (printf_flag_specs, asm_fprintf_flag_spec): Initialize
	them.
	(gcc_diag_flag_specs, scanf_flag_specs, strftime_flag_specs,
	strfmon_flag_specs, print_char_table, asm_fprintf_char_table): Likewise.
	(gcc_diag_char_table, gcc_cdiag_char_table, gcc_tdiag_char_table):
	Likewise.
	(gcc_cxxdiag_char_table, gcc_gfc_char_table, scan_char_table): Ditto.
	(time_char_table, monetary_char_table): Same here.
	(flag_chars_t::validate): Add argument and handle bad quoting.
	(check_format_info_main): Handle quoting problems.

gcc/testsuite/ChangeLog:

	PR translation/80280
	* gcc.dg/format/ext-9.c: New test.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 400eb66..a7a8d4e 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -492,17 +492,17 @@ static const format_length_info gcc_gfc_length_specs[] =
 
 static const format_flag_spec printf_flag_specs[] =
 {
-  { ' ',  0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
-  { '+',  0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
-  { '#',  0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
-  { '0',  0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
-  { '-',  0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
-  { '\'', 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
-  { 'I',  0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
-  { 'w',  0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
-  { 'p',  0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
-  { 'L',  0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
-  { 0, 0, 0, NULL, NULL, STD_C89 }
+  { ' ',  0, 0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
+  { '+',  0, 0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
+  { '#',  0, 0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
+  { '0',  0, 0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
+  { '-',  0, 0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
+  { '\'', 0, 0, 0, N_("''' flag"),N_("the ''' printf flag"),  STD_EXT },
+  { 'I',  0, 0, 0, N_("'I' flag"),N_("the 'I' printf flag"),  STD_EXT },
+  { 'w',  0, 0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
+  { 'p',  0, 0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
+  { 'L',  0, 0, 0, N_("length modifier"), N_("length modifier in printf format"), STD_C89 },
+  { 0, 0, 0, 0, NULL, NULL, STD_C89 }
 };
 
 
@@ -516,15 +516,15 @@ static const format_flag_pair printf_flag_pairs[] =
 
 static const format_flag_spec asm_fprintf_flag_specs[] =
 {
-  { ' ',  0, 0, N_("' ' flag"),N_("the ' ' printf flag"),  STD_C89 },
-  { '+',  0, 0, N_("'+' flag"),N_("the '+' printf flag"),  STD_C89 },
-  { '#',  0, 0, N_("'#' flag"),N_("the '#' printf flag"),  STD_C89 },
-  { '0',  0, 0, N_("'0' flag"),N_("the '0' printf flag"),  STD_C89 },
-  { '-',  0, 0, N_("'-' flag"),N_("the '-' printf flag"),  STD_C89 },
-  { 'w',  0, 0, N_("field width"), N_("field width in printf format"), STD_C89 },
-  { 'p',  0, 0, N_("precision"),   N_("precision in printf format"),   STD_C89 },
-  { 'L',  0, 0,